DEV Community

# benchmark

Posts

DiffusionGemma 26B Lands on M2 Max: MLX Throughput Measurements and Pushing the Context Limit
3 min read
DiffusionGemma 26B Challenges the GH200's Performance Limits
2 min read
Portrait Generation Benchmark Q1 2026: Flux.2 vs SDXL vs Proprietary

3 min read
Model Showdown Round 7: Five Local Models vs. One Cloud Model on a Real Coding Task

9 min read
A UMAP With Arrows Is Not a Benchmark. This Is

7 min read
Engineering CellFateBench: A Reproducible Python Benchmark for Single-Cell Genomics Reasoning

8 min read
PostAll vs Manual Content Creation: A Developer's Performance Breakdown

9 min read
Frontier Bakeoff: We Benchmarked Fable 5 Hours Before the Shutdown

6 min read
Ideogram 4.0 is Good. Just Good.

2 min read
I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.

13 min read
We Benchmarked the Most Popular Code Search Tools. We Beat All of Them.

11 min read
Multi-Shot vs Zero-Shot: When Adding Examples Actually Hurts Accuracy

8 min read
Open-Source A3M Router Tops RouterArena Benchmark

1 min read
How does an AI agent pick from 686 skills in a second?

7 min read
LMR-BENCH: Can LLM Agents Reproduce NLP Research Code? (EMNLP 2025)

5 min read