Benchmark

JH5

Jun 19

DiffusionGemma 26B Lands on M2 Max: MLX Throughput Benchmarks and Pushing the Context Limit

#ai #benchmark #diffusiongemma #mlx

3 min read

JH5

Jun 19

DiffusionGemma 26B Pushes the GH200 to Its Performance Limits

#ai #nvidia #benchmark #llm

2 min read

Ricardo Ghekiere (runflow)

Jun 18

Portrait Generation Benchmark Q1 2026: Flux.2 vs SDXL vs Proprietary

#benchmark #portraits #flux2 #sdxl

3 min read

Rob

Jun 18

Model Showdown Round 7: Five Local Models vs. One Cloud Model on a Real Coding Task

#modelshowdown #benchmark #ai #llm

9 min read

Oluwagbade Odimayo

Jun 16

A UMAP With Arrows Is Not a Benchmark. This Is

#benchmark #bioinformatics #rna #scientificsoftware

7 min read

Oluwagbade Odimayo

Jun 16

Engineering CellFateBench: A Reproducible Python Benchmark for Single-Cell Genomics Reasoning

#bioinformatics #genomics #benchmark #python

8 min read

Aakash Gour

Jun 15

PostAll vs Manual Content Creation: A Developer's Performance Breakdown

#showdev #benchmark #ai #webdev

9 min read

Rob

Jun 13

Frontier Bakeoff: We Benchmarked Fable 5 Hours Before the Shutdown

#modelshowdown #benchmark #ai #llm

6 min read

Igor Gridel

Jun 6

Ideogram 4.0 is Good. Just Good.

#ai #review #imagegeneration #benchmark

2 min read

Harrison Guo

Jun 1

I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.

#ai #benchmark #devtools #typescript

13 min read

Dayna Blackwell

May 25

We Benchmarked the Most Popular Code Search Tools. We Beat All of Them.

#ai #mcp #benchmark #devtools

11 min read

Gabriel Anhaia

May 24

Multi-Shot vs Zero-Shot: When Adding Examples Actually Hurts Accuracy

#ai #llm #prompt #benchmark

8 min read

Megha mukherjee

May 28

Open-Source A3M Router Tops RouterArena Benchmark

#opensource #llm #benchmark #ai

1 min read

Dmytro Klymentiev

May 23

How does an AI agent pick from 686 skills in a second?

#ai #benchmark #embeddings #claudecode

7 min read

Jangwook Kim

May 22

LMR-BENCH: Can LLM Agents Reproduce NLP Research Code? (EMNLP 2025)

#benchmark #researchreproducibility #llmagents #paperpoc

5 min read

DEV Community
