DEV Community

AI/LLM Harness Series' Articles

Back to Tech_Nuggets's Series

Jun 3

What is an LLM evaluation harness? A deep dive into lm-eval-harness

#llm #ai #evaluation #opensource

7 min read

Jun 4

Building a domain-specific LLM evaluation set from scratch

#llm #ai #evaluation #opensource

8 min read

Jun 5

Speculative decoding: when and why it actually speeds up inference

#llm #ai #inference #performance

9 min read

Jun 6

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break

#llm #ai #vllm #performance

8 min read

Jun 7

Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%

#llm #ai #infrastructure #vllm

9 min read

Jun 9

LoRA and QLoRA fine-tuning: what they actually do under the hood

#lora #qlora #finetuning #llm

7 min read

Jun 10

Flash Attention: what it does and why it matters

#llm #ai #deeplearning #transformers

8 min read

Jun 10

Flash Attention: what it does and why it matters

#llm #ai #deeplearning #gpu

8 min read

Jun 11

Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4

#llm #quantization #mlops #tutorial

7 min read

Jun 12

Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production

#llm #ai #machinelearning #opensource

9 min read

Jun 13

Mixture of Experts (MoE): what it actually does under the hood, and when it pays off

#llm #ai #architecture #opensource

8 min read

Jun 14

Structured output from LLMs: JSON mode, function calling, and grammar-constrained decoding

#llm #ai #python #tutorial

7 min read

Jun 15

The Model Context Protocol (MCP): what it is and how to build a server

#mcp #llm #ai #opensource

7 min read

Jun 16

RLHF vs DPO vs IPO vs KTO: which alignment method should you use

#llm #ai #alignment #opensource

8 min read

Jun 17

Tokenization under the hood: BPE, WordPiece, SentencePiece, and Unigram compared

#tokenization #llm #ai #nlp

9 min read

Jun 20

KV cache and PagedAttention: what they do and why they matter

#llm #ai #performance #opensource

8 min read