DEV Community
AI/LLM Harness Series' Articles
What is an LLM evaluation harness? A deep dive into lm-eval-harness
Tech_Nuggets
Jun 3
#llm #ai #evaluation #opensource
7 min read
Building a domain-specific LLM evaluation set from scratch
Tech_Nuggets
Jun 4
#llm #ai #evaluation #opensource
8 min read
Speculative decoding: when and why it actually speeds up inference
Tech_Nuggets
Jun 5
#llm #ai #inference #performance
9 min read
KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break
Tech_Nuggets
Jun 6
#llm #ai #vllm #performance
8 min read
Prefix caching at scale: when it saves you 80% of prefill cost, and the eviction policies that quietly turn it into 5%
Tech_Nuggets
Jun 7
#llm #ai #infrastructure #vllm
9 min read
LoRA and QLoRA fine-tuning: what they actually do under the hood
Tech_Nuggets
Jun 9
#lora #qlora #finetuning #llm
7 min read
Flash Attention: what it does and why it matters
Tech_Nuggets
Jun 10
#llm #ai #deeplearning #transformers
8 min read
Flash Attention: what it does and why it matters
Tech_Nuggets
Jun 10
#llm #ai #deeplearning #gpu
8 min read
Quantization formats compared: GGUF vs GPTQ vs AWQ vs NF4
Tech_Nuggets
Jun 11
#llm #quantization #mlops #tutorial
7 min read
Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production
Tech_Nuggets
Jun 12
#llm #ai #machinelearning #opensource
9 min read
Mixture of Experts (MoE): what it actually does under the hood, and when it pays off
Tech_Nuggets
Jun 13
#llm #ai #architecture #opensource
8 min read
Structured output from LLMs: JSON mode, function calling, and grammar-constrained decoding
Tech_Nuggets
Jun 14
#llm #ai #python #tutorial
7 min read
The Model Context Protocol (MCP): what it is and how to build a server
Tech_Nuggets
Jun 15
#mcp #llm #ai #opensource
7 min read
RLHF vs DPO vs IPO vs KTO: which alignment method should you use
Tech_Nuggets
Jun 16
#llm #ai #alignment #opensource
8 min read
Tokenization under the hood: BPE, WordPiece, SentencePiece, and Unigram compared
Tech_Nuggets
Jun 17
#tokenization #llm #ai #nlp
9 min read
KV cache and PagedAttention: what they do and why they matter
Tech_Nuggets
Jun 20
#llm #ai #performance #opensource
8 min read