FreshContext in agent workflows: judgment at the context handoff
After sharing FreshContext publicly, a few comments helped me sharpen where it fits.
One person asked whether I had tested it on larger datasets or production workloads.
Another pointed out something important about ingestion: if a table loses its headers, or a clause gets split in the wrong place, the context is already damaged before any judgment layer sees it.
That gave me a better way to think about the product boundary.
FreshContext is useful at the context handoff.
The simple shape is still:
```txt
candidate context in
decision-ready context out
```
A caller provides candidate context. FreshContext evaluates it using signals like freshness, provenance, confidence, utility, and source profile. Then it returns a decision about how that context should be treated.
Examples:
```txt
cite this
verify this
refresh this
use as background
watch this
exclude this
```
That is the part I want to keep clear.
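To make the shape concrete, here is a minimal sketch of signals going in and a treatment coming out. It is not FreshContext's actual implementation: the `Candidate` fields, the `judge` function, the 0.0 to 1.0 scales, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    CITE = "cite this"
    VERIFY = "verify this"
    REFRESH = "refresh this"
    BACKGROUND = "use as background"
    WATCH = "watch this"  # listed for completeness; the toy rules below never return it
    EXCLUDE = "exclude this"


@dataclass(frozen=True)
class Candidate:
    text: str
    source_profile: str  # e.g. "official_docs"
    freshness: float     # 0.0 (stale) to 1.0 (current); hypothetical scale
    provenance: float    # 0.0 (unknown origin) to 1.0 (well attributed)
    confidence: float
    utility: float


def judge(c: Candidate) -> Treatment:
    """Map signals to a treatment. Thresholds are invented for this example."""
    if c.utility < 0.2:
        return Treatment.EXCLUDE
    if c.provenance < 0.4 or c.confidence < 0.4:
        return Treatment.VERIFY
    if c.freshness < 0.4:
        return Treatment.REFRESH
    if c.utility < 0.6:
        return Treatment.BACKGROUND
    return Treatment.CITE
```

The order of the checks is itself a design choice: here weak provenance wins over staleness, so a source that is both old and poorly attributed is sent to verification first.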
## The handoff problem
Agent workflows pass context around.
A retrieval step finds sources.
A planning step uses them.
A writing step turns them into text.
A review step checks the result.
Somewhere in that chain, source quality can disappear.
A source may begin as “relevant, but needs verification.”
After one summary, that warning is gone.
The next agent receives a clean paragraph and treats it as reliable.
That is the kind of problem FreshContext is trying to reduce.
The point is not to make the system more complicated. The point is to keep useful context judgment attached before another step uses it.
A simple flow could look like this:
```txt
retrieval
-> candidate context
-> FreshContext judgment
-> agent reasoning
```
For larger workflows, the same idea can sit between agents:
```txt
research agent
-> FreshContext judgment
-> planning agent
-> writing agent
-> review agent
```
Not every step needs a judgment call. Some handoffs are low risk.
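One way to picture a warning surviving a summary step is to have the summary inherit the most cautious decision among its inputs. This is only a sketch under assumptions: `JudgedContext`, `summarize`, and the `CAUTION` ordering are hypothetical names and not part of FreshContext.

```python
from dataclasses import dataclass

# Hypothetical ordering: a higher number means a more cautious treatment.
CAUTION = {"cite": 0, "background": 1, "refresh": 2, "verify": 3}


@dataclass(frozen=True)
class JudgedContext:
    text: str
    decision: str  # one of the CAUTION keys
    reason: str


def summarize(items: list[JudgedContext], summary_text: str) -> JudgedContext:
    """Wrap a summary so it carries the most cautious decision of its inputs."""
    if not items:
        raise ValueError("nothing to summarize")
    worst = max(items, key=lambda item: CAUTION[item.decision])
    return JudgedContext(summary_text, worst.decision, worst.reason)
```

With this shape, a clean paragraph built from one solid source and one weak source still arrives at the next agent marked `verify`, along with the reason.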
The useful question is:
```txt
Before this context moves forward, do we know how much trust to place in it?
```
## Retrieval gives candidates. Judgment gives treatment.
Retrieval answers:
```txt
What might be relevant?
```
FreshContext asks:
```txt
How should this context be treated?
```
A relevant source can still be stale.
A recent source can still be weak.
A useful source may only be good as background.
An official page may still need a date check.
A forum post may be useful signal, not proof.
This is why the output should be more than a relevance score.
A score can rank context.
A decision can guide the next step.
For example:
```txt
Decision: needs_verification
Reason: Relevant, but provenance is weak and the source date is unclear.
Action: Do not cite as primary until checked.
```
That kind of output is more useful to an agent than raw text alone.
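A decision can guide the next step only if the caller routes on it. The sketch below assumes a `Decision` record with the three fields shown above plus a source id, and a hypothetical `split_for_writer` helper; the labels other than `needs_verification` are illustrative, not a documented FreshContext vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    source_id: str
    decision: str  # e.g. "cite_as_primary", "needs_verification", "background_only"
    reason: str
    action: str


def split_for_writer(decisions: list[Decision]) -> dict[str, list[str]]:
    """Group source ids by what a writing step may do with them."""
    routes: dict[str, list[str]] = {"citable": [], "hold": [], "background": []}
    for d in decisions:
        if d.decision == "cite_as_primary":
            routes["citable"].append(d.source_id)
        elif d.decision == "background_only":
            routes["background"].append(d.source_id)
        else:
            # Cautious or unrecognized labels are held back rather than cited.
            routes["hold"].append(d.source_id)
    return routes
```

Defaulting unknown labels to `hold` keeps the failure mode conservative: a new decision type cannot silently become citable.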
## Ingestion still matters
One community comment made this point well: judgment works better when ingestion preserves structure.
That means keeping things like:
```txt
table headers
page numbers
section titles
source URLs
timestamps
author names
schema context
document boundaries
```
If that structure gets stripped away, FreshContext can still flag uncertainty, but it has less to work with.
A better pipeline is:
```txt
structure-preserving ingest
-> context judgment
-> reasoning
```
That is a useful architecture because each layer has a job.
Ingestion keeps the source material intact.
FreshContext judges the candidate context.
The model or agent reasons with cleaner input.
That is also why I am trying not to rush the product into every adjacent category at once. The current value is in the judgment step.
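A rough sketch of how missing structure could be surfaced to the judgment step as an ingestion quality signal. The field names in `EXPECTED_FIELDS` and both helper functions are assumptions for illustration; a real ingest layer would define its own schema.

```python
# Hypothetical metadata keys an ingest step might attach to each chunk.
EXPECTED_FIELDS = (
    "source_url",
    "timestamp",
    "author",
    "section_title",
    "page_number",
    "table_headers",
)


def missing_structure(chunk: dict) -> list[str]:
    """Return the expected fields that are absent or empty in this chunk."""
    return [f for f in EXPECTED_FIELDS if chunk.get(f) in (None, "", [])]


def structure_score(chunk: dict) -> float:
    """Fraction of expected fields that survived ingestion, from 0.0 to 1.0."""
    return 1.0 - len(missing_structure(chunk)) / len(EXPECTED_FIELDS)
```

A low score does not say the content is wrong. It says the judgment layer has less evidence to work with, which is itself a reason to lean toward verification.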
## Multi-data systems need different judgment rules
This becomes more important when a system uses many kinds of data.
A real workflow might pull from:
```txt
PDFs
web pages
research papers
support tickets
GitHub issues
database rows
official docs
forum posts
internal notes
market signals
```
These sources should not all be treated the same way.
A research paper does not decay like a job post.
A support ticket does not carry the same evidence weight as official documentation.
A database row may look precise, while still missing the surrounding schema or business meaning.
A forum post may show useful community signal, while still being weak evidence.
FreshContext uses source profiles to handle that difference.
The goal is simple: judge different source types with different expectations.
Examples:
```txt
academic_research
official_docs
jobs_opportunities
market_finance
social_pulse
```
The user-facing output should stay plain.
The internal rules can become more specific over time.
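As an illustration of judging source types with different expectations, freshness could decay at a different rate per profile. The half-life values and the exponential decay model below are assumptions made up for this sketch, not FreshContext's actual rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical half-lives in days: how long until freshness drops to 0.5.
HALF_LIFE_DAYS = {
    "academic_research": 1825.0,
    "official_docs": 365.0,
    "jobs_opportunities": 30.0,
    "market_finance": 7.0,
    "social_pulse": 14.0,
}


def freshness(profile: str, published: datetime, now: datetime) -> float:
    """Exponential decay from 1.0 (just published) toward 0.0, per profile."""
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / HALF_LIFE_DAYS[profile])


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
month_old = now - timedelta(days=30)
job_score = freshness("jobs_opportunities", month_old, now)
paper_score = freshness("academic_research", month_old, now)
```

Under these made-up numbers, a month-old job post has lost half its freshness while a month-old paper has barely moved, which is the difference the profiles are meant to capture.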
## Where this can grow
The current public version is early.
It has a working `evaluate_context` path, source profiles, decision outputs, demos, arXiv proof work, reference adapters, and trust gates.
That is enough to test the shape of the idea.
It is not enough to claim large-scale production validation.
The next useful tests are practical:
```txt
larger candidate-context batches
more source profiles
latency and throughput checks
multi-agent pilot workflows
ingestion quality signals
decision benchmark sets
```
I do not want to add features just to make the system look bigger.
The better question is:
```txt
Does the judgment improve the next step?
```
If yes, the engine deserves to grow.
If no, more surface area will not help.
## The current product boundary
For now, I am keeping the boundary simple.
FreshContext receives candidate context.
It evaluates the context.
It returns a decision-ready output.
That makes it useful as a checkpoint in RAG systems, agent workflows, and multi-data pipelines.
The most relevant use case I see right now is this:
```txt
Keep agents honest about the sources they pass to each other.
```
If one agent hands context to another, the receiving agent should know whether that context is fresh, well sourced, citation-grade, uncertain, background-only, or in need of verification.
That is the job FreshContext is built around.
```txt
candidate context in
decision-ready context out
```
That is the product identity I am keeping.

Top comments (5)
The handoff framing is right, but I'd turn it back on FreshContext's own output. The decision you emit (`needs_verification`, `cite this`) is itself a piece of context that rides the same handoff, so it inherits the failure mode you're describing. "Relevant, but needs verification" doesn't disappear after one summary because it was badly worded; it disappears because the summary step had no obligation to carry it forward. A `needs_verification` label attached as advice is exactly as droppable by that same step. So the verdict only survives if it's binding on the operation rather than advisory to the next agent: a "do-not-cite-as-primary" decision has to make the citation path refuse that source as primary, not annotate it with a note the writing agent is trusted to read. Otherwise the judgment layer relocates the trust-evaporation point one step downstream instead of closing it.
Second, the decision is a point-in-time verdict that gets consumed as a standing fact. You compute freshness/provenance/confidence at the handoff and stamp `confidence: high`, but across a long research → plan → write → review chain, that stamp is itself stale by the writing step, and nothing re-derives it. That reproduces your own problem one level up: instead of a clean paragraph treated as reliable, the next agent now receives a clean verdict treated as reliable. The fix is to make the decision carry what it was computed from (the signals and the timestamp) so a downstream step can ask "does this still hold?" instead of inheriting a judgment it can't re-check. Trust that can't be re-derived at the point of use is just a fresher-looking version of the warning that already evaporated.
This is a strong point, and I agree with the core failure mode you’re describing.
A FreshContext decision cannot just be advisory prose that the next summarizer is trusted to preserve. If `needs_verification` is only a note attached to a paragraph, then the warning can evaporate the moment another agent rewrites or compresses the output.
The decision has to travel as part of the operation contract, not only as natural language. In other words, `needs_verification`, `needs_refresh`, `cite_as_primary`, etc. should remain structured routing decisions with provenance, timestamp, source profile, confidence signals, and the reason they were computed. The readable explanation is useful for humans, but the machine-readable verdict has to remain available downstream.
That is the distinction I’m trying to make clearer in FreshContext.
Your framing is useful because it points to the next hard boundary: FreshContext should not only explain context quality; it should make context-handling obligations harder to drop as the workflow moves from research → plan → write → review.
Appreciate the critique. This is exactly the kind of edge case the project needs to handle if context integrity is going to be real infrastructure rather than just another annotation layer.
Agreed the verdict has to be structured rather than prose, but I'd push on one more step, because a structured `needs_verification` still depends on the next agent honoring it. JSON instead of a sentence raises the cost of dropping the warning, but a rewriter can still flip the field. The version with teeth is making `safe_to_cite` unreachable except through an explicit verification event that references the original `needs_verification` verdict's id. Then promotion isn't a property anyone can set; it's a new entry that has to point back at what it cleared, and at who cleared it.
That flips the downstream obligation from "please carry this field forward" (still advisory, just machine-readable) to "you cannot reach the cite state without emitting the event that resolves it." And if that verification event is attributable to whoever actually ran the check, then a downstream agent silently promoting `needs_verification` → `safe_to_cite` leaves a visible hole in the chain: either no resolving event, or one that references nothing real. Promotion-without-verification becomes detectable rather than just discouraged.
That's the line you drew at the end — context integrity as infrastructure vs. a richer annotation layer. An annotation, however structured, is something the next agent is trusted to respect; infrastructure is something the next agent cannot route around without leaving a trace. The handoff stops being "honor this field" and becomes "you can't get to the next state without the event that earns it."
This is exactly the gap I've been sitting with since you posted this.
You're right that JSON raises the cost — it doesn't remove the problem. A rewriter can still flip the field and leave nothing behind. The verdict is structured but the chain isn't closed.
The event-sourcing framing is the honest version of what "enforcement" would actually mean here. Not a flag you carry forward, but a state you can only reach by emitting something traceable. needs_verification → safe_to_cite becomes a transition that leaves a hole if nothing resolved it.
I don't have this built yet. FreshContext currently returns the verdict and trusts the caller to check it. What you're describing is the next real thing — and I think you've named exactly why annotation layers keep collapsing into advisory no matter how structured the schema gets. The chain has to be closed at the architecture level, not the field level.
Going to write a separate post on this. Appreciate you pushing on it.
Right — and the reason it collapses to advisory is that the annotation and the value it certifies share one write path. Same actor, same transaction, so flipping both is atomic and leaves no gap to detect. "Closed at the architecture level" concretely means splitting those paths: the verdict stops being a field on the record and becomes a separate append-only event whose subject is the content hash of the record. Then a rewrite that flips the value yields a record whose hash no longer resolves to any verdict — the omission dangles instead of staying quietly consistent. needs_verification → safe_to_cite can only be reached by emitting that resolving event, and the hole is structurally visible if nothing did. Advisory becomes detectable not because the schema got stricter but because the certifier can no longer mutate the thing it certifies in the same breath. That's the half a field-level schema can't reach.
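The thread's idea can be sketched in a few lines: verdicts live in a separate append-only log keyed by content hash, and `safe_to_cite` is reachable only through an event that resolves an earlier `needs_verification` verdict. This is an illustration of the proposal, not something FreshContext implements today; `VerdictLog`, `Event`, and their fields are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Event:
    event_id: int
    subject: str                    # content hash of the record being judged
    state: str                      # "needs_verification" or "safe_to_cite"
    actor: str                      # who emitted the event
    resolves: Optional[int] = None  # id of the verdict this event clears


class VerdictLog:
    """Append-only log kept apart from the records it describes."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, text: str, state: str, actor: str,
               resolves: Optional[int] = None) -> Event:
        subject = content_hash(text)
        if state == "safe_to_cite":
            # Promotion must point back at a real verdict for the same content.
            prior = next((e for e in self._events if e.event_id == resolves), None)
            if (prior is None or prior.subject != subject
                    or prior.state != "needs_verification"):
                raise ValueError("safe_to_cite must resolve a needs_verification verdict")
        event = Event(len(self._events), subject, state, actor, resolves)
        self._events.append(event)
        return event

    def current_state(self, text: str) -> Optional[str]:
        """Latest state for this exact content, or None if no verdict matches."""
        subject = content_hash(text)
        states = [e.state for e in self._events if e.subject == subject]
        return states[-1] if states else None
```

Because the log is keyed by content hash, a rewritten record resolves to no verdict at all, so `current_state` returns `None` instead of quietly inheriting the old one.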