Target: prometheus/prometheus
Issue: prometheus/prometheus#11505
Pull request: prometheus/prometheus#18978
Field Lab record: Prometheus #11505
This field test was about a small but important remote-write boundary in Prometheus:
labels are required to be sorted lexicographically, but incoming remote-write labels could pass through a conversion path before that requirement was enforced.
That sounds narrow because it is narrow.
But this is exactly the kind of case Scarab is meant to make visible.
Not "the remote-write subsystem is broken."
Not "rewrite ingestion."
Not "let an agent go hunting."
The question was smaller:
where does repo truth say invalid input should stop, and where was the actual boundary allowing that truth to blur?
Field Lab record
The public Field Lab record for this case is here:
https://github.com/scarab-systems/scarab-field-lab/tree/main/field-tests/prometheus-prometheus-11505
The record includes the public issue, the public pull request, the repair scope, validation summary, and non-claims.
It contains public links, status, validation, and claim boundaries only. It does not contain Scarab Diagnostic Suite (SDS) source code or non-public product material.
That distinction matters.
The public artifact is the field record, not SDS itself.
SDS result
SDS surfaced a boundary mismatch around remote-write label ordering.
The public issue already identified the expected rule:
remote-write labels are supposed to be sorted lexicographically.
The meaningful diagnostic result was not just "sort labels."
In fact, the repair does not sort the incoming data.
The result was that invalid remote-write input should be rejected before it can be made to look acceptable by a later conversion path.
That is the boundary.
The input contract belongs at the edge where the remote-write request is being interpreted.
If invalid input crosses that edge and then gets normalized, the system can lose the ability to distinguish valid caller behavior from repaired caller behavior.
Failure shape
The failure shape was subtle:
- Remote-write input arrives with label names in an invalid order.
- Prometheus has a public and internal expectation that labels are sorted.
- A conversion path can produce Prometheus labels in sorted form.
- That means invalid input can appear clean after conversion.
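The masking effect in that list can be sketched in a few lines of Go. This is a minimal illustration, not Prometheus code: Label, isSorted, and toInternal are hypothetical names, and the sort-on-convert behavior is an assumption standing in for whichever conversion path normalizes the labels.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is a hypothetical stand-in for one name/value pair as sent on the wire.
type Label struct {
	Name  string
	Value string
}

// isSorted reports whether label names are in lexicographic (byte-wise) order.
func isSorted(ls []Label) bool {
	return sort.SliceIsSorted(ls, func(i, j int) bool { return ls[i].Name < ls[j].Name })
}

// toInternal is a hypothetical conversion that copies the labels and sorts
// them by name, the way a normalizing conversion path might.
func toInternal(wire []Label) []Label {
	out := make([]Label, len(wire))
	copy(out, wire)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	// "job" is sent before "__name__", which violates lexicographic order.
	wire := []Label{{"job", "api"}, {"__name__", "up"}}
	fmt.Println("wire sorted:", isSorted(wire))
	fmt.Println("internal sorted:", isSorted(toInternal(wire)))
}
```

Any check that runs only on the converted labels sees a sorted set, so it cannot tell a well-behaved sender from one whose input was repaired on the way in.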
The concern is not that sorting is technically impossible.
The concern is that sorting at the wrong layer changes what the receiver is saying to the caller.
If a remote-write sender sends invalid data, the receiver should be able to reject that data as invalid input.
If the receiver silently normalizes it, the caller gets a different contract:
"send whatever order you want; the receiver will fix it."
That was not the boundary described by the issue.
Boundary
The boundary in this field test was:
remote-write request validity must be checked before incoming label data is converted into Prometheus' internal label representation.
That is a code boundary, but it is also a contract boundary.
On one side is the external remote-write request.
On the other side is Prometheus' internal representation.
The repair belongs at the crossing point.
Once the request has crossed into internal representation, a validation failure can become harder to prove because the data may already have been reshaped.
That is why the patch is intentionally placed before append behavior, and before the conversion path can erase the original ordering problem.
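As a rough sketch of that placement, the ordering check runs on the labels exactly as the sender wrote them, and conversion happens only if the check passes. All names here (Label, validateOrder, convert, receive) are hypothetical and are not taken from the Prometheus codebase; the sorting conversion is an assumption used to show why the order of the two steps matters.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is a hypothetical wire-level name/value pair.
type Label struct {
	Name  string
	Value string
}

// validateOrder inspects the labels as received, before any reshaping.
func validateOrder(wire []Label) error {
	for i := 1; i < len(wire); i++ {
		if wire[i].Name < wire[i-1].Name {
			return fmt.Errorf("label %q sorts before preceding label %q",
				wire[i].Name, wire[i-1].Name)
		}
	}
	return nil
}

// convert is a hypothetical normalizing conversion into an internal form.
func convert(wire []Label) []Label {
	out := make([]Label, len(wire))
	copy(out, wire)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// receive enforces the contract at the crossing point: validate first,
// convert second. Swapping the two steps would hide the violation.
func receive(wire []Label) ([]Label, error) {
	if err := validateOrder(wire); err != nil {
		return nil, err
	}
	return convert(wire), nil
}

func main() {
	_, err := receive([]Label{{"job", "api"}, {"__name__", "up"}})
	fmt.Println("unsorted input rejected:", err != nil)

	_, err = receive([]Label{{"__name__", "up"}, {"job", "api"}})
	fmt.Println("sorted input accepted:", err == nil)
}
```

The design choice is only about sequence: the same two functions called in the opposite order would accept every request.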
What changed
The patch adds explicit label-order validation for both remote-write paths covered by the repair:
- remote-write v1 label names are checked before v1 samples are appended
- remote-write v2 label references are checked before v2 samples are appended
- v1 unsorted-label series follow the existing invalid-label skip path
- v2 unsorted-label series follow the existing partial-write bad-request path
- regression coverage was added for both request shapes
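The two checks differ mainly in where the label names live. The sketch below is illustrative only and is not the code in the pull request: checkNames and checkRefs are hypothetical helpers, and the v2 shape assumed here is a series that carries label references as (name index, value index) pairs into a shared symbols table.

```go
package main

import (
	"errors"
	"fmt"
)

// checkNames is a v1-style check: label names are carried inline.
func checkNames(names []string) error {
	for i := 1; i < len(names); i++ {
		if names[i] < names[i-1] {
			return fmt.Errorf("label %q sorts before preceding label %q",
				names[i], names[i-1])
		}
	}
	return nil
}

// checkRefs is a v2-style check under the assumption stated above:
// refs holds (name, value) index pairs that point into symbols.
func checkRefs(refs []uint32, symbols []string) error {
	if len(refs)%2 != 0 {
		return errors.New("odd number of label references")
	}
	prev := ""
	for i := 0; i < len(refs); i += 2 {
		if int(refs[i]) >= len(symbols) || int(refs[i+1]) >= len(symbols) {
			return fmt.Errorf("label reference out of range at position %d", i)
		}
		name := symbols[refs[i]]
		if i > 0 && name < prev {
			return fmt.Errorf("label %q sorts before preceding label %q", name, prev)
		}
		prev = name
	}
	return nil
}

func main() {
	symbols := []string{"", "__name__", "up", "job", "api"}

	fmt.Println("v1 unsorted rejected:", checkNames([]string{"job", "__name__"}) != nil)
	fmt.Println("v2 sorted accepted:", checkRefs([]uint32{1, 2, 3, 4}, symbols) == nil)
	fmt.Println("v2 unsorted rejected:", checkRefs([]uint32{3, 4, 1, 2}, symbols) != nil)
}
```

What the caller does with the returned error is a separate decision. Per the list above, the repair routes v1 failures into the existing invalid-label skip path and v2 failures into the existing partial-write bad-request path; the sketch does not model either.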
The pull request is here:
https://github.com/prometheus/prometheus/pull/18978
The changed files are public:
- storage/remote/write_handler.go
- storage/remote/write_handler_test.go
The patch does not redesign remote-write ingestion.
It does not change remote-read.
It does not introduce a new validation framework.
It keeps the repair at the specific boundary identified by the field test.
Why this was not a broad rewrite
This is one of the recurring lessons from Scarab field testing:
when the boundary is clear, the repair should get smaller, not larger.
A code agent can produce a large patch very quickly.
That is not the hard part anymore.
The hard part is deciding what the patch is allowed to claim.
For this case, the claim is not:
"remote-write validation is now perfect."
The claim is:
"incoming remote-write labels that are out of lexicographic order are checked before they can be normalized into the internal label representation."
That is a much narrower statement.
It is also a more reviewable one.
Why the diagnostic result mattered
The practical bug is about label ordering.
The larger field-test value is about method.
Modern code agents are very good at moving through a repository and producing edits. That can be useful, but it can also make patches feel suspicious when the intent is unclear.
Scarab's position is different:
first find the truth boundary, then patch only that boundary.
In this case, the repo truth was that labels must be sorted.
The boundary failure was that remote-write input could cross into a representation where the original invalid ordering was no longer visible.
The patch follows from that.
Not because the agent guessed a fix.
Because the public issue, the code surface, and the validation boundary pointed to a narrow repair.
That is the shift I am testing in public:
code-agent work should not begin with "what can we generate?"
It should begin with "where did the repo stop enforcing what it already says is true?"
Validation
The repair was validated in a Linux arm64 container with the Prometheus test workflow used for the public PR.
Validation recorded in the Field Lab:
- make test: passed
- make lint: passed
- go test ./storage/remote -count=1: passed
At the time this draft was prepared on June 19, 2026, the public pull request was open and ready for review.
The public status is:
- PR open
- DCO passing
- Netlify deploy-preview successful or informational
- upstream review required
That status matters too.
This field report does not claim upstream acceptance.
It claims a public diagnostic record, a narrow repair, and a submitted PR.
Maintainers decide whether the patch belongs upstream.
Field test result
Result:
diagnostic proof and repair submitted.
The field test produced:
- a public issue-to-boundary record
- a narrow repair patch
- regression coverage for the repaired behavior
- a validation summary
- a public upstream PR
The patch is intentionally boring.
That is the point.
The interesting thing is not that a code change exists.
The interesting thing is that the repair boundary is explainable without exposing proprietary diagnostic logic or asking maintainers to trust a black box.
Public claim
This field test supports a narrow public claim:
SDS identified a remote-write label-order boundary in Prometheus where invalid input needed to be checked before conversion could normalize it, and a human-submitted repair was prepared for that boundary.
It does not claim:
- that Prometheus accepted the patch
- that all remote-write validation issues are resolved
- that Scarab repairs projects by itself
- that SDS source or product details are public
- that maintainers endorsed Scarab or the Field Lab
The Field Lab exists to keep those claims separate.
Disclosure
This field report was prepared with AI-assisted editing from public field-test notes, public issue and PR records, and the public Field Lab record. The diagnostic claim, repair boundary, and final wording were human reviewed.
Scarab Diagnostic Suite is proprietary. The Field Lab publishes public case records, issue links, validation summaries, and claim boundaries only.
SDS finds evidence. People make claims. Maintainers decide.