Scarab Systems

Posted on Jun 19

Scarab Diagnostic Field Test #033 - Prometheus Remote-Write Label Order Boundary

#prometheus #observability #discuss #ai

Target: prometheus/prometheus

Issue: prometheus/prometheus#11505

Pull request: prometheus/prometheus#18978

Field Lab record: Prometheus #11505

This field test was about a small but important remote-write boundary in Prometheus:

labels are required to be sorted lexicographically, but incoming remote-write labels could pass through a conversion path before that requirement was enforced.

That sounds narrow because it is narrow.

But this is exactly the kind of case Scarab is meant to make visible.

Not "the remote-write subsystem is broken."

Not "rewrite ingestion."

Not "let an agent go hunting."

The question was smaller:

where does repo truth say invalid input should stop, and where was the actual boundary allowing that truth to blur?

Field Lab record

The public Field Lab record for this case is here:

https://github.com/scarab-systems/scarab-field-lab/tree/main/field-tests/prometheus-prometheus-11505

The record includes the public issue, the public pull request, the repair scope, a validation summary, and a list of non-claims.

It contains public links, status, validation, and claim boundaries only. It does not contain Scarab Diagnostic Suite (SDS) source code or non-public product material.

That distinction matters.

The public artifact is the field record, not SDS itself.

SDS result

SDS surfaced a boundary mismatch around remote-write label ordering.

The public issue already identified the expected rule:

remote-write labels are supposed to be sorted lexicographically.

The meaningful diagnostic result was not just "sort labels."

In fact, the repair does not sort the incoming data.

The result was that invalid remote-write input should be rejected before it can be made to look acceptable by a later conversion path.

That is the boundary.

The input contract belongs at the edge where the remote-write request is being interpreted.

If invalid input crosses that edge and then gets normalized, the system can lose the ability to distinguish valid caller behavior from repaired caller behavior.

Failure shape

The failure shape was subtle:

Remote-write input arrives with label names in an invalid order.
Prometheus has a public and internal expectation that labels are sorted.
A conversion path can produce Prometheus labels in sorted form.
That means invalid input can appear clean after conversion.
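
The failure shape above can be reproduced in miniature. The following is a minimal sketch, not Prometheus code: the Label type and the toInternal and namesSorted functions are hypothetical stand-ins, and toInternal simply assumes a conversion step that emits labels in sorted form.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is a simplified stand-in for a remote-write label pair.
// It is an assumption for illustration, not the Prometheus type.
type Label struct {
	Name, Value string
}

// toInternal is a hypothetical conversion step. It returns a sorted
// copy of the incoming labels and leaves the input untouched.
func toInternal(in []Label) []Label {
	out := make([]Label, len(in))
	copy(out, in)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// namesSorted reports whether label names are in ascending
// lexicographic order.
func namesSorted(ls []Label) bool {
	return sort.SliceIsSorted(ls, func(i, j int) bool { return ls[i].Name < ls[j].Name })
}

func main() {
	// "job" arrives before "__name__", which violates the ordering rule.
	incoming := []Label{{"job", "api"}, {"__name__", "up"}}
	converted := toInternal(incoming)

	fmt.Println(namesSorted(incoming))  // false: the sender broke the contract
	fmt.Println(namesSorted(converted)) // true: the violation is no longer visible
}
```

Any check that runs only on the converted value sees sorted labels and has nothing left to reject.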

The concern is not that sorting is technically impossible.

The concern is that sorting at the wrong layer changes what the receiver is saying to the caller.

If a remote-write sender sends invalid data, the receiver should be able to reject that data as invalid input.

If the receiver silently normalizes it, the caller gets a different contract:

"send whatever order you want; the receiver will fix it."

That was not the boundary described by the issue.

Boundary

The boundary in this field test was:

remote-write request validity must be checked before incoming label data is converted into Prometheus' internal label representation.

That is a code boundary, but it is also a contract boundary.

On one side is the external remote-write request.

On the other side is Prometheus' internal representation.

The repair belongs at the crossing point.

Once the request has crossed into internal representation, a validation failure can become harder to prove because the data may already have been reshaped.

That is why the patch is intentionally placed before append behavior, and before the conversion path can erase the original ordering problem.
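
As a rough illustration of checking at the crossing point, here is a sketch that validates the sender's ordering first and converts second. All names (Label, errUnsorted, checkOrder, toInternal, accept) are hypothetical and chosen for this example; the actual patch lives in storage/remote/write_handler.go and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Label is a simplified stand-in for an incoming label pair (assumption).
type Label struct{ Name, Value string }

// errUnsorted is a hypothetical sentinel error for this sketch.
var errUnsorted = errors.New("label names are not sorted")

// checkOrder inspects the labels exactly as the sender supplied them.
// It only checks ordering; duplicate names are a separate concern.
func checkOrder(ls []Label) error {
	for i := 1; i < len(ls); i++ {
		if ls[i].Name < ls[i-1].Name {
			return fmt.Errorf("%w: %q after %q", errUnsorted, ls[i].Name, ls[i-1].Name)
		}
	}
	return nil
}

// toInternal stands in for a conversion that would normalize ordering.
func toInternal(ls []Label) []Label {
	out := append([]Label(nil), ls...)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// accept validates before converting, so a rejected request never
// reaches the representation that could hide the problem.
func accept(ls []Label) ([]Label, error) {
	if err := checkOrder(ls); err != nil {
		return nil, err
	}
	return toInternal(ls), nil
}

func main() {
	_, err := accept([]Label{{"job", "api"}, {"__name__", "up"}})
	fmt.Println(errors.Is(err, errUnsorted)) // true

	out, err := accept([]Label{{"__name__", "up"}, {"job", "api"}})
	fmt.Println(len(out), err) // 2 <nil>
}
```

Swapping the two steps inside accept would make checkOrder pass for every request, which is the blur this field test describes.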

What changed

The patch adds explicit label-order validation for both remote-write paths covered by the repair:

remote-write v1 label names are checked before v1 samples are appended
remote-write v2 label references are checked before v2 samples are appended
v1 unsorted-label series follow the existing invalid-label skip path
v2 unsorted-label series follow the existing partial-write bad-request path
regression coverage was added for both request shapes
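
The two outcomes listed above can be sketched side by side. This is an illustrative model, not the patch: every function name here is hypothetical, the v2 label references are assumed to be alternating name and value indexes into a shared symbols table, and the status codes are placeholders for "success" and "bad request". The real handler's logging, metrics, and response details are not reproduced.

```go
package main

import (
	"fmt"
	"net/http"
)

// sortedNames reports whether names are in ascending lexicographic order.
func sortedNames(names []string) bool {
	for i := 1; i < len(names); i++ {
		if names[i] < names[i-1] {
			return false
		}
	}
	return true
}

// namesFromRefs resolves v2-style label references, assumed here to be
// alternating (name, value) indexes into a shared symbols table.
func namesFromRefs(symbols []string, refs []uint32) ([]string, error) {
	if len(refs)%2 != 0 {
		return nil, fmt.Errorf("odd number of label references: %d", len(refs))
	}
	names := make([]string, 0, len(refs)/2)
	for i := 0; i < len(refs); i += 2 {
		if int(refs[i]) >= len(symbols) {
			return nil, fmt.Errorf("label reference %d out of range", refs[i])
		}
		names = append(names, symbols[refs[i]])
	}
	return names, nil
}

// handleV1 models a skip path: unsorted series are dropped and counted,
// the remaining series are appended.
func handleV1(batch [][]string) (appended, skipped int) {
	for _, names := range batch {
		if !sortedNames(names) {
			skipped++
			continue
		}
		appended++
	}
	return appended, skipped
}

// handleV2 models a partial write: valid series are appended, and any
// rejected series turns the response into a bad request.
func handleV2(symbols []string, batch [][]uint32) (appended, status int) {
	status = http.StatusNoContent
	for _, refs := range batch {
		names, err := namesFromRefs(symbols, refs)
		if err != nil || !sortedNames(names) {
			status = http.StatusBadRequest
			continue
		}
		appended++
	}
	return appended, status
}

func main() {
	a, s := handleV1([][]string{{"__name__", "job"}, {"job", "__name__"}})
	fmt.Println(a, s) // 1 1

	symbols := []string{"", "__name__", "up", "job", "api"}
	n, code := handleV2(symbols, [][]uint32{{1, 2, 3, 4}, {3, 4, 1, 2}})
	fmt.Println(n, code) // 1 400
}
```

In both models the order check reads the names as the sender wrote them, before anything is appended.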

The pull request is here:

https://github.com/prometheus/prometheus/pull/18978

The changed files are public:

storage/remote/write_handler.go
storage/remote/write_handler_test.go

The patch does not redesign remote-write ingestion.

It does not change remote-read.

It does not introduce a new validation framework.

It keeps the repair at the specific boundary identified by the field test.

Why this was not a broad rewrite

This is one of the recurring lessons from Scarab field testing:

when the boundary is clear, the repair should get smaller, not larger.

A code agent can produce a large patch very quickly.

That is not the hard part anymore.

The hard part is deciding what the patch is allowed to claim.

For this case, the claim is not:

"remote-write validation is now perfect."

The claim is:

"incoming remote-write labels that are out of lexicographic order are checked before they can be normalized into the internal label representation."

That is a much narrower statement.

It is also a more reviewable one.

Why the diagnostic result mattered

The practical bug is about label ordering.

The larger field-test value is about method.

Modern code agents are very good at moving through a repository and producing edits. That can be useful, but it can also make patches feel suspicious when the intent is unclear.

Scarab's position is different:

first find the truth boundary, then patch only that boundary.

In this case, the repo truth was that labels must be sorted.

The boundary failure was that remote-write input could cross into a representation where the original invalid ordering was no longer visible.

The patch follows from that.

Not because the agent guessed a fix.

Because the public issue, the code surface, and the validation boundary pointed to a narrow repair.

That is the shift I am testing in public:

code-agent work should not begin with "what can we generate?"

It should begin with "where did the repo stop enforcing what it already says is true?"

Validation

The repair was validated in a Linux arm64 container with the Prometheus test workflow used for the public PR.

Validation recorded in the Field Lab:

make test passed
make lint passed
go test ./storage/remote -count=1 passed

At the time this draft was prepared on June 19, 2026, the public pull request was open and ready for review.

The public status is:

PR open
DCO passing
Netlify deploy-preview successful or informational
upstream review required

That status matters too.

This field report does not claim upstream acceptance.

It claims a public diagnostic record, a narrow repair, and a submitted PR.

Maintainers decide whether the patch belongs upstream.

Field test result

Result:

diagnostic proof and repair submitted.

The field test produced:

a public issue-to-boundary record
a narrow repair patch
regression coverage for the repaired behavior
a validation summary
a public upstream PR

The patch is intentionally boring.

That is the point.

The interesting thing is not that a code change exists.

The interesting thing is that the repair boundary is explainable without exposing proprietary diagnostic logic or asking maintainers to trust a black box.

Public claim

This field test supports a narrow public claim:

SDS identified a remote-write label-order boundary in Prometheus where invalid input needed to be checked before conversion could normalize it, and a human-submitted repair was prepared for that boundary.

It does not claim:

that Prometheus accepted the patch
that all remote-write validation issues are resolved
that Scarab repairs projects by itself
that SDS source or product details are public
that maintainers endorsed Scarab or the Field Lab

The Field Lab exists to keep those claims separate.

Disclosure

This field report was prepared with AI-assisted editing from public field-test notes, public issue and PR records, and the public Field Lab record. The diagnostic claim, repair boundary, and final wording were human reviewed.

Scarab Diagnostic Suite is proprietary. The Field Lab publishes public case records, issue links, validation summaries, and claim boundaries only.

SDS finds evidence. People make claims. Maintainers decide.
