Engineering Post: The part with taste: classifying types and matching names

#csharp #dotnet #nuget #showdev

I’m reviving Munchausen, a C# NuGet package I started 9 years ago. This is part 5 of an 8-part series documenting both the development process and the engineering decisions behind bringing the project back to life.

This is the Engineering Post: the reasoning, trade-offs, API decisions, and technical choices behind this part of the project.

M4 is the brain. Given a member the user didn't configure, the inference engine decides what to generate. It runs in two parts: a structural classifier that asks "what shape is this?" and a stage pipeline that asks "what value belongs here?" Both produce entries for an immutable plan model that the compiler will later freeze.
This milestone tests the part of my design that is most subjective.

Structural classification

Before any name matching, every member type is bucketed:

Scalar: primitives, string, Guid, enums, the date/time family, decimal, Nullable<T> of these.
Nested: a class/struct with discoverable construction.
Collection: T[], List<T>, the read-only and interface variants, Dictionary<K,V> and friends.
Unsupported: interfaces/abstracts without registration, exotic value types (nint, Half, Int128), pointers.

There's one deliberate special case: byte[] is a scalar, not a collection.
It infers as "16 random bytes," because nobody wants a List<byte>-style element
walk over a blob. The classifier checks for it explicitly before the array branch.
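
To make the ordering concrete, here is a minimal sketch of such a classifier. It is an illustration under assumptions, not Munchausen's actual code: the names TypeShape and ShapeClassifier, the exact scalar and collection sets, and the choice to unwrap Nullable<T> recursively are all hypothetical. It also does not check "discoverable construction"; anything concrete that falls through is treated as Nested.

```csharp
using System;
using System.Collections.Generic;

public enum TypeShape { Scalar, Nested, Collection, Unsupported }

public static class ShapeClassifier
{
    // Hypothetical scalar set; the authoritative list lives in the design document.
    private static readonly HashSet<Type> ScalarTypes = new HashSet<Type>
    {
        typeof(bool), typeof(byte), typeof(sbyte), typeof(short), typeof(ushort),
        typeof(int), typeof(uint), typeof(long), typeof(ulong), typeof(char),
        typeof(float), typeof(double), typeof(decimal), typeof(string), typeof(Guid),
        typeof(DateTime), typeof(DateTimeOffset), typeof(TimeSpan),
    };

    // Hypothetical set of open generic collection types.
    private static readonly HashSet<Type> CollectionDefinitions = new HashSet<Type>
    {
        typeof(List<>), typeof(IList<>), typeof(ICollection<>), typeof(IEnumerable<>),
        typeof(IReadOnlyList<>), typeof(IReadOnlyCollection<>),
        typeof(Dictionary<,>), typeof(IDictionary<,>), typeof(IReadOnlyDictionary<,>),
    };

    public static TypeShape Classify(Type type)
    {
        // Special case first: byte[] is a blob, so it must never reach the array branch.
        if (type == typeof(byte[])) return TypeShape.Scalar;

        // Assumption: Nullable<T> simply takes the shape of T.
        var underlying = Nullable.GetUnderlyingType(type);
        if (underlying != null) return Classify(underlying);

        if (type.IsEnum || ScalarTypes.Contains(type)) return TypeShape.Scalar;

        // Exotic value types and pointers. nint/nuint are IntPtr/UIntPtr at runtime;
        // Half and Int128 are compared by name so the sketch compiles on older targets.
        if (type.IsPointer || type == typeof(IntPtr) || type == typeof(UIntPtr)
            || type.FullName == "System.Half" || type.FullName == "System.Int128")
            return TypeShape.Unsupported;

        if (type.IsArray) return TypeShape.Collection;
        if (type.IsGenericType && CollectionDefinitions.Contains(type.GetGenericTypeDefinition()))
            return TypeShape.Collection;

        // No registration mechanism in this sketch, so these are always unsupported.
        if (type.IsInterface || type.IsAbstract) return TypeShape.Unsupported;

        // Whatever is left is a concrete class or struct.
        return TypeShape.Nested;
    }
}
```

The only ordering that matters here is the first line of Classify: moving the byte[] check below the IsArray branch would silently turn blobs into element-by-element collections.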

The semantic catalog

The heart of the milestone is a 44-row candidate table, the project's curated
"taste," captured in the design document:

new(new[] { "firstname", "givenname", "forename" }, null, Str, High, "Name.First"),
new(new[] { "email", "emailaddress", "mail" },      null, Str, High, "Internet.Email"),
new(new[] { "make", "manufacturer" }, VehicleHints, Str, High, "Vehicle.Make",
    NoHintConfidence: Low),

Member names are normalized (split on case, underscore, and hyphen boundaries,
lowercased, then joined) so FirstName, first_name, and FIRST-NAME all become
firstname. The matcher then applies the documented rules. The member's value type
must equal the candidate's. An exact name match with a matching model hint is High
confidence. An exact match on a row without hints uses the candidate's base
confidence. An exact match on a hint-gated row whose hint misses drops one level.
A suffix match (CustomerEmail ends with email) lands one level below the
corresponding exact result. The selected mode (Conservative/Balanced/Aggressive)
then filters by confidence, and rejected candidates are recorded for Explain().
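
The rules above can be sketched in a few lines. This is a hedged illustration, not the real matcher: the Candidate record is trimmed down, NameMatcher and its method names are hypothetical, the model-hint check is reduced to a boolean the caller supplies, "one level below Low" is assumed to mean "no match", and the per-mode thresholds (Conservative accepts High only, Balanced accepts Medium and up, Aggressive accepts everything) are my assumption, since the post does not state them.

```csharp
using System;
using System.Linq;

public enum Confidence { Low = 1, Medium = 2, High = 3 }
public enum Mode { Conservative, Balanced, Aggressive }

// Hypothetical, trimmed-down candidate row.
public sealed record Candidate(
    string[] Names, string[]? Hints, Type ValueType, Confidence Base, string Generator);

public static class NameMatcher
{
    // Dropping separators and lowercasing yields the same string as split/lowercase/join.
    public static string Normalize(string memberName) =>
        new string(memberName
            .Where(c => char.IsLetterOrDigit(c))
            .Select(c => char.ToLowerInvariant(c))
            .ToArray());

    // Returns null when the candidate does not match the member at all.
    public static Confidence? Score(
        Candidate candidate, string memberName, Type memberType, bool modelHintMatches)
    {
        if (memberType != candidate.ValueType) return null;

        var name = Normalize(memberName);
        var exact = candidate.Names.Contains(name);
        var suffix = !exact
            && candidate.Names.Any(n => name.EndsWith(n, StringComparison.Ordinal));
        if (!exact && !suffix) return null;

        Confidence? result;
        if (candidate.Hints is null) result = candidate.Base;   // row without hints
        else if (modelHintMatches) result = Confidence.High;    // hint-gated hit
        else result = Drop(candidate.Base);                     // hint-gated miss

        // A suffix match lands one level below the corresponding exact result.
        return suffix ? Drop(result) : result;
    }

    // Assumption: dropping below Low means the candidate no longer matches.
    private static Confidence? Drop(Confidence? confidence)
    {
        if (confidence == Confidence.High) return Confidence.Medium;
        if (confidence == Confidence.Medium) return Confidence.Low;
        return null;
    }

    // Assumed thresholds; the real mode filter may differ.
    public static bool Accepts(Mode mode, Confidence confidence) => mode switch
    {
        Mode.Conservative => confidence >= Confidence.High,
        Mode.Balanced => confidence >= Confidence.Medium,
        _ => true,
    };
}
```

With the email row from the catalog excerpt, a string member named Email scores High and CustomerEmail scores Medium, while an int member with either name does not match at all.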

A real design fork

The catalog had an internal tension I had not noticed while designing it. The
general rule says a hint-gated miss drops one level (High → Medium). But the
Vehicle rows annotate make/model as "no hint: Low" (two levels) and
year as "no hint: a different generator at Medium." My rule and exceptions
disagreed, and the choice affects seeded output, so I stopped implementing and
revisited the intended user experience.

The resolution (the per-row notes win over the general rule) came down to a
user-experience argument worth repeating: the words make and model are
extremely vehicle-specific. If a Printer.Make confidently resolves to "Toyota"
under the default mode, that's the worst failure for a mock-data tool: output
that looks plausible but is nonsense and slips into fixtures unnoticed. Dropping
to Low means it falls back to generic lorem text under the default, which the user
sees and corrects. A visible false negative beats an invisible false positive.
So the candidate model grew optional NoHintConfidence/NoHintGenerator fields
to encode that decision explicitly.
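
As a rough sketch of how those two optional fields could resolve a hint-gated miss: the per-row override wins when present, and the general one-level drop applies otherwise. The Candidate shape, the Resolution record, the HintMiss helper, and the generator names used for the year row ("Vehicle.Year", "Date.Year") are hypothetical; only Vehicle.Make and the two field names come from the catalog.

```csharp
public enum Confidence { Low = 1, Medium = 2, High = 3 }

// Hypothetical candidate shape with the two optional override fields.
public sealed record Candidate(
    string[] Names, string[]? Hints, Confidence Base, string Generator,
    Confidence? NoHintConfidence = null, string? NoHintGenerator = null);

public sealed record Resolution(Confidence Confidence, string Generator);

public static class HintMiss
{
    // Resolves an exact name match on a hint-gated row whose model hint did not match.
    // Returns null when the general rule would drop the row below Low (an assumption).
    public static Resolution? Resolve(Candidate candidate)
    {
        // Per-row notes win over the general "drop one level" rule.
        Confidence? confidence = candidate.NoHintConfidence ?? OneLevelDown(candidate.Base);
        if (confidence is null) return null;

        return new Resolution(confidence.Value, candidate.NoHintGenerator ?? candidate.Generator);
    }

    private static Confidence? OneLevelDown(Confidence confidence)
    {
        if (confidence == Confidence.High) return Confidence.Medium;
        if (confidence == Confidence.Medium) return Confidence.Low;
        return null;
    }
}

public static class HintMissExamples
{
    // make/model style row: "no hint: Low", two levels below High.
    public static readonly Candidate Make = new Candidate(
        new[] { "make", "manufacturer" }, new[] { "vehicle" },
        Confidence.High, "Vehicle.Make", NoHintConfidence: Confidence.Low);

    // year style row: "no hint: a different generator at Medium".
    // Both generator names are placeholders for illustration.
    public static readonly Candidate Year = new Candidate(
        new[] { "year" }, new[] { "vehicle" },
        Confidence.High, "Vehicle.Year",
        NoHintConfidence: Confidence.Medium, NoHintGenerator: "Date.Year");
}
```

Under this sketch, Printer.Make resolves to Vehicle.Make at Low, so any mode that rejects Low leaves the member to the generic fallback rather than emitting a car brand.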

The catalog is executable

A conformance test records all 44 expected rows independently from the catalog
code and compares them one-to-one against the implementation: names, hints, value
types, and base confidence. Drift in either direction fails the build. The
catalog is not just data; its intended shape is executable. I will use the same
approach for the diagnostic registry in M5.
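
A framework-free sketch of that comparison might look like the following. Row, CatalogConformance, and FindDrift are hypothetical names, and flattening each row to strings is a simplification chosen so that record equality compares by value (arrays inside a record would compare by reference). In a real test project the returned list would feed an assertion in whatever test framework is in use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattened view of one catalog row.
public sealed record Row(string Names, string Hints, string ValueType, string BaseConfidence);

public static class CatalogConformance
{
    // Returns one message per difference; an empty list means the catalog conforms.
    public static IReadOnlyList<string> FindDrift(
        IReadOnlyList<Row> expected, IReadOnlyList<Row> actual)
    {
        var drift = new List<string>();

        // A row added to or removed from either side is drift.
        if (expected.Count != actual.Count)
            drift.Add($"row count: expected {expected.Count}, actual {actual.Count}");

        // One-to-one comparison of the rows both sides have.
        var shared = Math.Min(expected.Count, actual.Count);
        for (var i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i])
                drift.Add($"row {i}: expected {expected[i]}, actual {actual[i]}");
        }

        return drift;
    }
}
```

The important property is that the expected rows are written out by hand in the test, not derived from the catalog, so an accidental edit to either side shows up as a failure.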

Internal enums, on purpose

InferenceConfidence and InferenceSource are public types in the final API, but they belong to the report family that ships in M8. To keep M4 free of public surface, they live as internal mirrors here and get promoted later (a trick that works cleanly because a Munchausen.Inference type resolves an unqualified InferenceConfidence against the enclosing Munchausen namespace).
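
The lookup behaviour behind that trick is standard C#: an unqualified type name is resolved by walking outward through the enclosing namespaces. The sketch below shows one possible reading, in which the internal enum sits directly in Munchausen and is later made public. Where the mirrors actually live, the enum members, and the InferenceDecision class are assumptions for illustration.

```csharp
namespace Munchausen
{
    // Internal for now. Assumption: promotion later only changes the access modifier.
    internal enum InferenceConfidence { Low, Medium, High }
}

namespace Munchausen.Inference
{
    // Hypothetical consumer inside the inference engine.
    internal sealed class InferenceDecision
    {
        // Unqualified name: lookup checks Munchausen.Inference first, then Munchausen.
        public InferenceConfidence Confidence { get; }

        public InferenceDecision(InferenceConfidence confidence) => Confidence = confidence;
    }
}
```

Because the consumer never spells out the namespace, promoting the enum does not require touching any code in Munchausen.Inference.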

What's next: M5, the Definition Compiler

M4 produces inference decisions; M5 assembles them into a real, frozen plan.
The DefinitionCompiler pipeline will resolve expressions, detect rule conflicts, plan construction, infer, validate, and compile the reachable child graph (including recursive types). It is wired to a diagnostic registry where every Build() failure carries a stable LIE code. It's where Build() finally stops throwing NotImplementedException and starts doing its job.