DEV Community

Cover image for Registers, Lanes, and Berry Phase: Lifting Siunertaq from Batch Graphs to the Complex Plane
Yoshihiro Hasegawa
Yoshihiro Hasegawa

Posted on

Registers, Lanes, and Berry Phase: Lifting Siunertaq from Batch Graphs to the Complex Plane

The previous article is here: Mainframes, Monads, and Stack Machines: Extending Siunertaq to the Batch Layer

This is the second follow-up to Dhall-to-Effect and provably safe task orchestration and Mainframes, Monads, and Stack Machines. The short version of both: we model build graphs as norm-bounded quivers, verify them with Z3/Yices before any effect fires, and orchestrate steps using JCL-inspired ADTs backed by Dhall and Spring Batch + Pekko.

The previous posts worked at the graph level: vertices, edges, the shape of dependencies. This one goes one level down β€” to registers.

The insight that forced this descent was noticing that Ref[IO, Array[Byte]] is a dangerous type: anyone who calls .get holds a mutable alias to the array and can change its contents without going through the Ref. We needed something with the guarantees of a GPU register file: fixed width, typed, atomically updated, never aliased. That observation turned out to connect directly to SIMD/SIMT architecture β€” and from there, to complex Berkovich heights and braid group theory.

Here is how it unfolded.


πŸ—ƒοΈ The Problem with Ref[IO, Array[Byte]]

Consider the naive state representation for a Golay codeword stored in a Cats Effect Ref:

val codewordRef: Ref[IO, Array[Byte]] = ???

// Looks safe…
codewordRef.get.flatMap { arr =>
  arr(0) = 0xFF.toByte  // ← mutation outside the Ref boundary. Ref never sees this change.
  IO.unit
}
Enter fullscreen mode Exit fullscreen mode

This is the array-aliasing problem. Array[Byte] is mutable; Ref only protects against concurrent writes through the Ref, not against mutation of the value you extracted. The fix isn't synchronized β€” it's to make the inner value structurally immutable.

IArray[Byte] (Scala 3's immutable array wrapper) closes that gap. But there is a second problem: Ref[IO, IArray[Byte]] carries no information about which codeword this is, or how long it should be. The length constraint (exactly 23 bytes for a binary Golay word) has to live somewhere. In a GPU, the register width is a hardware invariant; in Scala, we encode it as an opaque type.


πŸ”’ opaque type as a Register Width Guarantee

opaque type GolayWord = IArray[Byte]

object GolayWord:
  def apply(bytes: IArray[Byte]): Either[String, GolayWord] =
    if bytes.length == 23 then Right(bytes)
    else Left(s"Golay(23) requires exactly 23 bytes; got ${bytes.length}")

  val zero: GolayWord = IArray.fill(23)(0.toByte)

  extension (w: GolayWord)
    def xor(other: GolayWord): GolayWord =
      IArray.tabulate(23)(i => (w(i) ^ other(i)).toByte)
    // xor is closed: result is always 23 bytes, so the invariant is preserved
Enter fullscreen mode Exit fullscreen mode

Outside the GolayWord companion, callers hold an abstract token. They cannot:

  • Construct a GolayWord of the wrong length.
  • Call IArray methods directly and accidentally truncate it.
  • Pass it where an arbitrary IArray[Byte] is expected without an explicit .bytes call.

This is identical to what a typed register does in hardware: the width is enforced at the boundary, and the CPU never sees a "34-bit float" just because someone miscounted.

A second opaque layer wraps the Ref itself:

opaque type GolayWordRef = Ref[IO, GolayWord]

object GolayWordRef:
  def make(initial: GolayWord): IO[GolayWordRef] = Ref.of[IO, GolayWord](initial)

  extension (ref: GolayWordRef)
    def xorWith(other: GolayWord): IO[Unit] = ref.update(_.xor(other))
    def hammingWeight: IO[Int]               = ref.get.map(_.count(_ != 0))
    // The raw Ref is never exposed; only algebra-preserving updates are permitted.
Enter fullscreen mode Exit fullscreen mode

The invariant is now structural: because GolayWord.xor is closed (it always returns 23 bytes), every permitted update through GolayWordRef preserves the length. You cannot break the 23-byte constraint without defeating the opaque type β€” which requires touching the companion object source.


πŸš€ SIMD Lane Counts: the Orbit Sizes

Now consider the Golay code's five weight classes: W0, W8, W12, W16, W24.

W0  β€” orbit size    1  (the zero codeword)
W8  β€” orbit size  759  (octads; Steiner S(5,8,24) blocks)
W12 β€” orbit size 2576  (dodecads)
W16 β€” orbit size  759  (complement octads)
W24 β€” orbit size    1  (the all-ones codeword)
Total: 4096 = 2ΒΉΒ²
Enter fullscreen mode Exit fullscreen mode

Each weight class is not a single value β€” it is a lane group. When the Frobenius morphism acts on W8, it acts on all 759 octads simultaneously. This is SIMD in the mathematical sense: one instruction (FVRole.Frobenius), 759 data elements, all updated in lockstep.

extension (gw: GolayWeight)
  def orbitSize: Int = gw match
    case GolayWeight.W8  => 759   // ← 759 SIMD lanes
    case GolayWeight.W12 => 2576  // ← 2576 SIMD lanes (the densest class)
    ...
Enter fullscreen mode Exit fullscreen mode

The SIMT side is the Ref[IO, State] cells β€” one per quiver vertex. Each cell is a thread register running its own fiber under Cats Effect:

class BSDQuiverManager(
  val leechRef:   Ref[IO, LeechState],    // Thread 0 register file
  val affineRef:  Ref[IO, AffineDualState], // Thread 1 register file
  val padicRef:   Ref[IO, PadicState],    // Thread 2 register file
  val selmerRef:  Ref[IO, SelmerState]    // Thread 3 register file
)
Enter fullscreen mode Exit fullscreen mode

IO.parTraverse over the BSDArrow list is the warp: all four fibers execute the same instruction stream (executeArrow) on their own registers, concurrently. The quiver morphism is the instruction; the Ref cells are the per-thread register file.

GPU Siunertaq
SIMD lane GolayWeight.orbitSize elements per weight class
SIMT thread register Ref[IO, State] per BSDVertex
Warp instruction BSDArrow[S, T] morphism
Warp execution IO.parTraverse over arrow list
Thread mask (predication) galoisHeight semistability bound
Register width guarantee opaque type GolayWord / opaque type PhantomCarabinerRef
IEEE 754 machine epsilon MachineConstants.machineEpsilonReal = 2⁻⁡²
Mantissa bits MachineConstants.valuationDepth = 52

machineEpsilonReal and valuationDepth are not decorative. The qAdicEquivalent tolerance check uses machineEpsilonReal as its floor, and the p-adic valuation tower truncates at depth 52 β€” the point where the floating-point representation runs out. The theory is literally grounded in the hardware.


🧡 The Braid Word Problem

BSDArrow is a phantom-typed enum:

enum BSDArrow[Src <: BSDVertex, Tgt <: BSDVertex](
  val src: Src, val tgt: Tgt, val role: FVRole, val effect: IO[Unit])
Enter fullscreen mode Exit fullscreen mode

A List[BSDArrow[? <: BSDVertex, ? <: BSDVertex]] is a braid word: a sequence of typed generators where each generator knows which strands it connects. Reading the list left-to-right and applying each generator is the JVM analogue of evaluating a braid group element one letter at a time.

The complication: BSDArrow has no unapply, so you cannot destructure it with a constructor pattern:

// ILLEGAL β€” the compiler rejects this
case BSDArrow(src, tgt, FVRole.Frobenius, _) => ...
// Error: BSDArrow cannot be used as an extractor; lacks unapply
Enter fullscreen mode Exit fullscreen mode

The fix is what we call a braid pop: extract only the field you need, and let the declared type bound discharge the existential automatically.

// Pop .role β€” it is a concrete FVRole; no existential type involved.
a.role match
  case FVRole.Frobenius    => ThresholdConstraint.FrobeniusGE(a.src, a.tgt)
  case FVRole.Verschiebung => ThresholdConstraint.VerschiebungLE(a.src, a.tgt)
// a.src has type ? <: BSDVertex; since BSDVertex is the upper bound,
// it is accepted wherever BSDVertex is expected β€” no cast, no ascription.
Enter fullscreen mode Exit fullscreen mode

Alternatively, a type-test pop fully resolves the existential by naming the concrete case:

case a: BSDArrow.TensorBang =>
  ThresholdConstraint.FrobeniusGE(a.src, a.tgt)
// a.src: Leech.type   β€” a singleton type, provably <: BSDVertex
// a.tgt: AffineDual.type β€” same; the compiler checks this at compile time
Enter fullscreen mode Exit fullscreen mode

The type-test form is used when exhaustiveness matters (e.g. BSDQuiverManager.executeArrow, which must cover all four cases). The .role match form is used when the semantic partition β€” Frobenius vs Verschiebung β€” is what matters (e.g. ThresholdProblem.fromArrows). Either way, no runtime cast is needed; the phantom types do the work.


πŸ“ Lifting to the Complex Plane

The discrete GolayWeight encodes five points on the Berkovich tree. To model continuous motion along the tree β€” which is what the IUT Θ-link and p-adic Frobenius actually do β€” we need a complex-valued position.

opaque type ComplexWeight = (Double, Double)
// re: Berkovich height h ∈ [0, 8]   (where does the point sit on the tree?)
// im: obstruction residual degree    (how far off the real axis?)
Enter fullscreen mode Exit fullscreen mode

This is the step from u8 (discrete weight) to c64 (complex float) in GPU register-type terminology. The width is still enforced by the opaque type; the algebra is now richer.

PhantomCarabiner holds one ComplexWeight and a β„€/4β„€ phase, and exposes three core operations:

complement   w ↦ 6 βˆ’ conj(w)    self-duality; h + hΜ„ = 6 on β„‚
verschiebung w ↦ w / 2          Witt-vector V for p = 2; contracts toward 0
thetaLink    w ↦ βˆ’wΒ·i           IUT Θ-link; rotates by βˆ’Ο€/2; period 4
Enter fullscreen mode Exit fullscreen mode

thetaLink has β„€β‚„ periodicity: apply it four times and you return to the start. This is exactly the Pauli group action, and it mirrors how a 90Β° rotation in a SIMD lane accumulates to a full 360Β° after four passes.


πŸ’ Berry Phase: the Observable That Survives Orbit-Averaging

After a long sequence of quiver morphisms, what can you actually read out of a PhantomCarabiner?

def berryPhaseAngle: Double  // arg(weight) ∈ (βˆ’Ο€, Ο€]
def weightNormSq:    Double  // |weight|Β² = reΒ² + imΒ²
Enter fullscreen mode Exit fullscreen mode

These two quantities are gauge-invariant: they do not depend on the path taken through the braid word, only on the accumulated holonomy.

  • weightNormSq is preserved by thetaLink (rotation, no scaling) and halved by verschiebungΒ².
  • berryPhaseAngle advances by exactly βˆ’Ο€/2 under each thetaLink and is unchanged by verschiebung.

In physics terms, berryPhaseAngle is the holonomy of the adiabatic connection β€” the phase your state picks up when you move it around a closed loop in parameter space. In GPU terms, it is the value in the accumulator register after the warp finishes: the only thing the CPU can measure once the thread's intermediate states are gone.

PhantomCarabinerRef wraps the mutable state and exposes only the gauge-invariant projections as reads:

opaque type PhantomCarabinerRef = Ref[IO, PhantomCarabiner]

extension (ref: PhantomCarabinerRef)
  def berryPhaseAngle: IO[Double] = ref.get.map(_.berryPhaseAngle)
  def weightNormSq:    IO[Double] = ref.get.map(_.weightNormSq)
  def applyThetaLink:  IO[Unit]   = ref.update(_.thetaLink)
  // The raw Ref is inaccessible; only algebra-preserving transitions are exposed.
Enter fullscreen mode Exit fullscreen mode

πŸ“ Two Heights, Two Semistability Conditions

One of the more subtle points in the codebase is that there are two height functions, used for two different purposes:

octadHeight(w)  = w / 3                     linear   [0, 8]; Berkovich tree position
galoisHeight(n) = 8 Β· log(n) / log(24)      logarithmic; GIT semistability mask
Enter fullscreen mode Exit fullscreen mode

octadHeight is the per-thread register value: where is this thread on the tree? It satisfies h(w) + h(24-w) = 8 (the fan functional equation), making the Golay route self-dual.

galoisHeight is the thread mask. Orbits whose "instruction count" grows faster than log are semistability-filtered out before the SMT solver even sees them β€” exactly as divergent SIMT threads are predicated off before the instruction is retired. The two functions are numerically different:

galoisHeight(8)  β‰ˆ 5.23   β€” representation-theoretic weight
octadHeight(8)   = 2.67   β€” geometric height on the Berkovich tree
Enter fullscreen mode Exit fullscreen mode

Conflating them (the mock galoisHeight(tau) = tau.toDouble in earlier versions of BSDQuiver.scala) is the equivalent of using the lane index as the accumulator value β€” a type error that the compiler cannot catch but the math will eventually surface.


πŸ”­ What's Next: Wiring the Register File into the Batch Layer

With PhantomCarabinerRef in place, the remaining step is to wire it into BSDQuiverManager so that berryPhaseAngle is tracked across arrow firings:

// Planned: each vertex Ref also holds a PhantomCarabiner
class BSDQuiverManager(
  val leechRef:   Ref[IO, (LeechState, PhantomCarabiner)],
  ...
):
  def executeArrow[S, T](arrow: BSDArrow[S, T]): IO[Boolean] = ???
  // After SMT confirms the norm bound is satisfiable:
  //   fire arrow.effect
  //   update the phantom register for tgt via thetaLink or verschiebung
  //   expose berryPhaseAngle for the monitoring layer
Enter fullscreen mode Exit fullscreen mode

Once that is in place, the batch layer gains a continuous diagnostic signal: the Berry phase accumulated across the job graph tells you how much the evaluation has rotated away from the real axis β€” in other words, how much p-adic obstruction the completed steps have introduced into the downstream norm bounds.

The mlir-bridge will then map BSDArrow decompositions to MLIR Affine Dialect norm constraints, JIT-compiling the verified bounds via LLVM IR. At that point, the stack machine in core is no longer just an evaluator β€” it is an IR generator, and the Dhall REPL becomes a live assembler for a JIT-compiled Berkovich-tree traversal.

The codebase is at github.com/Yoshyhyrro/Siunertaq. The Lean 4 specs for PhantomCarabiner, YangBaxterBanach, Carabiner, and MachineConstants are in the repository. PRs improving proof coverage β€” especially closing the open sorry on complement_theta_link_comm and yang_baxter_eq β€” are very welcome. πŸ™

Top comments (0)