# Intermediate Representation

This document defines the currently implemented Masterbelt intermediate representation (IR).

The IR is intentionally minimal at this stage. Future IR additions must extend this document before or together with implementation changes.

## Purpose

The IR is the boundary between the type checker and downstream consumers. Code generators, evaluators, and indexes consume the IR rather than the AST so they share a single normalized program model.

## Normalization

An IR module reflects the following normalizations performed by lowering:

- Every type expression is resolved through type declarations. Aliases do not appear in the IR; uses are replaced with their resolved target type.
- Every literal value is decoded. Integer radix prefixes and digit separators are interpreted. String escape sequences are decoded. Source text is not retained.
- Collection literal values record items in source order. Map literal duplicate keys are deduplicated using last-wins semantics: the first occurrence's position is kept and its value is replaced by the last occurrence's value.
- Doc comments attached to declarations are preserved as text lines with the leading `///` removed.
- Other comments (`//`, `/* */`) are not present in the IR.

## Projects

An IR project is the lowered form of an entire Masterbelt program rooted at one entrypoint.

- A project records the entrypoint's canonical path, every loaded module keyed by canonical path, and the module paths in topological order, from leaves with no dependencies to the entrypoint.
- Codegen targets walk the project's order to emit one output per module.
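
The leaves-to-entrypoint ordering can be pictured as a post-order walk of the import graph. The following is a minimal sketch, not the project's actual loader: the `moduleOrder` function, the `deps` map shape, the `.mb` file names, and the alphabetical tie-break among siblings are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// moduleOrder returns canonical paths ordered so that every module appears
// after all modules it depends on, ending with the entrypoint.
// deps maps a module path to the paths it imports (hypothetical shape).
// Import cycles are assumed to have been rejected before this point.
func moduleOrder(entry string, deps map[string][]string) []string {
	var order []string
	visited := map[string]bool{}
	var visit func(path string)
	visit = func(path string) {
		if visited[path] {
			return
		}
		visited[path] = true
		imports := append([]string(nil), deps[path]...)
		sort.Strings(imports) // deterministic sibling order (assumption)
		for _, dep := range imports {
			visit(dep)
		}
		order = append(order, path) // post-order: dependencies first
	}
	visit(entry)
	return order
}

func main() {
	deps := map[string][]string{
		"/app/main.mb":  {"/app/items.mb", "/app/units.mb"},
		"/app/items.mb": {"/app/units.mb"},
	}
	fmt.Println(moduleOrder("/app/main.mb", deps))
}
```

A codegen target that walks this slice front to back never emits a module before the modules it imports.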

## Modules

An IR module is the lowered form of one Masterbelt source file.

- A module has a name carrying the source file's canonical path, a source span covering the whole file, an ordered list of constants, an ordered list of type declarations, and an ordered list of re-exports.
- Constants appear in the same order as their declarations in source. Grouped const declarations contribute one constant per item, in source order.
- Type declarations appear in the same order as their declarations in source. Each declaration carries its source identifier, its public flag, the resolved target type after substitution of any chain of declared types, and an ordered list of declared type parameters. For a non-generic declaration the parameter list is empty and the target is the final resolved type; for a generic declaration the parameter list names the declared parameters in source order and the target is the template body whose type variables refer to those parameters.
- Re-exports represent `pub { ... } from "..."` declarations: each entry carries the local name visible in this module, the canonical path of the foreign module, and the foreign symbol name. Re-exports do not introduce new values or types; they make the foreign symbol available under the listed local name in this module's public surface.
- `use` declarations without `pub` do not appear in the IR; their effect is to rewrite identifier references inside this module's expressions into cross-module references during lowering.

## Constants

A constant carries:

- A name string matching the source identifier.
- A public flag set by the `pub` declaration modifier.
- An ordered list of doc comment text lines preserved from the source.
- A checked type after declared-type resolution.
- A value matching that type.
- A source span covering the const item.

The constant's type and value are linked by construction: lowering is responsible for ensuring the value matches the declared type, including for union and generic types.

## Expressions

Every IR expression belongs to one of the following forms. The form is decided by the expression itself; consumers use a type switch to dispatch.

- Null expression. The single null literal.
- Bool expression. A decoded `true` or `false`.
- Integer expression. A decoded integer literal. The internal representation is a signed 64-bit integer at this stage. The Masterbelt `int` type currently maps to the host language's natural integer type when generating code. Future integer width types will extend this representation.
- String expression. A decoded string literal with escape sequences resolved.
- List expression. An ordered sequence of nested expressions. The element types are determined by the surrounding constant's type.
- Map expression. An ordered sequence of key/value entries. After last-wins deduplication, no two entries have equal keys.
- Product expression. An ordered sequence of named field initializers. Field initializers preserve the source order of the literal. The field name strings match the field names of the surrounding constant's product type; every field declared by that type appears exactly once. Field types are determined by the surrounding constant's type.
- Reference expression. A reference to another constant. The expression carries the referent's module and name; when the module field is empty the reference targets a constant declared earlier in the same module, and when it names a foreign module the reference targets a public symbol of that module. Same-module forward references are rejected during checking; cross-module references resolve through the import system.

Expressions do not carry their own type. Consumers derive types from the containing constant's declared type by walking it together with the expression tree, and from referenced constants when the expression is a reference.

An expression's source span identifies the source it was lowered from. This makes diagnostics emitted by downstream phases attributable to user source.
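
As an illustration of the type-switch dispatch described above, here is a hedged sketch of a consumer. The Go type names (`Expr`, `IntExpr`, `RefExpr`, and so on), the `render` function, and the `#` separator for foreign references are hypothetical; map and product forms and source spans are omitted for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a hypothetical marker interface for IR expression forms.
type Expr interface{ isExpr() }

type (
	NullExpr   struct{}
	BoolExpr   struct{ Value bool }
	IntExpr    struct{ Value int64 }
	StringExpr struct{ Value string }
	ListExpr   struct{ Items []Expr }
	RefExpr    struct{ Module, Name string } // empty Module means same module
)

func (NullExpr) isExpr()   {}
func (BoolExpr) isExpr()   {}
func (IntExpr) isExpr()    {}
func (StringExpr) isExpr() {}
func (ListExpr) isExpr()   {}
func (RefExpr) isExpr()    {}

// render dispatches on the expression form with a type switch.
func render(e Expr) string {
	switch e := e.(type) {
	case NullExpr:
		return "null"
	case BoolExpr:
		return fmt.Sprintf("%t", e.Value)
	case IntExpr:
		return fmt.Sprintf("%d", e.Value)
	case StringExpr:
		return fmt.Sprintf("%q", e.Value)
	case ListExpr:
		items := make([]string, len(e.Items))
		for i, item := range e.Items {
			items[i] = render(item)
		}
		return "[" + strings.Join(items, ", ") + "]"
	case RefExpr:
		if e.Module == "" {
			return e.Name // same-module reference
		}
		return e.Module + "#" + e.Name // cross-module reference
	default:
		return "<unknown form>"
	}
}

func main() {
	e := ListExpr{Items: []Expr{
		IntExpr{Value: 1},
		StringExpr{Value: "a"},
		RefExpr{Name: "Base"},
	}}
	fmt.Println(render(e))
}
```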

## Lowering

Lowering is the phase that produces an IR module from a checked source file. Its input is a checker result for one file; its output is one IR module and a list of diagnostics introduced by lowering itself.

Lowering operates only on a checker result whose declarations have already been type checked. The set of diagnostics emitted by previous phases is not re-emitted by lowering; lowering appends only diagnostics that are first detected at this phase.

### Module Identity

The lowered module's name is the source file's name, which carries its canonical path (see [Modules](#modules)). Its source span is the source file's span. The constants slice mirrors the source order of declarations.

### Const Declarations

Each const item from each const declaration contributes one constant to the IR module, in source order. The constant carries:

- The source identifier name.
- The public flag of the enclosing const declaration. All items in a `pub const ( ... )` group are public.
- The checked type recorded by the checker for the item's binding. Type declarations have been resolved to their target type.
- The doc comment lines attached to the const item, in source order, with the leading `///` removed. When a const declaration has a leading doc comment and items also have their own doc comments, the declaration's lines precede each item's lines.
- The lowered expression value.
- The source span of the const item.

A const item whose checker-recorded type is the invalid type is not contributed to the IR. Its diagnostics have already been emitted by previous phases.

### Type Declarations

A type declaration contributes one entry to the module's TypeDeclarations list. The entry preserves the declaration's source identifier, public flag, doc comments, declared type parameters, and the target type produced by checker-time substitution. The declared name is retained so downstream consumers can re-introduce it in target output even though the constant lowerings carry the resolved target type rather than the declared surface name.

#### Anonymous Product Hoisting

Anonymous product types written inline inside a declared body (for example `type Monster = { skills: list<{...}> }`) are normalized during lowering by hoisting each one to its own TypeDeclaration. The synthetic declaration's name is derived from the path that reaches the anonymous product: the owner's declared name followed by the PascalCase form of each field, with `Key` and `Value` suffixes for map argument positions. Anonymous products nested inside an already-hoisted declaration recurse with the synthesized name as the new owner so the path stays scoped to a single declaration tree. Synthesized names that would collide with an existing top-level name receive a numeric suffix.

A synthesized declaration inherits the public flag of the owner so callers across modules can reach it, and carries the subset of the owner's type parameters that the hoisted body references. The original use site is rewritten to apply those parameters back, keeping the generic shape intact.

After hoisting, the IR contains only named product types. Codegen targets see the synthesized declarations as ordinary TypeDeclarations and emit them through their normal product-type code path.
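
The naming rule can be sketched as follows. This is an illustration under stated assumptions, not the real lowering code: the `pascal` and `hoistedName` helpers are hypothetical, field names are assumed to be ASCII, the exact PascalCase conversion is guessed, and the collision suffix is assumed to start at `2`.

```go
package main

import (
	"fmt"
	"strings"
)

// pascal converts a snake_case or lowerCamel field name to PascalCase.
// The real casing rules are an assumption; names are assumed ASCII.
func pascal(field string) string {
	var b strings.Builder
	for _, part := range strings.Split(field, "_") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

// hoistedName joins the owner's name with the path segments that reach the
// anonymous product, then appends a numeric suffix while the candidate
// collides with an already taken top-level name.
func hoistedName(owner string, path []string, taken map[string]bool) string {
	name := owner
	for _, seg := range path {
		name += pascal(seg)
	}
	candidate := name
	for n := 2; taken[candidate]; n++ { // suffix start value is an assumption
		candidate = fmt.Sprintf("%s%d", name, n)
	}
	taken[candidate] = true
	return candidate
}

func main() {
	taken := map[string]bool{"Monster": true, "MonsterSkills": true}
	// A product in the value position of a map-typed field `drops`.
	fmt.Println(hoistedName("Monster", []string{"drops", "Value"}, taken))
	// A product under `skills` whose natural name is already taken.
	fmt.Println(hoistedName("Monster", []string{"skills"}, taken))
}
```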

### Literal Expressions

- `null` lowers to a Null expression.
- `true` and `false` lower to Bool expressions carrying the decoded value.
- An integer literal lowers to an Integer expression carrying the decoded signed 64-bit value. The literal's recorded base (2, 8, 10, or 16) and digit text drive the decode. A literal whose magnitude does not fit in a signed 64-bit integer is reported as integer out of range and the item is not contributed to the IR.
- A string literal lowers to a String expression carrying its already decoded value.
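
The integer decode step might look like the sketch below, which uses Go's standard `strconv.ParseInt`. The `decodeInt` helper and `errOutOfRange` value are hypothetical, the `_` digit separator is an assumption (this document does not name the separator character), and the radix prefix is assumed to have been stripped already.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// errOutOfRange stands in for the "integer out of range" diagnostic.
var errOutOfRange = errors.New("integer out of range")

// decodeInt decodes digit text in the recorded base into a signed 64-bit
// value. The "_" separator is an assumption of this sketch.
func decodeInt(digits string, base int) (int64, error) {
	clean := strings.ReplaceAll(digits, "_", "")
	v, err := strconv.ParseInt(clean, base, 64)
	if err != nil {
		if errors.Is(err, strconv.ErrRange) {
			return 0, errOutOfRange
		}
		return 0, err
	}
	return v, nil
}

func main() {
	fmt.Println(decodeInt("1_000", 10))
	fmt.Println(decodeInt("ff", 16))
	fmt.Println(decodeInt("9223372036854775808", 10)) // one past the int64 maximum
}
```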

### Collection Expressions

- A list literal lowers to a List expression whose items are the lowered element expressions in source order.
- A map literal lowers to a Map expression whose entries are the lowered key/value pairs in source order, after applying last-wins deduplication: for each duplicate key, the position of the first occurrence is preserved and the value of the last occurrence replaces all earlier values for that key. Key equality is structural over already-lowered key expressions.
- An empty collection literal lowers to an empty List or empty Map according to the surrounding constant's checked type after declared-type resolution.
- A product literal lowers to a Product expression whose field initializers are the lowered values in source order. The product type used to type-check each field value is taken from the literal's typed prefix when present and otherwise from the surrounding constant's annotation after declared-type resolution. Field name uniqueness, missing fields, and unknown fields have already been validated by the checker.
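
The last-wins rule for map literals can be sketched as a single pass that remembers where each key first appeared. The `Entry` type and `dedupLastWins` function are hypothetical; real keys are lowered expressions compared structurally, simplified here to strings.

```go
package main

import "fmt"

// Entry is a hypothetical key/value pair. Real IR keys are expressions
// compared structurally; strings keep the sketch short.
type Entry struct {
	Key   string
	Value int
}

// dedupLastWins keeps each key at the position of its first occurrence and
// overwrites its value with the value of the last occurrence.
func dedupLastWins(entries []Entry) []Entry {
	index := make(map[string]int, len(entries)) // key -> position in out
	out := make([]Entry, 0, len(entries))
	for _, e := range entries {
		if i, seen := index[e.Key]; seen {
			out[i].Value = e.Value
			continue
		}
		index[e.Key] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	in := []Entry{{"a", 1}, {"b", 2}, {"a", 3}}
	fmt.Println(dedupLastWins(in)) // prints [{a 3} {b 2}]
}
```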

### Identifier References

An identifier reference in expression position lowers to a Reference whose name matches the referent's source identifier. For a same-module reference, the checker has already established that the referent is a const item declared earlier in source, so the IR guarantees that such references point only backward in the constants slice.

The IR does not eagerly inline references. Consumers that need a reference's value walk to the referent constant.

## For Statements

A for statement reaches the IR as a control-flow node nested inside a function body. The IR distinguishes three subject shapes so codegen can pick the matching native loop without re-deriving them.

A for statement carries:

- A subject shape — `list`, `map`, or `range`.
- For `list` and `map` shapes, the lowered subject expression and the subject's checked element type(s) after declared-type resolution.
- For the `range` shape, the lowered start and end expressions (the IR records the recognized counted form so codegen emits a counted loop directly).
- One or two bindings depending on the shape; each binding carries a name (an empty name signals a `_` skip), the binding's checked type, and a source span. Map subjects always carry two bindings in `(key, value)` order.
- A lowered function block for the loop body.
- A source span covering the whole statement.

Break and continue statements lower as bare nodes with only a source span. The IR does not resolve which loop they target; consumers walk the surrounding statement tree to attach them to the innermost loop the same way the checker did.
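
A consumer-side walk that attaches each break or continue to its innermost loop could look like this sketch. The node types are simplified stand-ins, and the `ID` field exists only so the example can name loops; it is not part of the documented IR.

```go
package main

import "fmt"

// Stmt is a hypothetical statement node; only the shapes needed here exist.
type Stmt interface{}

type (
	For struct {
		ID   string // illustrative label, not part of the documented IR
		Body []Stmt
	}
	Block    struct{ Body []Stmt } // any non-loop nesting, e.g. a match arm body
	Break    struct{}
	Continue struct{}
)

// targets returns, in walk order, the ID of the innermost enclosing loop
// for each break or continue found under stmts.
func targets(stmts []Stmt, innermost string) []string {
	var out []string
	for _, s := range stmts {
		switch s := s.(type) {
		case For:
			out = append(out, targets(s.Body, s.ID)...) // new innermost loop
		case Block:
			out = append(out, targets(s.Body, innermost)...) // loop unchanged
		case Break, Continue:
			out = append(out, innermost)
		}
	}
	return out
}

func main() {
	body := []Stmt{
		For{ID: "outer", Body: []Stmt{
			For{ID: "inner", Body: []Stmt{Block{Body: []Stmt{Break{}}}}},
			Continue{},
		}},
	}
	fmt.Println(targets(body, "")) // prints [inner outer]
}
```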

The `range(start, end)` recognition is performed during lowering and is purely a representation choice: the subject's checked type remains `list<int>`, so consumers that ignore the counted-form flag can still treat the subject as a list and remain correct.

## Match Statements

A match statement reaches the IR as a control-flow node nested inside a function body. The IR preserves the surface ordering of arms and the bindings introduced by each pattern.

A match statement carries:

- The lowered subject expression.
- The subject's checked type after declared-type resolution.
- An ordered list of arms, in source order.
- A source span covering the whole statement.

A match arm carries:

- An ordered list of one or more lowered patterns (the `|`-separated alternatives at one arm position).
- An optional lowered guard expression.
- An ordered list of bindings introduced by the pattern. Each binding carries a name, the binding's checked type, and the binding's source span.
- A lowered function block for the arm body.
- A source span covering the whole arm.

A match pattern is one of:

- A **type pattern** carrying the matched type after declared-type resolution and an optional binding entry referenced by name.
- An **enum pattern** carrying the enum type and the variant name.
- A **literal pattern** carrying the lowered literal expression.
- A **product pattern** carrying the matched product type, an ordered list of field sub-patterns (each with the field name and a nested pattern), and an optional whole-value binding entry referenced by name.
- A **wildcard pattern** carrying no payload.

The implicit identifier narrowing described in [types.md](types.md#match-statements) is normalized during lowering: a subject expression that is a plain identifier and a pattern without an explicit binding produce an implicit binding entry whose name is the identifier's name and whose type is the narrowed type. Downstream consumers therefore see explicit bindings on every arm where a narrowed local would be reachable.

Exhaustiveness, reachability, and binding-set agreement across alternatives are enforced by the checker; the IR records the arms verbatim and does not synthesize a catch-all arm.
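
The implicit-binding normalization for type patterns can be sketched as below. The `Binding`, `TypePattern`, and `armBinding` names are hypothetical, and treating the pattern's matched type as the narrowed type is an assumption of this sketch; the authoritative narrowing rules live in types.md.

```go
package main

import "fmt"

// Binding is a simplified arm binding (source span omitted).
type Binding struct{ Name, Type string }

// TypePattern is a simplified type pattern; Bind is "" when the source
// pattern has no explicit binding.
type TypePattern struct{ Type, Bind string }

// armBinding returns the binding a type pattern contributes to its arm.
// subjectIdent is the subject's name when the subject is a plain
// identifier and "" otherwise. Using the pattern's type as the narrowed
// type is an assumption of this sketch.
func armBinding(subjectIdent string, p TypePattern) (Binding, bool) {
	switch {
	case p.Bind != "":
		return Binding{Name: p.Bind, Type: p.Type}, true // explicit binding
	case subjectIdent != "":
		return Binding{Name: subjectIdent, Type: p.Type}, true // implicit
	default:
		return Binding{}, false // nothing to narrow
	}
}

func main() {
	fmt.Println(armBinding("shape", TypePattern{Type: "Circle"}))
	fmt.Println(armBinding("shape", TypePattern{Type: "Circle", Bind: "c"}))
	fmt.Println(armBinding("", TypePattern{Type: "Circle"}))
}
```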

## Master Validation

A master's `validation` section reaches the IR as a `MasterValidation` value. Its surface form, scoping, and execution model are defined in [masterdata/validation.md](../masterdata/validation.md); the IR records the lowered rules so the validation evaluator can run them at export time.

A `MasterValidation` carries:

- `Master` — the flattened codegen name of the owning master (the name used by code-generation output, for example `UserFriendships`).
- `VisiblePath` — the module-local dotted source path of the master (for example `User.Friendships`). The export driver rewrites the top segment of this path through the entry module's re-export aliases to obtain the entrypoint-visible path that validator configuration keys against (so an aliased re-export `pub { User as U }` yields `U.Friendships`).
- `Rules` — an ordered list of `MasterValidationRule` values in source order.
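
The `VisiblePath` rewrite performed by the export driver can be illustrated with a small sketch. The `entrypointPath` function and the shape of the alias map (foreign name to local alias) are assumptions; only the `User.Friendships` to `U.Friendships` example comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// entrypointPath rewrites the top segment of a module-local dotted path
// through the entry module's re-export aliases. The alias map shape
// (foreign name -> local alias) is an assumption of this sketch.
func entrypointPath(visiblePath string, aliases map[string]string) string {
	top, rest, hasRest := strings.Cut(visiblePath, ".")
	if alias, ok := aliases[top]; ok {
		top = alias
	}
	if !hasRest {
		return top
	}
	return top + "." + rest
}

func main() {
	aliases := map[string]string{"User": "U"} // from `pub { User as U }`
	fmt.Println(entrypointPath("User.Friendships", aliases)) // prints U.Friendships
}
```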

A `MasterValidationRule` carries:

- `Scope` — the rule's scope, one of `MasterValidationEach` or `MasterValidationAll`.
- `Name` — the validator's stable identifier, unique within the master across both scopes.
- `Body` — the lowered statement block.

The implicit bindings the evaluator supplies inside a rule body are determined by the scope rather than recorded as explicit IR bindings:

- For an `each` rule, `row` and `self` are bound to one post-filter record; both have the master's record type.
- For an `all` rule, `table` and `self` are bound to the master's post-filter relation; both have type `Relation<M>`, and iterating it yields the post-filter records in plan order. A rule may apply the master's scopes and stage operators to that relation.
- `self` is bound to the same value as `row` (in `each`) or `table` (in `all`).

## Master Scopes

A master's [scope](../masterdata/schema.md#scope-section) declarations reach the IR as an ordered list of `MasterScope` values on the owning master. The surface form is defined in [masterdata/schema.md](../masterdata/schema.md#scope-section); the IR records each scope so code generation can emit a relation method and the SQLite exporter can infer secondary indexes.

A `MasterScope` carries:

- `Master` — the flattened codegen name of the owning master.
- `Name` — the scope name, unique within the master.
- `Public` — the `pub` flag controlling generated-API exposure.
- `Indexed` — the `indexed` flag enabling SQLite index inference.
- `Parameters` — the lowered parameter list, in source order, with the same shape a lowered function uses.
- `Body` — the lowered scope body (a statement block or an arrow expression) whose result is the master's relation.

### Query Plan Shape

To support both code generation and index inference, the lowered body exposes the scope's relation **plan**: the ordered stages applied to `self`. Each stage is one of:

- `Where(predicate)` — a predicate tree whose leaves carry a `FieldRef` (the record field's source name), the operator (`eq`, `ne`, `lt`, `le`, `gt`, `ge`, `in`, `between`), and the operand (a literal, a scope-parameter reference, or a referenced const/static). Interior nodes are `and` / `or` / `not`.
- `OrderBy(ordering)` / `ThenBy(ordering)` — a `FieldRef` plus a direction (`asc` / `desc`).
- `Skip(n)` / `Take(n)` — a count operand.

A scope call inside a body inlines the callee scope's stages into the caller's plan (with the callee's parameters bound to the call arguments), so the plan is self-contained and free of scope-call nodes. The plan order preserves source order across `where` / `orderBy` / `thenBy` / `skip` / `take` and across inlined chains. This is the representation the [SQLite index inference](../masterdata/export-sqlite.md#secondary-indexes-from-indexed-scopes) reads; index inference is performed on the lowered plan, not on source syntax.
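
The inlining of scope calls into a flat plan can be sketched as a recursive expansion. Everything here is a simplified model rather than the real IR: the `Stage` and `Step` types, the `flatten` function, and the scope names are hypothetical, operands are plain strings, and a `Where` predicate tree is collapsed to a single field and operand with the operator omitted.

```go
package main

import "fmt"

// Stage is a flattened plan stage. Operands are plain strings here; a
// string that names a bound parameter is replaced by its argument.
type Stage struct{ Kind, Field, Operand string }

// Step is one element of a scope body: either a stage or a scope call.
type Step struct {
	Stage *Stage
	Call  string            // callee scope name when Stage is nil
	Args  map[string]string // callee parameter -> caller operand
}

// flatten inlines scope calls so the result contains only stages, in
// source order. Recursive scopes are assumed to be rejected earlier.
func flatten(name string, scopes map[string][]Step, bound map[string]string) []Stage {
	resolve := func(operand string) string {
		if v, ok := bound[operand]; ok {
			return v
		}
		return operand
	}
	var plan []Stage
	for _, step := range scopes[name] {
		if step.Stage != nil {
			s := *step.Stage
			s.Operand = resolve(s.Operand)
			plan = append(plan, s)
			continue
		}
		// Bind the callee's parameters to the caller's arguments.
		args := make(map[string]string, len(step.Args))
		for param, operand := range step.Args {
			args[param] = resolve(operand)
		}
		plan = append(plan, flatten(step.Call, scopes, args)...)
	}
	return plan
}

func main() {
	scopes := map[string][]Step{
		"strong": {{Stage: &Stage{"Where", "level", "minLevel"}}},
		"topStrong": {
			{Call: "strong", Args: map[string]string{"minLevel": "floor"}},
			{Stage: &Stage{"OrderBy", "level", "desc"}},
			{Stage: &Stage{"Take", "", "n"}},
		},
	}
	fmt.Println(flatten("topStrong", scopes, nil))
}
```

In this model the callee's `minLevel` parameter becomes the caller's `floor` operand, and the resulting plan contains no call nodes.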

## Assert Statement

An `assert` statement lowers to an `Assert` node carrying:

- `Condition` — the lowered condition expression.
- `ExprText` — the condition's source text.

The failed assertion's expression and source span are retained rather than collapsed to a boolean result. This lets a future PowerAssert-style reporter display sub-expression values without a surface or IR change; the MVP evaluator uses only the boolean outcome.

## Symbols and Visibility

A top-level IR declaration is called a symbol. Every symbol has a name and a public flag derived from the source program's `pub` modifier. The set of symbol kinds is open: constants exist today; future kinds such as records, methods, callables, and type declarations will extend the same abstraction.

Every symbol declares which other symbols it references. References originate in:

- Identifier expressions inside value-position trees.
- Future: type references inside type-position trees (such as a field type that names a user-declared record).
- Future: call-site references inside callable bodies.

The IR provides a single reachability operation that walks symbols starting from the public roots and follows the declared references transitively. Downstream consumers use this operation to identify the set of symbols a public surface depends on. The current consumers are code generation targets, but any consumer that needs the same notion (linker, indexer, future tree-shaking analyses) shares it.

The reachability rule is consumer-neutral. The IR does not delete unreachable symbols; consumers choose how to act on the result.
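
A reachability walk of this kind can be sketched as a worklist over declared references. The `Symbol` struct and `reachable` function are hypothetical, and the sketch is limited to one module with references identified by name.

```go
package main

import "fmt"

// Symbol is a simplified top-level declaration with its declared references.
type Symbol struct {
	Name   string
	Public bool
	Refs   []string // names of referenced symbols (single-module sketch)
}

// reachable returns the set of symbol names reachable from the public
// roots by following declared references transitively.
func reachable(symbols []Symbol) map[string]bool {
	byName := make(map[string]Symbol, len(symbols))
	var stack []string
	for _, s := range symbols {
		byName[s.Name] = s
		if s.Public {
			stack = append(stack, s.Name)
		}
	}
	seen := map[string]bool{}
	for len(stack) > 0 {
		name := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[name] {
			continue
		}
		seen[name] = true
		stack = append(stack, byName[name].Refs...)
	}
	return seen
}

func main() {
	symbols := []Symbol{
		{Name: "Base"},
		{Name: "Scale", Refs: []string{"Base"}},
		{Name: "Damage", Public: true, Refs: []string{"Scale"}},
		{Name: "Unused"},
	}
	fmt.Println(reachable(symbols))
}
```

The walk only computes the set; consistent with the consumer-neutral rule, `Unused` is left in place and each consumer decides what to do with it.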

## Effects

The IR carries an open set of effect tags that callable symbols may declare. The currently defined effects are `cancellable`, `failable`, and `asyncable`. Each is defined and documented in `codegen/model`.

Effect sets are ordered, deduplicated, and combined by union when a caller inherits a callee's effects. Downstream consumers may rely on the IR's effect set being canonical: equal effect sets compare equal positionally, with no duplicate members.
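
The canonical-form guarantee can be sketched as a union that deduplicates and then sorts. The `Effect` type, `union`, and `equal` are hypothetical names, and sorting by effect name is an assumption: this document states only that sets are ordered, not which order is used.

```go
package main

import (
	"fmt"
	"sort"
)

// Effect is a hypothetical effect tag.
type Effect string

const (
	Asyncable   Effect = "asyncable"
	Cancellable Effect = "cancellable"
	Failable    Effect = "failable"
)

// union returns the canonical combination of two effect sets: no
// duplicates, sorted by name (the sort key is an assumption).
func union(a, b []Effect) []Effect {
	seen := make(map[Effect]bool)
	out := []Effect{}
	for _, set := range [][]Effect{a, b} {
		for _, e := range set {
			if !seen[e] {
				seen[e] = true
				out = append(out, e)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// equal compares two canonical sets positionally.
func equal(a, b []Effect) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	x := union([]Effect{Failable, Cancellable}, []Effect{Cancellable})
	y := union([]Effect{Cancellable}, []Effect{Failable})
	fmt.Println(x, y, equal(x, y))
}
```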

A function type carries its effect set as part of its identity (see [language/types.md](types.md)). A type declaration whose target is a function type therefore exposes the effect set through its target type. The IR has no callable value-level nodes yet, so an effect set today reaches targets only through type expressions; the EffectSet type is also reserved for future callable nodes that will record effects directly at IR construction time.

## Stability

The IR is an internal contract between the checker and downstream consumers. Adding new value forms, new fields, or new normalization rules is a contract change and requires updating this document before or together with the implementation.
