#Intermediate Representation
This document defines the currently implemented Masterbelt intermediate representation (IR).
The IR is intentionally minimal at this stage. Future IR additions must extend this document before or together with implementation changes.
#Purpose
The IR is the boundary between the type checker and downstream consumers. Code generators, evaluators, and indexes consume the IR rather than the AST so they share a single normalized program model.
#Normalization
An IR module reflects the following normalizations performed by lowering:
- Every type expression is resolved through type declarations. Aliases do not appear in the IR; uses are replaced with their resolved target type.
- Every literal value is decoded. Integer radix prefixes and digit separators are interpreted. String escape sequences are decoded. Source text is not retained.
- Collection literal values record items in source order. Map literal duplicate keys are deduplicated using last-wins semantics: the first occurrence's position is kept and its value is replaced by the last occurrence's value.
- Doc comments attached to declarations are preserved as text lines with the leading `///` removed.
- Other comments (`//`, `/* */`) are not present in the IR.
#Projects
An IR project is the lowered form of an entire Masterbelt program rooted at one entrypoint.
- A project records the entrypoint's canonical path, every loaded module keyed by canonical path, and the module path order topologically sorted from no-dependency leaves to the entrypoint.
- Codegen targets walk the project's order to emit one output per module.
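
A minimal sketch of how such an order could be computed, assuming the import graph is acyclic and is available as a map from canonical path to imported paths. The function name `moduleOrder`, the input shape, and the `.mb` file extension are hypothetical and are not taken from the implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// moduleOrder returns canonical module paths sorted so that every module
// appears after all of its dependencies, ending with the entrypoint.
// deps maps a module path to the paths it imports (hypothetical input shape).
// The import graph is assumed to be acyclic.
func moduleOrder(entry string, deps map[string][]string) []string {
	var order []string
	visited := map[string]bool{}

	var visit func(path string)
	visit = func(path string) {
		if visited[path] {
			return
		}
		visited[path] = true

		imports := append([]string(nil), deps[path]...)
		sort.Strings(imports) // deterministic order among siblings
		for _, dep := range imports {
			visit(dep)
		}
		order = append(order, path) // post-order: dependencies first
	}

	visit(entry)
	return order
}

func main() {
	deps := map[string][]string{
		"/app/main.mb":  {"/app/items.mb", "/app/units.mb"},
		"/app/items.mb": {"/app/units.mb"},
	}
	for _, path := range moduleOrder("/app/main.mb", deps) {
		fmt.Println(path)
	}
}
```

A depth-first post-order walk places leaves first and the entrypoint last, which is the direction a codegen target needs when it emits one output per module.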
#Modules
An IR module is the lowered form of one Masterbelt source file.
- A module has a name carrying the source file's canonical path, a source span covering the whole file, an ordered list of constants, an ordered list of type declarations, and an ordered list of re-exports.
- Constants appear in the same order as their declarations in source. Grouped const declarations contribute one constant per item, in source order.
- Type declarations appear in the same order as their declarations in source. Each declaration carries its source identifier, its public flag, the resolved target type after substitution of any chain of declared types, and an ordered list of declared type parameters. For a non-generic declaration the parameter list is empty and the target is the final resolved type; for a generic declaration the parameter list names the declared parameters in source order and the target is the template body whose type variables refer to those parameters.
- Re-exports represent `pub { ... } from "..."` declarations: each entry carries the local name visible in this module, the canonical path of the foreign module, and the foreign symbol name. Re-exports do not introduce new values or types; they make the foreign symbol available under the listed local name in this module's public surface. `use` declarations without `pub` do not appear in the IR; their effect is to rewrite identifier references inside this module's expressions into cross-module references during lowering.
#Constants
A constant carries:
- A name string matching the source identifier.
- A public flag set by the `pub` declaration modifier.
- An ordered list of doc comment text lines preserved from the source.
- A checked type after declared-type resolution.
- A value matching that type.
- A source span covering the const item.
The constant's type and value are linked by construction: lowering is responsible for ensuring the value matches the declared type, including for union and generic types.
#Expressions
Every IR expression belongs to one of the following forms. The form is decided by the expression itself; consumers use a type switch to dispatch.
- Null expression. The single null literal.
- Bool expression. A decoded `true` or `false`.
- Integer expression. A decoded integer literal. The internal representation is a signed 64-bit integer at this stage. The Masterbelt `int` type currently maps to the host language's natural integer type when generating code. Future integer width types will extend this representation.
- String expression. A decoded string literal with escape sequences resolved.
- List expression. An ordered sequence of nested expressions. The element types are determined by the surrounding constant's type.
- Map expression. An ordered sequence of key/value entries. After last-wins deduplication, no two entries have equal keys.
- Product expression. An ordered sequence of named field initializers. Field initializers preserve the source order of the literal. The field name strings match the field names of the surrounding constant's product type; every field declared by that type appears exactly once. Field types are determined by the surrounding constant's type.
- Reference expression. A reference to another constant. The expression carries the referent's module and name; when the module field is empty the reference targets a constant declared earlier in the same module, and when it names a foreign module the reference targets a public symbol of that module. Same-module forward references are rejected during checking; cross-module references resolve through the import system.
Expressions do not carry their own type. Consumers derive types from the containing constant's declared type by walking it together with the expression tree, and from referenced constants when the expression is a reference.
An expression's source span identifies the source it was lowered from. This makes diagnostics emitted by downstream phases attributable to user source.
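
The type-switch dispatch mentioned above might look like the following sketch. All type and field names (`Expr`, `IntExpr`, `RefExpr`, and so on) are hypothetical stand-ins for the real IR nodes; the Map and Product forms and source spans are left out to keep the example short.

```go
package main

import "fmt"

// Expr is a hypothetical marker interface implemented by every expression form.
type Expr interface{ isExpr() }

// Hypothetical node types; Map and Product forms are omitted for brevity.
type (
	NullExpr   struct{}
	BoolExpr   struct{ Value bool }
	IntExpr    struct{ Value int64 }
	StringExpr struct{ Value string }
	ListExpr   struct{ Items []Expr }
	RefExpr    struct{ Module, Name string }
)

func (NullExpr) isExpr()   {}
func (BoolExpr) isExpr()   {}
func (IntExpr) isExpr()    {}
func (StringExpr) isExpr() {}
func (ListExpr) isExpr()   {}
func (RefExpr) isExpr()    {}

// describe dispatches on the expression form with a type switch.
func describe(e Expr) string {
	switch v := e.(type) {
	case NullExpr:
		return "null"
	case BoolExpr:
		return fmt.Sprintf("bool(%t)", v.Value)
	case IntExpr:
		return fmt.Sprintf("int(%d)", v.Value)
	case StringExpr:
		return fmt.Sprintf("string(%q)", v.Value)
	case ListExpr:
		return fmt.Sprintf("list(len=%d)", len(v.Items))
	case RefExpr:
		if v.Module == "" {
			return "ref(" + v.Name + ")" // same-module reference
		}
		return "ref(" + v.Module + "#" + v.Name + ")" // cross-module reference
	default:
		return "unknown"
	}
}

func main() {
	value := ListExpr{Items: []Expr{IntExpr{Value: 1}, RefExpr{Name: "Base"}}}
	fmt.Println(describe(value))
	for _, item := range value.Items {
		fmt.Println(describe(item))
	}
}
```

Because the nodes carry no type of their own, a real consumer would pass the containing constant's type down alongside the expression while it recurses.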
#Lowering
Lowering is the phase that produces an IR module from a checked source file. Its input is a checker result for one file; its output is one IR module and a list of diagnostics introduced by lowering itself.
Lowering operates only on a checker result whose declarations have already been type checked. The set of diagnostics emitted by previous phases is not re-emitted by lowering; lowering appends only diagnostics that are first detected at this phase.
#Module Identity
The lowered module's name is the source file's canonical path. Its source span is the source file's span. The constants slice mirrors the source order of declarations.
#Const Declarations
Each const item from each const declaration contributes one constant to the IR module, in source order. The constant carries:
- The source identifier name.
- The public flag of the enclosing const declaration. All items in a `pub const ( ... )` group are public.
- The checked type recorded by the checker for the item's binding. Type declarations have been resolved to their target type.
- The doc comment lines attached to the const item, in source order, with the leading `///` removed. When a const declaration has a leading doc comment and items also have their own doc comments, the declaration's lines precede each item's lines.
- The lowered expression value.
- The source span of the const item.
A const item whose checker-recorded type is the invalid type is not contributed to the IR. Its diagnostics have already been emitted by previous phases.
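
The doc comment rule above (declaration lines first, then item lines, with the `///` marker removed) can be sketched as follows. The function name `docLines` is hypothetical, and dropping one space after the marker is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// docLines strips the leading "///" from each raw comment line and places
// the declaration's lines before the item's lines.
// Trimming a single space after the marker is an assumption of this sketch.
func docLines(declComments, itemComments []string) []string {
	out := make([]string, 0, len(declComments)+len(itemComments))
	for _, group := range [][]string{declComments, itemComments} {
		for _, raw := range group {
			text := strings.TrimPrefix(strings.TrimSpace(raw), "///")
			out = append(out, strings.TrimPrefix(text, " "))
		}
	}
	return out
}

func main() {
	lines := docLines(
		[]string{"/// Gameplay limits."},
		[]string{"/// Highest reachable level."},
	)
	for _, line := range lines {
		fmt.Println(line)
	}
}
```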
#Type Declarations
A type declaration contributes one entry to the module's TypeDeclarations list. The entry preserves the declaration's source identifier, public flag, doc comments, declared type parameters, and the target type produced by checker-time substitution. The declared name is retained so downstream consumers can re-introduce it in target output even though the constant lowerings carry the resolved target type rather than the declared surface name.
#Anonymous Product Hoisting
Anonymous product types written inline inside a declared body (for example `type Monster = { skills: list<{...}> }`) are normalized during lowering by hoisting each one to its own TypeDeclaration. The synthetic declaration's name is derived from the path that reaches the anonymous product: the owner's declared name followed by the PascalCase form of each field, with `Key` and `Value` suffixes for map argument positions. Anonymous products nested inside an already-hoisted declaration recurse with the synthesized name as the new owner so the path stays scoped to a single declaration tree. Synthesized names that would collide with an existing top-level name receive a numeric suffix.
A synthesized declaration inherits the public flag of the owner so callers across modules can reach it, and carries the subset of the owner's type parameters that the hoisted body references. The original use site is rewritten to apply those parameters back, keeping the generic shape intact.
After hoisting, the IR contains only named product types. Codegen targets see the synthesized declarations as ordinary TypeDeclarations and emit them through their normal product-type code path.
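
As an illustration of the naming rule, the sketch below derives a synthesized name from an owner and a path. The helper names `pascal` and `hoistedName` are hypothetical, snake_case field names are assumed for the PascalCase conversion, and the exact collision suffix (the first free number, starting at 2) is a guess; the text above only says that a numeric suffix is added.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// pascal converts a field name such as "drop_table" or "skills" to PascalCase.
func pascal(field string) string {
	var b strings.Builder
	for _, part := range strings.Split(field, "_") {
		if part == "" {
			continue
		}
		r := []rune(part)
		r[0] = unicode.ToUpper(r[0])
		b.WriteString(string(r))
	}
	return b.String()
}

// hoistedName builds the synthetic declaration name from the owner's name
// and the path to the anonymous product. Path segments are field names, or
// the markers "Key" / "Value" for map argument positions.
// The suffix scheme (first free number starting at 2) is an assumption.
func hoistedName(owner string, path []string, taken map[string]bool) string {
	name := owner
	for _, seg := range path {
		name += pascal(seg)
	}
	candidate := name
	for n := 2; taken[candidate]; n++ {
		candidate = name + strconv.Itoa(n)
	}
	taken[candidate] = true // reserve the name for later collisions
	return candidate
}

func main() {
	taken := map[string]bool{"Monster": true}
	fmt.Println(hoistedName("Monster", []string{"skills"}, taken))
	fmt.Println(hoistedName("Monster", []string{"drop_table", "Value"}, taken))
}
```

Reserving each returned name in `taken` keeps later synthesized names from colliding with earlier ones as well as with user-declared names.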
#Literal Expressions
- `null` lowers to a Null expression.
- `true` and `false` lower to Bool expressions carrying the decoded value.
- An integer literal lowers to an Int expression carrying the decoded signed 64-bit value. The literal's recorded base (2, 8, 10, or 16) and digit text drive the decode. A literal whose magnitude does not fit in a signed 64-bit integer is reported as integer out of range and the item is not contributed to the IR.
- A string literal lowers to a String expression carrying its already decoded value.
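
A minimal sketch of the integer decode, assuming the digit text arrives without its radix prefix and that `_` is the digit separator (the separator character is not stated above). `decodeInt` and `errIntOutOfRange` are hypothetical names; `strconv.ParseInt` does the range check.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// errIntOutOfRange stands in for the "integer out of range" diagnostic.
var errIntOutOfRange = errors.New("integer out of range")

// decodeInt decodes the digit text of an integer literal in the given base.
// digits excludes any radix prefix; "_" as the digit separator is assumed.
func decodeInt(base int, digits string) (int64, error) {
	clean := strings.ReplaceAll(digits, "_", "")
	v, err := strconv.ParseInt(clean, base, 64)
	if errors.Is(err, strconv.ErrRange) {
		return 0, errIntOutOfRange
	}
	return v, err
}

func main() {
	for _, lit := range []struct {
		base   int
		digits string
	}{
		{16, "FF_FF"},
		{2, "1010"},
		{10, "9223372036854775808"}, // one past the signed 64-bit maximum
	} {
		v, err := decodeInt(lit.base, lit.digits)
		fmt.Println(v, err)
	}
}
```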
#Collection Expressions
- A list literal lowers to a List expression whose items are the lowered element expressions in source order.
- A map literal lowers to a Map expression whose entries are the lowered key/value pairs in source order, after applying last-wins deduplication: for each duplicate key, the position of the first occurrence is preserved and the value of the last occurrence replaces all earlier values for that key. Key equality is structural over already-lowered key expressions.
- An empty collection literal lowers to an empty List or empty Map according to the surrounding constant's checked type after declared-type resolution.
- A product literal lowers to a Product expression whose field initializers are the lowered values in source order. The product type used to type-check each field value is taken from the literal's typed prefix when present and otherwise from the surrounding constant's annotation after declared-type resolution. Field name uniqueness, missing fields, and unknown fields have already been validated by the checker.
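
The last-wins rule for map literals can be sketched as below. `Entry` and `dedupeLastWins` are hypothetical names, and keys are simplified to strings; the IR compares already-lowered key expressions structurally instead.

```go
package main

import "fmt"

// Entry is a hypothetical lowered map entry. Keys are simplified to strings;
// the real IR compares already-lowered key expressions structurally.
type Entry struct {
	Key   string
	Value int64
}

// dedupeLastWins keeps each key at the position of its first occurrence and
// replaces its value with the value of the last occurrence.
func dedupeLastWins(entries []Entry) []Entry {
	index := map[string]int{} // key -> position in out
	out := make([]Entry, 0, len(entries))
	for _, e := range entries {
		if i, seen := index[e.Key]; seen {
			out[i].Value = e.Value // later value overwrites, position stays
			continue
		}
		index[e.Key] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []Entry{{"a", 1}, {"b", 2}, {"a", 3}}
	for _, e := range dedupeLastWins(entries) {
		fmt.Println(e.Key, e.Value)
	}
}
```

With the input `a=1, b=2, a=3`, the result keeps `a` in first position with value 3, followed by `b` with value 2.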
#Identifier References
An identifier reference in expression position lowers to a Reference whose name matches the referent's source identifier. The checker has already established that the referent is a const item declared earlier in source; the IR guarantee that references point only backward in the constants slice follows from that.
The IR does not eagerly inline references. Consumers that need a reference's value walk to the referent constant.
#For Statements
A for statement reaches the IR as a control-flow node nested inside a function body. The IR distinguishes three subject shapes so codegen can pick the matching native loop without re-deriving them.
A for statement carries:
- A subject shape: `list`, `map`, or `range`.
- For `list` and `map` shapes, the lowered subject expression and the subject's checked element type(s) after declared-type resolution.
- For the `range` shape, the lowered start and end expressions (the IR records the recognized counted form so codegen emits a counted loop directly).
- One or two bindings depending on the shape; each binding carries a name (an empty name signals a `_` skip), the binding's checked type, and a source span. Map subjects always carry two bindings in `(key, value)` order.
- A lowered function block for the loop body.
- A source span covering the whole statement.
Break and continue statements lower as bare nodes with only a source span. The IR does not resolve which loop they target; consumers walk the surrounding statement tree to attach them to the innermost loop the same way the checker did.
The range(start, end) recognition is performed during lowering and is purely a representation choice: the IR's list<int> element type is unchanged, so consumers that ignore the counted-form flag can still treat the subject as a list and remain correct.
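
Since break and continue nodes do not record their target, a consumer has to recover it by walking the statement tree. The sketch below shows one way to do that under simplifying assumptions: the node types `For`, `Break`, and `Continue` are hypothetical, an integer stands in for a source span, and only for statements are treated as nesting constructs (a real walk would also descend into match arms and other blocks).

```go
package main

import "fmt"

// Stmt is a hypothetical IR statement node.
type Stmt interface{}

// For is a hypothetical for-statement node; Span stands in for a source span.
type For struct {
	Shape string // "list", "map", or "range"
	Span  int
	Body  []Stmt
}

type Break struct{ Span int }
type Continue struct{ Span int }

// innermostLoops maps the span of every break/continue node to the span of
// the innermost for statement that encloses it.
func innermostLoops(stmts []Stmt) map[int]int {
	targets := map[int]int{}

	var walk func(stmts []Stmt, loop *For)
	walk = func(stmts []Stmt, loop *For) {
		for _, s := range stmts {
			switch n := s.(type) {
			case *For:
				walk(n.Body, n) // n becomes the innermost loop for its body
			case *Break:
				if loop != nil {
					targets[n.Span] = loop.Span
				}
			case *Continue:
				if loop != nil {
					targets[n.Span] = loop.Span
				}
			}
		}
	}

	walk(stmts, nil)
	return targets
}

func main() {
	body := []Stmt{
		&For{Shape: "list", Span: 1, Body: []Stmt{
			&Continue{Span: 2},
			&For{Shape: "range", Span: 3, Body: []Stmt{&Break{Span: 4}}},
		}},
	}
	fmt.Println(innermostLoops(body))
}
```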
#Match Statements
A match statement reaches the IR as a control-flow node nested inside a function body. The IR preserves the surface ordering of arms and the bindings introduced by each pattern.
A match statement carries:
- The lowered subject expression.
- The subject's checked type after declared-type resolution.
- An ordered list of arms, in source order.
- A source span covering the whole statement.
A match arm carries:
- An ordered list of one or more lowered patterns (the `|`-separated alternatives at one arm position).
- An optional lowered guard expression.
- An ordered list of bindings introduced by the pattern. Each binding carries a name, the binding's checked type, and the binding's source span.
- A lowered function block for the arm body.
- A source span covering the whole arm.
A match pattern is one of:
- A type pattern carrying the matched type after declared-type resolution and an optional binding entry referenced by name.
- An enum pattern carrying the enum type and the variant name.
- A literal pattern carrying the lowered literal expression.
- A product pattern carrying the matched product type, an ordered list of field sub-patterns (each with the field name and a nested pattern), and an optional whole-value binding entry referenced by name.
- A wildcard pattern carrying no payload.
The implicit identifier narrowing described in types.md is normalized during lowering: a subject expression that is a plain identifier and a pattern without an explicit binding produce an implicit binding entry whose name is the identifier's name and whose type is the narrowed type. Downstream consumers therefore see explicit bindings on every arm where a narrowed local would be reachable.
Exhaustiveness, reachability, and binding-set agreement across alternatives are enforced by the checker; the IR records the arms verbatim and does not synthesize a catch-all arm.
#Master Validation
A master's validation section reaches the IR as a MasterValidation value. Its surface form, scoping, and execution model are defined in masterdata/validation.md; the IR records the lowered rules so the validation evaluator can run them at export time.
A MasterValidation carries:
- `Master`: the flattened codegen name of the owning master (the name used by code-generation output, for example `UserFriendships`).
- `VisiblePath`: the module-local dotted source path of the master (for example `User.Friendships`). The export driver rewrites the top segment of this path through the entry module's re-export aliases to obtain the entrypoint-visible path that validator configuration keys against (so an aliased re-export `pub { User as U }` yields `U.Friendships`).
- `Rules`: an ordered list of `MasterValidationRule` values in source order.
A MasterValidationRule carries:
- `Scope`: the rule's scope, one of `MasterValidationEach` or `MasterValidationAll`.
- `Name`: the validator's stable identifier, unique within the master across both scopes.
- `Body`: the lowered statement block.
The implicit bindings the evaluator supplies inside a rule body are determined by the scope rather than recorded as explicit IR bindings:
- For an `each` rule, `row` and `self` are bound to one post-filter record; both have the master's record type.
- For an `all` rule, `table` and `self` are bound to the master's post-filter relation; both have type `Relation<M>`, and iterating it yields the post-filter records in plan order. A rule may apply the master's scopes and stage operators to that relation.
- `self` is bound to the same value as `row` (in `each`) or `table` (in `all`).
#Master Scopes
A master's scope declarations reach the IR as an ordered list of MasterScope values on the owning master. The surface form is defined in masterdata/schema.md; the IR records each scope so code generation can emit a relation method and the SQLite exporter can infer secondary indexes.
A MasterScope carries:
- `Master`: the flattened codegen name of the owning master.
- `Name`: the scope name, unique within the master.
- `Public`: the `pub` flag controlling generated-API exposure.
- `Indexed`: the `indexed` flag enabling SQLite index inference.
- `Parameters`: the lowered parameter list, in source order, with the same shape a lowered function uses.
- `Body`: the lowered scope body (a statement block or an arrow expression) whose result is the master's relation.
#Query Plan Shape
To support both code generation and index inference, the lowered body exposes the scope's relation plan: the ordered stages applied to self. Each stage is one of:
- `Where(predicate)`: a predicate tree whose leaves carry a `FieldRef` (the record field's source name), the operator (`eq`, `ne`, `lt`, `le`, `gt`, `ge`, `in`, `between`), and the operand (a literal, a scope-parameter reference, or a referenced const/static). Interior nodes are `and` / `or` / `not`.
- `OrderBy(ordering)` / `ThenBy(ordering)`: a `FieldRef` plus a direction (`asc` / `desc`).
- `Skip(n)` / `Take(n)`: a count operand.
A scope call inside a body inlines the callee scope's stages into the caller's plan (with the callee's parameters bound to the call arguments), so the plan is self-contained and free of scope-call nodes. The plan order preserves source order across where / orderBy / thenBy / skip / take and across inlined chains. This is the representation the SQLite index inference reads; index inference is performed on the lowered plan, not on source syntax.
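
The inlining step can be pictured with the sketch below. It is deliberately simplified and every name in it is hypothetical: `Stage` flattens predicates to a single leaf, operands are plain strings (so a literal and a parameter name are told apart only by the binding map), and the scope names `highLevel` and `topTen` are invented for the example. Scope calls are assumed to be acyclic and arity-checked by earlier phases.

```go
package main

import "fmt"

// Stage is a hypothetical, flattened plan stage. Kind is "where", "orderBy",
// "thenBy", "skip", "take", or "call" (a scope call that is still to inline).
type Stage struct {
	Kind    string
	Field   string   // FieldRef for where/orderBy/thenBy
	Op      string   // comparison operator, or asc/desc for orderings
	Operand string   // literal text or a parameter name
	Callee  string   // callee scope name, for "call"
	Args    []string // call arguments, for "call"
}

// Scope is a hypothetical lowered scope: parameters plus source-order stages.
type Scope struct {
	Params []string
	Stages []Stage
}

// substitute replaces a parameter name with its bound argument, if any.
func substitute(operand string, bind map[string]string) string {
	if v, ok := bind[operand]; ok {
		return v
	}
	return operand
}

// inline returns the self-contained plan for the named scope: callee stages
// are spliced in at the call position with parameters bound to arguments.
func inline(name string, scopes map[string]Scope, bind map[string]string) []Stage {
	var plan []Stage
	for _, st := range scopes[name].Stages {
		if st.Kind == "call" {
			inner := map[string]string{}
			for i, p := range scopes[st.Callee].Params {
				inner[p] = substitute(st.Args[i], bind)
			}
			plan = append(plan, inline(st.Callee, scopes, inner)...)
			continue
		}
		st.Operand = substitute(st.Operand, bind)
		plan = append(plan, st)
	}
	return plan
}

func main() {
	scopes := map[string]Scope{
		"highLevel": {Params: []string{"min"}, Stages: []Stage{
			{Kind: "where", Field: "level", Op: "ge", Operand: "min"},
		}},
		"topTen": {Stages: []Stage{
			{Kind: "call", Callee: "highLevel", Args: []string{"50"}},
			{Kind: "orderBy", Field: "level", Op: "desc"},
			{Kind: "take", Operand: "10"},
		}},
	}
	for _, st := range inline("topTen", scopes, nil) {
		fmt.Println(st.Kind, st.Field, st.Op, st.Operand)
	}
}
```

The resulting plan contains no `call` stages, which is the property index inference relies on when it reads the lowered plan.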
#Assert Statement
An assert statement lowers to an Assert node carrying:
- `Condition`: the lowered condition expression.
- `ExprText`: the condition's source text.
The failed assertion's expression and source span are retained rather than collapsed to a boolean result. This lets a future PowerAssert-style reporter display sub-expression values without a surface or IR change; the MVP evaluator uses only the boolean outcome.
#Symbols and Visibility
A top-level IR declaration is called a symbol. Every symbol has a name and a public flag derived from the source program's pub modifier. The set of symbol kinds is open: constants exist today; future kinds such as records, methods, callables, and type declarations will extend the same abstraction.
Every symbol declares which other symbols it references. References originate in:
- Identifier expressions inside value-position trees.
- Future: type references inside type-position trees (such as a field type that names a user-declared record).
- Future: call-site references inside callable bodies.
The IR provides a single reachability operation that walks symbols starting from the public roots and follows the declared references transitively. Downstream consumers use this operation to identify the set of symbols a public surface depends on. The current consumers are code generation targets, but any consumer that needs the same notion (linker, indexer, future tree-shaking analyses) shares it.
The reachability rule is consumer-neutral. The IR does not delete unreachable symbols; consumers choose how to act on the result.
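
A minimal sketch of such a reachability walk, restricted to a single module for brevity. The `Symbol` struct and the `reachable` function are hypothetical; references are modeled as plain names, and the result is sorted only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Symbol is a hypothetical top-level declaration with its declared references.
type Symbol struct {
	Name   string
	Public bool
	Refs   []string // names of symbols this symbol references
}

// reachable returns the names of all symbols reachable from the public roots
// by following declared references transitively. Nothing is deleted; the
// caller decides what to do with symbols that are absent from the result.
func reachable(symbols []Symbol) []string {
	byName := map[string]Symbol{}
	var stack []string
	for _, s := range symbols {
		byName[s.Name] = s
		if s.Public {
			stack = append(stack, s.Name) // public symbols are the roots
		}
	}

	seen := map[string]bool{}
	for len(stack) > 0 {
		name := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[name] {
			continue
		}
		seen[name] = true
		stack = append(stack, byName[name].Refs...)
	}

	out := make([]string, 0, len(seen))
	for name := range seen {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	symbols := []Symbol{
		{Name: "MaxLevel", Public: true, Refs: []string{"BaseLevel"}},
		{Name: "BaseLevel", Refs: []string{"Step"}},
		{Name: "Step"},
		{Name: "Unused"},
	}
	fmt.Println(reachable(symbols))
}
```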
#Effects
The IR carries an open set of effect tags that callable symbols may declare. The currently defined effects are cancellable, failable, and asyncable. Each is defined and documented in codegen/model.
Effect sets are ordered, deduplicated, and combined by union when a caller inherits a callee's effects. Downstream consumers may rely on the IR's effect set being canonical: equal effect sets compare equal positionally, with no duplicate members.
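
The canonical-form guarantee can be illustrated as follows. The text above names the `EffectSet` type but does not say which order is used, so the lexical ordering here is an assumption, as are the helper names `canonical`, `union`, and `equal`.

```go
package main

import (
	"fmt"
	"sort"
)

// Effect is a hypothetical effect tag such as "cancellable" or "failable".
type Effect string

// EffectSet is kept sorted and duplicate-free. Lexical order is an assumption
// of this sketch; only the existence of a canonical order is documented.
type EffectSet []Effect

// canonical returns a sorted, duplicate-free set built from the given tags.
func canonical(effects ...Effect) EffectSet {
	sorted := append(EffectSet(nil), effects...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	out := sorted[:0]
	for _, e := range sorted {
		if len(out) == 0 || out[len(out)-1] != e {
			out = append(out, e)
		}
	}
	return out
}

// union combines a caller's effects with those inherited from a callee.
func union(a, b EffectSet) EffectSet {
	return canonical(append(append(EffectSet(nil), a...), b...)...)
}

// equal compares two canonical sets position by position.
func equal(a, b EffectSet) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	caller := canonical("failable", "cancellable")
	callee := canonical("cancellable", "asyncable")
	combined := union(caller, callee)
	fmt.Println(combined)
	fmt.Println(equal(combined, canonical("asyncable", "failable", "cancellable")))
}
```

Keeping every set canonical at construction time is what allows the positional comparison in `equal` to stand in for set equality.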
A function type carries its effect set as part of its identity (see language/types.md). A type declaration whose target is a function type therefore exposes the effect set through its target type. The IR has no callable value-level nodes yet, so an effect set today reaches targets only through type expressions; the EffectSet type is also reserved for future callable nodes that will record effects directly at IR construction time.
#Stability
The IR is an internal contract between checker and downstream consumers. Adding new value forms, new fields, or new normalization rules is a contract change and requires updating this document before or together with the implementation.