Synced from main@9490864.

# Master Data Validation

A master's `validation` section declares data quality checks that run over the master's records after import and filtering. Unlike a filter, a validator never drops a record: it inspects the post-filter dataset and emits diagnostics. Validation runs during `masterbelt export`, before any artifact is written; an error-severity validation failure blocks the entire export.

## Surface Form

The `validation` section is an optional section of a master body. It contains one or more scope groups (`each` or `all`). Each group contains one or more named `validate` rules whose bodies assert conditions:

```masterbelt
master Records {
  record { primary ID: int, Name: string, Value: int }

  validation {
    each {
      validate nameRequired {
        assert row.Name != ""
      }

      validate valuePositive {
        assert row.Value > 0
      }
    }

    all {
      validate checkValueSum {
        let total = 0
        for row in table {
          total = total + row.Value
        }
        assert total < 1000
      }
    }
  }
}
```
- `validate` takes a stable identifier, not a message string. The identifier names the validator in project configuration (see Severity Configuration) and must be unique within a master across both `each` and `all` groups.
- A single `validate` block may contain several `assert` statements. Each failed `assert` produces one diagnostic.
- A `validate` block needs no `return`. It is a statement block that passes when it runs to completion, and `return` inside it is rejected (`masterbelt.checker.return_in_validation`).
- `assert` is the validation primitive. It is valid only inside a validation block; the checker rejects `assert` anywhere else (`masterbelt.checker.assert_outside_validation`).
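The uniqueness rule for validator IDs can be sketched as a small Python check. This is a hypothetical model, not Masterbelt's checker. The function name `duplicate_validator_ids` and the list-of-groups input shape are assumptions, and this page does not name the diagnostic the real checker emits for a duplicate ID.

```python
from collections import Counter


# Hypothetical model: a master's validation section as a list of
# (scope, validator IDs in source order) pairs, one pair per scope group.
def duplicate_validator_ids(groups: list[tuple[str, list[str]]]) -> list[str]:
    """Return every validator ID declared more than once in one master.

    Uniqueness spans both scopes: an ID reused between an `each` group and an
    `all` group counts as a duplicate, just like a repeat within one group.
    """
    counts = Counter(vid for _scope, ids in groups for vid in ids)
    return sorted(vid for vid, n in counts.items() if n > 1)


# The Records example: three distinct IDs, so nothing is reported.
records_groups = [
    ("each", ["nameRequired", "valuePositive"]),
    ("all", ["checkValueSum"]),
]
print(duplicate_validator_ids(records_groups))  # []

# Reusing an `each` ID inside an `all` group is a clash.
clashing_groups = [("each", ["nameRequired"]), ("all", ["nameRequired"])]
print(duplicate_validator_ids(clashing_groups))  # ['nameRequired']
```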

## Scopes

### `each`

An `each` group runs every rule once per post-filter record. The current record is bound to two implicit names:

- `row`: the record.
- `self`: an alias for the same record.

Both have the master's record type. A failure on one record does not stop the rule from running on the next record. Likewise, a failed `assert` inside a rule does not stop later statements or asserts in the same rule.
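The following Python sketch models the non-short-circuiting behavior. A hypothetical rule with two asserts runs over two records. Everything here is an illustrative assumption: `run_each_rule` and `Failure` are invented names, records are plain dicts, and the primary key is assumed to be a single `ID` field. The sketch is not the Masterbelt evaluator.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of one `each` rule: its asserts, each given as
# (condition source text, predicate over a single record dict).
Assert = tuple[str, Callable[[dict], bool]]


@dataclass
class Failure:
    validator: str
    record_key: object   # primary key of the failing record
    condition: str       # source text of the failed assert


def run_each_rule(validator: str, asserts: list[Assert],
                  records: list[dict], key: str = "ID") -> list[Failure]:
    failures = []
    for row in records:                 # post-filter order; `self` aliases `row`
        for text, holds in asserts:
            if not holds(row):
                # Record the failure and keep going: later asserts in this
                # rule, and later records, are still evaluated.
                failures.append(Failure(validator, row[key], text))
    return failures


# A hypothetical rule with two asserts, run over two records.
asserts = [
    ('row.Name != ""', lambda row: row["Name"] != ""),
    ("row.Value > 0", lambda row: row["Value"] > 0),
]
records = [
    {"ID": 1, "Name": "", "Value": 0},   # fails both asserts
    {"ID": 2, "Name": "b", "Value": 5},  # passes
]
for f in run_each_rule("nameAndValue", asserts, records):
    print(f.record_key, f.condition)
# 1 row.Name != ""
# 1 row.Value > 0
```

Record 1 yields two diagnostics, one per failed `assert`, and record 2 is still checked afterwards.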

### `all`

An `all` group runs every rule exactly once, over the master's whole post-filter record collection. The collection is bound to two implicit names:

- `table`: the post-filter relation.
- `self`: an alias for the same relation.

Both have type `Relation<M>` for the surrounding master `M`. A rule iterates the collection with `for row in table` (or `for row in self`); iterating a relation yields its post-filter records in plan order. A rule may also reach other masters' post-filter records through `Master.toList()`. Because the binding is a relation, an `all` rule may also apply the master's scopes and the relation stage operators to `table` or `self`.
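Here is a hypothetical Python transliteration of the `checkValueSum` rule, to make the `all` binding concrete. It models `table` as a plain list of record dicts in plan order and represents a failed `assert` as a returned condition string. This is an assumption for illustration; it is not how the evaluator represents relations.

```python
# Hypothetical transliteration of `checkValueSum`. `table` stands in for the
# Relation<Records> binding; the rule is called once per master.
def check_value_sum(table: list[dict]) -> list[str]:
    failures = []
    total = 0
    for row in table:                # `for row in table`
        total = total + row["Value"]
    if not total < 1000:             # `assert total < 1000`
        failures.append("total < 1000")
    return failures


table = [{"ID": 1, "Value": 400}, {"ID": 2, "Value": 700}]
print(check_value_sum(table))        # ['total < 1000'], since 400 + 700 = 1100
print(check_value_sum(table[:1]))    # []
```

Because the rule runs once over the whole relation, each `assert` in it produces at most one diagnostic, however many records the master holds.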

## Execution Semantics

Validation runs in a deterministic order:

1. Every source record is imported.
2. Each master's filter is applied.
3. The final record set for each master is built.
4. Validators run in module order and, within a module, in source declaration order. `each` rules run in source order, visiting records in post-filter order; `all` rules run in source order.
5. The collected diagnostics are returned.
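The ordering above can be sketched as a Python pipeline. The sketch rests on several assumptions. Each master is a dict holding its raw source rows, a filter predicate, and its validators. Each validator is a callable that receives the master's final records plus every master's final records and returns failure messages. `run_validation` is a hypothetical name. The sketch does not distinguish `each` from `all` rules.

```python
# Hypothetical model of the five steps; `masters` is listed in module and
# source declaration order.
def run_validation(masters: list[dict]) -> list[tuple[str, str, str]]:
    # 1. Import every source record, for every master.
    imported = {m["name"]: list(m["source"]) for m in masters}
    # 2-3. Apply each master's filter and build its final record set.
    final = {m["name"]: [r for r in imported[m["name"]] if m["filter"](r)]
             for m in masters}
    # 4. Run validators in declaration order. Every final set exists by now,
    #    so a rule can also read another master's post-filter records.
    diagnostics = []
    for m in masters:
        for vid, rule in m["validators"]:
            for message in rule(final[m["name"]], final):
                diagnostics.append((m["name"], vid, message))
    # 5. Return the collected diagnostics.
    return diagnostics


records_master = {
    "name": "Records",
    "source": [{"ID": 1, "Value": 5}, {"ID": 2, "Value": -3}, {"ID": 3, "Value": 0}],
    "filter": lambda r: r["Value"] >= 0,   # drops ID 2 before validation
    "validators": [
        ("valuePositive",
         lambda table, _all: [f"ID {r['ID']}" for r in table if not r["Value"] > 0]),
    ],
}
print(run_validation([records_master]))  # [('Records', 'valuePositive', 'ID 3')]
```

ID 2 never reaches the validator because the filter ran first. ID 3 passes the filter and then fails `Value > 0`.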

A validation rule body runs in the evaluator, not in generated code: validators are a build-time contract over the data, and no validation code is emitted into any target language.

## Failure Severity

A failed `assert` produces a `masterbelt.validation.assert_failed` diagnostic. Its severity defaults to `error`. Project configuration can override the severity to `warning` per (master, validator) pair; see Severity Configuration.

- An error-severity failure blocks the export: no artifact is written.
- A warning-severity failure is reported but does not block the export.

Records are never removed by validation: when the export proceeds, the failing record is preserved in it.
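The blocking rule and the "never removed" guarantee can be sketched together in Python. `Severity`, `export_proceeds`, and `export` are hypothetical names. Returning `None` models "no artifact is written", and returning the unchanged list models a successful export.

```python
from enum import Enum


class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"


def export_proceeds(diagnostics: list[tuple[str, Severity]]) -> bool:
    # Any error-severity diagnostic blocks the export; warnings are only reported.
    return all(severity is not Severity.ERROR for _code, severity in diagnostics)


def export(records: list[dict], diagnostics: list[tuple[str, Severity]]):
    if not export_proceeds(diagnostics):
        return None          # blocked: no artifact is written
    return list(records)     # proceeds: failing records are exported unchanged


rows = [{"ID": 1, "Name": ""}]
warn = [("masterbelt.validation.assert_failed", Severity.WARNING)]
err = [("masterbelt.validation.assert_failed", Severity.ERROR)]
print(export(rows, warn))  # [{'ID': 1, 'Name': ''}]
print(export(rows, err))   # None
```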

An evaluation error inside a validation rule (an unbound reference, a runtime type error, a division by zero) is a hard error attributed to the rule. It is surfaced either through the underlying evaluator diagnostic or wrapped as `masterbelt.validation.evaluation_failed`.
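The Python sketch below covers only the wrapping path. It attributes an evaluation error to the rule and reports it under `masterbelt.validation.evaluation_failed`. Which errors keep their own evaluator diagnostic instead is not specified here, so the sketch wraps every exception. It also assumes that an evaluation error aborts the rest of the rule. `run_rule_guarded` and `average_positive` are hypothetical names.

```python
def run_rule_guarded(validator: str, rule, table: list[dict]) -> list[tuple[str, str, str]]:
    """Run one rule; return (code, validator ID, detail) diagnostics."""
    try:
        failed = rule(table)   # list of failed condition texts
    except Exception as err:   # unbound name, type error, division by zero, ...
        # Assumption: the error aborts this rule and is attributed to it.
        return [("masterbelt.validation.evaluation_failed", validator, str(err))]
    return [("masterbelt.validation.assert_failed", validator, c) for c in failed]


def average_positive(table):
    # Hypothetical rule: divides by the record count, so an empty table fails.
    average = sum(row["Value"] for row in table) / len(table)
    return [] if average > 0 else ["average > 0"]


print(run_rule_guarded("averagePositive", average_positive, [{"Value": 3}]))  # []
print(run_rule_guarded("averagePositive", average_positive, []))
# [('masterbelt.validation.evaluation_failed', 'averagePositive', 'division by zero')]
```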

## Severity Configuration

Project configuration keys severity overrides by entrypoint-visible master path and then by validator ID; see `tooling/configuration.md` for the schema. The path is the master's name as seen from the entry module, never the flattened codegen name. An aliased import uses the alias, and a nested master uses its dotted path.

```yaml
validators:
  Records:
    nameRequired: warning
    valuePositive: error
  U.Friendships:
    uniquePair: warning
```
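The two-level lookup, with its `error` default, can be sketched in Python. The dict literal mirrors the shape a YAML loader would produce for the configuration above. `severity_for` is a hypothetical helper, not part of Masterbelt's tooling.

```python
DEFAULT_SEVERITY = "error"

# Same shape as the YAML above: master path -> validator ID -> severity.
config = {
    "validators": {
        "Records": {"nameRequired": "warning", "valuePositive": "error"},
        "U.Friendships": {"uniquePair": "warning"},
    }
}


def severity_for(config: dict, master_path: str, validator_id: str) -> str:
    # Key by entrypoint-visible master path first, then by validator ID;
    # anything without an override keeps the default.
    overrides = config.get("validators", {}).get(master_path, {})
    return overrides.get(validator_id, DEFAULT_SEVERITY)


print(severity_for(config, "Records", "nameRequired"))    # warning
print(severity_for(config, "Records", "checkValueSum"))   # error
print(severity_for(config, "U.Friendships", "uniquePair"))  # warning
```

The dotted key `U.Friendships` is a single path string, not a nested mapping.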

A master is validated only when it is reachable from the entry module:

- A master declared in the entry module, or re-exported from it by a single `pub import`, is visible and validated under its entrypoint path.
- A master that the entry module neither declares nor re-exports is out of scope. It has no config-visible name, so its validators do not run.
- A master re-exported from the entry under more than one name is ambiguous. No single config path identifies it, so its validators do not run. The ambiguity is reported as `masterbelt.validation.ambiguous_master`, which blocks the export. (Two distinct masters re-exported under the same name are rejected earlier, as `masterbelt.resolver.duplicate_name`.)
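The three cases can be sketched as a Python classification over entrypoint-visible names. The input is an assumption: a mapping from each visible path (declared in or re-exported from the entry module) to an internal master identity. Those identity strings, the `Goods` alias, and `classify_masters` are all hypothetical.

```python
from collections import defaultdict


def classify_masters(visible: dict[str, str], all_masters: list[str]):
    """Split masters into validated, ambiguous, and out-of-scope groups."""
    paths = defaultdict(list)
    for path, master in visible.items():
        paths[master].append(path)
    validated = {m: ps[0] for m, ps in paths.items() if len(ps) == 1}
    ambiguous = sorted(m for m, ps in paths.items() if len(ps) > 1)
    out_of_scope = sorted(m for m in all_masters if m not in paths)
    return validated, ambiguous, out_of_scope


visible = {
    "Records": "main.Records",
    "U.Friendships": "users.Friendships",
    "Items": "shop.Items",   # re-exported twice ...
    "Goods": "shop.Items",   # ... so it is ambiguous
}
all_masters = ["main.Records", "users.Friendships", "shop.Items", "shop.Internal"]
validated, ambiguous, out_of_scope = classify_masters(visible, all_masters)
print(validated)     # {'main.Records': 'Records', 'users.Friendships': 'U.Friendships'}
print(ambiguous)     # ['shop.Items']
print(out_of_scope)  # ['shop.Internal']
```

The input is keyed by path, so one path can never map to two masters. In the real tool, that case is already rejected by the resolver as a duplicate name.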

Only `error` and `warning` are accepted as severities. Configuration is validated before any validator runs, so a typo is reported even when a master imported zero records:

- A master path that matches no master is reported as `masterbelt.validation.config_unknown_master`.
- A validator ID that matches no rule under a known master is reported as `masterbelt.validation.config_unknown_validator`.
- A severity other than `error` or `warning` is reported as `masterbelt.validation.config_invalid_severity`.

Each of these config diagnostics blocks the export.
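The configuration checks can be sketched in Python. The checker only needs each visible master's validator IDs and never consults record data, which is why it can run before validation. `check_config` and its tuple output are hypothetical. So is the choice to skip the entries nested under an unknown master.

```python
VALID_SEVERITIES = {"error", "warning"}


def check_config(config: dict, masters: dict[str, set[str]]) -> list[tuple]:
    """Return (code, master path, validator ID or None) for each config problem.

    masters maps each entrypoint-visible master path to its validator IDs.
    """
    diags = []
    for path, overrides in config.get("validators", {}).items():
        if path not in masters:
            diags.append(("masterbelt.validation.config_unknown_master", path, None))
            continue  # assumption: entries under an unknown master are not checked further
        for vid, severity in overrides.items():
            if vid not in masters[path]:
                diags.append(("masterbelt.validation.config_unknown_validator", path, vid))
            if severity not in VALID_SEVERITIES:
                diags.append(("masterbelt.validation.config_invalid_severity", path, vid))
    return diags


masters = {"Records": {"nameRequired", "valuePositive", "checkValueSum"}}
config = {
    "validators": {
        "Records": {"nameRequierd": "warning", "valuePositive": "info"},
        "Recrods": {"nameRequired": "warning"},
    }
}
for diag in check_config(config, masters):
    print(diag)
# ('masterbelt.validation.config_unknown_validator', 'Records', 'nameRequierd')
# ('masterbelt.validation.config_invalid_severity', 'Records', 'valuePositive')
# ('masterbelt.validation.config_unknown_master', 'Recrods', None)
```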

## Diagnostics

| Code | Severity | Meaning |
|---|---|---|
| `masterbelt.validation.assert_failed` | configured (default `error`) | An `assert` condition evaluated to false. The span is the condition expression; args carry the master path, validator ID, scope, record description, and condition source text. |
| `masterbelt.validation.evaluation_failed` | `error` | A validation rule body raised an evaluation error. |
| `masterbelt.validation.config_unknown_master` | `error` | A `validators` config key names a master that does not exist. |
| `masterbelt.validation.config_unknown_validator` | `error` | A `validators` config key names a validator that does not exist on a known master. |
| `masterbelt.validation.config_invalid_severity` | `error` | A `validators` severity is not `error` or `warning`. |
| `masterbelt.validation.ambiguous_master` | `error` | A master is re-exported from the entry under more than one name, so no config path identifies it unambiguously. |

For an `each` failure, the record is described by its primary key, using the same convention as filter exclusion diagnostics. For an `all` failure, the record description is `<table>`.
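The `assert_failed` arguments and the record description rule can be sketched in Python. The field names of `AssertFailedArgs` are hypothetical labels for the args listed in the table. The `key=value` rendering of the primary key is also an assumption, because the filter-exclusion convention it follows is not spelled out on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssertFailedArgs:
    master_path: str
    validator_id: str
    scope: str          # "each" or "all"
    record: str         # record description
    condition: str      # condition source text


def describe_record(scope: str, row: Optional[dict], primary_key: list[str]) -> str:
    if scope == "all":
        return "<table>"
    # Hypothetical rendering of the primary key; the real text follows the
    # filter-exclusion convention.
    return ", ".join(f"{k}={row[k]}" for k in primary_key)


each_args = AssertFailedArgs("Records", "nameRequired", "each",
                             describe_record("each", {"ID": 7, "Name": ""}, ["ID"]),
                             'row.Name != ""')
all_args = AssertFailedArgs("Records", "checkValueSum", "all",
                            describe_record("all", None, ["ID"]),
                            "total < 1000")
print(each_args.record)  # ID=7
print(all_args.record)   # <table>
```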

## Future Work

The MVP does not implement PowerAssert-style display of sub-expression values, custom validation messages, the `info` / `hint` severities, or target-language runtime validators. The evaluator boundary already retains each failed assertion's expression and span, so PowerAssert-style reporting can be added without a surface change.
