ADR 0022: Machine-Managed Value Integrity and AI Provenance
Accepted architecture decision record: a top-level integrity lock for machine-managed values plus a compact, engine-agnostic AI provenance field, superseding the structured machine-translation metadata.
- Status: Accepted (2.0.0)
- Date: 2026-06-30
- Supersedes: ADR 0018
Context
ADR 0018 added machine-translation metadata to catalog entries as four structured
fields (model, modified, confidence, hash), rendered in PO as a single
#@ ferrocat-mt model=… modified=… confidence=… hash=… comment and in FCL as
separate mt.model / mt.conf / mt.hash tags. Practice since then surfaced
three problems:
- Only two of the four fields carry weight inside Ferrocat. The only module
that reasons about machine translation, catalog_review, looks at exactly:
presence (is the value machine-set?) and the hash (does it still match the
current translation, i.e. current vs stale). model, confidence, and modified
are never read by Ferrocat logic; they are provenance passed through for
hosts and dashboards.
- The hash is not really about translation. Its job is integrity: a
machine set this value, and a later by-hand edit should be detectable. A
translation memory system or any automation manages values the same way, so
binding the hash to an mt. namespace is too narrow.
- The encoding is cluttered and diverges between formats. Four key=value
pairs in PO versus three tags in FCL is verbose and not uniform, and modified
is a timestamp that churns every regenerated line and poisons merges.
At the same time Ferrocat should position itself as AI-native: it should understand a small, engine-neutral provenance vocabulary across engines (Palamedes, a TMS, other AI providers), rather than treat the data as opaque host-specific bytes.
Decision
Model two top-level, engine-agnostic concepts that Ferrocat understands, and
drop the mt.-namespaced structured fields.
1. Integrity lock (machine-managed, translation-agnostic)
A top-level lock carries a fingerprint of the value at the time a machine set
it:
- present ⇒ the value was machine-managed (AI engine, TMS, script, …)
- hash(current value) != lock ⇒ a human edited it after the machine set it
lock=<hash>
This replaces the translation-specific hash and is no longer nested under a machine-translation concept.
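The lock check can be sketched as follows. This is a minimal illustration, not Ferrocat's implementation: the ADR does not specify the hash function, so `fingerprint` assumes SHA-256 over the UTF-8 value, truncated to 16 hex characters, and both function names are hypothetical. The three result names follow the current/stale/absent states that catalog_review distinguishes.

```python
import hashlib
from typing import Optional


def fingerprint(value: str) -> str:
    # Assumption: SHA-256 over UTF-8 bytes, truncated to 16 hex characters.
    # The ADR leaves the actual hash function and length unspecified.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def lock_state(value: str, lock: Optional[str]) -> str:
    """Classify a value against its integrity lock."""
    if lock is None:
        # No lock: the value was not machine-managed.
        return "absent"
    # Matching fingerprint: untouched since the machine set it.
    # Mismatch: the value was edited after the machine set it.
    return "current" if fingerprint(value) == lock else "stale"
```

Under these assumptions, `lock_state("Hallo", fingerprint("Hallo"))` yields `"current"`, and editing the value without refreshing the lock yields `"stale"`.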
2. AI provenance (optional, understood by Ferrocat)
A top-level ai field describes which engine produced the value and how
confident it was:
ai=<model>[:<confidence>]
ai=openai/gpt-5.5-high:0.93
ai=opus-4-8:0.97
ai=grok-4
- model is an opaque, free-form identifier. Whether it carries a provider prefix is the producer's choice; Ferrocat does not parse it.
- confidence is optional and trails after the final : as a decimal in the closed range [0, 1].
- Parsing mirrors the origin file:line heuristic: split on the last :, and treat the suffix as confidence only when it is a valid [0, 1] decimal; otherwise the whole string is the model. A model id that itself contains / or : therefore keeps its value intact, and the single free-form field never needs escaping because it is taken as the remainder.
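The split-on-last-colon rule can be sketched as below. The function name `parse_ai` is hypothetical, and two details are assumptions the ADR does not settle: a "valid decimal" is taken to mean plain digits with an optional fractional part (no sign, no exponent), and an empty model before the colon is treated as no split.

```python
import re
from typing import Optional, Tuple

# Assumption: a valid decimal is digits with an optional fractional part.
_DECIMAL = re.compile(r"\d+(?:\.\d+)?")


def parse_ai(field: str) -> Tuple[str, Optional[float]]:
    """Split an ai value into (model, confidence) on the last ':'."""
    model, sep, suffix = field.rpartition(":")
    if sep and model and _DECIMAL.fullmatch(suffix):
        confidence = float(suffix)
        if 0.0 <= confidence <= 1.0:
            return model, confidence
    # No colon, a non-decimal suffix, or an out-of-range number:
    # the whole string is the model and there is no confidence.
    return field, None
```

With this sketch, `parse_ai("openai/gpt-5.5-high:0.93")` gives the model `openai/gpt-5.5-high` with confidence `0.93`, `parse_ai("grok-4")` gives the model with no confidence, and a hypothetical id such as `vendor:model-x` stays intact because its suffix is not a decimal.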
modified is dropped. Staleness is detected by the lock hash, and a timestamp
only adds merge churn.
Encoding
Both fields use the same grammar in both formats; only the wrapper differs:
- FCL: lock=<hash> and ai=<model>[:<confidence>] tags
- PO: #@ lock: <hash> and #@ ai: <model>[:<confidence>] metadata comments, which reuse the standard #@ key: value form and which gettext tooling ignores
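The shared grammar with two wrappers can be sketched as below. All function names are hypothetical, the hash string in the usage note is a placeholder, and the confidence is formatted with Python's default float formatting, which is an assumption rather than a rule from this ADR. The sketch renders only the two fields, not a full FCL row or PO entry.

```python
from typing import List, Optional


def ai_value(model: str, confidence: Optional[float] = None) -> str:
    # <model>[:<confidence>]; the confidence position is optional.
    return model if confidence is None else f"{model}:{confidence}"


def fcl_tags(lock: Optional[str], ai: Optional[str]) -> List[str]:
    """Render the fields as FCL tags (key=value)."""
    tags = []
    if lock is not None:
        tags.append(f"lock={lock}")
    if ai is not None:
        tags.append(f"ai={ai}")
    return tags


def po_comments(lock: Optional[str], ai: Optional[str]) -> List[str]:
    """Render the fields as PO metadata comments (#@ key: value)."""
    lines = []
    if lock is not None:
        lines.append(f"#@ lock: {lock}")
    if ai is not None:
        lines.append(f"#@ ai: {ai}")
    return lines
```

For example, `fcl_tags("abc123", ai_value("grok-4"))` produces `lock=abc123` and `ai=grok-4`, while `po_comments` wraps the same values as `#@ lock: abc123` and `#@ ai: grok-4`.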
greeting⇥⇥Hallo⇥lock=9f2c…⇥ai=openai/gpt-5.5-high:0.93
#@ lock: 9f2c…
#@ ai: openai/gpt-5.5-high:0.93
msgctxt "greeting"
msgid "Hello {name}"
msgstr "Hallo"
Consequences
Positive:
- the representation is tidy and uniform across PO and FCL, and FCL stays tab-compact (two tags, not three plus a timestamp)
- integrity is decoupled from translation, so the same mechanism covers any machine-managed value (AI, TMS, scripts)
- the AI vocabulary is engine-neutral, which supports Ferrocat as an AI-native foundation for producers beyond Palamedes while staying host-neutral
- catalog_review simplifies: current/stale/absent keys on lock, and any AI-specific reporting (for example a confidence gate) keys on ai
Negative:
- this is a breaking change to the machine-translation API and serialization
(model, modified, confidence, and the mt.* encoding are removed); it lands in
the 2.0.0 line and supersedes ADR 0018
- confidence changes from a 0..=100 integer to a [0, 1] decimal
- the model id is opaque, so Ferrocat cannot validate a provider or normalize ids
- the positional ai encoding grows only by appending optional trailing positions; it deliberately trades keyed flexibility for compactness
- it requires producer support (Palamedes and other engines) to populate the new fields, so the format change and the producers must land together
This refines the same line of thinking as ADR 0021 (catalog references carry only stable, churn-free information) and respects the Ferrocat/Palamedes boundary: Ferrocat owns a neutral vocabulary, producers fill it.