
API Overview

Guide to Ferrocat's public API layers and when to use each entry point.

Ferrocat's public API is organized around jobs, not around command-line tool names. Start from the catalog problem you need to solve:

  • read or write translator-friendly catalog files
  • merge newly extracted messages into existing translations
  • keep AI translation metadata next to translated catalog entries
  • audit whether a source and target catalog set is ready to ship
  • compile runtime payloads with stable keys and explicit fallback behavior
  • analyze richer messages that contain placeholders, formatting, plurals, selects, or tags

Use this page when you know the catalog task you want to perform and need the right Rust entry point.

If you want the product-level view first, read Getting Started or Catalog Modes. If you already know established Gettext-style tooling and want to map those jobs to Ferrocat, read Gettext Task Landscape.

If you want the application-framework view, read Ferrocat And Palamedes. Palamedes owns JS/TS extraction, macros, framework adapters, and runtime integration; Ferrocat owns the catalog semantics those layers should share.

Feature Profiles

The published crates keep the default feature set convenient: ferrocat, ferrocat-po, and ferrocat-icu build the full documented API by default.

Lean parser consumers can opt out with default-features = false. For ferrocat-po, that leaves the PO parser, borrowed parser, serializer, string escape helpers, and low-level merge API available without pulling in SHA-256 hashing, tempfile, or ICU4X CLDR plural data. Add the serde feature when that lean parser output should be serialized for caches or tooling.

See ADR 0020 for why the 1.x package boundary keeps the catalog layer in ferrocat-po behind feature profiles instead of moving it to a separate crate.

The named feature profiles are:

  • full: the default profile with the complete current API surface
  • catalog: high-level catalog update, parse, combine, audit, metadata, FCL, plural, and runtime compile support
  • serde: serde implementations used by schema, cache, and tooling-oriented consumers; it can be enabled without the full catalog profile for low-level PO document types
  • compile, mt, fcl, and plurals: named subsystem profiles reserved for callers that want to declare intent explicitly

docs.rs builds with all features enabled so the rendered reference shows the full API surface.

Serialized API Shapes

Enable the serde feature when Ferrocat output needs to cross a process boundary, feed CI, or back a tooling cache.

Serializable public output includes:

  • low-level owned PO documents: PoFile, PoItem, Header, and MsgStr
  • high-level CatalogMessage values and general Diagnostic records
  • CatalogAuditReport plus its summary and diagnostic records
  • compiled runtime catalog and compiled-ID report records
  • CompiledCatalogArtifact

CompiledCatalogArtifact is the host-facing runtime payload and has a versioned JSON contract. Serialized artifacts include schema_version: 1, messages, missing, and diagnostics. Deserialization rejects unknown artifact schema versions instead of guessing semantics.

Audit reports intentionally use the serde form of the Rust report API rather than a separate schema version. They are stable enough for CI consumption today: diagnostics carry machine-readable code values, and severities serialize as info, warning, or error.

See ADR 0019 for the compatibility rules around the compiled artifact JSON contract.

API Compatibility

Growth-prone public enums are deliberately marked as non-exhaustive where new variants are realistic, so new variants can be added without a breaking change within the 1.x line. Downstream code should keep fallback match arms for ICU AST nodes, catalog storage formats, compiled key strategies, and API-level error categories.

Semantically closed option enums such as CatalogSemantics and PluralEncoding remain exhaustive so callers can continue to match every documented mode directly.

Umbrella Crate Namespaces

When depending on the umbrella ferrocat crate, prefer the namespaced modules for new code:

  • ferrocat::po for low-level PO parsing, serialization, and text merge APIs
  • ferrocat::catalog for high-level catalog update, audit, combine, and runtime artifact APIs
  • ferrocat::icu for ICU MessageFormat parsing, analysis, compatibility, and metadata APIs

The historical top-level re-exports remain for compatibility. New examples use the modules because they avoid collisions such as source extraction inputs for catalog updates versus inputs for the PO merge helper. The legacy top-level has_selectordinal export is deprecated; use has_select_ordinal or ferrocat::icu::has_select_ordinal. The legacy top-level MergeExtractedMessage export is also deprecated; use ferrocat::po::MergeMessageInput.

Supported Catalog Modes

At the high-level catalog layer, ferrocat supports three explicit combinations of storage format and message semantics:

| Mode | Storage format | Message model |
|---|---|---|
| CatalogMode::GettextPo | Gettext PO | Gettext-compatible plurals |
| CatalogMode::IcuPo | Gettext PO | ICU MessageFormat |
| CatalogMode::IcuFcl | FCL catalog storage | ICU MessageFormat |

There is intentionally no FCL + gettext-plural mode; gettext plural behavior belongs to PO, while FCL is the ICU-native machine storage path. In API terms, that means CatalogStorageFormat::Fcl is only available with CatalogSemantics::IcuNative.

FCL (Ferrocat Catalog Lines) is the machine-owned, git-merge-optimized storage choice. Each entry is one tab-separated line, deterministically sorted by (id, ctxt), behind a single %FCL1 header line. It is useful when catalogs are maintained by larger teams, automation, or external systems that benefit from one entry per line: diffs stay focused, unchanged entries stay byte-identical across merge inputs for clean git 3-way merges, and hosted Git review workflows do not need custom merge-driver support to make routine catalog edits manageable. Against the same catalog stored as PO, FCL parses about 45% faster and is roughly 12% smaller on disk.
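The sketch below shows a writer built on the one-entry-per-line idea. It does not implement the FCL grammar. The column order (id, context, translation), the empty field for "no context", and the backslash escaping of tabs and line breaks are assumptions made only for this illustration. It does show why the layout diffs well: sorting by (id, ctxt) fixes each entry's position, so editing one translation changes exactly one line.

```rust
/// A catalog entry as this hypothetical FCL-like writer sees it.
struct Entry<'a> {
    id: &'a str,
    ctxt: Option<&'a str>,
    translation: &'a str,
}

/// Escape characters that would break the one-entry-per-line layout
/// (assumed escaping scheme, for illustration only).
fn escape(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    for ch in field.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\t' => out.push_str("\\t"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            other => out.push(other),
        }
    }
    out
}

/// Write a header line, then one tab-separated line per entry, sorted by (id, ctxt).
/// In this sketch an empty field stands for "no context", so it cannot
/// distinguish an explicitly empty context from a missing one.
fn write_lines(entries: &mut [Entry<'_>]) -> String {
    // Deterministic order; for equal ids, `None` context sorts before `Some`.
    entries.sort_by(|a, b| (a.id, a.ctxt).cmp(&(b.id, b.ctxt)));
    let mut out = String::from("%FCL1\n");
    for entry in entries.iter() {
        out.push_str(&escape(entry.id));
        out.push('\t');
        out.push_str(&escape(entry.ctxt.unwrap_or("")));
        out.push('\t');
        out.push_str(&escape(entry.translation));
        out.push('\n');
    }
    out
}
```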

Quick Choice

| If you want to... | Use |
|---|---|
| Parse a Gettext PO file into an owned Rust structure | parse_po |
| Parse a Gettext PO file while borrowing from the input string where possible | parse_po_borrowed |
| Turn a PoFile back into Gettext PO text | stringify_po |
| Merge fresh extracted Gettext messages into an existing Gettext PO file | merge_catalog |
| Combine multiple catalogs with deterministic conflict and selection rules | combine_catalogs |
| Read a Gettext PO or FCL catalog into the higher-level canonical catalog model | parse_catalog |
| Build keyed lookup/helpers on top of a parsed catalog | ParsedCatalog::into_normalized_view |
| Audit source and target catalogs for release readiness | audit_catalogs |
| Summarize per-locale catalog completeness | catalog_coverage |
| Compare catalog states for translator review handoffs | catalog_review |
| Derive the default stable runtime key from msgid and msgctxt | compiled_key |
| Compile a normalized catalog into runtime lookup entries | NormalizedParsedCatalog::compile |
| Compile a requested-locale runtime artifact with fallbacks and missing reports | compile_catalog_artifact |
| Compile only a selected subset of compiled runtime IDs | compile_catalog_artifact_selected |
| Pseudolocalize final runtime artifacts for UI/layout QA | pseudolocalize_compiled_catalog_artifact, pseudolocalize_compiled_catalog_artifact_with_syntax_policy |
| Compile a runtime artifact with a sibling resolution-provenance report | compile_catalog_artifact_report |
| Perform a full in-memory catalog update | update_catalog |
| Perform a full catalog update and write the result to disk only when changed | update_catalog_file |
| Compute the metadata hash for a machine-generated translation | machine_translation_hash |
| Parse ICU MessageFormat into a structural AST | parse_icu |
| Render a parsed ICU AST back to canonical MessageFormat text | stringify_icu |
| Only validate ICU syntax | validate_icu |
| Summarize ICU arguments, formatters, plurals, selects, and tags | analyze_icu |
| Compare source and translated ICU message structure | compare_icu_messages |
| Validate discovered ICU formatters against a runtime support policy | validate_icu_formatter_support |
| Extract only data argument names or only tag names | extract_argument_names / extract_tag_names |
| Extract variable names from a parsed ICU message | extract_variables |

Gettext PO Core

parse_po

Use this when you want the normal, owned Rust representation of a Gettext PO file.

Typical use cases:

  • application code that wants a straightforward editable PoFile
  • transforms that keep parsed data around beyond the source input lifetime
  • tools where simplicity matters more than minimizing allocations

The parser treats non-empty lines that are neither PO comments, known PO keywords, nor valid continuation strings as syntax errors. This avoids silently dropping typo-like lines such as msgstr_ "...".
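A line classifier is one way to picture this strictness rule. The sketch is a simplified illustration and is not Ferrocat's parser. It recognizes comment lines, a small keyword set (msgctxt, msgid, msgid_plural, msgstr, and indexed msgstr[N]), and quoted continuation strings. Anything else is reported as an error, including a typo such as msgstr_.

```rust
#[derive(Debug, PartialEq)]
enum LineKind<'a> {
    Blank,
    Comment,
    Keyword(&'a str),
    Continuation,
}

/// Classify one PO line, rejecting anything that is not a comment,
/// a known keyword followed by a quoted string, or a quoted continuation.
fn classify(line: &str) -> Result<LineKind<'_>, String> {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return Ok(LineKind::Blank);
    }
    if trimmed.starts_with('#') {
        return Ok(LineKind::Comment);
    }
    if trimmed.starts_with('"') {
        return if trimmed.len() >= 2 && trimmed.ends_with('"') {
            Ok(LineKind::Continuation)
        } else {
            Err(format!("unterminated string: {trimmed}"))
        };
    }
    // The keyword is everything before the first whitespace character.
    let (keyword, rest) = trimmed.split_once(char::is_whitespace).unwrap_or((trimmed, ""));
    let known = matches!(keyword, "msgctxt" | "msgid" | "msgid_plural" | "msgstr")
        || is_indexed_msgstr(keyword);
    if known && rest.trim_start().starts_with('"') {
        Ok(LineKind::Keyword(keyword))
    } else {
        Err(format!("unrecognized PO line: {trimmed}"))
    }
}

/// Accept plural-form keywords such as `msgstr[0]` and `msgstr[1]`.
fn is_indexed_msgstr(keyword: &str) -> bool {
    keyword
        .strip_prefix("msgstr[")
        .and_then(|rest| rest.strip_suffix(']'))
        .map_or(false, |digits| !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()))
}
```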

parse_po_borrowed

Use this when you want to parse without copying more than necessary.

Typical use cases:

  • read-heavy workflows
  • performance-sensitive inspection or transformation passes
  • benchmarks or pipelines where borrowing from the source text is helpful

parse_po_borrowed accepts LF, CRLF, and bare CR line endings while keeping unescaped fields borrowed from the input buffer.
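The following sketch illustrates what borrowing means here. It is not Ferrocat's implementation. It splits a buffer on LF, CRLF, and bare CR terminators and returns slices of the original string, so no line is copied.

```rust
/// Split `input` into lines, accepting LF, CRLF, and bare CR terminators.
/// Each returned line is a slice of `input`; nothing is copied.
fn split_lines(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&input[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&input[start..i]);
                // Treat CRLF as a single terminator; a bare CR ends the line on its own.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    // A final line without a terminator still counts.
    if start < input.len() {
        lines.push(&input[start..]);
    }
    lines
}
```

Slicing at these indices is safe because `\n` and `\r` are single-byte ASCII characters, and in UTF-8 such a byte can never fall inside a multi-byte character.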

stringify_po

Use this when you already have a PoFile and want canonical Gettext PO output.

Typical use cases:

  • writing back modified parsed files
  • generating PO content from your own tooling
  • normalizing formatting after edits

Catalog Workflows

The high-level catalog request structs are now intentionally borrowing-first:

  • string inputs such as catalog text and locales are accepted as &str
  • selected compiled IDs and fallback chains are accepted as borrowed slices
  • file-oriented updates accept &Path

That keeps the API ergonomic for callers while avoiding unnecessary request-side allocations and clones before the real catalog work even starts.

For the common required fields, the option structs also provide new(...) constructors. They set the required content/path/locale/input fields and leave the rest at the documented defaults, which avoids starting from intentionally invalid empty defaults in normal call sites.
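The borrowing-first pattern with a new(...) constructor looks roughly like this. ExampleOptions and its fields are hypothetical stand-ins, not a Ferrocat type. Required fields go through new(...), and optional ones are set with struct-update syntax. Every string and slice is borrowed from the caller.

```rust
/// Hypothetical request struct following the borrowing-first pattern.
#[derive(Debug, Clone, Copy)]
struct ExampleOptions<'a> {
    content: &'a str,
    source_locale: &'a str,
    locale: Option<&'a str>,
    fallback_chain: &'a [&'a str],
    strict: bool,
}

impl<'a> ExampleOptions<'a> {
    /// Set only the required fields; everything else takes its default.
    fn new(content: &'a str, source_locale: &'a str) -> Self {
        Self {
            content,
            source_locale,
            locale: None,
            fallback_chain: &[],
            strict: false,
        }
    }
}
```

A call site can then override only what it needs, for example `ExampleOptions { locale: Some("de"), ..ExampleOptions::new(&buffer, "en") }`, without cloning `buffer`.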

merge_catalog

Use this for the basic Gettext-style merge step:

  • start from an existing Gettext PO catalog
  • feed in freshly extracted messages
  • keep matching translations
  • add new entries
  • mark removed entries as obsolete

This is the closest Ferrocat equivalent to the core "merge updated template/messages into an existing catalog" workflow that users often associate with GNU msgmerge.

Choose merge_catalog when you want the leaner, more direct merge operation and already have data in classic Gettext-like shapes.

In practice, this is the fast-path workflow API: it stays close to classic Gettext PO merge behavior and skips the canonical catalog projection and post-processing that update_catalog performs.

Ferrocat assumes exact catalog identity here: msgid plus optional msgctxt. That works for classic ID-style catalogs and for projects that use real product copy as msgid. Matching messages keep translations, new messages are added, and removed messages follow the obsolete strategy. Fuzzy matching is intentionally outside the default workflow.

The obsolete strategy is one of Mark, Delete, Keep, or DropObsoleteBefore(cutoff), the last for age-based cleanup. Entries newly marked obsolete are stamped with UpdateCatalogOptions::now, a host-provided ISO date (Ferrocat never reads a clock), and DropObsoleteBefore removes obsolete entries whose since value predates the cutoff. Because ISO dates compare lexicographically, the host computes the cutoff and Ferrocat stays deterministic (see ADR 0025).
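An ordered map keyed by (msgctxt, msgid) is enough to sketch the exact-identity rule. This hypothetical model covers only the Mark strategy. It ignores references, comments, headers, and plural forms. The Key and Entry types are for illustration and are not part of ferrocat::po.

```rust
use std::collections::BTreeMap;

/// Catalog identity in this sketch: optional msgctxt plus msgid.
type Key = (Option<String>, String);

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    translation: String,
    obsolete: bool,
}

/// Merge freshly extracted identities into an existing catalog by exact identity.
fn merge_by_identity(existing: &BTreeMap<Key, Entry>, extracted: &[Key]) -> BTreeMap<Key, Entry> {
    let mut merged = BTreeMap::new();
    for key in extracted {
        // An exact identity match keeps its translation; new identities start empty.
        let translation = existing
            .get(key)
            .map(|entry| entry.translation.clone())
            .unwrap_or_default();
        merged.insert(key.clone(), Entry { translation, obsolete: false });
    }
    for (key, entry) in existing {
        // Identities that were not extracted again are kept but marked obsolete.
        if !merged.contains_key(key) {
            merged.insert(key.clone(), Entry { obsolete: true, ..entry.clone() });
        }
    }
    merged
}
```

Because the key includes msgctxt, a new `Hello` in a `menu` context is a separate message and does not inherit the translation of the context-free `Hello`. No fuzzy matching is attempted.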

use std::borrow::Cow;

use ferrocat::po::{MergeMessageInput, merge_catalog};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let existing = "msgid \"Hello\"\nmsgstr \"Hallo\"\n";
    let extracted = [MergeMessageInput {
        msgid: Cow::Borrowed("Hello"),
        references: vec![Cow::Borrowed("src/app.rs:10")],
        ..MergeMessageInput::default()
    }];

    let merged = merge_catalog(existing, &extracted)?;
    assert!(merged.contains("msgstr \"Hallo\""));
    assert!(merged.contains("#: src/app.rs:10"));
    Ok(())
}
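The DropObsoleteBefore(cutoff) rule relies on zero-padded ISO dates sorting lexicographically in chronological order, so a plain string comparison is enough. The sketch assumes since values in YYYY-MM-DD form. ObsoleteEntry and drop_obsolete_before are hypothetical names. The host is still responsible for computing the cutoff string.

```rust
/// An obsolete entry stamped with the host-provided date it became obsolete
/// (hypothetical type for this sketch).
struct ObsoleteEntry<'a> {
    msgid: &'a str,
    since: &'a str, // assumed ISO 8601 calendar date, e.g. "2024-03-01"
}

/// Keep obsolete entries whose `since` date is on or after `cutoff`.
/// Zero-padded YYYY-MM-DD strings sort in date order, so comparing
/// the strings compares the dates without parsing them.
fn drop_obsolete_before<'a>(entries: Vec<ObsoleteEntry<'a>>, cutoff: &str) -> Vec<ObsoleteEntry<'a>> {
    entries.into_iter().filter(|entry| entry.since >= cutoff).collect()
}
```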

combine_catalogs

Use this when you have multiple existing catalogs or templates and want one deterministic output catalog.

This is the Rust-native counterpart to the useful parts of GNU msgcat, msgcomm, and msguniq:

  • combine N catalogs in one call
  • treat msgid plus msgctxt as the message identity
  • keep the first translation by default, so existing catalogs can be listed before newer templates
  • choose UseLast or Error when overlay or strict-conflict behavior is more appropriate
  • preserve non-empty translations when a later definition only has an empty template value
  • select all, common, less-common, or unique identities with CatalogCombineSelection
  • skip obsolete definitions by default, with an explicit opt-in when obsolete entries should participate

Ferrocat does not emit GNU-style conflict-marker translations for differing strings. Empty template translations fill gaps but never clear non-empty translations. Non-empty translation conflicts are either resolved with a warning diagnostic or rejected with ApiError::Conflict, depending on CatalogConflictStrategy.

use ferrocat::{CatalogCombineInput, CombineCatalogOptions, combine_catalogs};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let base = "msgid \"Checkout\"\nmsgstr \"Zur Kasse\"\n";
    let template = "msgid \"Checkout\"\nmsgstr \"\"\n\nmsgid \"Cart\"\nmsgstr \"\"\n";
    let inputs = [
        CatalogCombineInput::labeled(base, "de.po"),
        CatalogCombineInput::labeled(template, "messages.pot"),
    ];

    let combined = combine_catalogs(CombineCatalogOptions::new(&inputs, "en"))?;
    assert!(combined.content.contains("msgstr \"Zur Kasse\""));
    assert!(combined.content.contains("msgid \"Cart\""));
    Ok(())
}
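For a single identity, the combine_catalogs conflict rules can be modeled as a fold over definitions in precedence order. ConflictStrategy here is a local stand-in and is not CatalogConflictStrategy. Its UseFirst variant name is an assumption standing in for the default keep-first behavior. The warning and error strings are illustrative and are not Ferrocat's diagnostics.

```rust
/// Local stand-in for a conflict strategy (hypothetical names).
#[derive(Debug, Clone, Copy)]
enum ConflictStrategy {
    UseFirst,
    UseLast,
    Error,
}

/// Fold a later definition into the translation chosen so far for one identity.
/// Returns Ok(Some(warning)) when a non-empty conflict was resolved, Ok(None)
/// when there was nothing to resolve, and Err when the strategy rejects it.
fn fold_translation(
    current: &mut String,
    incoming: &str,
    strategy: ConflictStrategy,
) -> Result<Option<String>, String> {
    // An empty incoming value (e.g. from a template) never clears a translation.
    if incoming.is_empty() || incoming == current.as_str() {
        return Ok(None);
    }
    // An empty current value is a gap: fill it without reporting a conflict.
    if current.is_empty() {
        *current = incoming.to_owned();
        return Ok(None);
    }
    // Both values are non-empty and differ: a real conflict.
    let message = format!("conflicting translations {current:?} and {incoming:?}");
    match strategy {
        ConflictStrategy::UseFirst => Ok(Some(format!("{message}; kept the first"))),
        ConflictStrategy::UseLast => {
            *current = incoming.to_owned();
            Ok(Some(format!("{message}; kept the last")))
        }
        ConflictStrategy::Error => Err(message),
    }
}
```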

combine_catalog_files

Use this when the inputs and output are files on disk, but you want the same deterministic combine semantics as combine_catalogs.

The helper reads input paths in precedence order, infers a single file format from the input and output paths unless one is supplied explicitly, applies the same semantics as combine_catalogs, and atomically replaces the output path only after validation, parsing, and combining succeed. That means unsupported formats, mixed inferred formats, read failures, parse errors, and rejected conflicts leave the existing output file unchanged.

Supported inferred suffixes are:

  • .po for CatalogFileFormat::Po
  • .fcl for CatalogFileFormat::Fcl

The file helper returns the output path, the file format used, combine stats, and non-fatal diagnostics. It supports one or more input files, so two-file host workflows can stay thin while larger overlay workflows do not need a separate wrapper.

When no mode is provided, CatalogFileFormat::Po uses CatalogMode::IcuPo and CatalogFileFormat::Fcl uses CatalogMode::IcuFcl. PO callers that need classic gettext plural semantics can set mode: Some(CatalogMode::GettextPo).
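The sketch below illustrates suffix inference. FileFormat and infer_format are hypothetical stand-ins for CatalogFileFormat and the helper's internal logic. The sketch captures only the documented rules: .po and .fcl are recognized, and unknown or mixed suffixes are rejected before anything is written.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileFormat {
    Po,
    Fcl,
}

/// Infer a single format from every input and output path, rejecting
/// unsupported suffixes and mixtures of different formats.
fn infer_format(paths: &[&Path]) -> Result<FileFormat, String> {
    let mut inferred: Option<FileFormat> = None;
    for path in paths {
        let format = match path.extension().and_then(|ext| ext.to_str()) {
            Some("po") => FileFormat::Po,
            Some("fcl") => FileFormat::Fcl,
            _ => return Err(format!("unsupported catalog suffix: {}", path.display())),
        };
        if let Some(existing) = inferred {
            if existing != format {
                return Err(format!("mixed catalog formats: {existing:?} and {format:?}"));
            }
        }
        inferred = Some(format);
    }
    inferred.ok_or_else(|| "no paths to infer a format from".to_string())
}
```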

parse_catalog

Use this when you want more than raw Gettext PO syntax. It projects a Gettext PO or FCL catalog into ferrocat's higher-level catalog model, with explicit storage and semantics choices.

Choose this when your application wants semantic catalog data rather than just PO syntax nodes.

ParseCatalogOptions borrows the source text and locale strings, so you can parse directly from existing buffers without first building owned request strings.

High-level catalog parsing should normally choose one CatalogMode instead of setting storage, semantics, and plural encoding separately:

  • CatalogMode::IcuPo for ICU-native messages stored in PO
  • CatalogMode::IcuFcl for ICU-native messages stored as a %FCL1 header plus one tab-separated entry per line
  • CatalogMode::GettextPo for classic Gettext-compatible plurals stored in PO

The lower-level storage_format, semantics, and plural_encoding fields remain available for compatibility, but CatalogMode keeps the three choices synchronized.

Important boundaries:

  • CatalogSemantics::IcuNative only supports PluralEncoding::Icu
  • CatalogSemantics::GettextCompat only supports PluralEncoding::Gettext
  • CatalogStorageFormat::Fcl is available only with CatalogSemantics::IcuNative
  • native parsing no longer eagerly projects top-level ICU plurals into TranslationShape::Plural

That gives you exactly three supported modes:

  • classic Gettext catalog mode: Gettext PO + GettextCompat
  • ICU-native Gettext PO mode: Gettext PO + IcuNative
  • ICU-native FCL catalog mode: FCL + IcuNative
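One way to picture how CatalogMode keeps the three low-level choices synchronized is to map each mode to its (storage, semantics, plural encoding) triple. The enums below are local stand-ins that mirror the documented names. They are not Ferrocat's types. Any triple outside the three modes, such as FCL with Gettext-compatible semantics, maps to no mode.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Storage {
    Po,
    Fcl,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Semantics {
    IcuNative,
    GettextCompat,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Plurals {
    Icu,
    Gettext,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    GettextPo,
    IcuPo,
    IcuFcl,
}

impl Mode {
    /// The synchronized (storage, semantics, plural encoding) triple for each mode.
    fn parts(self) -> (Storage, Semantics, Plurals) {
        match self {
            Mode::GettextPo => (Storage::Po, Semantics::GettextCompat, Plurals::Gettext),
            Mode::IcuPo => (Storage::Po, Semantics::IcuNative, Plurals::Icu),
            Mode::IcuFcl => (Storage::Fcl, Semantics::IcuNative, Plurals::Icu),
        }
    }

    /// Map separately chosen fields back to a mode; unsupported mixes yield None.
    fn from_parts(storage: Storage, semantics: Semantics, plurals: Plurals) -> Option<Mode> {
        [Mode::GettextPo, Mode::IcuPo, Mode::IcuFcl]
            .into_iter()
            .find(|mode| mode.parts() == (storage, semantics, plurals))
    }
}
```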

parse_catalog intentionally stays as the neutral parse step. If you want keyed lookups or effective-translation helpers, build a richer view explicitly with ParsedCatalog::into_normalized_view().

The normalized view supports both owned-key lookup with CatalogMessageKey and borrowed identity lookup with get_by_parts(msgid, msgctxt), which is useful in host adapters that already have borrowed source strings.

use ferrocat::{ParseCatalogOptions, parse_catalog};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let catalog = parse_catalog(ParseCatalogOptions {
        locale: Some("de"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Zur Kasse\"\n", "en")
    })?;

    let view = catalog.into_normalized_view()?;
    assert!(view.contains_parts("Checkout", None));
    Ok(())
}

NormalizedParsedCatalog::compile

Use this when you want a runtime-facing lookup structure with stable compiled keys rather than raw gettext identities.

This sits one layer above parsed catalog lookup:

  • start with parse_catalog
  • build the normalized keyed view
  • compile to CompiledCatalog for runtime-oriented consumption

The default behavior keeps translations as they exist in the catalog. Optional source-locale fallback is explicit rather than automatic.

The built-in CompiledKeyStrategy::FerrocatV1 contract is intentionally compact:

  • SHA-256 over a versioned source-identity payload
  • truncated to 64 bits
  • encoded as unpadded Base64URL
  • no visible version prefix in the emitted key
  • hard compile failure on collisions
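The encoding half of that contract can be shown without external crates. This sketch assumes that "truncated to 64 bits" means keeping the first 8 bytes of the digest. It does not compute SHA-256 or build the versioned payload, whose layout this page does not specify, so use compiled_key for real keys. The sketch also shows why such a key is always 11 characters: each unpadded Base64URL character carries 6 bits, and 64 bits need 11 characters.

```rust
/// Base64URL alphabet from RFC 4648, section 5.
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as Base64URL without `=` padding.
fn base64url_unpadded(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into the high bits of a 24-bit group.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;
        // Unpadded output: n input bytes in a chunk produce n + 1 characters.
        for i in 0..=chunk.len() {
            let index = (group >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

/// Keep the first 64 bits (8 bytes) of a 32-byte digest and encode them.
fn compact_key(digest: &[u8; 32]) -> String {
    base64url_unpadded(&digest[..8])
}
```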

audit_catalogs

Use this when you want a read-only catalog QA report before a release or CI gate. The audit API accepts normalized catalogs, a required source locale, an optional target-locale filter, optional semantic metadata, and explicit check flags through CatalogAuditOptions.

Default checks cover:

  • source locale presence
  • requested target locale presence
  • missing and empty target translations
  • target-only active messages that look stale
  • obsolete entries
  • ICU syntax in active source and target messages
  • ICU source/translation compatibility
  • semantic metadata duplication, unknown source keys, and source conflicts

ICU syntax checks use IcuSyntaxPolicy::Strict by default. Hosts whose runtime accepts ordinary literal apostrophes in messages can call audit_catalogs_with_icu_options with CatalogAuditIcuOptions set to IcuSyntaxPolicy::RuntimeLiteralApostrophes. This keeps real ICU syntax errors visible while accepting runtime-valid strings such as "you're available".

use ferrocat::{CatalogAuditOptions, ParseCatalogOptions, audit_catalogs, parse_catalog};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = parse_catalog(ParseCatalogOptions {
        locale: Some("en"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Checkout\"\n", "en")
    })?
    .into_normalized_view()?;
    let target = parse_catalog(ParseCatalogOptions {
        locale: Some("de"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"\"\n", "en")
    })?
    .into_normalized_view()?;

    let report = audit_catalogs(&[&source, &target], &CatalogAuditOptions::new("en"))?;
    assert!(report.has_errors());
    Ok(())
}

Diagnostics are machine-readable. Examples include:

{
  "severity": "error",
  "code": "catalog.missing_translation",
  "message": "Locale `de` is missing translation for `Hello {name}`."
}
{
  "severity": "warning",
  "code": "catalog.extra_translation",
  "message": "Locale `de` contains translation `Old CTA` that is not present in the source catalog."
}
{
  "severity": "info",
  "code": "catalog.obsolete_entry",
  "message": "Message `Checkout` in locale `de` is obsolete."
}

ICU and metadata checks reuse the stable icu.* and metadata.* diagnostic codes emitted by the underlying validation helpers. The audit layer does not compile runtime artifacts, infer fuzzy matches, use previous_msgid, or repair catalogs.

For CI systems that should not maintain a Rust wrapper program, install the ferrocat-cli package and run the audit gate directly:

ferrocat audit \
  --source-locale en \
  --source locales/en.po \
  --target de=locales/de.po \
  --format text

ferrocat audit exits with 0 when the audit has no error diagnostics, 1 when the audit completes with at least one error diagnostic, and 2 for usage, I/O, parse, or serialization failures. Use --format json to write the structured CatalogAuditReport.
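A host that wraps the gate itself can mirror these exit codes. Severity, AuditRun, and exit_code are hypothetical types used only for this sketch. The sketch shows that info and warning diagnostics alone still exit with 0.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    Info,
    Warning,
    Error,
}

/// Outcome of running the audit in a hypothetical CI wrapper.
enum AuditRun {
    /// The audit ran; these are the severities of its diagnostics.
    Completed(Vec<Severity>),
    /// Usage, I/O, parse, or serialization failure.
    Failed,
}

/// Map an audit outcome to the documented exit codes 0, 1, and 2.
fn exit_code(run: &AuditRun) -> i32 {
    match run {
        AuditRun::Completed(severities) if severities.contains(&Severity::Error) => 1,
        AuditRun::Completed(_) => 0,
        AuditRun::Failed => 2,
    }
}
```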

Use ferrocat_po::diagnostic_codes when CI, editor integrations, or release gates need canonical code strings without hard-coding literals:

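use ferrocat_po::{CatalogAuditOptions, audit_catalogs, diagnostic_codes};

// `source` and `target` are normalized catalogs built as in the audit_catalogs example above.
// This code runs inside a function that returns a Result, so `?` can propagate errors.
let report = audit_catalogs(&[&source, &target], &CatalogAuditOptions::new("en"))?;
// Match on the canonical code constant instead of a hard-coded string literal.
let has_missing = report
    .diagnostics
    .iter()
    .any(|diagnostic| diagnostic.code == diagnostic_codes::catalog::MISSING_TRANSLATION);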

catalog_coverage

Use this when you need per-locale completeness counters for dashboards, CI thresholds, or translator handoff summaries without rebuilding Ferrocat's catalog-state rules in application code.

The coverage API accepts normalized catalogs and a required source locale. The source locale's active, non-obsolete identities define the expected set for each target locale. Each target message is classified with the same shared status taxonomy used by catalog audit:

  • translated: active target entry with a non-empty translation
  • missing: no active or obsolete target entry exists
  • empty: active target entry exists, but its effective translation is empty
  • obsolete: target entry exists for the source identity, but is obsolete
  • extra: active target entry is not present in the active source set

Only translated messages count toward completion. Empty, obsolete, and absent entries remain visible as separate counters so hosts can choose their own threshold and presentation rules.

use ferrocat::{
    CatalogCoverageOptions, ParseCatalogOptions, catalog_coverage, parse_catalog,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = parse_catalog(ParseCatalogOptions {
        locale: Some("en"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Checkout\"\n", "en")
    })?
    .into_normalized_view()?;
    let target = parse_catalog(ParseCatalogOptions {
        locale: Some("de"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Zur Kasse\"\n", "en")
    })?
    .into_normalized_view()?;

    let report = catalog_coverage(&[&source, &target], &CatalogCoverageOptions::new("en"))?;
    assert_eq!(report.locales[0].completion_percent(), 100.0);
    Ok(())
}
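The per-message classification behind these counters can be sketched for one expected source identity. Status, TargetEntry, classify, and completion_percent are illustrative names and are not Ferrocat's API. Extra entries are omitted because they come from target identities outside the source set. Treating an empty expected set as 100% complete is a choice made only for this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Translated,
    Missing,
    Empty,
    Obsolete,
}

/// A target-locale entry as this sketch sees it.
struct TargetEntry<'a> {
    translation: &'a str,
    obsolete: bool,
}

/// Classify one expected source identity against its target entry, if any.
fn classify(target: Option<&TargetEntry<'_>>) -> Status {
    match target {
        None => Status::Missing,
        Some(entry) if entry.obsolete => Status::Obsolete,
        Some(entry) if entry.translation.is_empty() => Status::Empty,
        Some(_) => Status::Translated,
    }
}

/// Only translated messages count toward completion.
fn completion_percent(statuses: &[Status]) -> f64 {
    if statuses.is_empty() {
        return 100.0;
    }
    let translated = statuses.iter().filter(|s| **s == Status::Translated).count();
    translated as f64 * 100.0 / statuses.len() as f64
}
```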

Set CatalogCoverageOptions::with_details(true) when callers also need per-message rows. File discovery, threshold failures, JSON/table formatting, and pseudo-locale filtering stay outside the core API.

catalog_review

Use this when a host needs a deterministic translator handoff or CI review report between two normalized catalog states. The API compares previous and current catalog sets, using msgctxt + msgid as the canonical identity.

The report includes:

  • source identity additions and removals
  • per-locale current coverage using the CatalogMessageStatus taxonomy
  • active target translations whose effective value changed
  • machine-translation metadata freshness based on machine_translation_hash

Rename detection is intentionally out of scope: if a source identity changes, Ferrocat reports a removal plus an addition instead of guessing intent.

use ferrocat::{
    CatalogReviewOptions, ParseCatalogOptions, catalog_review, parse_catalog,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let previous_source = parse_catalog(ParseCatalogOptions {
        locale: Some("en"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Checkout\"\n", "en")
    })?
    .into_normalized_view()?;
    let current_source = parse_catalog(ParseCatalogOptions {
        locale: Some("en"),
        ..ParseCatalogOptions::new(
            "msgid \"Checkout\"\nmsgstr \"Checkout\"\n\nmsgid \"Cancel\"\nmsgstr \"Cancel\"\n",
            "en",
        )
    })?
    .into_normalized_view()?;
    let previous_de = parse_catalog(ParseCatalogOptions {
        locale: Some("de"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Zur Kasse\"\n", "en")
    })?
    .into_normalized_view()?;
    let current_de = parse_catalog(ParseCatalogOptions {
        locale: Some("de"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Checkout\"\n", "en")
    })?
    .into_normalized_view()?;

    let report = catalog_review(
        &[&previous_source, &previous_de],
        &[&current_source, &current_de],
        &CatalogReviewOptions::new("en").with_details(true),
    )?;
    assert_eq!(report.summary.source_added, 1);
    assert_eq!(report.summary.translation_changed, 1);
    Ok(())
}
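Rename detection is out of scope, so the source-side part of the review reduces to set differences over (msgctxt, msgid) identities. The Identity alias and the diff_sources function are hypothetical. The usage shows a reworded msgid appearing as one addition plus one removal.

```rust
use std::collections::BTreeSet;

/// Source identity in this sketch: (msgctxt, msgid).
type Identity<'a> = (Option<&'a str>, &'a str);

/// Compare previous and current source identities. Returns (added, removed),
/// both in deterministic sorted order; no rename pairing is attempted.
fn diff_sources<'a>(
    previous: &BTreeSet<Identity<'a>>,
    current: &BTreeSet<Identity<'a>>,
) -> (Vec<Identity<'a>>, Vec<Identity<'a>>) {
    let added: Vec<Identity<'a>> = current.difference(previous).copied().collect();
    let removed: Vec<Identity<'a>> = previous.difference(current).copied().collect();
    (added, removed)
}
```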

Set CatalogReviewOptions::with_details(false) when callers need summary-only output without large per-message vectors. Git history lookup, PR comments, repository layout, and reviewer UI remain caller responsibilities.

compiled_key

Use this when a host adapter, source transform, or manifest builder needs the same default runtime key that Ferrocat emits during catalog compilation, but only has msgid and optional msgctxt available.

This is the public, host-facing helper for the current default key contract. It corresponds to CompiledKeyStrategy::FerrocatV1.

compile_catalog_artifact

Use this when you want the final host-neutral runtime artifact for one requested locale instead of one catalog's typed lookup payload.

This sits one step above NormalizedParsedCatalog::compile:

  • start from one or more normalized catalogs
  • choose a requested locale and source locale
  • optionally provide a fallback chain
  • compile a final key -> ICU string runtime map
  • collect missing-message records for non-source locales
  • validate the final runtime strings as ICU messages

Choose this when your downstream tooling needs locale resolution semantics centralized in Ferrocat instead of rebuilding them in a host adapter.

This is the main boundary for Palamedes-style integrations. Ferrocat should decide the effective locale artifact; Palamedes or another host adapter should decide how that artifact becomes framework modules, sidecars, or runtime payloads.

CompileCatalogArtifactOptions borrows locale strings and the fallback-chain slice, which keeps host-side request assembly cheap even when compilation is performance-sensitive.

use ferrocat::{
    CompileCatalogArtifactOptions, ParseCatalogOptions, compile_catalog_artifact, parse_catalog,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let source = parse_catalog(ParseCatalogOptions {
        locale: Some("en"),
        ..ParseCatalogOptions::new(
            "msgid \"Checkout\"\nmsgstr \"Checkout\"\n\nmsgid \"Cart\"\nmsgstr \"Cart\"\n",
            "en",
        )
    })?
    .into_normalized_view()?;
    let german = parse_catalog(ParseCatalogOptions {
        locale: Some("de"),
        ..ParseCatalogOptions::new("msgid \"Checkout\"\nmsgstr \"Zur Kasse\"\n", "en")
    })?
    .into_normalized_view()?;

    let options = CompileCatalogArtifactOptions {
        source_fallback: true,
        ..CompileCatalogArtifactOptions::new("de", "en")
    };
    let artifact = compile_catalog_artifact(&[&source, &german], &options)?;
    assert_eq!(artifact.messages.len(), 2);
    assert_eq!(artifact.missing.len(), 1);
    Ok(())
}

Important semantics:

  • only non-obsolete messages participate in artifact compilation
  • empty non-source translations are treated as unresolved and can fall through to the fallback chain
  • source fallback is explicit for non-source locale compilation
  • source-locale compilation always materializes empty source values from source text
  • plural messages are emitted as final ICU plural strings using the preserved plural variable
  • invalid final ICU strings become diagnostics by default and can become hard errors in strict mode
  • source/translation ICU compatibility diagnostics are opt-in with icu_compatibility
  • final ICU syntax and compatibility parsing use IcuSyntaxPolicy::Strict by default; use compile_catalog_artifact_with_icu_options or compile_catalog_artifact_selected_with_icu_options with CompileCatalogArtifactIcuOptions set to IcuSyntaxPolicy::RuntimeLiteralApostrophes when the runtime accepts ordinary literal apostrophes
  • runtime-specific formatter support diagnostics are opt-in with CompileCatalogArtifactIcuOptions::with_formatter_support; the callback decides which ICU formatter kinds and styles are supported while Ferrocat attaches compiled-key, source-key, and resolved locale metadata to the emitted diagnostics

Use pseudolocalize_compiled_catalog_artifact when a host wants a pseudo-locale variant of the final runtime artifact. It transforms only ICU literal text in the artifact messages map and preserves placeholders, formatter syntax, plural selectors, #, tag names, missing-message records, and diagnostics. Use pseudolocalize_compiled_catalog_artifact_with_syntax_policy when the artifact was compiled with a non-default ICU syntax policy such as IcuSyntaxPolicy::RuntimeLiteralApostrophes; pass the same policy so pseudolocalization accepts the same final runtime strings as compilation. Pseudo-locale routing and framework output remain host responsibilities.
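The sketch below is a deliberately simplified illustration of transforming only literal text. Unlike the Ferrocat function, it accents only the characters outside braces. It does not handle plural and select branch text, apostrophe quoting, #, or tags. The surrounding [ ] markers are this sketch's own choice for spotting truncation and are not documented Ferrocat output.

```rust
/// Accent top-level literal text and leave every `{...}` argument untouched.
fn pseudolocalize_top_level(message: &str) -> String {
    let mut depth = 0usize;
    let mut out = String::with_capacity(message.len() + 4);
    out.push('[');
    for ch in message.chars() {
        match ch {
            '{' => {
                depth += 1;
                out.push(ch);
            }
            '}' => {
                depth = depth.saturating_sub(1);
                out.push(ch);
            }
            // Only characters outside any braces are treated as literal text here.
            _ if depth == 0 => out.push(accent(ch)),
            _ => out.push(ch),
        }
    }
    out.push(']');
    out
}

/// Swap a few ASCII vowels for accented look-alikes.
fn accent(ch: char) -> char {
    match ch {
        'a' => 'á',
        'e' => 'é',
        'i' => 'í',
        'o' => 'ó',
        'u' => 'ú',
        'A' => 'Á',
        'E' => 'É',
        'O' => 'Ó',
        other => other,
    }
}
```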

compile_catalog_artifact_selected

Use this when a host adapter already knows the exact compiled runtime IDs it needs and wants only that slice of a requested-locale artifact.

This is the narrower companion to compile_catalog_artifact:

  • build or reuse a CompiledCatalogIdIndex
  • pass only the selected compiled IDs
  • keep the same fallback, missing, and ICU-validation semantics
  • return the same CompiledCatalogArtifact shape, but filtered to the requested subset

Choose this when a bundler/plugin layer has already mapped modules or chunks to the exact message IDs they require.

Like the broader artifact API, the request struct borrows locale data and selection slices, so callers can reuse existing vectors or arrays of compiled IDs without another owned wrapper.

compile_catalog_artifact_report

Use this when host tooling needs the final runtime artifact plus an explanation of how each compiled message resolved.

The returned CompiledCatalogArtifactReport contains:

  • artifact, the unchanged CompiledCatalogArtifact payload
  • provenance.requested_locale, source_locale, and fallback_chain
  • provenance.messages, with one row per compiled source identity

Each provenance row includes the compiled key, original source_key, resolved locale, and a resolution kind:

  • requested when the requested locale supplied the runtime value
  • fallback when a configured non-source fallback locale supplied it
  • source_fallback when the source locale supplied it through explicit source fallback
  • unresolved when no runtime value was available
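The four resolution kinds follow a lookup order for a non-source locale. The requested locale is tried first, then the configured fallback chain, then the source locale, but only when source fallback is enabled. Empty values count as unresolved. The sketch models that order with hypothetical types. It does not cover source-locale compilation, where Ferrocat materializes empty source values from source text, and it does not cover ICU validation.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Resolution<'a> {
    Requested(&'a str),
    Fallback { locale: &'a str, value: &'a str },
    SourceFallback(&'a str),
    Unresolved,
}

/// Translations of one message, keyed by locale.
type Values<'a> = HashMap<&'a str, &'a str>;

/// Look up a locale's value, treating an empty string as absent.
fn non_empty<'a>(values: &Values<'a>, locale: &str) -> Option<&'a str> {
    values.get(locale).copied().filter(|value| !value.is_empty())
}

/// Resolve one message for a non-source requested locale.
fn resolve<'a>(
    values: &Values<'a>,
    requested: &'a str,
    fallback_chain: &[&'a str],
    source_locale: &'a str,
    source_fallback: bool,
) -> Resolution<'a> {
    if let Some(value) = non_empty(values, requested) {
        return Resolution::Requested(value);
    }
    for &locale in fallback_chain {
        if let Some(value) = non_empty(values, locale) {
            return Resolution::Fallback { locale, value };
        }
    }
    // The source locale is consulted only when explicitly enabled.
    if source_fallback {
        if let Some(value) = non_empty(values, source_locale) {
            return Resolution::SourceFallback(value);
        }
    }
    Resolution::Unresolved
}
```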

Use CompileCatalogArtifactReportOptions::new for the full artifact/report pair, or CompileCatalogArtifactReportOptions::selected when a host adapter already has a CompiledCatalogIdIndex and selected compiled IDs. The same report-oriented entry point accepts ICU options through CompileCatalogArtifactReportOptions::icu_options, so provenance does not add another full cross-product of compile functions.

CompiledCatalogIdIndex

Use this when you need stable compiled-ID metadata without compiling message payloads immediately.

Useful helpers include:

  • iter for deterministic compiled-ID traversal
  • as_btreemap / into_btreemap when another tool wants the raw ordered mapping
  • describe_compiled_ids to ask which requested IDs are known, available in a given catalog set, and whether they are singular or plural

describe_compiled_ids returns a structured report:

  • described for IDs that are known to the index and present in the provided catalog set
  • unknown_compiled_ids for IDs that do not exist in the index at all
  • unavailable_compiled_ids for IDs that are known to the index but not present in the provided catalog set
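The three buckets can be illustrated with a small, dependency-free sketch. The Description struct and describe function below are hypothetical stand-ins, not Ferrocat's types; the sketch assumes the index maps each compiled ID to a source key and that "available" means the source key appears in the provided catalog set. It ignores the singular/plural detail the real report also carries.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical report shape mirroring the three documented buckets.
#[derive(Debug, Default, PartialEq)]
struct Description {
    described: Vec<String>,
    unknown_compiled_ids: Vec<String>,
    unavailable_compiled_ids: Vec<String>,
}

/// `index` maps compiled ID -> source key; `available` holds source keys present in the catalog set.
fn describe(
    index: &BTreeMap<String, String>,
    available: &BTreeSet<String>,
    requested: &[&str],
) -> Description {
    let mut report = Description::default();
    for &id in requested {
        match index.get(id) {
            // Not in the index at all.
            None => report.unknown_compiled_ids.push(id.to_owned()),
            // Known and present in this catalog set.
            Some(source_key) if available.contains(source_key) => report.described.push(id.to_owned()),
            // Known to the index, but this catalog set does not contain it.
            Some(_) => report.unavailable_compiled_ids.push(id.to_owned()),
        }
    }
    report
}

fn main() {
    let index = BTreeMap::from([
        ("c1".to_owned(), "Cart".to_owned()),
        ("c2".to_owned(), "Checkout".to_owned()),
    ]);
    let available = BTreeSet::from(["Cart".to_owned()]);
    let report = describe(&index, &available, &["c1", "c2", "zz"]);
    println!("{report:?}");
}
```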

update_catalog

Use this for the full high-level catalog update path in memory.

This goes beyond a raw merge. It can:

  • parse an existing catalog into the canonical model
  • merge extracted messages from either structured catalog input (CatalogUpdateInput::Structured) or source-first messages (CatalogUpdateInput::SourceFirst)
  • handle locale/plural logic
  • apply storage-specific defaults
  • preserve or report diagnostics
  • sort and export the final catalog as PO or FCL

Choose update_catalog when you want a complete update operation rather than just the lower-level merge step.

Compared with merge_catalog, this is the "full semantics" path. It is the better fit when catalog correctness and consistency matter more than taking the shortest merge route, for example in release pipelines or when you want predictable headers, ordering, plural handling, and diagnostics.

UpdateCatalogOptions borrows locale strings, optional existing content, and optional custom-header maps. The extracted message payload itself stays owned, because that is usually the natural shape for upstream extractor output.

Like parsing, updates should normally choose one CatalogMode. PO with ICU-native semantics remains the default. FCL storage uses a single %FCL1 header line plus one tab-separated entry per line.

Choose FCL when collaboration and automation matter more than maximum PO fidelity. The one-entry-per-line shape keeps large catalog diffs readable, keeps unchanged entries byte-identical across merge inputs so Git three-way merges stay clean, and makes ordinary Git conflict handling more practical, including hosted review flows where custom .gitattributes merge drivers may not be part of the web merge path.

Use CatalogMode::GettextPo for explicit PO-interop mode that writes classic gettext plurals.

In GettextCompat mode, Ferrocat synthesizes Plural-Forms only for safe locale defaults that it knows, including common one-form, Germanic/Romance two-form, French/Brazilian Portuguese, Slavic three-form, Polish, Czech/Slovak, and Arabic rules. Unknown locales keep the header unset and report a diagnostic instead of guessing a plural expression.
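The "known defaults or a diagnostic, never a guess" policy can be sketched as a lookup that returns an error for unknown locales. This is an illustration only: plural_forms is a hypothetical function, it covers just a subset of the families listed above (Arabic is omitted for brevity), and the expressions are adapted from the GNU gettext manual, so Ferrocat's own table and exact header formatting may differ.

```rust
/// Hypothetical lookup of gettext Plural-Forms headers for a few well-known locales.
/// Expressions are adapted from the GNU gettext manual; Ferrocat's table may differ.
fn plural_forms(locale: &str) -> Result<&'static str, String> {
    // Normalize `pt_BR` to `pt-BR` and match mostly on the primary language subtag.
    let normalized = locale.replace('_', "-");
    let language = normalized.split('-').next().unwrap_or("");
    let header = match (language, normalized.as_str()) {
        // Brazilian Portuguese uses the French-style rule (0 and 1 are singular).
        (_, "pt-BR") => "nplurals=2; plural=n>1;",
        ("ja" | "ko" | "zh" | "vi", _) => "nplurals=1; plural=0;",
        ("en" | "de" | "nl" | "sv" | "es" | "it", _) => "nplurals=2; plural=n != 1;",
        ("fr", _) => "nplurals=2; plural=n>1;",
        ("pl", _) => "nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;",
        ("cs" | "sk", _) => "nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;",
        // Unknown locale: leave the header unset and report instead of guessing.
        _ => return Err(format!("no known Plural-Forms default for locale `{locale}`")),
    };
    Ok(header)
}

fn main() {
    for locale in ["de", "pt_BR", "pl", "tlh"] {
        match plural_forms(locale) {
            Ok(header) => println!("{locale}: Plural-Forms: {header}"),
            Err(diagnostic) => println!("{locale}: diagnostic: {diagnostic}"),
        }
    }
}
```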

In native mode, CatalogUpdateInput::SourceFirst stays source-text-first: it does not auto-project ICU strings into structured plural messages. Use CatalogUpdateInput::Structured when you want explicit plural structure.

In FCL storage, arbitrary gettext-style custom headers are intentionally out of scope for v1; only the explicit %FCL1 header metadata and per-entry columns are persisted. Machine-managed value metadata uses the optional lock= and ai= tags appended to the entry line (see ADR 0022).

use ferrocat_po::{
    CatalogMode, CatalogUpdateInput, SourceExtractedMessage, UpdateCatalogOptions, update_catalog,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let result = update_catalog(UpdateCatalogOptions {
        locale: Some("de"),
        mode: CatalogMode::IcuFcl,
        input: CatalogUpdateInput::SourceFirst(vec![SourceExtractedMessage {
            msgid: "Checkout".to_owned(),
            ..SourceExtractedMessage::default()
        }]),
        ..UpdateCatalogOptions::new("en", CatalogUpdateInput::default())
    })?;

    assert!(result.content.starts_with("%FCL1\t"));
    assert!(result.content.contains("Checkout"));
    Ok(())
}

Field type note. CatalogMessage::origin — and the per-item collections on the lower-level PoItem/BorrowedPoItem (references, comments, extracted_comments, flags, metadata) — are PoVec<T>, a re-exported SmallVec that stores the common single-element case inline. Reading is unchanged (it dereferences to a slice and iterates like a Vec); to construct one, use PoVec::new(), Default::default(), or vec![…].into(). PoVec and SmallVec are re-exported through ferrocat::catalog, ferrocat::po, and the crate root, so you never need to depend on smallvec directly.

Machine-managed value metadata

CatalogMessage::machine carries optional metadata for entries whose value was set by a machine — an AI engine, a translation memory system, or a script. Its presence marks the value as machine-managed. It has two fields:

  • lock: an integrity fingerprint of the value when the machine set it. If hash(current value) no longer equals lock, a human edited it afterwards.
  • ai: optional AI provenance — an opaque model id and an optional confidence decimal in [0, 1].

The point is to protect rare human corrections. Machine translations ship as-is by default; when someone fixes one by hand, the lock stops matching, so a re-translation pass can recognize that entry and leave it alone instead of overwriting it.
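A re-translation pass built on this idea might decide per entry as in the sketch below. The MachineMeta struct, Action enum, and plan function are hypothetical, and toy_hash is a deliberately trivial stand-in so the example runs without dependencies; a real tool would compute the lock with machine_translation_hash. Treating entries without machine metadata as human-authored is an assumed policy, not documented behavior.

```rust
/// Hypothetical subset of CatalogMessage::machine: only the lock fingerprint is modeled.
struct MachineMeta<'a> {
    lock: &'a str,
}

#[derive(Debug, PartialEq)]
enum Action {
    /// Value is still exactly what the machine wrote: safe to re-translate.
    Retranslate,
    /// Lock no longer matches: someone corrected the value, so keep it.
    KeepHumanEdit,
    /// No machine metadata: treat the value as human-authored (assumed policy).
    KeepHumanAuthored,
}

fn plan(value: &str, machine: Option<&MachineMeta<'_>>, hash: impl Fn(&str) -> String) -> Action {
    match machine {
        None => Action::KeepHumanAuthored,
        Some(meta) if hash(value) == meta.lock => Action::Retranslate,
        Some(_) => Action::KeepHumanEdit,
    }
}

/// Toy stand-in for machine_translation_hash; NOT a real fingerprint.
fn toy_hash(value: &str) -> String {
    format!("len:{}", value.len())
}

fn main() {
    // Lock recorded when the machine wrote "Hallo".
    let meta = MachineMeta { lock: "len:5" };
    println!("{:?}", plan("Hallo", Some(&meta), toy_hash));
    println!("{:?}", plan("Hallo!", Some(&meta), toy_hash));
}
```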

Use machine_translation_hash to compute the lock for the current translation. Ferrocat uses SHA-256 over a canonical translation payload plus the fixed ferrocat:mt:v1 namespace, truncates the digest to 128 bits, and encodes it as unpadded Base64URL. This is a change-detection marker, not a security signature.
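The size of the resulting lock follows directly from the encoding: 128 bits is 16 bytes, and unpadded Base64URL turns 16 bytes into 22 characters. The sketch below shows only the truncate-and-encode step with a hand-written RFC 4648 URL-safe encoder. The SHA-256 computation and Ferrocat's canonical payload are not reproduced; lock_from_digest and base64url_unpadded are hypothetical helpers, and the digest bytes in main are placeholders.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as unpadded Base64URL (RFC 4648 URL-safe alphabet, no '=' padding).
fn base64url_unpadded(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() * 4 + 2) / 3);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into the high bits of a 24-bit group.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let group = (b0 << 16) | (b1 << 8) | b2;
        // Without padding, n input bytes produce n + 1 output characters.
        for i in 0..=chunk.len() {
            let index = (group >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[index as usize] as char);
        }
    }
    out
}

/// Truncate a 32-byte SHA-256 digest to its first 128 bits and encode it.
/// The digest is assumed to come from a SHA-256 implementation (not shown).
fn lock_from_digest(digest: &[u8; 32]) -> String {
    base64url_unpadded(&digest[..16])
}

fn main() {
    let digest = [0xAB_u8; 32]; // placeholder bytes standing in for a real digest
    let lock = lock_from_digest(&digest);
    assert_eq!(lock.len(), 22); // 128 bits -> 22 unpadded Base64URL characters
    println!("{lock}");
}
```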

Both storage formats use the same fields. PO stores them as metadata comments:

#@ lock: <hash>
#@ ai: openai/gpt-5.5-high:0.95

FCL appends them as tags on the entry line:

Hello		Hallo	lock=<hash>	ai=openai/gpt-5.5-high:0.95

The ai model id is free-form, so the provider prefix is optional and only a trailing [0, 1] decimal after the last : is read as confidence. A machine timestamp is deliberately not stored: timestamps churn merges, and staleness is detected by the lock hash. See ADR 0022.
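The confidence rule can be sketched as a split on the last colon that only succeeds when the tail is a plain decimal in [0, 1]. parse_ai is a hypothetical helper written for illustration; in particular, restricting the tail to ASCII digits and a dot is an assumption about what counts as a "decimal", not a statement of Ferrocat's parser.

```rust
/// Hypothetical parser for an `ai=` tag value: free-form model id plus optional trailing confidence.
fn parse_ai(value: &str) -> (&str, Option<f64>) {
    if let Some((model, tail)) = value.rsplit_once(':') {
        // Assumption: only a plain decimal such as `0.95` counts as confidence.
        let is_decimal = !tail.is_empty() && tail.chars().all(|c| c.is_ascii_digit() || c == '.');
        if is_decimal {
            if let Ok(confidence) = tail.parse::<f64>() {
                if (0.0..=1.0).contains(&confidence) {
                    return (model, Some(confidence));
                }
            }
        }
    }
    // Otherwise the whole value, colons included, is the model id.
    (value, None)
}

fn main() {
    for value in ["openai/gpt-5.5-high:0.95", "local-model", "vendor:model", "model:1.5"] {
        println!("{value} -> {:?}", parse_ai(value));
    }
}
```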

Parsing preserves the metadata even when the lock no longer matches. High-level writers such as update_catalog verify the lock while rendering and drop the whole block when the value has changed.

use ferrocat::{EffectiveTranslationRef, machine_translation_hash};

fn main() {
    let hash = machine_translation_hash(EffectiveTranslationRef::Singular("Hallo"));
    assert!(!hash.is_empty());
}

update_catalog_file

Use this when you want the same high-level behavior as update_catalog, but against a file path.

It reads the current file if present, runs the full update, and only writes back when the result actually changed.
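The read, update, and conditional-write pattern is easy to reproduce with the standard library alone. The sketch below is a hypothetical helper, update_file_if_changed, that takes the update step as a closure; it is not update_catalog_file's implementation, and it writes directly rather than through a temporary file, which a production tool may prefer for atomicity.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical helper: read the file if present, compute new content, write only when it differs.
/// Returns whether the file was written.
fn update_file_if_changed(
    path: &Path,
    update: impl FnOnce(Option<&str>) -> String,
) -> io::Result<bool> {
    let existing = match fs::read_to_string(path) {
        Ok(content) => Some(content),
        Err(err) if err.kind() == ErrorKind::NotFound => None,
        Err(err) => return Err(err),
    };
    let next = update(existing.as_deref());
    if existing.as_deref() == Some(next.as_str()) {
        // Unchanged: leave the file (and its modification time) alone.
        return Ok(false);
    }
    fs::write(path, next)?;
    Ok(true)
}

/// Stand-in for a full catalog update that always renders the same content.
fn render(_existing: Option<&str>) -> String {
    String::from("msgid \"Checkout\"\nmsgstr \"Kasse\"\n")
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("ferrocat-sketch-{}.po", std::process::id()));
    let _ = fs::remove_file(&path);
    assert!(update_file_if_changed(&path, render)?); // first run creates the file
    assert!(!update_file_if_changed(&path, render)?); // second run is a no-op
    fs::remove_file(&path)
}
```

Skipping unchanged writes keeps file watchers and incremental build tools from rebuilding when nothing in the catalog actually changed.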

Choose this for CLI tools, task runners, or build/dev pipelines that work directly on catalog files on disk.

Like update_catalog, it accepts CatalogUpdateInput, so extractor tooling can choose between a raw source-first path and an explicitly structured plural path without having to write PO/FCL itself.

UpdateCatalogFileOptions borrows both the path and the locale/header inputs, so file-based automation can call it without constructing throwaway owned request objects.

ICU MessageFormat

Ferrocat's ICU APIs currently target ICU MessageFormat v1. MessageFormat 2 is not parsed, validated, converted, or formatted by the public API. That is a deliberate scope boundary: MF2 is worth tracking, but the current catalog surface gets more practical value from robust MF1 parsing, authoring diagnostics, and artifact validation.

parse_icu

Use this when you need the parsed ICU AST.

Typical use cases:

  • inspecting plural/select structure
  • converting ICU messages into another internal representation
  • extracting semantic information from messages

stringify_icu

Use this when you already have an IcuMessage AST and need ICU MessageFormat text again. The serializer is canonical rather than byte-preserving: whitespace and apostrophe quoting may differ from the source, but parse -> stringify -> parse preserves the supported AST structure.

pseudolocalize_icu

Use this when you need pseudo-locale text while preserving the ICU message contract. The transform parses the message, walks the AST, modifies only literal text, and renders through stringify_icu.

It preserves:

  • argument names and placeholders
  • formatter kinds and styles
  • plural/select selectors and plural offsets
  • # placeholders inside plural branches
  • rich-text tag names

use ferrocat::{IcuPseudolocalizationOptions, pseudolocalize_icu};

fn main() -> Result<(), ferrocat::IcuParseError> {
    // Expansion is set to 0% so the expected output contains no filler.
    let pseudo = pseudolocalize_icu(
        "Hello {name}",
        &IcuPseudolocalizationOptions::new().with_expansion_percent(0),
    )?;
    assert_eq!(pseudo, "[!! Ĥéļļö {name} !!]");
    Ok(())
}

Default options add a visible wrapper and 30% literal filler for layout pressure. Wrapping and ASCII-letter mapping are stable across accidental repeat application, so pseudolocalizing an already pseudolocalized message does not keep growing it.
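Why repeat application stays stable can be seen in a much simpler, dependency-free sketch: accented output letters are not in the ASCII mapping table, so they map to themselves, and an existing wrapper is stripped before a new one is added. The accent table, the OPEN/CLOSE markers, and pseudolocalize_sketch are hypothetical; the sketch skips filler and handles only flat messages (no plural/select branches or apostrophe quoting), whereas the real API walks the ICU AST.

```rust
/// Hypothetical accent table covering only a few ASCII letters; Ferrocat's real mapping is broader.
fn accent(c: char) -> char {
    match c {
        'H' => 'Ĥ',
        'a' => 'á',
        'e' => 'é',
        'l' => 'ļ',
        'o' => 'ö',
        other => other,
    }
}

const OPEN: &str = "[!! ";
const CLOSE: &str = " !!]";

/// Pseudolocalize a flat message, leaving text inside `{...}` untouched.
fn pseudolocalize_sketch(message: &str) -> String {
    // Strip an existing wrapper first so repeated application does not nest it.
    let inner = message
        .strip_prefix(OPEN)
        .and_then(|rest| rest.strip_suffix(CLOSE))
        .unwrap_or(message);
    let mut body = String::with_capacity(inner.len());
    let mut depth = 0usize;
    for c in inner.chars() {
        match c {
            '{' => {
                depth += 1;
                body.push(c);
            }
            '}' => {
                depth = depth.saturating_sub(1);
                body.push(c);
            }
            // Only literal text at brace depth 0 is mapped; accented letters map to themselves.
            _ if depth == 0 => body.push(accent(c)),
            _ => body.push(c),
        }
    }
    format!("{}{}{}", OPEN, body, CLOSE)
}

fn main() {
    let once = pseudolocalize_sketch("Hello {name}");
    let twice = pseudolocalize_sketch(&once);
    println!("{once}");
    assert_eq!(once, twice);
}
```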

validate_icu

Use this when you only need a yes/no syntax check with an error surface.

analyze_icu

Use this after parse_icu when you need a structured summary of message arguments, formatter kinds and styles, plural/select selectors, and rich-text tags.

compare_icu_messages

Use this to compare source and translated ICU messages before shipping a runtime artifact. The report uses stable diagnostic codes for missing or extra arguments, formatter kind or style changes, tag mismatches, missing select or plural selectors, plural offset changes, and discouraged pattern-style formatters.

At the catalog artifact layer, set icu_compatibility: true on CompileCatalogArtifactOptions or CompileSelectedCatalogArtifactOptions to collect the same diagnostics while compiling the final locale artifact. The ferrocat_icu::diagnostic_codes::icu constants expose the canonical code strings for direct ICU checks, while ferrocat_po::diagnostic_codes::icu contains the same compatibility strings plus catalog-level ICU syntax codes.

validate_icu_formatter_support

Use this when a host runtime supports only a subset of ICU formatter kinds or styles. The API analyzes a parsed message, passes each discovered formatter to a caller-provided support callback, and returns standard icu.* diagnostics for unsupported formatter kinds or styles.

This keeps runtime-specific support policy in the host adapter while giving CI, editors, and build tools the same diagnostic shape as the rest of Ferrocat's ICU compatibility checks. For example, a JavaScript adapter can accept number and date, reject list, or reject pattern-style date formats before the catalog artifact ships.
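The JavaScript example above corresponds to a support callback along these lines. Everything here is hypothetical: Formatter is a simplified stand-in for whatever analyze_icu reports, js_supports is an example host policy (it also rejects any kind other than number and date, such as time), and unsupported returns plain strings where validate_icu_formatter_support would return icu.* diagnostics.

```rust
/// Hypothetical description of one formatter found in a message, e.g. `{d, date, short}`.
struct Formatter<'a> {
    argument: &'a str,
    kind: &'a str,
    style: Option<&'a str>,
}

/// Example host policy: accept number and predefined date styles; reject list,
/// pattern-style dates, and anything else.
fn js_supports(kind: &str, style: Option<&str>) -> bool {
    match kind {
        "number" => true,
        "date" => matches!(style, None | Some("short" | "medium" | "long" | "full")),
        _ => false,
    }
}

/// Collect a message for every formatter the policy rejects, in message order.
fn unsupported(formatters: &[Formatter<'_>], supports: impl Fn(&str, Option<&str>) -> bool) -> Vec<String> {
    formatters
        .iter()
        .filter(|f| !supports(f.kind, f.style))
        .map(|f| format!("unsupported formatter `{}` on argument `{}`", f.kind, f.argument))
        .collect()
}

fn main() {
    let found = [
        Formatter { argument: "count", kind: "number", style: None },
        Formatter { argument: "when", kind: "date", style: Some("yyyy-MM-dd") },
        Formatter { argument: "names", kind: "list", style: None },
    ];
    for problem in unsupported(&found, js_supports) {
        println!("{problem}");
    }
}
```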

extract_argument_names / extract_tag_names

Use these when data variables and rich-text tags need to be handled separately. The older extract_variables helper still returns the historical mixed view.

extract_variables

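Use this after parse_icu when you want every name referenced by the message in one list. It returns the historical mixed view of argument and tag names; prefer extract_argument_names / extract_tag_names when the two must be handled separately.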

Semantic message metadata

Use MessageMetadataInput and normalize_message_metadata when an extractor or host adapter wants to describe source-side facts around a message without inventing a separate stable ID scheme. The required identity is still msgid plus optional msgctxt; in Palamedes-style workflows the msgid is usually the source string, while ID-style catalogs can still use an opaque key as msgid.

The metadata shape is progressive. A simple record can be just:

{ "msgid": "Cart" }

Argument metadata can use a shorthand when the host knows only the broad kind:

{
  "msgid": "Hello {name}",
  "args": {
    "name": "string"
  }
}

For ICU MessageFormat v1 messages, Ferrocat can derive normalized arguments, rich-text tags, and select/plural selector metadata from the msgid. Use validate_message_metadata to report conflicts between explicit metadata and the parsed source message. Translations remain catalog data; msgstr is not part of this source-side metadata format.

Error Surface

ApiError and the lower-level ParseError are intentionally small today. ApiError distinguishes parse, I/O, invalid-argument, conflict, and unsupported-operation failures, and its std::error::Error::source() implementation preserves underlying parse and I/O causes. Disk helpers that know the affected path attach path context to ApiError::Io; call ApiError::path() to read that context without parsing the display string.

PO ParseError keeps the existing human-readable display message and exposes message() plus optional position() metadata for tooling. Position metadata reports a zero-based byte offset plus one-based line and column when the parser can attach source context.

For file or network input, use ferrocat_po::parse_po_bytes when you want Ferrocat to validate UTF-8 bytes and reject declared non-UTF-8 PO charsets before syntax parsing.
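Tooling that only has a byte offset (for example from its own scanner) can derive the same kind of one-based line and column. The line_col helper below is a hypothetical sketch, not part of Ferrocat; it assumes columns count Unicode scalar values, which may differ from how ParseError counts them, and it rejects offsets that fall inside a multi-byte character.

```rust
/// Convert a zero-based byte offset into one-based (line, column).
/// Assumption: columns count Unicode scalar values; Ferrocat's ParseError may count differently.
fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > source.len() || !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}

fn main() {
    let source = "msgid \"a\"\nmsgstr \"b\"\n";
    // Byte 10 is the `m` that starts the second line.
    assert_eq!(line_col(source, 10), Some((2, 1)));
    println!("{:?}", line_col(source, 10));
}
```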

Practical Rule Of Thumb

  • Editing raw PO files: parse_po + stringify_po
  • Hot-path PO inspection: parse_po_borrowed
  • Classic Gettext merge step: merge_catalog
  • N-way catalog overlays and set operations: combine_catalogs
  • Full app-level catalog maintenance: update_catalog or update_catalog_file
  • Parsed catalog consumption with keyed accessors: parse_catalog + into_normalized_view
  • Locale-specific runtime artifact generation: compile_catalog_artifact
  • Selected locale artifact generation by compiled ID: compile_catalog_artifact_selected
  • Runtime artifact provenance reporting: compile_catalog_artifact_report
  • Release QA across catalog sets: audit_catalogs
  • Per-locale completeness dashboards: catalog_coverage
  • Translator handoff diffs: catalog_review
  • ICU analysis: parse_icu
  • ICU-aware pseudolocalization: pseudolocalize_icu