Archive: the canonical corpus

The archive is the canonical body of legislation in Akoma Ntoso. It is produced by the pipeline, every act in it is addressed by a stable identifier, and it is released into the public domain. It is generated, never hand-edited.

archive on GitHub

Generated, not curated

The archive is the project’s output, not its source of truth. Every document in it came out of the pipeline, built from an official source, validated against the conformance suite, and reproducible from that source.

None of it is edited by hand. If a document is wrong, the fix goes into the adapter that produced it, never into the archive. That’s the whole point. There is no manual layer where a typo, an opinion, or a quiet “correction” can slip in.
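Because every document is reproducible from its source, "no hand edits" can be checked mechanically: rebuild the document and compare it with the archived copy. Below is a minimal sketch that assumes an adapter behaves as a deterministic function from source bytes to Akoma Ntoso bytes. `build`, `is_reproducible`, and `toy_build` are hypothetical names, not the pipeline's actual API.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a serialized document."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(source: bytes, archived: bytes,
                    build: Callable[[bytes], bytes]) -> bool:
    """Rebuild from the official source and compare with the archived copy.

    Any hand edit to the archived copy makes the digests differ.
    `build` is a hypothetical stand-in for an adapter.
    """
    return digest(build(source)) == digest(archived)


def toy_build(source: bytes) -> bytes:
    # Hypothetical toy adapter: deterministic, source in, markup out.
    return b"<akomaNtoso>" + source + b"</akomaNtoso>"


src = b"Art. 1"
archived = toy_build(src)
print(is_reproducible(src, archived, toy_build))                          # True
print(is_reproducible(src, archived.replace(b"1", b"2"), toy_build))      # False: edited by hand
```

Comparing digests rather than raw bytes means a published list of hashes is enough to spot drift, without keeping both copies side by side.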

What’s in it

Each act is stored as native Akoma Ntoso and carries three things:

a stable OLF identifier, resolvable down to the element;
normalized temporal metadata, so “the law as it stood on a given date” is a real query and not a research project;
its citation graph: how it relates to the other acts in the archive.
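Normalized temporal metadata is what turns "the law as it stood on a given date" into a lookup. Below is a minimal sketch that assumes each act's versions are stored with a half-open validity interval. The `Version` record, the `as_of` function, and the `act-1` identifier are hypothetical. They do not reflect the real OLF identifier format or the archive's schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One version of an act and the period it was in force.

    valid_to is exclusive; None means the version is still in force.
    """
    act_id: str                # hypothetical id, not the real OLF format
    valid_from: date
    valid_to: Optional[date]
    text: str


def as_of(versions: list[Version], act_id: str, on: date) -> Optional[Version]:
    """Return the version of `act_id` in force on `on`, or None if none was."""
    for v in versions:
        if (v.act_id == act_id and v.valid_from <= on
                and (v.valid_to is None or on < v.valid_to)):
            return v
    return None


versions = [
    Version("act-1", date(2020, 1, 1), date(2023, 7, 1), "original wording"),
    Version("act-1", date(2023, 7, 1), None, "amended wording"),
]
print(as_of(versions, "act-1", date(2022, 5, 10)).text)  # original wording
print(as_of(versions, "act-1", date(2023, 7, 1)).text)   # amended wording
print(as_of(versions, "act-1", date(2019, 12, 31)))      # None
```

The half-open intervals make sure that the day an amendment takes effect belongs to exactly one version.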

The content stays in the jurisdiction’s own Akoma Ntoso profile. The archive normalizes the metadata around the text. It never touches the text itself.
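Normalizing the metadata without touching the text is possible because Akoma Ntoso keeps the two apart: identification and lifecycle live in `<meta>`, and the text lives in `<body>`. The sketch below reads a few fields from `<meta>` only, using Python's standard `xml.etree.ElementTree` and the Akoma Ntoso 3.0 namespace. The choice of fields is an assumption. The sample document is made up and abbreviated, so it is not schema-valid. `read_metadata` is a hypothetical name, not part of the pipeline.

```python
import xml.etree.ElementTree as ET

AKN = "{http://docs.oasis-open.org/legaldocml/ns/akn/3.0}"


def read_metadata(xml_text: str) -> dict:
    """Collect a few metadata fields from an Akoma Ntoso act.

    Only <meta> is read. The <body> is neither copied into the result nor
    modified. Assumes the FRBRWork/FRBRExpression elements are present.
    """
    root = ET.fromstring(xml_text)
    meta = root.find(f"{AKN}act/{AKN}meta")
    if meta is None:
        raise ValueError("no <act>/<meta> element")
    work = meta.find(f"{AKN}identification/{AKN}FRBRWork")
    expr = meta.find(f"{AKN}identification/{AKN}FRBRExpression")
    return {
        "work_uri": work.find(f"{AKN}FRBRuri").get("value"),
        "work_date": work.find(f"{AKN}FRBRdate").get("date"),
        "expression_date": expr.find(f"{AKN}FRBRdate").get("date"),
        # Lifecycle event dates, sorted so the output is stable.
        "events": sorted(e.get("date") for e in meta.iter(f"{AKN}eventRef")),
    }


# Made-up, abbreviated example document (not schema-valid).
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act name="example">
    <meta>
      <identification source="#example">
        <FRBRWork>
          <FRBRuri value="/akn/xx/act/2020/1"/>
          <FRBRdate date="2020-01-01" name="enactment"/>
        </FRBRWork>
        <FRBRExpression>
          <FRBRdate date="2023-07-01" name="consolidation"/>
        </FRBRExpression>
      </identification>
      <lifecycle source="#example">
        <eventRef date="2023-07-01" source="#a1" type="amendment"/>
        <eventRef date="2020-01-01" source="#o1" type="generation"/>
      </lifecycle>
    </meta>
    <body><article><num>1</num><content><p>Text.</p></content></article></body>
  </act>
</akomaNtoso>"""

print(read_metadata(SAMPLE))
```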

Public domain

The archive is CC0-1.0. No rights reserved. The law belongs to everyone, and so should a faithful structured copy of it. (The spec, the pipeline, and the diff that produce it are Apache-2.0.)

Status

Early, and we’d rather say so plainly. The archive grows as adapters land in the pipeline, and the first content comes from the Italy and France adapters, which are being built now. The structure and the license are settled. Coverage is just getting started.