How sovereignty is implemented in the systems where data lives
Sovereignty is not self-executing. It may be protected or diminished in access controls, schema choices, model training pipelines, and the provenance metadata — in any line of code and any infrastructure decision that touches Tribal data. Good intentions do not survive contact with a misconfigured S3 bucket. The interviews and the literature are clear on this: the work of honoring sovereignty in technical systems is a design discipline, not a disclosure statement.
The FAIR principles — Findable, Accessible, Interoperable, Reusable — are embraced as the foundation of Open Science. These principles should not be applied automatically to Indigenous data except where they are intentionally chosen through self-determination and informed consent. The CARE principles — Collective Benefit, Authority to Control, Responsibility, Ethics — make explicit what FAIR implies25,28,29. FAIR asks whether others can find and use data; CARE asks whether they should, and on whose terms.
Access is not one decision — it's many. Raw datasets, analytical outputs, and the derived artifacts that flow from them (models, feature stores, embeddings, synthetic data) each need their own controls, because each carries different risks of re-identification and misuse5,9,15. Tiered access is how sovereignty becomes enforceable in a system that otherwise defaults to "make it queryable." Who can reach what, under what conditions, and for how long — these belong in the system's architecture, co-determined by the partners at the outset, not bolted on after launch.
Partnerships end. Some pause. Others evolve into something different than what was first signed. Agreements need to answer what happens to the data, and to everything built from the data — reports, features, models — at each of those moments4,22. How long is it kept? What triggers deletion? Who decides, and in what timeframe? A data policy without a deletion plan is not a policy. It's an assumption that partnerships last forever, which is one of the few things we can reliably predict they won't.
The context around a dataset's origins has to travel with it — across forks, across teams, across the staff turnover that eventually hits every organization24,25. Where the data came from, under what agreement, with which restrictions: these facts shape every future decision about reuse. Without them, the person asking "can I use this for X?" five years from now is guessing. And guessing is how consent erodes. Provenance metadata is the paper trail that keeps accountability attached to the data after the people who negotiated the original agreement have moved on.
"Good outcome" is a governance question before it is a technical one. Before touching a model, establish mutual agreement on what success actually means — in what language, at what scale, for whose benefit. Default metrics (deficit indicators, accuracy against a conventional benchmark, engagement-as-success) often describe problems Tribal Nations did not ask to have solved— and reflect frameworks that originated outside tribal priorities.21,30,31. A model that optimizes for the wrong target is not "almost there." It is a different model than the one that was needed.
Consent for collecting data is not consent for training and deploying a model with that data22,23,28. Fine-tuning requires its own consent. Reuse across projects requires its own consent. Each is a separate governance question with its own scope, its own conditions, and its own way to end it. Decide with the partner Nation, up front, what can be built from the data and what cannot — and put the answer in writing before the first training run.
A model trained on sensitive data is still shaped by those data, even when the raw data are locked away23,28. Derived features can reveal patterns. Embeddings can encode identity. Model weights can be probed. Synthetic data generated from a population of 200 is not the same kind of protection as synthetic data generated from 200,000. Which derived artifacts leave the system, who can use them, and under what controls — these are questions to settle before the artifacts exist, not after a researcher asks for the model checkpoint.
Data get misread. Analyses get cited out of context. Systems get breached26,27. Plan for all three. That means: the ability to pause access quickly, revoke use when needed, and — the hardest one — pick up the phone and call the partner yourself when something has gone wrong, before they hear about it from someone else. This is not defensive engineering. It is respect, made operational.