A Better Dataset for AI Alignment

The Witness Protocol is building a permissioned, high-signal corpus of human testimony for AI alignment research.

Not mass collection. Not scraped opinion. A smaller, more deliberate dataset built from reflective human reasoning, structured dialogue, and auditable synthesis.

The Thesis

Current AI systems are trained on enormous volumes of human output. That gives them breadth, but not necessarily judgment.

The Witness Protocol starts from a narrower and more useful premise:

"A carefully curated body of human testimony may preserve forms of reasoning that raw internet-scale data usually flattens or loses."

Those forms include:

  • Tradeoff handling
  • Ethical self-location
  • Relational awareness
  • Reflective restraint
  • The ability to reason under uncertainty without collapsing into slogans

The project is not trying to replace large-scale data. It is testing whether a smaller, permissioned, better-structured corpus can become a meaningful corrective input for alignment research.

Why This Matters

AI alignment needs more than additional data. It needs signal that remains legible under scrutiny.

A large portion of public data is optimized for reaction, repetition, visibility, and speed. The Witness Protocol is exploring the opposite direction: slower inputs, clearer consent, stronger provenance, and outputs that can be reviewed as research artifacts rather than absorbed into opaque systems.

Can a small, auditable corpus of reflective human testimony become useful alignment infrastructure?

What Makes This Different

Selective Intake

Not every submission enters the process. The system is designed to protect signal quality from the beginning.

Structured Inquiry

Accepted contributors move through a guided dialogue process designed to reach the reasoning beneath the answer, not just the answer itself.

Annotation and Synthesis

The project preserves not only what was said, but how it was reasoned, where tensions appeared, and what ethical structure was present.

Governed Artifacts

Outputs become reviewable research artifacts, not one-off prompts or loose notes absorbed into opaque systems.
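As a rough illustration of what a governed, reviewable artifact might look like in practice, here is a minimal Python sketch. The project does not publish a schema, so everything in it is an assumption made for illustration: the `Artifact` class, its field names, and the consent labels. It shows two ideas from the text: consent travels with the record, and a content fingerprint makes later changes detectable during review.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record for one reviewable research artifact."""
    witness_id: str           # pseudonymous contributor ID (assumed field)
    consent_scope: tuple      # uses the contributor agreed to (assumed labels)
    testimony: str            # what was said
    annotations: tuple = ()   # reviewer notes on reasoning and tensions

    def fingerprint(self) -> str:
        """Stable SHA-256 over the whole record, so edits are detectable."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def permits(self, use: str) -> bool:
        """True only if the contributor consented to this specific use."""
        return use in self.consent_scope


# Example with made-up values.
record = Artifact(
    witness_id="w-001",
    consent_scope=("evaluation",),
    testimony="I chose to wait.",
    annotations=("tension: duty vs. care",),
)
print(record.permits("evaluation"))   # consent was given for this use
print(record.permits("fine-tuning"))  # no consent recorded for this use
```

Freezing the dataclass is a deliberate choice in this sketch: a record that cannot be mutated in place has to be replaced by a new record with a new fingerprint, which keeps the change visible.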

What the Project Is For

The long-term aim is to create a body of testimony that can support work such as:

  • Alignment-oriented evaluation
  • Qualitative benchmarking
  • Prompt and policy stress-testing
  • Synthesis research
  • Future dataset and fine-tuning experiments where provenance and reasoning quality matter more than scale alone

A Smaller, Cleaner Inheritance

The Witness Protocol is not trying to solve the whole problem in one gesture.

It is trying to build one missing layer well: a smaller, cleaner, more humanly legible inheritance for future systems.

Those who recognize the difference will know where to look next.

The Protocol

The Witness Protocol is currently in an alpha-stage build focused on proving the method end to end.

The immediate goal is not scale. It is to run a small, controlled, auditable flow that can be evaluated honestly: intake, selection, dialogue, governed storage, annotation, synthesis, and export.

Current Build Focus

The current system is being shaped around a narrow operational path.

  01  Selective intake and review
  02  Accepted-witness cohort handling
  03  Structured dialogue through the Witness flow
  04  Governed storage and consent boundaries
  05  Annotation and synthesis of accepted testimony
  06  Export of reviewable downstream artifacts

The project is being built to make the process inspectable, not theatrical.
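One way to picture the six-step path is as an ordered sequence in which each submission can only move forward one step at a time, leaving a trail that a reviewer can inspect. The Python sketch below is an illustration under that assumption, not the project's implementation. The `Stage` and `Submission` names and the shortened stage labels are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six build-focus steps, in order. Labels are illustrative."""
    INTAKE = 1       # selective intake and review
    COHORT = 2       # accepted-witness cohort handling
    DIALOGUE = 3     # structured dialogue through the Witness flow
    STORAGE = 4      # governed storage and consent boundaries
    ANNOTATION = 5   # annotation and synthesis of accepted testimony
    EXPORT = 6       # export of reviewable downstream artifacts


class Submission:
    """Tracks one submission and keeps an inspectable audit trail."""

    def __init__(self, submission_id: str):
        self.submission_id = submission_id
        self.stage = Stage.INTAKE
        self.trail = [Stage.INTAKE]  # every stage reached, in order

    def advance(self) -> Stage:
        """Move exactly one stage forward; there is no way to skip a step."""
        if self.stage is Stage.EXPORT:
            raise ValueError("already exported")
        self.stage = Stage(self.stage + 1)
        self.trail.append(self.stage)
        return self.stage


# Example: walk one made-up submission through the first three steps.
sub = Submission("sub-1")
sub.advance()
sub.advance()
print([stage.name for stage in sub.trail])
```

Because the only transition is "next stage", the trail doubles as evidence that nothing reached export without passing through storage and annotation first.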

How AI Is Used

AI is not being used here as a replacement for human judgment. It is being used as an instrument inside a bounded protocol.

Intake Support

Models help filter low-signal submissions and support early qualitative review.

Structured Inquiry

The dialogue layer guides accepted witnesses beyond surface opinion and toward the reasoning underneath it.

Synthesis Support

Accepted testimony is turned into structured, reviewable artifacts rather than remaining only as raw conversation.

Annotation Support

AI assists in organizing, surfacing, and comparing patterns inside accepted material, while preserving the ability to inspect the resulting outputs.

Evaluation and Stability

The system is pressure-tested so the protocol remains legible, bounded, and consistent as prompts, providers, and downstream components evolve.
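A narrow slice of this kind of pressure-testing can be sketched as a structural regression check: run a fixed set of probe inputs through the protocol and confirm that every output still carries the fields reviewers rely on, whichever prompt version or provider produced it. The Python below is a minimal sketch under stated assumptions. `run_protocol`, `REQUIRED_KEYS`, and the field names are hypothetical stand-ins, and a real evaluation would check far more than structure.

```python
from typing import Callable, Dict, List

# Assumed output schema; the real artifact fields are not specified.
REQUIRED_KEYS = {"testimony", "annotations", "consent_scope"}


def check_regressions(
    run_protocol: Callable[[str], Dict],
    probes: List[str],
) -> List[str]:
    """Return one description per probe whose output is missing fields."""
    failures = []
    for probe in probes:
        output = run_protocol(probe)
        missing = REQUIRED_KEYS - set(output)
        if missing:
            failures.append(f"{probe!r}: missing {sorted(missing)}")
    return failures


# Example: a stand-in protocol run that drops two fields.
def broken_run(probe: str) -> Dict:
    return {"testimony": probe}


for failure in check_regressions(broken_run, ["probe-a"]):
    print(failure)
```

Passing the protocol run in as a callable keeps the check independent of any one provider, so the same probes can be replayed whenever a prompt or downstream component changes.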

For Providers and Research Partners

The Witness Protocol is not trying to maximize engagement volume. It is trying to increase the quality of the data pathway.

Support at this stage does not vanish into generic product growth. It directly strengthens:

  • Intake and screening quality
  • Structured dialogue quality
  • Synthesis and annotation quality
  • Evaluation and regression discipline
  • The first auditable corpus slice

The Right Collaborators

For those tracking where alignment work becomes method rather than slogan, this is an appropriate place to begin.

Some will read this as a research concept. Others will recognize an instrument taking shape.

The right collaborators rarely need a hard invitation. They tend to recognize timing when it appears.