Khaos Live

How AI Debate Judging Works

Inside the AI debate judge: the rubric, the scoring, the appeals.

Khaos Live publishes everything about how the AI debate judge scores live arguments: the rubric, the weights, the reasoning trail, and the appeal path. There is no black box and no secret panel. This page is the full breakdown.

The five-component rubric

Every round is graded on substance (was the argument materially correct), evidence (was it supported by sources or specifics), clarity (was the audience able to follow), rebuttal (did it land on the opponent's strongest point), and steelmanning (did it engage the strongest version of the other side). Weights vary slightly by topic category and are published per debate.
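The page says each component carries a weight but does not give the exact formula that turns sub-scores into a round score. Below is a minimal sketch, assuming a normalized weighted average. The weights are made up for illustration, and `EXAMPLE_WEIGHTS` and `round_score` are hypothetical names, not a Khaos Live API.

```python
# The five rubric components named on this page.
COMPONENTS = ("substance", "evidence", "clarity", "rebuttal", "steelmanning")

# Hypothetical weights for one topic category. Real weights are published
# per debate and will differ.
EXAMPLE_WEIGHTS = {
    "substance": 0.30,
    "evidence": 0.25,
    "clarity": 0.15,
    "rebuttal": 0.15,
    "steelmanning": 0.15,
}


def round_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-component sub-scores into a single round score.

    Assumption: a weighted average normalized by the total weight.
    """
    missing = set(COMPONENTS) - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total_weight = sum(weights[c] for c in COMPONENTS)
    weighted_sum = sum(sub_scores[c] * weights[c] for c in COMPONENTS)
    return weighted_sum / total_weight
```

With these example weights, sub-scores of 8, 6, 9, 7 and 5 (substance through steelmanning, on an assumed 0 to 10 scale) combine to 7.05. Dividing by the total weight keeps the result on the sub-score scale even if a category's published weights do not sum exactly to 1.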

Real-time scoring, round by round

The judge ingests live captions, segments the transcript into claims and rebuttals, and assigns scores within seconds of the round buzzer. Mid-round indicators show the audience which side is leading on which component, without revealing the final number until the buzzer.
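One way to drive a mid-round indicator without exposing the final number is to publish only which side leads each component. The sketch below assumes two sides labelled A and B, each with running sub-scores per component. `component_leaders` is a hypothetical helper, not part of any published interface.

```python
COMPONENTS = ("substance", "evidence", "clarity", "rebuttal", "steelmanning")


def component_leaders(side_a: dict[str, float], side_b: dict[str, float]) -> dict[str, str]:
    """Return "A", "B", or "tie" for each rubric component.

    Only the leader label is returned, never the underlying numbers, so an
    indicator built on this output cannot reveal the score before the buzzer.
    Components not yet scored are treated as 0.0 (an assumption).
    """
    leaders = {}
    for c in COMPONENTS:
        a, b = side_a.get(c, 0.0), side_b.get(c, 0.0)
        leaders[c] = "A" if a > b else "B" if b > a else "tie"
    return leaders
```

For example, if A leads on substance and B has started scoring on clarity, the indicator shows A ahead on substance, B ahead on clarity, and ties elsewhere, with no totals attached.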

Reasoning trail and appeals

Every round produces a written reasoning trail explaining what scored and why. Creators who disagree can flag a round for review. Recurring disputes feed back into the rubric prompt, so the judge improves over time, in public.

What the judge ignores

Follower count, platform rank, identity, on-platform reputation, audience vote, and chat reactions. The judging pass sees the transcript and the prompt context only. That is the entire input.
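This input restriction can be enforced structurally by building the judge's input from an allow-list of fields rather than deleting sensitive ones after the fact. A minimal sketch follows; `RoundRecord`, `JudgeInput` and their fields are hypothetical stand-ins for whatever the platform actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundRecord:
    # Everything the platform might hold about a round (hypothetical fields).
    transcript: str
    prompt_context: str
    debater_ids: tuple[str, ...]
    follower_counts: tuple[int, ...]
    platform_ranks: tuple[int, ...]
    reputation: tuple[float, ...]
    audience_vote: dict[str, int]
    chat_reactions: list[str]


@dataclass(frozen=True)
class JudgeInput:
    # The only two fields the judging pass receives.
    transcript: str
    prompt_context: str


def build_judge_input(record: RoundRecord) -> JudgeInput:
    # Allow-list: copy the permitted fields and nothing else.
    return JudgeInput(transcript=record.transcript, prompt_context=record.prompt_context)
```

Because `JudgeInput` has no field for identity, follower count, rank, reputation, votes or reactions, a field added to `RoundRecord` later cannot reach the judge unless someone deliberately adds it to `JudgeInput`. The sketch does not scrub names spoken aloud inside the transcript itself.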

Frequently asked questions

What does the AI judge actually score?

Five components: substance, evidence, clarity, rebuttal strength, and steelmanning. Each round produces a sub-score per component and a final round score.

Can the AI judge be wrong?

Yes. That is why every scorecard ships with a reasoning trail and a transcript. Creators can challenge rounds, and the reviews feed back into the rubric.

Does the judge know who the debaters are?

No. The judging pass strips identity, follower count, and platform rank. It scores the transcript only.

Which AI models power the judging?

Top-tier reasoning models from the Gemini Pro and GPT-5 families, selected by topic complexity, debate length, and language.
