Operator-level assessment

The target level is operator: someone who catches flaws, explains consequences, and connects technical decisions to users, money, time, risk, and production behavior.

Participants can be evaluated through AI-agent scenarios that do not give hints or partial credit. The point is to test whether they actually understood the session or only think they did.

From order taker to operator

An order taker accepts AI output. A mechanic catches local technical mistakes. An operator understands the system-level consequence and can push back with reasons.

Evaluation harness thinking

Participants learn to design tests, slices, regression gates, failure cases, and review loops for AI systems. The mindset used to evaluate agents is also applied to participant work.
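As an illustration of slices and regression gates, here is a minimal Python sketch. It is not Antern's actual harness: the names `Case`, `pass_rate_by_slice`, and `regression_gate`, the exact-match scoring, and the toy systems are all assumptions made for the example. It shows why per-slice scoring matters: a change can keep the happy path intact while silently breaking a failure case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Case:
    """One evaluation case. All fields are hypothetical, for illustration."""
    slice_name: str  # e.g. "happy_path" or "failure_case"
    prompt: str
    expected: str


def pass_rate_by_slice(cases: List[Case], system: Callable[[str], str]) -> Dict[str, float]:
    """Score the system per slice using exact match (an assumed, simplistic metric)."""
    totals: Dict[str, int] = {}
    passes: Dict[str, int] = {}
    for case in cases:
        totals[case.slice_name] = totals.get(case.slice_name, 0) + 1
        if system(case.prompt) == case.expected:
            passes[case.slice_name] = passes.get(case.slice_name, 0) + 1
    return {name: passes.get(name, 0) / totals[name] for name in totals}


def regression_gate(baseline: Dict[str, float], candidate: Dict[str, float],
                    tolerance: float = 0.0) -> List[str]:
    """Return the slices where the candidate scores below baseline minus tolerance."""
    return [name for name, base in baseline.items()
            if candidate.get(name, 0.0) < base - tolerance]


# Toy stand-ins for two versions of an AI system (assumed for the example).
def old_system(prompt: str) -> str:
    return {"2+2": "4", "3+3": "6", "divide by zero": "refuse"}.get(prompt, "")


def new_system(prompt: str) -> str:
    return {"2+2": "4", "3+3": "6", "divide by zero": "0"}.get(prompt, "")


cases = [
    Case("happy_path", "2+2", "4"),
    Case("happy_path", "3+3", "6"),
    Case("failure_case", "divide by zero", "refuse"),
]

baseline = pass_rate_by_slice(cases, old_system)
candidate = pass_rate_by_slice(cases, new_system)
failed_slices = regression_gate(baseline, candidate)  # a non-empty list blocks the change
```

In this toy run the happy path is unchanged, but the gate flags the `failure_case` slice, which a single aggregate score could easily hide.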

How We Evaluate

A correct output is not enough.

Most courses test recall. Antern evaluates whether participants can think like operators when the problem is ambiguous, the AI is confident, and the obvious answer is wrong.

Evaluation is process-aware: participants are assessed on reasoning, verification, business consequence, failure detection, and ability to defend technical decisions, not only on whether they produced a polished artifact.

Reasoning Trace

Can the participant explain the path from problem framing to final decision, including assumptions, alternatives, and why a chosen approach is appropriate?

Verification Discipline

Can the participant test AI output, inspect evidence, design checks, use independent reviewers, and reject plausible but unsupported answers?

Failure Awareness

Can the participant predict where the system breaks: hallucination, bad retrieval, context overflow, silent regressions, cost spikes, latency, or unsafe actions?

System Judgment

Can the participant connect technical choices to users, maintainability, business constraints, deployment realities, and long-term system behavior?

Order Taker

Accepts the answer, follows the tool, and treats fluent output as evidence of competence.

Mechanic

Can make the artifact run, fix local issues, and identify obvious technical mistakes.

Operator

Can supervise the full system, challenge AI output, defend tradeoffs, and reason about consequences under ambiguity.

Ambiguous prompt

Participants receive a task with missing context, unclear intent, or a tempting but incomplete instruction.

System design first

They define the objective, invariants, risks, success metrics, and plan before implementation.

AI-assisted build

They use AI tools, but must preserve logs, reasoning, checkpoints, and decision ownership.

Independent verification

They evaluate the artifact with tests, slices, critics, reviewers, traces, and failure probes.

Defense

They explain what worked, what failed, what they would change, and why the system should be trusted.

Evidence Required

Proof-of-work must include the reasoning around the work.

A final demo matters, but it is not enough. Participants are expected to produce artifacts that make judgment inspectable: what they planned, what they tested, what failed, what changed, and what still carries risk.

Definition of done
System design note
Invariants and success metrics
Evaluation slices
Failure-mode analysis
Human-AI verification log
Architecture decision record
Final demo and technical writeup

What Does Not Count

Fluent output is not proof of capability.

The evaluation explicitly rejects signals that look impressive but do not show durable understanding.

A polished AI-generated answer without explanation
A project that works only on the happy path
A demo with no failure cases or evaluation
A correct output that the participant cannot defend
High activity with no clear reasoning trail
Overconfidence when the evidence is weak