How We Evaluate

A correct output is not enough.

Most courses test recall. Antern evaluates whether participants can think like operators when the problem is ambiguous, the AI is confident, and the obvious answer is wrong.

Evaluation is process-aware: participants are assessed on reasoning, verification, business consequence, failure detection, and ability to defend technical decisions, not only on whether they produced a polished artifact.

Assessment Model

We evaluate the process behind the answer.

In an AI-native world, output alone is a weak signal. A participant can produce something fluent without understanding it. The assessment looks for reasoning, verification, and consequence-awareness.

Reasoning Trace

Can the participant explain the path from problem framing to final decision, including assumptions, alternatives, and why a chosen approach is appropriate?

Verification Discipline

Can the participant test AI output, inspect evidence, design checks, use independent reviewers, and reject plausible but unsupported answers?

Failure Awareness

Can the participant predict where the system breaks: hallucination, bad retrieval, context overflow, silent regressions, cost spikes, latency, or unsafe actions?

System Judgment

Can the participant connect technical choices to users, maintainability, business constraints, deployment realities, and long-term system behavior?

Competence Ladder

The goal is not tool usage. The goal is supervision.

Participants move from accepting AI output, to fixing local mistakes, to operating a complete system with judgment.

Order Taker

Accepts the answer, follows the tool, and treats fluent output as evidence of competence.

Mechanic

Can make the artifact run, fix local issues, and identify obvious technical mistakes.

Operator

Can supervise the full system, challenge AI, defend tradeoffs, and reason about consequences under ambiguity.

Evaluation Loop

A serious assessment includes ambiguity, build work, verification, and defense.

The loop is designed to reveal whether participants actually understand the system or only produced a convincing artifact with AI.

01

Ambiguous prompt

Participants receive a task with missing context, unclear intent, or a tempting but incomplete instruction.

02

System design first

They define the objective, invariants, risks, success metrics, and plan before implementation.

03

AI-assisted build

They use AI tools, but must preserve logs, reasoning, checkpoints, and decision ownership.

04

Independent verification

They evaluate the artifact with tests, slices, critics, reviewers, traces, and failure probes.

05

Defense

They explain what worked, what failed, what they would change, and why the system should be trusted.
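
The independent verification step can be made concrete with a small sketch. What follows is an illustration under stated assumptions, not Antern's actual tooling: `Case`, `pass_rate_by_slice`, `weak_slices`, the slice names, and the 0.8 threshold are all hypothetical. It shows why evaluation slices matter: an artifact can pass every happy-path case while failing every ambiguous one, and an aggregate score would hide that.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    """One evaluation case. All field names are hypothetical."""
    slice_name: str  # e.g. "happy_path" or "ambiguous"
    prompt: str
    expected: str


def pass_rate_by_slice(
    system: Callable[[str], str], cases: list[Case]
) -> dict[str, float]:
    """Run the system on every case and report the pass rate per slice."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.slice_name] += 1
        if system(case.prompt) == case.expected:
            passed[case.slice_name] += 1
    return {name: passed[name] / total[name] for name in total}


def weak_slices(rates: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Return slice names whose pass rate falls below the (assumed) threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


def toy_system(prompt: str) -> str:
    """A stand-in artifact that only handles the obvious phrasing."""
    return "refund" if "refund" in prompt.lower() else "escalate"


cases = [
    Case("happy_path", "I want a refund", "refund"),
    Case("happy_path", "Refund please", "refund"),
    Case("ambiguous", "I want my money back", "refund"),
    Case("ambiguous", "Cancel it, and maybe a refund?", "clarify"),
]

rates = pass_rate_by_slice(toy_system, cases)
print(rates)
print(weak_slices(rates))
```

Here the toy system passes both happy-path cases and fails both ambiguous ones, so only the per-slice view exposes the weakness. In a defense, the participant would be expected to explain why that slice fails and what they would change.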

Evidence Required

Proof-of-work must include the reasoning around the work.

A final demo matters, but it is not enough. Participants are expected to produce artifacts that make judgment inspectable: what they planned, what they tested, what failed, what changed, and what still carries risk.

  • Definition of done
  • System design note
  • Invariants and success metrics
  • Evaluation slices
  • Failure-mode analysis
  • Human-AI verification log
  • Architecture decision record
  • Final demo and technical writeup

What Does Not Count

Fluent output is not proof of capability.

The evaluation explicitly rejects signals that look impressive but do not show durable understanding.

  • A polished AI-generated answer without explanation
  • A project that works only on the happy path
  • A demo with no failure cases or evaluation
  • A correct output that the participant cannot defend
  • High activity with no clear reasoning trail
  • Overconfidence when the evidence is weak