Production Builds

Participants build systems, not disposable projects.

Every build exists to teach a discipline: context, agents, retrieval, evaluation, reliability, product judgment, or outreach systems.

The sprint is implementation-first. Participants build AI systems live, document architectural decisions, evaluate failure modes, and turn the work into proof-of-work that can survive technical scrutiny.

Build Families

Projects are selected to teach engineering disciplines.

The goal is not to collect portfolio thumbnails. Each build forces participants to learn a system pattern that appears in serious AI products.

AI Coding Systems

Participants study coding agents as systems: context loading, planning, edits, tests, review, rollback, logs, and human approval before risky changes.

Planning, Repo Context, Patch Review, Test Loops, Rollback

Retrieval + Knowledge Systems

Participants build ingestion, chunking, embeddings, hybrid search, reranking, context assembly, citation behavior, and retrieval evaluation.

Ingestion, Embeddings, Hybrid Search, Reranking, Citations
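
As one illustration of the hybrid search step, a keyword ranking and an embedding ranking can be merged with reciprocal rank fusion (RRF). This is a minimal sketch under stated assumptions, not the cohort's method: the document IDs, the two input rankings, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a fusion step like this usually sits before reranking and context assembly.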

Agent Reliability Harnesses

Participants design agents that can handle ambiguity, missing capabilities, policy boundaries, tool failures, token budgets, and inconsistent user intent.

Ambiguity, Policies, Tool Use, Retries, Verifiers

Evaluation Platforms

Participants create datasets, slices, judges, regression gates, traces, dashboards, and review workflows for AI systems that must improve safely.

Golden Sets, Slices, LLM Judges, Tracing, Regression Gates
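
A regression gate can be as simple as comparing per-slice pass rates for a candidate system against a stored baseline. The sketch below is illustrative only: the slice names, the pass rates, and the 0.02 tolerance are assumptions, not values from the cohort.

```python
def regression_gate(baseline, candidate, tolerance=0.02):
    """Compare per-slice pass rates and return (passed, failures).

    baseline and candidate map slice name -> pass rate in [0, 1].
    A slice fails when it is missing from the candidate or its pass rate
    drops by more than `tolerance` (an assumed threshold).
    """
    failures = {}
    for slice_name, base_rate in baseline.items():
        cand_rate = candidate.get(slice_name)
        if cand_rate is None or base_rate - cand_rate > tolerance:
            failures[slice_name] = (base_rate, cand_rate)
    return (not failures, failures)


# Hypothetical pass rates from two evaluation runs.
baseline = {"ambiguous_requests": 0.80, "tool_failures": 0.90}
candidate = {"ambiguous_requests": 0.85, "tool_failures": 0.70}
passed, failures = regression_gate(baseline, candidate)
print(passed, failures)
```

Gating per slice rather than on the overall average keeps an improvement in one slice from hiding a regression in another.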

AI Product Systems

Participants connect model behavior to product workflows: user intent, backend services, queues, observability, cost, latency, and deployment.

Backend, Queues, Observability, Cost, Deployment

Outreach + Operator Systems

Participants learn to build AI-assisted systems for research, ideal customer profile (ICP) design, campaign debugging, personalization, CRM hygiene, and opportunity creation.

ICP, Research, Campaigns, CRM, Follow-up

Build Lifecycle

Participants are evaluated on the system around the demo.

The live build process forces participants to design first, implement with AI, verify independently, and explain what the artifact can and cannot do.

01 Frame

Define the user, task, constraints, risks, and what success would actually mean.

02 Design

Write architecture, invariants, interfaces, context strategy, and evaluation plan before coding.

03 Build

Implement with AI assistance while preserving checkpoints, logs, and human decision ownership.

04 Evaluate

Run tests, slices, traces, critics, regressions, and failure probes against the system.

05 Defend

Explain tradeoffs, rejected alternatives, limits, cost, latency, and what would fail in production.

06 Publish

Turn the build into proof-of-work: demo, writeup, architecture note, and evaluation report.

Hard Constraints

Builds are tested against the conditions that break real agents.

Participants build more than happy-path demos. They learn how systems behave when tools are missing, context is incomplete, policies constrain behavior, costs matter, latency matters, and a human must approve risky actions.

Ambiguous or incomplete user requests
Missing tools or unsupported capabilities
Policy-constrained actions
Token and latency budgets
Expensive model calls
Context overflow
Tool failures and retries
Human approval gates
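
Two of these constraints, tool failures with retries and human approval gates, can be combined in a small wrapper around tool calls. This is a hedged sketch rather than a prescribed design: `call_tool`, `ApprovalRequired`, the `risky` and `approved` flags, and the flaky example tool are all hypothetical names introduced for illustration.

```python
import time


class ApprovalRequired(Exception):
    """Raised when a risky action has not been approved by a human."""


def call_tool(tool, args, *, risky=False, approved=False,
              max_attempts=3, base_delay=0.5):
    """Call tool(**args) behind an approval gate, with bounded retries."""
    if risky and not approved:
        raise ApprovalRequired("human approval needed before this action")
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**args)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # Retry budget exhausted: surface the failure.
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical flaky tool: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_search(query):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary tool failure")
    return f"results for {query}"


result = call_tool(flaky_search, {"query": "pricing"}, base_delay=0)
print(result, attempts["count"])
```

The approval check runs before any attempt, so a risky action is never retried into execution without a human decision.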

Capstone Directions

Final systems are chosen for depth, defensibility, and signal.

Participants may work in different product directions, but every serious capstone must include architecture, evaluation, deployment thinking, and proof-of-work.

PR Review Agent

A system that reviews changes, checks codebase consistency, runs verification, and explains risk before approval.

Research Agent

A workflow that reads sources, extracts claims, compares evidence, tracks uncertainty, and produces citation-aware briefs.

Knowledge Agent

A retrieval system with memory, citations, evals, context assembly, and clear failure behavior.

Sales or Outreach Agent

A disciplined operator system for account research, message generation, follow-up logic, and campaign debugging.

Agent Observability Platform

A dashboard for traces, decisions, tool calls, cost, latency, failure modes, and regression signals.

Evaluation Harness

A reusable test bed for checking reliability across slices, ambiguous requests, policies, and repeated runs.

Project Details

Exact build briefs stay inside the cohort.

This page shows the public build architecture. The counsellor flow covers exact weekly projects, datasets, deliverables, evaluation rubrics, and expected proof-of-work.

Talk to a counsellor