Production System
AskAlan
A production AI teaching assistant that gives University of Toronto engineering students trusted, instructor-approved access to their own course material — deployed across 19 courses and answering thousands of student questions.
Python · FastAPI · PostgreSQL · Anthropic Claude · RAG
Production system · University of Toronto
01 / Overview
AskAlan is a retrieval-augmented AI assistant that gives University of Toronto engineering students trusted access to their instructors' own course material — lectures, exercises, past exams, and announcements — instead of the generic, unsourced answers public chatbots give. It began as a capstone project and is now a production system maintained by a faculty-led team, deployed across 19 courses.
I led the consolidation of its retrieval stack onto Postgres. The first phase migrated the keyword-search layer off Elasticsearch behind a behavior-preserving adapter, mirroring roughly 118,000 document chunks across all 19 courses and validating ranking parity against more than 8,000 real student queries before any cutover. The migration cut keyword-search latency by about 40% and removed one of three separate datastores from the operational footprint, with dual-read/dual-write safety and instant rollback available throughout.
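The adapter pattern described above can be sketched roughly as follows. This is an illustrative sketch, not AskAlan's actual code: `KeywordSearch`, `InMemorySearch`, `DualSearchAdapter`, and the `serve_from_candidate` flag are hypothetical names, and the toy in-memory engine stands in for both Elasticsearch and Postgres. It shows the general shape of dual-write, dual-read comparison, and a single flag that decides which engine serves traffic.

```python
from dataclasses import dataclass, field
from typing import Protocol


class KeywordSearch(Protocol):
    """Interface both engines are assumed to satisfy (hypothetical)."""

    def index(self, doc_id: str, text: str) -> None: ...
    def search(self, query: str, k: int) -> list[str]: ...


class InMemorySearch:
    """Toy stand-in for a real engine: ranks docs by number of matched terms."""

    def __init__(self) -> None:
        self.docs: dict[str, set[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = set(text.lower().split())

    def search(self, query: str, k: int) -> list[str]:
        terms = query.lower().split()
        scored = [(sum(t in words for t in terms), doc_id)
                  for doc_id, words in self.docs.items()]
        hits = [(score, doc_id) for score, doc_id in scored if score > 0]
        # Highest score first; ties broken by doc_id so results are stable.
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in hits[:k]]


@dataclass
class DualSearchAdapter:
    legacy: KeywordSearch
    candidate: KeywordSearch
    serve_from_candidate: bool = False  # flipping this back is the rollback
    mismatches: list[str] = field(default_factory=list)

    def index(self, doc_id: str, text: str) -> None:
        # Dual-write: keep both engines in sync during the migration.
        self.legacy.index(doc_id, text)
        self.candidate.index(doc_id, text)

    def search(self, query: str, k: int = 5) -> list[str]:
        # Dual-read: query both engines and record any disagreement.
        old = self.legacy.search(query, k)
        new = self.candidate.search(query, k)
        if old != new:
            self.mismatches.append(query)
        return new if self.serve_from_candidate else old


adapter = DualSearchAdapter(InMemorySearch(), InMemorySearch())
adapter.index("lec1", "Laplace transform basics")
adapter.index("exam2", "past exam on Laplace and Fourier transform")
print(adapter.search("laplace transform"), adapter.mismatches)
```

Because callers only ever see the adapter, cutover and rollback in this sketch reduce to changing one boolean, and the `mismatches` list gives a running record of queries where the two engines disagree.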
Beyond the migration, I worked across the stack: rebuilding multimodal retrieval so exam and image content surfaces correctly, instrumenting end-to-end token and cost tracking across five model providers, cutting streaming latency by about 35%, and hardening the live deployment's secret handling. The throughline is shipping reliable changes inside a real system with real users and real constraints, and leaving it measurably faster and cheaper to run.
02 / Key Features
Zero-Downtime Search Migration
Re-architected the retrieval backbone from Elasticsearch onto Postgres behind a behavior-preserving adapter, with dual-read/dual-write validation and one-flag rollback, so live courses saw no disruption.
Validated at Production Scale
Replayed 8,000+ real student queries through both engines to prove ranking parity before cutover, and cut keyword-search latency roughly 40%.
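One way a query-replay parity check like this could be scored is sketched below. The metrics (Jaccard overlap of the top-k results and top-1 agreement) and the names `overlap_at_k` and `parity_report` are assumptions chosen for illustration; the source does not say which parity measure was actually used.

```python
from statistics import mean
from typing import Callable


def overlap_at_k(a: list[str], b: list[str], k: int) -> float:
    """Jaccard overlap of two top-k result sets (1.0 means the same documents)."""
    top_a, top_b = set(a[:k]), set(b[:k])
    if not top_a and not top_b:
        return 1.0  # both engines returned nothing: treat as agreement
    return len(top_a & top_b) / len(top_a | top_b)


def parity_report(
    queries: list[str],
    search_old: Callable[[str], list[str]],
    search_new: Callable[[str], list[str]],
    k: int = 5,
) -> dict[str, float]:
    """Replay every query through both engines and summarise agreement."""
    if not queries:
        raise ValueError("need at least one query to replay")
    overlaps = []
    top1_matches = 0
    for q in queries:
        old, new = search_old(q), search_new(q)
        overlaps.append(overlap_at_k(old, new, k))
        top1_matches += int(old[:1] == new[:1])
    return {
        "queries": len(queries),
        "mean_overlap_at_k": mean(overlaps),
        "top1_agreement": top1_matches / len(queries),
    }


# Hypothetical recorded results standing in for the two engines.
old_results = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
new_results = {"q1": ["a", "c", "b"], "q2": ["y", "z"]}
report = parity_report(["q1", "q2"], old_results.__getitem__,
                       new_results.__getitem__, k=3)
print(report)
```

A set-based overlap ignores ordering within the top k, so a stricter check could add a rank-correlation measure; the top-1 agreement figure is included here as a cheap proxy for ordering.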
Multimodal Retrieval
Rebuilt the pipeline so lecture text and exam-image content surface together, re-embedding thousands of page images and fixing an embedding defect that was burying real results.
Full-Stack Cost Observability
Instrumented every model and embedding call across five providers for per-course, per-feature cost attribution, surfaced in live faculty and developer dashboards.
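Per-course, per-feature cost attribution of this kind can be sketched as a small ledger that every model or embedding call reports into. Everything named here is hypothetical: the `Usage` and `CostLedger` classes, the provider and model names, the course code, and especially the prices, which are placeholders rather than real rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder prices in USD per million tokens: (input, output). Not real rates.
PRICES = {
    ("provider_a", "chat-model"): (3.00, 15.00),
    ("provider_b", "embed-model"): (0.10, 0.00),
}


@dataclass(frozen=True)
class Usage:
    provider: str
    model: str
    course: str
    feature: str
    input_tokens: int
    output_tokens: int = 0


class CostLedger:
    def __init__(self, prices: dict[tuple[str, str], tuple[float, float]]) -> None:
        self.prices = prices
        self.totals: dict[tuple[str, str], float] = defaultdict(float)

    def record(self, u: Usage) -> float:
        """Price one call and attribute it to its (course, feature) bucket."""
        in_price, out_price = self.prices[(u.provider, u.model)]
        cost = (u.input_tokens * in_price + u.output_tokens * out_price) / 1_000_000
        self.totals[(u.course, u.feature)] += cost
        return cost

    def by_course(self) -> dict[str, float]:
        """Roll the per-feature buckets up to one total per course."""
        rolled: dict[str, float] = defaultdict(float)
        for (course, _feature), cost in self.totals.items():
            rolled[course] += cost
        return dict(rolled)


ledger = CostLedger(PRICES)
ledger.record(Usage("provider_a", "chat-model", "COURSE101", "chat", 1_000_000, 100_000))
ledger.record(Usage("provider_b", "embed-model", "COURSE101", "indexing", 2_000_000))
print(dict(ledger.totals))
print(ledger.by_course())
```

Keying the totals by `(course, feature)` keeps the finest-grained view as the source of truth, so coarser dashboard views such as per-course totals can be derived from it rather than tracked separately.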
Streaming Latency Optimization
Reordered the response-streaming path to cut perceived latency about 35%, improving time-to-first-token on every conversation.