Portfolio - RAG Chatbot
Builder at Personal Project · 2025
Built a full-stack AI chatbot from scratch to demonstrate that I don't just spec AI products—I build them.
The Challenge
Most PM portfolios are static pages listing past experience. I wanted to build something that demonstrates I can go beyond writing specs and actually design and implement an AI system end-to-end. The goal: a portfolio that is itself a product, showcasing RAG architecture, prompt engineering, and modern development workflows.
My Approach
Chat-First Product Decision
Made the chat interface the homepage itself—visitors land directly in a conversation with an AI that knows my background. This forces the AI implementation to be excellent, since it's the first thing people experience.
RAG Architecture Design
Designed a retrieval-augmented generation pipeline: content chunked into semantic pieces, embedded with OpenAI (1536 dimensions), stored in PostgreSQL + pgvector, and retrieved via cosine similarity to ground Claude's responses in real portfolio content.
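For concreteness, here is a minimal TypeScript sketch of that pipeline. The table and column names (`content_chunks`, `body`, `embedding`), the top-k value, and the exact model ids are illustrative assumptions, not the project's actual code.

```typescript
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";
import { Pool } from "pg";

const openai = new OpenAI();       // reads OPENAI_API_KEY
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY
const pool = new Pool();           // reads PG* connection env vars

// Retrieve the k chunks closest to the query embedding.
// `<=>` is pgvector's cosine-distance operator; smaller = more similar.
async function retrieve(query: string, k = 5): Promise<string[]> {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small", // 1536-dimension embeddings
    input: query,
  });
  const embedding = JSON.stringify(data[0].embedding); // "[0.1,...]" literal pgvector accepts
  const { rows } = await pool.query(
    `SELECT body
       FROM content_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [embedding, k],
  );
  return rows.map((r) => r.body);
}

// Ground Claude's answer in the retrieved portfolio content.
async function answer(query: string): Promise<string> {
  const chunks = await retrieve(query);
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest", // model id illustrative
    max_tokens: 1024,
    system:
      "Answer using only the portfolio content below. " +
      "If the answer is not in the content, say so.\n\n" +
      chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n"),
    messages: [{ role: "user", content: query }],
  });
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
```

Putting the retrieved chunks in the system prompt keeps the user turn clean and makes it easy to swap retrieval strategies without changing the conversation shape.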
AI-Assisted Development
Used Claude Code with specialized agents (planner, architect, tdd-guide, code-reviewer, security-reviewer) orchestrated through an autonomous workflow. This meta-approach demonstrates how AI tools can accelerate product delivery while maintaining quality gates.
Quality Engineering
Applied strict TDD methodology: tests written before implementation, code review on every change, security review on all API endpoints. 200+ tests ensure the system works reliably and can be extended confidently.
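To give a flavor of the test-first loop, here is a sketch of the kind of spec that gets written before the implementation exists. The `retrieve` helper, its module path, and the threshold behavior are hypothetical, not the project's actual suite.

```typescript
import { describe, it, expect } from "vitest";
import { retrieve } from "./rag"; // hypothetical module under test

describe("retrieve", () => {
  it("returns at most k chunks for a relevant query", async () => {
    const chunks = await retrieve("What did you build in 2025?", 3);
    expect(chunks.length).toBeGreaterThan(0);
    expect(chunks.length).toBeLessThanOrEqual(3);
  });

  it("returns nothing for queries unrelated to the corpus", async () => {
    // Assumes retrieve applies a minimum-similarity threshold.
    const chunks = await retrieve("how do I repair a carburetor?", 3);
    expect(chunks).toEqual([]);
  });
});
```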
Evaluation Pipeline
Built an evals system following an error-analysis-first methodology: reviewed real interactions to categorize failure modes, then created separate evaluators for retriever quality and generator quality. This treats the chatbot as a measurable product, not a black box.
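A rough sketch of how the two evaluators stay separate. The hook signatures, metrics, and failure-mode labels are illustrative; the real categories come from the error analysis described above.

```typescript
// Retriever eval: hit rate over hand-labeled (query, expected chunk) pairs.
// If the right chunk never comes back, no prompt change will fix the answer.
type RetrieverCase = { query: string; expectedChunkId: string };

async function retrieverRecallAt5(
  cases: RetrieverCase[],
  retrieveIds: (q: string, k: number) => Promise<string[]>, // hook into the retriever
): Promise<number> {
  let hits = 0;
  for (const c of cases) {
    const ids = await retrieveIds(c.query, 5);
    if (ids.includes(c.expectedChunkId)) hits += 1;
  }
  return hits / cases.length;
}

// Generator eval: hold the context fixed and known-good, then check the
// answer against failure modes surfaced by error analysis (labels illustrative).
type FailureMode = "unsupported-claim" | "ignored-context" | "wrong-persona";

type GeneratorCase = {
  query: string;
  goldContext: string;                           // context known to contain the answer
  check: (answer: string) => FailureMode | null; // null = pass
};

async function generatorPassRate(
  cases: GeneratorCase[],
  generate: (q: string, ctx: string) => Promise<string>, // hook into the generator
): Promise<number> {
  let passes = 0;
  for (const c of cases) {
    const answer = await generate(c.query, c.goldContext);
    if (c.check(answer) === null) passes += 1;
  }
  return passes / cases.length;
}
```

A low retriever score points at chunking or embeddings; a low generator score with gold context points at the prompt. The same bad answer never hides which half caused it.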
Key Decisions
- Chat-first homepage over a traditional portfolio layout. The chat IS the product: if the AI experience is compelling, it proves the point better than any case study description.
- pgvector over a dedicated vector DB (Pinecone, Weaviate). For 25+ content chunks, PostgreSQL with the vector extension provides the same capabilities without adding infrastructure complexity or cost; see the sketch after this list.
- Claude for generation, OpenAI for embeddings. Each model excels at its task: Claude's reasoning produces better conversational answers, while OpenAI's embedding model provides efficient vector representations.
- Structured phases with autonomous execution. Breaking the build into 8 clear phases with quality gates enabled rapid parallel development without sacrificing code quality.
- Evaluated the retriever and generator separately rather than treating the RAG pipeline as a single unit. When answers are poor, the fix differs depending on whether the right content wasn't retrieved or the LLM misused good context; separating the two surfaces the real problem.
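To underline how small the pgvector footprint is, here is roughly everything the decision above requires, as plain SQL run from the app. The identifiers and the index note are assumptions that mirror the 1536-dimension design described earlier.

```typescript
// The entire "vector database", as two SQL statements (names illustrative).
// At a few dozen chunks a sequential scan is already fast; an ivfflat/hnsw
// index only starts to matter at much larger scale.
export const MIGRATION = `
  CREATE EXTENSION IF NOT EXISTS vector;

  CREATE TABLE IF NOT EXISTS content_chunks (
    id        TEXT PRIMARY KEY,
    body      TEXT NOT NULL,
    embedding VECTOR(1536) NOT NULL  -- matches the 1536-dim OpenAI embeddings
  );
`;

// Cosine similarity = 1 - cosine distance (pgvector's <=> operator).
export const TOP_K_QUERY = `
  SELECT id, body, 1 - (embedding <=> $1::vector) AS similarity
    FROM content_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT $2;
`;
```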
Results
This project demonstrates end-to-end AI product execution: from architecture decisions and vector database design to a production-ready chat interface, all built in a structured, test-driven workflow.
- Development Speed: 8 phases, shipped in days, not weeks
- Test Coverage: 200+ tests passing (TDD approach)
- Response Time: <2s from RAG query to answer
Key Learnings
- AI-assisted development changes the PM-engineer dynamic. Using Claude Code with specialized agents let me move from "writing specs for engineers" to "directing AI agents through implementation", a workflow that will define the next generation of technical PMs.
- RAG architecture is about trade-offs, not complexity. The core pattern (chunk → embed → retrieve → generate) is straightforward; the real skill is choosing the right chunk size, similarity threshold, and prompt structure to get useful answers (see the sketch after this list).
- Products differentiate through execution, not features. Every PM could list "built a chatbot" on their portfolio. Actually shipping one, with tests, security reviews, and production-quality code, demonstrates a fundamentally different level of capability.
- AI products need evals the way traditional products need analytics. Off-the-shelf metrics like "hallucination score" are disconnected from real user problems; starting with error analysis of actual interactions and categorizing failure modes produces evaluators that actually improve the product.
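To make those knobs concrete, they can be collected into a single config object. The values below are plausible starting points, not the project's tuned settings.

```typescript
// The handful of parameters that dominate RAG answer quality.
// All values are illustrative defaults, not tuned production settings.
const ragConfig = {
  chunking: {
    maxTokens: 300,    // small enough to stay on one topic per chunk
    overlapTokens: 50, // overlap so facts aren't split across boundaries
  },
  retrieval: {
    topK: 5,             // how many chunks the generator sees
    minSimilarity: 0.75, // below this, treat the query as out-of-corpus
  },
  generation: {
    maxTokens: 1024,
    // Prompt structure: instructions first, then numbered context chunks,
    // then the user question, so the model can point to what it used.
  },
} as const;
```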
Tech Stack
Claude (generation) · OpenAI embeddings (1536-dim) · PostgreSQL + pgvector · Claude Code
Have questions about this project or want to know more?
Ask the AI Assistant