RAG Chatbot Build Guide: From Plan to Production
How We Built It, What Broke, and Where It’s Going
The Vision
Turn 212+ blog posts into a searchable AI knowledge base. Ask questions in natural language, get answers grounded in actual blog content with source citations. No hallucinations — every answer traced back to a real post.
The constraint: build it entirely on existing infrastructure. No new services, no new vendors, no new monthly bills. Use the n8n-fabric stack that was already running.
How It Was Supposed to Work
The Original Plan
Two n8n workflows, six API calls, done in an afternoon:
- Indexing Workflow — Manual trigger reads all 212 markdown files from the Astro blog directory, parses frontmatter, enriches content with metadata, chunks it, embeds it with OpenAI text-embedding-3-small (1536 dims), and stores vectors in Qdrant Cloud.
- Chat Workflow — Chat Trigger feeds into an AI Agent backed by Claude Sonnet 4.5. The agent has a Qdrant vector search tool. User asks a question → agent searches vectors → retrieves relevant chunks → synthesizes an answer with citations.
Expected timeline: 2-3 hours. Actual timeline: ~12 hours of debugging across two sessions.
The Original Stack Choice
| Component | Original Choice | What We Ended Up With |
|---|---|---|
| Embeddings | OpenAI text-embedding-3-small (1536d) | Voyage AI voyage-3 (1024d) |
| Chat LLM | Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) | Claude 3.5 Haiku (claude-3-5-haiku-20241022) |
| Vector DB | Qdrant Cloud | Qdrant Cloud (same, but recreated collection) |
| Orchestrator | n8n workflows only | n8n workflows + Python rate-limiting script |
| Auth mode | Chat Trigger internal only | Chat Trigger public + hosted chat |
What We Actually Had to Do
Problem 1: Docker Filesystem Isolation
The wall: n8n runs in Docker. Blog posts live on the host at /Users/ryandahlberg/Projects/blog/src/content/posts/. The container can’t see host files.
The fix: Add a read-only volume mount to docker-compose.yml:
volumes:
- n8n_data:/home/node/.n8n
- /Users/ryandahlberg/Projects/blog/src/content/posts:/data/blog-posts:ro
Inside the container, blog posts appear at /data/blog-posts/.
Problem 2: n8n Code Node Sandbox
The wall: n8n sandboxes JavaScript in Code nodes. require('fs') throws "Module 'fs' is disallowed".
The fix: Environment variable in docker-compose.yml:
environment:
- NODE_FUNCTION_ALLOW_BUILTIN=fs,path
One line, one restart. But discovering this took far longer than writing it.
Problem 3: Qdrant Cloud Key Confusion
The wall: Qdrant Cloud has two key types:
- Management keys — UUID|token format, gRPC-based, for cluster CRUD operations
- Database API keys — JWT format, REST-based, for reading/writing vectors
We spent significant time authenticating with management keys against the REST data API (and vice versa). They return 401 either way with no indication you’re using the wrong key type.
The fix: Went into the Qdrant Cloud dashboard in the browser, found the cluster, navigated to Access Management, and got the correct JWT database key.
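A quick way to confirm you finally have the right key type is to hit the REST data API directly; a minimal sketch in Python (the cluster URL and key are placeholders):
import requests

QDRANT_URL = "https://CLUSTER_URL:6333"  # the cluster's REST endpoint
DB_KEY = "eyJ..."                        # JWT-format database key, not the UUID|token management key

# Listing collections only needs the database key: a 200 means the key type
# is right, a 401 means you are still holding the wrong kind of key.
resp = requests.get(f"{QDRANT_URL}/collections", headers={"api-key": DB_KEY}, timeout=10)
print(resp.status_code, resp.json())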
Problem 4: OpenAI Quota Wall
The wall: OpenAI API returned 429 — You exceeded your current quota. No credits on the key.
The fix: Switched to Voyage AI (Anthropic’s recommended embeddings provider). Key insight: n8n’s OpenAI Embeddings node accepts a custom base URL. We created an “OpenAI” credential in n8n but pointed it at https://api.voyageai.com/v1 with a Voyage AI key. The node worked without code changes because Voyage AI implements the OpenAI-compatible API standard.
The consequence: Voyage AI’s voyage-3 produces 1024-dimensional embeddings vs. OpenAI’s 1536. This meant destroying the Qdrant collection and recreating it with the correct dimensions.
# Delete old collection
curl -X DELETE "https://CLUSTER_URL:6333/collections/ryops_blog" \
-H "api-key: DB_KEY"
# Create with 1024 dimensions
curl -X PUT "https://CLUSTER_URL:6333/collections/ryops_blog" \
-H "api-key: DB_KEY" \
-H "Content-Type: application/json" \
-d '{"vectors": {"size": 1024, "distance": "Cosine"}}'
Problem 5: Rate Limit Marathon
The wall: Voyage AI free tier: 3 requests/minute, 10,000 tokens/minute. We had ~2,500 chunks to embed. The n8n workflow has no built-in rate limiting — it fired all requests immediately and hit 429 after 36 posts.
The fix: A standalone Python script (index_remaining_v3.py, sketched below) that:
- Batches 8 chunks per API request (~3,000 tokens, under the 10K TPM limit)
- Waits 25 seconds between requests (under the 3 RPM limit)
- Tracks already-indexed filenames via Qdrant scroll API
- Resumes from where it left off on restart
- Exponential backoff on 429 errors (caps at 10 minutes)
The numbers: 2,066 chunks across 175 remaining posts, zero failed batches, ~108 minutes total.
The first 36 posts (845 chunks) were indexed by the n8n workflow before it hit rate limits. The Python script handled the remaining 175 posts.
Total: 2,911 vector points across 212+ posts.
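The full script isn’t reproduced here, but a minimal sketch of its batching and backoff loop, assuming the requests and qdrant-client packages (URLs, keys, and function names below are illustrative placeholders):
import time, uuid, requests
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

BATCH_SIZE = 8     # ~3,000 tokens per request, under the 10K TPM limit
WAIT_SECONDS = 25  # stays under the 3 RPM limit
MAX_RETRIES = 8

client = QdrantClient(url="https://CLUSTER_URL:6333", api_key="DB_KEY")

def embed_batch(texts, retries=0):
    """Embed one batch via Voyage AI, backing off exponentially on 429 (60s doubling, capped at 600s)."""
    resp = requests.post(
        "https://api.voyageai.com/v1/embeddings",
        headers={"Authorization": "Bearer pa-..."},
        json={"model": "voyage-3", "input": texts},
        timeout=60,
    )
    if resp.status_code == 429 and retries < MAX_RETRIES:
        time.sleep(min(60 * 2 ** retries, 600))
        return embed_batch(texts, retries + 1)
    resp.raise_for_status()
    return [d["embedding"] for d in resp.json()["data"]]

def index_chunks(chunks):
    """chunks: list of dicts with 'content' and 'metadata' (including 'filename')."""
    # The real script also calls client.scroll(...) first to collect already-indexed
    # filenames so a restarted run resumes where it left off.
    for i in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[i : i + BATCH_SIZE]
        vectors = embed_batch([c["content"] for c in batch])
        client.upsert(
            collection_name="ryops_blog",
            points=[
                PointStruct(id=str(uuid.uuid4()), vector=v,
                            payload={"content": c["content"], "metadata": c["metadata"]})
                for c, v in zip(batch, vectors)
            ],
        )
        time.sleep(WAIT_SECONDS)  # 25-second pause between requests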
Problem 6: n8n Resource Locator Format
The wall: n8n’s Anthropic Chat Model node (typeVersion 1.3) stores the model name as a resource locator object, not a plain string. Setting "model": "claude-sonnet-4-5-20250929" via the API causes "Could not get parameter" at runtime.
The fix: Use the __rl resource locator format:
{
"model": {
"__rl": true,
"mode": "list",
"value": "claude-3-5-haiku-20241022",
"cachedResultName": "Claude 3.5 Haiku(20241022)"
}
}
This is an n8n internals detail that isn’t documented anywhere. You’d only discover it by reading the node source code inside the Docker container.
Problem 7: Chat Trigger Webhook Not Registering
The wall: After activating the workflow, the chat URL returned 404. Logs showed "The requested webhook is not registered".
The fix: The Chat Trigger node has a public parameter (default: false). When false, it only works through n8n’s internal editor — no external webhook is registered. Setting public: true and mode: "hostedChat" registers the webhook and serves the built-in chat UI.
{
"public": true,
"mode": "hostedChat",
"options": {}
}
Problem 8: Anthropic Model 404
The wall: claude-sonnet-4-5-20250929 and claude-3-5-sonnet-20241022 both returned "The resource you are requesting could not be found". The API key doesn’t have access to Sonnet-tier models.
The fix: Switched to claude-3-5-haiku-20241022, which works with the current key and still produces excellent RAG responses with proper citations.
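When a model name 404s, listing the models visible to the key is the quickest way to see which tier you actually have; a minimal sketch with the Anthropic Python SDK, assuming a recent SDK version that exposes the models endpoint:
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Print every model this key can call; here the Haiku models showed up
# and the Sonnet-tier models did not.
for model in client.models.list():
    print(model.id)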
What It Does Today
Live System
- Chat URL: http://localhost:5678/webhook/blog-rag-chat/chat (a sample call appears below this list)
- Hosted UI: n8n serves a built-in chat widget — no frontend code needed
- Knowledge base: 2,911 vector chunks from 212+ blog posts
- Source attribution: Every answer cites post titles with direct URLs
- Multi-turn: 20-message conversation memory for follow-up questions
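The hosted UI is the easiest way in, but the same webhook can be called programmatically; a minimal sketch using Python’s requests. The chatInput and sessionId field names follow the payload the n8n chat widget sends and should be treated as assumptions here, and the shape of the JSON response depends on how the workflow replies:
import uuid
import requests

CHAT_URL = "http://localhost:5678/webhook/blog-rag-chat/chat"
session_id = str(uuid.uuid4())  # reuse the same ID to keep the 20-message memory

resp = requests.post(
    CHAT_URL,
    json={"sessionId": session_id, "chatInput": "What MCP servers has Ryan built?"},
    timeout=120,
)
print(resp.json())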
Sample Interactions
Q: “What MCP servers has Ryan built?” → Returns 6 MCP servers (n8n, Checkmk, Wazuh, UniFi, Netdata, Talos) with links to each post.
Q: “What security topics has Ryan written about?” → Returns OWASP, Zero Trust, Incident Response, Kubernetes Security, Post-Mortem Analysis — all with source links.
Q: “What is Cortex and how does it work?” → Returns workflow executor details, zero-downtime prompt migrations, and task lineage — all sourced from specific blog posts.
The Final Stack
| Layer | Component | Details |
|---|---|---|
| Orchestration | n8n v2.6.3 | Self-hosted Docker, AI/LangChain nodes |
| Vector DB | Qdrant Cloud | 1024-dim Cosine, ryops_blog collection |
| Embeddings | Voyage AI voyage-3 | Via OpenAI-compatible API |
| Chat LLM | Claude 3.5 Haiku | Temperature 0.3, 4096 max tokens |
| Infrastructure | Docker Compose | n8n + PostgreSQL + local Qdrant + Redis |
| Content Source | Astro blog | 212+ markdown posts with YAML frontmatter |
Future Possibilities
Near-Term
- Auto-reindexing webhook — Trigger indexing when new posts are published (git push → webhook → index new post)
- Metadata filtering — Search within specific categories, date ranges, or tag sets
- Hybrid search — Combine vector similarity with BM25 keyword matching for better recall
- Upgrade to Sonnet — Update the Anthropic API key to one with Sonnet access for higher-quality reasoning
- Public-facing widget — Embed the chat on the live blog at ry-ops.dev
Medium-Term
- Evaluation pipeline — Automated retrieval quality testing with golden question/answer sets
- Streaming responses — Switch from batch response to token streaming for better UX
- Multi-source RAG — Index GitHub repos, project READMEs, and documentation alongside blog posts
- MCP integration — Expose the RAG chatbot as an MCP tool so Claude Code can query blog knowledge
Long-Term
- Agentic RAG — Multi-step reasoning where the agent decomposes complex questions, searches multiple times, and synthesizes across sources
- Knowledge graph overlay — Build entity relationships between blog posts (which posts reference which projects, which concepts build on others)
- Self-updating knowledge base — The chatbot monitors for new posts, indexes them automatically, and can summarize what’s new
- Cross-fabric integration — Cortex workflows that use the RAG system for contextual decision-making
Architecture
High-Level System Architecture
graph TB
subgraph "HOST MACHINE"
Blog[Astro Blog<br/>212+ .md files<br/>/Projects/blog/]
subgraph "n8n-fabric Docker Compose"
N8N[n8n<br/>:5678<br/>AI/LangChain Nodes]
PG[(PostgreSQL<br/>:5432<br/>Workflow Storage)]
LocalQ[(Local Qdrant<br/>:6343)]
Redis[(Redis<br/>:6389<br/>Cache)]
end
end
subgraph "EXTERNAL SERVICES"
Voyage[Voyage AI<br/>voyage-3<br/>1024 dims]
QCloud[(Qdrant Cloud<br/>ryops_blog<br/>2,911 points)]
Claude[Anthropic<br/>Claude 3.5 Haiku<br/>temp 0.3]
end
Blog -->|read-only mount| N8N
N8N --> PG
N8N --> LocalQ
N8N --> Redis
N8N -->|HTTPS| Voyage
N8N -->|HTTPS| QCloud
N8N -->|HTTPS| Claude
Indexing Workflow
graph LR
Trigger[Manual Trigger] --> List[List Blog Files<br/>Code + fs]
List --> Parse[Parse & Enrich<br/>Markdown<br/>YAML frontmatter]
Parse --> Insert[Insert into Qdrant<br/>vectorStoreQdrant]
subgraph "Insert Sub-nodes"
Load[Load Blog Content<br/>dataLoader]
Chunk[Chunk Text<br/>textSplitter<br/>1500c / 300c]
Embed[Voyage AI Embeddings<br/>voyage-3, 1024d]
Load --> Chunk
Chunk --> Embed
end
Insert -.contains.-> Load
Embed --> QCloud[(Qdrant Cloud<br/>ryops_blog)]
Chat Workflow
graph TB
ChatTrigger[Chat Trigger<br/>public, hosted<br/>webhook/blog-rag-chat/chat]
ChatTrigger --> Agent[Blog Assistant<br/>AI Agent]
subgraph "Agent Components"
LLM[Claude 3.5 Haiku<br/>temp: 0.3<br/>max: 4096]
Memory[Memory Buffer<br/>Window<br/>20 messages]
Tool[search_blog_posts<br/>Qdrant tool<br/>topK: 8]
subgraph "Tool Sub-node"
ToolEmbed[Voyage AI<br/>Embeddings<br/>voyage-3]
end
end
Agent --> LLM
Agent --> Memory
Agent --> Tool
Tool -.contains.-> ToolEmbed
Tool --> QCloud[(Qdrant Cloud)]
Content Enrichment Pipeline
graph TD
Raw[Raw Markdown File<br/>---<br/>title: 'Post Title'<br/>category: AI & ML<br/>tags: RAG, n8n<br/>---<br/># Content...]
Raw --> Enrich[Parse & Enrich<br/>Prepend metadata to body]
Enrich --> Enriched[Enriched Content<br/>Title: Post Title<br/>Category: AI & ML<br/>Tags: RAG, n8n<br/><br/># Content...]
Enriched --> ChunkProc[Chunk<br/>1500 chars<br/>300 overlap]
ChunkProc --> Chunks[Chunk 1 + metadata<br/>Chunk 2 + metadata<br/>Chunk N + metadata]
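As a sketch in Python, the enrichment step amounts to prepending a metadata header before chunking, so every chunk carries its post’s context. This assumes the python-frontmatter package; the splitter below is a plain character splitter standing in for n8n’s textSplitter node:
import frontmatter

def enrich(path):
    """Return (enriched_text, metadata) for one markdown post."""
    post = frontmatter.load(path)
    meta = post.metadata
    tags = meta.get("tags", [])
    tags = ", ".join(tags) if isinstance(tags, list) else tags
    header = (
        f"Title: {meta.get('title', '')}\n"
        f"Category: {meta.get('category', '')}\n"
        f"Tags: {tags}\n\n"
    )
    return header + post.content, meta

def chunk(text, size=1500, overlap=300):
    """Split into 1,500-character chunks with 300 characters of overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start : start + size])
        start += size - overlap
    return chunks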
Indexing Data Flow
flowchart TD
Start([START]) --> Read[Read markdown files<br/>from /data/blog-posts/<br/>212+ .md/.mdx files]
Read --> ParseYAML[Parse YAML frontmatter<br/>Extract: title, category,<br/>tags, date, description]
ParseYAML --> Enrich[Enrich content<br/>Prepend metadata header<br/>for semantic context]
Enrich --> Chunk[Chunk text<br/>1,500 chars per chunk<br/>300 char overlap<br/>~12 chunks per post avg]
Chunk --> Batch[Batch chunks<br/>8 chunks per API request<br/>~3,000 tokens per batch]
Batch --> GenEmbed[Generate embeddings<br/>Voyage AI voyage-3<br/>→ 1024-dim float vector<br/>Rate limit: 3 RPM]
GenEmbed --> Upsert[Upsert to Qdrant Cloud<br/>UUID point ID<br/>Vector + payload<br/>content + metadata]
Upsert --> Wait[Wait 25 seconds<br/>rate limit compliance<br/>Then → next batch]
Wait --> More{More chunks?}
More -->|yes| Batch
More -->|no| Done([DONE<br/>2,911 vectors stored])
Chat Query Flow
flowchart TD
User([USER<br/>'What is Cortex?']) --> ChatTrigger[Chat Trigger receives<br/>message via webhook<br/>or hosted chat UI]
ChatTrigger --> AgentProc[AI Agent processes<br/>System prompt instructs:<br/>'Always search blog<br/>before answering']
AgentProc --> ToolCall[Agent calls tool:<br/>search_blog_posts<br/>query: 'Cortex']
ToolCall --> EmbedQuery[Embed query<br/>'Cortex' → Voyage AI<br/>→ 1024-dim vector]
EmbedQuery --> VectorSearch[Vector search<br/>Qdrant Cloud cosine<br/>similarity search<br/>→ top 8 matching chunks]
VectorSearch --> Context[Context assembly<br/>8 chunks with metadata<br/>injected into agent<br/>context window]
Context --> Generate[Claude generates response<br/>Grounded in retrieved<br/>chunks, includes source<br/>citations with URLs]
Generate --> Response[Response returned<br/>to chat UI or webhook<br/>caller with citations]
Error Recovery Flow
flowchart TD
Request[API Request] --> Status{Status?}
Status -->|200| OK([OK])
Status -->|429| RateLimit[Rate Limit]
Status -->|other| LogError[Log error<br/>Wait 30s<br/>Retry]
RateLimit --> ExpBackoff[Exponential backoff<br/>60s → 120s<br/>→ 240s → ...<br/>cap 600s]
ExpBackoff --> RetryCheck{Retry #<br/>< 8?}
RetryCheck -->|yes| Retry([Retry])
RetryCheck -->|no| Failed[Log FAILED<br/>Skip batch<br/>Continue]
Technical Deep Dive
Qdrant Cloud Collection Structure
{
"collection": "ryops_blog",
"cluster": "us-west-1, AWS",
"vectors": {
"size": 1024,
"distance": "Cosine"
},
"points": 2911,
"point_structure": {
"id": "UUID",
"vector": "[1024 floats]",
"payload": {
"content": "enriched chunk text",
"metadata": {
"title": "Post Title",
"category": "AI & ML",
"date": "2026-01-15",
"tags": "RAG, n8n, Qdrant",
"slug": "post-slug",
"url": "https://ry-ops.dev/...",
"filename": "2026-01-15-post.md"
}
}
}
}
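A retrieval query against this collection mirrors what the agent’s search_blog_posts tool does: embed the question with voyage-3, then run a cosine similarity search with topK 8. A minimal sketch with the qdrant-client package (URL and keys are placeholders, as above):
import requests
from qdrant_client import QdrantClient

client = QdrantClient(url="https://CLUSTER_URL:6333", api_key="DB_KEY")

# Embed the question with the same model used at index time (1024 dims)
resp = requests.post(
    "https://api.voyageai.com/v1/embeddings",
    headers={"Authorization": "Bearer pa-..."},
    json={"model": "voyage-3", "input": ["What is Cortex and how does it work?"]},
    timeout=60,
)
query_vector = resp.json()["data"][0]["embedding"]

# Top 8 chunks, matching the chat workflow's topK setting
hits = client.search(collection_name="ryops_blog", query_vector=query_vector, limit=8)
for hit in hits:
    meta = hit.payload["metadata"]
    print(f"{hit.score:.3f}  {meta['title']}  {meta['url']}")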
n8n Credential Configuration
Voyage AI (as OpenAI type)
{
"type": "openAiApi",
"baseUrl": "https://api.voyageai.com/v1",
"apiKey": "pa-...",
"note": "OpenAI-compatible API standard"
}
Anthropic
{
"type": "anthropicApi",
"apiKey": "sk-ant-...",
"modelAccess": "Haiku tier only"
}
Qdrant Cloud
{
"type": "qdrantApi",
"url": "https://8b04bf09-...cloud.qdrant.io:6333",
"apiKey": "eyJ... (JWT database key, NOT management key)"
}
The Takeaway
Building RAG systems in production is less about the happy path and more about navigating the obstacles:
- Docker volume mounts for filesystem access
- Sandbox escapes for code execution
- API key confusion across similar-looking credentials
- Quota walls forcing vendor pivots
- Rate limits requiring custom batching logic
- Undocumented internal data structures in orchestration tools
- Model access tiers blocking preferred choices
The final system works beautifully: 2,911 vectors, natural language queries, source-cited answers, multi-turn conversations, all running on existing infrastructure.
But getting there required 12 hours of debugging, credential juggling, rate-limit arithmetic, and reading Docker container source code.
The plan was 2-3 hours. The reality was a masterclass in production engineering.
That’s RAG.