RAG Chatbot Build Guide: From Plan to Production
How We Built It, What Broke, and Where It’s Going
The Vision
Turn 212+ blog posts into a searchable AI knowledge base. Ask questions in natural language, get answers grounded in actual blog content with source citations. No hallucinations — every answer traced back to a real post.
The constraint: build it entirely on existing infrastructure. No new services, no new vendors, no new monthly bills. Use the n8n-fabric stack that was already running.
How It Was Supposed to Work
The Original Plan
Two n8n workflows, six API calls, done in an afternoon:
- Indexing Workflow — Manual trigger reads all 212 markdown files from the Astro blog directory, parses frontmatter, enriches content with metadata, chunks it, embeds it with OpenAI text-embedding-3-small (1536 dims), and stores vectors in Qdrant Cloud.
- Chat Workflow — Chat Trigger feeds into an AI Agent backed by Claude Sonnet 4.5. The agent has a Qdrant vector search tool. User asks a question → agent searches vectors → retrieves relevant chunks → synthesizes an answer with citations.
Expected timeline: 2-3 hours. Actual timeline: ~12 hours of debugging across two sessions.
The Original Stack Choice
| Component | Original Choice | What We Ended Up With |
|---|---|---|
| Embeddings | OpenAI text-embedding-3-small (1536d) | Voyage AI voyage-3 (1024d) |
| Chat LLM | Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) | Claude 3.5 Haiku (claude-3-5-haiku-20241022) |
| Vector DB | Qdrant Cloud | Qdrant Cloud (same, but recreated collection) |
| Orchestrator | n8n workflows only | n8n workflows + Python rate-limiting script |
| Auth mode | Chat Trigger internal only | Chat Trigger public + hosted chat |
What We Actually Had to Do
Problem 1: Docker Filesystem Isolation
The wall: n8n runs in Docker. Blog posts live on the host at /Users/ryandahlberg/Projects/blog/src/content/posts/. The container can’t see host files.
The fix: Add a read-only volume mount to docker-compose.yml:
volumes:
- n8n_data:/home/node/.n8n
- /Users/ryandahlberg/Projects/blog/src/content/posts:/data/blog-posts:ro
Inside the container, blog posts appear at /data/blog-posts/.
Problem 2: n8n Code Node Sandbox
The wall: n8n sandboxes JavaScript in Code nodes. require('fs') throws "Module 'fs' is disallowed".
The fix: Environment variable in docker-compose.yml:
environment:
- NODE_FUNCTION_ALLOW_BUILTIN=fs,path
One line, one restart. But discovering this took far longer than writing it.
Problem 3: Qdrant Cloud Key Confusion
The wall: Qdrant Cloud has two key types:
- Management keys — UUID|token format, gRPC-based, for cluster CRUD operations
- Database API keys — JWT format, REST-based, for reading/writing vectors
We spent significant time authenticating with management keys against the REST data API (and vice versa). They return 401 either way with no indication you’re using the wrong key type.
The fix: Went into the Qdrant Cloud dashboard in the browser, found the cluster, navigated to Access Management, and got the correct JWT database key.
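A quick way to confirm you finally have the right key type is to hit the REST data API directly; a minimal sketch in Python (the cluster URL and key are placeholders):
import requests

QDRANT_URL = "https://CLUSTER_URL:6333"  # the cluster's REST endpoint
DB_KEY = "eyJ..."                        # JWT-format database key, not the UUID|token management key

# Listing collections only needs the database key: a 200 means the key type
# is right, a 401 means you are still holding the wrong kind of key.
resp = requests.get(f"{QDRANT_URL}/collections", headers={"api-key": DB_KEY}, timeout=10)
print(resp.status_code, resp.json())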
Problem 4: OpenAI Quota Wall
The wall: OpenAI API returned 429 — You exceeded your current quota. No credits on the key.
The fix: Switched to Voyage AI (Anthropic’s recommended embeddings provider). Key insight: n8n’s OpenAI Embeddings node accepts a custom base URL. We created an “OpenAI” credential in n8n but pointed it at https://api.voyageai.com/v1 with a Voyage AI key. The node worked without code changes because Voyage AI implements the OpenAI-compatible API standard.
The consequence: Voyage AI’s voyage-3 produces 1024-dimensional embeddings vs. OpenAI’s 1536. This meant destroying the Qdrant collection and recreating it with the correct dimensions.
# Delete old collection
curl -X DELETE "https://CLUSTER_URL:6333/collections/ryops_blog" \
-H "api-key: DB_KEY"
# Create with 1024 dimensions
curl -X PUT "https://CLUSTER_URL:6333/collections/ryops_blog" \
-H "api-key: DB_KEY" \
-H "Content-Type: application/json" \
-d '{"vectors": {"size": 1024, "distance": "Cosine"}}'
Problem 5: Rate Limit Marathon
The wall: Voyage AI free tier: 3 requests/minute, 10,000 tokens/minute. We had ~2,500 chunks to embed. The n8n workflow has no built-in rate limiting — it fired all requests immediately and hit 429 after 36 posts.
The fix: A standalone Python script (index_remaining_v3.py, sketched below) that:
- Batches 8 chunks per API request (~3,000 tokens, under the 10K TPM limit)
- Waits 25 seconds between requests (under the 3 RPM limit)
- Tracks already-indexed filenames via Qdrant scroll API
- Resumes from where it left off on restart
- Exponential backoff on 429 errors (caps at 10 minutes)
The numbers: 2,066 chunks across 175 remaining posts, zero failed batches, ~108 minutes total.
The first 36 posts (845 chunks) were indexed by the n8n workflow before it hit rate limits. The Python script handled the remaining 175 posts.
Total: 2,911 vector points across 212+ posts.
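The full script isn’t reproduced here, but a minimal sketch of its batching and backoff loop, assuming the requests and qdrant-client packages (URLs, keys, and function names below are illustrative placeholders):
import time, uuid, requests
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

BATCH_SIZE = 8     # ~3,000 tokens per request, under the 10K TPM limit
WAIT_SECONDS = 25  # stays under the 3 RPM limit
MAX_RETRIES = 8

client = QdrantClient(url="https://CLUSTER_URL:6333", api_key="DB_KEY")

def embed_batch(texts, retries=0):
    """Embed one batch via Voyage AI, backing off exponentially on 429 (60s doubling, capped at 600s)."""
    resp = requests.post(
        "https://api.voyageai.com/v1/embeddings",
        headers={"Authorization": "Bearer pa-..."},
        json={"model": "voyage-3", "input": texts},
        timeout=60,
    )
    if resp.status_code == 429 and retries < MAX_RETRIES:
        time.sleep(min(60 * 2 ** retries, 600))
        return embed_batch(texts, retries + 1)
    resp.raise_for_status()
    return [d["embedding"] for d in resp.json()["data"]]

def index_chunks(chunks):
    """chunks: list of dicts with 'content' and 'metadata' (including 'filename')."""
    # The real script also calls client.scroll(...) first to collect already-indexed
    # filenames so a restarted run resumes where it left off.
    for i in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[i : i + BATCH_SIZE]
        vectors = embed_batch([c["content"] for c in batch])
        client.upsert(
            collection_name="ryops_blog",
            points=[
                PointStruct(id=str(uuid.uuid4()), vector=v,
                            payload={"content": c["content"], "metadata": c["metadata"]})
                for c, v in zip(batch, vectors)
            ],
        )
        time.sleep(WAIT_SECONDS)  # 25-second pause between requests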
Problem 6: n8n Resource Locator Format
The wall: n8n’s Anthropic Chat Model node (typeVersion 1.3) stores the model name as a resource locator object, not a plain string. Setting "model": "claude-sonnet-4-5-20250929" via the API causes "Could not get parameter" at runtime.
The fix: Use the __rl resource locator format:
{
"model": {
"__rl": true,
"mode": "list",
"value": "claude-3-5-haiku-20241022",
"cachedResultName": "Claude 3.5 Haiku(20241022)"
}
}
This is an n8n internals detail that isn’t documented anywhere. You’d only discover it by reading the node source code inside the Docker container.
Problem 7: Chat Trigger Webhook Not Registering
The wall: After activating the workflow, the chat URL returned 404. Logs showed "The requested webhook is not registered".
The fix: The Chat Trigger node has a public parameter (default: false). When false, it only works through n8n’s internal editor — no external webhook is registered. Setting public: true and mode: "hostedChat" registers the webhook and serves the built-in chat UI.
{
"public": true,
"mode": "hostedChat",
"options": {}
}
Problem 8: Anthropic Model 404
The wall: claude-sonnet-4-5-20250929 and claude-3-5-sonnet-20241022 both returned "The resource you are requesting could not be found". The API key doesn’t have access to Sonnet-tier models.
The fix: Switched to claude-3-5-haiku-20241022, which works with the current key and still produces excellent RAG responses with proper citations.
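When a model name 404s, listing the models visible to the key is the quickest way to see which tier you actually have; a minimal sketch with the Anthropic Python SDK, assuming a recent SDK version that exposes the models endpoint:
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Print every model this key can call; here the Haiku models showed up
# and the Sonnet-tier models did not.
for model in client.models.list():
    print(model.id)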
What It Does Today
Live System
- Chat URL: http://localhost:5678/webhook/blog-rag-chat/chat (a sample call appears below this list)
- Hosted UI: n8n serves a built-in chat widget — no frontend code needed
- Knowledge base: 2,911 vector chunks from 212+ blog posts
- Source attribution: Every answer cites post titles with direct URLs
- Multi-turn: 20-message conversation memory for follow-up questions
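The hosted UI is the easiest way in, but the same webhook can be called programmatically; a minimal sketch using Python’s requests. The chatInput and sessionId field names follow the payload the n8n chat widget sends and should be treated as assumptions here, and the shape of the JSON response depends on how the workflow replies:
import uuid
import requests

CHAT_URL = "http://localhost:5678/webhook/blog-rag-chat/chat"
session_id = str(uuid.uuid4())  # reuse the same ID to keep the 20-message memory

resp = requests.post(
    CHAT_URL,
    json={"sessionId": session_id, "chatInput": "What MCP servers has Ryan built?"},
    timeout=120,
)
print(resp.json())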
Sample Interactions
Q: “What MCP servers has Ryan built?” → Returns 6 MCP servers (n8n, Checkmk, Wazuh, UniFi, Netdata, Talos) with links to each post.
Q: “What security topics has Ryan written about?” → Returns OWASP, Zero Trust, Incident Response, Kubernetes Security, Post-Mortem Analysis — all with source links.
Q: “What is Cortex and how does it work?” → Returns workflow executor details, zero-downtime prompt migrations, and task lineage — all sourced from specific blog posts.
The Final Stack
| Layer | Component | Details |
|---|---|---|
| Orchestration | n8n v2.6.3 | Self-hosted Docker, AI/LangChain nodes |
| Vector DB | Qdrant Cloud | 1024-dim Cosine, ryops_blog collection |
| Embeddings | Voyage AI voyage-3 | Via OpenAI-compatible API |
| Chat LLM | Claude 3.5 Haiku | Temperature 0.3, 4096 max tokens |
| Infrastructure | Docker Compose | n8n + PostgreSQL + local Qdrant + Redis |
| Content Source | Astro blog | 212+ markdown posts with YAML frontmatter |
Future Possibilities
Near-Term
- Auto-reindexing webhook — Trigger indexing when new posts are published (git push → webhook → index new post)
- Metadata filtering — Search within specific categories, date ranges, or tag sets
- Hybrid search — Combine vector similarity with BM25 keyword matching for better recall
- Upgrade to Sonnet — Update the Anthropic API key to one with Sonnet access for higher-quality reasoning
- Public-facing widget — Embed the chat on the live blog at ry-ops.dev
Medium-Term
- Evaluation pipeline — Automated retrieval quality testing with golden question/answer sets
- Streaming responses — Switch from batch response to token streaming for better UX
- Multi-source RAG — Index GitHub repos, project READMEs, and documentation alongside blog posts
- MCP integration — Expose the RAG chatbot as an MCP tool so Claude Code can query blog knowledge
Long-Term
- Agentic RAG — Multi-step reasoning where the agent decomposes complex questions, searches multiple times, and synthesizes across sources
- Knowledge graph overlay — Build entity relationships between blog posts (which posts reference which projects, which concepts build on others)
- Self-updating knowledge base — The chatbot monitors for new posts, indexes them automatically, and can summarize what’s new
- Cross-fabric integration — Cortex workflows that use the RAG system for contextual decision-making
Architecture
High-Level System Architecture
graph TB
subgraph "HOST MACHINE"
Blog[Astro Blog<br/>212+ .md files<br/>/Projects/blog/]
subgraph "n8n-fabric Docker Compose"
N8N[n8n<br/>:5678<br/>AI/LangChain Nodes]
PG[(PostgreSQL<br/>:5432<br/>Workflow Storage)]
LocalQ[(Local Qdrant<br/>:6343)]
Redis[(Redis<br/>:6389<br/>Cache)]
end
end
subgraph "EXTERNAL SERVICES"
Voyage[Voyage AI<br/>voyage-3<br/>1024 dims]
QCloud[(Qdrant Cloud<br/>ryops_blog<br/>2,911 points)]
Claude[Anthropic<br/>Claude 3.5 Haiku<br/>temp 0.3]
end
Blog -->|read-only mount| N8N
N8N --> PG
N8N --> LocalQ
N8N --> Redis
N8N -->|HTTPS| Voyage
N8N -->|HTTPS| QCloud
N8N -->|HTTPS| Claude
Indexing Workflow
graph LR
Trigger[Manual Trigger] --> List[List Blog Files<br/>Code + fs]
List --> Parse[Parse & Enrich<br/>Markdown<br/>YAML frontmatter]
Parse --> Insert[Insert into Qdrant<br/>vectorStoreQdrant]
subgraph "Insert Sub-nodes"
Load[Load Blog Content<br/>dataLoader]
Chunk[Chunk Text<br/>textSplitter<br/>1500c / 300c]
Embed[Voyage AI Embeddings<br/>voyage-3, 1024d]
Load --> Chunk
Chunk --> Embed
end
Insert -.contains.-> Load
Embed --> QCloud[(Qdrant Cloud<br/>ryops_blog)]
Chat Workflow
graph TB
ChatTrigger[Chat Trigger<br/>public, hosted<br/>webhook/blog-rag-chat/chat]
ChatTrigger --> Agent[Blog Assistant<br/>AI Agent]
subgraph "Agent Components"
LLM[Claude 3.5 Haiku<br/>temp: 0.3<br/>max: 4096]
Memory[Memory Buffer<br/>Window<br/>20 messages]
Tool[search_blog_posts<br/>Qdrant tool<br/>topK: 8]
subgraph "Tool Sub-node"
ToolEmbed[Voyage AI<br/>Embeddings<br/>voyage-3]
end
end
Agent --> LLM
Agent --> Memory
Agent --> Tool
Tool -.contains.-> ToolEmbed
Tool --> QCloud[(Qdrant Cloud)]
Content Enrichment Pipeline
graph TD
Raw[Raw Markdown File<br/>---<br/>title: 'Post Title'<br/>category: AI & ML<br/>tags: RAG, n8n<br/>---<br/># Content...]
Raw --> Enrich[Parse & Enrich<br/>Prepend metadata to body]
Enrich --> Enriched[Enriched Content<br/>Title: Post Title<br/>Category: AI & ML<br/>Tags: RAG, n8n<br/><br/># Content...]
Enriched --> ChunkProc[Chunk<br/>1500 chars<br/>300 overlap]
ChunkProc --> Chunks[Chunk 1 + metadata<br/>Chunk 2 + metadata<br/>Chunk N + metadata]
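As a sketch in Python, the enrichment step amounts to prepending a metadata header before chunking, so every chunk carries its post’s context. This assumes the python-frontmatter package; the splitter below is a plain character splitter standing in for n8n’s textSplitter node:
import frontmatter

def enrich(path):
    """Return (enriched_text, metadata) for one markdown post."""
    post = frontmatter.load(path)
    meta = post.metadata
    tags = meta.get("tags", [])
    tags = ", ".join(tags) if isinstance(tags, list) else tags
    header = (
        f"Title: {meta.get('title', '')}\n"
        f"Category: {meta.get('category', '')}\n"
        f"Tags: {tags}\n\n"
    )
    return header + post.content, meta

def chunk(text, size=1500, overlap=300):
    """Split into 1,500-character chunks with 300 characters of overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start : start + size])
        start += size - overlap
    return chunks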
Indexing Data Flow
flowchart TD
Start([START]) --> Read[Read markdown files<br/>from /data/blog-posts/<br/>212+ .md/.mdx files]
Read --> ParseYAML[Parse YAML frontmatter<br/>Extract: title, category,<br/>tags, date, description]
ParseYAML --> Enrich[Enrich content<br/>Prepend metadata header<br/>for semantic context]
Enrich --> Chunk[Chunk text<br/>1,500 chars per chunk<br/>300 char overlap<br/>~12 chunks per post avg]
Chunk --> Batch[Batch chunks<br/>8 chunks per API request<br/>~3,000 tokens per batch]
Batch --> GenEmbed[Generate embeddings<br/>Voyage AI voyage-3<br/>→ 1024-dim float vector<br/>Rate limit: 3 RPM]
GenEmbed --> Upsert[Upsert to Qdrant Cloud<br/>UUID point ID<br/>Vector + payload<br/>content + metadata]
Upsert --> Wait[Wait 25 seconds<br/>rate limit compliance<br/>Then → next batch]
Wait --> More{More chunks?}
More -->|yes| Batch
More -->|no| Done([DONE<br/>2,911 vectors stored])
Chat Query Flow
flowchart TD
User([USER<br/>'What is Cortex?']) --> ChatTrigger[Chat Trigger receives<br/>message via webhook<br/>or hosted chat UI]
ChatTrigger --> AgentProc[AI Agent processes<br/>System prompt instructs:<br/>'Always search blog<br/>before answering']
AgentProc --> ToolCall[Agent calls tool:<br/>search_blog_posts<br/>query: 'Cortex']
ToolCall --> EmbedQuery[Embed query<br/>'Cortex' → Voyage AI<br/>→ 1024-dim vector]
EmbedQuery --> VectorSearch[Vector search<br/>Qdrant Cloud cosine<br/>similarity search<br/>→ top 8 matching chunks]
VectorSearch --> Context[Context assembly<br/>8 chunks with metadata<br/>injected into agent<br/>context window]
Context --> Generate[Claude generates response<br/>Grounded in retrieved<br/>chunks, includes source<br/>citations with URLs]
Generate --> Response[Response returned<br/>to chat UI or webhook<br/>caller with citations]
Error Recovery Flow
flowchart TD
Request[API Request] --> Status{Status?}
Status -->|200| OK([OK])
Status -->|429| RateLimit[Rate Limit]
Status -->|other| LogError[Log error<br/>Wait 30s<br/>Retry]
RateLimit --> ExpBackoff[Exponential backoff<br/>60s → 120s<br/>→ 240s → ...<br/>cap 600s]
ExpBackoff --> RetryCheck{Retry #<br/>< 8?}
RetryCheck -->|yes| Retry([Retry])
RetryCheck -->|no| Failed[Log FAILED<br/>Skip batch<br/>Continue]
Technical Deep Dive
Qdrant Cloud Collection Structure
{
"collection": "ryops_blog",
"cluster": "us-west-1, AWS",
"vectors": {
"size": 1024,
"distance": "Cosine"
},
"points": 2911,
"point_structure": {
"id": "UUID",
"vector": "[1024 floats]",
"payload": {
"content": "enriched chunk text",
"metadata": {
"title": "Post Title",
"category": "AI & ML",
"date": "2026-01-15",
"tags": "RAG, n8n, Qdrant",
"slug": "post-slug",
"url": "https://ry-ops.dev/...",
"filename": "2026-01-15-post.md"
}
}
}
}
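A retrieval query against this collection mirrors what the agent’s search_blog_posts tool does: embed the question with voyage-3, then run a cosine similarity search with topK 8. A minimal sketch with the qdrant-client package (URL and keys are placeholders, as above):
import requests
from qdrant_client import QdrantClient

client = QdrantClient(url="https://CLUSTER_URL:6333", api_key="DB_KEY")

# Embed the question with the same model used at index time (1024 dims)
resp = requests.post(
    "https://api.voyageai.com/v1/embeddings",
    headers={"Authorization": "Bearer pa-..."},
    json={"model": "voyage-3", "input": ["What is Cortex and how does it work?"]},
    timeout=60,
)
query_vector = resp.json()["data"][0]["embedding"]

# Top 8 chunks, matching the chat workflow's topK setting
hits = client.search(collection_name="ryops_blog", query_vector=query_vector, limit=8)
for hit in hits:
    meta = hit.payload["metadata"]
    print(f"{hit.score:.3f}  {meta['title']}  {meta['url']}")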
n8n Credential Configuration
Voyage AI (as OpenAI type)
{
"type": "openAiApi",
"baseUrl": "https://api.voyageai.com/v1",
"apiKey": "pa-...",
"note": "OpenAI-compatible API standard"
}
Anthropic
{
"type": "anthropicApi",
"apiKey": "sk-ant-...",
"modelAccess": "Haiku tier only"
}
Qdrant Cloud
{
"type": "qdrantApi",
"url": "https://8b04bf09-...cloud.qdrant.io:6333",
"apiKey": "eyJ... (JWT database key, NOT management key)"
}
The Takeaway
Building RAG systems in production is less about the happy path and more about navigating the obstacles:
- Docker volume mounts for filesystem access
- Sandbox escapes for code execution
- API key confusion across similar-looking credentials
- Quota walls forcing vendor pivots
- Rate limits requiring custom batching logic
- Undocumented internal data structures in orchestration tools
- Model access tiers blocking preferred choices
The final system works beautifully: 2,911 vectors, natural language queries, source-cited answers, multi-turn conversations, all running on existing infrastructure.
But getting there required 12 hours of debugging, credential juggling, rate-limit arithmetic, and reading Docker container source code.
The plan was 2-3 hours. The reality was a masterclass in production engineering.
That’s RAG.