There’s a service running in my homelab right now that I almost lost completely. It was accidentally decommissioned during a cluster migration, sat at zero replicas for weeks, and came back online this morning with a silent bug that nobody noticed — its Anthropic API key had expired, so every time you asked it a question, it answered from memory alone, with no awareness of the 167 tools sitting right next to it.
That service is fabric-chat. And getting it back online properly forced me to think hard about what it actually is, what it’s become, and where it’s going.
What Chat Was
fabric-chat started as a conversation layer. The idea was simple: persist Claude chat sessions to a state backend, support multi-turn threading, and make past conversations searchable by meaning rather than keyword. GitHub as the durable log. Qdrant as the semantic index. Voyage AI for embeddings.
In its early form it was exactly that — a thin wrapper around the Anthropic messages API with a clean MCP interface. Twelve tools, among them: create a session, send a message, search past sessions, inject context, fork a thread. The whole thing came in under 1,800 lines of TypeScript.
The agentic loop — the part where Claude actually calls tools and acts on the world — was added later as an ambient upgrade. Set FABRIC_GATEWAY_URL and suddenly chat_message_send stops being a simple round-trip and becomes a full orchestration engine. Claude sees all available tools, decides what to call, fires them in parallel, folds the results back in, and keeps going until it has an answer. Up to ten rounds.
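The shape of that loop is worth seeing. Here's a minimal sketch — the `callModel` and `callTool` signatures are illustrative stand-ins, not fabric-chat's actual code, but the structure (call the model, fire requested tools in parallel, fold results back in, repeat up to ten rounds) is the one described above:

```typescript
// Hypothetical types standing in for the Anthropic message/tool-use shapes.
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

const MAX_ROUNDS = 10;

async function agenticLoop(
  messages: Message[],
  callModel: (msgs: Message[]) => Promise<ModelTurn>,
  callTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const turn = await callModel(messages);
    // No tool calls requested: Claude has its answer.
    if (turn.toolCalls.length === 0) return turn.text;
    messages.push({ role: "assistant", content: turn.text });
    // Fire all requested tools in parallel, fold the results back in.
    const results = await Promise.all(turn.toolCalls.map(callTool));
    for (const r of results) messages.push({ role: "tool", content: r });
  }
  // Round limit hit: bail out rather than loop forever.
  return "(max rounds reached)";
}
```

The round cap is the only thing standing between this loop and an unbounded bill, which is part of why the per-round overhead discussed below matters so much.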
That’s when fabric-chat stopped being a conversation wrapper and became something else: a reasoning interface to your entire homelab.
What Chat Is Now
Right now, fabric-chat is one of ten apps registered behind fabric-gateway. The gateway exposes 167 tools across k8s, proxmox, unifi, sandfly, tailscale, cloudflare, git, cve, and chat itself. Every chat_message_send call gets all 167 tool schemas loaded into context before Claude writes its first word.
It works. When the API key is valid, it really works. Asked to verify its own access today, it made five parallel tool calls — k8s, proxmox, unifi, tailscale, sandfly — and came back with a clean status table. That’s the dream. An AI assistant that doesn’t just know about your infrastructure but can actually see it, poke it, and reason about it in real time.
But there are three problems quietly undermining it.
The token overhead is enormous. At roughly 100 tokens per tool schema, 167 tools means ~17,000 tokens of overhead before a word is exchanged. With claude-haiku at $0.25/M input tokens that’s a rounding error — but with claude-opus-4-6 running the daily ops briefing, that overhead compounds. More importantly, it crowds out the actual conversation. You’re paying context-window real estate for tools that will never be called in that session.
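As a sanity check on those numbers (100 tokens per schema is the post's estimate; prices are per million input tokens):

```typescript
// Back-of-envelope overhead math from the paragraph above.
function schemaOverheadTokens(toolCount: number, tokensPerSchema = 100): number {
  return toolCount * tokensPerSchema;
}

function overheadCostUSD(tokens: number, pricePerMTokUSD: number): number {
  return (tokens / 1_000_000) * pricePerMTokUSD;
}

const tokens = schemaOverheadTokens(167);           // 16,700 -- the ~17k above
const perCallHaiku = overheadCostUSD(tokens, 0.25); // well under a cent per call
```

Cheap per call, but it's paid on every call, and it's context window spent before the conversation starts.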
GitHub is the wrong message store. Every message — user turn, assistant turn, injected context — triggers two GitHub API calls: read the JSONL file, then write it back with the new line appended. The agentic loop can fire ten rounds. Ten rounds, each with multiple tool results, means potentially 20+ GitHub API writes in a single chat_message_send. GitHub’s Contents API was designed for code. Not for append-heavy conversation logs. Meanwhile, every message is already being embedded into Qdrant. The GitHub writes are redundant — a holdover from when git-steer’s pattern of using the state repo as the source of truth made sense for infra auditing.
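The read-modify-write pattern is the problem in miniature. Sketched with the Contents API hidden behind a minimal store interface (names here are illustrative, not fabric-chat's actual code):

```typescript
// Stand-in for the GitHub Contents API: a GET and a PUT per file.
interface ContentsStore {
  read(path: string): Promise<{ content: string; sha: string }>;
  write(path: string, content: string, sha: string): Promise<void>;
}

// Two API round-trips for every single message appended.
async function appendMessage(
  store: ContentsStore,
  path: string,
  message: object,
): Promise<void> {
  const { content, sha } = await store.read(path);
  const updated = content + JSON.stringify(message) + "\n";
  // The sha guards against lost updates -- concurrent appenders will
  // conflict here, which is another reason this API fits code, not logs.
  await store.write(path, updated, sha);
}
```

Every append ships the entire transcript back over the wire, so cost grows with conversation length, not message size.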
The 836-line adapter file. adapters/env.ts does everything: GitHub client, Qdrant client, Voyage AI embeddings, Anthropic completions, fabric gateway integration. When the API key broke, there was no clean boundary to inspect. Everything is fused into one file. Changing the state backend means touching the same file as changing the embedding model.
None of these are fatal. But all three are worth fixing before the codebase gets any bigger.
Today’s Plan: A, B, C
The changes land in order of impact.
A — Tool relevance pre-filter. Before the agentic loop starts, add a lightweight classification step: send the user’s message to a fast model with just the tool category names (not full schemas), ask it to identify which categories are relevant, then only load those schemas. A message about pod crashes doesn’t need cloudflare DNS tools in context. Reduces per-call overhead from 167 tools to ~20. Faster responses, lower cost, more context window left for the actual conversation.
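A rough sketch of the pre-filter. Here `classify` stands in for a cheap model call (e.g. haiku) that sees only category names, never full schemas — the types are hypothetical, not fabric-gateway's real shapes:

```typescript
type ToolSchema = { name: string; category: string; schema: object };

async function selectTools(
  userMessage: string,
  allTools: ToolSchema[],
  classify: (msg: string, categories: string[]) => Promise<string[]>,
): Promise<ToolSchema[]> {
  // Deduplicate to the handful of category names -- this is all the
  // classifier ever sees, so the pre-filter call stays tiny.
  const categories = [...new Set(allTools.map((t) => t.category))];
  const relevant = new Set(await classify(userMessage, categories));
  // Only schemas in relevant categories reach the main model's context.
  return allTools.filter((t) => relevant.has(t.category));
}
```

The failure mode to watch is the classifier guessing wrong and dropping a category the answer actually needs — a conservative prompt ("include anything plausibly relevant") keeps the filter safe at the cost of a slightly larger context.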
B — Drop GitHub from message storage. Messages are already in Qdrant. Sessions are already in Qdrant. Make Qdrant the single source of truth for conversation state and cut the GitHub dependency from chat entirely. chat_message_send goes from 3-5 GitHub API calls to 1 Qdrant upsert. Session listing goes from a GitHub file read to a Qdrant scroll. The git-steer-state repo stays — but for its intended purpose: infra jobs, audit logs, PRs. Not chat transcripts.
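What change B implies, sketched against a simplified stand-in for the Qdrant client (not the real @qdrant/js-client-rest API):

```typescript
type Point = { id: string; vector: number[]; payload: Record<string, unknown> };

interface VectorStore {
  upsert(collection: string, points: Point[]): Promise<void>;
  scroll(collection: string, match: (p: Point) => boolean): Promise<Point[]>;
}

// One write per message: embedding and transcript land in the same upsert.
async function storeMessage(
  db: VectorStore,
  sessionId: string,
  role: string,
  text: string,
  embed: (t: string) => Promise<number[]>,
): Promise<void> {
  await db.upsert("chat-messages", [{
    id: `${sessionId}-${Date.now()}`,
    vector: await embed(text),
    payload: { sessionId, role, text },
  }]);
}

// Session listing becomes a scroll with a payload filter -- no file reads.
async function listMessages(db: VectorStore, sessionId: string): Promise<Point[]> {
  return db.scroll("chat-messages", (p) => p.payload.sessionId === sessionId);
}
```

The same points serve both roles: semantic search over vectors, transcript reconstruction over payloads. One store, one write path.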
C — Split env.ts into focused adapters. No behavior change. Just: qdrant.ts, anthropic.ts, gateway.ts. Three files, each with a clear responsibility. The kind of change that makes 2 AM debugging sessions survivable.
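One possible shape for the split — interface names are guesses, not the real fabric-chat code, but the point is that each file owns exactly one external dependency:

```typescript
// adapters/qdrant.ts: conversation state and semantic search
interface StateAdapter {
  saveMessage(sessionId: string, role: string, text: string): Promise<void>;
  searchSessions(query: string, limit: number): Promise<string[]>;
}

// adapters/anthropic.ts: completions (Voyage embeddings could live here,
// or in a fourth file if they deserve their own boundary)
interface ModelAdapter {
  complete(messages: { role: string; content: string }[]): Promise<string>;
}

// adapters/gateway.ts: tool discovery and execution via fabric-gateway
interface GatewayAdapter {
  listTools(): Promise<{ name: string; schema: object }[]>;
  callTool(name: string, input: unknown): Promise<string>;
}
```

When the next key expires, the broken boundary is one file, not all of them.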
The Bigger Question: Where Does Chat Live?
fabric-chat currently lives in its own repo (chat-work). The gateway, k8s app, git app, and all the other fabric integrations each have their own repos too. It works, but it creates friction — cross-repo dependency management, separate CI pipelines, separate versioning.
The natural home for all of these is fabric-forge as a true monorepo: packages/fabric-chat/, packages/fabric-gateway/, packages/fabric-k8s/. Shared types, single build pipeline, easier to reason about the whole system.
But that’s a refactor, not a fix. And right now there are more important things to fix than repo topology.
The more interesting question is whether fabric-chat should stay as a general-purpose conversation layer at all, or whether it should specialize. Right now it serves two roles: a human-facing chat interface (you asking questions about your homelab) and a pipeline component (the daily briefing pipeline creates a session, injects 5000 tokens of context, asks for a summary). These are different enough use cases that they might eventually want different optimizations — different storage backends, different tool loading strategies, different session TTLs.
For now, one service doing both is fine. The A/B/C changes make it clean enough to evolve in either direction.
What Came Back Online Today
Here’s what actually happened today, for the record.
fabric-chat was at zero replicas. Restored from gitops. The Anthropic API key in fabric-chat-secrets was a different (expired) key from the one in cortex-chat-secrets — both present in the same cluster, both supposedly for the same purpose, with nobody noticing the discrepancy because the fallback to plain completion is silent.
Fixed: patched fabric-chat-secrets with the valid key, restarted the deployment. Chat came back online with full tool access. Asked to verify, it called all five services and reported back clean.
That silent failure mode — expired key, no error, just quietly dumber responses — is itself an argument for the changes above. A service that fails loudly is fixable. A service that fails quietly is insidious.
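Failing loudly can be as small as a startup guard. A sketch under assumed names (`assertToolAccess` and `probe` are hypothetical, not current fabric-chat code):

```typescript
// Probe the Anthropic key once at startup; refuse to run degraded.
async function assertToolAccess(probe: () => Promise<boolean>): Promise<void> {
  const ok = await probe().catch(() => false);
  if (!ok) {
    // Crashing surfaces as CrashLoopBackOff in Kubernetes -- visible.
    // Falling back to plain completion surfaces as nothing at all.
    throw new Error(
      "Anthropic API key rejected; refusing to start without tool access",
    );
  }
}
```

The probe could be any cheap authenticated call. The point is that an expired key becomes a restart loop you see in five minutes, not a dumber assistant you notice in five weeks.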
Today’s changes are about making fabric-chat harder to break silently. Fewer moving parts, cleaner state model, better visibility into what’s actually happening when you send a message.
The goal isn’t to make it smaller. It’s to make it honest.