Building the Cortex Fabric Network: A Day of Infrastructure Evolution
The Starting Point
Today’s session began where I left off from an earlier conversation that had run out of context. The previous work had established the foundation: a Chat Fabric architecture for Cortex, UniFi fabric integration, SSE streaming fixes, and UniFi MCP server authentication repairs. My todo list showed ambitious goals ahead - creating individual fabric activators for GitHub, Cloudflare, Proxmox, Kubernetes, Sandfly, and Cortex itself.
When I returned with a simple “let’s continue,” the real work began.
The Challenge
My vision was clear: transform Cortex from a monolithic system into a distributed fabric network where each domain (networking, virtualization, Kubernetes, security, etc.) has its own intelligent activator layer. Each fabric would:
- Consume tasks from Redis Streams
- Discover and call MCP (Model Context Protocol) servers
- Use Claude to process queries with domain-specific context
- Publish results back to a unified stream
The architecture looked simple on paper:
Chat Request → Intent Classifier → Redis Streams → Fabric Activators → MCP Servers
                                                          ↓
                                                 cortex.results stream
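To make that first hop concrete, here is a minimal sketch of dropping a classified request onto a fabric's task stream and picking the reply back up from cortex.results. This is illustrative only: the connection details, field names, and helper functions are placeholders, not the production code.

```python
import json
import uuid

import redis

# Placeholder connection details.
r = redis.Redis(host="redis", port=6379, password="...", decode_responses=True)

def dispatch(domain: str, query: str) -> str:
    """Push a task onto the domain's stream and return its correlation id."""
    task_id = str(uuid.uuid4())
    r.xadd(f"cortex.{domain}.tasks", {"task_id": task_id, "query": query})
    return task_id

def await_result(task_id: str, block_ms: int = 30_000) -> dict | None:
    """Block on cortex.results until the matching task_id comes back."""
    last_id = "$"
    while True:
        entries = r.xread({"cortex.results": last_id}, block=block_ms, count=10)
        if not entries:
            return None  # timed out waiting for the fabric
        for _stream, messages in entries:
            for msg_id, fields in messages:
                last_id = msg_id
                if fields.get("task_id") == task_id:
                    return json.loads(fields["result"])
```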
But making it “light up” - that was the challenge.
Creating the Fabric Activators
Claude and I built six new fabric directories, each containing:
- ApplicationSet for ArgoCD deployment
- Helm Chart with values and templates
- Python Activator that handles task consumption and MCP tool execution
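A single fabric directory ends up looking roughly like this (the layout is illustrative, not the exact file names in the repo):

```
cortex-github-fabric/
├── applicationset.yaml        # registers the fabric with ArgoCD
├── chart/
│   ├── Chart.yaml
│   ├── values.yaml            # image, resources, Redis stream name
│   └── templates/
│       ├── deployment.yaml
│       └── service.yaml
└── activator/
    └── main.py                # FastAPI activator: Redis consumer + MCP client
```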
The New Fabrics
| Fabric | Purpose | Redis Stream |
|---|---|---|
| cortex-github-fabric | Repository ops, PRs, issues, workflows | cortex.github.tasks |
| cortex-cloudflare-fabric | DNS, tunnels, WAF, CDN | cortex.cloudflare.tasks |
| cortex-sandfly-fabric | Security scanning, threat alerts | cortex.sandfly.tasks |
| cortex-proxmox-fabric | VMs, containers, LXC, storage | cortex.proxmox.tasks |
| cortex-kubernetes-fabric | Pods, deployments, services | cortex.kubernetes.tasks |
| cortex-cortex-fabric | Meta-operations, agent registry | cortex.cortex.tasks |
Each activator follows the same pattern - a FastAPI application that:
- Connects to Redis on startup
- Creates consumer groups for its task stream
- Discovers tools from its MCP server
- Processes queries using Claude with tool access
- Publishes results back to the central results stream
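In sketch form, that shared pattern looks something like the following. It is a condensed illustration rather than the real activator: the environment variable names, consumer name, stream field names, and the Claude model id are all placeholders.

```python
import asyncio
import json
import os

import anthropic
import httpx
import redis.asyncio as aioredis
from fastapi import FastAPI
from redis import exceptions as redis_exceptions

app = FastAPI()
STREAM = os.environ.get("TASK_STREAM", "cortex.github.tasks")   # this fabric's stream
MCP_URL = os.environ.get("MCP_URL", "http://github-mcp:8080")   # this fabric's MCP server
GROUP = "activator"

claude = anthropic.AsyncAnthropic()
r: aioredis.Redis | None = None

async def discover_tools() -> list[dict]:
    # Ask the fabric's MCP server for its tool catalogue via JSON-RPC.
    async with httpx.AsyncClient() as client:
        resp = await client.post(MCP_URL, json={"jsonrpc": "2.0", "method": "tools/list", "id": 1})
        return resp.json()["result"]["tools"]

async def process(fields: dict) -> None:
    # Let Claude answer the query with the fabric's tools attached.
    tools = [{"name": t["name"], "description": t.get("description", ""),
              "input_schema": t["inputSchema"]} for t in await discover_tools()]
    reply = await claude.messages.create(
        model="claude-sonnet-4-5",   # model name illustrative
        max_tokens=1024, tools=tools,
        messages=[{"role": "user", "content": fields["query"]}],
    )
    await r.xadd("cortex.results", {"task_id": fields["task_id"],
                                    "result": json.dumps(reply.model_dump(), default=str)})

async def consume() -> None:
    # Read tasks from this fabric's stream as part of the consumer group.
    while True:
        batch = await r.xreadgroup(GROUP, "activator-1", {STREAM: ">"}, count=1, block=5000)
        for _stream, messages in batch or []:
            for msg_id, fields in messages:
                await process(fields)
                await r.xack(STREAM, GROUP, msg_id)

@app.on_event("startup")
async def startup() -> None:
    global r
    r = aioredis.from_url(os.environ["REDIS_URL"], decode_responses=True)
    try:
        await r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis_exceptions.ResponseError:
        pass  # consumer group already exists after a restart
    asyncio.create_task(consume())
```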
The Intent Classifier Evolution
The intent classifier grew from handling a few domains to understanding eight categories:
KEYWORD_MAPPINGS = {
"unifi": {...}, # Network, WiFi, clients
"proxmox": {...}, # VMs, containers, storage
"kubernetes": {...}, # Pods, deployments, services
"github": {...}, # Repos, PRs, issues
"cloudflare": {...}, # DNS, tunnels, WAF
"sandfly": {...}, # Security scans, threats
"cortex": {...}, # System status, help
"automation": {...} # n8n, workflows
}
For ambiguous queries, Claude Haiku provides fast classification as a fallback.
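A stripped-down version of that routing logic looks roughly like this: keyword pass first, Haiku only when nothing matches. The function name and model alias are illustrative, and each domain's entry is treated here as a simple collection of keyword strings.

```python
import anthropic

claude = anthropic.Anthropic()

def classify(query: str) -> str:
    """Return the fabric domain a chat query should be routed to."""
    lowered = query.lower()
    for domain, keywords in KEYWORD_MAPPINGS.items():
        if any(kw in lowered for kw in keywords):
            return domain
    # Ambiguous query: fall back to a fast, cheap model for a one-word answer.
    reply = claude.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": f"Classify this infrastructure query into one of "
                       f"{list(KEYWORD_MAPPINGS)}. Reply with the single word only.\n\n{query}",
        }],
    )
    return reply.content[0].text.strip().lower()
```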
The Protocol Problem
During a comprehensive audit of the deployment, I discovered that not all MCP servers were playing nicely. The GitHub MCP server returned 500 errors, and the Cloudflare MCP server returned 404s.
The Root Cause: Protocol mismatch. My fabric activators speak JSON-RPC:
{"jsonrpc": "2.0", "method": "tools/list", "id": 1}
But the GitHub MCP server expected:
{"tool": "list_prs", "arguments": {"repo": "owner/name"}}
The Fix
I updated the GitHub MCP server to support both protocols:
def do_POST(self):
    # Read the request body before parsing it.
    length = int(self.headers.get("Content-Length", 0))
    body = self.rfile.read(length)
    request = json.loads(body)

    # Check if this is a JSON-RPC request
    if "jsonrpc" in request:
        self.handle_jsonrpc(request)
        return

    # Legacy format
    tool_name = request.get("tool")
    arguments = request.get("arguments", {})
    ...
I also added proper tool definitions with an inputSchema for JSON-RPC compatibility, and implemented handlers for the tools/list, tools/call, and initialize methods.
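The dispatch side of that handler looks roughly like this. It is a simplified sketch: TOOLS, call_tool, and send_json stand in for the server's real definitions and helpers, and the protocol version string follows the MCP spec as published.

```python
def handle_jsonrpc(self, request):
    """Route JSON-RPC methods to the existing tool implementations."""
    method = request.get("method")
    req_id = request.get("id")

    if method == "initialize":
        result = {"protocolVersion": "2024-11-05",
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "github-mcp", "version": "1.0.0"}}
    elif method == "tools/list":
        result = {"tools": TOOLS}  # each entry carries name, description, inputSchema
    elif method == "tools/call":
        params = request.get("params", {})
        result = call_tool(params.get("name"), params.get("arguments", {}))
    else:
        self.send_json({"jsonrpc": "2.0", "id": req_id,
                        "error": {"code": -32601, "message": f"Unknown method: {method}"}})
        return

    self.send_json({"jsonrpc": "2.0", "id": req_id, "result": result})
```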
The Capacity Crunch
With six new pods requesting memory, the cluster pushed back. Kubernetes nodes showed 98-99% memory allocation:
k3s-master03: memory 5826Mi (98%)
k3s-worker01: memory 5868Mi (99%)
k3s-worker02: memory 5904Mi (99%)
The scheduler couldn’t find room for all the new activators. I scaled down some services with multiple replicas:
kubectl scale deployment memory-service --replicas=1
kubectl scale deployment layer-activator --replicas=1
kubectl scale deployment fabric-gateway --replicas=1
This freed enough resources for most activators to start.
The Redis Secret Mystery
Even after scheduling, pods crashed with:
redis.exceptions.AuthenticationError: Authentication required.
The fabric activators in cortex-system namespace needed the Redis secret from cortex-chat namespace. A quick secret copy solved it:
kubectl get secret cortex-chat-secrets -n cortex-chat -o yaml | \
sed 's/namespace: cortex-chat/namespace: cortex-system/' | \
kubectl apply -f -
The Final State
By the end of the session:
Running Activators
- cloudflare-activator
- github-activator
- kubernetes-activator
- proxmox-activator
- sandfly-activator
- (plus existing: infra-activator, layer-activator, security-activator)
MCP Server Status
- github-mcp: Working with JSON-RPC (6 tools)
- cloudflare-mcp: Wrapper deployed, needs initialization debugging
- proxmox-mcp: Working (21 tools)
- kubernetes-mcp: Working (11 tools)
- sandfly-mcp: Working (43 tools)
- cortex-mcp: Working (3 tools)
Pending
- cortex-fabric-activator (waiting for cluster capacity)
What I Committed
A single commit to cortex-gitops containing:
- 6 new fabric directories (~4,600 lines added)
- Updated chat-activator with 7 fabric routes
- Updated intent classifier with 8 categories
- Fixed GitHub MCP server with JSON-RPC support
- Fixed Cloudflare MCP server with HTTP wrapper
commit dedca48
Add domain-specific fabric activators and fix MCP protocol support
34 files changed, 4620 insertions(+), 77 deletions(-)
Lessons Learned
- Protocol compatibility matters - When integrating multiple services, ensure they speak the same language (JSON-RPC vs custom formats).
- Cluster capacity planning - Adding 6 new services at once exposed memory constraints. Consider resource requests carefully.
- Cross-namespace secrets - Kubernetes namespaces are isolation boundaries. Secrets don't cross them automatically.
- ArgoCD ApplicationSets - Creating the YAML isn't enough; you need to kubectl apply the ApplicationSet to register it.
- Stdio-based MCP servers - They require proper initialization sequences before responding to tool calls.
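For reference, the initialization handshake a stdio MCP server expects looks roughly like this when driven over a subprocess pipe. The message shapes follow the MCP spec as I understand it; the server command and client name are placeholders.

```python
import json
import subprocess

# Placeholder command; the real server binary and args differ per fabric.
proc = subprocess.Popen(["python", "mcp_server.py"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def send(message: dict) -> None:
    # MCP stdio transport: one JSON-RPC message per line.
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()

# 1. initialize request, 2. initialized notification, 3. only then tools/list.
send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"protocolVersion": "2024-11-05",
                 "capabilities": {},
                 "clientInfo": {"name": "fabric-activator", "version": "0.1"}}})
print(proc.stdout.readline())   # server's initialize result
send({"jsonrpc": "2.0", "method": "notifications/initialized"})
send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
print(proc.stdout.readline())   # the tool catalogue
```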
What’s Next
- Debug the Cloudflare MCP initialization sequence
- Scale up cluster capacity for remaining activators
- Test end-to-end chat-to-fabric routing
- Add monitoring and alerting for fabric health
- Consider adding more fabrics (n8n, Tailscale, etc.)
The Journey Continues
What started as “let’s continue” evolved into building a complete distributed fabric network for Cortex. Six new intelligent activators, protocol fixes, capacity management, and secret propagation - all in a day’s work.
The Cortex platform is becoming what I envisioned it to be: a network of specialized AI agents, each an expert in its domain, working together to manage my infrastructure through natural conversation.
“Let’s make sure anything and everything lights up.” - And so it did.
Co-authored with Claude Opus 4.5