Building the Cortex Fabric Network: A Day of Infrastructure Evolution
The Starting Point
Today’s session began where I left off from an earlier conversation that had run out of context. The previous work had established the foundation: a Chat Fabric architecture for Cortex, UniFi fabric integration, SSE streaming fixes, and UniFi MCP server authentication repairs. My todo list showed ambitious goals ahead - creating individual fabric activators for GitHub, Cloudflare, Proxmox, Kubernetes, Sandfly, and Cortex itself.
When I returned with a simple “let’s continue,” the real work began.
The Challenge
My vision was clear: transform Cortex from a monolithic system into a distributed fabric network where each domain (networking, virtualization, Kubernetes, security, etc.) has its own intelligent activator layer. Each fabric would:
- Consume tasks from Redis Streams
- Discover and call MCP (Model Context Protocol) servers
- Use Claude to process queries with domain-specific context
- Publish results back to a unified stream
The architecture looked simple on paper:
Chat Request → Intent Classifier → Redis Streams → Fabric Activators → MCP Servers
                                                          ↓
                                                 cortex.results stream
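To make that first hop concrete, here is a minimal sketch of dropping a classified request onto a fabric's task stream and picking the reply back up from cortex.results. This is illustrative only: the connection details, field names, and helper functions are placeholders, not the production code.

```python
import json
import uuid

import redis

# Placeholder connection details.
r = redis.Redis(host="redis", port=6379, password="...", decode_responses=True)

def dispatch(domain: str, query: str) -> str:
    """Push a task onto the domain's stream and return its correlation id."""
    task_id = str(uuid.uuid4())
    r.xadd(f"cortex.{domain}.tasks", {"task_id": task_id, "query": query})
    return task_id

def await_result(task_id: str, block_ms: int = 30_000) -> dict | None:
    """Block on cortex.results until the matching task_id comes back."""
    last_id = "$"
    while True:
        entries = r.xread({"cortex.results": last_id}, block=block_ms, count=10)
        if not entries:
            return None  # timed out waiting for the fabric
        for _stream, messages in entries:
            for msg_id, fields in messages:
                last_id = msg_id
                if fields.get("task_id") == task_id:
                    return json.loads(fields["result"])
```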
But making it “light up” - that was the challenge.
Creating the Fabric Activators
Claude and I built six new fabric directories, each containing:
- ApplicationSet for ArgoCD deployment
- Helm Chart with values and templates
- Python Activator that handles task consumption and MCP tool execution
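A single fabric directory ends up looking roughly like this (the layout is illustrative, not the exact file names in the repo):

```
cortex-github-fabric/
├── applicationset.yaml        # registers the fabric with ArgoCD
├── chart/
│   ├── Chart.yaml
│   ├── values.yaml            # image, resources, Redis stream name
│   └── templates/
│       ├── deployment.yaml
│       └── service.yaml
└── activator/
    └── main.py                # FastAPI activator: Redis consumer + MCP client
```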
The New Fabrics
| Fabric | Purpose | Redis Stream |
|---|---|---|
| cortex-github-fabric | Repository ops, PRs, issues, workflows | cortex.github.tasks |
| cortex-cloudflare-fabric | DNS, tunnels, WAF, CDN | cortex.cloudflare.tasks |
| cortex-sandfly-fabric | Security scanning, threat alerts | cortex.sandfly.tasks |
| cortex-proxmox-fabric | VMs, containers, LXC, storage | cortex.proxmox.tasks |
| cortex-kubernetes-fabric | Pods, deployments, services | cortex.kubernetes.tasks |
| cortex-cortex-fabric | Meta-operations, agent registry | cortex.cortex.tasks |
Each activator follows the same pattern - a FastAPI application that:
- Connects to Redis on startup
- Creates consumer groups for its task stream
- Discovers tools from its MCP server
- Processes queries using Claude with tool access
- Publishes results back to the central results stream
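In sketch form, that shared pattern looks something like the following. It is a condensed illustration rather than the real activator: the environment variable names, consumer name, stream field names, and the Claude model id are all placeholders.

```python
import asyncio
import json
import os

import anthropic
import httpx
import redis.asyncio as aioredis
from fastapi import FastAPI
from redis import exceptions as redis_exceptions

app = FastAPI()
STREAM = os.environ.get("TASK_STREAM", "cortex.github.tasks")   # this fabric's stream
MCP_URL = os.environ.get("MCP_URL", "http://github-mcp:8080")   # this fabric's MCP server
GROUP = "activator"

claude = anthropic.AsyncAnthropic()
r: aioredis.Redis | None = None

async def discover_tools() -> list[dict]:
    # Ask the fabric's MCP server for its tool catalogue via JSON-RPC.
    async with httpx.AsyncClient() as client:
        resp = await client.post(MCP_URL, json={"jsonrpc": "2.0", "method": "tools/list", "id": 1})
        return resp.json()["result"]["tools"]

async def process(fields: dict) -> None:
    # Let Claude answer the query with the fabric's tools attached.
    tools = [{"name": t["name"], "description": t.get("description", ""),
              "input_schema": t["inputSchema"]} for t in await discover_tools()]
    reply = await claude.messages.create(
        model="claude-sonnet-4-5",   # model name illustrative
        max_tokens=1024, tools=tools,
        messages=[{"role": "user", "content": fields["query"]}],
    )
    await r.xadd("cortex.results", {"task_id": fields["task_id"],
                                    "result": json.dumps(reply.model_dump(), default=str)})

async def consume() -> None:
    # Read tasks from this fabric's stream as part of the consumer group.
    while True:
        batch = await r.xreadgroup(GROUP, "activator-1", {STREAM: ">"}, count=1, block=5000)
        for _stream, messages in batch or []:
            for msg_id, fields in messages:
                await process(fields)
                await r.xack(STREAM, GROUP, msg_id)

@app.on_event("startup")
async def startup() -> None:
    global r
    r = aioredis.from_url(os.environ["REDIS_URL"], decode_responses=True)
    try:
        await r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
    except redis_exceptions.ResponseError:
        pass  # consumer group already exists after a restart
    asyncio.create_task(consume())
```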
The Intent Classifier Evolution
The intent classifier grew from handling a few domains to understanding eight categories:
KEYWORD_MAPPINGS = {
"unifi": {...}, # Network, WiFi, clients
"proxmox": {...}, # VMs, containers, storage
"kubernetes": {...}, # Pods, deployments, services
"github": {...}, # Repos, PRs, issues
"cloudflare": {...}, # DNS, tunnels, WAF
"sandfly": {...}, # Security scans, threats
"cortex": {...}, # System status, help
"automation": {...} # n8n, workflows
}
For ambiguous queries, Claude Haiku provides fast classification as a fallback.
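A stripped-down version of that routing logic looks roughly like this: keyword pass first, Haiku only when nothing matches. The function name and model alias are illustrative, and each domain's entry is treated here as a simple collection of keyword strings.

```python
import anthropic

claude = anthropic.Anthropic()

def classify(query: str) -> str:
    """Return the fabric domain a chat query should be routed to."""
    lowered = query.lower()
    for domain, keywords in KEYWORD_MAPPINGS.items():
        if any(kw in lowered for kw in keywords):
            return domain
    # Ambiguous query: fall back to a fast, cheap model for a one-word answer.
    reply = claude.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": f"Classify this infrastructure query into one of "
                       f"{list(KEYWORD_MAPPINGS)}. Reply with the single word only.\n\n{query}",
        }],
    )
    return reply.content[0].text.strip().lower()
```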
The Protocol Problem
During a comprehensive audit of the deployment, I discovered that not all MCP servers were playing nicely. The GitHub MCP server returned 500 errors, and the Cloudflare MCP server returned 404s.
The Root Cause: Protocol mismatch. My fabric activators speak JSON-RPC:
{"jsonrpc": "2.0", "method": "tools/list", "id": 1}
But the GitHub MCP server expected:
{"tool": "list_prs", "arguments": {"repo": "owner/name"}}
The Fix
I updated the GitHub MCP server to support both protocols:
def do_POST(self):
    # Read the request body before parsing it.
    length = int(self.headers.get("Content-Length", 0))
    body = self.rfile.read(length)
    request = json.loads(body)

    # Check if this is a JSON-RPC request
    if "jsonrpc" in request:
        self.handle_jsonrpc(request)
        return

    # Legacy format
    tool_name = request.get("tool")
    arguments = request.get("arguments", {})
    ...
I also added proper tool definitions with an inputSchema for JSON-RPC compatibility, and implemented handlers for the tools/list, tools/call, and initialize methods.
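The dispatch side of that handler looks roughly like this. It is a simplified sketch: TOOLS, call_tool, and send_json stand in for the server's real definitions and helpers, and the protocol version string follows the MCP spec as published.

```python
def handle_jsonrpc(self, request):
    """Route JSON-RPC methods to the existing tool implementations."""
    method = request.get("method")
    req_id = request.get("id")

    if method == "initialize":
        result = {"protocolVersion": "2024-11-05",
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "github-mcp", "version": "1.0.0"}}
    elif method == "tools/list":
        result = {"tools": TOOLS}  # each entry carries name, description, inputSchema
    elif method == "tools/call":
        params = request.get("params", {})
        result = call_tool(params.get("name"), params.get("arguments", {}))
    else:
        self.send_json({"jsonrpc": "2.0", "id": req_id,
                        "error": {"code": -32601, "message": f"Unknown method: {method}"}})
        return

    self.send_json({"jsonrpc": "2.0", "id": req_id, "result": result})
```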
The Capacity Crunch
With six new pods requesting memory, the cluster pushed back. Kubernetes nodes showed 98-99% memory allocation:
k3s-master03: memory 5826Mi (98%)
k3s-worker01: memory 5868Mi (99%)
k3s-worker02: memory 5904Mi (99%)
The scheduler couldn’t find room for all the new activators. I scaled down some services with multiple replicas:
kubectl scale deployment memory-service --replicas=1
kubectl scale deployment layer-activator --replicas=1
kubectl scale deployment fabric-gateway --replicas=1
This freed enough resources for most activators to start.
The Redis Secret Mystery
Even after scheduling, pods crashed with:
redis.exceptions.AuthenticationError: Authentication required.
The fabric activators in cortex-system namespace needed the Redis secret from cortex-chat namespace. A quick secret copy solved it:
kubectl get secret cortex-chat-secrets -n cortex-chat -o yaml | \
sed 's/namespace: cortex-chat/namespace: cortex-system/' | \
kubectl apply -f -
The Final State
By the end of the session:
Running Activators
- cloudflare-activator
- github-activator
- kubernetes-activator
- proxmox-activator
- sandfly-activator
- (plus existing: infra-activator, layer-activator, security-activator)
MCP Server Status
- github-mcp: Working with JSON-RPC (6 tools)
- cloudflare-mcp: Wrapper deployed, needs initialization debugging
- proxmox-mcp: Working (21 tools)
- kubernetes-mcp: Working (11 tools)
- sandfly-mcp: Working (43 tools)
- cortex-mcp: Working (3 tools)
Pending
- cortex-fabric-activator (waiting for cluster capacity)
What I Committed
A single commit to cortex-gitops containing:
- 6 new fabric directories (~4,600 lines added)
- Updated chat-activator with 7 fabric routes
- Updated intent classifier with 8 categories
- Fixed GitHub MCP server with JSON-RPC support
- Fixed Cloudflare MCP server with HTTP wrapper
commit dedca48
Add domain-specific fabric activators and fix MCP protocol support
34 files changed, 4620 insertions(+), 77 deletions(-)
Lessons Learned
- Protocol compatibility matters - When integrating multiple services, ensure they speak the same language (JSON-RPC vs custom formats).
- Cluster capacity planning - Adding 6 new services at once exposed memory constraints. Consider resource requests carefully.
- Cross-namespace secrets - Kubernetes namespaces are isolation boundaries. Secrets don't cross them automatically.
- ArgoCD ApplicationSets - Creating the YAML isn't enough; you need to kubectl apply the ApplicationSet to register it.
- Stdio-based MCP servers - They require proper initialization sequences before responding to tool calls.
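For reference, the initialization handshake a stdio MCP server expects looks roughly like this when driven over a subprocess pipe. The message shapes follow the MCP spec as I understand it; the server command and client name are placeholders.

```python
import json
import subprocess

# Placeholder command; the real server binary and args differ per fabric.
proc = subprocess.Popen(["python", "mcp_server.py"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def send(message: dict) -> None:
    # MCP stdio transport: one JSON-RPC message per line.
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()

# 1. initialize request, 2. initialized notification, 3. only then tools/list.
send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"protocolVersion": "2024-11-05",
                 "capabilities": {},
                 "clientInfo": {"name": "fabric-activator", "version": "0.1"}}})
print(proc.stdout.readline())   # server's initialize result
send({"jsonrpc": "2.0", "method": "notifications/initialized"})
send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
print(proc.stdout.readline())   # the tool catalogue
```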
What’s Next
- Debug the Cloudflare MCP initialization sequence
- Scale up cluster capacity for remaining activators
- Test end-to-end chat-to-fabric routing
- Add monitoring and alerting for fabric health
- Consider adding more fabrics (n8n, Tailscale, etc.)
The Journey Continues
What started as “let’s continue” evolved into building a complete distributed fabric network for Cortex. Six new intelligent activators, protocol fixes, capacity management, and secret propagation - all in a day’s work.
The Cortex platform is becoming what I envisioned it to be: a network of specialized AI agents, each an expert in its domain, working together to manage my infrastructure through natural conversation.
“Let’s make sure anything and everything lights up.” - And so it did.
Co-authored with Claude Opus 4.5