
Building the Cortex Fabric Network: A Day of Infrastructure Evolution

Ryan Dahlberg
January 25, 2026 · 6 min read

The Starting Point

Today’s session began where I left off from an earlier conversation that had run out of context. The previous work had established the foundation: a Chat Fabric architecture for Cortex, UniFi fabric integration, SSE streaming fixes, and UniFi MCP server authentication repairs. My todo list showed ambitious goals ahead - creating individual fabric activators for GitHub, Cloudflare, Proxmox, Kubernetes, Sandfly, and Cortex itself.

When I returned with a simple “let’s continue,” the real work began.


The Challenge

My vision was clear: transform Cortex from a monolithic system into a distributed fabric network where each domain (networking, virtualization, Kubernetes, security, etc.) has its own intelligent activator layer. Each fabric would:

  • Consume tasks from Redis Streams
  • Discover and call MCP (Model Context Protocol) servers
  • Use Claude to process queries with domain-specific context
  • Publish results back to a unified stream

The architecture looked simple on paper:

Chat Request → Intent Classifier → Redis Streams → Fabric Activators → MCP Servers

                   (results flow back to the chat layer on the cortex.results stream)
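In practice, the routing step is just one XADD onto the right per-domain stream. Here's a minimal sketch of that producer side; the field names ("task_id", "query", "reply_to") are illustrative, not the real schema:

import os
import uuid

import redis

r = redis.Redis(host="redis", port=6379,
                password=os.environ["REDIS_PASSWORD"], decode_responses=True)

def dispatch(domain: str, query: str) -> str:
    """Publish a task onto the fabric stream matching the classified domain."""
    task_id = str(uuid.uuid4())
    r.xadd(f"cortex.{domain}.tasks", {
        "task_id": task_id,
        "query": query,
        "reply_to": "cortex.results",  # unified results stream
    })
    return task_id

# e.g. dispatch("github", "list open PRs on cortex-gitops")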

But making it “light up” - that was the challenge.


Creating the Fabric Activators

Claude and I built six new fabric directories, each containing:

  • ApplicationSet for ArgoCD deployment
  • Helm Chart with values and templates
  • Python Activator that handles task consumption and MCP tool execution

The New Fabrics

| Fabric | Purpose | Redis Stream |
| --- | --- | --- |
| cortex-github-fabric | Repository ops, PRs, issues, workflows | cortex.github.tasks |
| cortex-cloudflare-fabric | DNS, tunnels, WAF, CDN | cortex.cloudflare.tasks |
| cortex-sandfly-fabric | Security scanning, threat alerts | cortex.sandfly.tasks |
| cortex-proxmox-fabric | VMs, containers, LXC, storage | cortex.proxmox.tasks |
| cortex-kubernetes-fabric | Pods, deployments, services | cortex.kubernetes.tasks |
| cortex-cortex-fabric | Meta-operations, agent registry | cortex.cortex.tasks |

Each activator follows the same pattern - a FastAPI application that:

  1. Connects to Redis on startup
  2. Creates consumer groups for its task stream
  3. Discovers tools from its MCP server
  4. Processes queries using Claude with tool access
  5. Publishes results back to the central results stream
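A stripped-down sketch of that loop, using the GitHub fabric's stream and group names as an example; process_with_claude is a stand-in for the tool-discovery and Claude call:

import os

import redis

STREAM, GROUP, CONSUMER = "cortex.github.tasks", "github-fabric", "activator-0"

r = redis.Redis(host="redis", port=6379,
                password=os.environ["REDIS_PASSWORD"], decode_responses=True)

# Steps 1-2: connect and create the consumer group for this fabric's stream
try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.exceptions.ResponseError:
    pass  # group already exists

def run() -> None:
    while True:
        # Block up to 5s waiting for a new task on this fabric's stream
        for _stream, messages in r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"},
                                              count=1, block=5000) or []:
            for msg_id, task in messages:
                # Steps 3-4: discover MCP tools and let Claude answer with them
                answer = process_with_claude(task["query"])  # stand-in helper
                # Step 5: publish to the central results stream, then ack
                r.xadd("cortex.results",
                       {"task_id": task["task_id"], "result": answer})
                r.xack(STREAM, GROUP, msg_id)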

The Intent Classifier Evolution

The intent classifier grew from handling a few domains to understanding eight categories:

KEYWORD_MAPPINGS = {
    "unifi": {...},      # Network, WiFi, clients
    "proxmox": {...},    # VMs, containers, storage
    "kubernetes": {...}, # Pods, deployments, services
    "github": {...},     # Repos, PRs, issues
    "cloudflare": {...}, # DNS, tunnels, WAF
    "sandfly": {...},    # Security scans, threats
    "cortex": {...},     # System status, help
    "automation": {...}  # n8n, workflows
}

For ambiguous queries, Claude Haiku provides fast classification as a fallback.
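Roughly, the two-stage classifier looks like the sketch below. The model alias, the prompt, and the assumption that each mapping exposes a flat list of keyword strings are all illustrative:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
DOMAINS = list(KEYWORD_MAPPINGS)

def classify(query: str) -> str:
    text = query.lower()
    # Stage 1: cheap keyword scan over the mappings above
    for domain, keywords in KEYWORD_MAPPINGS.items():
        if any(kw in text for kw in keywords):
            return domain
    # Stage 2: ambiguous queries fall through to a fast Haiku call
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder alias for Claude Haiku
        max_tokens=10,
        messages=[{"role": "user",
                   "content": f"Classify this query into one of {DOMAINS}. "
                              f"Reply with the domain name only.\n\n{query}"}],
    )
    answer = msg.content[0].text.strip().lower()
    return answer if answer in DOMAINS else "cortex"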


The Protocol Problem

During a comprehensive audit, I discovered that not all MCP servers were playing nicely. The GitHub MCP server returned 500 errors, and the Cloudflare MCP server returned 404s.

The Root Cause: Protocol mismatch. My fabric activators speak JSON-RPC:

{"jsonrpc": "2.0", "method": "tools/list", "id": 1}

But the GitHub MCP server expected:

{"tool": "list_prs", "arguments": {"repo": "owner/name"}}

The Fix

I updated the GitHub MCP server to support both protocols:

def do_POST(self):
    # Read and parse the JSON request body
    length = int(self.headers.get("Content-Length", 0))
    body = self.rfile.read(length)
    request = json.loads(body)

    # Check if this is a JSON-RPC request
    if "jsonrpc" in request:
        self.handle_jsonrpc(request)
        return

    # Legacy format: {"tool": ..., "arguments": {...}}
    tool_name = request.get("tool")
    ...

I also added proper tool definitions with inputSchema for JSON-RPC compatibility and implemented handlers for the tools/list, tools/call, and initialize methods.
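The dispatcher itself stays small. A rough sketch of it, where TOOLS and run_tool stand in for the server's real tool table and execution path, and the protocolVersion/serverInfo values are placeholders:

def _reply(self, payload):
    # Send a JSON response using the standard BaseHTTPRequestHandler API
    data = json.dumps(payload).encode()
    self.send_response(200)
    self.send_header("Content-Type", "application/json")
    self.send_header("Content-Length", str(len(data)))
    self.end_headers()
    self.wfile.write(data)

def handle_jsonrpc(self, request):
    method, req_id = request.get("method"), request.get("id")

    if method == "initialize":
        result = {"protocolVersion": "2024-11-05", "capabilities": {"tools": {}},
                  "serverInfo": {"name": "github-mcp", "version": "1.0"}}
    elif method == "tools/list":
        result = {"tools": TOOLS}  # name, description, inputSchema per tool
    elif method == "tools/call":
        params = request.get("params", {})
        output = run_tool(params.get("name"), params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(output)}]}
    else:
        self._reply({"jsonrpc": "2.0", "id": req_id,
                     "error": {"code": -32601, "message": "Method not found"}})
        return

    self._reply({"jsonrpc": "2.0", "id": req_id, "result": result})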


The Capacity Crunch

With six new pods requesting memory, the cluster pushed back. Kubernetes nodes showed 98-99% memory allocation:

k3s-master03: memory 5826Mi (98%)
k3s-worker01: memory 5868Mi (99%)
k3s-worker02: memory 5904Mi (99%)

The scheduler couldn’t find room for all the new activators. I scaled down some services with multiple replicas:

kubectl scale deployment memory-service --replicas=1
kubectl scale deployment layer-activator --replicas=1
kubectl scale deployment fabric-gateway --replicas=1

This freed enough resources for most activators to start.


The Redis Secret Mystery

Even after scheduling, pods crashed with:

redis.exceptions.AuthenticationError: Authentication required.

The fabric activators in the cortex-system namespace needed the Redis secret from the cortex-chat namespace. A quick secret copy solved it:

kubectl get secret cortex-chat-secrets -n cortex-chat -o yaml | \
  sed 's/namespace: cortex-chat/namespace: cortex-system/' | \
  kubectl apply -f -

The Final State

By the end of the session:

Running Activators

  • cloudflare-activator
  • github-activator
  • kubernetes-activator
  • proxmox-activator
  • sandfly-activator
  • (plus existing: infra-activator, layer-activator, security-activator)

MCP Server Status

  • github-mcp: Working with JSON-RPC (6 tools)
  • cloudflare-mcp: Wrapper deployed, needs initialization debugging
  • proxmox-mcp: Working (21 tools)
  • kubernetes-mcp: Working (11 tools)
  • sandfly-mcp: Working (43 tools)
  • cortex-mcp: Working (3 tools)

Pending

  • cortex-fabric-activator (waiting for cluster capacity)

What I Committed

A single commit to cortex-gitops containing:

  • 6 new fabric directories (~4,600 lines added)
  • Updated chat-activator with 7 fabric routes
  • Updated intent classifier with 8 categories
  • Fixed GitHub MCP server with JSON-RPC support
  • Fixed Cloudflare MCP server with HTTP wrapper
commit dedca48
Add domain-specific fabric activators and fix MCP protocol support
34 files changed, 4620 insertions(+), 77 deletions(-)

Lessons Learned

  1. Protocol compatibility matters - When integrating multiple services, ensure they speak the same language (JSON-RPC vs custom formats)

  2. Cluster capacity planning - Adding 6 new services at once exposed memory constraints. Consider resource requests carefully.

  3. Cross-namespace secrets - Kubernetes namespaces are isolation boundaries. Secrets don’t cross them automatically.

  4. ArgoCD ApplicationSets - Creating the YAML isn’t enough; you need to kubectl apply the ApplicationSet to register it.

  5. Stdio-based MCP servers - They require proper initialization sequences before responding to tool calls.
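On that last point, the handshake a stdio client has to perform looks roughly like this; the server command is just an example, and messages are newline-delimited JSON-RPC:

import json
import subprocess

proc = subprocess.Popen(
    ["python", "cloudflare_mcp_server.py"],  # example server command
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(message: dict) -> None:
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()

# 1. initialize request, then wait for the server's response
send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                 "clientInfo": {"name": "cortex-fabric", "version": "0.1"}}})
print(proc.stdout.readline())

# 2. initialized notification (no id, no response expected)
send({"jsonrpc": "2.0", "method": "notifications/initialized"})

# 3. only now will tools/list and tools/call be answered
send({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
print(proc.stdout.readline())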


What’s Next

  • Debug the Cloudflare MCP initialization sequence
  • Scale up cluster capacity for remaining activators
  • Test end-to-end chat-to-fabric routing
  • Add monitoring and alerting for fabric health
  • Consider adding more fabrics (n8n, Tailscale, etc.)

The Journey Continues

What started as “let’s continue” evolved into building a complete distributed fabric network for Cortex. Six new intelligent activators, protocol fixes, capacity management, and secret propagation - all in a day’s work.

The Cortex platform is becoming what I envisioned it to be: a network of specialized AI agents, each an expert in its domain, working together to manage my infrastructure through natural conversation.

“Let’s make sure anything and everything lights up.” - And so it did.


Co-authored with Claude Opus 4.5

#AI #Distributed Systems #Infrastructure #Kubernetes #Redis #MCP #Architecture