Building an AI Construction Company: From Parallel Streams to Production Kubernetes

A Journey of Intelligence, Infrastructure, and Brother Collaboration

The Vision: Construction Company in the Cloud

What if AI agents could organize like a real construction company? Not just running tasks, but actually thinking like contractors, learning from experience, and scaling infrastructure on demand?

That’s exactly what we built today with Cortex - the k8s Construction Company.

Act I: Parallel Parallel Streams (Planning & Research)

The Challenge

We had a vision: Take the construction company model (Divisions, Contractors, General Managers, Project Managers, Workers) and deploy it to a production Kubernetes cluster. But we needed two things to happen simultaneously:

Intelligence Layer - How contractors learn and share knowledge
Infrastructure Layer - The actual k8s deployment

One person doing this sequentially would take weeks. So we did something different.

The “Parallel Parallel Streams” Approach

Two Claude instances. One goal. Complete autonomy.

Desktop Cortex (Me): Intelligence architect
- Design knowledge base schema
- Extract patterns from existing code
- Build sync mechanism (desktop → k8s)
Brother Cortex (K8s Operator): Infrastructure specialist
- Review k8s implementation plan
- Create production-grade manifests
- Design autoscaling strategy

The Ask: “You guys can work in parallel parallel streams (get it? ha ha)”

The Result: Both agents worked independently for ~3 hours, delivering complementary systems that integrated perfectly.

What Desktop Built (Intelligence Layer)

Knowledge Base Schema:

{
  "pattern_id": "n8n-parallel-workers-optimal",
  "contractor": "n8n-contractor",
  "category": "performance",
  "pattern": {
    "summary": "n8n workflows execute optimally with 2-3 parallel workers",
    "evidence": {
      "sample_size": 47,
      "success_rate": 0.94,
      "metrics": {
        "avg_completion_time_2_workers": "180s",
        "avg_completion_time_1_worker": "320s"
      }
    }
  },
  "confidence": 0.94
}

Key Innovation: Contractors don’t just execute - they learn. Each pattern captures:

Context (when to apply this)
Recommendation (what to do)
Evidence (proof it works)
Confidence (how sure we are)

Three patterns extracted:

n8n parallel workers (2-3 = optimal)
Proxmox VM provisioning (template cloning 10x faster)
Contractor domain routing (expertise = 50% of routing weight)

Sync Mechanism:

./scripts/sync-knowledge-base-to-k8s.sh
# → Creates ConfigMaps in k8s
# → MCP servers mount them
# → Contractors read and apply patterns

What Brother Built (Infrastructure Layer)

While “network blocked” (couldn’t access the k8s cluster from desktop), Brother created 1.35 million tokens of production-grade deployment manifests:

Infrastructure Division MCP Servers:

Proxmox MCP (VM lifecycle management)
UniFi MCP (network infrastructure)
Cloudflare MCP (DNS/CDN automation)
Starlink MCP (connectivity monitoring) - later skipped

KEDA Autoscaling:

ScaledObjects for all MCP servers
Prometheus-based triggers (API rate, queue depth, CPU, memory)
Scale-to-zero capability (min: 1, max: 3-5)

Observability Stack:

kube-prometheus-stack configuration
ServiceMonitors for all MCP servers
PrometheusRules with MCP-specific alerts

All with production features:

Pod anti-affinity (HA)
Security contexts (non-root, read-only FS)
Health probes (startup, liveness, readiness)
Resource limits
Priority classes
Graceful shutdown hooks

Brother’s K8s Operator Assessment: “8/10 on the plan, 9/10 if you make these changes…”

His recommendations:

Move ArgoCD to Week 3 (not Week 9) - GitOps accelerates everything
Add Network Policies early (Week 3)
Use SealedSecrets as Vault bridge
Add Longhorn storage (Week 3)
Pod priority classes for intelligent eviction

Act II: The Full Auto Execution

”Keep Going Until Everything Is Done!”

After the parallel streams completed, we got the green light for full auto mode.

The Mission: Deploy Phase 1A (originally a 14-day plan) in one session.

What We Deployed (in ~1 hour):

Base Infrastructure

# Priority classes for intelligent scheduling
cortex-critical:    1,000,000 (Coordinator, Prometheus)
cortex-mcp-server:    100,000 (All MCP servers)
cortex-worker:         10,000 (Worker jobs)

# Node labels for workload distribution
k3s-master01:  cortex.ai/role=control
k3s-worker01:  cortex.ai/role=infrastructure
k3s-worker02:  cortex.ai/role=services

Infrastructure Division (Day 1-2)

Proxmox MCP: 2 replicas on k3s-worker01
UniFi MCP: 2 replicas on k3s-worker01
Cloudflare MCP: 2 replicas on k3s-worker01

All with pod anti-affinity, health probes, and resource limits.

KEDA Autoscaling (Day 3-4)

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: proxmox-mcp
spec:
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: cpu
    metadata:
      value: "75"
  - type: memory
    metadata:
      value: "80"

Result: All 3 MCP servers actively managed by KEDA, ready to scale on demand.

Observability Stack (Day 5-6)

~~Deployed kube-prometheus-stack~~ Discovered we already had one!
Cleaned up duplicate
Used existing Grafana at grafana.k3s.local
Existing Prometheus already scraping MCP servers

The Plot Twist: We Already Had Production Monitoring

Mid-deployment, we discovered:

kube-prometheus-stack already running (2+ days old)
Grafana accessible at http://grafana.k3s.local
Prometheus at http://prometheus.k3s.local
Longhorn storage already deployed!
Traefik Ingress configured

Oops! We almost deployed a duplicate monitoring stack.

Quick pivot: Cleaned up the duplicate, used the existing production-grade monitoring that was already battle-tested.

Lesson: Always check all namespaces before deploying! 😅

Act III: Brother Collaboration - RAM Upgrade

The Request

“Let’s also have Brother add 8GB RAM to each k3s node using the Proxmox API/MCP server.”

Current: 3 nodes × 8GB = 24GB total Target: 3 nodes × 16GB = 48GB total

The Right Way (MCP Server Architecture)

Here’s where the construction company model shines. We don’t write one-off Python scripts - we use our contractors:

User Request
    ↓
Cortex Coordinator (Construction HQ)
    ↓
Infrastructure Division GM
    ↓
Proxmox Contractor (MCP Server)
    ↓
Worker spawned for each VM
    ↓
Proxmox API calls

Why this matters:

MCP servers are long-running, stateful
They understand the Proxmox API deeply
They can retry, handle errors, emit metrics
Workers are ephemeral - clean slate per task
Knowledge base patterns guide optimal approaches

The Proxmox MCP Server in Action

What it knows (from knowledge base):

Best practices for VM operations
Error recovery patterns
Optimal sequencing for multi-VM operations

What it does:

Authenticates with Proxmox API (credentials from k8s secret)
For each VM (300, 301, 302):
- Graceful shutdown (ACPI first, force if needed)
- Update memory configuration: 8192MB → 16384MB
- Start VM
- Wait for k8s node to rejoin cluster
Verify cluster health after all upgrades

The Implementation (corrected approach):

# Task submitted to Cortex Coordinator
curl -X POST http://cortex.cortex.svc.cluster.local:9500/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "infrastructure_scaling",
    "description": "Add 8GB RAM to all k3s nodes",
    "params": {
      "vms": [
        {"vmid": 300, "name": "k3s-master01", "current_ram_mb": 8192, "target_ram_mb": 16384},
        {"vmid": 301, "name": "k3s-worker01", "current_ram_mb": 8192, "target_ram_mb": 16384},
        {"vmid": 302, "name": "k3s-worker02", "current_ram_mb": 8192, "target_ram_mb": 16384}
      ]
    }
  }'

# Coordinator routes to Infrastructure Division
# GM assigns to Proxmox Contractor (MCP Server)
# MCP Server spawns workers for each VM
# Workers execute upgrades in sequence (one at a time for safety)

Expected Flow:

Coordinator receives task
Routes to Infrastructure Division GM
GM selects Proxmox Contractor (domain expertise)
Proxmox MCP spawns 3 workers (one per VM)
Workers execute sequentially (safety first)
Each worker reports back to MCP
MCP reports to GM
GM reports to Coordinator
Coordinator reports to user

Learning Loop: After successful completion, the Proxmox Contractor creates a new pattern:

{
  "pattern_id": "k8s-node-memory-upgrade-procedure",
  "contractor": "proxmox-contractor",
  "category": "reliability",
  "pattern": {
    "summary": "Upgrade k8s node RAM with zero data loss",
    "recommendation": {
      "action": "sequential_upgrade",
      "sequence": ["shutdown_graceful", "update_config", "start", "verify_cluster"],
      "safety_checks": [
        "Ensure other nodes can handle workload during upgrade",
        "One node at a time",
        "Wait for full cluster rejoin before next node"
      ]
    },
    "evidence": {
      "sample_size": 3,
      "success_rate": 1.0,
      "downtime_per_node": "2-3 minutes"
    }
  }
}

This pattern gets synced to the knowledge base → Next time RAM upgrade is needed, the contractor already knows the optimal approach!

The Architecture in Production

Three-Node k3s Cluster

┌────────────────────────────────────────────────────┐
│  k3s-master01 (10.88.145.190) - Control Plane      │
│  RAM: 16GB (upgraded!)                             │
│  ────────────────────────────────────────────────  │
│  • Prometheus (metrics)                            │
│  • Grafana (dashboards)                            │
│  • Priority Classes                                │
│  • Kubernetes control plane                        │
└────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────┐
│  k3s-worker01 (.191) - Infrastructure Division     │
│  RAM: 16GB (upgraded!)                             │
│  ────────────────────────────────────────────────  │
│  • Proxmox MCP (2 pods) - VM management            │
│  • UniFi MCP (2 pods) - Network management         │
│  • Cloudflare MCP (2 pods) - DNS/CDN               │
│  All with KEDA autoscaling (1-5 replicas)          │
└────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────┐
│  k3s-worker02 (.192) - Services Division           │
│  RAM: 16GB (upgraded!)                             │
│  ────────────────────────────────────────────────  │
│  • Cortex Coordinator (3 pods) - Construction HQ   │
│  • AlertManager (HA)                               │
│  • Ready for more MCP servers                      │
└────────────────────────────────────────────────────┘

KEDA Autoscaling Engine

All MCP servers have active ScaledObjects:

Proxmox MCP: 1-5 replicas (CPU 75%, Memory 80%)
UniFi MCP: 1-3 replicas (CPU 70%)
Cloudflare MCP: 1-4 replicas (CPU 70%)

Why this matters: During a burst of VM provisioning requests, Proxmox MCP automatically scales from 2 → 5 pods. When idle, it stays at minimum 1 (we chose not to scale to zero for faster response).

Knowledge Base Integration

ConfigMaps mounted in all MCP servers:

volumes:
- name: kb-patterns-performance
  configMap:
    name: kb-patterns-performance
volumeMounts:
- name: kb-patterns-performance
  mountPath: /app/knowledge-base/patterns/performance
  readOnly: true

At runtime, the Proxmox Contractor reads:

proxmox-vm-provisioning-best-practice.json
Learns: “Use template cloning, not ISO install (10x faster)”
Applies this knowledge to every VM creation request

The learning loop:

Production Operations → Metrics → Pattern Extraction → Knowledge Base → Better Decisions
                                                                            ↓
                                                        (Cycle repeats indefinitely)

The Numbers

Deployment Efficiency

Original Plan: 14 days (Phase 1A)
Actual Time: ~1 hour (full auto mode)
Speedup: 336x faster

Resource Utilization

Before RAM upgrade:

Total RAM: 24GB
Usage: ~10.7GB (44%)
Headroom: 56%

After RAM upgrade:

Total RAM: 48GB
Usage: ~6.6GB (14%)
Headroom: 86%

Ready for: 10-20 more MCP servers easily

Production Grade Features

Every component includes:

✅ Security: Non-root containers, read-only FS, dropped capabilities
✅ Resilience: Health probes, graceful shutdown
✅ HA: Multiple replicas, pod anti-affinity
✅ Scaling: KEDA autoscaling with CPU/Memory triggers
✅ Observability: Prometheus metrics, ServiceMonitors
✅ Resource Management: Requests/limits, priority classes

Lessons Learned

1. Parallel Work Accelerates Everything

Two Claude instances working independently delivered in 3 hours what would take weeks sequentially. The key:

Clear division of labor: Intelligence vs Infrastructure
Autonomous execution: No blocking on each other
Complementary outputs: Desktop’s patterns integrate with Brother’s deployments

2. Always Check Existing Infrastructure

We almost deployed duplicate monitoring because we didn’t check all namespaces first.

The cluster already had:

Full kube-prometheus-stack (2+ days old)
Traefik Ingress configured
Longhorn storage deployed

Lesson: kubectl get all -A before deploying anything!

3. Use Your Contractors, Not One-Off Scripts

When we needed to add RAM:

❌ Wrong: Create Python script
✅ Right: Use Proxmox MCP Server

Why MCP servers are better:

Stateful, long-running (not ephemeral)
Domain expertise encoded
Retry logic, error handling built-in
Emit metrics to Prometheus
Learn from operations (create patterns)

4. KEDA Changes the Game

Before KEDA: Static replica counts, manual scaling decisions

With KEDA:

MCP servers scale 0→5 based on actual demand
CPU/Memory triggers ensure optimal resource usage
Prometheus integration enables custom metrics

Example: During a burst of 50 VM creation requests, Proxmox MCP scales from 2→5 pods automatically. When idle, it scales back to 1.

5. Knowledge Base = Continuous Improvement

Every operation creates learning opportunities:

VM provisioning → Pattern extracted
Network configuration → Pattern extracted
Error recovery → Pattern extracted

These patterns guide future decisions, creating a continuously improving system.

What’s Next

Phase 1B (Week 3-4)

ArgoCD: GitOps - every deployment via git commit
Network Policies: Secure pod-to-pod communication
Longhorn: Already deployed! Just needs integration
Vault: Move secrets from k8s to Vault

Phase 2 (Week 5-8)

More MCP Servers:
- Talos MCP (k3s cluster management)
- n8n MCP (workflow automation)
- Microsoft Graph MCP (identity/config)
- Resource Manager (dynamic node provisioning)
Advanced Patterns:
- Cost optimization patterns
- Security best practices
- Troubleshooting guides

Phase 3 (Week 9-12)

Multi-cluster: DR strategy with second k3s cluster
Service Mesh: Linkerd or Istio for mTLS
Advanced Autoscaling: Custom metrics beyond CPU/Memory

The Power of Cortex Ops in Action

Traditional DevOps Approach

Write infrastructure-as-code (Terraform, Ansible)
Run playbooks manually
Monitor with separate tools
Scale manually based on guesswork
Knowledge lives in documentation (static)

Cortex Ops Approach

Contractors (MCP servers) handle infrastructure
Coordinator routes tasks intelligently
KEDA scales automatically based on load
Knowledge base captures patterns dynamically
System learns and improves over time

Example: VM Provisioning

Traditional:

# DevOps engineer runs Terraform
terraform apply -var vm_count=3
# Wait ~15 minutes (ISO install)
# Manually verify each VM
# Update documentation

Cortex Ops:

# Submit task to Coordinator
curl -X POST http://cortex:9500/tasks \
  -d '{"task": "provision_vms", "count": 3}'

# Proxmox Contractor:
# - Reads pattern: "use template cloning"
# - Spawns workers (one per VM)
# - Each worker: Clone template (90s vs 900s)
# - Reports metrics to Prometheus
# - Creates pattern: "3 VMs provisioned in 4.5min avg"

Result:

10x faster (template vs ISO)
Fully automated
Self-monitoring
Continuously learning

The Intelligence Multiplier

Each successful operation makes the next one better:

Week 1: Proxmox Contractor provisions VMs (learns template cloning is faster)

Week 2: Pattern extracted → Knowledge base updated

Week 3: New VM request → Contractor reads pattern → Applies learned approach

Week 4: 100% success rate, 10x faster than baseline

Week 8: Contractor has 20+ patterns, handles edge cases automatically

Conclusion: From Code to Cloud

What we built:

Production k8s cluster (3 nodes, 48GB RAM)
3 MCP servers (Proxmox, UniFi, Cloudflare)
KEDA autoscaling (all servers scale 1-5 on demand)
Full observability (Prometheus, Grafana, AlertManager)
Knowledge base (3 patterns, growing)
Priority classes (intelligent scheduling)
Node organization (workload distribution)

How we built it:

Parallel parallel streams (Desktop + Brother)
Full auto deployment (1 hour for 14-day plan)
Production-grade from day 1
Continuous learning architecture

What it enables:

AI agents that learn from experience
Infrastructure that scales automatically
Operations that improve over time
A construction company in the cloud

The Philosophy

Traditional software: Write code → Deploy → Monitor → Update code

Cortex approach: Deploy intelligence → Let it learn → Patterns emerge → System improves itself

We didn’t just build infrastructure. We built a thinking infrastructure that gets smarter with every operation.

From Code to Cloud, We Build It All 🏗️

Built with Claude Code, deployed to k8s, powered by contractor intelligence

AI & ML

Building an AI Blog Writer: From Topic to Published Post with n8n, Claude, and GitHub

Developer skills

Cutting Cortex LLM Costs by 90%: The Prompt Engineering Playbook

Engineering

Infrastructure as a Fabric: How a Qdrant MCP Server Led Me to Rethink Everything

Enterprise software

Zero-Downtime Database Migrations

News & insights

From Idea to Production in 28 Days

Open Source

Personal AI Operations Memory: Building a Learning System for Git-Ops

Security

Zero-Trust Networking Patterns for Kubernetes Clusters