Master-Worker Architecture: Cortex’s Foundation

Cortex orchestrates complex workflows across multiple agents using a master-worker architecture. This pattern enables scalable, fault-tolerant, and intelligent task execution.

Let’s explore how it works.

The Pattern

Core Components

Masters (5 total)

High-level coordinators
Make strategic decisions
Spawn and manage workers
Track outcomes and learn

Workers (7 types)

Execute specific tasks
Report progress to masters
Lightweight and disposable
Designed for parallel execution

Coordination Layer

JSONL event streams
Task queues
State management
Health monitoring

Why Master-Worker?

I evaluated several architectural patterns:

❌ Monolithic Agent

Single agent does everything
+ Simple to implement
- No specialization
- Hard to scale
- Single point of failure

❌ Peer-to-Peer Agents

Agents communicate directly
+ Decentralized
- Complex coordination
- Race conditions
- Difficult debugging

✅ Master-Worker

Masters coordinate, workers execute
+ Clear responsibility
+ Easy to scale
+ Fault tolerant
+ Learnable patterns

The master-worker pattern won because it maps naturally to the Mixture of Experts concept: masters are experts, workers are executors.

The 5 Masters

Cortex’s master-worker architecture consists of one coordinator and four specialist masters:

graph TD
    A[Coordinator Master<br/>Meta-coordinator & Router] --> B[Development Master<br/>Code & Implementation]
    A --> C[Security Master<br/>Audits & Remediation]
    A --> D[Inventory Master<br/>Cataloging & Docs]
    A --> E[CI/CD Master<br/>Build & Deploy]

    B --> B1[Implementation Worker]
    B --> B2[Fix Worker]
    B --> B3[Test Worker]

    C --> C1[Scan Worker]
    C --> C2[Security-Fix Worker]

    D --> D1[Documentation Worker]
    D --> D2[Analysis Worker]

    E --> E1[Test Worker]
    E --> E2[Implementation Worker]

    style A fill:#30363d,stroke:#58a6ff,stroke-width:3px
    style B fill:#30363d,stroke:#00d084,stroke-width:2px
    style C fill:#30363d,stroke:#cf2e2e,stroke-width:2px
    style D fill:#30363d,stroke:#9b51e0,stroke-width:2px
    style E fill:#30363d,stroke:#ff6900,stroke-width:2px

1. Coordinator Master

Role: Meta-coordinator, routes tasks to specialist masters

Responsibilities:

Receive incoming tasks
Analyze task requirements
Calculate routing confidence
Select appropriate specialist master
Track cross-master workflows

Decision Example:

Task: "Implement rate limiting with security audit"

Analysis:
  Primary: Development (implementation)
  Secondary: Security (audit)

Route:
  1. Development-Master (implement)
  2. Security-Master (audit)
  3. CI/CD-Master (deploy)
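
The primary/secondary analysis above can be sketched as a weighted keyword scorer. This is only an illustration, not Cortex's actual router: the `MASTER_KEYWORDS` table, its weights, the `routeTask` name, and the confidence formula (each master's share of the total score) are all assumptions made for the example. It does not model follow-up steps such as the deploy handoff.

```javascript
// Hypothetical keyword weights: which words hint at which specialist master.
const MASTER_KEYWORDS = {
  "development-master": { implement: 3, fix: 3, refactor: 2 },
  "security-master": { security: 1, audit: 1, cve: 3, vulnerability: 3 },
  "inventory-master": { catalog: 3, documentation: 2 },
  "cicd-master": { build: 2, deploy: 3, release: 2 },
};

// Score every master against the task text and return candidates, best first.
function routeTask(description) {
  const text = description.toLowerCase();
  const scored = Object.entries(MASTER_KEYWORDS).map(([master, weights]) => {
    const score = Object.entries(weights)
      .filter(([keyword]) => text.includes(keyword))
      .reduce((sum, [, weight]) => sum + weight, 0);
    return { master, score };
  });
  const total = scored.reduce((sum, s) => sum + s.score, 0);
  if (total === 0) return []; // nothing matched: the caller must pick a fallback
  return scored
    .filter((s) => s.score > 0)
    .map((s) => ({ master: s.master, confidence: s.score / total }))
    .sort((a, b) => b.confidence - a.confidence);
}

const route = routeTask("Implement rate limiting with security audit");
console.log(route);
```

With these assumed weights, development-master comes first with confidence 0.6 and security-master second with 0.4, matching the primary and secondary domains in the example.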

2. Development Master

Role: Code implementation and improvements

Responsibilities:

Feature development
Bug fixes
Code refactoring
Technical debt reduction

Worker Types:

Implementation worker
Fix worker
Test worker
Analysis worker

Typical Workflow:

Task: "Add user authentication API"
↓
Development-Master receives task
↓
Spawns implementation-worker-001
↓
Worker implements feature
↓
Worker reports completion
↓
Master validates output
↓
Records pattern for learning
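
The workflow above might look roughly like this in code. It is a minimal in-memory sketch under stated assumptions: the `DevelopmentMaster` class, its method names, and the validation rule (a result must list at least one artifact) are hypothetical, and the worker is simulated by a plain function rather than a separate process.

```javascript
// Hypothetical in-memory master; worker execution is a plain function here.
class DevelopmentMaster {
  constructor() {
    this.nextWorkerNumber = 1;
    this.patterns = []; // outcomes recorded for later learning
  }

  // Build an ID such as "implementation-worker-001".
  spawnWorker(type) {
    const number = String(this.nextWorkerNumber++).padStart(3, "0");
    return { id: `${type}-worker-${number}`, type };
  }

  // Assumed validation rule: a result must list at least one artifact.
  validate(result) {
    return Array.isArray(result.artifacts) && result.artifacts.length > 0;
  }

  // Receive task -> spawn worker -> run it -> validate -> record pattern.
  handleTask(task, execute) {
    const worker = this.spawnWorker("implementation");
    const result = execute(task, worker);
    const outcome = this.validate(result) ? "success" : "failure";
    this.patterns.push({ task: task.title, worker: worker.id, outcome });
    return outcome;
  }
}

const master = new DevelopmentMaster();
const outcome = master.handleTask(
  { title: "Add user authentication API" },
  () => ({ artifacts: ["auth.js", "auth.test.js"] }) // simulated worker
);
console.log(outcome, master.patterns);
```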

3. Security Master

Role: Security auditing and remediation

Responsibilities:

Vulnerability scanning
CVE remediation
Security audits
Compliance monitoring

Worker Types:

Scan worker
Security-fix worker
Analysis worker

Real Example from Cortex:

Scan detected: 10 Path Traversal CVEs (CWE-23)

Security-Master:
1. Spawned scan-worker-001 (identify vulnerabilities)
2. Spawned security-fix-worker-002 (fix each CVE)
3. Spawned scan-worker-003 (verify fixes)

Result: All 10 CVEs fixed in < 2 hours

4. Inventory Master

Role: Repository cataloging and documentation

Responsibilities:

Discover repositories
Generate documentation
Track dependencies
Monitor health

Worker Types:

Documentation worker
Analysis worker

5. CI/CD Master

Role: Build, test, and deployment automation

Responsibilities:

Build orchestration
Test execution
Deployment automation
Release management

Worker Types:

Test worker
Implementation worker (for pipeline changes)

Worker Lifecycle

A worker progresses through 5 distinct states from creation to cleanup:

stateDiagram-v2
    [*] --> Spawn: Master creates worker
    Spawn --> Execute: Task assigned
    Execute --> Report: Progress updates
    Report --> Execute: Continue working
    Report --> Complete: Task finished
    Complete --> Cleanup: Record patterns
    Cleanup --> [*]: Worker terminated

    note right of Spawn
        Worker ID assigned
        Task context loaded
    end note

    note right of Execute
        Autonomous execution
        Real-time progress
    end note

    note right of Complete
        Success/Failure logged
        Quality score recorded
    end note
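
One way to enforce the diagram is a small state machine that rejects any transition the diagram does not show. The transition table below is copied from the diagram; the `WorkerLifecycle` class and its method names are assumptions for illustration.

```javascript
// Allowed transitions, copied from the state diagram above.
const TRANSITIONS = {
  Spawn: ["Execute"],
  Execute: ["Report"],
  Report: ["Execute", "Complete"],
  Complete: ["Cleanup"],
  Cleanup: [], // terminal: the worker is terminated
};

class WorkerLifecycle {
  constructor(workerId) {
    this.workerId = workerId;
    this.state = "Spawn";
    this.history = ["Spawn"];
  }

  // Move to the next state, rejecting anything the diagram does not allow.
  transition(next) {
    if (!TRANSITIONS[this.state].includes(next)) {
      throw new Error(`Invalid transition: ${this.state} -> ${next}`);
    }
    this.state = next;
    this.history.push(next);
    return this; // allow chaining
  }
}

const lifecycle = new WorkerLifecycle("implementation-worker-001");
lifecycle
  .transition("Execute")
  .transition("Report") // first progress update
  .transition("Execute") // continue working
  .transition("Report")
  .transition("Complete")
  .transition("Cleanup");
console.log(lifecycle.history.join(" -> "));
```

Keeping the table as data means a new state only requires a new entry, not new branching logic.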

1. Spawn

Master creates a worker for a specific task:

{
  "worker_id": "implementation-worker-001",
  "master": "development-master",
  "task_id": "task-feature-123",
  "priority": "high",
  "created_at": "2025-11-26T10:00:00Z"
}

2. Execute

Worker runs autonomously:

{
  "worker_id": "implementation-worker-001",
  "status": "in_progress",
  "progress": {
    "files_modified": 3,
    "tests_added": 12,
    "completion": 0.65
  }
}

3. Report

Worker sends progress updates:

{
  "worker_id": "implementation-worker-001",
  "event": "progress_update",
  "message": "Implemented authentication endpoints",
  "timestamp": "2025-11-26T10:15:00Z"
}

4. Complete

Worker finishes and reports outcome:

{
  "worker_id": "implementation-worker-001",
  "status": "completed",
  "outcome": "success",
  "quality_score": 0.92,
  "artifacts": ["auth.js", "auth.test.js", "README.md"]
}

5. Cleanup

Master terminates worker and records patterns:

{
  "pattern": "authentication implementation",
  "master": "development-master",
  "outcome": "success",
  "duration_minutes": 18,
  "confidence": 0.92
}
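
A pattern record like the one above could be derived from the spawn and completion records. The sketch below assumes two things the records shown earlier do not confirm: a `completed_at` timestamp on the completion record, and that `confidence` is simply the worker's `quality_score`. The `buildPatternRecord` helper is hypothetical.

```javascript
// Build the pattern record a master might store at cleanup time.
// Assumptions: `completed_at` exists, and confidence equals quality_score.
function buildPatternRecord(pattern, spawn, completion) {
  const startedMs = Date.parse(spawn.created_at);
  const finishedMs = Date.parse(completion.completed_at);
  return {
    pattern,
    master: spawn.master,
    outcome: completion.outcome,
    duration_minutes: Math.round((finishedMs - startedMs) / 60000),
    confidence: completion.quality_score,
  };
}

const spawn = {
  worker_id: "implementation-worker-001",
  master: "development-master",
  created_at: "2025-11-26T10:00:00Z",
};
const completion = {
  worker_id: "implementation-worker-001",
  outcome: "success",
  quality_score: 0.92,
  completed_at: "2025-11-26T10:18:00Z", // hypothetical field
};

const record = buildPatternRecord("authentication implementation", spawn, completion);
console.log(JSON.stringify(record, null, 2));
```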

Coordination Mechanisms

Event Streams (JSONL)

Every action creates an event that flows through the coordination timeline:

sequenceDiagram
    participant T as Task Queue
    participant C as Coordinator
    participant M as Development Master
    participant W as Implementation Worker

    T->>C: task_received (task-001)
    Note over C: Analyze & route
    C->>M: master_assigned
    Note over M: Select worker type
    M->>W: worker_spawned (worker-001)
    Note over W: Execute task
    W->>M: progress_update (30%)
    W->>M: progress_update (65%)
    W->>M: task_completed (success)
    M->>C: outcome_recorded
    Note over C: Update patterns

In the JSONL stream, that flow looks like this:

{"event":"task_received","task_id":"task-001","timestamp":"2025-11-26T10:00:00Z"}
{"event":"master_assigned","master":"development-master","task_id":"task-001"}
{"event":"worker_spawned","worker_id":"implementation-worker-001","task_id":"task-001"}
{"event":"progress_update","worker_id":"implementation-worker-001","completion":0.3}
{"event":"task_completed","task_id":"task-001","outcome":"success"}

Benefits:

Full audit trail
Easy debugging
Pattern analysis
Replay capability
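
The replay benefit is easiest to see in code. The sketch below appends events as JSONL, parses them back, and folds them into the latest state of one task. An in-memory string stands in for the event file, and the function names and status labels are assumptions. Note that `progress_update` events carry only a `worker_id`, so the replay has to remember which workers were spawned for the task.

```javascript
// Serialize events as JSONL: one JSON object per line, append-only.
function appendEvent(log, event) {
  return log + JSON.stringify(event) + "\n";
}

// Parse a JSONL string back into event objects, skipping blank lines.
function parseEvents(log) {
  return log
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Replay: fold the events that belong to one task into its latest state.
function replayTask(events, taskId) {
  const state = { task_id: taskId, status: "unknown", completion: 0 };
  const workers = new Set(); // workers spawned for this task
  for (const e of events) {
    if (e.event === "worker_spawned" && e.task_id === taskId) {
      workers.add(e.worker_id);
    }
    if (e.task_id !== taskId && !workers.has(e.worker_id)) continue;
    switch (e.event) {
      case "task_received": state.status = "received"; break;
      case "master_assigned": state.status = "assigned"; state.master = e.master; break;
      case "worker_spawned": state.status = "in_progress"; break;
      case "progress_update": state.completion = e.completion; break;
      case "task_completed": state.status = "completed"; state.outcome = e.outcome; break;
    }
  }
  return state;
}

let log = "";
for (const event of [
  { event: "task_received", task_id: "task-001", timestamp: "2025-11-26T10:00:00Z" },
  { event: "master_assigned", master: "development-master", task_id: "task-001" },
  { event: "worker_spawned", worker_id: "implementation-worker-001", task_id: "task-001" },
  { event: "progress_update", worker_id: "implementation-worker-001", completion: 0.3 },
  { event: "task_completed", task_id: "task-001", outcome: "success" },
]) {
  log = appendEvent(log, event);
}

const state = replayTask(parseEvents(log), "task-001");
console.log(state);
```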

Task Queues

Priority-based task scheduling:

{
  "task_queue": [
    {"task_id": "task-001", "priority": "critical", "age_minutes": 2},
    {"task_id": "task-002", "priority": "high", "age_minutes": 15},
    {"task_id": "task-003", "priority": "medium", "age_minutes": 45}
  ]
}
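
A queue like this can be ordered with a two-level comparator. The ranking table, the `low` priority level, the oldest-first tie-break, and the extra `task-004` entry are assumptions added to show how ties might be resolved; the source only shows the three tasks above.

```javascript
// Assumed ranking: lower number = more urgent.
const PRIORITY_RANK = { critical: 0, high: 1, medium: 2, low: 3 };

// Order by priority first, then oldest first within the same priority.
function byPriorityThenAge(a, b) {
  return (
    PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
    b.age_minutes - a.age_minutes
  );
}

const taskQueue = [
  { task_id: "task-003", priority: "medium", age_minutes: 45 },
  { task_id: "task-004", priority: "high", age_minutes: 40 }, // hypothetical extra task
  { task_id: "task-001", priority: "critical", age_minutes: 2 },
  { task_id: "task-002", priority: "high", age_minutes: 15 },
];

// Copy before sorting so the original queue is left untouched.
const ordered = [...taskQueue].sort(byPriorityThenAge);
console.log(ordered.map((t) => t.task_id));
// → [ 'task-001', 'task-004', 'task-002', 'task-003' ]
```

Breaking ties by age keeps older tasks of the same priority from being starved by newer arrivals.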

State Management

State is spread across plain files in the coordination directory:

coordination/
├── worker-pool.json          # Active workers
├── task-queue.json           # Pending tasks
├── master-health.json        # Master status
├── events/                   # Event streams
│   ├── coordinator-events.jsonl
│   ├── development-events.jsonl
│   └── security-events.jsonl
└── memory/
    └── working/
        └── pool-state.json   # Current system state

Scaling Properties

Horizontal Scaling

Add more workers without changing masters:

Before: 3 workers per master
After: 20 workers per master
Change: Zero code changes, just configuration

Vertical Scaling

Add more masters for new domains:

Initial: 4 specialist masters
Add: Documentation Master
Result: 5 specialist masters, 8 worker types

Load Balancing

Masters automatically balance worker distribution:

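// Spawn a worker for the next queued task while capacity remains
if (activeWorkers < maxWorkers && taskQueue.length > 0) {
  const nextTask = taskQueue.shift(); // task at the head of the queue
  spawnNewWorker(nextTask);
}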
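
A slightly fuller version of the same idea is a pool that fills free slots whenever it is asked to dispatch. The `WorkerPool` class, its method names, and the worker ID format are assumptions for illustration. Capacity comes in through the constructor, which is how scaling from 3 to 20 workers can stay a configuration change.

```javascript
// Hypothetical dispatcher: fills free worker slots from the head of the queue.
class WorkerPool {
  constructor(maxWorkers) {
    this.maxWorkers = maxWorkers; // capacity is configuration, not code
    this.active = new Map(); // workerId -> task
    this.spawned = 0;
  }

  // Spawn workers until the pool is full or the queue is empty.
  dispatch(taskQueue) {
    while (this.active.size < this.maxWorkers && taskQueue.length > 0) {
      const task = taskQueue.shift();
      const workerId = `worker-${String(++this.spawned).padStart(3, "0")}`;
      this.active.set(workerId, task);
    }
  }

  // Free a slot when a worker finishes.
  complete(workerId) {
    this.active.delete(workerId);
  }
}

const pool = new WorkerPool(3);
const queue = ["task-001", "task-002", "task-003", "task-004", "task-005"];
pool.dispatch(queue); // 3 workers active, 2 tasks still waiting
pool.complete("worker-001"); // one slot frees up
pool.dispatch(queue); // task-004 takes the free slot
console.log([...pool.active.entries()], queue);
```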

Fault Tolerance

Worker Failures

Workers are disposable by design:

Worker crashes?
→ Master detects timeout
→ Spawns replacement worker
→ Retries task
→ Records failure pattern
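
The recovery steps above amount to a retry loop in which every attempt gets a fresh worker. The sketch below is synchronous to keep it short; real workers would be separate processes with real timeouts. `runWithRetries`, the `maxAttempts` default of 3, and the simulated `flakyWorker` are assumptions.

```javascript
// Run a task with replacement workers; each attempt gets a fresh worker.
// `runWorker` is a hypothetical function that throws on crash or timeout.
function runWithRetries(task, runWorker, maxAttempts = 3) {
  const failures = []; // failure patterns recorded for later analysis
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const workerId = `${task.type}-worker-${String(attempt).padStart(3, "0")}`;
    try {
      const result = runWorker(task, workerId);
      return { outcome: "success", workerId, result, failures };
    } catch (err) {
      failures.push({ workerId, reason: err.message });
    }
  }
  return { outcome: "failure", failures };
}

// Simulated worker: the first attempt times out, the second succeeds.
let calls = 0;
const flakyWorker = () => {
  calls += 1;
  if (calls === 1) throw new Error("timeout");
  return "done";
};

const report = runWithRetries({ id: "task-001", type: "fix" }, flakyWorker);
console.log(report);
```

Capping attempts matters: a task that always fails would otherwise spawn replacement workers forever.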

Master Failures

Masters have heartbeat monitoring:

Master stops responding?
→ Coordinator detects failure
→ Fails over to backup master
→ Reassigns pending tasks
→ Alerts operators
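
The detection step can be sketched as a monitor that tracks the last heartbeat per master. Only detection is shown; failover and task reassignment are left out. The `HeartbeatMonitor` class and the 30-second timeout are assumptions, and timestamps are passed in explicitly so the logic stays deterministic.

```javascript
// Hypothetical heartbeat monitor; timestamps are passed in for testability.
class HeartbeatMonitor {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // master name -> last heartbeat (ms)
  }

  // Record a heartbeat from a master.
  beat(master, nowMs) {
    this.lastSeen.set(master, nowMs);
  }

  // Masters whose last heartbeat is older than the timeout.
  findUnresponsive(nowMs) {
    return [...this.lastSeen]
      .filter(([, seen]) => nowMs - seen > this.timeoutMs)
      .map(([master]) => master);
  }
}

const monitor = new HeartbeatMonitor(30000); // assumed 30-second timeout
monitor.beat("development-master", 0);
monitor.beat("security-master", 0);
monitor.beat("development-master", 40000); // security-master has gone quiet

const down = monitor.findUnresponsive(45000);
console.log(down); // → [ 'security-master' ]
```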

Zombie Cleanup

Automated cleanup of stuck processes:

// Zombie cleanup daemon runs every 5 minutes
detectZombies()
  .filter(worker => worker.idle_minutes > 30)
  .forEach(worker => {
    terminateWorker(worker.id);
    logZombieCleanup(worker);
  });

Performance Characteristics

Latency

Task receipt → Worker spawn: < 100ms
Worker spawn → First action: < 500ms
Total task latency: 1-30 minutes (task dependent)

Throughput

Tasks per hour: 20-100 (depending on complexity)
Concurrent workers: up to 20
Master overhead: < 5% CPU per master

Resource Usage

Master process: ~50MB RAM, 1-2% CPU
Worker process: ~100MB RAM, 5-20% CPU (task dependent)
Total system: ~1GB RAM, 15-30% CPU at peak

Real-World Example: Security Audit

Let’s trace a complex multi-master workflow that demonstrates coordination between three masters:

sequenceDiagram
    participant C as Coordinator
    participant SM as Security Master
    participant DM as Development Master
    participant W1 as Scan Workers
    participant W2 as Fix Workers
    participant W3 as Verify Worker

    C->>C: Analyze: High complexity<br/>Domains: Security + Dev
    C->>SM: Route to Security Master

    Note over SM: Step 3: Security Scan
    SM->>W1: Spawn 3 scan workers
    W1->>W1: CVE scan<br/>Static analysis<br/>Dependency audit
    W1-->>SM: 3 findings (45 min)
    SM-->>C: Scan complete, 3 CVEs

    C->>DM: Handoff to Development Master

    Note over DM: Step 4: Remediation
    DM->>W2: Spawn 3 fix workers
    W2->>W2: Fix CVE-2024-001<br/>Fix CVE-2024-002<br/>Add validation
    W2-->>DM: All issues resolved (90 min)
    DM-->>C: Fixes complete

    C->>SM: Verification handoff

    Note over SM: Step 5: Verification
    SM->>W3: Spawn verification worker
    W3->>W3: Re-scan for CVEs
    W3-->>SM: Clean scan (15 min)
    SM-->>C: Audit complete ✓

    Note over C: Total: 2.5 hours<br/>3 masters, 7 workers<br/>3 CVEs fixed

Task: “Comprehensive security audit of authentication feature”

Step 1: Coordinator Analysis

{
  "task": "Comprehensive security audit of authentication feature",
  "complexity": "high",
  "domains": ["security", "development"],
  "estimated_duration": "2-4 hours"
}

Step 2: Multi-Master Routing

{
  "primary": "security-master",
  "secondary": "development-master",
  "workflow": "sequential"
}

Step 3: Security-Master Execution

{
  "master": "security-master",
  "workers": [
    "scan-worker-001: CVE scanning",
    "scan-worker-002: Static analysis",
    "scan-worker-003: Dependency audit"
  ],
  "duration": "45 minutes",
  "findings": 3
}

Step 4: Development-Master Remediation

{
  "master": "development-master",
  "workers": [
    "fix-worker-001: Fix CVE-2024-001",
    "fix-worker-002: Fix CVE-2024-002",
    "implementation-worker-003: Add missing validation"
  ],
  "duration": "90 minutes",
  "outcome": "all issues resolved"
}

Step 5: Verification

{
  "master": "security-master",
  "workers": [
    "scan-worker-004: Re-scan for CVEs"
  ],
  "duration": "15 minutes",
  "result": "clean"
}

Total: 2.5 hours, 3 masters, 7 workers, 3 CVEs fixed
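
Because the workflow is sequential, the totals are simple sums over the three execution steps. The arithmetic below uses the step figures from the walkthrough; the `coordinator-master` identifier is an assumed name for the coordinator, which is counted as one of the three participating masters.

```javascript
// Step summaries taken from Steps 3-5 of the walkthrough.
const steps = [
  { master: "security-master", workers: 3, minutes: 45 }, // scan
  { master: "development-master", workers: 3, minutes: 90 }, // remediation
  { master: "security-master", workers: 1, minutes: 15 }, // verification
];

// Sequential workflow: durations and worker counts add up directly.
const totalMinutes = steps.reduce((sum, s) => sum + s.minutes, 0);
const totalWorkers = steps.reduce((sum, s) => sum + s.workers, 0);

// The coordinator routes every step, so it counts as a participating master.
const masters = new Set(["coordinator-master", ...steps.map((s) => s.master)]);

console.log(
  `${totalMinutes / 60} hours, ${masters.size} masters, ${totalWorkers} workers`
);
// → 2.5 hours, 3 masters, 7 workers
```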

Key Design Decisions

1. JSONL Over Database

Why: Simplicity, append-only, easy debugging
Trade-off: No complex queries, but Cortex doesn’t need them

2. File-Based State Over Redis

Why: No external dependencies, easy backup
Trade-off: Slower than in-memory, but fast enough

3. Process-Based Workers Over Threads

Why: Better isolation, easier cleanup
Trade-off: Higher overhead, but more reliable

Tomorrow’s Topic

Tomorrow, I’ll share the day-by-day story of Cortex’s 4-week build - the decisions, challenges, and breakthroughs from idea to production.

Key Takeaways

Master-worker pattern enables scalable distributed orchestration
5 masters (1 coordinator, 4 specialists) handle different domains
7 worker types execute specific tasks
JSONL events provide full audit trail
Fault tolerance through disposable workers
Performance scales horizontally and vertically

The master-worker architecture isn’t just a design pattern - it’s the foundation that makes Cortex’s self-improving MoE system possible.

Learn More About Cortex

Want to dive deeper into how Cortex works? Visit the Meet Cortex page to learn about its architecture, capabilities, and how it scales from 1 to 100+ agents on-demand.

Part 4 of the Cortex series. Next: From Idea to Production in 28 Days
