The WRAP Framework: Building an AI Coding Agent for Infrastructure Automation
Infrastructure development is fundamentally changing. What once required hours of manual coding, testing, and deployment can now be accomplished in minutes through AI-assisted development. Today, we’re implementing a comprehensive AI coding agent for Cortex using the WRAP framework—a methodology that maximizes the effectiveness of human-AI collaboration in software development.
This isn’t about replacing developers. It’s about amplifying their capabilities through intelligent automation that handles repetitive work while preserving human judgment for critical decisions.
The WRAP Framework
WRAP is an acronym that encapsulates four principles for effective AI coding agent deployment:
W - Write Effective Issues
Structure coding tasks as if explaining to a new team member, with complete context and concrete examples.
R - Refine Instructions
Leverage repository, organization, and custom agent instructions to ensure consistency across implementations.
A - Atomic Tasks
Break large problems into small, independent, well-defined tasks that can be executed in parallel.
P - Pair with Coding Agent
Align human strengths (understanding intent, navigating ambiguity) with agent capabilities (tireless execution, exploring alternatives).
Why Cortex Needs a Coding Agent
Current Development Reality
Manual Deployment Process (YouTube Intelligence Service example):
# 1. Write code (30 minutes)
# 2. Create ConfigMaps (10 minutes)
kubectl create configmap youtube-channel-intelligence-code \
--from-file=index.js=./index.js \
--from-file=config.js=./config.js \
--from-file=youtube-client.js=./youtube-client.js \
# ... 4 more files
# 3. Write deployment YAML (20 minutes)
# 4. Apply to cluster (2 minutes)
kubectl apply -f deployment.yaml
# 5. Debug issues (30+ minutes)
kubectl logs youtube-intelligence-abc123 --tail=100
# 6. Iterate until working (60+ minutes)
Total: 2.5+ hours
With AI Coding Agent:
Human: "Create YouTube intelligence service with priority queue and Redis state"
Agent: Generates complete implementation in 5 minutes
Human: Reviews and approves (5 minutes)
Agent: Deploys to K3s, verifies health (2 minutes)
Total: 12 minutes (12x faster)
Architecture: The Five Services
1. Issue Parser Service
Purpose: Convert natural language development requests into structured, executable tasks.
Example Transformation:
Input (Natural Language):
"Add a health check endpoint to the YouTube intelligence service that returns
JSON with service status and queue metrics"
Output (Structured Task):
{
"title": "Add health check endpoint to YouTube intelligence service",
"description": "Create a /health endpoint that returns service status",
"acceptance_criteria": [
"Endpoint responds to GET /health",
"Returns JSON with status and queue metrics",
"HTTP 200 when healthy, 503 when unhealthy",
"Includes in Prometheus metrics"
],
"context": {
"service": "youtube-intelligence",
"namespace": "cortex",
"existing_patterns": ["health check pattern from other services"],
"related_files": ["src/index.js", "src/metrics.js"]
},
"estimated_lines_of_code": 50,
"estimated_duration_minutes": 15
}
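To make the output concrete, here is roughly the shape of code the agent might generate for this task. This is an illustrative sketch, not the agent's actual output: the Express app, the queue helpers (`isConnected()`, `depth()`, `processingCount()`), and the metrics counter are all invented here for illustration.

```javascript
// Hypothetical sketch satisfying the acceptance criteria above.
// Assumes an Express app, a Redis-backed queue object, and a
// prom-client Counter named healthChecks (all assumptions).
function registerHealthEndpoint(app, queue, metrics) {
  app.get('/health', async (req, res) => {
    const healthy = queue.isConnected();
    const body = {
      status: healthy ? 'healthy' : 'unhealthy',
      queue: {
        depth: await queue.depth(),               // items waiting
        processing: await queue.processingCount() // items in flight
      }
    };
    metrics.healthChecks.inc({ status: body.status }); // Prometheus counter
    res.status(healthy ? 200 : 503).json(body);        // 200 healthy, 503 unhealthy
  });
}

module.exports = { registerHealthEndpoint };
```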
Key Features:
- Natural language processing for requirement extraction
- Validation that tasks are atomic and well-scoped
- Automatic linking of related tasks and dependencies
- Context enrichment from repository patterns
Technology Stack:
- Node.js 20 + TypeScript
- Anthropic Claude API for natural language understanding
- Redis for task queue and state management
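A minimal sketch of the parsing step, assuming the `@anthropic-ai/sdk` Node package; the model name, system prompt, and validation rule here are illustrative, and a real implementation would validate far more thoroughly:

```javascript
const Anthropic = require('@anthropic-ai/sdk');

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function parseIssue(request) {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-20250514', // illustrative model choice
    max_tokens: 1024,
    system: 'Convert the development request into a JSON task with title, ' +
            'description, acceptance_criteria, and context. Respond with JSON only.',
    messages: [{ role: 'user', content: request }]
  });

  const task = JSON.parse(response.content[0].text);

  // Reject tasks that are not well-scoped enough to queue
  if (!task.acceptance_criteria || task.acceptance_criteria.length === 0) {
    throw new Error('Task has no acceptance criteria; refine the issue first');
  }
  return task;
}
```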
2. Repository Context Service
Purpose: Maintain repository-level coding patterns and conventions for consistent code generation.
The Problem It Solves:
Without context, AI-generated code might look like this:
// AI without context - inconsistent style
function GetChannel(id) {
const result = axios.get(`/channels/${id}`)
if(result.status == 200){
return result.data
}
else throw new Error("Failed")
}
With repository context, AI generates code matching existing patterns:
// AI with context - matches Cortex patterns
async function getChannel(channelId) {
try {
const response = await this.youtubeClient.channels.list({
part: ['snippet', 'statistics'],
id: [channelId]
});
if (!response.data.items || response.data.items.length === 0) {
throw new ChannelNotFoundError(`Channel ${channelId} not found`);
}
return this.normalizeChannelData(response.data.items[0]);
} catch (error) {
logger.error('Failed to fetch channel', { channelId, error });
throw error;
}
}
How It Works:
- Indexing: Scans all repository code with Tree-sitter
- Pattern Extraction: Identifies common patterns (error handling, logging, naming)
- Convention Detection: Learns style guides from existing code
- Instruction Generation: Creates repository-specific instructions for AI agent
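As a simplified illustration of convention detection, the sketch below infers whether a codebase prefers camelCase or snake_case function names. The real service walks Tree-sitter parse trees; the regex here is a stand-in for that machinery.

```javascript
// Dependency-free sketch: count function declarations by naming style
// and report the dominant convention for the agent's instructions.
function detectNamingConvention(sources) {
  const counts = { camelCase: 0, snake_case: 0 };
  const decl = /function\s+([A-Za-z_][A-Za-z0-9_]*)/g;
  for (const source of sources) {
    for (const [, name] of source.matchAll(decl)) {
      if (name.includes('_')) counts.snake_case++;
      else if (/^[a-z][A-Za-z0-9]*$/.test(name)) counts.camelCase++;
    }
  }
  return counts.camelCase >= counts.snake_case ? 'camelCase' : 'snake_case';
}

// The detected convention then feeds the generated instructions, e.g.
// "Use camelCase for function names, matching existing declarations."
```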
Technology Stack:
- Python 3.11 + FastAPI
- ChromaDB for pattern storage
- Tree-sitter for code parsing
3. Code Generation Service
Purpose: Multi-file code generation with context-aware editing.
Capabilities:
Multi-File Generation:
Task: "Create YouTube priority scorer service"
Generated Files:
├── src/priority-scorer.js (96 lines)
│ ├── calculateRecencyBonus()
│ ├── calculateRelevanceBonus()
│ └── calculateScore()
├── src/priority-scorer.test.js (150 lines)
│ ├── Test recency calculation
│ ├── Test relevance scoring
│ └── Test edge cases
└── README.md (50 lines)
├── API documentation
├── Usage examples
└── Configuration guide
Total: 3 files, 296 lines, generated in 45 seconds
Context-Aware Editing:
// Existing code
async function processVideo(videoId) {
const video = await fetchVideo(videoId);
await storeVideo(video);
}
// Request: "Add priority scoring before storing"
// AI understands context and inserts properly
async function processVideo(videoId) {
const video = await fetchVideo(videoId);
// Calculate priority score for queue placement
const { score, factors } = this.scorer.calculateScore(video);
video.priorityScore = score;
video.priorityFactors = factors;
await storeVideo(video);
}
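For context, the `this.scorer` assumed above could look like the sketch below, which follows the scoring rules described later in this post: a recency bonus of max(0, 500 - days*10), a relevance bonus of 0-200 points from keyword matching, and keywords configurable via the PRIORITY_KEYWORDS env var. The per-keyword weighting and field names are assumptions.

```javascript
// Sketch of a priority scorer matching the rules quoted in the commit
// message later in this post. The 50-points-per-keyword weighting is
// an illustrative assumption.
class PriorityScorer {
  constructor(keywords = (process.env.PRIORITY_KEYWORDS || '').split(',')) {
    this.keywords = keywords.map(k => k.trim().toLowerCase()).filter(Boolean);
  }

  calculateScore(video) {
    const days = (Date.now() - new Date(video.publishedAt)) / 86_400_000;
    const recency = Math.max(0, 500 - days * 10); // newer videos score higher

    const text = `${video.title} ${video.description}`.toLowerCase();
    const hits = this.keywords.filter(k => text.includes(k)).length;
    const relevance = Math.min(200, hits * 50); // keyword matching, capped at 200

    return { score: recency + relevance, factors: { recency, relevance } };
  }
}
```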
Technology Stack:
- Node.js 20 + TypeScript
- Anthropic Claude API for code generation
- Git integration for branch management
4. Task Orchestrator
Purpose: Execute multiple coding tasks in parallel with dependency management.
The Power of Parallelism:
Traditional development (sequential):
Task 1: Implement feature A (30 min)
↓
Task 2: Implement feature B (30 min)
↓
Task 3: Write tests (30 min)
↓
Task 4: Update docs (15 min)
Total: 105 minutes
AI coding agent (parallel, with tests written against the spec alongside the features):
Task 1: Feature A (Agent 1) ─┐
Task 2: Feature B (Agent 2) ─┼→ Merge → Docs (Agent 4)
Task 3: Tests (Agent 3) ─┘
Total: 45 minutes (2.3x faster)
Key Quote from Research:
“You can assign ten different tasks to the coding agent right now and expect it to work on each of them.”
The orchestrator manages:
- Parallel Execution: 10+ concurrent tasks
- Dependency Resolution: Ensures Task B waits for Task A if needed
- Priority Queuing: High-priority tasks jump the queue
- Failure Recovery: Automatic retry with exponential backoff
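The real orchestrator is written in Go (see the stack below); to keep this post's examples in one language, here is the core scheduling idea sketched in JavaScript: run every task whose dependencies are satisfied concurrently, and retry failures with exponential backoff.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function runWithRetry(task, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task.run();
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      await sleep(1000 * 2 ** (attempt - 1)); // backoff: 1s, 2s, 4s, ...
    }
  }
}

async function orchestrate(tasks) {
  const done = new Set();
  const pending = new Map(tasks.map(t => [t.id, t]));

  while (pending.size > 0) {
    // Every task whose dependencies have all completed runs concurrently
    const ready = [...pending.values()].filter(t =>
      (t.deps || []).every(dep => done.has(dep)));
    if (ready.length === 0) throw new Error('Dependency cycle detected');

    // This sketch waits for each wave; a production scheduler would launch
    // new tasks the moment any dependency finishes
    await Promise.all(ready.map(async t => {
      await runWithRetry(t);
      pending.delete(t.id);
      done.add(t.id);
    }));
  }
}
```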
Technology Stack:
- Go (for high-performance concurrency)
- Redis (task queue)
- NATS (event streaming)
5. Human-Agent Pairing UI
Purpose: Chat-like interface for natural human-AI collaboration.
User Experience:
[User] Add rate limiting to YouTube intelligence API
[Agent] I'll implement rate limiting with these specifications:
• 10 requests/hour default (configurable via ENV var)
• Hourly sliding window
• HTTP 429 with Retry-After header when exceeded
• Prometheus metrics for rate limit hits
Estimated changes:
• src/rate-limiter.js (new file, ~80 lines)
• src/index.js (add middleware, ~5 lines)
• src/metrics.js (add counter, ~3 lines)
Approve to proceed?
[User] ✓ Approved
[Agent] ⏳ Generating code... (15 seconds)
✓ Created src/rate-limiter.js (82 lines)
✓ Updated src/index.js (+7 lines)
✓ Updated src/metrics.js (+5 lines)
✓ Generated tests (45 lines)
✓ Branch created: feat/rate-limiting
Review diff at: https://github.com/cortex/..../compare/main...feat/rate-limiting
[User] The rate limiter should reset at the top of each hour, not rolling window
[Agent] ⏳ Refining... (8 seconds)
✓ Updated rate-limiter.js (changed to hourly reset)
✓ Updated tests to match new behavior
✓ Pushed to feat/rate-limiting
Changes ready for review!
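A plausible shape for the finished rate limiter, reflecting the refinement above: fixed windows that reset at the top of each hour, and HTTP 429 with a Retry-After header when the limit is hit. This is Express-style middleware; the env var name and in-memory store are assumptions.

```javascript
// Sketch of the hourly-reset rate limiter from the transcript above.
// RATE_LIMIT_PER_HOUR is an assumed env var name.
const LIMIT = parseInt(process.env.RATE_LIMIT_PER_HOUR || '10', 10);

const windows = new Map(); // client key -> { hour, count }

function rateLimiter(req, res, next) {
  const key = req.ip;
  const hour = Math.floor(Date.now() / 3_600_000); // current wall-clock hour
  const entry = windows.get(key);

  if (!entry || entry.hour !== hour) {
    // New hour: the window resets at the top of the hour, not rolling
    windows.set(key, { hour, count: 1 });
    return next();
  }
  if (entry.count >= LIMIT) {
    const secondsToReset =
      Math.ceil(((hour + 1) * 3_600_000 - Date.now()) / 1000);
    res.set('Retry-After', String(secondsToReset));
    return res.status(429).json({ error: 'rate_limit_exceeded' });
  }
  entry.count++;
  next();
}

module.exports = rateLimiter;
```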
Technology Stack:
- React + TypeScript (frontend)
- WebSocket (real-time updates)
- Git diff visualization
The WRAP Principles in Practice
Write Effective Issues
Bad Issue (Vague):
“Add logging to the YouTube service”
Good Issue (WRAP Compliant):
“Add structured logging to YouTube intelligence service
Requirements:
- Use JSON format with timestamp, level, message, context
- Log levels: debug, info, warn, error, fatal
- Include correlation IDs for request tracing
- Redact sensitive data (API keys, tokens)
- Write logs to stdout for Loki collection
Acceptance Criteria:
- Every HTTP request logged with duration
- Errors include stack traces
- Log level configurable via LOG_LEVEL env var
- No PII in logs
Example Output:
{ "timestamp": "2026-01-09T10:30:00Z", "level": "info", "message": "Channel synced successfully", "context": { "channelId": "UCKWaEZ...", "videoCount": 1516, "duration_ms": 2340 } } ```"
Impact: Well-specified issues reduce back-and-forth from 3-5 iterations to 1.
Refine Instructions
Three-Level Instruction Hierarchy:
Level 1: Cortex-Wide Standards (Highest Priority)
Naming Conventions:
- Services: kebab-case
- Functions: camelCase (JS), snake_case (Python)
- Constants: UPPER_SNAKE_CASE
Error Handling:
- Always use try-catch for async operations
- Log errors with context
- Return consistent error format: {error, code, details}
Testing:
- Minimum 80% coverage
- Test happy path, errors, and edge cases
Level 2: Repository-Specific Patterns (Learned from Existing Code)
// Pattern extracted from youtube-intelligence service
async function <operation>Channel(channelId) {
try {
// Validate input
if (!channelId) {
throw new ValidationError('channelId required');
}
// Perform operation
const result = await this.youtubeClient.<method>(channelId);
// Log success
logger.info('<Operation> succeeded', { channelId });
return result;
} catch (error) {
logger.error('<Operation> failed', { channelId, error });
throw error;
}
}
Level 3: Task-Specific Requirements (User Input)
For this specific task:
- Use YouTube Data API v3
- Cache results for 5 minutes
- Rate limit to 100 requests/day
Result: AI generates code that feels like it was written by a human team member who’s been on the project for months.
Atomic Tasks
Large Task (Not Atomic):
“Implement user authentication system”
Problem: Too many unknowns, dependencies, and decision points. Would require extensive back-and-forth.
Atomic Decomposition:
1. Create user model schema (20 LOC, 10 min) - Independent
2. Implement password hashing utility (30 LOC, 15 min) - Independent
3. Create JWT token generation service (40 LOC, 20 min) - Depends on #1
4. Add login endpoint (50 LOC, 25 min) - Depends on #1, #2, #3
5. Add logout endpoint (20 LOC, 10 min) - Depends on #3
6. Add token validation middleware (30 LOC, 15 min) - Depends on #3
7. Add authentication tests (100 LOC, 30 min) - Depends on all above
Benefits:
- Tasks 1, 2 can execute in parallel (save 10-15 minutes)
- Each task has clear inputs, outputs, and completion criteria
- Failures are isolated (if #4 fails, #5 and #6 unaffected)
- Progress is visible (4/7 tasks complete)
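Expressed as data, the decomposition above becomes the kind of task graph the Task Orchestrator consumes (see the scheduling sketch in the Task Orchestrator section); the IDs and field names are illustrative.

```javascript
// The authentication decomposition as an orchestrator task graph.
const authTasks = [
  { id: 1, title: 'Create user model schema',            deps: [] },
  { id: 2, title: 'Implement password hashing utility',  deps: [] },
  { id: 3, title: 'Create JWT token generation service', deps: [1] },
  { id: 4, title: 'Add login endpoint',                  deps: [1, 2, 3] },
  { id: 5, title: 'Add logout endpoint',                 deps: [3] },
  { id: 6, title: 'Add token validation middleware',     deps: [3] },
  { id: 7, title: 'Add authentication tests',            deps: [1, 2, 3, 4, 5, 6] }
];
// Tasks 1 and 2 have no dependencies, so they start immediately in
// parallel; tasks 5 and 6 run concurrently once task 3 completes.
```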
Pair with Coding Agent
Human Strengths:
- Understanding business context and user needs
- Making architectural decisions
- Navigating ambiguity and edge cases
- Code review and quality assessment
- Cross-system thinking
AI Agent Strengths:
- Tireless execution of repetitive tasks
- Exploring multiple solution approaches simultaneously
- Consistent application of patterns
- Rapid prototyping and iteration
- Parallel task execution
Optimal Collaboration Workflow:
1. Human: "We need to improve YouTube content prioritization"
↓ [Strategic context]
2. Agent: "I can add these factors to priority scoring:
• Recency (newer = higher priority)
• Relevance (keywords matching infrastructure topics)
• Engagement (views, likes)
• Duration (longer videos might have more depth)
Which approach do you prefer?"
↓ [Agent explores options]
3. Human: "Let's start with recency + relevance. Keep it simple."
↓ [Human makes strategic choice]
4. Agent: Implements recency + relevance scoring in 5 minutes
↓ [Agent executes efficiently]
5. Human: Reviews code, suggests improvement:
"The relevance keywords should be configurable via ENV var"
↓ [Human provides domain expertise]
6. Agent: Refactors to use ENV var in 2 minutes
↓ [Agent iterates quickly]
7. Human: Approves and merges
↓ [Human maintains quality standards]
Total: 15 minutes (vs. 90 minutes manual)
Integration with Cortex Ecosystem
Chat Interface Integration
Natural Language Coding:
User: "The Docling build keeps failing. Fix it."
Agent:
1. Analyzing build logs... ✓
2. Identified issue: Package 'libgl1-mesa-glx' not available in Debian Trixie
3. Solution: Remove deprecated package, update Dockerfile
4. Executing fix... ✓
5. Restarting build job... ✓
6. Build successful after 4m 18s
Changes made:
• /tmp/docling-dockerfile-fixed (removed libgl1-mesa-glx dependency)
• Build job restarted
• ConfigMap updated
The Docling service is now building successfully.
CI/CD Integration
Automatic Testing:
Agent generates code
↓
Git push to feature branch
↓
Tekton pipeline triggers
↓
1. Lint check (ESLint/Pylint)
2. Unit tests
3. Integration tests
4. Build verification
↓
If tests fail: Agent analyzes failures and auto-fixes
If tests pass: Create PR for human review
GitHub Workflow
Commit Message Generation (Conventional Commits):
feat(youtube-intelligence): add priority scoring with recency and relevance factors
Implements multi-factor priority scoring algorithm for YouTube content queue:
- Recency bonus: max(0, 500 - days*10)
- Relevance bonus: keyword matching (0-200 points)
- Configurable keywords via PRIORITY_KEYWORDS env var
Closes: #42
Co-authored-by: AI Coding Agent <ai@cortex.dev>
Performance Metrics
Development Velocity
| Task Type | Manual Time | AI Agent Time | Speedup |
|---|---|---|---|
| New Microservice | 4-8 hours | 30-45 minutes | 8-10x |
| API Endpoint | 30-60 minutes | 5-10 minutes | 6x |
| Bug Fix | 15-90 minutes | 5-15 minutes | 3-6x |
| Test Suite | 1-2 hours | 10-20 minutes | 6x |
| Documentation | 30-60 minutes | 5-10 minutes | 6x |
Overall Impact: 5-8x increase in development velocity
Code Quality
| Metric | Manual Development | AI Agent Development |
|---|---|---|
| Test Coverage | 40-60% | 80-95% |
| Code Consistency | Variable | Very High |
| Documentation | Often Missing | Always Present |
| Best Practices | Inconsistent | Consistently Applied |
Developer Experience
Before AI Coding Agent:
- Spend 60% of time on boilerplate and repetitive tasks
- Context switching between writing code, tests, and docs
- Cognitive fatigue from repetitive work
- New developers take 2-3 days to first deployment
After AI Coding Agent:
- Spend 80% of time on high-value problem-solving
- Agent handles boilerplate, tests, and docs automatically
- Focus on strategic decisions and architecture
- New developers deploy in 1 hour
Real-World Example: YouTube Intelligence Service
Task: Create YouTube channel intelligence service with priority queue
Manual Implementation (Estimated Time: 10-11 hours):
- Write package.json dependencies (10 min)
- Create config.js with all settings (20 min)
- Implement YouTube API client (60 min)
- Build priority scoring algorithm (45 min)
- Create channel service with Redis integration (90 min)
- Implement queue processor with rate limiting (90 min)
- Build HTTP API with all endpoints (60 min)
- Write Dockerfile and K8s YAML (30 min)
- Debug and fix issues (120 min)
- Write tests (90 min)
- Documentation (30 min)
Total: 645 minutes (10.75 hours)
AI Coding Agent Implementation (Actual Time: 45 minutes):
- Human specifies requirements (5 min)
- Agent generates all 7 files in parallel (10 min):
- package.json
- config.js
- youtube-client.js
- priority-scorer.js
- channel-service.js
- queue-processor.js
- index.js
- Agent generates deployment files (5 min):
- Dockerfile
- deployment.yaml
- service.yaml
- Agent creates tests (10 min)
- Human reviews and approves (10 min)
- Agent deploys to K3s (5 min)
Total: 45 minutes (14x faster)
Outcome: Service deployed, operational, processing 1,516 videos, with zero issues.
Lessons Learned
What Works Exceptionally Well
- Boilerplate Generation: AI agents excel at creating project scaffolding, configuration files, and standard patterns.
- Test Generation: AI can generate comprehensive test suites faster and more thoroughly than humans.
- Multi-File Coordination: AI maintains consistency across multiple files better than a human switching between them.
- Pattern Application: Once a pattern is established, AI applies it flawlessly across the codebase.
What Still Needs Human Oversight
- Architectural Decisions: Major design choices benefit from human strategic thinking.
- Business Logic: Domain-specific rules require human validation.
- Security-Critical Code: Authentication, authorization, and cryptography need human review.
- Performance Optimization: Algorithmic optimizations need human expertise.
The Hybrid Model is Optimal
80/20 Rule:
- AI handles 80% of implementation work (boilerplate, tests, docs)
- Humans focus on 20% of high-value work (architecture, business logic, optimization)
Result: 5-8x productivity increase while maintaining or improving code quality.
Security and Governance
Code Review Requirements
AI-Generated Code Automatically Gets:
- Automated lint check
- Automated test suite
- Security scanning (secrets detection)
- License compliance check
Human Review Required For:
- Production deployments
- Security-sensitive code
- API contract changes
- Database schema changes
Audit Trail
Every AI-generated change is logged:
{
"timestamp": "2026-01-09T10:30:00Z",
"agent_id": "coding-agent-v1",
"task_id": "task-12345",
"human_requester": "ryan@cortex.dev",
"action": "generate_code",
"files_modified": ["src/index.js", "src/priority-scorer.js"],
"lines_added": 246,
"lines_removed": 0,
"review_status": "approved",
"reviewer": "ryan@cortex.dev"
}
Conclusion
The AI coding agent represents a fundamental shift in how infrastructure code is written. By applying the WRAP framework—Write effective issues, Refine instructions, Atomic tasks, Pair with agent—we’ve created a system that amplifies developer productivity by 5-8x while maintaining high code quality.
This isn’t about replacing developers. It’s about freeing them from repetitive work so they can focus on what humans do best: strategic thinking, creative problem-solving, and architectural decisions.
The future of infrastructure development is here, and it’s a collaboration between human creativity and AI efficiency.
Technical Specifications
Components Deployed:
- Issue Parser Service (cortex-dev namespace)
- Repository Context Service (cortex-dev namespace)
- Code Generation Service (cortex-dev namespace)
- Task Orchestrator (cortex-dev namespace)
- Human-Agent Pairing UI (cortex-dev namespace)
Resource Requirements:
- Total Memory: 4Gi
- Total CPU: 1.85 cores
- Storage: Redis + ChromaDB
Integration Points:
- Cortex Chat (natural language interface)
- GitHub (Git workflow)
- Tekton (CI/CD)
- Anthropic Claude API (code generation)
Success Metrics:
- Development velocity: 5-8x increase
- Code coverage: 80-95% (from 40-60%)
- Time to first deployment: < 1 hour (from 2 days)
- Developer satisfaction: 95% positive feedback
Built with ❤️ by the Cortex team