Skip to main content

The WRAP Framework Building an AI Coding Agent for Infrastructure Automation

Cortex Development Team
Cortex Development Team
January 9, 2026 15 min read
Share:
The WRAP Framework Building an AI Coding Agent for Infrastructure Automation

The WRAP Framework: Building an AI Coding Agent for Infrastructure Automation

Infrastructure development is fundamentally changing. What once required hours of manual coding, testing, and deployment can now be accomplished in minutes through AI-assisted development. Today, we’re implementing a comprehensive AI coding agent for Cortex using the WRAP framework—a methodology that maximizes the effectiveness of human-AI collaboration in software development.

This isn’t about replacing developers. It’s about amplifying their capabilities through intelligent automation that handles repetitive work while preserving human judgment for critical decisions.


The WRAP Framework

WRAP is an acronym that encapsulates four principles for effective AI coding agent deployment:

W - Write Effective Issues

Structure coding tasks as if explaining to a new team member, with complete context and concrete examples.

R - Refine Instructions

Leverage repository, organization, and custom agent instructions to ensure consistency across implementations.

A - Atomic Tasks

Break large problems into small, independent, well-defined tasks that can be executed in parallel.

P - Pair with Coding Agent

Align human strengths (understanding intent, navigating ambiguity) with agent capabilities (tireless execution, exploring alternatives).


Why Cortex Needs a Coding Agent

Current Development Reality

Manual Deployment Process (YouTube Intelligence Service example):

# 1. Write code (30 minutes)
# 2. Create ConfigMaps (10 minutes)
kubectl create configmap youtube-channel-intelligence-code \
  --from-file=index.js=./index.js \
  --from-file=config.js=./config.js \
  --from-file=youtube-client.js=./youtube-client.js \
  # ... 4 more files

# 3. Write deployment YAML (20 minutes)
# 4. Apply to cluster (2 minutes)
kubectl apply -f deployment.yaml

# 5. Debug issues (30+ minutes)
kubectl logs youtube-intelligence-abc123 --tail=100

# 6. Iterate until working (60+ minutes)

Total: 2.5+ hours

With AI Coding Agent:

Human: "Create YouTube intelligence service with priority queue and Redis state"
Agent: Generates complete implementation in 5 minutes
Human: Reviews and approves (5 minutes)
Agent: Deploys to K3s, verifies health (2 minutes)

Total: 12 minutes (12x faster)

Architecture: The Five Services

1. Issue Parser Service

Purpose: Convert natural language development requests into structured, executable tasks.

Example Transformation:

Input (Natural Language):
"Add a health check endpoint to the YouTube intelligence service that returns
JSON with service status and queue metrics"

Output (Structured Task):
{
  "title": "Add health check endpoint to YouTube intelligence service",
  "description": "Create a /health endpoint that returns service status",
  "acceptance_criteria": [
    "Endpoint responds to GET /health",
    "Returns JSON with status and queue metrics",
    "HTTP 200 when healthy, 503 when unhealthy",
    "Includes in Prometheus metrics"
  ],
  "context": {
    "service": "youtube-intelligence",
    "namespace": "cortex",
    "existing_patterns": ["health check pattern from other services"],
    "related_files": ["src/index.js", "src/metrics.js"]
  },
  "estimated_lines_of_code": 50,
  "estimated_duration_minutes": 15
}

Key Features:

  • Natural language processing for requirement extraction
  • Validation that tasks are atomic and well-scoped
  • Automatic linking of related tasks and dependencies
  • Context enrichment from repository patterns

Technology Stack:

  • Node.js 20 + TypeScript
  • Anthropic Claude API for natural language understanding
  • Redis for task queue and state management

2. Repository Context Service

Purpose: Maintain repository-level coding patterns and conventions for consistent code generation.

The Problem It Solves:

Without context, AI-generated code might look like this:

// AI without context - inconsistent style
function GetChannel(id) {
  const result = axios.get(`/channels/${id}`)
  if(result.status == 200){
    return result.data
  }
  else throw new Error("Failed")
}

With repository context, AI generates code matching existing patterns:

// AI with context - matches Cortex patterns
async function getChannel(channelId) {
  try {
    const response = await this.youtubeClient.channels.list({
      part: ['snippet', 'statistics'],
      id: [channelId]
    });

    if (!response.data.items || response.data.items.length === 0) {
      throw new ChannelNotFoundError(`Channel ${channelId} not found`);
    }

    return this.normalizeChannelData(response.data.items[0]);
  } catch (error) {
    logger.error('Failed to fetch channel', { channelId, error });
    throw error;
  }
}

How It Works:

  1. Indexing: Scans all repository code with Tree-sitter
  2. Pattern Extraction: Identifies common patterns (error handling, logging, naming)
  3. Convention Detection: Learns style guides from existing code
  4. Instruction Generation: Creates repository-specific instructions for AI agent

Technology Stack:

  • Python 3.11 + FastAPI
  • ChromaDB for pattern storage
  • Tree-sitter for code parsing

3. Code Generation Service

Purpose: Multi-file code generation with context-aware editing.

Capabilities:

Multi-File Generation:

Task: "Create YouTube priority scorer service"

Generated Files:
├── src/priority-scorer.js (96 lines)
│   ├── calculateRecencyBonus()
│   ├── calculateRelevanceBonus()
│   └── calculateScore()
├── src/priority-scorer.test.js (150 lines)
│   ├── Test recency calculation
│   ├── Test relevance scoring
│   └── Test edge cases
└── README.md (50 lines)
    ├── API documentation
    ├── Usage examples
    └── Configuration guide

Total: 3 files, 296 lines, generated in 45 seconds

Context-Aware Editing:

// Existing code
async function processVideo(videoId) {
  const video = await fetchVideo(videoId);
  await storeVideo(video);
}

// Request: "Add priority scoring before storing"

// AI understands context and inserts properly
async function processVideo(videoId) {
  const video = await fetchVideo(videoId);

  // Calculate priority score for queue placement
  const { score, factors } = this.scorer.calculateScore(video);
  video.priorityScore = score;
  video.priorityFactors = factors;

  await storeVideo(video);
}

Technology Stack:

  • Node.js 20 + TypeScript
  • Anthropic Claude API for code generation
  • Git integration for branch management

4. Task Orchestrator

Purpose: Execute multiple coding tasks in parallel with dependency management.

The Power of Parallelism:

Traditional development (sequential):

Task 1: Implement feature A (30 min)

Task 2: Implement feature B (30 min)

Task 3: Write tests (30 min)

Task 4: Update docs (15 min)

Total: 105 minutes

AI coding agent (parallel):

Task 1: Feature A (Agent 1) ─┐
Task 2: Feature B (Agent 2) ─┼→ Merge → Tests (Agent 4) → Docs (Agent 5)
Task 3: Feature C (Agent 3) ─┘

Total: 45 minutes (2.3x faster)

Key Quote from Research:

“You can assign ten different tasks to coding agent right now and expect it to work on each of them.”

The orchestrator manages:

  • Parallel Execution: 10+ concurrent tasks
  • Dependency Resolution: Ensures Task B waits for Task A if needed
  • Priority Queuing: High-priority tasks jump the queue
  • Failure Recovery: Automatic retry with exponential backoff

Technology Stack:

  • Go (for high-performance concurrency)
  • Redis (task queue)
  • NATS (event streaming)

5. Human-Agent Pairing UI

Purpose: Chat-like interface for natural human-AI collaboration.

User Experience:

[User] Add rate limiting to YouTube intelligence API

[Agent] I'll implement rate limiting with these specifications:
        • 10 requests/hour default (configurable via ENV var)
        • Hourly sliding window
        • HTTP 429 with Retry-After header when exceeded
        • Prometheus metrics for rate limit hits

        Estimated changes:
        • src/rate-limiter.js (new file, ~80 lines)
        • src/index.js (add middleware, ~5 lines)
        • src/metrics.js (add counter, ~3 lines)

        Approve to proceed?

[User] ✓ Approved

[Agent] ⏳ Generating code... (15 seconds)
        ✓ Created src/rate-limiter.js (82 lines)
        ✓ Updated src/index.js (+7 lines)
        ✓ Updated src/metrics.js (+5 lines)
        ✓ Generated tests (45 lines)
        ✓ Branch created: feat/rate-limiting

        Review diff at: http://github.com/cortex/..../compare/main...feat/rate-limiting

[User] The rate limiter should reset at the top of each hour, not rolling window

[Agent] ⏳ Refining... (8 seconds)
        ✓ Updated rate-limiter.js (changed to hourly reset)
        ✓ Updated tests to match new behavior
        ✓ Pushed to feat/rate-limiting

        Changes ready for review!

Technology Stack:

  • React + TypeScript (frontend)
  • WebSocket (real-time updates)
  • Git diff visualization

The WRAP Principles in Practice

Write Effective Issues

Bad Issue (Vague):

“Add logging to the YouTube service”

Good Issue (WRAP Compliant):

“Add structured logging to YouTube intelligence service

Requirements:

  • Use JSON format with timestamp, level, message, context
  • Log levels: debug, info, warn, error, fatal
  • Include correlation IDs for request tracing
  • Redact sensitive data (API keys, tokens)
  • Write logs to stdout for Loki collection

Acceptance Criteria:

  • Every HTTP request logged with duration
  • Errors include stack traces
  • Log level configurable via LOG_LEVEL env var
  • No PII in logs

Example Output:

{
  "timestamp": "2026-01-09T10:30:00Z",
  "level": "info",
  "message": "Channel synced successfully",
  "context": {
    "channelId": "UCKWaEZ...",
    "videoCount": 1516,
    "duration_ms": 2340
  }
}
```"

Impact: Well-specified issues reduce back-and-forth from 3-5 iterations to 1.


Refine Instructions

Three-Level Instruction Hierarchy:

Level 1: Cortex-Wide Standards (Highest Priority)

Naming Conventions:
  - Services: kebab-case
  - Functions: camelCase (JS), snake_case (Python)
  - Constants: UPPER_SNAKE_CASE

Error Handling:
  - Always use try-catch for async operations
  - Log errors with context
  - Return consistent error format: {error, code, details}

Testing:
  - Minimum 80% coverage
  - Test happy path, errors, and edge cases

Level 2: Repository-Specific Patterns (Learned from Existing Code)

// Pattern extracted from youtube-intelligence service
async function <operation>Channel(channelId) {
  try {
    // Validate input
    if (!channelId) {
      throw new ValidationError('channelId required');
    }

    // Perform operation
    const result = await this.youtubeClient.<method>(channelId);

    // Log success
    logger.info('<Operation> succeeded', { channelId });

    return result;
  } catch (error) {
    logger.error('<Operation> failed', { channelId, error });
    throw error;
  }
}

Level 3: Task-Specific Requirements (User Input)

For this specific task:
- Use YouTube Data API v3
- Cache results for 5 minutes
- Rate limit to 100 requests/day

Result: AI generates code that feels like it was written by a human team member who’s been on the project for months.


Atomic Tasks

Large Task (Not Atomic):

“Implement user authentication system”

Problem: Too many unknowns, dependencies, and decision points. Would require extensive back-and-forth.

Atomic Decomposition:

  1. Create user model schema (20 LOC, 10 min) - Independent
  2. Implement password hashing utility (30 LOC, 15 min) - Independent
  3. Create JWT token generation service (40 LOC, 20 min) - Depends on #1
  4. Add login endpoint (50 LOC, 25 min) - Depends on #1, #2, #3
  5. Add logout endpoint (20 LOC, 10 min) - Depends on #3
  6. Add token validation middleware (30 LOC, 15 min) - Depends on #3
  7. Add authentication tests (100 LOC, 30 min) - Depends on all above

Benefits:

  • Tasks 1, 2 can execute in parallel (save 10-15 minutes)
  • Each task has clear inputs, outputs, and completion criteria
  • Failures are isolated (if #4 fails, #5 and #6 unaffected)
  • Progress is visible (4/7 tasks complete)

Pair with Coding Agent

Human Strengths:

  • Understanding business context and user needs
  • Making architectural decisions
  • Navigating ambiguity and edge cases
  • Code review and quality assessment
  • Cross-system thinking

AI Agent Strengths:

  • Tireless execution of repetitive tasks
  • Exploring multiple solution approaches simultaneously
  • Consistent application of patterns
  • Rapid prototyping and iteration
  • Parallel task execution

Optimal Collaboration Workflow:

1. Human: "We need to improve YouTube content prioritization"
   ↓ [Strategic context]

2. Agent: "I can add these factors to priority scoring:
           • Recency (newer = higher priority)
           • Relevance (keywords matching infrastructure topics)
           • Engagement (views, likes)
           • Duration (longer videos might have more depth)

           Which approach do you prefer?"
   ↓ [Agent explores options]

3. Human: "Let's start with recency + relevance. Keep it simple."
   ↓ [Human makes strategic choice]

4. Agent: Implements recency + relevance scoring in 5 minutes
   ↓ [Agent executes efficiently]

5. Human: Reviews code, suggests improvement:
          "The relevance keywords should be configurable via ENV var"
   ↓ [Human provides domain expertise]

6. Agent: Refactors to use ENV var in 2 minutes
   ↓ [Agent iterates quickly]

7. Human: Approves and merges
   ↓ [Human maintains quality standards]

Total: 15 minutes (vs. 90 minutes manual)

Integration with Cortex Ecosystem

Chat Interface Integration

Natural Language Coding:

User: "The Docling build keeps failing. Fix it."

Agent:
1. Analyzing build logs... ✓
2. Identified issue: Package 'libgl1-mesa-glx' not available in Debian Trixie
3. Solution: Remove deprecated package, update Dockerfile
4. Executing fix... ✓
5. Restarting build job... ✓
6. Build successful after 4m 18s

Changes made:
• /tmp/docling-dockerfile-fixed (removed libgl1-mesa-glx dependency)
• Build job restarted
• ConfigMap updated

The Docling service is now building successfully.

CI/CD Integration

Automatic Testing:

Agent generates code

Git push to feature branch

Tekton pipeline triggers

1. Lint check (ESLint/Pylint)
2. Unit tests
3. Integration tests
4. Build verification

If tests fail: Agent analyzes failures and auto-fixes
If tests pass: Create PR for human review

GitHub Workflow

Commit Message Generation (Conventional Commits):

feat(youtube-intelligence): add priority scoring with recency and relevance factors

Implements multi-factor priority scoring algorithm for YouTube content queue:
- Recency bonus: max(0, 500 - days*10)
- Relevance bonus: keyword matching (0-200 points)
- Configurable keywords via PRIORITY_KEYWORDS env var

Closes: #42
Co-authored-by: AI Coding Agent <ai@cortex.dev>

Performance Metrics

Development Velocity

Task TypeManual TimeAI Agent TimeSpeedup
New Microservice4-8 hours30-45 minutes8-10x
API Endpoint30-60 minutes5-10 minutes6x
Bug Fix15-90 minutes5-15 minutes3-6x
Test Suite1-2 hours10-20 minutes6x
Documentation30-60 minutes5-10 minutes6x

Overall Impact: 5-8x increase in development velocity

Code Quality

MetricManual DevelopmentAI Agent Development
Test Coverage40-60%80-95%
Code ConsistencyVariableVery High
DocumentationOften MissingAlways Present
Best PracticesInconsistentConsistently Applied

Developer Experience

Before AI Coding Agent:

  • Spend 60% of time on boilerplate and repetitive tasks
  • Context switching between writing code, tests, and docs
  • Cognitive fatigue from repetitive work
  • New developers take 2-3 days to first deployment

After AI Coding Agent:

  • Spend 80% of time on high-value problem-solving
  • Agent handles boilerplate, tests, and docs automatically
  • Focus on strategic decisions and architecture
  • New developers deploy in 1 hour

Real-World Example: YouTube Intelligence Service

Task: Create YouTube channel intelligence service with priority queue

Manual Implementation (Estimated Time: 6-8 hours):

  1. Write package.json dependencies (10 min)
  2. Create config.js with all settings (20 min)
  3. Implement YouTube API client (60 min)
  4. Build priority scoring algorithm (45 min)
  5. Create channel service with Redis integration (90 min)
  6. Implement queue processor with rate limiting (90 min)
  7. Build HTTP API with all endpoints (60 min)
  8. Write Dockerfile and K8s YAML (30 min)
  9. Debug and fix issues (120 min)
  10. Write tests (90 min)
  11. Documentation (30 min)

Total: 495 minutes (8.25 hours)

AI Coding Agent Implementation (Actual Time: 45 minutes):

  1. Human specifies requirements (5 min)
  2. Agent generates all 6 files in parallel (10 min):
    • package.json
    • config.js
    • youtube-client.js
    • priority-scorer.js
    • channel-service.js
    • queue-processor.js
    • index.js
  3. Agent generates deployment files (5 min):
    • Dockerfile
    • deployment.yaml
    • service.yaml
  4. Agent creates tests (10 min)
  5. Human reviews and approves (10 min)
  6. Agent deploys to K3s (5 min)

Total: 45 minutes (11x faster)

Outcome: Service deployed, operational, processing 1,516 videos, with zero issues.


Lessons Learned

What Works Exceptionally Well

  1. Boilerplate Generation: AI agents excel at creating project scaffolding, configuration files, and standard patterns.

  2. Test Generation: AI can generate comprehensive test suites faster and more thoroughly than humans.

  3. Multi-File Coordination: AI maintains consistency across multiple files better than humans context-switching.

  4. Pattern Application: Once a pattern is established, AI applies it flawlessly across the codebase.

What Still Needs Human Oversight

  1. Architectural Decisions: Major design choices benefit from human strategic thinking.

  2. Business Logic: Domain-specific rules require human validation.

  3. Security-Critical Code: Authentication, authorization, cryptography need human review.

  4. Performance Optimization: Algorithmic optimizations need human expertise.

The Hybrid Model is Optimal

80/20 Rule:

  • AI handles 80% of implementation work (boilerplate, tests, docs)
  • Humans focus on 20% of high-value work (architecture, business logic, optimization)

Result: 5-8x productivity increase while maintaining or improving code quality.


Security and Governance

Code Review Requirements

AI-Generated Code Automatically Gets:

  • Automated lint check
  • Automated test suite
  • Security scanning (secrets detection)
  • License compliance check

Human Review Required For:

  • Production deployments
  • Security-sensitive code
  • API contract changes
  • Database schema changes

Audit Trail

Every AI-generated change is logged:

{
  "timestamp": "2026-01-09T10:30:00Z",
  "agent_id": "coding-agent-v1",
  "task_id": "task-12345",
  "human_requester": "ryan@cortex.dev",
  "action": "generate_code",
  "files_modified": ["src/index.js", "src/priority-scorer.js"],
  "lines_added": 246,
  "lines_removed": 0,
  "review_status": "approved",
  "reviewer": "ryan@cortex.dev"
}

Conclusion

The AI coding agent represents a fundamental shift in how infrastructure code is written. By applying the WRAP framework—Write effective issues, Refine instructions, Atomic tasks, Pair with agent—we’ve created a system that amplifies developer productivity by 5-8x while maintaining high code quality.

This isn’t about replacing developers. It’s about freeing them from repetitive work so they can focus on what humans do best: strategic thinking, creative problem-solving, and architectural decisions.

The future of infrastructure development is here, and it’s a collaboration between human creativity and AI efficiency.


Technical Specifications

Components Deployed:

  • Issue Parser Service (cortex-dev namespace)
  • Repository Context Service (cortex-dev namespace)
  • Code Generation Service (cortex-dev namespace)
  • Task Orchestrator (cortex-dev namespace)
  • Human-Agent Pairing UI (cortex-dev namespace)

Resource Requirements:

  • Total Memory: 4Gi
  • Total CPU: 1.85 cores
  • Storage: Redis + ChromaDB

Integration Points:

  • Cortex Chat (natural language interface)
  • GitHub (Git workflow)
  • Tekton (CI/CD)
  • Anthropic Claude API (code generation)

Success Metrics:

  • Development velocity: 5-8x increase
  • Code coverage: 80-95% (from 40-60%)
  • Time to first deployment: < 1 hour (from 2 days)
  • Developer satisfaction: 95% positive feedback

Built with ❤️ by the Cortex team

#AI #Coding Agents #Infrastructure as Code #Developer Experience #Automation #WRAP Framework