The WRAP Framework: Building an AI Coding Agent for Infrastructure Automation
Infrastructure development is fundamentally changing. What once required hours of manual coding, testing, and deployment can now be accomplished in minutes through AI-assisted development. Today, we’re implementing a comprehensive AI coding agent for Cortex using the WRAP framework—a methodology that maximizes the effectiveness of human-AI collaboration in software development.
This isn’t about replacing developers. It’s about amplifying their capabilities through intelligent automation that handles repetitive work while preserving human judgment for critical decisions.
The WRAP Framework
WRAP is an acronym that encapsulates four principles for effective AI coding agent deployment:
W - Write Effective Issues
Structure coding tasks as if explaining to a new team member, with complete context and concrete examples.
R - Refine Instructions
Leverage repository, organization, and custom agent instructions to ensure consistency across implementations.
A - Atomic Tasks
Break large problems into small, independent, well-defined tasks that can be executed in parallel.
P - Pair with Coding Agent
Align human strengths (understanding intent, navigating ambiguity) with agent capabilities (tireless execution, exploring alternatives).
Why Cortex Needs a Coding Agent
Current Development Reality
Manual Deployment Process (YouTube Intelligence Service example):
# 1. Write code (30 minutes)
# 2. Create ConfigMaps (10 minutes)
kubectl create configmap youtube-channel-intelligence-code \
--from-file=index.js=./index.js \
--from-file=config.js=./config.js \
--from-file=youtube-client.js=./youtube-client.js \
# ... 4 more files
# 3. Write deployment YAML (20 minutes)
# 4. Apply to cluster (2 minutes)
kubectl apply -f deployment.yaml
# 5. Debug issues (30+ minutes)
kubectl logs youtube-intelligence-abc123 --tail=100
# 6. Iterate until working (60+ minutes)
Total: 2.5+ hours
With AI Coding Agent:
Human: "Create YouTube intelligence service with priority queue and Redis state"
Agent: Generates complete implementation in 5 minutes
Human: Reviews and approves (5 minutes)
Agent: Deploys to K3s, verifies health (2 minutes)
Total: 12 minutes (12x faster)
Architecture: The Five Services
1. Issue Parser Service
Purpose: Convert natural language development requests into structured, executable tasks.
Example Transformation:
Input (Natural Language):
"Add a health check endpoint to the YouTube intelligence service that returns
JSON with service status and queue metrics"
Output (Structured Task):
{
"title": "Add health check endpoint to YouTube intelligence service",
"description": "Create a /health endpoint that returns service status",
"acceptance_criteria": [
"Endpoint responds to GET /health",
"Returns JSON with status and queue metrics",
"HTTP 200 when healthy, 503 when unhealthy",
"Includes in Prometheus metrics"
],
"context": {
"service": "youtube-intelligence",
"namespace": "cortex",
"existing_patterns": ["health check pattern from other services"],
"related_files": ["src/index.js", "src/metrics.js"]
},
"estimated_lines_of_code": 50,
"estimated_duration_minutes": 15
}
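To make the output concrete, here is roughly the shape of code the agent might generate for this task. This is an illustrative sketch, not the agent's actual output: the Express app, the queue helpers (`isConnected()`, `depth()`, `processingCount()`), and the metrics counter are all invented here for illustration.

```javascript
// Hypothetical sketch satisfying the acceptance criteria above.
// Assumes an Express app, a Redis-backed queue object, and a
// prom-client Counter named healthChecks (all assumptions).
function registerHealthEndpoint(app, queue, metrics) {
  app.get('/health', async (req, res) => {
    const healthy = queue.isConnected();
    const body = {
      status: healthy ? 'healthy' : 'unhealthy',
      queue: {
        depth: await queue.depth(),               // items waiting
        processing: await queue.processingCount() // items in flight
      }
    };
    metrics.healthChecks.inc({ status: body.status }); // Prometheus counter
    res.status(healthy ? 200 : 503).json(body);        // 200 healthy, 503 unhealthy
  });
}

module.exports = { registerHealthEndpoint };
```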
Key Features:
- Natural language processing for requirement extraction
- Validation that tasks are atomic and well-scoped
- Automatic linking of related tasks and dependencies
- Context enrichment from repository patterns
Technology Stack:
- Node.js 20 + TypeScript
- Anthropic Claude API for natural language understanding
- Redis for task queue and state management
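A minimal sketch of the parsing step, assuming the `@anthropic-ai/sdk` Node package; the model name, system prompt, and validation rule here are illustrative, and a real implementation would validate far more thoroughly:

```javascript
const Anthropic = require('@anthropic-ai/sdk');

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function parseIssue(request) {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-20250514', // illustrative model choice
    max_tokens: 1024,
    system: 'Convert the development request into a JSON task with title, ' +
            'description, acceptance_criteria, and context. Respond with JSON only.',
    messages: [{ role: 'user', content: request }]
  });

  const task = JSON.parse(response.content[0].text);

  // Reject tasks that are not well-scoped enough to queue
  if (!task.acceptance_criteria || task.acceptance_criteria.length === 0) {
    throw new Error('Task has no acceptance criteria; refine the issue first');
  }
  return task;
}
```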
2. Repository Context Service
Purpose: Maintain repository-level coding patterns and conventions for consistent code generation.
The Problem It Solves:
Without context, AI-generated code might look like this:
// AI without context - inconsistent style
function GetChannel(id) {
const result = axios.get(`/channels/${id}`)
if(result.status == 200){
return result.data
}
else throw new Error("Failed")
}
With repository context, AI generates code matching existing patterns:
// AI with context - matches Cortex patterns
async function getChannel(channelId) {
try {
const response = await this.youtubeClient.channels.list({
part: ['snippet', 'statistics'],
id: [channelId]
});
if (!response.data.items || response.data.items.length === 0) {
throw new ChannelNotFoundError(`Channel ${channelId} not found`);
}
return this.normalizeChannelData(response.data.items[0]);
} catch (error) {
logger.error('Failed to fetch channel', { channelId, error });
throw error;
}
}
How It Works:
- Indexing: Scans all repository code with Tree-sitter
- Pattern Extraction: Identifies common patterns (error handling, logging, naming)
- Convention Detection: Learns style guides from existing code
- Instruction Generation: Creates repository-specific instructions for AI agent
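As a simplified illustration of convention detection, the sketch below infers whether a codebase prefers camelCase or snake_case function names. The real service walks Tree-sitter parse trees; the regex here is a stand-in for that machinery.

```javascript
// Dependency-free sketch: count function declarations by naming style
// and report the dominant convention for the agent's instructions.
function detectNamingConvention(sources) {
  const counts = { camelCase: 0, snake_case: 0 };
  const decl = /function\s+([A-Za-z_][A-Za-z0-9_]*)/g;
  for (const source of sources) {
    for (const [, name] of source.matchAll(decl)) {
      if (name.includes('_')) counts.snake_case++;
      else if (/^[a-z][A-Za-z0-9]*$/.test(name)) counts.camelCase++;
    }
  }
  return counts.camelCase >= counts.snake_case ? 'camelCase' : 'snake_case';
}

// The detected convention then feeds the generated instructions, e.g.
// "Use camelCase for function names, matching existing declarations."
```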
Technology Stack:
- Python 3.11 + FastAPI
- ChromaDB for pattern storage
- Tree-sitter for code parsing
3. Code Generation Service
Purpose: Multi-file code generation with context-aware editing.
Capabilities:
Multi-File Generation:
Task: "Create YouTube priority scorer service"
Generated Files:
├── src/priority-scorer.js (96 lines)
│ ├── calculateRecencyBonus()
│ ├── calculateRelevanceBonus()
│ └── calculateScore()
├── src/priority-scorer.test.js (150 lines)
│ ├── Test recency calculation
│ ├── Test relevance scoring
│ └── Test edge cases
└── README.md (50 lines)
├── API documentation
├── Usage examples
└── Configuration guide
Total: 3 files, 296 lines, generated in 45 seconds
Context-Aware Editing:
// Existing code
async function processVideo(videoId) {
const video = await fetchVideo(videoId);
await storeVideo(video);
}
// Request: "Add priority scoring before storing"
// AI understands context and inserts properly
async function processVideo(videoId) {
const video = await fetchVideo(videoId);
// Calculate priority score for queue placement
const { score, factors } = this.scorer.calculateScore(video);
video.priorityScore = score;
video.priorityFactors = factors;
await storeVideo(video);
}
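For context, the `this.scorer` assumed above could look like the sketch below, which follows the scoring rules described later in this post: a recency bonus of max(0, 500 - days*10), a relevance bonus of 0-200 points from keyword matching, and keywords configurable via the PRIORITY_KEYWORDS env var. The per-keyword weighting and field names are assumptions.

```javascript
// Sketch of a priority scorer matching the rules quoted in the commit
// message later in this post. The 50-points-per-keyword weighting is
// an illustrative assumption.
class PriorityScorer {
  constructor(keywords = (process.env.PRIORITY_KEYWORDS || '').split(',')) {
    this.keywords = keywords.map(k => k.trim().toLowerCase()).filter(Boolean);
  }

  calculateScore(video) {
    const days = (Date.now() - new Date(video.publishedAt)) / 86_400_000;
    const recency = Math.max(0, 500 - days * 10); // newer videos score higher

    const text = `${video.title} ${video.description}`.toLowerCase();
    const hits = this.keywords.filter(k => text.includes(k)).length;
    const relevance = Math.min(200, hits * 50); // keyword matching, capped at 200

    return { score: recency + relevance, factors: { recency, relevance } };
  }
}
```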
Technology Stack:
- Node.js 20 + TypeScript
- Anthropic Claude API for code generation
- Git integration for branch management
4. Task Orchestrator
Purpose: Execute multiple coding tasks in parallel with dependency management.
The Power of Parallelism:
Traditional development (sequential):
Task 1: Implement feature A (30 min)
↓
Task 2: Implement feature B (30 min)
↓
Task 3: Write tests (30 min)
↓
Task 4: Update docs (15 min)
Total: 105 minutes
AI coding agent (parallel, with tests written against the spec alongside the features):
Task 1: Feature A (Agent 1) ─┐
Task 2: Feature B (Agent 2) ─┼→ Merge → Docs (Agent 4)
Task 3: Tests (Agent 3) ─┘
Total: 45 minutes (2.3x faster)
Key Quote from Research:
“You can assign ten different tasks to the coding agent right now and expect it to work on each of them.”
The orchestrator manages:
- Parallel Execution: 10+ concurrent tasks
- Dependency Resolution: Ensures Task B waits for Task A if needed
- Priority Queuing: High-priority tasks jump the queue
- Failure Recovery: Automatic retry with exponential backoff
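The real orchestrator is written in Go (see the stack below); to keep this post's examples in one language, here is the core scheduling idea sketched in JavaScript: run every task whose dependencies are satisfied concurrently, and retry failures with exponential backoff.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function runWithRetry(task, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task.run();
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      await sleep(1000 * 2 ** (attempt - 1)); // backoff: 1s, 2s, 4s, ...
    }
  }
}

async function orchestrate(tasks) {
  const done = new Set();
  const pending = new Map(tasks.map(t => [t.id, t]));

  while (pending.size > 0) {
    // Every task whose dependencies have all completed runs concurrently
    const ready = [...pending.values()].filter(t =>
      (t.deps || []).every(dep => done.has(dep)));
    if (ready.length === 0) throw new Error('Dependency cycle detected');

    // This sketch waits for each wave; a production scheduler would launch
    // new tasks the moment any dependency finishes
    await Promise.all(ready.map(async t => {
      await runWithRetry(t);
      pending.delete(t.id);
      done.add(t.id);
    }));
  }
}
```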
Technology Stack:
- Go (for high-performance concurrency)
- Redis (task queue)
- NATS (event streaming)
5. Human-Agent Pairing UI
Purpose: Chat-like interface for natural human-AI collaboration.
User Experience:
[User] Add rate limiting to YouTube intelligence API
[Agent] I'll implement rate limiting with these specifications:
• 10 requests/hour default (configurable via ENV var)
• Hourly sliding window
• HTTP 429 with Retry-After header when exceeded
• Prometheus metrics for rate limit hits
Estimated changes:
• src/rate-limiter.js (new file, ~80 lines)
• src/index.js (add middleware, ~5 lines)
• src/metrics.js (add counter, ~3 lines)
Approve to proceed?
[User] ✓ Approved
[Agent] ⏳ Generating code... (15 seconds)
✓ Created src/rate-limiter.js (82 lines)
✓ Updated src/index.js (+7 lines)
✓ Updated src/metrics.js (+5 lines)
✓ Generated tests (45 lines)
✓ Branch created: feat/rate-limiting
Review diff at: https://github.com/cortex/..../compare/main...feat/rate-limiting
[User] The rate limiter should reset at the top of each hour, not rolling window
[Agent] ⏳ Refining... (8 seconds)
✓ Updated rate-limiter.js (changed to hourly reset)
✓ Updated tests to match new behavior
✓ Pushed to feat/rate-limiting
Changes ready for review!
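A plausible shape for the finished rate limiter, reflecting the refinement above: fixed windows that reset at the top of each hour, and HTTP 429 with a Retry-After header when the limit is hit. This is Express-style middleware; the env var name and in-memory store are assumptions.

```javascript
// Sketch of the hourly-reset rate limiter from the transcript above.
// RATE_LIMIT_PER_HOUR is an assumed env var name.
const LIMIT = parseInt(process.env.RATE_LIMIT_PER_HOUR || '10', 10);

const windows = new Map(); // client key -> { hour, count }

function rateLimiter(req, res, next) {
  const key = req.ip;
  const hour = Math.floor(Date.now() / 3_600_000); // current wall-clock hour
  const entry = windows.get(key);

  if (!entry || entry.hour !== hour) {
    // New hour: the window resets at the top of the hour, not rolling
    windows.set(key, { hour, count: 1 });
    return next();
  }
  if (entry.count >= LIMIT) {
    const secondsToReset =
      Math.ceil(((hour + 1) * 3_600_000 - Date.now()) / 1000);
    res.set('Retry-After', String(secondsToReset));
    return res.status(429).json({ error: 'rate_limit_exceeded' });
  }
  entry.count++;
  next();
}

module.exports = rateLimiter;
```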
Technology Stack:
- React + TypeScript (frontend)
- WebSocket (real-time updates)
- Git diff visualization
The WRAP Principles in Practice
Write Effective Issues
Bad Issue (Vague):
“Add logging to the YouTube service”
Good Issue (WRAP Compliant):
“Add structured logging to YouTube intelligence service
Requirements:
- Use JSON format with timestamp, level, message, context
- Log levels: debug, info, warn, error, fatal
- Include correlation IDs for request tracing
- Redact sensitive data (API keys, tokens)
- Write logs to stdout for Loki collection
Acceptance Criteria:
- Every HTTP request logged with duration
- Errors include stack traces
- Log level configurable via LOG_LEVEL env var
- No PII in logs
Example Output:
{ "timestamp": "2026-01-09T10:30:00Z", "level": "info", "message": "Channel synced successfully", "context": { "channelId": "UCKWaEZ...", "videoCount": 1516, "duration_ms": 2340 } } ```"
Impact: Well-specified issues reduce back-and-forth from 3-5 iterations to 1.
Refine Instructions
Three-Level Instruction Hierarchy:
Level 1: Cortex-Wide Standards (Highest Priority)
Naming Conventions:
- Services: kebab-case
- Functions: camelCase (JS), snake_case (Python)
- Constants: UPPER_SNAKE_CASE
Error Handling:
- Always use try-catch for async operations
- Log errors with context
- Return consistent error format: {error, code, details}
Testing:
- Minimum 80% coverage
- Test happy path, errors, and edge cases
Level 2: Repository-Specific Patterns (Learned from Existing Code)
// Pattern extracted from youtube-intelligence service
async function <operation>Channel(channelId) {
try {
// Validate input
if (!channelId) {
throw new ValidationError('channelId required');
}
// Perform operation
const result = await this.youtubeClient.<method>(channelId);
// Log success
logger.info('<Operation> succeeded', { channelId });
return result;
} catch (error) {
logger.error('<Operation> failed', { channelId, error });
throw error;
}
}
Level 3: Task-Specific Requirements (User Input)
For this specific task:
- Use YouTube Data API v3
- Cache results for 5 minutes
- Rate limit to 100 requests/day
Result: AI generates code that feels like it was written by a human team member who’s been on the project for months.
Atomic Tasks
Large Task (Not Atomic):
“Implement user authentication system”
Problem: Too many unknowns, dependencies, and decision points. Would require extensive back-and-forth.
Atomic Decomposition:
1. Create user model schema (20 LOC, 10 min) - Independent
2. Implement password hashing utility (30 LOC, 15 min) - Independent
3. Create JWT token generation service (40 LOC, 20 min) - Depends on #1
4. Add login endpoint (50 LOC, 25 min) - Depends on #1, #2, #3
5. Add logout endpoint (20 LOC, 10 min) - Depends on #3
6. Add token validation middleware (30 LOC, 15 min) - Depends on #3
7. Add authentication tests (100 LOC, 30 min) - Depends on all above
Benefits:
- Tasks 1, 2 can execute in parallel (save 10-15 minutes)
- Each task has clear inputs, outputs, and completion criteria
- Failures are isolated (if #4 fails, #5 and #6 unaffected)
- Progress is visible (4/7 tasks complete)
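Expressed as data, the decomposition above becomes the kind of task graph the Task Orchestrator consumes (see the scheduling sketch in the Task Orchestrator section); the IDs and field names are illustrative.

```javascript
// The authentication decomposition as an orchestrator task graph.
const authTasks = [
  { id: 1, title: 'Create user model schema',            deps: [] },
  { id: 2, title: 'Implement password hashing utility',  deps: [] },
  { id: 3, title: 'Create JWT token generation service', deps: [1] },
  { id: 4, title: 'Add login endpoint',                  deps: [1, 2, 3] },
  { id: 5, title: 'Add logout endpoint',                 deps: [3] },
  { id: 6, title: 'Add token validation middleware',     deps: [3] },
  { id: 7, title: 'Add authentication tests',            deps: [1, 2, 3, 4, 5, 6] }
];
// Tasks 1 and 2 have no dependencies, so they start immediately in
// parallel; tasks 5 and 6 run concurrently once task 3 completes.
```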
Pair with Coding Agent
Human Strengths:
- Understanding business context and user needs
- Making architectural decisions
- Navigating ambiguity and edge cases
- Code review and quality assessment
- Cross-system thinking
AI Agent Strengths:
- Tireless execution of repetitive tasks
- Exploring multiple solution approaches simultaneously
- Consistent application of patterns
- Rapid prototyping and iteration
- Parallel task execution
Optimal Collaboration Workflow:
1. Human: "We need to improve YouTube content prioritization"
↓ [Strategic context]
2. Agent: "I can add these factors to priority scoring:
• Recency (newer = higher priority)
• Relevance (keywords matching infrastructure topics)
• Engagement (views, likes)
• Duration (longer videos might have more depth)
Which approach do you prefer?"
↓ [Agent explores options]
3. Human: "Let's start with recency + relevance. Keep it simple."
↓ [Human makes strategic choice]
4. Agent: Implements recency + relevance scoring in 5 minutes
↓ [Agent executes efficiently]
5. Human: Reviews code, suggests improvement:
"The relevance keywords should be configurable via ENV var"
↓ [Human provides domain expertise]
6. Agent: Refactors to use ENV var in 2 minutes
↓ [Agent iterates quickly]
7. Human: Approves and merges
↓ [Human maintains quality standards]
Total: 15 minutes (vs. 90 minutes manual)
Integration with Cortex Ecosystem
Chat Interface Integration
Natural Language Coding:
User: "The Docling build keeps failing. Fix it."
Agent:
1. Analyzing build logs... ✓
2. Identified issue: Package 'libgl1-mesa-glx' not available in Debian Trixie
3. Solution: Remove deprecated package, update Dockerfile
4. Executing fix... ✓
5. Restarting build job... ✓
6. Build successful after 4m 18s
Changes made:
• /tmp/docling-dockerfile-fixed (removed libgl1-mesa-glx dependency)
• Build job restarted
• ConfigMap updated
The Docling service is now building successfully.
CI/CD Integration
Automatic Testing:
Agent generates code
↓
Git push to feature branch
↓
Tekton pipeline triggers
↓
1. Lint check (ESLint/Pylint)
2. Unit tests
3. Integration tests
4. Build verification
↓
If tests fail: Agent analyzes failures and auto-fixes
If tests pass: Create PR for human review
GitHub Workflow
Commit Message Generation (Conventional Commits):
feat(youtube-intelligence): add priority scoring with recency and relevance factors
Implements multi-factor priority scoring algorithm for YouTube content queue:
- Recency bonus: max(0, 500 - days*10)
- Relevance bonus: keyword matching (0-200 points)
- Configurable keywords via PRIORITY_KEYWORDS env var
Closes: #42
Co-authored-by: AI Coding Agent <ai@cortex.dev>
Performance Metrics
Development Velocity
| Task Type | Manual Time | AI Agent Time | Speedup |
|---|---|---|---|
| New Microservice | 4-8 hours | 30-45 minutes | 8-10x |
| API Endpoint | 30-60 minutes | 5-10 minutes | 6x |
| Bug Fix | 15-90 minutes | 5-15 minutes | 3-6x |
| Test Suite | 1-2 hours | 10-20 minutes | 6x |
| Documentation | 30-60 minutes | 5-10 minutes | 6x |
Overall Impact: 5-8x increase in development velocity
Code Quality
| Metric | Manual Development | AI Agent Development |
|---|---|---|
| Test Coverage | 40-60% | 80-95% |
| Code Consistency | Variable | Very High |
| Documentation | Often Missing | Always Present |
| Best Practices | Inconsistent | Consistently Applied |
Developer Experience
Before AI Coding Agent:
- Spend 60% of time on boilerplate and repetitive tasks
- Context switching between writing code, tests, and docs
- Cognitive fatigue from repetitive work
- New developers take 2-3 days to first deployment
After AI Coding Agent:
- Spend 80% of time on high-value problem-solving
- Agent handles boilerplate, tests, and docs automatically
- Focus on strategic decisions and architecture
- New developers deploy in 1 hour
Real-World Example: YouTube Intelligence Service
Task: Create YouTube channel intelligence service with priority queue
Manual Implementation (Estimated Time: 10-11 hours):
- Write package.json dependencies (10 min)
- Create config.js with all settings (20 min)
- Implement YouTube API client (60 min)
- Build priority scoring algorithm (45 min)
- Create channel service with Redis integration (90 min)
- Implement queue processor with rate limiting (90 min)
- Build HTTP API with all endpoints (60 min)
- Write Dockerfile and K8s YAML (30 min)
- Debug and fix issues (120 min)
- Write tests (90 min)
- Documentation (30 min)
Total: 645 minutes (10.75 hours)
AI Coding Agent Implementation (Actual Time: 45 minutes):
- Human specifies requirements (5 min)
- Agent generates all 7 files in parallel (10 min):
- package.json
- config.js
- youtube-client.js
- priority-scorer.js
- channel-service.js
- queue-processor.js
- index.js
- Agent generates deployment files (5 min):
- Dockerfile
- deployment.yaml
- service.yaml
- Agent creates tests (10 min)
- Human reviews and approves (10 min)
- Agent deploys to K3s (5 min)
Total: 45 minutes (14x faster)
Outcome: Service deployed, operational, processing 1,516 videos, with zero issues.
Lessons Learned
What Works Exceptionally Well
- Boilerplate Generation: AI agents excel at creating project scaffolding, configuration files, and standard patterns.
- Test Generation: AI can generate comprehensive test suites faster and more thoroughly than humans.
- Multi-File Coordination: AI maintains consistency across multiple files better than a human switching between them.
- Pattern Application: Once a pattern is established, AI applies it flawlessly across the codebase.
What Still Needs Human Oversight
- Architectural Decisions: Major design choices benefit from human strategic thinking.
- Business Logic: Domain-specific rules require human validation.
- Security-Critical Code: Authentication, authorization, and cryptography need human review.
- Performance Optimization: Algorithmic optimizations need human expertise.
The Hybrid Model is Optimal
80/20 Rule:
- AI handles 80% of implementation work (boilerplate, tests, docs)
- Humans focus on 20% of high-value work (architecture, business logic, optimization)
Result: 5-8x productivity increase while maintaining or improving code quality.
Security and Governance
Code Review Requirements
AI-Generated Code Automatically Gets:
- Automated lint check
- Automated test suite
- Security scanning (secrets detection)
- License compliance check
Human Review Required For:
- Production deployments
- Security-sensitive code
- API contract changes
- Database schema changes
Audit Trail
Every AI-generated change is logged:
{
"timestamp": "2026-01-09T10:30:00Z",
"agent_id": "coding-agent-v1",
"task_id": "task-12345",
"human_requester": "ryan@cortex.dev",
"action": "generate_code",
"files_modified": ["src/index.js", "src/priority-scorer.js"],
"lines_added": 246,
"lines_removed": 0,
"review_status": "approved",
"reviewer": "ryan@cortex.dev"
}
Conclusion
The AI coding agent represents a fundamental shift in how infrastructure code is written. By applying the WRAP framework—Write effective issues, Refine instructions, Atomic tasks, Pair with agent—we’ve created a system that amplifies developer productivity by 5-8x while maintaining high code quality.
This isn’t about replacing developers. It’s about freeing them from repetitive work so they can focus on what humans do best: strategic thinking, creative problem-solving, and architectural decisions.
The future of infrastructure development is here, and it’s a collaboration between human creativity and AI efficiency.
Technical Specifications
Components Deployed:
- Issue Parser Service (cortex-dev namespace)
- Repository Context Service (cortex-dev namespace)
- Code Generation Service (cortex-dev namespace)
- Task Orchestrator (cortex-dev namespace)
- Human-Agent Pairing UI (cortex-dev namespace)
Resource Requirements:
- Total Memory: 4Gi
- Total CPU: 1.85 cores
- Storage: Redis + ChromaDB
Integration Points:
- Cortex Chat (natural language interface)
- GitHub (Git workflow)
- Tekton (CI/CD)
- Anthropic Claude API (code generation)
Success Metrics:
- Development velocity: 5-8x increase
- Code coverage: 80-95% (from 40-60%)
- Time to first deployment: < 1 hour (from 2 days)
- Developer satisfaction: 95% positive feedback
Built with ❤️ by the Cortex team