Infrastructure as Training Data: When AI Systems Learn Like Organizations Do
The Conversation That Changed Everything
I had the opportunity to sit down with Steven Dastoor this week to discuss his journey building technology companies over the past few decades.
The arc of his career tells a remarkable story about organizational evolution: starting with Telcologix (TLX), then NetTel, followed by Citon Computer Corp, then ACP CreativeIT, and what is now Tusker. Each transition wasn’t a pivot — it was a graduation.
We spoke about many things, vision and culture among them, but what fascinated me most was the pattern underneath it all.
Steven didn’t start each company with a grand master plan and a team of specialists. He started with general capabilities, deployed them into real-world operations, and let expertise crystallize naturally through experience. TLX learned from operating, NetTel graduated from those learnings, Citon specialized further, ACP CreativeIT refined the domain expertise, and Tusker represents the distilled knowledge of decades of real-world operation.
Each company was a layer that learned, specialized, and graduated into the next evolution.
What struck me most was how the best technology companies don’t just build systems — they cultivate environments where expertise emerges organically. Steven’s journey wasn’t about having all the answers upfront. It was about creating the conditions for specialization to develop through real-world operation.
That conversation got me thinking about how we architect AI systems.
We typically build AI the opposite way: train a specialist first, then deploy it. But what if infrastructure could mirror how Steven actually built these companies — how great organizations actually develop expertise?
The Evolution Path
Here’s an idea I’ve been prototyping:
Instead of centralized AI services that try to do everything, deploy self-contained “layers” — infrastructure, security, networking — each with its own stack that:
- Scales to zero when idle (why pay for what you’re not using?)
- Bursts on demand when activated
- Learns through operation — capturing every routing decision, every successful pattern
- Graduates into specialization when it’s proven itself
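That lifecycle can be sketched in a few lines of Python. This is a toy model, not a real orchestrator: the `Layer` class, its field names, and the burst/idle logic are all illustrative stand-ins for what a real scale-to-zero runtime would do.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """A self-contained layer that scales to zero, bursts on demand,
    and records every task it handles (hypothetical sketch)."""
    name: str
    replicas: int = 0                        # scaled to zero while idle
    history: list = field(default_factory=list)

    def handle(self, task: str) -> str:
        if self.replicas == 0:               # burst on demand
            self.replicas = 1
        result = f"done:{task}"              # stand-in for real work
        self.history.append((task, result))  # telemetry: future training data
        return result

    def idle(self) -> None:
        self.replicas = 0                    # scale back to zero

    def ready_to_graduate(self, threshold: int = 10_000) -> bool:
        return len(self.history) >= threshold

layer = Layer("networking")
layer.handle("route-packet")
layer.idle()
print(layer.replicas)      # 0 again after going idle
print(len(layer.history))  # 1 task captured
```

The point of the sketch: the layer costs nothing while idle, and every task it does handle leaves a trace behind.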
Architecture Transformation
The layer accumulates domain expertise the same way a team does: through reps, feedback loops, and pattern recognition. When it’s handled thousands of real tasks successfully, you distill that operational knowledge into a specialized model.
From Centralized to Distributed
Before (Centralized):
- Shared infrastructure always running
- Single point of failure
- Inefficient resource usage
- Generalized approach
After (Distributed Burst):
- Independent layers with their own stacks
- Each layer contains: MoE routing, Qdrant vectors, MCP tools
- Scales to zero when idle
- Bursts on demand
- Graduates to specialized models
Telemetry Becomes Training Data
I’m calling this “Infrastructure as Training Data.”
What We Capture
Every operation generates valuable training data:
- MoE Routing Decisions → Institutional Knowledge
- Vector Query Patterns → Domain Structure
- Tool Chain Successes → Repeatable Playbooks
This operational data becomes the foundation for specialized models through LoRA fine-tuning or full model training.
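As a sketch of what capture might look like, here is one hypothetical telemetry record serialized as a JSONL line, a common input format for fine-tuning pipelines. The schema and field names are assumptions for illustration, not a fixed standard:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TelemetryRecord:
    """One routing decision captured as a future training example.
    Field names are illustrative, not a fixed schema."""
    task: str      # what the layer was asked to do
    expert: str    # which expert the MoE router chose
    tools: list    # the tool chain that ran
    success: bool  # did the chain complete?

def to_training_line(rec: TelemetryRecord) -> str:
    """Serialize a record as one JSONL line, ready to feed a
    fine-tuning job (LoRA or full training) later."""
    return json.dumps(asdict(rec))

rec = TelemetryRecord("deploy service", "infra-expert", ["kubectl_apply"], True)
line = to_training_line(rec)
print(line)
```

Append one line per operation and the fine-tuning dataset builds itself as a side effect of running production.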
The Cultural Parallel
It’s the technical manifestation of what Steven has built culturally at Tusker — you don’t start with specialists. You create the environment, let people operate in a domain, and expertise crystallizes naturally.
Vision sets the direction. Culture creates the conditions. The system teaches itself.
How Great Organizations Develop Talent
- Deploy: place someone into a role
- Operate: let them handle real work
- Learn: capture patterns and feedback
- Distill: codify best practices
- Graduate: recognize specialization
This same pattern now applies to AI infrastructure.
Why This Matters
Traditional AI deployment follows a waterfall model:
- Identify need
- Collect training data
- Train specialized model
- Deploy to production
- Hope it works
Infrastructure as Training Data inverts this:
- Deploy general-purpose layer
- Capture operational telemetry
- Learn from real workloads
- Distill into specialization
- Graduate when proven
Key Advantages
Cost Efficiency: Scale to zero means you only pay for what you use. No idle infrastructure burning cash.
Real-World Learning: Training data comes from actual production workloads, not synthetic datasets.
Continuous Improvement: The system gets better with every task it handles.
Natural Specialization: Expertise emerges organically based on what the layer actually does.
Reduced Risk: Start with general capabilities, specialize only after proving value.
Technical Implementation
Each layer is a complete stack:
┌─────────────────────────┐
│ Layer Stack │
├─────────────────────────┤
│ MoE Router (General) │
│ ↓ │
│ Qdrant (Vectors) │
│ ↓ │
│ MCP (Tools) │
│ ↓ │
│ Telemetry Capture │
└─────────────────────────┘
The Learning Loop
- Request arrives → MoE router selects expert
- Vector search → Find relevant context
- Tool execution → Perform the task
- Capture telemetry → Log decisions and outcomes
- Analyze patterns → Identify successful strategies
- Distill knowledge → Train specialized model
- Graduate layer → Deploy specialist
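The in-request part of that loop might be wired together like this. Every stage here is a stub: the `router`, `search`, `execute`, and `log` callables are placeholders for the real MoE router, Qdrant query, and MCP tool chain.

```python
def learning_loop(request, router, search, execute, log):
    """One pass through the loop above (hypothetical stubs per stage):
    route, retrieve context, run tools, capture telemetry."""
    expert = router(request)            # 1. MoE router selects expert
    context = search(request)           # 2. vector search for context
    outcome = execute(expert, context)  # 3. tool execution
    log({"request": request, "expert": expert, "outcome": outcome})  # 4. telemetry
    return outcome

telemetry = []
result = learning_loop(
    "scale deployment",
    router=lambda r: "infra-expert",
    search=lambda r: ["prior scaling runbook"],
    execute=lambda e, c: "scaled to 3 replicas",
    log=telemetry.append,
)
print(result)          # scaled to 3 replicas
print(len(telemetry))  # 1
```

Steps 5 through 7 (analyze, distill, graduate) run offline over the accumulated log rather than inside the request path.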
Graduation Criteria
A layer graduates to specialist status when:
- ✅ Handled 10,000+ real tasks
- ✅ Maintains 95%+ success rate
- ✅ Clear domain patterns emerge
- ✅ Specialized model outperforms general routing
- ✅ ROI justifies dedicated resources
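A minimal gate over those criteria could look like the following. The thresholds mirror the checklist above, but they are starting points to tune per domain, not magic numbers:

```python
def ready_to_graduate(tasks_handled: int,
                      success_rate: float,
                      patterns_clear: bool,
                      specialist_beats_general: bool,
                      roi_positive: bool) -> bool:
    """Gate distillation on all five criteria; thresholds are
    illustrative and should be tuned per domain."""
    return (tasks_handled >= 10_000
            and success_rate >= 0.95
            and patterns_clear
            and specialist_beats_general
            and roi_positive)

print(ready_to_graduate(12_500, 0.97, True, True, True))  # True
print(ready_to_graduate(8_000, 0.97, True, True, True))   # False: not enough reps
```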
Where Culture and Architecture Intersect
This approach mirrors organizational development:
Hiring Junior Talent
Traditional: Hire only senior specialists (expensive, limited pool)
Modern: Hire promising juniors, create growth environment, cultivate expertise
AI Deployment
Traditional: Deploy only specialized models (expensive, long training cycles)
Modern: Deploy general layers, capture operational data, distill specialization
The Pattern
Both recognize that expertise is emergent, not imported.
You can’t shortcut the learning process. You create conditions for excellence, then let the system (human or AI) prove itself through operation.
Real-World Applications
Infrastructure Layer
Deploy: General infrastructure agent with Kubernetes tools
Operate: Handle deployment requests, scaling, monitoring
Learn: Capture successful patterns, failure modes, optimization strategies
Distill: Train specialized infrastructure model
Graduate: Purpose-built infra specialist with institutional knowledge
Security Layer
Deploy: General security agent with scanning, monitoring, compliance tools
Operate: Handle security events, audit requests, vulnerability scans
Learn: Capture threat patterns, false positive signatures, remediation playbooks
Distill: Train specialized security model
Graduate: Expert security agent with domain-specific knowledge
Data Layer
Deploy: General data agent with query, ETL, analysis tools
Operate: Handle data requests, transformations, analysis tasks
Learn: Capture query patterns, optimization strategies, common transformations
Distill: Train specialized data model
Graduate: Expert data agent with database-specific optimizations
The Economics
Traditional Specialized Model
- Training: $50,000+ (data collection, labeling, training)
- Deployment: $500/month (always-on infrastructure)
- Maintenance: $10,000/year (retraining, updates)
- Risk: High (may not fit actual use cases)
Infrastructure as Training Data
- Training: $0 (learns from operation)
- Deployment: $50/month (scales to zero)
- Maintenance: $1,000/year (automated distillation)
- Risk: Low (proven through real usage)
10x cost reduction with lower risk and better fit.
Implementation Roadmap
Phase 1: Deploy General Layers (Week 1)
- Set up MoE routing infrastructure
- Deploy Qdrant for vector storage
- Configure MCP tool chains
- Implement telemetry capture
Phase 2: Operational Learning (Months 1-3)
- Route real workloads through layers
- Capture all routing decisions
- Log successful tool chains
- Monitor performance metrics
Phase 3: Pattern Analysis (Month 4)
- Analyze telemetry data
- Identify specialization opportunities
- Determine graduation readiness
- Design specialized model architecture
Phase 4: Distillation (Month 5)
- Prepare training datasets from telemetry
- Train specialized models (LoRA or full fine-tune)
- Validate against operational benchmarks
- A/B test general vs. specialized
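Preparing the training dataset from telemetry might look like this sketch: keep only successful runs and emit prompt/completion pairs that a LoRA fine-tune could consume. The record fields are assumptions carried over from the telemetry examples earlier, not a fixed schema.

```python
import json

def build_dataset(telemetry: list) -> list:
    """Turn captured telemetry into prompt/completion pairs for
    fine-tuning. Only successful runs become examples; failed runs
    are filtered out rather than learned from."""
    examples = []
    for rec in telemetry:
        if not rec["success"]:
            continue
        examples.append({
            "prompt": rec["request"],
            "completion": json.dumps({"expert": rec["expert"],
                                      "tools": rec["tools"]}),
        })
    return examples

telemetry = [
    {"request": "deploy api", "expert": "infra",
     "tools": ["kubectl_apply"], "success": True},
    {"request": "deploy api", "expert": "infra",
     "tools": ["kubectl_apply"], "success": False},
]
dataset = build_dataset(telemetry)
print(len(dataset))  # 1: the failed run is filtered out
```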
Phase 5: Graduation (Month 6)
- Deploy specialized models
- Transition workloads
- Maintain general fallback
- Continue learning loop
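Keeping the general layer as a fallback can be as simple as a confidence gate in front of the graduated specialist. The `(answer, confidence)` interface here is a hypothetical simplification of whatever your models actually return:

```python
def route_with_fallback(request, specialist, general, threshold=0.8):
    """Send traffic to the graduated specialist, but fall back to the
    general layer when the specialist is not confident enough."""
    answer, confidence = specialist(request)
    if confidence >= threshold:
        return answer, "specialist"
    return general(request)[0], "general-fallback"

# Toy models: the specialist only knows scaling tasks.
specialist = lambda r: ("scale to 3", 0.9) if "scale" in r else ("?", 0.2)
general = lambda r: ("handled generically", 1.0)

print(route_with_fallback("scale deployment", specialist, general))
print(route_with_fallback("unusual request", specialist, general))
```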
Key Insights
1. Expertise is Earned, Not Installed
You can’t shortcut the learning process. Whether it’s a human team member or an AI layer, real expertise comes from handling real work.
2. Scale-to-Zero Changes Everything
When infrastructure doesn’t cost money while idle, you can afford to deploy speculatively. Deploy first, validate through operation, specialize when proven.
3. Telemetry is Gold
Every routing decision, every successful tool chain, every pattern that emerges — this is training data you couldn’t buy. It’s specific to your domain, your workloads, your requirements.
4. Culture Eats Strategy
Steven’s insight about organizational culture applies to AI systems. You can have the best architecture plan, but if you don’t create an environment where learning happens naturally, you’ll fail.
5. Vision Provides Direction, System Provides Feedback
Set the direction (vision), create the conditions (culture/architecture), let the system teach itself through operation. Don’t micromanage the learning process.
Common Questions
“Why not just train a specialist from the start?”
Because you don’t know what specialist you need until you’ve operated in the domain. Training data from real workloads is more valuable than synthetic data from assumptions.
“Isn’t this slower than deploying a pre-trained model?”
Initially, yes. But the specialized model you graduate is better fitted to your actual needs, and you’ve eliminated the risk of training the wrong thing.
“What if the layer never accumulates enough data to graduate?”
Then you saved money by not training a specialist you didn’t need. Scale-to-zero means minimal waste.
“How do you prevent the layer from learning bad patterns?”
Same way you prevent junior team members from bad habits — supervision, code review, validation gates. The difference is AI can be supervised at scale.
“Does this work for all AI use cases?”
No. Some use cases need specialists immediately (safety-critical systems). This works best for operational domains where patterns emerge through use.
The Future
Imagine infrastructure where:
- Every deployment generates training data
- Every successful pattern gets codified automatically
- Every layer evolves based on what it actually does
- Specialization emerges organically from operation
- Costs align perfectly with value (scale-to-zero)
This isn’t science fiction. The technology exists today:
- ✅ MoE routing (Mixtral; GPT-4 is widely reported to use it)
- ✅ Vector databases (Qdrant, Pinecone, Weaviate)
- ✅ Tool frameworks (MCP, LangChain, AutoGen)
- ✅ Scale-to-zero (Kubernetes, Knative, serverless)
- ✅ Model distillation (LoRA, fine-tuning APIs)
What’s missing isn’t technology. What’s missing is the mindset shift.
The Mindset Shift
From: “Train then Deploy”
- Assume you know what you need
- Invest heavily upfront
- Deploy specialist
- Hope it works
- High risk, high cost
To: “Deploy then Train”
- Admit you don’t know yet
- Deploy general layer
- Learn from operation
- Distill specialization
- Low risk, low cost
From: “Always-On Infrastructure”
- Pay for idle capacity
- Fixed costs
- Over-provisioned
- Wasteful
To: “Scale-to-Zero Architecture”
- Pay only for use
- Variable costs
- Right-sized
- Efficient
From: “AI as Tool”
- AI does specific tasks
- Human orchestrates everything
- AI is static
To: “AI as Apprentice”
- AI learns domain
- AI improves over time
- AI graduates to specialist
- Human sets direction
Conclusion
Steven Dastoor taught me that great organizations don’t start with specialists. They create environments where expertise emerges naturally through operation.
The same principle applies to AI infrastructure.
Don’t train specialists and hope they fit your needs.
Deploy general layers, let them operate in your domain, capture their learning, and graduate them to specialists when they’ve proven themselves.
This is Infrastructure as Training Data.
Vision sets the direction. Culture creates the conditions. The system teaches itself.
Join the Conversation
Would love to hear from others thinking about where organizational culture and AI architecture intersect.
What patterns have you seen in your organization? How could they translate to AI systems?
Let’s build the future together.
Follow my work:
- LinkedIn: Ryan Dahlberg
- GitHub: Cortex Project
- Blog: More deep dives on AI orchestration
Special thanks to Steven Dastoor and the Tusker team for the conversation that inspired this post.
#AI #Leadership #Culture #Kubernetes #Infrastructure #MachineLearning #Vision #MLOps