From 80% Memory Panic to Optimized Excellence: Our K3s Cluster Transformation
The Crisis That Wasn’t
It started with a simple observation: all seven of our K3s cluster VMs were showing 80%+ RAM utilization in Proxmox. Red indicators everywhere. Time to add more RAM, right? Or maybe shut down some services? Perhaps even remove nodes?
Wrong.
What looked like a memory crisis turned out to be a masterclass in understanding Linux memory management, Kubernetes resource allocation, and the importance of measuring what actually matters.
This is the story of how we transformed our 7-node K3s cluster from perceived chaos to optimized excellence—and what we learned along the way.
The Investigation
The Alarming Numbers
Our Proxmox dashboard showed:
- All 7 VMs: 80-85% memory “used”
- Red warning indicators across the board
- Growing concern about cluster stability
- Questions about whether we needed more hardware
The natural instinct was to panic. More RAM? Fewer services? Remove nodes?
SSH to the Rescue
Instead of rushing to solutions, we SSH’d into each node and ran a simple command:
free -h
The revelation:
k3s-master01:
               total        used        free      shared  buff/cache   available
Mem:           5.8Gi       2.7Gi       304Mi       928Ki       3.1Gi       3.1Gi
Wait. 3.1 GB available? But Proxmox said we were at 80%!
The Truth About Linux Memory
Here’s what we discovered: Linux uses “free” RAM for disk caching. This is not waste—it’s brilliance.
The breakdown:
- Used by applications: 2.7 GB (47%)
- Used for cache/buffers: 3.1 GB (53%)
- Actually available: 3.1 GB (53%)
That cache is instantly reclaimable. When applications need memory, Linux frees the cache immediately. The system appears to use 80% RAM, but really has 50%+ available.
Monitoring tools lie when they show:
Used = Total - Free
The correct metric is:
Available = Free + Reclaimable Cache
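A quick way to pull the honest number from every node is a small SSH loop (a sketch; the hostnames are ours and it assumes key-based SSH access):
# Print the percentage of truly available memory on each node
for node in k3s-master01 k3s-master02 k3s-master03 k3s-worker01 k3s-worker02 k3s-worker03 k3s-worker04; do
  printf '%s: ' "$node"
  ssh "$node" free -m | awk '/^Mem:/ {printf "%d%% available\n", $7/$2*100}'
done
Column 7 of free -m is the "available" figure, exactly the number the Proxmox gauge ignores.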
Cluster-Wide Analysis
We SSH’d into all 7 nodes and found:
| Node | “Used” (Misleading) | Available (Truth) | Status |
|---|---|---|---|
| k3s-master01 | 82% | 53% | ✅ Healthy |
| k3s-master02 | 84% | 55% | ✅ Healthy |
| k3s-master03 | 73% | 76% | ✅ Outstanding |
| k3s-worker01 | 84% | 57% | ✅ Healthy |
| k3s-worker02 | 86% | 72% | ✅ Outstanding |
| k3s-worker03 | 88% | 45% | ✅ Working hard |
| k3s-worker04 | 85% | 53% | ✅ Healthy |
Cluster Average: 59% available memory
Conclusion: No crisis. No additional RAM needed. Just a misunderstanding of metrics.
The Real Issues We Found
While our cluster was healthier than we thought, the investigation revealed actual optimization opportunities:
Issue 1: Workload Imbalance
k3s-worker03 was carrying the heaviest load:
- Memory usage: 3.2 GB (55% of node)
- Top consumer: Prometheus (1444 Mi)
- Other heavy services: Multiple Longhorn components
Meanwhile, k3s-master03 was nearly idle:
- Memory usage: 1.4 GB (24% of node)
- Pods running: 1 (just fleet-controller)
- Available capacity: 4.4 GB (76%) wasted
- Why? Tainted with CriticalAddonsOnly=true:NoExecute
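Both the pod count and the taint are easy to confirm from kubectl:
# Show the taint that keeps regular workloads off master03
kubectl describe node k3s-master03 | grep -A 2 Taints
# Count the pods actually scheduled there
kubectl get pods -A --field-selector spec.nodeName=k3s-master03 -o name | wc -l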
Issue 2: Memory Limits Overcommitment
The Kubernetes scheduler saw this on k3s-worker03:
Memory Capacity: 5.9 GB
Memory Limits Allocated: 9.4 GB
Overcommitment: 159%
Translation: If every pod tried to use its full memory limit simultaneously, demand would reach 159% of the node's actual capacity. OOM killer chaos would ensue.
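These numbers come straight out of kubectl: the "Allocated resources" section of a node description reports requests and limits as a percentage of allocatable capacity.
# Look for the "Allocated resources" table and its Limits column
kubectl describe node k3s-worker03 | grep -A 9 "Allocated resources"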
Issue 3: Unbounded Memory Growth
Major services had no memory limits:
- Prometheus: 1444 Mi (unlimited)
- Wazuh Indexer: 894 Mi (unlimited)
- Rancher: 751 Mi (unlimited)
- Grafana: 671 Mi (unlimited)
- Longhorn Managers: ~175 Mi each × 4 (unlimited)
Without limits, these could consume all available memory during traffic spikes.
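Finding every offender is a one-liner if you have jq on hand (a sketch; it prints namespace/pod/container for each container with no memory limit set):
kubectl get pods -A -o json | jq -r '.items[] | .metadata.namespace as $ns | .metadata.name as $pod | .spec.containers[] | select(.resources.limits.memory == null) | "\($ns)/\($pod)/\(.name)"'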
The Optimization Strategy
We developed a comprehensive, multi-phase approach:
Phase 1: Workload Redistribution
Objective: Move heavy workloads from overloaded worker03 to underutilized nodes.
Key Action: Prometheus Migration
Prometheus was consuming 1444 Mi on the busiest node. We needed to:
- Add memory limits (1Gi request / 2Gi limit)
- Reduce retention (10 days → 7 days)
- Move to a less-loaded node
Solution:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-kube-prometheus-stack-prometheus
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: kubernetes.io/hostname
                operator: NotIn
                values:
                - k3s-worker03  # Avoid overloaded node
      containers:
      - name: prometheus
        resources:
          requests:
            memory: 1Gi
            cpu: 100m
          limits:
            memory: 2Gi
            cpu: 500m
Result:
- Prometheus memory: 1444 Mi → 466 Mi (68% reduction)
- Location: Moved to k3s-master03
- Worker03 freed: 1.4 GB of memory
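The result is easy to verify after the rollout. K3s ships metrics-server by default, so kubectl top works out of the box (assuming the stack runs in the monitoring namespace with a single Prometheus replica):
# Which node did the Prometheus pod land on, and what does it consume now?
kubectl -n monitoring get pod prometheus-kube-prometheus-stack-prometheus-0 -o wide
kubectl -n monitoring top pod prometheus-kube-prometheus-stack-prometheus-0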
Phase 2: The k3s-master03 Decision
The Question: Should we remove k3s-master03 to free up resources?
The Temptation:
- 6 GB RAM back to Proxmox host
- One less VM to maintain
- It’s barely being used anyway (24% memory)
The Problem: Kubernetes HA architecture requires an odd number of master nodes.
Understanding Etcd Quorum
Kubernetes uses etcd for distributed consensus. Etcd requires a majority (quorum) to function:
With 3 masters (current):
Quorum needed: 2 out of 3
Fault tolerance: Can lose 1 master
Failure scenario: master01 dies → Cluster stays UP ✅ (master02 + master03)
With 2 masters (if we removed master03):
Quorum needed: 2 out of 2
Fault tolerance: Cannot lose ANY master
Failure scenario: master01 dies → Cluster goes DOWN ❌ (only master02)
With 1 master:
No HA - Single point of failure
Cluster dies if master fails
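With K3s and embedded etcd, quorum health is easy to spot-check from kubectl (a quick sketch; the /readyz/etcd probe is available on recent Kubernetes releases):
# All three masters should carry the etcd role label
kubectl get nodes -l node-role.kubernetes.io/etcd=true
# The API server's readiness endpoint includes an etcd health check
kubectl get --raw='/readyz/etcd'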
Why 2 Masters is Worse Than 1
Two masters creates a split-brain risk:
Network partition splits cluster:
Side A: master01 (1/2 quorum) ❌
Side B: master02 (1/2 quorum) ❌
Result: Neither side can reach quorum, so etcd stops accepting writes and the entire control plane freezes
With 1 master, you at least know you have no HA. With 2, you think you have HA but actually have a ticking time bomb.
Our Decision: Keep All 3 Masters ✅
But remove the taint to utilize master03’s capacity.
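Removing the taint is a one-liner; the trailing dash means "remove":
# Allow regular workloads to schedule onto master03 again
kubectl taint nodes k3s-master03 CriticalAddonsOnly=true:NoExecute-
# Confirm it is gone
kubectl describe node k3s-master03 | grep Taints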
Phase 3: Implement Resource Governance
Created LimitRanges for 6 key namespaces:
apiVersion: v1
kind: LimitRange
metadata:
  name: monitoring-limits
  namespace: monitoring
spec:
  limits:
  - max:
      memory: 4Gi
      cpu: 2000m
    min:
      memory: 64Mi
      cpu: 100m
    default:
      memory: 512Mi
      cpu: 500m
    defaultRequest:
      memory: 256Mi
      cpu: 250m
    type: Container
Coverage:
- monitoring
- longhorn-system
- wazuh-security
- cortex-system
- cattle-system
- monitoring-exporters
Impact:
- All new pods get sensible defaults
- Prevents unbounded memory growth
- Overcommitment reduced from 159% → ~100%
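A quick check that the defaults landed in all six namespaces:
# One LimitRange per governed namespace
kubectl get limitrange -A
# Inspect the effective defaults for one of them
kubectl describe limitrange monitoring-limits -n monitoring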
Phase 4: Deploy Vertical Pod Autoscaler (VPA)
Future-proofing with automation, using the Fairwinds VPA chart (their Helm repo needs to be added first):
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa --namespace vpa-system --create-namespace
Created VPA objects for 6 critical workloads:
- Prometheus
- Grafana
- Rancher
- Wazuh Indexer
- Wazuh Manager
- Wazuh Dashboard
Mode: Recommendation-only for first week
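The VPA objects themselves are tiny. Here is a sketch of the Grafana one in recommendation-only mode (the Deployment name assumes the standard kube-prometheus-stack release name; adjust to yours):
# updateMode "Off" = observe and recommend, never evict or resize
kubectl apply -f - <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: grafana-vpa
  namespace: monitoring
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kube-prometheus-stack-grafana   # assumed release/Deployment name
  updatePolicy:
    updateMode: "Off"
EOF
# Read the recommendations once some usage history exists
kubectl -n monitoring describe vpa grafana-vpa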
Benefits:
- Automatic right-sizing based on actual usage
- Prevents over-allocation
- Adapts to changing workload patterns
- Reduces manual tuning effort
The Results
Cluster-Wide Improvements
Memory Utilization (Actual):
| Node | Before | After | Change |
|---|---|---|---|
| k3s-worker03 | 52% (overloaded) | 29% (balanced) | -44% 🎯 |
| k3s-master03 | 24% (wasted) | 51% (utilized) | +113% 🎯 |
| k3s-worker01 | 42% | 54% | +29% |
| k3s-worker02 | 31% | 54% | +74% |
| k3s-worker04 | 48% | 31% | -35% |
Cluster Balance:
- Before: One node at 52%, one at 24% (28-point spread)
- After: All nodes 29-54% (25-point spread, better distributed)
Overcommitment:
- Before: k3s-worker03 at 159%
- After: All nodes <100%
Service-Level Improvements
Prometheus:
- Memory: 1444 Mi → 466 Mi (-68%)
- Retention: 10 days → 7 days
- Limits: None → 1Gi/2Gi
- Location: worker03 → master03
- Status: Stable and healthy
Rancher:
- Memory: 751 Mi → 512 Mi (-32%)
- Limits: None → 512Mi/1Gi
- Status: No performance degradation
Longhorn:
- Per-manager: 175 Mi → 128 Mi (-27%)
- Total (5 instances): 875 Mi → 640 Mi
- Limits: None → 128Mi/256Mi
- Status: Storage performance unaffected
Lessons Learned
1. Measure What Matters
Wrong Metric:
Memory Used = Total - Free
Shows: 80% (panic!)
Right Metric:
Memory Available = Free + Reclaimable Cache
Shows: 59% (healthy!)
Lesson: Understand what your monitoring tools are actually measuring. “Used” memory includes beneficial caching.
2. Linux Memory is Smart
Linux doesn’t waste RAM. It uses “free” memory for caching to improve performance. This cache is:
- Instantly reclaimable
- Improves disk I/O performance
- Transparent to applications
- A feature, not a bug
Lesson: High “used” memory is often a sign of a well-tuned system, not a problem.
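If you want to prove this to yourself, you can ask the kernel to drop its caches (purely a demonstration; never needed in normal operation, and disk I/O will be slower until the cache warms back up):
# Before: a large buff/cache column, a small "free"
free -h
# Drop clean page cache, dentries and inodes
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
# After: buff/cache shrinks, "free" grows, "available" barely moves
free -h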
3. High Availability Isn’t Optional
The temptation to remove master03 and reclaim 6 GB RAM was strong. But:
- Saving 6 GB isn’t worth cluster fragility
- 2 masters = split-brain risk
- Can’t do zero-downtime upgrades
- One failure = total cluster outage
Lesson: HA costs resources but saves the business when a node inevitably fails. Choose wisely.
4. Overcommitment is Dangerous
Kubernetes lets you allocate more limits than capacity exists. This works until it doesn’t:
- 159% limits allocated
- One spike = OOM killer rampage
- Unpredictable pod evictions
- Service disruptions
Lesson: Memory limits should sum to ≤100% of node capacity.
5. Automation > Manual Tuning
Manual right-sizing is:
- Time-consuming
- Error-prone
- Becomes stale as workloads change
- Requires constant attention
VPA automation:
- Learns from actual usage
- Adjusts continuously
- Scales with cluster growth
- Frees up engineering time
Lesson: Invest in automation for long-term efficiency.
Best Practices Established
Resource Management
✅ Always set memory requests and limits
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
✅ Use LimitRanges for defaults
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container
High Availability
✅ Always use odd number of masters (3, 5, 7)
✅ Never use 2 masters (worse than 1)
✅ Maintain quorum requirements
- 3 masters = tolerate 1 failure
- 5 masters = tolerate 2 failures
Monitoring & Alerting
✅ Monitor available memory, not used
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
✅ Alert on actual pressure, not cache
- alert: RealMemoryPressure
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.20
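With kube-prometheus-stack this drops straight into a PrometheusRule object; a minimal sketch (the release label and the 10-minute hold time are our choices, not requirements):
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: real-memory-pressure
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed Helm release; must match the chart's ruleSelector
spec:
  groups:
  - name: node-memory
    rules:
    - alert: RealMemoryPressure
      expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.20
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Less than 20% of memory is actually available on {{ $labels.instance }}"
EOF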
The Final Architecture
Cluster Configuration
Nodes: 7 total
- 3 masters (control plane + workloads)
- 4 workers (dedicated workloads)
Memory Distribution:
Total Capacity: 42 GB (7 × 6 GB)
Used by Apps: 17 GB (40%)
Used for Cache: 14 GB (33%)
Available: 25 GB (59%)
Overcommitment: <100% on all nodes
High Availability:
- 3-master control plane
- Etcd quorum: 2/3
- Fault tolerance: 1 master failure
- Zero-downtime upgrades: ✅
Resource Governance:
- LimitRanges: 6 namespaces
- Memory limits: All major workloads
- VPA monitoring: 6 critical services
- Overcommitment eliminated: ✅
Conclusion: From Panic to Excellence
What started as an apparent crisis—80% memory usage across all nodes—turned into a comprehensive optimization journey that taught us:
- Monitoring matters: Measure what matters (available, not used)
- Linux is smart: Cache is a feature, not a problem
- HA requires investment: 3 masters cost resources but save the business when one fails
- Limits prevent chaos: Unbounded growth = eventual disaster
- Automation scales: VPA > manual tuning
- Investigation > assumption: SSH revealed the truth
Our 7-node K3s cluster went from:
- ❌ Perceived crisis → ✅ Actual health
- ❌ Workload imbalance → ✅ Even distribution
- ❌ Dangerous overcommitment → ✅ Safe limits
- ❌ Wasted capacity → ✅ Efficient utilization
- ❌ Manual management → ✅ Automated optimization
The Numbers:
- Worker03: 52% → 29% memory (freed 1.4 GB)
- Master03: 24% → 51% memory (utilized 1.6 GB)
- Prometheus: 1444 Mi → 466 Mi (68% reduction)
- Overcommitment: 159% → <100% (eliminated risk)
- HA architecture: Preserved (3 masters)
- Total cost: $0 (pure optimization)
The Outcome: A production-ready, highly available, efficiently optimized Kubernetes cluster that’s ready to scale with our needs—without adding a single GB of RAM.
Sometimes the best optimization is understanding what you already have.
Cluster: 7-node K3s cluster (3 masters, 4 workers)
Duration: 1-week optimization project
Status: Production, Optimized, Highly Available
Cost Savings: $0 spent on hardware upgrades