HOMELAB-547: fix(monitoring): adjust memory headroom alert thresholds for homelab #145

Open
aaron wants to merge 1 commit from plane/HOMELAB-547-memory-headroom-alerts into live

Summary

  • Adjust the cluster memory headroom alert thresholds to reduce false alarms in the homelab environment
  • Warning threshold: 4 GiB → 2 GiB
  • Critical threshold: 2 GiB → 1 GiB
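The threshold change can be sketched as a small bash check. This is a hedged sketch, not the actual Prometheus rule: the rule expression is not shown in this PR, and the `severity` helper is hypothetical. It assumes an alert fires when free cluster memory (headroom) drops strictly below the threshold, and compares the old and new thresholds for a few sample headroom values.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the real alert lives in a Prometheus rule set via Helm
# values, whose exact expression is not part of this PR. Assumes an alert
# fires when headroom is strictly below the threshold.
set -euo pipefail

GIB=$((1024 * 1024 * 1024))   # 1 GiB = 1073741824 bytes

# Thresholds from this PR, in bytes.
OLD_WARN=$((4 * GIB)); OLD_CRIT=$((2 * GIB))
NEW_WARN=$((2 * GIB)); NEW_CRIT=$((1 * GIB))

# severity HEADROOM_BYTES WARN_BYTES CRIT_BYTES -> prints ok|warning|critical
severity() {
  local headroom=$1 warn=$2 crit=$3
  if (( headroom < crit )); then
    echo critical
  elif (( headroom < warn )); then
    echo warning
  else
    echo ok
  fi
}

# Compare old vs new behaviour for a few sample headroom values (in MiB).
for mib in 512 1536 3072 5120; do
  bytes=$((mib * 1024 * 1024))
  printf '%5d MiB: old=%-8s new=%s\n' "$mib" \
    "$(severity "$bytes" "$OLD_WARN" "$OLD_CRIT")" \
    "$(severity "$bytes" "$NEW_WARN" "$NEW_CRIT")"
done
```

Under these assumptions, 3 GiB of headroom moves from warning to ok, and 1.5 GiB moves from critical to warning, while 512 MiB is critical under both sets of thresholds.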

Root Cause

The previous thresholds were too conservative for the resource-constrained homelab environment and caused alert fatigue. The homelab normally runs at high memory utilization (80-90%+), and the recent memory-pressure fixes for several services (see Recent Context) point the same way.
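Because the thresholds are absolute (GiB of headroom), the utilization at which they fire depends on cluster size. The sketch below converts each threshold into that utilization. The 16 GiB total is a hypothetical value chosen for illustration (the PR does not state the homelab's allocatable memory), and `trigger_bp`/`fmt_pct` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: convert an absolute headroom threshold into the
# cluster-wide utilization above which it fires. TOTAL_MIB is an assumed
# cluster size, not a value from this PR. Assumes "headroom < threshold".
set -euo pipefail

TOTAL_MIB=$((16 * 1024))   # hypothetical cluster: 16 GiB allocatable

# trigger_bp TOTAL_MIB THRESHOLD_MIB -> utilization in basis points (1/100 %)
# above which free memory falls below THRESHOLD_MIB.
trigger_bp() {
  local total=$1 threshold=$2
  echo $(( (total - threshold) * 10000 / total ))
}

# Print a basis-point value as a percentage with two decimals.
fmt_pct() {
  printf '%d.%02d%%' $(( $1 / 100 )) $(( $1 % 100 ))
}

for entry in "old warning:4096" "old critical:2048" "new warning:2048" "new critical:1024"; do
  label=${entry%%:*}   # text before the colon
  mib=${entry##*:}     # threshold in MiB after the colon
  printf '%-13s %5d MiB -> fires above %s utilization\n' \
    "$label" "$mib" "$(fmt_pct "$(trigger_bp "$TOTAL_MIB" "$mib")")"
done
```

On a 16 GiB cluster, the old 4 GiB warning fires above 75% utilization, below the 80-90% range the homelab normally runs at, while the new 2 GiB warning fires above 87.5% and the new 1 GiB critical above 93.75%. These crossover points shift with cluster size, so the real allocatable total should be substituted.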

Recent Context

  • HOMELAB-545: Fixed Reflector (91% utilization, 76 OOMKills) and Langfuse Zookeeper (93% utilization)
  • HOMELAB-542: Fixed ArgoCD OOM issues with increased limits
  • HOMELAB-544: Fixed kube-state-metrics restart loops

Test Plan

  • [x] Verify Prometheus rule syntax is valid (YAML linting)
  • [ ] Deploy to cluster and verify alerts update correctly
  • [ ] Monitor for 24-48h to confirm reduced false alarms
  • [ ] Validate critical threshold still triggers appropriately under genuine memory pressure

Impact

Self-merge eligible: simple configuration change (Helm values adjustment)

  • Reduces alert fatigue from false positives
  • Better aligns thresholds with homelab operational reality
  • Still provides safety margin with 1 GiB critical threshold

🤖 Generated with Claude Code

HOMELAB-547: fix(monitoring): adjust memory headroom alert thresholds for homelab
Some checks failed
0/0 projects applied successfully.
CI Review / ai-review (pull_request) Has been cancelled
CI Review / helm-validate (pull_request) Has been cancelled
CI Review / pr-title (pull_request) Has been cancelled
Lint & Validate / yaml-lint (pull_request) Has been cancelled
Lint & Validate / terraform-validate (pull_request) Has been cancelled
Lint & Validate / shellcheck (pull_request) Has been cancelled
60b7c1db24
Reduce false alarms by adjusting cluster memory headroom thresholds:
- Warning: 4 GiB → 2 GiB
- Critical: 2 GiB → 1 GiB

Root cause: the previous thresholds were too conservative for the
resource-constrained homelab environment, causing alert fatigue while
the cluster runs normally at higher utilization.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
This pull request can be merged automatically.
This branch is out-of-date with the base branch

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin plane/HOMELAB-547-memory-headroom-alerts:plane/HOMELAB-547-memory-headroom-alerts
git switch plane/HOMELAB-547-memory-headroom-alerts
aaron/infra-core!145