HOMELAB-549: fix(longhorn): reduce replica rebuild pressure to fix volume degradation alerts #139

Open
aaron wants to merge 1 commit from plane/HOMELAB-549-fix-longhorn-alert into live

Summary

  • Reduce concurrentReplicaRebuildPerNodeLimit from 5 to 2
  • Increase storageMinimalAvailablePercentage from 1% to 10%

Fixes recurring LonghornVolumesDegraded alerts caused by node resource exhaustion. Investigation showed that the alerts correlate with CPU overcommit, CPU throttling, and OOM events.
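For reference, the two changes listed above might look roughly like this in core/charts/platform/longhorn/values.yaml. This is a sketch, not the actual diff: it assumes the settings sit under the upstream Longhorn chart's `defaultSettings` key, and a wrapper chart may nest them under a subchart key instead.

```yaml
# Hypothetical excerpt of core/charts/platform/longhorn/values.yaml.
# Assumption: settings are placed under the upstream chart's `defaultSettings`
# key; if this chart wraps Longhorn as a dependency, the block may be nested
# one level deeper under the subchart's name.
defaultSettings:
  # Fewer simultaneous replica rebuilds per node (previously 5).
  concurrentReplicaRebuildPerNodeLimit: 2
  # Minimum free disk space, in percent, that must remain available (previously 1).
  storageMinimalAvailablePercentage: 10
```

Depending on the Longhorn version, chart defaults may only take effect at install time, so it may be worth confirming the live setting values after deployment.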

Test plan

  • [x] Configuration changes applied
  • [x] Pre-commit checks passed
  • [ ] Monitor alert status after deployment
  • [ ] Verify volume degradation count decreases

Closes: HOMELAB-549

🤖 Generated with Claude Code

HOMELAB-549: fix(longhorn): reduce replica rebuild pressure to fix volume degradation alerts
Some checks failed
0/0 projects applied successfully.
CI Review / ai-review (pull_request) Has been cancelled
CI Review / helm-validate (pull_request) Has been cancelled
CI Review / pr-title (pull_request) Has been cancelled
Lint & Validate / shellcheck (pull_request) Has been cancelled
Lint & Validate / yaml-lint (pull_request) Has been cancelled
Lint & Validate / terraform-validate (pull_request) Has been cancelled
0bc362b355
- Reduce concurrentReplicaRebuildPerNodeLimit from 5 to 2
- Increase storageMinimalAvailablePercentage from 1% to 10%

Fixes recurring LonghornVolumesDegraded alerts caused by node resource exhaustion.
Investigation showed correlation with CPU overcommit, throttling, and OOM events.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
This pull request has changes conflicting with the target branch.
  • core/charts/platform/longhorn/values.yaml

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin plane/HOMELAB-549-fix-longhorn-alert:plane/HOMELAB-549-fix-longhorn-alert
git switch plane/HOMELAB-549-fix-longhorn-alert
Reference
aaron/infra-core!139