HOMELAB-280: feat(modules): add per-worker overrides and GPU passthrough #43

Merged
aaron merged 11 commits from plane/HOMELAB-280-add-worker-node-03 into main 2026-03-23 18:59:06 +00:00
Owner

Summary

  • Add per-worker cpu_cores, memory_mb, boot_disk_gb, data_disk_gb, gpu_pci_id overrides to talos-cluster module
  • Add GPU PCI passthrough support to proxmox-vm module (dynamic hostpci block, q35 machine type)
  • Fixes Terraform state drift for existing worker VMs

Test plan

  • [x] terraform plan shows no destructive changes to existing VMs
  • [x] terraform apply creates prod-wk-03 successfully
  • [x] Node joins cluster as Ready
HOMELAB-272: feat(eso): add External Secrets Operator deployment and ExternalSecret manifests
Some checks failed
CI Review / pr-title (pull_request) Successful in 0s
0/0 projects applied successfully.
CI Review / helm-validate (pull_request) Failing after 2s
CI Review / ai-review (pull_request) Failing after 1s
Lint & Validate / terraform-validate (pull_request) Failing after 1s
Lint & Validate / yaml-lint (pull_request) Failing after 1s
Lint & Validate / shellcheck (pull_request) Failing after 1s
17c38d0328
Deploy ESO via ArgoCD with the Kubernetes provider. Source secrets live in the
"secrets" namespace; ESO syncs them to app namespaces via ExternalSecret CRDs.
This replaces the per-app Terraform kubernetes_secret resources.

- ESO Helm chart values (v2.2.0) with resource limits and ServiceMonitor
- ClusterSecretStore pointing to "secrets" namespace
- ServiceAccount + RBAC for store reader
- ExternalSecret manifests for all 13 app secrets across 9 namespaces
- Secrets namespace manifest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOMELAB-272: fix(eso): use v1 API version for ESO CRDs (not v1beta1)
Some checks failed
0/0 projects applied successfully.
CI Review / pr-title (pull_request) Successful in 0s
CI Review / helm-validate (pull_request) Failing after 2s
CI Review / ai-review (pull_request) Failing after 1s
Lint & Validate / terraform-validate (pull_request) Failing after 1s
Lint & Validate / yaml-lint (pull_request) Failing after 1s
Lint & Validate / shellcheck (pull_request) Failing after 1s
e595b5e98c
ESO v2.2.0 ships v1 CRDs. v1beta1 is no longer available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOMELAB-272: feat(eso): add ExternalSecrets for atlantis and dev-agents
Some checks failed
CI Review / pr-title (pull_request) Successful in 0s
0/0 projects applied successfully.
CI Review / helm-validate (pull_request) Failing after 1s
CI Review / ai-review (pull_request) Failing after 1s
Lint & Validate / terraform-validate (pull_request) Failing after 1s
Lint & Validate / yaml-lint (pull_request) Failing after 1s
Lint & Validate / shellcheck (pull_request) Failing after 1s
139ceaf5d5
- Atlantis VCS credentials synced from secrets namespace
- All 6 dev-agent secrets synced via ESO (replacing manual script)
- Uses dataFrom/extract for dev-agent secrets (binary data like SSH keys)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HOMELAB-254: feat(helm): add deployment, PVC, ServiceAccount, LimitRange templates
Some checks failed
0/0 projects applied successfully.
CI Review / pr-title (pull_request) Successful in 0s
CI Review / helm-validate (pull_request) Failing after 2s
CI Review / ai-review (pull_request) Failing after 1s
Lint & Validate / terraform-validate (pull_request) Failing after 1s
Lint & Validate / yaml-lint (pull_request) Failing after 1s
Lint & Validate / shellcheck (pull_request) Failing after 1s
4b6e7d49c9
HOMELAB-280: feat(modules): add per-worker overrides and GPU passthrough
Some checks failed
Release / release (pull_request) Failing after 2s
Plan failed.
b3bf989620
- talos-cluster: workers now support per-node cpu_cores, memory_mb,
  boot_disk_gb, data_disk_gb, and gpu_pci_id overrides (falling back
  to global defaults via coalesce)
- proxmox-vm: add GPU PCI passthrough support with dynamic hostpci
  block and automatic q35 machine type when GPU is present
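
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes the bpg/proxmox provider and Terraform 1.3 or later, collapses both modules into one file for brevity, and uses hypothetical names throughout (`worker_defaults`, `workers`, `proxmox_node`, `resolved_workers`).

```hcl
terraform {
  required_version = ">= 1.3" # optional() object attributes need 1.3+
  required_providers {
    proxmox = {
      source = "bpg/proxmox" # assumption: the repo may use a different provider
    }
  }
}

# Hypothetical inputs; the real module variables may be shaped differently.
variable "proxmox_node" {
  type = string
}

variable "worker_defaults" {
  type = object({
    cpu_cores    = number
    memory_mb    = number
    boot_disk_gb = number
    data_disk_gb = number
  })
}

variable "workers" {
  type = map(object({
    cpu_cores    = optional(number)
    memory_mb    = optional(number)
    boot_disk_gb = optional(number)
    data_disk_gb = optional(number)
    gpu_pci_id   = optional(string) # null means no GPU passthrough
  }))
}

locals {
  # coalesce() returns the first argument that is neither null nor an
  # empty string, so an unset override falls back to the global default.
  resolved_workers = {
    for name, w in var.workers : name => {
      cpu_cores    = coalesce(w.cpu_cores, var.worker_defaults.cpu_cores)
      memory_mb    = coalesce(w.memory_mb, var.worker_defaults.memory_mb)
      boot_disk_gb = coalesce(w.boot_disk_gb, var.worker_defaults.boot_disk_gb)
      data_disk_gb = coalesce(w.data_disk_gb, var.worker_defaults.data_disk_gb)
      gpu_pci_id   = w.gpu_pci_id
    }
  }
}

resource "proxmox_virtual_environment_vm" "worker" {
  for_each  = local.resolved_workers
  name      = each.key
  node_name = var.proxmox_node

  # Switch to q35 only when a GPU is attached; null keeps the provider default.
  machine = each.value.gpu_pci_id != null ? "q35" : null

  cpu {
    cores = each.value.cpu_cores
  }

  memory {
    dedicated = each.value.memory_mb
  }

  # Emits one hostpci block when gpu_pci_id is set, and none otherwise.
  dynamic "hostpci" {
    for_each = each.value.gpu_pci_id != null ? [each.value.gpu_pci_id] : []
    content {
      device = "hostpci0"
      id     = hostpci.value
      pcie   = true
    }
  }
}
```

Under these assumptions, a worker entry that sets only `memory_mb = 24576` and a `gpu_pci_id` would keep the default CPU and disk sizes while getting 24 GB of RAM, the q35 machine type, and a single passthrough device. Disk blocks are omitted here to keep the sketch short.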

Fixes Terraform state drift where manually-upgraded VMs (24GB RAM,
100GB boot disk, GPU passthrough) didn't match tfvars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aaron merged commit bce858cc22 into main 2026-03-23 18:59:06 +00:00
Collaborator

Plan Error

running git clone --depth=1 --branch plane/HOMELAB-280-add-worker-node-03 --single-branch https://atlantis:<redacted>@forgejo.aaron.reynoza.org/aaron/infra-core.git /atlantis-data/repos/aaron/infra-core/43/default: Cloning into '/atlantis-data/repos/aaron/infra-core/43/default'...
error: could not lock config file /atlantis-data/repos/aaron/infra-core/43/default/.git/config: Read-only file system
fatal: could not set 'core.logallrefupdates' to 'true'
: exit status 128
Reference
aaron/infra-core!43