Commit Graph

269 Commits

Author SHA1 Message Date
4609ec5e89 Update getting-started.md with manual changes
- Add monitoring services section with Dockge and Uptime Kuma
- Simplify stack removal instructions in service management section
- Streamline documentation structure
2026-01-15 20:11:41 -05:00
3d07ddac26 Add comprehensive Docker stack management documentation
- Document safe stack removal process with proper cleanup steps
- Explain consequences of just deleting folders without stopping containers
- Add restoration instructions for accidentally removed stacks
- Include warnings about data loss and dependency checking
2026-01-15 19:57:41 -05:00
52e3d6e2af Add comprehensive SSL certificate documentation to getting-started.md
- Explain Let's Encrypt + DuckDNS integration
- Document staging vs production certificate servers
- Add troubleshooting guide for certificate issues
- Include best practices and validation commands
- Cover wildcard certificates and DNS challenge process
2026-01-15 19:34:58 -05:00
75906bc043 Add Let's Encrypt staging configuration for testing environments
- Include commented staging caServer in config template
- Add troubleshooting section for test environment certificate conflicts
- Document rate limit avoidance strategies for development/testing
2026-01-15 19:24:06 -05:00
8894d05f3b Update Traefik config template and docs to reflect working SSL certificate setup
- Remove explicit DNS resolvers from dnsChallenge to fix propagation check failures
- Add note about resolvers causing issues with DuckDNS TXT record resolution
- Preserve knowledge from certificate debugging session
2026-01-15 19:20:26 -05:00
87790c45ef Update Uptime Kuma installation process and homepage configuration 2026-01-15 16:06:56 -05:00
ac27a073e3 Fix setup script to ensure Docker Compose is installed 2026-01-15 15:28:06 -05:00
ab43dc79d5 Update docker-compose files from test system changes 2026-01-15 15:09:54 -05:00
0cf25e09a1 Update repo with latest changes 2026-01-15 14:46:49 -05:00
ef5868b481 Update deploy-homelab.sh script 2026-01-15 03:32:21 -05:00
1086357acc Refresh all documentation for current stack, workflows, and AI management 2026-01-15 03:24:10 -05:00
5e1f7d25bf Update homepage config with improved layout and service organization 2026-01-15 03:17:52 -05:00
29ce66aeca Fix: Add Docker socket permissions for Homepage container status
- Added user field with DOCKER_GID to allow homepage to read Docker socket
- Ensures container status monitoring works properly
- DOCKER_GID defaults to 999, should be set to actual docker group ID in .env
2026-01-15 01:59:36 -05:00
0b90bce7d0 Fix: Update vaultwarden subdomain from bitwarden to vault
- Changed Traefik routing to use vault. instead of bitwarden.
- Matches homepage dashboard configuration
- Ensures consistent URL naming across services
2026-01-15 01:19:00 -05:00
f95275d5c0 Refactor: Create downloaders stack for VPN-routed services
- Created new downloaders stack with Gluetun + qBittorrent unified
- Moved Gluetun from core stack to downloaders stack
- Moved qBittorrent from media-management to downloaders stack
- Uses network_mode: service:gluetun for better maintainability
- Eliminates cross-stack container ID dependencies
- Both services now start/stop together as a logical unit
2026-01-15 00:53:53 -05:00
14421a8a9e Fix Traefik routing for qbittorrent and vaultwarden
- Add tls=true label to vaultwarden for HTTPS routing
- Add Traefik routing labels to Gluetun for qbittorrent access
- Move qbittorrent service to media-management stack (proper location)
- Update copilot-instructions.md with project-specific architecture details
- Clean up outdated gluetun.yml references in media.yml template

Both services now accessible via HTTPS with proper SSL certificates.
2026-01-15 00:25:32 -05:00
adb894d35e Round 10: Add Traefik routing to monitoring services
- Added Traefik labels and routing to prometheus, grafana, loki, cadvisor
- Fixed Grafana ROOT_URL to use domain-based URL (https://grafana.${DOMAIN})
- Added uptime-kuma bypass rule in Authelia (needs initial setup)
- Updated all services to use traefik-network
- Synced domain from kelin-hass to kelin-casa across all configs
- Fixed missing tls=true label on uptime-kuma
- Note: Loki is API-only service (no web UI, accessed via Grafana)
2026-01-14 23:08:37 -05:00
258e8eec94 Refactor scripts for improved maintainability
- setup-homelab.sh: Fixed syntax errors, placeholder detection, and hardcoded paths
- deploy-homelab.sh: Refactored from inline code to function-based structure
- Both scripts now use consistent function organization for better readability
- Enhanced credential handling and error checking
- All scripts validated for syntax correctness
2026-01-14 18:10:23 -05:00
650700ed0a Re-enable Watchtower with correct Docker API version
Fixes:
- docker-compose/infrastructure.yml:
  - Uncommented Watchtower service
  - Updated image from 1.7.1 to latest
  - Changed DOCKER_API_VERSION from 1.44 to 1.52 (current Docker version)
  - Added default empty value for WATCHTOWER_NOTIFICATION_URL

- scripts/deploy-homelab.sh:
  - Removed "temporarily disabled" note
  - Added Watchtower to infrastructure stack list

- docs/services-overview.md:
  - Updated infrastructure stack count from 7 to 8
  - Added Watchtower to service list

Watchtower now runs successfully with scheduled updates at 4 AM daily
2026-01-14 02:25:20 -05:00
3e53cc3225 Remove automatic deployment prompt from setup script
Changes:
- scripts/setup-homelab.sh: Remove interactive deployment prompt
  - Users must now run deploy script manually
  - Simplifies both scripts (no sudo workarounds needed)
  - Clearer two-step process: setup then deploy

- Documentation updates:
  - README.md: Updated step 3-4 with manual deployment
  - docs/getting-started.md: Removed step 6 (log out), clarified steps
  - docs/manual-setup.md: Added sudo to deploy command
  - docs/troubleshooting/COMMON-ISSUES.md: Added sudo to all deploy commands

Rationale:
- Automatic deployment via 'su -' cannot work with sudo requirement
- Manual two-step process is clearer and more reliable
- Setup focuses on configuration, deploy focuses on services
2026-01-14 02:04:56 -05:00
ff454d35c6 Update deploy script to use media-management.yml instead of media-extended.yml 2026-01-14 01:41:23 -05:00
f79a2ab6f1 Fix infrastructure.yml networks section
- Added missing 'networks:' header
- Changed homelab-network and dockerproxy-network to external: true
- Added missing traefik-network definition
- Removed incorrect 'driver: bridge' declarations
2026-01-14 01:39:59 -05:00
4c6ed1c6f4 Clear completed tasks 2026-01-14 01:32:33 -05:00
aa3f927b2c Reorganize docker-compose stacks for better service grouping
Stack changes:
- Renamed media-extended.yml → media-management.yml (better clarity)
- Moved Plex from media → alternatives (Jellyfin is primary)
- Moved code-server from utilities → infrastructure
- Moved Sonarr, Radarr, Prowlarr from media → media-management
- Moved Calibre-web from media-management → media

New stack organization:
- media.yml (3): Jellyfin, Calibre-web, qBittorrent
- media-management.yml (13): All *arr apps, transcoders
- alternatives.yml (6): Plex, Portainer, Authentik
- infrastructure.yml (7): Added code-server
- utilities.yml (6): Removed code-server

Documentation updated:
- README.md: Updated stack descriptions
- services-overview.md: Updated service counts and locations
- All service docs: Updated file paths media-extended → media-management
2026-01-14 01:32:20 -05:00
e6c8f25275 Fix password hash extraction bug in deploy script and sudo issue in setup script
- deploy-homelab.sh: Fix password hash extraction from Docker output
  - Changed from 'grep || tail' fallback to 'sed | grep' pipeline
  - Properly strips 'Digest: ' prefix before extracting hash
  - Prevents corrupted hash format that caused Authelia crash loop

- setup-homelab.sh: Fix automatic deployment call
  - Added 'sudo' when running deploy script from setup
  - Prevents 'Please run as root' error during automatic deployment
2026-01-14 01:23:44 -05:00
d12706fda2 feat: persist Authelia credentials to .env file
- setup-homelab.sh: Save AUTHELIA_ADMIN_* credentials to .env file
- deploy-homelab.sh: Check .env file as fallback if temp files don't exist
- .env.example: Document auto-generated Authelia admin variables

This ensures credentials survive reboots (e.g., when NVIDIA drivers are installed)
and the deploy script can find them even when run manually after reboot.
2026-01-14 00:10:38 -05:00
56604b77e9 fix: store Authelia credentials in persistent location
- setup-homelab.sh: Store temp files in /opt/stacks/.setup-temp instead of /tmp
- deploy-homelab.sh: Read credentials from new persistent location
- reset-test-environment.sh: Clean up new temp directory

This fixes the issue where credentials were inaccessible when deploy script
runs via 'su -' (login shell) from setup script, as /tmp files created by
root are not accessible across the su boundary.
2026-01-14 00:03:34 -05:00
8b2f534c3c docs: user manual edits to getting-started.md
- Updated Getting Started Checklist with clone repo as first step
- Clarified deployment script description
- Added VS Code SSH tip in Simple Setup
- Enhanced VS Code integration section
- Added Debloat/custom service section with AI agent guidance
2026-01-13 23:48:56 -05:00
a916d48776 docs: explicitly document wildcard SSL certificate usage
- README.md: Updated Traefik feature to mention wildcard certificates via DNS challenge
- README.md: Added wildcard cert note to deployment script section
- getting-started.md: Explicitly mention wildcard certificate generation in deploy step

All documentation now clearly states the project uses wildcard SSL certificates with DNS challenge.
2026-01-13 23:15:23 -05:00
9f122af4b5 feat: implement task list updates
- getting-started.md: Moved checklist before Simple Setup, removed Round 4 section
- authelia-customization.md: Updated Authentik reference to alternatives stack
- services-overview.md: Added clickable links to all stack compose files
- setup-homelab.sh: Added prompt to run deployment script after setup (defaults to yes)
- traefik.yml: Changed default to DNS challenge for wildcard certificates (DuckDNS)

All documentation now reflects wildcard certificate usage with DNS challenge.
2026-01-13 23:14:25 -05:00
5325ada46f chore: remove completed tasks.txt 2026-01-13 22:36:45 -05:00
3bad39567d docs: implement user feedback from tasks.txt
- README.md: Fixed .env step order, updated to 60+ services
- getting-started.md: Service count updates, credential clarifications, moved Manual Setup to separate file
- manual-setup.md: Created comprehensive manual setup guide
- authelia-customization.md: Moved Authelia customization from services-overview
- services-overview.md: Added clickable links to service docs, removed disabled section and Quick Deployment
- quick-reference.md: Linked to scripts/README.md instead of duplicating content
- Removed services-reference.md as requested
2026-01-13 22:36:37 -05:00
ea4af44726 Update services-overview.md with current deployment structure
Changes:
- Updated infrastructure.yaml section (6 active services)
- Moved Portainer and Authentik to alternatives.yaml section
- Added note about Watchtower being disabled (Docker API issue)
- Clarified which services are deployed by default vs available
- Corrected stack organization to match actual deployment
2026-01-13 21:46:28 -05:00
afdc99a5a2 Round 9: Complete documentation overhaul
Major documentation updates reflecting current deployment state:

README.md:
 Updated features list with automated setup capabilities
 Accurate deployment workflow (12 containers, 7 additional stacks)
 Corrected setup script description (automated Authelia secrets)
 Removed manual secret generation steps
 Added note about optional image pre-pull
 Clarified .env requirements (Authelia secrets now automated)

docs/getting-started.md:
 Streamlined quick setup workflow (removed manual secrets)
 Comprehensive setup script capabilities list
 Detailed deployment script behavior
 Added interactive Authelia configuration details
 Clarified argon2id password hash generation process
 Updated login credentials location
 Removed outdated .env location warnings

docs/quick-reference.md:
 Added deployment scripts reference section
 Complete setup-homelab.sh documentation
 Complete deploy-homelab.sh documentation
 Added reset-test-environment.sh with warnings
 Updated stack overview with container counts
 Clarified which stacks deploy by default vs available in Dockge
 Noted Watchtower temporary disable status

docs/troubleshooting/COMMON-ISSUES.md (NEW):
 Installation issues (Docker permissions, timeouts, port conflicts)
 Deployment issues (Authelia loops, Watchtower status, Homepage URLs)
 Service-specific issues (Gluetun, Pi-hole, Dockge)
 Performance troubleshooting
 Reset and recovery procedures with warnings
 Complete getting help section with commands

Total changes: 456 additions, 71 modifications across 4 files
Documentation now accurately reflects Round 9 deployment capabilities
2026-01-13 21:46:01 -05:00
487f645652 Round 9: Homepage variable replacement and additional stack deployment
Features added:
 Homepage config variable replacement - Fixed HOMEPAGE_VAR_DOMAIN substitution
  - Homepage doesn't support environment variables in configs
  - Deploy script now uses sed to replace {{HOMEPAGE_VAR_DOMAIN}} with actual domain
  - All homepage/*.yaml files processed after template copy

 Additional stacks deployment to Dockge
  - 7 additional stacks now copied to /opt/stacks/: media, media-extended,
    homeassistant, productivity, monitoring, utilities, alternatives
  - Stacks are NOT started automatically - user deploys via Dockge UI as needed
  - Optional image pre-pull with user prompt (defaults to no)
  - Significantly improves first-time Dockge experience

 Watchtower temporarily disabled
  - Documented Docker API v1.44 compatibility issue with Docker 29.x
  - Added clear instructions for re-enabling when issue is resolved
  - Infrastructure stack now deploys 6 services (was 7)

Deployment workflow:
1. Core stack (4 services) - DuckDNS, Traefik, Authelia, Gluetun
2. Infrastructure stack (6 services) - Dockge, Pi-hole, Dozzle, Glances, Docker Proxy
3. Dashboards stack (2 services) - Homepage (configured), Homarr
4. Additional stacks (7 stacks copied, not started)

Tested: All 11 active containers healthy, all stacks visible in Dockge
2026-01-13 21:36:38 -05:00
cf061f35d2 Fix: Resolve password hash corruption in Authelia users_database.yml
Critical fix for argon2 password hash preservation:
- Root cause: Bash variable expansion of $ characters in argon2id hashes
- Solution: Write hash directly from Docker output to file, bypass bash variables entirely
- setup-homelab.sh: Stream Docker output directly to /tmp/authelia_password_hash.tmp
- deploy-homelab.sh: Read hash file in Python to avoid any bash expansion
- Result: Password hash correctly preserved with full $argon2id$v=19$m=... format

Other changes:
- Added DOCKER_API_VERSION=1.44 env var for watchtower (API compatibility)
- Watchtower still has issues with Docker 29.1.4 - keeping version pinned for investigation

Tested on Debian 12 with Docker 29.1.4:
 All 11 critical containers healthy
 Authelia authentication working correctly
 Password hash preserved through entire deployment workflow
⚠️  Watchtower restart loop (non-critical, under investigation)
2026-01-13 21:02:49 -05:00
659d580d14 Round 8: Attempt to fix sed escaping for password hash
Issue: sed with | delimiter still has problems with $ in argon2 hash
Attempted fix: Escape special characters before sed replacement

Note: Manual sed with double quotes works, suggesting escaping strategy
may need refinement. Need to test if this resolves the issue.
2026-01-13 20:15:21 -05:00
daaf18b7f9 Complete Round 7 and prepare for Round 8
Round 7 Summary:
 Safe reset script validated - no system crashes
 Full deployment workflow successful
 Password hash corruption fixed (heredoc issue)
 All containers running healthily
 System stability confirmed

Round 8 Preparation:
- Focus on production readiness
- Comprehensive integration testing
- Edge case validation
- SSL certificate generation testing
- Documentation accuracy verification
- User experience polish

Testing scenarios defined for:
- Fresh installation (happy path)
- Re-run detection (idempotency)
- Password reset workflows
- Clean slate between tests
- Service access validation
- SSL certificate validation
- Edge cases (invalid config, network issues, resource constraints)

Success criteria: Scripts ready for public release (v1.0.0-rc1)
2026-01-13 20:10:06 -05:00
ee8a359542 Fix password hash corruption in users_database.yml
Issue: Heredoc variable expansion was mangling password hashes containing $ characters
Solution: Use quoted heredoc ('EOF') with placeholders, then sed replace

The unquoted heredoc was interpreting $ in the argon2 hash as shell variable
expansion, corrupting the hash format.
2026-01-13 20:06:43 -05:00
8b5ba494dd Round 7 Prep: Add safe cleanup procedures to prevent system crashes
CRITICAL: Previous rounds caused system crashes during cleanup operations

New Safe Reset Script:
- Gracefully stops all containers before cleanup
- Waits for proper shutdown sequences
- Removes Docker volumes only after containers stopped
- Prevents filesystem corruption from aggressive rm operations
- Includes confirmation prompts for safety

Deploy Script Improvements:
- Stops existing containers before config file operations
- Removes dangerous auto-cleanup of Docker volumes
- Adds safety checks before directory removal
- Warns about existing databases instead of auto-removing

Dangerous Operations Removed:
- No more rm -rf while containers running
- No more automatic volume deletion
- No more blind directory removal
- No more container restart during volume operations

Testing Guidelines:
- Always use reset-test-environment.sh for cleanup
- Never run cleanup while containers active
- Monitor system health during operations
- Proper shutdown sequence documented

This prevents the BIOS-level crashes experienced in previous rounds.
2026-01-13 20:02:04 -05:00
12df3a1ae2 Round 6: Fix deployment script reliability and credential handling
- Add pre-flight validation checks (internet, disk space, Docker availability)
- Fix Authelia password hash extraction (handle 'Digest:' prefix format)
- Improve credential flow between setup and deploy scripts
- Save plain password for user reference in ADMIN_PASSWORD.txt
- Add cleanup for directory/file conflicts on re-runs
- Add automatic Authelia database cleanup for encryption key mismatches
- Add error recovery guidance with cleanup trap
- Display credentials prominently after deployment
- Update step numbering (now 10 steps with pre-flight)
- Update documentation to Round 6

Tested on fresh Debian 12 installation - both scripts now complete successfully.
2026-01-13 19:57:45 -05:00
ac0e39d091 Round 5 improvements: complete automation and documentation fixes
- Fix password file ownership (user can now read without sudo)
- Add dashboards stack to automated deployment (Step 5/6)
- Add SSL certificate notes to deploy script output
- Clarify .env file location in documentation (stays in repo folder)
- Update README and getting-started.md with accurate deployment steps
- Add Watchtower notification URL documentation
- Improve user feedback with admin credentials and dashboard URLs
- Remove dashboards from 'Next Steps' since it's now automated

User experience improvements:
- Password file readable by user immediately
- Homepage and Homarr deployed automatically
- Clear guidance on .env file management
- Better SSL certificate expectations
2026-01-13 18:43:10 -05:00
f0a3907002 Round 4 improvements: automated config, relative paths, simplified deployment
- Automate Traefik email substitution in deploy script
- Auto-generate Authelia admin password (saved to ADMIN_PASSWORD.txt)
- Standardize all volume paths to use relative paths (./service/config)
- Switch Traefik to HTTP challenge by default (DNS challenge optional)
- Update documentation with improved setup instructions
- Enhance troubleshooting guide
- Update AGENT_INSTRUCTIONS with new conventions
- Simplify .env.example with clearer guidance

These changes reduce manual configuration steps and improve deployment reliability.
2026-01-13 18:30:06 -05:00
f92424ed6d Fix critical deployment issues for Round 4
- Add DOCKER_API_VERSION=1.44 to Watchtower (fixes crash loop)
- Add dockerproxy-network creation to deploy script (fixes dashboard deployment)
- Add explicit acme.json file creation with 600 permissions (fixes SSL cert acquisition)
- Fix setup script to correctly resolve user home directory when run with sudo

These fixes resolve all critical blockers discovered in Round 3 testing.
2026-01-13 17:36:47 -05:00
a53effad10 Add docker-compose configurations and SSL troubleshooting docs
- Added compose files for core, infrastructure, and dashboards stacks
- Added Traefik, Authelia, and DuckDNS configuration files
- Added dockge.managed and dockge.url labels to all services
- Updated Watchtower to latest version with DOCKER_API_VERSION=1.44
- Created comprehensive SSL certificate troubleshooting guide for DuckDNS issues
2026-01-13 16:40:13 -05:00
bbcc4c19c9 Update Homepage dashboard and deployment scripts
- Homepage: Reorganize services by stack instead of by category
- Homepage: Add comprehensive Available to Install sections for all stacks
- Homepage: Update config templates with {{HOMEPAGE_VAR_DOMAIN}} placeholder
- Homepage: Change layout from row to column style
- Scripts: Add sudo requirement to deploy-homelab.sh
- Scripts: Replace NVIDIA driver installation with official installer method
- Scripts: Add build prerequisites and nouveau blacklisting
- Docs: Add AI Automation Guidelines section to docker-guidelines.md
- Docs: Document Homepage auto-update requirements and workflow
- Config: Add bookmarks.yaml template for Homepage
- Config: Add alternatives.yml compose file (Portainer, Authentik)
- Config: Update .env.example and authelia configuration
2026-01-13 00:04:43 -05:00
37a093189e Update documentation for wildcard SSL certificates
- Add wildcard certificate configuration to Traefik docs
- Document DuckDNS DNS challenge limitations
- Add SSL troubleshooting commands to quick reference
- Update getting-started with certificate verification steps
- Emphasize single wildcard cert vs individual certs best practice

Documentation now reflects production wildcard certificate setup.
2026-01-12 23:24:38 -05:00
90462cd179 Fix SSL wildcard certificate setup
- Remove individual certresolver labels from all services except Traefik
- Configure wildcard certificate (*.kelin-hass.duckdns.org) on Traefik only
- Remove AUTHELIA_NOTIFIER_SMTP_PASSWORD env var (filesystem notifier only)
- Fix infrastructure.yml networks section syntax
- Add wildcard SSL certificate setup action report

All services now use single wildcard Let's Encrypt certificate.
Resolves DNS challenge conflicts with DuckDNS provider.
2026-01-12 23:19:27 -05:00
Kelin
de1ac31274 Revise getting started guide for setup process
Updated setup instructions for clarity and added steps for installing git and generating secrets.
2026-01-12 18:52:19 -05:00
Kelin
a1b78ef6d2 Clean up empty lines in services-overview.md
Removed empty lines in services overview documentation.
2026-01-12 18:20:05 -05:00