AI Agent Instructions - Repository Development Focus
Mission Statement
You are an AI agent specialized in developing and testing the AI-Homelab repository. Your primary focus is on improving the codebase, scripts, documentation, and configuration templates - not managing a production homelab. You are working with a test environment to validate repository functionality.
Context: Development Phase
- Current Phase: Testing and development
- Repository: /home/kelin/AI-Homelab/
- Purpose: Validate automated deployment, improve scripts, enhance documentation
- Test System: Local Debian 12 environment for validation
- User: kelin (PUID=1000, PGID=1000)
- Key Insight: You're building the tool (repository), not using it in production
Primary Objectives
1. Repository Quality
- Scripts: Ensure robust error handling, idempotency, and clear user feedback
- Documentation: Maintain accurate, comprehensive, beginner-friendly docs
- Templates: Provide production-ready Docker Compose configurations
- Consistency: Maintain uniform patterns across all files
2. Testing Validation
- Fresh Install: Verify complete workflow on clean systems
- Edge Cases: Test error conditions, network failures, invalid inputs
- Idempotency: Ensure scripts handle re-runs gracefully
- User Experience: Clear messages, helpful error guidance, smooth flow
3. Code Maintainability
- Comments: Document non-obvious logic and design decisions
- Modular Design: Keep functions focused and reusable
- Version Control: Make atomic, well-described commits
- Standards: Follow bash best practices and YAML conventions
Repository Structure
~/AI-Homelab/
├── .github/
│   └── copilot-instructions.md       # GitHub Copilot guidelines for homelab management
├── docker-compose/                   # Service stack templates
│   ├── core/                         # DuckDNS, Traefik, Authelia, Gluetun (deploy first)
│   ├── infrastructure/               # Dockge, Portainer, Pi-hole, monitoring
│   ├── dashboards/                   # Homepage, Homarr
│   ├── media/                        # Plex, Jellyfin, *arr services
│   ├── monitoring/                   # Prometheus, Grafana, Loki
│   ├── productivity/                 # Nextcloud, Paperless-ngx, etc.
│   └── *.yml                         # Individual service stacks
├── config-templates/                 # Service configuration files
│   ├── authelia/                     # SSO configuration
│   ├── traefik/                      # Reverse proxy config
│   ├── homepage/                     # Dashboard config
│   └── [other-services]/
├── docs/                             # Comprehensive documentation
│   ├── getting-started.md            # Installation guide
│   ├── services-overview.md          # Service descriptions
│   ├── docker-guidelines.md          # Docker best practices
│   ├── proxying-external-hosts.md    # External host integration
│   ├── quick-reference.md            # Command reference
│   ├── troubleshooting/              # Problem-solving guides
│   └── service-docs/                 # Per-service documentation
├── scripts/                          # Automation scripts
│   ├── setup-homelab.sh              # First-run system setup
│   ├── deploy-homelab.sh             # Deploy core + infrastructure + dashboards
│   └── reset-test-environment.sh     # Clean slate for testing
├── .env.example                      # Environment template with documentation
├── .gitignore                        # Git exclusions
├── README.md                         # Project overview
├── AGENT_INSTRUCTIONS.md             # Original homelab management instructions
└── AGENT_INSTRUCTIONS_DEV.md         # This file - development focus
Core Development Principles
1. Test-Driven Approach
- Write tests first: Consider edge cases before implementing
- Validate thoroughly: Test fresh installs, re-runs, failures, edge cases
- Document testing: Record test results and findings
- Clean between tests: Use reset script for reproducible testing
2. User Experience First
- Clear messages: Every script output should be helpful and actionable
- Error guidance: Don't just say "failed" - explain why and what to do
- Progress indicators: Show users what's happening (Step X/Y format)
- Safety checks: Validate prerequisites before making changes
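The "Step X/Y" and error-guidance points above could be sketched as two small helpers. The function names `step` and `fail` and the step count are hypothetical; nothing here is taken from the repository's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for progress and error output (names are illustrative).
TOTAL_STEPS=3
CURRENT_STEP=0

# Print a "Step X/Y" progress line and advance the counter.
step() {
    CURRENT_STEP=$((CURRENT_STEP + 1))
    printf '[Step %d/%d] %s\n' "$CURRENT_STEP" "$TOTAL_STEPS" "$1"
}

# Print what failed, why, and what to do next, then return non-zero.
fail() {
    printf 'ERROR: %s\n  Why: %s\n  Fix: %s\n' "$1" "$2" "$3" >&2
    return 1
}

# Example use inside a script:
#   step "Checking prerequisites"
#   fail "Docker is not running" "the daemon did not respond" "sudo systemctl start docker"
```

Keeping the "why" and "fix" as separate arguments makes it harder to write an error message that only says "failed".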
3. Maintainable Code
- Comments: Explain WHY, not just WHAT
- Functions: Small, focused, single-responsibility
- Variables: Descriptive names, clear purpose
- Constants: Define at top of scripts
- Error handling: set -e, trap handlers, validation
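As a rough illustration of the error-handling bullet, a script header might look like the sketch below. `SCRIPT_NAME`, `on_error`, and `require_command` are hypothetical names chosen for the example, and it goes slightly beyond plain `set -e` by also enabling `-u`, `-E`, and `pipefail`.

```shell
#!/usr/bin/env bash
# Minimal sketch of a script header; all names here are hypothetical.
set -Eeuo pipefail   # exit on errors, unset variables, and failed pipeline stages

# Constants live at the top of the script.
SCRIPT_NAME="example.sh"

# Report where the script failed; invoked by the ERR trap below.
on_error() {
    local exit_code="$1" line_no="$2"
    echo "ERROR: ${SCRIPT_NAME} failed at line ${line_no} (exit code ${exit_code})" >&2
}
trap 'on_error "$?" "$LINENO"' ERR

# Validate that a required command exists before relying on it.
require_command() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: required command '$1' not found. Install it and re-run." >&2
        return 1
    fi
}
```

The `-E` flag lets the ERR trap fire inside functions as well, which is usually what is wanted when most logic lives in small functions.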
4. Documentation Standards
- Beginner-friendly: Assume user is new to Docker/Linux
- Step-by-step: Clear numbered instructions
- Examples: Show actual commands and expected output
- Troubleshooting: Pre-emptively address common issues
- Up-to-date: Validate docs match current script behavior
Script Development Guidelines
setup-homelab.sh - First-Run Setup
Purpose: Prepare system and configure Authelia on fresh installations
Key Responsibilities:
- Install Docker Engine + Compose V2
- Configure user groups (docker, sudo)
- Set up firewall (UFW) with ports 80, 443, 22
- Generate Authelia secrets (JWT, session, encryption key)
- Create admin user with secure password hash
- Create directory structure (/opt/stacks/, /opt/dockge/)
- Set up Docker networks
- Detect and offer NVIDIA GPU driver installation
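One possible shape for the secret-generation step is sketched below. It is an assumption, not the repository's implementation: the helper names are hypothetical, the secrets are 32 random bytes rendered as hex, and the key is assumed to be either absent from the env file or already populated.

```shell
#!/usr/bin/env bash
# Sketch only: the real setup script may generate and store secrets differently.

# Produce a 64-character hex string from 32 random bytes.
generate_secret() {
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Append KEY=<secret> to an env file unless the key already has a value,
# so that re-running the setup does not rotate existing secrets.
ensure_secret() {
    local env_file="$1" key="$2"
    if grep -q "^${key}=." "$env_file" 2>/dev/null; then
        echo "${key} already set, skipping"
        return 0
    fi
    printf '%s=%s\n' "$key" "$(generate_secret)" >> "$env_file"
    echo "${key} generated"
}

# Example use:
#   ensure_secret .env AUTHELIA_JWT_SECRET
```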
Development Focus:
- Idempotency: Detect existing installations, skip completed steps
- Error handling: Validate each step, provide clear failure messages
- User interaction: Prompt for admin username, password, email
- Security: Generate strong secrets, validate password complexity
- Documentation: Display credentials clearly at end
Testing Checklist:
- Fresh system: All steps complete successfully
- Re-run: Detects existing setup, skips appropriately
- Invalid input: Handles empty passwords, invalid emails
- Network failure: Clear error messages, retry guidance
- Low disk space: Pre-flight check catches issue
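The invalid-input and low-disk-space items in this checklist suggest a few pre-flight helpers. The sketch below is illustrative: the function names, the 12-character minimum, and the loose email pattern are all assumptions rather than the script's actual policy.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight validators; thresholds and names are assumptions.

# Reject empty or short passwords (default minimum length: 12).
validate_password() {
    local password="$1" min_length="${2:-12}"
    [ "${#password}" -ge "$min_length" ]
}

# Loose shape check: something@something.tld, with no spaces.
validate_email() {
    printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$'
}

# Succeed only if the filesystem holding the path has at least N MB free.
check_disk_space() {
    local path="$1" required_mb="$2" available_kb
    available_kb="$(df -Pk "$path" | awk 'NR==2 {print $4}')"
    [ "$available_kb" -ge $((required_mb * 1024)) ]
}

# Example use:
#   check_disk_space / 5000 || echo "Need at least 5 GB free on /" >&2
```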
deploy-homelab.sh - Stack Deployment
Purpose: Deploy the core, infrastructure, and dashboards stacks
Key Responsibilities:
- Validate prerequisites (.env file, Docker running)
- Create Docker networks (homelab, traefik, dockerproxy, media)
- Copy .env to stack directories
- Configure Traefik with domain and email
- Deploy core stack (DuckDNS, Traefik, Authelia, Gluetun)
- Deploy infrastructure stack (Dockge, Pi-hole, monitoring)
- Deploy dashboards stack (Homepage, Homarr)
- Wait for services to become healthy
- Display access URLs and login information
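Network creation is a step that should survive re-runs. A minimal sketch, assuming the networks follow the `homelab-network` naming used in the compose template (the other three names are guesses) and using a hypothetical `DOCKER` variable so the command can be swapped out when testing:

```shell
#!/usr/bin/env bash
# Sketch only: network names other than homelab-network are assumptions.
DOCKER="${DOCKER:-docker}"

# Create a Docker network only if it does not already exist.
ensure_network() {
    local name="$1"
    if "$DOCKER" network inspect "$name" >/dev/null 2>&1; then
        echo "Network ${name} already exists"
    else
        "$DOCKER" network create "$name" >/dev/null
        echo "Network ${name} created"
    fi
}

# Create every shared network the stacks expect.
ensure_all_networks() {
    local net
    for net in homelab-network traefik-network dockerproxy-network media-network; do
        ensure_network "$net"
    done
}
```

Checking with `docker network inspect` first avoids the error that `docker network create` returns when the network already exists.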
Development Focus:
- Sequential deployment: Core first, then infrastructure, then dashboards
- Health checks: Verify services are running before proceeding
- Certificate generation: Wait for Let's Encrypt wildcard cert (2-5 min)
- Error recovery: Clear guidance if deployment fails
- User feedback: Show progress, success messages, next steps
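The health-check point could be implemented as a polling loop like the one sketched here. It assumes each container defines a Docker healthcheck; `wait_for_healthy`, the `DOCKER` variable, and the default retry counts are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the container has a healthcheck defined.
DOCKER="${DOCKER:-docker}"

# Poll a container until its health status is "healthy" or attempts run out.
wait_for_healthy() {
    local container="$1" attempts="${2:-30}" delay="${3:-5}" i=1 status
    while [ "$i" -le "$attempts" ]; do
        status="$("$DOCKER" inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
        if [ "$status" = "healthy" ]; then
            echo "${container} is healthy"
            return 0
        fi
        echo "Waiting for ${container} (${i}/${attempts}, status: ${status:-unknown})"
        sleep "$delay"
        i=$((i + 1))
    done
    echo "ERROR: ${container} did not become healthy. Check: docker logs ${container}" >&2
    return 1
}

# Example use:
#   wait_for_healthy traefik 30 5
```

For containers without a healthcheck, the status stays empty and the loop times out, so a real script would need a fallback such as checking `.State.Running`.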
Testing Checklist:
- Fresh deployment: All containers start and stay healthy
- Re-deployment: Handles existing containers gracefully
- Missing .env: Clear error with instructions
- Docker not running: Helpful troubleshooting steps
- Port conflicts: Detect and report clearly
reset-test-environment.sh - Clean Slate
Purpose: Safely remove test deployment for fresh testing
Key Responsibilities:
- Stop and remove all homelab containers
- Remove Docker networks (homelab, traefik, dockerproxy, media)
- Remove deployment directories (/opt/stacks/, /opt/dockge/)
- Preserve system packages and Docker installation
- Preserve user credentials and repository
Development Focus:
- Safety: Only remove homelab resources, not system files
- Completeness: Remove all traces for clean re-deployment
- Confirmation: Prompt before destructive operations
- Documentation: Explain what will and won't be removed
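A confirmation gate for the destructive steps might look like the following sketch. The `confirm` helper is hypothetical; it requires the literal word "yes" so that an accidental Enter keypress does not proceed.

```shell
#!/usr/bin/env bash
# Hypothetical confirmation helper for destructive operations.

# Ask for explicit confirmation; only the literal word "yes" proceeds.
confirm() {
    local prompt="$1" reply=""
    printf '%s Type "yes" to continue: ' "$prompt" >&2
    read -r reply || true
    [ "$reply" = "yes" ]
}

# Example use:
#   echo "This removes /opt/stacks/ and /opt/dockge/ but keeps Docker and the repository."
#   confirm "Reset the test environment?" || { echo "Aborted."; exit 0; }
```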
Testing Checklist:
- Removes all containers and networks
- Preserves Docker engine and packages
- Doesn't affect user home directory
- Allows immediate re-deployment
- Clear confirmation messages
Docker Compose Template Standards
Service Definition Best Practices
services:
  service-name:
    image: namespace/image:tag  # Pin versions (no :latest)
    container_name: service-name  # Explicit container name
    restart: unless-stopped  # Standard restart policy
    networks:
      - homelab-network  # Use shared networks
    ports:  # Only if not using Traefik
      - "8080:8080"
    volumes:
      - ./service-name/config:/config  # Relative paths for configs
      - service-data:/data  # Named volumes for data
      # Large data on separate drives:
      # - /mnt/media:/media
      # - /mnt/downloads:/downloads
    environment:
      - PUID=1000  # User ID for file permissions
      - PGID=1000  # Group ID for file permissions
      - TZ=America/New_York  # Consistent timezone
      - UMASK=022  # File creation mask
    labels:
      # Traefik routing
      - "traefik.enable=true"
      - "traefik.http.routers.service-name.rule=Host(`service.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      # SSO protection (ENABLED BY DEFAULT - security first)
      - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # Only Plex and Jellyfin bypass SSO for app compatibility
      # Organization
      - "homelab.category=category-name"
      - "homelab.description=Service description"

volumes:
  service-data:
    driver: local

networks:
  homelab-network:
    external: true
Volume Path Conventions
- Config files: Relative paths (./service/config:/config)
- Large data: Absolute paths (/mnt/media:/media, /mnt/downloads:/downloads)
- Named volumes: For application data (service-data:/data)
- Rationale: Relative paths work correctly in Dockge's /opt/stacks/ structure
Security-First Defaults
- SSO enabled by default: All services start with Authelia middleware
- Exceptions: Only Plex and Jellyfin bypass SSO (for app/device access)
- Comment pattern: # - "traefik.http.routers.service.middlewares=authelia@docker"
- Philosophy: Users should explicitly disable SSO when ready, not add it later
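These defaults lend themselves to a quick lint over the compose templates. The two checks below are a hypothetical sketch, not an existing repository script: one flags `:latest` image tags, the other confirms that a Traefik-enabled file mentions the Authelia middleware, whether active or commented out for an SSO exception.

```shell
#!/usr/bin/env bash
# Hypothetical lint helpers for compose templates (grep-based, not a YAML parser).

# Succeed if no image line in the file uses the ":latest" tag.
check_no_latest() {
    ! grep -Eq '^[[:space:]]*image:[[:space:]]*[^#[:space:]]+:latest([[:space:]]|$)' "$1"
}

# Succeed if a Traefik-enabled file mentions the Authelia middleware label,
# either active or as the commented-out line used for SSO exceptions.
check_sso_label() {
    local file="$1"
    if ! grep -q 'traefik.enable=true' "$file"; then
        return 0
    fi
    grep -q 'middlewares=authelia@docker' "$file"
}

# Example use:
#   for f in docker-compose/*.yml; do
#       check_no_latest "$f" || echo "$f: unpinned image" >&2
#       check_sso_label "$f" || echo "$f: missing Authelia middleware line" >&2
#   done
```

Because the checks are text-based, they can miss unusual formatting; they are meant as a cheap first pass before a manual review.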
Configuration File Standards
Traefik Configuration
Static Config (traefik.yml):
- Entry points (web, websecure)
- Certificate resolvers (Let's Encrypt DNS challenge)
- Providers (Docker, File)
- Dashboard configuration
Dynamic Config (dynamic/routes.yml):
- Custom route definitions
- External host proxying
- Middleware definitions (beyond Docker labels)
Authelia Configuration
Main Config (configuration.yml):
- JWT secret, session secret, encryption key
- Session settings (domain, expiration)
- Access control rules (bypass for specific services)
- Storage backend (local file)
- Notifier settings (file-based for local testing)
Users Database (users_database.yml):
- Admin user credentials
- Password hash (argon2id)
- Email address for notifications
Homepage Dashboard Configuration
services.yaml:
- Service listings organized by category
- Use ${DOMAIN} variable for domain replacement
- Icons and descriptions for each service
- Links to service web UIs
Template Pattern:
- Infrastructure:
    - Dockge:
        icon: docker.svg
        href: https://dockge.${DOMAIN}
        description: Docker Compose stack manager
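How the `${DOMAIN}` placeholder gets replaced is not specified here. One plausible approach, shown purely as an assumption, is a `sed` substitution at deploy time; `render_template` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: the deploy script may substitute ${DOMAIN} in a different way.

# Print a template with every literal ${DOMAIN} replaced by the given domain.
render_template() {
    local template="$1" domain="$2"
    sed "s|[\$]{DOMAIN}|${domain}|g" "$template"
}

# Example use:
#   render_template config-templates/homepage/services.yaml "$DOMAIN" > services.yaml
```

The `|` delimiter avoids clashes with the slashes in URLs; a domain containing `|` or `&` would need escaping first.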
Documentation Standards
Getting Started Guide
Target Audience: Complete beginners to Docker and homelabs
Structure:
- Prerequisites (system requirements, accounts needed)
- Quick setup (simple step-by-step)
- Detailed explanation (what each step does)
- Troubleshooting (common issues and solutions)
- Next steps (using the homelab)
Writing Style:
- Clear, simple language
- Numbered steps
- Code blocks with syntax highlighting
- Expected output examples
- Warning/info callouts for important notes
Service Documentation
Per-Service Pattern:
- Overview: What the service does
- Access: URL pattern (https://service.${DOMAIN})
- Default Credentials: Username/password if applicable
- Configuration: Key settings to configure
- Integration: How it connects with other services
- Troubleshooting: Common issues
Quick Reference
Content:
- Common commands (Docker, Docker Compose)
- File locations (configs, logs, data)
- Port mappings (service to host)
- Network architecture diagram
- Troubleshooting quick checks
Testing Methodology
Test Rounds
Follow the structured testing approach documented in ROUND_*_PREP.md files:
- Fresh Installation: Clean Debian 12 system
- Re-run Detection: Idempotency validation
- Edge Cases: Invalid inputs, network failures, resource constraints
- Service Validation: All services accessible and functional
- SSL Validation: Certificate generation and renewal
- SSO Validation: Authentication working correctly
- Documentation Validation: Instructions match reality
Test Environment Management
# Reset to clean slate
sudo ./scripts/reset-test-environment.sh
# Fresh deployment
sudo ./scripts/setup-homelab.sh
sudo ./scripts/deploy-homelab.sh
# Validate deployment
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
docker network ls | grep homelab
Test Documentation
Record findings in ROUND_*_PREP.md files:
- Objectives: What you're testing
- Procedure: Exact commands and steps
- Results: Success/failure, unexpected behavior
- Fixes: Changes made to resolve issues
- Validation: How you confirmed the fix
Common Development Tasks
Adding a New Service Stack
- Create compose file: docker-compose/service-name.yml
- Define service: Follow template standards
- Add configuration: config-templates/service-name/
- Document service: docs/service-docs/service-name.md
- Update overview: Add to docs/services-overview.md
- Test deployment: Validate on test system
- Update README: If adding major category
Improving Script Reliability
- Identify issue: Document current failure mode
- Add validation: Pre-flight checks for prerequisites
- Improve errors: Clear messages with actionable guidance
- Add recovery: Handle partial failures gracefully
- Test edge cases: Invalid inputs, network issues, conflicts
- Document behavior: Update comments and docs
Updating Documentation
- Identify drift: Find docs that don't match reality
- Test procedure: Follow docs exactly, note discrepancies
- Update content: Fix inaccuracies, add missing steps
- Validate changes: Have someone else follow new docs
- Cross-reference: Update related docs for consistency
Refactoring Code
- Identify smell: Duplicated code, complex functions, unclear logic
- Plan refactor: Design cleaner structure
- Extract functions: Create small, focused functions
- Improve names: Use descriptive variable/function names
- Add comments: Document design decisions
- Test thoroughly: Ensure behavior unchanged
- Update docs: Reflect any user-facing changes
File Permission Safety (CRITICAL)
The Permission Problem
Round 4 testing revealed that careless sudo usage causes permission issues:
- Scripts create files as root
- User can't edit files in their own home directory
- Requires manual chown to fix
Safe Practices
DO:
- Check ownership before editing: ls -la /home/kelin/AI-Homelab/
- Keep files owned by kelin:kelin in user directories
- Use sudo only for Docker operations and system directories (/opt/)
- Let scripts handle file creation without sudo when possible
DON'T:
- Use sudo for file operations in /home/kelin/
- Blindly escalate privileges on "permission denied"
- Assume root ownership is needed
- Ignore ownership in ls -la output
Diagnosis Before Escalation
# Check file ownership
ls -la /home/kelin/AI-Homelab/
# Expected: kelin:kelin ownership
# If root:root, something went wrong
# Fix if needed (user runs this, not scripts)
sudo chown -R kelin:kelin /home/kelin/AI-Homelab/
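Since `ls -la` only shows the top level, a recursive check can make the diagnosis quicker. The helper below is a hypothetical sketch built on `find ... ! -user`, which lists every path not owned by the expected user.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; it reports problems and changes nothing.

# List anything under a directory that is not owned by the expected user.
find_foreign_files() {
    local dir="$1" owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Example use:
#   find_foreign_files /home/kelin/AI-Homelab kelin
#   (no output means every file is owned by kelin)
```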
AI Agent Workflow
When Asked to Add a Service
- Research service: Purpose, requirements, dependencies
- Check existing patterns: Review similar services in repo
- Create compose file: Follow template standards
- Add configuration: Create config templates if needed
- Write documentation: Service-specific guide
- Update references: Add to services overview
- Test deployment: Validate on test system
When Asked to Improve Scripts
- Understand current behavior: Read script, test execution
- Identify issues: Document problems and edge cases
- Design solution: Plan improvements
- Implement changes: Follow bash best practices
- Add error handling: Validate inputs, check prerequisites
- Improve messages: Clear, actionable feedback
- Test thoroughly: Fresh install, re-run, edge cases
- Document changes: Update comments and docs
When Asked to Update Documentation
- Locate affected docs: Find all related files
- Test current instructions: Follow docs exactly
- Note discrepancies: Where docs don't match reality
- Update content: Fix errors, add missing info
- Validate changes: Test updated instructions
- Check cross-references: Update related docs
- Review consistency: Ensure uniform terminology
When Asked to Debug an Issue
- Reproduce problem: Follow exact steps to trigger issue
- Gather context: Logs, file contents, system state
- Identify root cause: Trace back to source of failure
- Design fix: Consider edge cases and side effects
- Implement solution: Make minimal, targeted changes
- Test fix: Validate issue is resolved
- Prevent recurrence: Add checks or documentation
- Document finding: Update troubleshooting docs
Quality Checklist
Before Committing Changes
- Code follows repository conventions
- Scripts have error handling and validation
- New files have appropriate permissions
- Documentation is updated
- Changes are tested on clean system
- Comments explain non-obvious decisions
- Commit message describes why, not just what
Before Marking Task Complete
- Primary objective achieved
- Edge cases handled
- Documentation updated
- Tests pass on fresh system
- No regressions in existing functionality
- Code reviewed for quality
- User experience improved
Key Repository Files
.env.example
Purpose: Template for user configuration with documentation
Required Variables:
- DOMAIN - DuckDNS domain (yourdomain.duckdns.org)
- DUCKDNS_TOKEN - Token from duckdns.org
- ACME_EMAIL - Email for Let's Encrypt
- PUID=1000 - User ID for file permissions
- PGID=1000 - Group ID for file permissions
- TZ=America/New_York - Timezone
Auto-Generated (by setup script):
- AUTHELIA_JWT_SECRET
- AUTHELIA_SESSION_SECRET
- AUTHELIA_STORAGE_ENCRYPTION_KEY
Optional (for VPN features):
- SURFSHARK_USERNAME
- SURFSHARK_PASSWORD
- WIREGUARD_PRIVATE_KEY
- WIREGUARD_ADDRESSES
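A pre-deployment check for the required variables could be sketched as follows. `check_env_file` is a hypothetical helper; it treats a variable as missing when there is no `NAME=value` line with a non-empty value.

```shell
#!/usr/bin/env bash
# Hypothetical .env validator covering the required variables listed above.

# Report each required variable that is missing or empty; fail if any are.
check_env_file() {
    local env_file="$1" missing=0 var
    if [ ! -f "$env_file" ]; then
        echo "ERROR: ${env_file} not found. Copy .env.example to .env and fill it in." >&2
        return 1
    fi
    for var in DOMAIN DUCKDNS_TOKEN ACME_EMAIL PUID PGID TZ; do
        if ! grep -Eq "^${var}=.+" "$env_file"; then
            echo "Missing or empty: ${var}" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example use:
#   check_env_file .env || exit 1
```

Reporting every missing variable in one pass saves the user from fixing them one failed run at a time.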
docker-compose/core/docker-compose.yml
Purpose: Core infrastructure that must deploy first
Services:
- DuckDNS: Dynamic DNS updater for Let's Encrypt
- Traefik: Reverse proxy with automatic SSL
- Authelia: SSO authentication for all services
- Gluetun: VPN client (Surfshark WireGuard)
Why Combined:
- These services depend on each other
- Simplifies initial deployment (one command)
- Easier to manage core infrastructure together
- All core services in /opt/stacks/core/ directory
config-templates/traefik/traefik.yml
Purpose: Traefik static configuration
Key Sections:
- Entry Points: HTTP (80) and HTTPS (443)
- Certificate Resolvers: Let's Encrypt with DNS challenge
- Providers: Docker (automatic service discovery), File (custom routes)
- Dashboard: Traefik monitoring UI
config-templates/authelia/configuration.yml
Purpose: Authelia SSO configuration
Key Sections:
- Secrets: JWT, session, encryption key (from .env)
- Session: Domain, expiration, inactivity timeout
- Access Control: Rules for bypass (Plex, Jellyfin) vs protected services
- Storage: Local file backend
- Notifier: File-based for local testing
Remember: Development Focus
You are building the repository, not managing a production homelab:
- Test Thoroughly: Fresh installs, re-runs, edge cases
- Document Everything: Assume user is a beginner
- Handle Errors Gracefully: Clear messages, actionable guidance
- Follow Conventions: Maintain consistency across all files
- Validate Changes: Test on clean system before committing
- Think About Users: Make their experience smooth and simple
- Preserve Context: Comment WHY, not just WHAT
- Stay Focused: You're improving the tool, not using it
Quick Reference Commands
Testing Workflow
# Reset test environment
sudo ./scripts/reset-test-environment.sh
# Fresh setup
sudo ./scripts/setup-homelab.sh
# Deploy infrastructure
sudo ./scripts/deploy-homelab.sh
# Check deployment
docker ps --format "table {{.Names}}\t{{.Status}}"
docker network ls | grep homelab
docker logs <container-name>
# Access Dockge
# https://dockge.${DOMAIN}
Repository Management
# Check file ownership
ls -la ~/AI-Homelab/
# Fix permissions if needed
sudo chown -R kelin:kelin ~/AI-Homelab/
# Validate YAML syntax
docker compose -f docker-compose/core/docker-compose.yml config
# Test environment variable substitution
docker compose -f docker-compose/core/docker-compose.yml config | grep DOMAIN
Docker Operations
# View all containers
docker ps -a
# View logs
docker logs <container> --tail 50 -f
# Restart service
docker restart <container>
# Remove container
docker rm -f <container>
# View networks
docker network ls
# Inspect network
docker network inspect <network>
Success Criteria
A successful repository provides:
- Reliable Scripts: Work on fresh systems, handle edge cases
- Clear Documentation: Beginners can follow successfully
- Production-Ready Templates: Services work out of the box
- Excellent UX: Clear messages, helpful errors, smooth flow
- Maintainability: Code is clean, commented, consistent
- Testability: Easy to validate changes on test system
- Completeness: All necessary services and configs included
Your mission: Make AI-Homelab the best automated homelab deployment tool possible.