EZ-Homelab/AGENT_INSTRUCTIONS_DEV.md
kelin adb894d35e Round 10: Add Traefik routing to monitoring services
- Added Traefik labels and routing to prometheus, grafana, loki, cadvisor
- Fixed Grafana ROOT_URL to use domain-based URL (https://grafana.${DOMAIN})
- Added uptime-kuma bypass rule in Authelia (needs initial setup)
- Updated all services to use traefik-network
- Synced domain from kelin-hass to kelin-casa across all configs
- Fixed missing tls=true label on uptime-kuma
- Note: Loki is API-only service (no web UI, accessed via Grafana)
2026-01-14 23:08:37 -05:00

AI Agent Instructions - Repository Development Focus

Mission Statement

You are an AI agent specialized in developing and testing the AI-Homelab repository. Your primary focus is on improving the codebase, scripts, documentation, and configuration templates - not managing a production homelab. You are working with a test environment to validate repository functionality.

Context: Development Phase

  • Current Phase: Testing and development
  • Repository: /home/kelin/AI-Homelab/
  • Purpose: Validate automated deployment, improve scripts, enhance documentation
  • Test System: Local Debian 12 environment for validation
  • User: kelin (PUID=1000, PGID=1000)
  • Key Insight: You're building the tool (repository), not using it in production

Primary Objectives

1. Repository Quality

  • Scripts: Ensure robust error handling, idempotency, and clear user feedback
  • Documentation: Maintain accurate, comprehensive, beginner-friendly docs
  • Templates: Provide production-ready Docker Compose configurations
  • Consistency: Maintain uniform patterns across all files

2. Testing Validation

  • Fresh Install: Verify complete workflow on clean systems
  • Edge Cases: Test error conditions, network failures, invalid inputs
  • Idempotency: Ensure scripts handle re-runs gracefully
  • User Experience: Clear messages, helpful error guidance, smooth flow

3. Code Maintainability

  • Comments: Document non-obvious logic and design decisions
  • Modular Design: Keep functions focused and reusable
  • Version Control: Make atomic, well-described commits
  • Standards: Follow bash best practices and YAML conventions

Repository Structure

~/AI-Homelab/
├── .github/
│   └── copilot-instructions.md        # GitHub Copilot guidelines for homelab management
├── docker-compose/                    # Service stack templates
│   ├── core/                          # DuckDNS, Traefik, Authelia, Gluetun (deploy first)
│   ├── infrastructure/                # Dockge, Portainer, Pi-hole, monitoring
│   ├── dashboards/                    # Homepage, Homarr
│   ├── media/                         # Plex, Jellyfin, *arr services
│   ├── monitoring/                    # Prometheus, Grafana, Loki
│   ├── productivity/                  # Nextcloud, Paperless-ngx, etc.
│   └── *.yml                          # Individual service stacks
├── config-templates/                  # Service configuration files
│   ├── authelia/                      # SSO configuration
│   ├── traefik/                       # Reverse proxy config
│   ├── homepage/                      # Dashboard config
│   └── [other-services]/
├── docs/                              # Comprehensive documentation
│   ├── getting-started.md             # Installation guide
│   ├── services-overview.md           # Service descriptions
│   ├── docker-guidelines.md           # Docker best practices
│   ├── proxying-external-hosts.md     # External host integration
│   ├── quick-reference.md             # Command reference
│   ├── troubleshooting/               # Problem-solving guides
│   └── service-docs/                  # Per-service documentation
├── scripts/                           # Automation scripts
│   ├── setup-homelab.sh               # First-run system setup
│   ├── deploy-homelab.sh              # Deploy core + infrastructure + dashboards
│   └── reset-test-environment.sh      # Clean slate for testing
├── .env.example                       # Environment template with documentation
├── .gitignore                         # Git exclusions
├── README.md                          # Project overview
├── AGENT_INSTRUCTIONS.md              # Original homelab management instructions
└── AGENT_INSTRUCTIONS_DEV.md          # This file - development focus

Core Development Principles

1. Test-Driven Approach

  • Write tests first: Consider edge cases before implementing
  • Validate thoroughly: Test fresh installs, re-runs, failures, edge cases
  • Document testing: Record test results and findings
  • Clean between tests: Use reset script for reproducible testing

2. User Experience First

  • Clear messages: Every script output should be helpful and actionable
  • Error guidance: Don't just say "failed" - explain why and what to do
  • Progress indicators: Show users what's happening (Step X/Y format)
  • Safety checks: Validate prerequisites before making changes
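
A minimal sketch of the message style described above, in portable shell. The helper names `step` and `fail`, the step count, and the message layout are hypothetical and are not taken from the repository's scripts.

```shell
# Hypothetical helpers illustrating "Step X/Y" progress and actionable errors.

TOTAL_STEPS=3   # assumed number of steps for this illustration

# Print a progress line such as "[Step 1/3] Checking prerequisites".
step() {
    printf '[Step %s/%s] %s\n' "$1" "$TOTAL_STEPS" "$2"
}

# Print what failed, why it failed, and what the user can do next.
# Writes to stderr and returns non-zero so callers can stop.
fail() {
    printf 'ERROR: %s\n' "$1" >&2
    printf '  Why: %s\n' "$2" >&2
    printf '  Fix: %s\n' "$3" >&2
    return 1
}
```

A call might look like `fail "Docker is not running" "the daemon did not respond" "start it with: sudo systemctl start docker"`, which gives the user a cause and a next action instead of a bare "failed".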

3. Maintainable Code

  • Comments: Explain WHY, not just WHAT
  • Functions: Small, focused, single-responsibility
  • Variables: Descriptive names, clear purpose
  • Constants: Define at top of scripts
  • Error handling: set -e, trap handlers, validation
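
The points above could be combined into a script skeleton along these lines. This is a sketch under assumptions: the constant values, `WORK_DIR`, `cleanup`, and `require_cmds` are illustrative names, not the layout of the existing scripts.

```shell
# Sketch of a script skeleton: strict mode, constants first, cleanup trap.
set -eu   # stop on errors and on use of unset variables

# Constants at the top (illustrative values only).
STACKS_DIR="/opt/stacks"
REQUIRED_CMDS="docker"

WORK_DIR=$(mktemp -d)

# Remove temporary files on every exit path, so a failed run leaves no debris.
cleanup() {
    rm -rf "$WORK_DIR"
}
trap cleanup EXIT

# Validation: report every missing command, then return non-zero if any is absent.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf 'Missing required command: %s\n' "$cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use near the start of a script:
#   require_cmds $REQUIRED_CMDS || exit 1
#   mkdir -p "$STACKS_DIR"
```

Reporting all missing commands before returning lets the user fix everything in one pass rather than re-running the script once per missing tool.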

4. Documentation Standards

  • Beginner-friendly: Assume user is new to Docker/Linux
  • Step-by-step: Clear numbered instructions
  • Examples: Show actual commands and expected output
  • Troubleshooting: Pre-emptively address common issues
  • Up-to-date: Validate docs match current script behavior

Script Development Guidelines

setup-homelab.sh - First-Run Setup

Purpose: Prepare system and configure Authelia on fresh installations

Key Responsibilities:

  • Install Docker Engine + Compose V2
  • Configure user groups (docker, sudo)
  • Set up firewall (UFW) allowing ports 22 (SSH), 80 (HTTP), and 443 (HTTPS)
  • Generate Authelia secrets (JWT, session, encryption key)
  • Create admin user with secure password hash
  • Create directory structure (/opt/stacks/, /opt/dockge/)
  • Set up Docker networks
  • Detect and offer NVIDIA GPU driver installation
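
For the secret-generation step, one possible approach is to read random bytes from /dev/urandom and hex-encode them. The helper name `generate_secret` and the 32-byte default are assumptions for illustration; the actual script may generate secrets differently.

```shell
# Hypothetical helper: print a random hex string built from N random bytes.
# Assumes head, od, and tr are available (they are on a standard Debian install).
generate_secret() {
    bytes="${1:-32}"   # 32 bytes -> 64 hex characters (assumed default)
    head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Variable names match the auto-generated entries listed for .env.example.
AUTHELIA_JWT_SECRET=$(generate_secret)
AUTHELIA_SESSION_SECRET=$(generate_secret)
AUTHELIA_STORAGE_ENCRYPTION_KEY=$(generate_secret)
```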

Development Focus:

  • Idempotency: Detect existing installations, skip completed steps
  • Error handling: Validate each step, provide clear failure messages
  • User interaction: Prompt for admin username, password, email
  • Security: Generate strong secrets, validate password complexity
  • Documentation: Display credentials clearly at end
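
Idempotency usually comes down to checking state before acting and telling the user which path was taken. A minimal sketch, assuming a hypothetical helper named `ensure_dir`:

```shell
# Hypothetical helper: create a directory only if it is missing,
# and report whether the step was performed or skipped.
ensure_dir() {
    if [ -d "$1" ]; then
        printf 'SKIP: %s already exists\n' "$1"
    else
        mkdir -p "$1"
        printf 'CREATED: %s\n' "$1"
    fi
}
```

Running `ensure_dir /opt/stacks` twice prints `CREATED` the first time and `SKIP` the second, which is the behavior the re-run test in the checklist below looks for. The same check-then-act shape applies to package installs, group membership, and Docker networks.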

Testing Checklist:

  • Fresh system: All steps complete successfully
  • Re-run: Detects existing setup, skips appropriately
  • Invalid input: Handles empty passwords, invalid emails
  • Network failure: Clear error messages, retry guidance
  • Low disk space: Pre-flight check catches issue
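
The low-disk-space item could be covered by a pre-flight check like the following sketch. The function name `has_free_space` and the 10 GiB threshold in the usage comment are assumptions, not values from the repository.

```shell
# Hypothetical pre-flight check: succeed only if the filesystem holding
# path $1 has at least $2 KiB available.
has_free_space() {
    # df -P prints one POSIX-format line per filesystem; column 4 is "Available".
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

# Example use (10 GiB = 10485760 KiB, an assumed threshold):
#   has_free_space / 10485760 || { echo "Not enough free disk space on /" >&2; exit 1; }
```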

deploy-homelab.sh - Stack Deployment

Purpose: Deploy the core, infrastructure, and dashboards stacks

Key Responsibilities:

  • Validate prerequisites (.env file, Docker running)
  • Create Docker networks (homelab, traefik, dockerproxy, media)
  • Copy .env to stack directories
  • Configure Traefik with domain and email
  • Deploy core stack (DuckDNS, Traefik, Authelia, Gluetun)
  • Deploy infrastructure stack (Dockge, Pi-hole, monitoring)
  • Deploy dashboards stack (Homepage, Homarr)
  • Wait for services to become healthy
  • Display access URLs and login information

Development Focus:

  • Sequential deployment: Core first, then infrastructure, then dashboards
  • Health checks: Verify services are running before proceeding
  • Certificate generation: Wait for Let's Encrypt wildcard cert (2-5 min)
  • Error recovery: Clear guidance if deployment fails
  • User feedback: Show progress, success messages, next steps
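
Waiting for a service before moving to the next stack can be expressed as a bounded retry loop. The sketch below assumes a hypothetical helper `wait_until`; the command it retries is supplied by the caller.

```shell
# Hypothetical helper: retry a check command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            printf 'Ready after %s attempt(s)\n' "$i"
            return 0
        fi
        printf 'Waiting (%s/%s)...\n' "$i" "$attempts"
        sleep "$delay"
        i=$((i + 1))
    done
    printf 'Gave up after %s attempts\n' "$attempts" >&2
    return 1
}
```

The check itself might compare the output of `docker inspect --format '{{.State.Health.Status}}' <container>` against `healthy`, which only works for containers whose image or compose file defines a health check. A bounded loop keeps the script from hanging forever and gives a clear point at which to print recovery guidance.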

Testing Checklist:

  • Fresh deployment: All containers start and stay healthy
  • Re-deployment: Handles existing containers gracefully
  • Missing .env: Clear error with instructions
  • Docker not running: Helpful troubleshooting steps
  • Port conflicts: Detect and report clearly

reset-test-environment.sh - Clean Slate

Purpose: Safely remove test deployment for fresh testing

Key Responsibilities:

  • Stop and remove all homelab containers
  • Remove Docker networks (homelab, traefik, dockerproxy, media)
  • Remove deployment directories (/opt/stacks/, /opt/dockge/)
  • Preserve system packages and Docker installation
  • Preserve user credentials and repository

Development Focus:

  • Safety: Only remove homelab resources, not system files
  • Completeness: Remove all traces for clean re-deployment
  • Confirmation: Prompt before destructive operations
  • Documentation: Explain what will and won't be removed

Testing Checklist:

  • Removes all containers and networks
  • Preserves Docker engine and packages
  • Doesn't affect user home directory
  • Allows immediate re-deployment
  • Clear confirmation messages
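
A confirmation prompt for the destructive steps might look like this sketch. The helper name `confirm` and the requirement to type the full word "yes" are illustrative choices.

```shell
# Hypothetical helper: show a message and require the user to type "yes".
# Anything else, including empty input or end-of-file, counts as "no".
confirm() {
    printf '%s\nType "yes" to continue: ' "$1"
    read -r answer || answer=""
    [ "$answer" = "yes" ]
}

# Example use, with a message that states what is removed and what is kept:
#   confirm "This removes homelab containers, networks, /opt/stacks/ and /opt/dockge/.
#   Docker itself, system packages, and your repository are kept." || exit 0
```

Requiring a full word rather than a single keypress makes an accidental confirmation less likely.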

Docker Compose Template Standards

Service Definition Best Practices

services:
  service-name:
    image: namespace/image:tag          # Pin versions (no :latest)
    container_name: service-name        # Explicit container name
    restart: unless-stopped             # Standard restart policy
    networks:
      - homelab-network                 # Use shared networks
    ports:                              # Only if not using Traefik
      - "8080:8080"
    volumes:
      - ./service-name/config:/config   # Relative paths for configs
      - service-data:/data              # Named volumes for data
      # Large data on separate drives:
      # - /mnt/media:/media
      # - /mnt/downloads:/downloads
    environment:
      - PUID=1000                       # User ID for file permissions
      - PGID=1000                       # Group ID for file permissions
      - TZ=America/New_York             # Consistent timezone
      - UMASK=022                       # File creation mask
    labels:
      # Traefik routing
      - "traefik.enable=true"
      - "traefik.http.routers.service-name.rule=Host(`service.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      # SSO protection (ENABLED BY DEFAULT - security first)
      - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # Only Plex and Jellyfin bypass SSO for app compatibility
      # Organization
      - "homelab.category=category-name"
      - "homelab.description=Service description"

volumes:
  service-data:
    driver: local

networks:
  homelab-network:
    external: true

Volume Path Conventions

  • Config files: Relative paths (./service/config:/config)
  • Large data: Absolute paths (/mnt/media:/media, /mnt/downloads:/downloads)
  • Named volumes: For application data (service-data:/data)
  • Rationale: Relative paths work correctly in Dockge's /opt/stacks/ structure

Security-First Defaults

  • SSO enabled by default: All services start with Authelia middleware
  • Exceptions: Only Plex and Jellyfin bypass SSO (for app/device access)
  • Comment pattern: To disable SSO for a service, comment out its middleware label: # - "traefik.http.routers.service.middlewares=authelia@docker"
  • Philosophy: Users should explicitly disable SSO when ready, not add it later
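
When reviewing templates, a quick check can confirm that the Authelia middleware label is active rather than commented out. This is a rough sketch: `has_sso` is a hypothetical helper, and the pattern only recognizes labels written in the list style used by the template above.

```shell
# Hypothetical lint helper: succeed if the compose file contains an active
# (not commented-out) router middleware label that references authelia@docker.
has_sso() {
    grep -Eq '^[[:space:]]*-[[:space:]]*"?traefik\.http\.routers\.[^.]+\.middlewares=.*authelia@docker' "$1"
}

# Example use:
#   has_sso docker-compose/service-name.yml || echo "SSO is not enabled" >&2
```

A line that starts with `#` does not match, so a template whose label has been commented out is reported as unprotected. Plex and Jellyfin are expected to fail this check by design.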

Configuration File Standards

Traefik Configuration

Static Config (traefik.yml):

  • Entry points (web, websecure)
  • Certificate resolvers (Let's Encrypt DNS challenge)
  • Providers (Docker, File)
  • Dashboard configuration

Dynamic Config (dynamic/routes.yml):

  • Custom route definitions
  • External host proxying
  • Middleware definitions (beyond Docker labels)

Authelia Configuration

Main Config (configuration.yml):

  • JWT secret, session secret, encryption key
  • Session settings (domain, expiration)
  • Access control rules (bypass for specific services)
  • Storage backend (local file)
  • Notifier settings (file-based for local testing)

Users Database (users_database.yml):

  • Admin user credentials
  • Password hash (argon2id)
  • Email address for notifications

Homepage Dashboard Configuration

services.yaml:

  • Service listings organized by category
  • Use ${DOMAIN} variable for domain replacement
  • Icons and descriptions for each service
  • Links to service web UIs

Template Pattern:

- Infrastructure:
    - Dockge:
        icon: docker.svg
        href: https://dockge.${DOMAIN}
        description: Docker Compose stack manager
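
One way the `${DOMAIN}` placeholder in a template like this could be filled in is a plain text substitution. The sketch below uses `sed`; whether the deploy script actually uses `sed`, `envsubst`, or another mechanism is not stated here, and `render_domain` is a hypothetical name.

```shell
# Hypothetical helper: replace the literal text ${DOMAIN} on stdin with $1.
# Assumes the domain contains no "|", "&", or "\" characters, which are
# special inside this sed replacement.
render_domain() {
    sed 's|[$]{DOMAIN}|'"$1"'|g'
}

# Example use:
#   render_domain "yourdomain.duckdns.org" < services.yaml > services.rendered.yaml
```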

Documentation Standards

Getting Started Guide

Target Audience: Complete beginners to Docker and homelabs

Structure:

  1. Prerequisites (system requirements, accounts needed)
  2. Quick setup (simple step-by-step)
  3. Detailed explanation (what each step does)
  4. Troubleshooting (common issues and solutions)
  5. Next steps (using the homelab)

Writing Style:

  • Clear, simple language
  • Numbered steps
  • Code blocks with syntax highlighting
  • Expected output examples
  • Warning/info callouts for important notes

Service Documentation

Per-Service Pattern:

  1. Overview: What the service does
  2. Access: URL pattern (https://service.${DOMAIN})
  3. Default Credentials: Username/password if applicable
  4. Configuration: Key settings to configure
  5. Integration: How it connects with other services
  6. Troubleshooting: Common issues

Quick Reference

Content:

  • Common commands (docker, docker compose)
  • File locations (configs, logs, data)
  • Port mappings (service to host)
  • Network architecture diagram
  • Troubleshooting quick checks

Testing Methodology

Test Rounds

Follow the structured testing approach documented in ROUND_*_PREP.md files:

  1. Fresh Installation: Clean Debian 12 system
  2. Re-run Detection: Idempotency validation
  3. Edge Cases: Invalid inputs, network failures, resource constraints
  4. Service Validation: All services accessible and functional
  5. SSL Validation: Certificate generation and renewal
  6. SSO Validation: Authentication working correctly
  7. Documentation Validation: Instructions match reality

Test Environment Management

# Reset to clean slate
sudo ./scripts/reset-test-environment.sh

# Fresh deployment
sudo ./scripts/setup-homelab.sh
sudo ./scripts/deploy-homelab.sh

# Validate deployment
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
docker network ls | grep homelab

Test Documentation

Record findings in ROUND_*_PREP.md files:

  • Objectives: What you're testing
  • Procedure: Exact commands and steps
  • Results: Success/failure, unexpected behavior
  • Fixes: Changes made to resolve issues
  • Validation: How you confirmed the fix

Common Development Tasks

Adding a New Service Stack

  1. Create compose file: docker-compose/service-name.yml
  2. Define service: Follow template standards
  3. Add configuration: config-templates/service-name/
  4. Document service: docs/service-docs/service-name.md
  5. Update overview: Add to docs/services-overview.md
  6. Test deployment: Validate on test system
  7. Update README: If adding major category

Improving Script Reliability

  1. Identify issue: Document current failure mode
  2. Add validation: Pre-flight checks for prerequisites
  3. Improve errors: Clear messages with actionable guidance
  4. Add recovery: Handle partial failures gracefully
  5. Test edge cases: Invalid inputs, network issues, conflicts
  6. Document behavior: Update comments and docs

Updating Documentation

  1. Identify drift: Find docs that don't match reality
  2. Test procedure: Follow docs exactly, note discrepancies
  3. Update content: Fix inaccuracies, add missing steps
  4. Validate changes: Have someone else follow new docs
  5. Cross-reference: Update related docs for consistency

Refactoring Code

  1. Identify smell: Duplicated code, complex functions, unclear logic
  2. Plan refactor: Design cleaner structure
  3. Extract functions: Create small, focused functions
  4. Improve names: Use descriptive variable/function names
  5. Add comments: Document design decisions
  6. Test thoroughly: Ensure behavior unchanged
  7. Update docs: Reflect any user-facing changes

File Permission Safety (CRITICAL)

The Permission Problem

Round 4 testing revealed that careless sudo usage causes permission issues:

  • Scripts create files as root
  • User can't edit files in their own home directory
  • Requires manual chown to fix

Safe Practices

DO:

  • Check ownership before editing: ls -la /home/kelin/AI-Homelab/
  • Keep files owned by kelin:kelin in user directories
  • Use sudo only for Docker operations and system directories (/opt/)
  • Let scripts handle file creation without sudo when possible

DON'T:

  • Use sudo for file operations in /home/kelin/
  • Blindly escalate privileges on "permission denied"
  • Assume root ownership is needed
  • Ignore ownership in ls -la output

Diagnosis Before Escalation

# Check file ownership
ls -la /home/kelin/AI-Homelab/

# Expected: kelin:kelin ownership
# If root:root, something went wrong

# Fix if needed (user runs this, not scripts)
sudo chown -R kelin:kelin /home/kelin/AI-Homelab/
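
To find exactly which files have the wrong owner, instead of scanning `ls -la` output by eye, a read-only check along these lines may help. `find_foreign_owned` is a hypothetical helper; it changes nothing on disk.

```shell
# Hypothetical helper: list entries under directory $1 that are NOT owned
# by user $2 (a user name or numeric UID). Empty output means all is well.
find_foreign_owned() {
    find "$1" ! -user "$2" -print
}

# Example use:
#   find_foreign_owned /home/kelin/AI-Homelab kelin
```

If the command prints any paths, those are the files a script created as root. That list shows which step needs fixing before anyone reaches for `chown`.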

AI Agent Workflow

When Asked to Add a Service

  1. Research service: Purpose, requirements, dependencies
  2. Check existing patterns: Review similar services in repo
  3. Create compose file: Follow template standards
  4. Add configuration: Create config templates if needed
  5. Write documentation: Service-specific guide
  6. Update references: Add to services overview
  7. Test deployment: Validate on test system

When Asked to Improve Scripts

  1. Understand current behavior: Read script, test execution
  2. Identify issues: Document problems and edge cases
  3. Design solution: Plan improvements
  4. Implement changes: Follow bash best practices
  5. Add error handling: Validate inputs, check prerequisites
  6. Improve messages: Clear, actionable feedback
  7. Test thoroughly: Fresh install, re-run, edge cases
  8. Document changes: Update comments and docs

When Asked to Update Documentation

  1. Locate affected docs: Find all related files
  2. Test current instructions: Follow docs exactly
  3. Note discrepancies: Where docs don't match reality
  4. Update content: Fix errors, add missing info
  5. Validate changes: Test updated instructions
  6. Check cross-references: Update related docs
  7. Review consistency: Ensure uniform terminology

When Asked to Debug an Issue

  1. Reproduce problem: Follow exact steps to trigger issue
  2. Gather context: Logs, file contents, system state
  3. Identify root cause: Trace back to source of failure
  4. Design fix: Consider edge cases and side effects
  5. Implement solution: Make minimal, targeted changes
  6. Test fix: Validate issue is resolved
  7. Prevent recurrence: Add checks or documentation
  8. Document finding: Update troubleshooting docs

Quality Checklist

Before Committing Changes

  • Code follows repository conventions
  • Scripts have error handling and validation
  • New files have appropriate permissions
  • Documentation is updated
  • Changes are tested on clean system
  • Comments explain non-obvious decisions
  • Commit message describes why, not just what

Before Marking Task Complete

  • Primary objective achieved
  • Edge cases handled
  • Documentation updated
  • Tests pass on fresh system
  • No regressions in existing functionality
  • Code reviewed for quality
  • User experience improved

Key Repository Files

.env.example

Purpose: Template for user configuration with documentation

Required Variables:

  • DOMAIN - DuckDNS domain (yourdomain.duckdns.org)
  • DUCKDNS_TOKEN - Token from duckdns.org
  • ACME_EMAIL - Email for Let's Encrypt
  • PUID=1000 - User ID for file permissions
  • PGID=1000 - Group ID for file permissions
  • TZ=America/New_York - Timezone

Auto-Generated (by setup script):

  • AUTHELIA_JWT_SECRET
  • AUTHELIA_SESSION_SECRET
  • AUTHELIA_STORAGE_ENCRYPTION_KEY

Optional (for VPN features):

  • SURFSHARK_USERNAME
  • SURFSHARK_PASSWORD
  • WIREGUARD_PRIVATE_KEY
  • WIREGUARD_ADDRESSES
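
A deployment pre-flight check could verify that the required variables listed above are present and non-empty. This is a sketch: `check_env_file` and `REQUIRED_VARS` are hypothetical names, and the check only looks for `NAME=value` lines without validating the values.

```shell
# Required variable names, as listed for .env.example.
REQUIRED_VARS="DOMAIN DUCKDNS_TOKEN ACME_EMAIL PUID PGID TZ"

# Hypothetical check: fail with one clear message per missing or empty variable.
check_env_file() {
    env_file=$1
    if [ ! -f "$env_file" ]; then
        printf 'ERROR: %s not found. Copy .env.example to .env and fill it in.\n' "$env_file" >&2
        return 1
    fi
    status=0
    for name in $REQUIRED_VARS; do
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            printf 'ERROR: %s is missing or empty in %s\n' "$name" "$env_file" >&2
            status=1
        fi
    done
    return "$status"
}
```

The optional VPN variables are deliberately left out of the list, so a user who does not use Gluetun is not blocked.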

docker-compose/core/docker-compose.yml

Purpose: Core infrastructure that must deploy first

Services:

  1. DuckDNS: Dynamic DNS updater for the domain that Let's Encrypt certificates are issued for
  2. Traefik: Reverse proxy with automatic SSL
  3. Authelia: SSO authentication for all services
  4. Gluetun: VPN client (Surfshark WireGuard)

Why Combined:

  • These services depend on each other
  • Simplifies initial deployment (one command)
  • Easier to manage core infrastructure together
  • All core services in /opt/stacks/core/ directory

config-templates/traefik/traefik.yml

Purpose: Traefik static configuration

Key Sections:

  • Entry Points: HTTP (80) and HTTPS (443)
  • Certificate Resolvers: Let's Encrypt with DNS challenge
  • Providers: Docker (automatic service discovery), File (custom routes)
  • Dashboard: Traefik monitoring UI

config-templates/authelia/configuration.yml

Purpose: Authelia SSO configuration

Key Sections:

  • Secrets: JWT, session, encryption key (from .env)
  • Session: Domain, expiration, inactivity timeout
  • Access Control: Rules for bypass (Plex, Jellyfin) vs protected services
  • Storage: Local file backend
  • Notifier: File-based for local testing

Remember: Development Focus

You are building the repository, not managing a production homelab:

  1. Test Thoroughly: Fresh installs, re-runs, edge cases
  2. Document Everything: Assume user is a beginner
  3. Handle Errors Gracefully: Clear messages, actionable guidance
  4. Follow Conventions: Maintain consistency across all files
  5. Validate Changes: Test on clean system before committing
  6. Think About Users: Make their experience smooth and simple
  7. Preserve Context: Comment WHY, not just WHAT
  8. Stay Focused: You're improving the tool, not using it

Quick Reference Commands

Testing Workflow

# Reset test environment
sudo ./scripts/reset-test-environment.sh

# Fresh setup
sudo ./scripts/setup-homelab.sh

# Deploy infrastructure
sudo ./scripts/deploy-homelab.sh

# Check deployment
docker ps --format "table {{.Names}}\t{{.Status}}"
docker network ls | grep homelab
docker logs <container-name>

# Access Dockge
# https://dockge.${DOMAIN}

Repository Management

# Check file ownership
ls -la ~/AI-Homelab/

# Fix permissions if needed
sudo chown -R kelin:kelin ~/AI-Homelab/

# Validate YAML syntax (Compose V2 syntax: "docker compose", not "docker-compose")
docker compose -f docker-compose/core/docker-compose.yml config

# Test environment variable substitution
docker compose -f docker-compose/core/docker-compose.yml config | grep DOMAIN

Docker Operations

# View all containers
docker ps -a

# View logs
docker logs <container> --tail 50 -f

# Restart service
docker restart <container>

# Remove container
docker rm -f <container>

# View networks
docker network ls

# Inspect network
docker network inspect <network>

Success Criteria

A successful repository provides:

  1. Reliable Scripts: Work on fresh systems, handle edge cases
  2. Clear Documentation: Beginners can follow successfully
  3. Production-Ready Templates: Services work out of the box
  4. Excellent UX: Clear messages, helpful errors, smooth flow
  5. Maintainability: Code is clean, commented, consistent
  6. Testability: Easy to validate changes on test system
  7. Completeness: All necessary services and configs included

Your mission: Make AI-Homelab the best automated homelab deployment tool possible.