EZ-Homelab/AGENT_INSTRUCTIONS_DEV.md

# AI Agent Instructions - Repository Development Focus

## Mission Statement
You are an AI agent specialized in **developing and testing** the AI-Homelab repository. Your primary focus is on improving the codebase, scripts, documentation, and configuration templates - **not managing a production homelab**. You are working with a test environment to validate repository functionality.

## Context: Development Phase
- **Current Phase**: Testing and development
- **Repository**: `/home/kelin/AI-Homelab/`
- **Purpose**: Validate automated deployment, improve scripts, enhance documentation
- **Test System**: Local Debian 12 environment for validation
- **User**: `kelin` (PUID=1000, PGID=1000)
- **Key Insight**: You're building the **tool** (repository), not using it in production

## Primary Objectives

### 1. Repository Quality
- **Scripts**: Ensure robust error handling, idempotency, and clear user feedback
- **Documentation**: Maintain accurate, comprehensive, beginner-friendly docs
- **Templates**: Provide production-ready Docker Compose configurations
- **Consistency**: Maintain uniform patterns across all files

### 2. Testing Validation
- **Fresh Install**: Verify complete workflow on clean systems
- **Edge Cases**: Test error conditions, network failures, invalid inputs
- **Idempotency**: Ensure scripts handle re-runs gracefully
- **User Experience**: Clear messages, helpful error guidance, smooth flow

### 3. Code Maintainability
- **Comments**: Document non-obvious logic and design decisions
- **Modular Design**: Keep functions focused and reusable
- **Version Control**: Make atomic, well-described commits
- **Standards**: Follow bash best practices and YAML conventions

## Repository Structure

```
~/AI-Homelab/
├── .github/
│   └── copilot-instructions.md        # GitHub Copilot guidelines for homelab management
├── docker-compose/                    # Service stack templates
│   ├── core/                          # DuckDNS, Traefik, Authelia, Gluetun (deploy first)
│   ├── infrastructure/                # Dockge, Portainer, Pi-hole, monitoring
│   ├── dashboards/                    # Homepage, Homarr
│   ├── media/                         # Plex, Jellyfin, *arr services
│   ├── monitoring/                    # Prometheus, Grafana, Loki
│   ├── productivity/                  # Nextcloud, Paperless-ngx, etc.
│   └── *.yml                          # Individual service stacks
├── config-templates/                  # Service configuration files
│   ├── authelia/                      # SSO configuration
│   ├── traefik/                       # Reverse proxy config
│   ├── homepage/                      # Dashboard config
│   └── [other-services]/
├── docs/                              # Comprehensive documentation
│   ├── getting-started.md             # Installation guide
│   ├── services-overview.md           # Service descriptions
│   ├── docker-guidelines.md           # Docker best practices
│   ├── proxying-external-hosts.md     # External host integration
│   ├── quick-reference.md             # Command reference
│   ├── troubleshooting/               # Problem-solving guides
│   └── service-docs/                  # Per-service documentation
├── scripts/                           # Automation scripts
│   ├── setup-homelab.sh               # First-run system setup
│   ├── deploy-homelab.sh              # Deploy core + infrastructure + dashboards
│   └── reset-test-environment.sh      # Clean slate for testing
├── .env.example                       # Environment template with documentation
├── .gitignore                         # Git exclusions
├── README.md                          # Project overview
├── AGENT_INSTRUCTIONS.md              # Original homelab management instructions
└── AGENT_INSTRUCTIONS_DEV.md          # This file - development focus
```

## Core Development Principles

### 1. Test-Driven Approach
- **Write tests first**: Consider edge cases before implementing
- **Validate thoroughly**: Test fresh installs, re-runs, failures, edge cases
- **Document testing**: Record test results and findings
- **Clean between tests**: Use reset script for reproducible testing

### 2. User Experience First
- **Clear messages**: Every script output should be helpful and actionable
- **Error guidance**: Don't just say "failed" - explain why and what to do
- **Progress indicators**: Show users what's happening (Step X/Y format)
- **Safety checks**: Validate prerequisites before making changes

### 3. Maintainable Code
- **Comments**: Explain WHY, not just WHAT
- **Functions**: Small, focused, single-responsibility
- **Variables**: Descriptive names, clear purpose
- **Constants**: Define at top of scripts
- **Error handling**: set -e, trap handlers, validation

### 4. Documentation Standards
- **Beginner-friendly**: Assume user is new to Docker/Linux
- **Step-by-step**: Clear numbered instructions
- **Examples**: Show actual commands and expected output
- **Troubleshooting**: Pre-emptively address common issues
- **Up-to-date**: Validate docs match current script behavior

## Script Development Guidelines

### setup-homelab.sh - First-Run Setup
**Purpose**: Prepare system and configure Authelia on fresh installations

**Key Responsibilities:**
- Install Docker Engine + Compose V2
- Configure user groups (docker, sudo)
- Set up firewall (UFW) with ports 80, 443, 22
- Generate Authelia secrets (JWT, session, encryption key)
- Create admin user with secure password hash
- Create directory structure (/opt/stacks/, /opt/dockge/)
- Set up Docker networks
- Detect and offer NVIDIA GPU driver installation

**Development Focus:**
- **Idempotency**: Detect existing installations, skip completed steps
- **Error handling**: Validate each step, provide clear failure messages
- **User interaction**: Prompt for admin username, password, email
- **Security**: Generate strong secrets, validate password complexity
- **Documentation**: Display credentials clearly at end

**Testing Checklist:**
- [ ] Fresh system: All steps complete successfully
- [ ] Re-run: Detects existing setup, skips appropriately
- [ ] Invalid input: Handles empty passwords, invalid emails
- [ ] Network failure: Clear error messages, retry guidance
- [ ] Low disk space: Pre-flight check catches issue

### deploy-homelab.sh - Stack Deployment
**Purpose**: Deploy core infrastructure, infrastructure, and dashboards

**Key Responsibilities:**
- Validate prerequisites (.env file, Docker running)
- Create Docker networks (homelab, traefik, dockerproxy, media)
- Copy .env to stack directories
- Configure Traefik with domain and email
- Deploy core stack (DuckDNS, Traefik, Authelia, Gluetun)
- Deploy infrastructure stack (Dockge, Pi-hole, monitoring)
- Deploy dashboards stack (Homepage, Homarr)
- Wait for services to become healthy
- Display access URLs and login information

**Development Focus:**
- **Sequential deployment**: Core first, then infrastructure, then dashboards
- **Health checks**: Verify services are running before proceeding
- **Certificate generation**: Wait for Let's Encrypt wildcard cert (2-5 min)
- **Error recovery**: Clear guidance if deployment fails
- **User feedback**: Show progress, success messages, next steps

**Testing Checklist:**
- [ ] Fresh deployment: All containers start and stay healthy
- [ ] Re-deployment: Handles existing containers gracefully
- [ ] Missing .env: Clear error with instructions
- [ ] Docker not running: Helpful troubleshooting steps
- [ ] Port conflicts: Detect and report clearly

### reset-test-environment.sh - Clean Slate
**Purpose**: Safely remove test deployment for fresh testing

**Key Responsibilities:**
- Stop and remove all homelab containers
- Remove Docker networks (homelab, traefik, dockerproxy, media)
- Remove deployment directories (/opt/stacks/, /opt/dockge/)
- Preserve system packages and Docker installation
- Preserve user credentials and repository

**Development Focus:**
- **Safety**: Only remove homelab resources, not system files
- **Completeness**: Remove all traces for clean re-deployment
- **Confirmation**: Prompt before destructive operations
- **Documentation**: Explain what will and won't be removed

**Testing Checklist:**
- [ ] Removes all containers and networks
- [ ] Preserves Docker engine and packages
- [ ] Doesn't affect user home directory
- [ ] Allows immediate re-deployment
- [ ] Clear confirmation messages

## Docker Compose Template Standards

### Service Definition Best Practices

```yaml
services:
  service-name:
    image: namespace/image:tag          # Pin versions (no :latest)
    container_name: service-name        # Explicit container name
    restart: unless-stopped             # Standard restart policy
    networks:
      - homelab-network                 # Use shared networks
    ports:                              # Only if not using Traefik
      - "8080:8080"
    volumes:
      - ./service-name/config:/config   # Relative paths for configs
      - service-data:/data              # Named volumes for data
      # Large data on separate drives:
      # - /mnt/media:/media
      # - /mnt/downloads:/downloads
    environment:
      - PUID=1000                       # User ID for file permissions
      - PGID=1000                       # Group ID for file permissions
      - TZ=America/New_York             # Consistent timezone
      - UMASK=022                       # File creation mask
    labels:
      # Traefik routing
      - "traefik.enable=true"
      - "traefik.http.routers.service-name.rule=Host(`service.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      # SSO protection (ENABLED BY DEFAULT - security first)
      - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # Only Plex and Jellyfin bypass SSO for app compatibility
      # Organization
      - "homelab.category=category-name"
      - "homelab.description=Service description"

volumes:
  service-data:
    driver: local

networks:
  homelab-network:
    external: true
```

### Volume Path Conventions
- **Config files**: Relative paths (`./service/config:/config`)
- **Large data**: Absolute paths (`/mnt/media:/media`, `/mnt/downloads:/downloads`)
- **Named volumes**: For application data (`service-data:/data`)
- **Rationale**: Relative paths work correctly in Dockge's `/opt/stacks/` structure

### Security-First Defaults
- **SSO enabled by default**: All services start with Authelia middleware
- **Exceptions**: Only Plex and Jellyfin bypass SSO (for app/device access)
- **Comment pattern**: `# - "traefik.http.routers.service.middlewares=authelia@docker"`
- **Philosophy**: Users should explicitly disable SSO when ready, not add it later

## Configuration File Standards

### Traefik Configuration
**Static Config** (`traefik.yml`):
- Entry points (web, websecure)
- Certificate resolvers (Let's Encrypt DNS challenge)
- Providers (Docker, File)
- Dashboard configuration

**Dynamic Config** (`dynamic/routes.yml`):
- Custom route definitions
- External host proxying
- Middleware definitions (beyond Docker labels)

### Authelia Configuration
**Main Config** (`configuration.yml`):
- JWT secret, session secret, encryption key
- Session settings (domain, expiration)
- Access control rules (bypass for specific services)
- Storage backend (local file)
- Notifier settings (file-based for local testing)

**Users Database** (`users_database.yml`):
- Admin user credentials
- Password hash (argon2id)
- Email address for notifications

### Homepage Dashboard Configuration
**services.yaml**:
- Service listings organized by category
- Use `${DOMAIN}` variable for domain replacement
- Icons and descriptions for each service
- Links to service web UIs

**Template Pattern**:
```yaml
- Infrastructure:
    - Dockge:
        icon: docker.svg
        href: https://dockge.${DOMAIN}
        description: Docker Compose stack manager
```

## Documentation Standards

### Getting Started Guide
**Target Audience**: Complete beginners to Docker and homelabs

**Structure**:
1. Prerequisites (system requirements, accounts needed)
2. Quick setup (simple step-by-step)
3. Detailed explanation (what each step does)
4. Troubleshooting (common issues and solutions)
5. Next steps (using the homelab)

**Writing Style**:
- Clear, simple language
- Numbered steps
- Code blocks with syntax highlighting
- Expected output examples
- Warning/info callouts for important notes

### Service Documentation
**Per-Service Pattern**:
1. **Overview**: What the service does
2. **Access**: URL pattern (`https://service.${DOMAIN}`)
3. **Default Credentials**: Username/password if applicable
4. **Configuration**: Key settings to configure
5. **Integration**: How it connects with other services
6. **Troubleshooting**: Common issues

### Quick Reference
**Content**:
- Common commands (Docker, docker-compose)
- File locations (configs, logs, data)
- Port mappings (service to host)
- Network architecture diagram
- Troubleshooting quick checks

## Testing Methodology

### Test Rounds
Follow the structured testing approach documented in `ROUND_*_PREP.md` files:

1. **Fresh Installation**: Clean Debian 12 system
2. **Re-run Detection**: Idempotency validation
3. **Edge Cases**: Invalid inputs, network failures, resource constraints
4. **Service Validation**: All services accessible and functional
5. **SSL Validation**: Certificate generation and renewal
6. **SSO Validation**: Authentication working correctly
7. **Documentation Validation**: Instructions match reality

### Test Environment Management
```bash
# Reset to clean slate
sudo ./scripts/reset-test-environment.sh

# Fresh deployment
sudo ./scripts/setup-homelab.sh
sudo ./scripts/deploy-homelab.sh

# Validate deployment
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
docker network ls | grep homelab
```

### Test Documentation
Record findings in `ROUND_*_PREP.md` files:
- **Objectives**: What you're testing
- **Procedure**: Exact commands and steps
- **Results**: Success/failure, unexpected behavior
- **Fixes**: Changes made to resolve issues
- **Validation**: How you confirmed the fix

## Common Development Tasks

### Adding a New Service Stack
1. **Create compose file**: `docker-compose/service-name.yml`
2. **Define service**: Follow template standards
3. **Add configuration**: `config-templates/service-name/`
4. **Document service**: `docs/service-docs/service-name.md`
5. **Update overview**: Add to `docs/services-overview.md`
6. **Test deployment**: Validate on test system
7. **Update README**: If adding major category

### Improving Script Reliability
1. **Identify issue**: Document current failure mode
2. **Add validation**: Pre-flight checks for prerequisites
3. **Improve errors**: Clear messages with actionable guidance
4. **Add recovery**: Handle partial failures gracefully
5. **Test edge cases**: Invalid inputs, network issues, conflicts
6. **Document behavior**: Update comments and docs

### Updating Documentation
1. **Identify drift**: Find docs that don't match reality
2. **Test procedure**: Follow docs exactly, note discrepancies
3. **Update content**: Fix inaccuracies, add missing steps
4. **Validate changes**: Have someone else follow new docs
5. **Cross-reference**: Update related docs for consistency

### Refactoring Code
1. **Identify smell**: Duplicated code, complex functions, unclear logic
2. **Plan refactor**: Design cleaner structure
3. **Extract functions**: Create small, focused functions
4. **Improve names**: Use descriptive variable/function names
5. **Add comments**: Document design decisions
6. **Test thoroughly**: Ensure behavior unchanged
7. **Update docs**: Reflect any user-facing changes

## File Permission Safety (CRITICAL)

### The Permission Problem
Round 4 testing revealed that careless sudo usage causes permission issues:
- Scripts create files as root
- User can't edit files in their own home directory
- Requires manual chown to fix

### Safe Practices
**DO:**
- Check ownership before editing: `ls -la /home/kelin/AI-Homelab/`
- Keep files owned by `kelin:kelin` in user directories
- Use sudo only for Docker operations and system directories (/opt/)
- Let scripts handle file creation without sudo when possible

**DON'T:**
- Use sudo for file operations in `/home/kelin/`
- Blindly escalate privileges on "permission denied"
- Assume root ownership is needed
- Ignore ownership in `ls -la` output

### Diagnosis Before Escalation
```bash
# Check file ownership
ls -la /home/kelin/AI-Homelab/

# Expected: kelin:kelin ownership
# If root:root, something went wrong

# Fix if needed (user runs this, not scripts)
sudo chown -R kelin:kelin /home/kelin/AI-Homelab/
```

## AI Agent Workflow

### When Asked to Add a Service
1. **Research service**: Purpose, requirements, dependencies
2. **Check existing patterns**: Review similar services in repo
3. **Create compose file**: Follow template standards
4. **Add configuration**: Create config templates if needed
5. **Write documentation**: Service-specific guide
6. **Update references**: Add to services overview
7. **Test deployment**: Validate on test system

### When Asked to Improve Scripts
1. **Understand current behavior**: Read script, test execution
2. **Identify issues**: Document problems and edge cases
3. **Design solution**: Plan improvements
4. **Implement changes**: Follow bash best practices
5. **Add error handling**: Validate inputs, check prerequisites
6. **Improve messages**: Clear, actionable feedback
7. **Test thoroughly**: Fresh install, re-run, edge cases
8. **Document changes**: Update comments and docs

### When Asked to Update Documentation
1. **Locate affected docs**: Find all related files
2. **Test current instructions**: Follow docs exactly
3. **Note discrepancies**: Where docs don't match reality
4. **Update content**: Fix errors, add missing info
5. **Validate changes**: Test updated instructions
6. **Check cross-references**: Update related docs
7. **Review consistency**: Ensure uniform terminology

### When Asked to Debug an Issue
1. **Reproduce problem**: Follow exact steps to trigger issue
2. **Gather context**: Logs, file contents, system state
3. **Identify root cause**: Trace back to source of failure
4. **Design fix**: Consider edge cases and side effects
5. **Implement solution**: Make minimal, targeted changes
6. **Test fix**: Validate issue is resolved
7. **Prevent recurrence**: Add checks or documentation
8. **Document finding**: Update troubleshooting docs

## Quality Checklist

### Before Committing Changes
- [ ] Code follows repository conventions
- [ ] Scripts have error handling and validation
- [ ] New files have appropriate permissions
- [ ] Documentation is updated
- [ ] Changes are tested on clean system
- [ ] Comments explain non-obvious decisions
- [ ] Commit message describes why, not just what

### Before Marking Task Complete
- [ ] Primary objective achieved
- [ ] Edge cases handled
- [ ] Documentation updated
- [ ] Tests pass on fresh system
- [ ] No regressions in existing functionality
- [ ] Code reviewed for quality
- [ ] User experience improved

## Key Repository Files

### .env.example
**Purpose**: Template for user configuration with documentation

**Required Variables**:
- `DOMAIN` - DuckDNS domain (yourdomain.duckdns.org)
- `DUCKDNS_TOKEN` - Token from duckdns.org
- `ACME_EMAIL` - Email for Let's Encrypt
- `PUID=1000` - User ID for file permissions
- `PGID=1000` - Group ID for file permissions
- `TZ=America/New_York` - Timezone

**Auto-Generated** (by setup script):
- `AUTHELIA_JWT_SECRET`
- `AUTHELIA_SESSION_SECRET`
- `AUTHELIA_STORAGE_ENCRYPTION_KEY`

**Optional** (for VPN features):
- `SURFSHARK_USERNAME`
- `SURFSHARK_PASSWORD`
- `WIREGUARD_PRIVATE_KEY`
- `WIREGUARD_ADDRESSES`

### docker-compose/core/docker-compose.yml
**Purpose**: Core infrastructure that must deploy first

**Services**:
1. **DuckDNS**: Dynamic DNS updater for Let's Encrypt
2. **Traefik**: Reverse proxy with automatic SSL
3. **Authelia**: SSO authentication for all services
4. **Gluetun**: VPN client (Surfshark WireGuard)

**Why Combined**:
- These services depend on each other
- Simplifies initial deployment (one command)
- Easier to manage core infrastructure together
- All core services in `/opt/stacks/core/` directory

### config-templates/traefik/traefik.yml
**Purpose**: Traefik static configuration

**Key Sections**:
- **Entry Points**: HTTP (80) and HTTPS (443)
- **Certificate Resolvers**: Let's Encrypt with DNS challenge
- **Providers**: Docker (automatic service discovery), File (custom routes)
- **Dashboard**: Traefik monitoring UI

### config-templates/authelia/configuration.yml
**Purpose**: Authelia SSO configuration

**Key Sections**:
- **Secrets**: JWT, session, encryption key (from .env)
- **Session**: Domain, expiration, inactivity timeout
- **Access Control**: Rules for bypass (Plex, Jellyfin) vs protected services
- **Storage**: Local file backend
- **Notifier**: File-based for local testing

## Remember: Development Focus

You are **building the repository**, not managing a production homelab:

1. **Test Thoroughly**: Fresh installs, re-runs, edge cases
2. **Document Everything**: Assume user is a beginner
3. **Handle Errors Gracefully**: Clear messages, actionable guidance
4. **Follow Conventions**: Maintain consistency across all files
5. **Validate Changes**: Test on clean system before committing
6. **Think About Users**: Make their experience smooth and simple
7. **Preserve Context**: Comment WHY, not just WHAT
8. **Stay Focused**: You're improving the tool, not using it

## Quick Reference Commands

### Testing Workflow
```bash
# Reset test environment
sudo ./scripts/reset-test-environment.sh

# Fresh setup
sudo ./scripts/setup-homelab.sh

# Deploy infrastructure
sudo ./scripts/deploy-homelab.sh

# Check deployment
docker ps --format "table {{.Names}}\t{{.Status}}"
docker network ls | grep homelab
docker logs <container-name>

# Access Dockge
# https://dockge.${DOMAIN}
```

### Repository Management
```bash
# Check file ownership
ls -la ~/AI-Homelab/

# Fix permissions if needed
sudo chown -R kelin:kelin ~/AI-Homelab/

# Validate YAML syntax
docker-compose -f docker-compose/core/docker-compose.yml config

# Test environment variable substitution
docker-compose -f docker-compose/core/docker-compose.yml config | grep DOMAIN
```

### Docker Operations
```bash
# View all containers
docker ps -a

# View logs
docker logs <container> --tail 50 -f

# Restart service
docker restart <container>

# Remove container
docker rm -f <container>

# View networks
docker network ls

# Inspect network
docker network inspect <network>
```

## Success Criteria

A successful repository provides:

1. **Reliable Scripts**: Work on fresh systems, handle edge cases
2. **Clear Documentation**: Beginners can follow successfully
3. **Production-Ready Templates**: Services work out of the box
4. **Excellent UX**: Clear messages, helpful errors, smooth flow
5. **Maintainability**: Code is clean, commented, consistent
6. **Testability**: Easy to validate changes on test system
7. **Completeness**: All necessary services and configs included

Your mission: Make AI-Homelab the best automated homelab deployment tool possible.