Add ez-assistant and kerberos service folders

2026-02-11 14:56:03 -05:00
parent e4e8ae1b87
commit 9ccfb36923
4471 changed files with 746463 additions and 0 deletions
--- a/docker-compose/ez-assistant/docs/concepts/agent-loop.md
+++ b/docker-compose/ez-assistant/docs/concepts/agent-loop.md
@@ -0,0 +1,126 @@
+---
+summary: "Agent loop lifecycle, streams, and wait semantics"
+read_when:
+  - You need an exact walkthrough of the agent loop or lifecycle events
+---
+# Agent Loop (Moltbot)
+
+An agent loop is the full, “real” run of an agent: intake → context assembly → model inference →
+tool execution → streaming replies → persistence. It is the authoritative path that turns a message
+into actions and a final reply while keeping session state consistent.
+
+In Moltbot, a loop is a single, serialized run per session that emits lifecycle and stream events
+as the model thinks, calls tools, and streams output. This doc explains how that loop is wired
+end-to-end.
+
+## Entry points
+- Gateway RPC: `agent` and `agent.wait`.
+- CLI: `agent` command.
+
+## How it works (high-level)
+1) `agent` RPC validates params, resolves the session (sessionKey/sessionId), persists session metadata, and returns `{ runId, acceptedAt }` immediately.
+2) `agentCommand` runs the agent:
+   - resolves model + thinking/verbose defaults
+   - loads skills snapshot
+   - calls `runEmbeddedPiAgent` (pi-agent-core runtime)
+   - emits **lifecycle end/error** if the embedded loop does not emit one
+3) `runEmbeddedPiAgent`:
+   - serializes runs via per-session + global queues
+   - resolves model + auth profile and builds the pi session
+   - subscribes to pi events and streams assistant/tool deltas
+   - enforces timeout -> aborts run if exceeded
+   - returns payloads + usage metadata
+4) `subscribeEmbeddedPiSession` bridges pi-agent-core events to the Moltbot `agent` stream:
+   - tool events => `stream: "tool"`
+   - assistant deltas => `stream: "assistant"`
+   - lifecycle events => `stream: "lifecycle"` (`phase: "start" | "end" | "error"`)
+5) `agent.wait` uses `waitForAgentJob`:
+   - waits for **lifecycle end/error** for `runId`
+   - returns `{ status: ok|error|timeout, startedAt, endedAt, error? }`
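+
+The following is a minimal sketch of the `agent.wait` result in step 5. It is written as a pure function over the lifecycle events seen for a run before the wait deadline. The event shape and the `resolveWait` name are hypothetical. The real `waitForAgentJob` waits on live events instead of scanning a list.
+
+```typescript
+// Hypothetical shape of a lifecycle event on the `agent` stream.
+type LifecycleEvent = {
+  runId: string;
+  phase: "start" | "end" | "error";
+  at: number; // epoch ms
+  error?: string;
+};
+
+type WaitResult = {
+  status: "ok" | "error" | "timeout";
+  startedAt?: number;
+  endedAt?: number;
+  error?: string;
+};
+
+// Resolve what `agent.wait` would report for `runId`, given the events seen
+// before the wait deadline. Only a terminal phase (end/error) completes the wait.
+function resolveWait(events: LifecycleEvent[], runId: string): WaitResult {
+  const mine = events.filter((e) => e.runId === runId);
+  const start = mine.find((e) => e.phase === "start");
+  const terminal = mine.find((e) => e.phase === "end" || e.phase === "error");
+  if (!terminal) {
+    // No end/error before the deadline: the wait times out; the run itself keeps going.
+    return { status: "timeout", startedAt: start?.at };
+  }
+  return {
+    status: terminal.phase === "end" ? "ok" : "error",
+    startedAt: start?.at,
+    endedAt: terminal.at,
+    error: terminal.error,
+  };
+}
+```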
+
+## Queueing + concurrency
+- Runs are serialized per session key (session lane) and optionally through a global lane.
+- This prevents tool/session races and keeps session history consistent.
+- Messaging channels can choose queue modes (collect/steer/followup) that feed this lane system.
+  See [Command Queue](/concepts/queue).
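+
+The per-session lane can be illustrated with one promise chain per key. This is a generic sketch of the technique, not Moltbot's queue code. `KeyedLanes` is a hypothetical name, and the sketch leaves out queue modes, debounce, and the global lane.
+
+```typescript
+// Minimal per-key serializer: tasks sharing a key run one at a time, in
+// submission order; tasks with different keys may overlap. Idle keys are
+// never cleaned up (sketch only).
+class KeyedLanes {
+  // Tail of each lane's promise chain (never rejects).
+  private tails = new Map<string, Promise<void>>();
+
+  run<T>(key: string, task: () => Promise<T>): Promise<T> {
+    const prev = this.tails.get(key) ?? Promise.resolve();
+    // Start `task` only after everything queued earlier on this key settles.
+    const result = prev.then(() => task());
+    // Swallow failures in the stored tail so one failed run does not block the lane.
+    this.tails.set(key, result.then(() => undefined, () => undefined));
+    return result;
+  }
+}
+```
+
+A global lane could be layered on top by nesting calls, for example `lanes.run(sessionKey, () => lanes.run("global", task))`. With that nesting, a run keeps its session lane while it waits for the global one.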
+
+## Session + workspace preparation
+- Workspace is resolved and created; sandboxed runs may redirect to a sandbox workspace root.
+- Skills are loaded (or reused from a snapshot) and injected into env and prompt.
+- Bootstrap/context files are resolved and injected into the system prompt report.
+- A session write lock is acquired; `SessionManager` is opened and prepared before streaming.
+
+## Prompt assembly + system prompt
+- System prompt is built from Moltbot’s base prompt, skills prompt, bootstrap context, and per-run overrides.
+- Model-specific limits and compaction reserve tokens are enforced.
+- See [System prompt](/concepts/system-prompt) for what the model sees.
+
+## Hook points (where you can intercept)
+Moltbot has two hook systems:
+- **Internal hooks** (Gateway hooks): event-driven scripts for commands and lifecycle events.
+- **Plugin hooks**: extension points inside the agent/tool lifecycle and gateway pipeline.
+
+### Internal hooks (Gateway hooks)
+- **`agent:bootstrap`**: runs while building bootstrap files before the system prompt is finalized.
+  Use this to add/remove bootstrap context files.
+- **Command hooks**: `/new`, `/reset`, `/stop`, and other command events (see Hooks doc).
+
+See [Hooks](/hooks) for setup and examples.
+
+### Plugin hooks (agent + gateway lifecycle)
+These run inside the agent loop or gateway pipeline:
+- **`before_agent_start`**: inject context or override system prompt before the run starts.
+- **`agent_end`**: inspect the final message list and run metadata after completion.
+- **`before_compaction` / `after_compaction`**: observe or annotate compaction cycles.
+- **`before_tool_call` / `after_tool_call`**: intercept tool params/results.
+- **`tool_result_persist`**: synchronously transform tool results before they are written to the session transcript.
+- **`message_received` / `message_sending` / `message_sent`**: inbound + outbound message hooks.
+- **`session_start` / `session_end`**: session lifecycle boundaries.
+- **`gateway_start` / `gateway_stop`**: gateway lifecycle events.
+
+See [Plugins](/plugin#plugin-hooks) for the hook API and registration details.
+
+## Streaming + partial replies
+- Assistant deltas are streamed from pi-agent-core and emitted as `assistant` events.
+- Block streaming can emit partial replies either on `text_end` or `message_end`.
+- Reasoning streaming can be emitted as a separate stream or as block replies.
+- See [Streaming](/concepts/streaming) for chunking and block reply behavior.
+
+## Tool execution + messaging tools
+- Tool start/update/end events are emitted on the `tool` stream.
+- Tool results are sanitized for size and image payloads before logging/emitting.
+- Messaging tool sends are tracked to suppress duplicate assistant confirmations.
+
+## Reply shaping + suppression
+- Final payloads are assembled from:
+  - assistant text (and optional reasoning)
+  - inline tool summaries (when verbose + allowed)
+  - assistant error text when the model errors
+- `NO_REPLY` is treated as a silent token and filtered from outgoing payloads.
+- Messaging tool duplicates are removed from the final payload list.
+- If no renderable payloads remain and a tool errored, a fallback tool error reply is emitted
+  (unless a messaging tool already sent a user-visible reply).
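+
+The suppression rules above can be sketched as a filter over the assembled payloads. This is illustrative only. `ReplyPayload` and `shapeReplies` are hypothetical names, and exact-text duplicate matching and the fallback wording are assumptions, not Moltbot's implementation.
+
+```typescript
+// Hypothetical payload shape; real payloads carry more fields.
+type ReplyPayload = { text: string };
+
+const SILENT_TOKEN = "NO_REPLY";
+
+function shapeReplies(
+  payloads: ReplyPayload[],
+  messagingToolTexts: string[], // texts a messaging tool already sent
+  toolError: string | undefined, // last tool error, if any
+): ReplyPayload[] {
+  const alreadySent = new Set(messagingToolTexts.map((t) => t.trim()));
+  const kept = payloads.filter((p) => {
+    const text = p.text.trim();
+    if (text === "" || text === SILENT_TOKEN) return false; // silent or empty
+    return !alreadySent.has(text); // drop messaging-tool duplicates
+  });
+  // Fallback: nothing renderable, a tool failed, and no messaging tool
+  // already produced a user-visible reply.
+  if (kept.length === 0 && toolError && alreadySent.size === 0) {
+    return [{ text: `Tool error: ${toolError}` }];
+  }
+  return kept;
+}
+```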
+
+## Compaction + retries
+- Auto-compaction emits `compaction` stream events and can trigger a retry.
+- On retry, in-memory buffers and tool summaries are reset to avoid duplicate output.
+- See [Compaction](/concepts/compaction) for the compaction pipeline.
+
+## Event streams (today)
+- `lifecycle`: emitted by `subscribeEmbeddedPiSession` (and as a fallback by `agentCommand`)
+- `assistant`: streamed deltas from pi-agent-core
+- `tool`: streamed tool events from pi-agent-core
+
+## Chat channel handling
+- Assistant deltas are buffered into chat `delta` messages.
+- A chat `final` is emitted on **lifecycle end/error**.
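+
+The following sketches that buffering. It assumes each chat `delta` carries the text accumulated so far; the real channel may send increments or throttle updates instead. `AgentEvent`, `ChatMessage`, and `toChat` are hypothetical names.
+
+```typescript
+type AgentEvent =
+  | { stream: "assistant"; delta: string }
+  | { stream: "lifecycle"; phase: "start" | "end" | "error" };
+
+type ChatMessage = { state: "delta" | "final"; text: string };
+
+// Fold agent events into chat messages: each assistant delta yields a chat
+// `delta` with the text so far; a terminal lifecycle phase yields `final`.
+function toChat(events: AgentEvent[]): ChatMessage[] {
+  const out: ChatMessage[] = [];
+  let buffer = "";
+  for (const ev of events) {
+    if (ev.stream === "assistant") {
+      buffer += ev.delta;
+      out.push({ state: "delta", text: buffer });
+    } else if (ev.phase === "end" || ev.phase === "error") {
+      out.push({ state: "final", text: buffer });
+      break; // nothing follows a final for this run
+    }
+  }
+  return out;
+}
+```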
+
+## Timeouts
+- `agent.wait` default: 30s. This bounds only the wait, not the run; override it with the `timeoutMs` param.
+- Agent runtime: `agents.defaults.timeoutSeconds` default 600s; enforced in `runEmbeddedPiAgent` abort timer.
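+
+The two timeouts are independent. The hypothetical `simulate` helper below uses the defaults above to model a run that would take `runDurationMs` to finish on its own. The run timeout ends the run early, while the wait timeout only decides whether `agent.wait` sees the end. Treating an abort as an ordinary end of the run is a simplification in this sketch.
+
+```typescript
+const DEFAULT_WAIT_MS = 30_000; // agent.wait default; override with timeoutMs
+const DEFAULT_RUN_TIMEOUT_S = 600; // agents.defaults.timeoutSeconds default
+
+type TimeoutOpts = { waitTimeoutMs?: number; timeoutSeconds?: number };
+
+function simulate(runDurationMs: number, opts: TimeoutOpts = {}) {
+  const waitMs = opts.waitTimeoutMs ?? DEFAULT_WAIT_MS;
+  const runMs = (opts.timeoutSeconds ?? DEFAULT_RUN_TIMEOUT_S) * 1000;
+  // The runtime timeout aborts the run, so it ends at runMs at the latest.
+  const aborted = runDurationMs > runMs;
+  const endedAt = aborted ? runMs : runDurationMs;
+  // The wait timeout never stops the run; it only bounds how long we watch.
+  const waitSawEnd = endedAt <= waitMs;
+  return { aborted, endedAt, waitSawEnd };
+}
+```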
+
+## Where things can end early
+- Agent timeout (abort)
+- AbortSignal (cancel)
+- Gateway disconnect or RPC timeout
+- `agent.wait` timeout (wait-only, does not stop agent)
--- a/docker-compose/ez-assistant/docs/concepts/agent-workspace.md
+++ b/docker-compose/ez-assistant/docs/concepts/agent-workspace.md
@@ -0,0 +1,231 @@
+---
+summary: "Agent workspace: location, layout, and backup strategy"
+read_when:
+  - You need to explain the agent workspace or its file layout
+  - You want to back up or migrate an agent workspace
+---
+# Agent workspace
+
+The workspace is the agent's home. It is the only working directory used for
+file tools and for workspace context. Keep it private and treat it as memory.
+
+This is separate from `~/.clawdbot/`, which stores config, credentials, and
+sessions.
+
+**Important:** the workspace is the **default cwd**, not a hard sandbox. Tools
+resolve relative paths against the workspace, but absolute paths can still reach
+elsewhere on the host unless sandboxing is enabled. If you need isolation, use
+[`agents.defaults.sandbox`](/gateway/sandboxing) (and/or per‑agent sandbox config).
+When sandboxing is enabled and `workspaceAccess` is not `"rw"`, tools operate
+inside a sandbox workspace under `~/.clawdbot/sandboxes`, not your host workspace.
+
+## Default location
+
+- Default: `~/clawd`
+- If `CLAWDBOT_PROFILE` is set and not `"default"`, the default becomes
+  `~/clawd-<profile>`.
+- Override in `~/.clawdbot/moltbot.json`:
+
+```json5
+{
+  agents: {
+    defaults: {
+      workspace: "~/clawd"
+    }
+  }
+}
+```
+
+`moltbot onboard`, `moltbot configure`, or `moltbot setup` will create the
+workspace and seed the bootstrap files if they are missing.
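+
+The location rules above fit in a small resolver. This sketch rests on several assumptions. `resolveWorkspaceDir` is a hypothetical name, and the home directory and profile are passed in rather than read from the OS and `CLAWDBOT_PROFILE`. A configured workspace is assumed to take precedence over the profile default.
+
+```typescript
+function resolveWorkspaceDir(
+  homeDir: string,
+  opts: { configured?: string; profile?: string } = {},
+): string {
+  if (opts.configured) {
+    // Expand a leading "~" the way the config examples use it.
+    return opts.configured.replace(/^~(?=$|\/)/, () => homeDir);
+  }
+  const profile = opts.profile;
+  // Unset or "default" profile -> ~/clawd; any other profile -> ~/clawd-<profile>.
+  return profile && profile !== "default" ? `${homeDir}/clawd-${profile}` : `${homeDir}/clawd`;
+}
+```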
+
+If you already manage the workspace files yourself, you can disable bootstrap
+file creation:
+
+```json5
+{ agent: { skipBootstrap: true } }
+```
+
+## Extra workspace folders
+
+Older installs may have created `~/moltbot`. Keeping multiple workspace
+directories around can cause confusing auth or state drift, because only one
+workspace is active at a time.
+
+**Recommendation:** keep a single active workspace. If you no longer use the
+extra folders, archive or move them to Trash (for example `trash ~/moltbot`).
+If you intentionally keep multiple workspaces, make sure
+`agents.defaults.workspace` points to the active one.
+
+`moltbot doctor` warns when it detects extra workspace directories.
+
+## Workspace file map (what each file means)
+
+These are the standard files Moltbot expects inside the workspace:
+
+- `AGENTS.md`
+  - Operating instructions for the agent and how it should use memory.
+  - Loaded at the start of every session.
+  - Good place for rules, priorities, and "how to behave" details.
+
+- `SOUL.md`
+  - Persona, tone, and boundaries.
+  - Loaded every session.
+
+- `USER.md`
+  - Who the user is and how to address them.
+  - Loaded every session.
+
+- `IDENTITY.md`
+  - The agent's name, vibe, and emoji.
+  - Created/updated during the bootstrap ritual.
+
+- `TOOLS.md`
+  - Notes about your local tools and conventions.
+  - Does not control tool availability; it is only guidance.
+
+- `HEARTBEAT.md`
+  - Optional tiny checklist for heartbeat runs.
+  - Keep it short to avoid token burn.
+
+- `BOOT.md`
+  - Optional startup checklist executed on gateway restart when internal hooks are enabled.
+  - Keep it short; use the message tool for outbound sends.
+
+- `BOOTSTRAP.md`
+  - One-time first-run ritual.
+  - Only created for a brand-new workspace.
+  - Delete it after the ritual is complete.
+
+- `memory/YYYY-MM-DD.md`
+  - Daily memory log (one file per day).
+  - Recommended to read today + yesterday on session start.
+
+- `MEMORY.md` (optional)
+  - Curated long-term memory.
+  - Only load in the main, private session (not shared/group contexts).
+
+See [Memory](/concepts/memory) for the workflow and automatic memory flush.
+
+- `skills/` (optional)
+  - Workspace-specific skills.
+  - Overrides managed/bundled skills when names collide.
+
+- `canvas/` (optional)
+  - Canvas UI files for node displays (for example `canvas/index.html`).
+
+If any bootstrap file is missing, Moltbot injects a "missing file" marker into
+the session and continues. Large bootstrap files are truncated when injected;
+adjust the limit with `agents.defaults.bootstrapMaxChars` (default: 20000).
+`moltbot setup` can recreate missing defaults without overwriting existing
+files.
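+
+Per file, the missing and truncated behavior looks roughly like the following. The marker strings are hypothetical. Keeping only the head of an oversized file is also an assumption, and the real truncation strategy may differ.
+
+```typescript
+const DEFAULT_BOOTSTRAP_MAX_CHARS = 20_000; // agents.defaults.bootstrapMaxChars
+
+function renderBootstrapFile(
+  name: string,
+  content: string | undefined, // undefined = file missing
+  maxChars = DEFAULT_BOOTSTRAP_MAX_CHARS,
+): string {
+  if (content === undefined) return `[missing file: ${name}]`;
+  if (content.length <= maxChars) return content;
+  // Keep the head and flag the cut so the model knows to read the file.
+  return `${content.slice(0, maxChars)}\n[truncated: ${name}, read the file for full content]`;
+}
+```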
+
+## What is NOT in the workspace
+
+These live under `~/.clawdbot/` and should NOT be committed to the workspace repo:
+
+- `~/.clawdbot/moltbot.json` (config)
+- `~/.clawdbot/credentials/` (OAuth tokens, API keys)
+- `~/.clawdbot/agents/<agentId>/sessions/` (session transcripts + metadata)
+- `~/.clawdbot/skills/` (managed skills)
+
+If you need to migrate sessions or config, copy them separately and keep them
+out of version control.
+
+## Git backup (recommended, private)
+
+Treat the workspace as private memory. Put it in a **private** git repo so it is
+backed up and recoverable.
+
+Run these steps on the machine where the Gateway runs (that is where the
+workspace lives).
+
+### 1) Initialize the repo
+
+If git is installed, brand-new workspaces are initialized automatically. If this
+workspace is not already a repo, run:
+
+```bash
+cd ~/clawd
+git init
+git add AGENTS.md SOUL.md TOOLS.md IDENTITY.md USER.md HEARTBEAT.md memory/
+git commit -m "Add agent workspace"
+```
+
+### 2) Add a private remote (beginner-friendly options)
+
+Option A: GitHub web UI
+
+1. Create a new **private** repository on GitHub.
+2. Do not initialize with a README (avoids merge conflicts).
+3. Copy the HTTPS remote URL.
+4. Add the remote and push:
+
+```bash
+git branch -M main
+git remote add origin <https-url>
+git push -u origin main
+```
+
+Option B: GitHub CLI (`gh`)
+
+```bash
+gh auth login
+gh repo create clawd-workspace --private --source . --remote origin --push
+```
+
+Option C: GitLab web UI
+
+1. Create a new **private** repository on GitLab.
+2. Do not initialize with a README (avoids merge conflicts).
+3. Copy the HTTPS remote URL.
+4. Add the remote and push:
+
+```bash
+git branch -M main
+git remote add origin <https-url>
+git push -u origin main
+```
+
+### 3) Ongoing updates
+
+```bash
+git status
+git add .
+git commit -m "Update memory"
+git push
+```
+
+## Do not commit secrets
+
+Even in a private repo, avoid storing secrets in the workspace:
+
+- API keys, OAuth tokens, passwords, or private credentials.
+- Anything under `~/.clawdbot/`.
+- Raw dumps of chats or sensitive attachments.
+
+If you must store sensitive references, use placeholders and keep the real
+secret elsewhere (password manager, environment variables, or `~/.clawdbot/`).
+
+Suggested `.gitignore` starter:
+
+```gitignore
+.DS_Store
+.env
+**/*.key
+**/*.pem
+**/secrets*
+```
+
+## Moving the workspace to a new machine
+
+1. Clone the repo to the desired path (default `~/clawd`).
+2. Set `agents.defaults.workspace` to that path in `~/.clawdbot/moltbot.json`.
+3. Run `moltbot setup --workspace <path>` to seed any missing files.
+4. If you need sessions, copy `~/.clawdbot/agents/<agentId>/sessions/` from the
+   old machine separately.
+
+## Advanced notes
+
+- Multi-agent routing can use different workspaces per agent. See
+  [Channel routing](/concepts/channel-routing) for routing configuration.
+- If `agents.defaults.sandbox` is enabled, non-main sessions can use per-session sandbox
+  workspaces under `agents.defaults.sandbox.workspaceRoot`.
--- a/docker-compose/ez-assistant/docs/concepts/agent.md
+++ b/docker-compose/ez-assistant/docs/concepts/agent.md
@@ -0,0 +1,117 @@
+---
+summary: "Agent runtime (embedded p-mono), workspace contract, and session bootstrap"
+read_when:
+  - Changing agent runtime, workspace bootstrap, or session behavior
+---
+# Agent Runtime 🤖
+
+Moltbot runs a single embedded agent runtime derived from **p-mono**.
+
+## Workspace (required)
+
+Moltbot uses a single agent workspace directory (`agents.defaults.workspace`) as the agent’s **only** working directory (`cwd`) for tools and context.
+
+Recommended: use `moltbot setup` to create `~/.clawdbot/moltbot.json` if missing and initialize the workspace files.
+
+Full workspace layout + backup guide: [Agent workspace](/concepts/agent-workspace)
+
+If `agents.defaults.sandbox` is enabled, non-main sessions can override this with
+per-session workspaces under `agents.defaults.sandbox.workspaceRoot` (see
+[Gateway configuration](/gateway/configuration)).
+
+## Bootstrap files (injected)
+
+Inside `agents.defaults.workspace`, Moltbot expects these user-editable files:
+- `AGENTS.md` — operating instructions + “memory”
+- `SOUL.md` — persona, boundaries, tone
+- `TOOLS.md` — user-maintained tool notes (e.g. `imsg`, `sag`, conventions)
+- `BOOTSTRAP.md` — one-time first-run ritual (deleted after completion)
+- `IDENTITY.md` — agent name/vibe/emoji
+- `USER.md` — user profile + preferred address
+
+On the first turn of a new session, Moltbot injects the contents of these files directly into the agent context.
+
+Blank files are skipped. Large files are truncated with a marker so prompts stay lean; the agent can read the file itself for the full content.
+
+If a file is missing, Moltbot injects a single “missing file” marker line (and `moltbot setup` will create a safe default template).
+
+`BOOTSTRAP.md` is only created for a **brand new workspace** (no other bootstrap files present). If you delete it after completing the ritual, it should not be recreated on later restarts.
+
+To disable bootstrap file creation entirely (for pre-seeded workspaces), set:
+
+```json5
+{ agent: { skipBootstrap: true } }
+```
+
+## Built-in tools
+
+Core tools (read/exec/edit/write and related system tools) are always available,
+subject to tool policy. `apply_patch` is optional and gated by
+`tools.exec.applyPatch`. `TOOLS.md` does **not** control which tools exist; it’s
+guidance for how *you* want them used.
+
+## Skills
+
+Moltbot loads skills from three locations (workspace wins on name conflict):
+- Bundled (shipped with the install)
+- Managed/local: `~/.clawdbot/skills`
+- Workspace: `<workspace>/skills`
+
+Skills can be gated by config/env (see `skills` in [Gateway configuration](/gateway/configuration)).
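+
+Name-collision precedence can be sketched with a map where later sources overwrite earlier ones. This doc states that workspace skills win. Ranking managed skills above bundled ones is an assumption, and `Skill`/`mergeSkills` are hypothetical names.
+
+```typescript
+type Skill = { name: string; source: "bundled" | "managed" | "workspace" };
+
+// Later sources override earlier ones on a name collision.
+function mergeSkills(bundled: Skill[], managed: Skill[], workspace: Skill[]): Map<string, Skill> {
+  const byName = new Map<string, Skill>();
+  for (const list of [bundled, managed, workspace]) {
+    for (const skill of list) byName.set(skill.name, skill);
+  }
+  return byName;
+}
+```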
+
+## p-mono integration
+
+Moltbot reuses pieces of the p-mono codebase (models/tools), but **session management, discovery, and tool wiring are Moltbot-owned**.
+
+- No p-coding agent runtime.
+- No `~/.pi/agent` or `<workspace>/.pi` settings are consulted.
+
+## Sessions
+
+Session transcripts are stored as JSONL at:
+- `~/.clawdbot/agents/<agentId>/sessions/<SessionId>.jsonl`
+
+The session ID is stable and chosen by Moltbot.
+Legacy Pi/Tau session folders are **not** read.
+
+## Steering while streaming
+
+When queue mode is `steer`, inbound messages are injected into the current run.
+The queue is checked **after each tool call**; if a queued message is present,
+remaining tool calls from the current assistant message are skipped (error tool
+results with "Skipped due to queued user message."), then the queued user
+message is injected before the next assistant response.
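+
+The following is a synchronous sketch of the steer check. `runToolCallsWithSteering` and its callback parameters are hypothetical. Only the skip message text comes from this doc, and real tool calls are asynchronous.
+
+```typescript
+type ToolCall = { id: string; name: string };
+type ToolResult = { id: string; isError: boolean; text: string };
+
+const SKIPPED = "Skipped due to queued user message.";
+
+// Run one assistant message's tool calls in order, checking the steer queue
+// after each call. Once a message is queued, remaining calls get error
+// results instead of running; the queued messages are returned for injection.
+function runToolCallsWithSteering(
+  calls: ToolCall[],
+  execute: (call: ToolCall) => string,
+  drainQueue: () => string[],
+): { results: ToolResult[]; injected: string[] } {
+  const results: ToolResult[] = [];
+  let injected: string[] = [];
+  for (const call of calls) {
+    if (injected.length > 0) {
+      results.push({ id: call.id, isError: true, text: SKIPPED });
+      continue;
+    }
+    results.push({ id: call.id, isError: false, text: execute(call) });
+    injected = drainQueue(); // the queue is checked after each tool call
+  }
+  return { results, injected };
+}
+```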
+
+When queue mode is `followup` or `collect`, inbound messages are held until the
+current turn ends, then a new agent turn starts with the queued payloads. See
+[Queue](/concepts/queue) for mode + debounce/cap behavior.
+
+Block streaming sends completed assistant blocks as soon as they finish; it is
+**off by default** (`agents.defaults.blockStreamingDefault: "off"`).
+Tune the boundary via `agents.defaults.blockStreamingBreak` (`text_end` vs `message_end`; defaults to `text_end`).
+Control soft block chunking with `agents.defaults.blockStreamingChunk` (defaults to
+800–1200 chars; prefers paragraph breaks, then newlines; sentences last).
+Coalesce streamed chunks with `agents.defaults.blockStreamingCoalesce` to reduce
+single-line spam (idle-based merging before send). Non-Telegram channels require
+explicit `*.blockStreaming: true` to enable block replies.
+Verbose tool summaries are emitted at tool start (no debounce); Control UI
+streams tool output via agent events when available.
+More details: [Streaming + chunking](/concepts/streaming).
+
+## Model refs
+
+Model refs in config (for example `agents.defaults.model` and `agents.defaults.models`) are parsed by splitting on the **first** `/`.
+
+- Use `provider/model` when configuring models.
+- If the model ID itself contains `/` (OpenRouter-style), include the provider prefix (example: `openrouter/moonshotai/kimi-k2`).
+- If you omit the provider, Moltbot treats the input as an alias or a model for the **default provider** (only works when there is no `/` in the model ID).
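+
+The first-`/` rule can be sketched as follows. `parseModelRef`, the `defaultProvider` argument, and the alias map are illustrative stand-ins, not Moltbot's API.
+
+```typescript
+type ModelRef = { provider: string; model: string };
+
+// Split on the first "/" only, so OpenRouter-style IDs keep their own slashes.
+function parseModelRef(
+  raw: string,
+  defaultProvider: string,
+  aliases: Map<string, ModelRef> = new Map(),
+): ModelRef {
+  const ref = raw.trim();
+  const slash = ref.indexOf("/");
+  if (slash === -1) {
+    // No provider prefix: try an alias, else assume the default provider.
+    return aliases.get(ref) ?? { provider: defaultProvider, model: ref };
+  }
+  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
+}
+```
+
+Without the prefix, `moonshotai/kimi-k2` would parse as provider `moonshotai`. That is why OpenRouter-style IDs need the full `openrouter/...` form.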
+
+## Configuration (minimal)
+
+At minimum, set:
+- `agents.defaults.workspace`
+- `channels.whatsapp.allowFrom` (strongly recommended)
+
+---
+
+*Next: [Group Chats](/concepts/group-messages)* 🦞
--- a/docker-compose/ez-assistant/docs/concepts/architecture.md
+++ b/docker-compose/ez-assistant/docs/concepts/architecture.md
@@ -0,0 +1,122 @@
+---
+summary: "WebSocket gateway architecture, components, and client flows"
+read_when:
+  - Working on gateway protocol, clients, or transports
+---
+# Gateway architecture
+
+Last updated: 2026-01-22
+
+## Overview
+
+- A single long‑lived **Gateway** owns all messaging surfaces (WhatsApp via
+  Baileys, Telegram via grammY, Slack, Discord, Signal, iMessage, WebChat).
+- Control-plane clients (macOS app, CLI, web UI, automations) connect to the
+  Gateway over **WebSocket** on the configured bind host (default
+  `127.0.0.1:18789`).
+- **Nodes** (macOS/iOS/Android/headless) also connect over **WebSocket**, but
+  declare `role: node` with explicit caps/commands.
+- One Gateway per host; it is the only place that opens a WhatsApp session.
+- A **canvas host** (default `18793`) serves agent‑editable HTML and A2UI.
+
+## Components and flows
+
+### Gateway (daemon)
+- Maintains provider connections.
+- Exposes a typed WS API (requests, responses, server‑push events).
+- Validates inbound frames against JSON Schema.
+- Emits events like `agent`, `chat`, `presence`, `health`, `heartbeat`, `cron`.
+
+### Clients (mac app / CLI / web admin)
+- One WS connection per client.
+- Send requests (`health`, `status`, `send`, `agent`, `system-presence`).
+- Subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
+
+### Nodes (macOS / iOS / Android / headless)
+- Connect to the **same WS server** with `role: node`.
+- Provide a device identity in `connect`; pairing is **device‑based** (role `node`) and
+  approval lives in the device pairing store.
+- Expose commands like `canvas.*`, `camera.*`, `screen.record`, `location.get`.
+
+Protocol details:
+- [Gateway protocol](/gateway/protocol)
+
+### WebChat
+- Static UI that uses the Gateway WS API for chat history and sends.
+- In remote setups, connects through the same SSH/Tailscale tunnel as other
+  clients.
+
+## Connection lifecycle (single client)
+
+```
+Client                    Gateway
+  |                          |
+  |---- req:connect -------->|
+  |<------ res (ok) ---------|   (or res error + close)
+  |   (payload=hello-ok carries snapshot: presence + health)
+  |                          |
+  |<------ event:presence ---|
+  |<------ event:tick -------|
+  |                          |
+  |------- req:agent ------->|
+  |<------ res:agent --------|   (ack: {runId,status:"accepted"})
+  |<------ event:agent ------|   (streaming)
+  |<------ res:agent --------|   (final: {runId,status,summary})
+  |                          |
+```
+
+## Wire protocol (summary)
+
+- Transport: WebSocket, text frames with JSON payloads.
+- First frame **must** be `connect`.
+- After handshake:
+  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
+  - Events: `{type:"event", event, payload, seq?, stateVersion?}`
+- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token`
+  must match or the socket closes.
+- Idempotency keys are required for side‑effecting methods (`send`, `agent`) to
+  safely retry; the server keeps a short‑lived dedupe cache.
+- Nodes must include `role: "node"` plus caps/commands/permissions in `connect`.
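+
+The frame shapes and the first-frame rule can be written down as TypeScript types plus a guard. This is a sketch, not the generated protocol types. The field types are assumptions, and the auth-token and device-identity checks are omitted.
+
+```typescript
+type ReqFrame = { type: "req"; id: string; method: string; params?: unknown };
+type ResFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown };
+type EventFrame = { type: "event"; event: string; payload?: unknown; seq?: number; stateVersion?: number };
+type Frame = ReqFrame | ResFrame | EventFrame;
+
+// The first text frame on a socket must be JSON and a `connect` request.
+// Returns the frame to continue the handshake, or null to close the socket.
+function acceptFirstFrame(raw: string): ReqFrame | null {
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(raw);
+  } catch {
+    return null; // non-JSON first frame
+  }
+  if (typeof parsed !== "object" || parsed === null) return null;
+  const f = parsed as Partial<ReqFrame>;
+  if (f.type !== "req" || f.method !== "connect" || typeof f.id !== "string") return null;
+  return f as ReqFrame;
+}
+
+// One-line summary of a frame, handy for logging.
+function describe(frame: Frame): string {
+  switch (frame.type) {
+    case "req":
+      return `req ${frame.method} #${frame.id}`;
+    case "res":
+      return `res #${frame.id} ok=${frame.ok}`;
+    default:
+      return `event ${frame.event}`;
+  }
+}
+```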
+
+## Pairing + local trust
+
+- All WS clients (operators + nodes) include a **device identity** on `connect`.
+- New device IDs require pairing approval; the Gateway issues a **device token**
+  for subsequent connects.
+- **Local** connects (loopback or the gateway host’s own tailnet address) can be
+  auto‑approved to keep same‑host UX smooth.
+- **Non‑local** connects must sign the `connect.challenge` nonce and require
+  explicit approval.
+- Gateway auth (`gateway.auth.*`) still applies to **all** connections, local or
+  remote.
+
+Details: [Gateway protocol](/gateway/protocol), [Pairing](/start/pairing),
+[Security](/gateway/security).
+
+## Protocol typing and codegen
+
+- TypeBox schemas define the protocol.
+- JSON Schema is generated from those schemas.
+- Swift models are generated from the JSON Schema.
+
+## Remote access
+
+- Preferred: Tailscale or VPN.
+- Alternative: an SSH tunnel, for example:
+  ```bash
+  ssh -N -L 18789:127.0.0.1:18789 user@host
+  ```
+- The same handshake + auth token apply over the tunnel.
+- TLS + optional pinning can be enabled for WS in remote setups.
+
+## Operations snapshot
+
+- Start: `moltbot gateway` (foreground, logs to stdout).
+- Health: `health` over WS (also included in `hello-ok`).
+- Supervision: launchd/systemd for auto‑restart.
+
+## Invariants
+
+- Exactly one Gateway controls a single Baileys session per host.
+- Handshake is mandatory; any non‑JSON or non‑connect first frame is a hard close.
+- Events are not replayed; clients must refresh on gaps.
--- a/docker-compose/ez-assistant/docs/concepts/channel-routing.md
+++ b/docker-compose/ez-assistant/docs/concepts/channel-routing.md
@@ -0,0 +1,114 @@
+---
+summary: "Routing rules per channel (WhatsApp, Telegram, Discord, Slack) and shared context"
+read_when:
+  - Changing channel routing or inbox behavior
+---
+# Channels & routing
+
+
+Moltbot routes replies **back to the channel where a message came from**. The
+model does not choose a channel; routing is deterministic and controlled by the
+host configuration.
+
+## Key terms
+
+- **Channel**: `whatsapp`, `telegram`, `discord`, `slack`, `signal`, `imessage`, `webchat`.
+- **AccountId**: per‑channel account instance (when supported).
+- **AgentId**: an isolated workspace + session store (“brain”).
+- **SessionKey**: the bucket key used to store context and control concurrency.
+
+## Session key shapes (examples)
+
+Direct messages collapse to the agent’s **main** session:
+
+- `agent:<agentId>:<mainKey>` (default: `agent:main:main`)
+
+Groups and channels remain isolated per channel:
+
+- Groups: `agent:<agentId>:<channel>:group:<id>`
+- Channels/rooms: `agent:<agentId>:<channel>:channel:<id>`
+
+Threads:
+
+- Slack/Discord threads append `:thread:<threadId>` to the base key.
+- Telegram forum topics embed `:topic:<topicId>` in the group key.
+
+Examples:
+
+- `agent:main:telegram:group:-1001234567890:topic:42`
+- `agent:main:discord:channel:123456:thread:987654`
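+
+The key shapes above can be built with a small helper. `Peer` and `sessionKey` are hypothetical names, and the output matches the example keys in this section.
+
+```typescript
+type Peer =
+  | { kind: "direct" }
+  | { kind: "group" | "channel"; channel: string; id: string; topicId?: string; threadId?: string };
+
+function sessionKey(agentId: string, peer: Peer, mainKey = "main"): string {
+  // Direct messages collapse to the agent's main session.
+  if (peer.kind === "direct") return `agent:${agentId}:${mainKey}`;
+  let key = `agent:${agentId}:${peer.channel}:${peer.kind}:${peer.id}`;
+  if (peer.topicId) key += `:topic:${peer.topicId}`; // Telegram forum topic
+  if (peer.threadId) key += `:thread:${peer.threadId}`; // Slack/Discord thread
+  return key;
+}
+```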
+
+## Routing rules (how an agent is chosen)
+
+Routing picks **one agent** for each inbound message:
+
+1. **Exact peer match** (`bindings` with `peer.kind` + `peer.id`).
+2. **Guild match** (Discord) via `guildId`.
+3. **Team match** (Slack) via `teamId`.
+4. **Account match** (`accountId` on the channel).
+5. **Channel match** (any account on that channel).
+6. **Default agent** (`agents.list[].default`, else first list entry, fallback to `main`).
+
+The matched agent determines which workspace and session store are used.
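+
+One way to implement this tiered choice is to keep only the bindings whose specified fields all match, then prefer the most specific tier. The types and `routeAgent` are hypothetical, and resolving ties by list order is an assumption.
+
+```typescript
+type PeerRef = { kind: string; id: string };
+type Binding = {
+  agentId: string;
+  match: { channel: string; accountId?: string; guildId?: string; teamId?: string; peer?: PeerRef };
+};
+type Inbound = { channel: string; accountId?: string; guildId?: string; teamId?: string; peer: PeerRef };
+type AgentDef = { id: string; default?: boolean };
+
+// Lower number = higher priority, mirroring tiers 1-5 above.
+function specificity(b: Binding): number {
+  if (b.match.peer) return 0;
+  if (b.match.guildId) return 1;
+  if (b.match.teamId) return 2;
+  if (b.match.accountId) return 3;
+  return 4; // channel-only
+}
+
+// A binding fits when every field it specifies equals the message's value.
+function fits(b: Binding, msg: Inbound): boolean {
+  const m = b.match;
+  return (
+    m.channel === msg.channel &&
+    (m.peer === undefined || (m.peer.kind === msg.peer.kind && m.peer.id === msg.peer.id)) &&
+    (m.guildId === undefined || m.guildId === msg.guildId) &&
+    (m.teamId === undefined || m.teamId === msg.teamId) &&
+    (m.accountId === undefined || m.accountId === msg.accountId)
+  );
+}
+
+function routeAgent(msg: Inbound, bindings: Binding[], agents: AgentDef[]): string {
+  let best: Binding | undefined;
+  for (const b of bindings) {
+    if (!fits(b, msg)) continue;
+    if (!best || specificity(b) < specificity(best)) best = b; // first wins on ties
+  }
+  if (best) return best.agentId;
+  // Tier 6: default agent, else first entry, else "main".
+  return agents.find((a) => a.default)?.id ?? agents[0]?.id ?? "main";
+}
+```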
+
+## Broadcast groups (run multiple agents)
+
+Broadcast groups let you run **multiple agents** for the same peer **when Moltbot would normally reply** (for example: in WhatsApp groups, after mention/activation gating).
+
+Config:
+
+```json5
+{
+  broadcast: {
+    strategy: "parallel",
+    "120363403215116621@g.us": ["alfred", "baerbel"],
+    "+15555550123": ["support", "logger"]
+  }
+}
+```
+
+See: [Broadcast Groups](/broadcast-groups).
+
+## Config overview
+
+- `agents.list`: named agent definitions (workspace, model, etc.).
+- `bindings`: map inbound channels/accounts/peers to agents.
+
+Example:
+
+```json5
+{
+  agents: {
+    list: [
+      { id: "support", name: "Support", workspace: "~/clawd-support" }
+    ]
+  },
+  bindings: [
+    { match: { channel: "slack", teamId: "T123" }, agentId: "support" },
+    { match: { channel: "telegram", peer: { kind: "group", id: "-100123" } }, agentId: "support" }
+  ]
+}
+```
+
+## Session storage
+
+Session stores live under the state directory (default `~/.clawdbot`):
+
+- `~/.clawdbot/agents/<agentId>/sessions/sessions.json`
+- JSONL transcripts live alongside the store
+
+You can override the store path via `session.store` and `{agentId}` templating.
+
+## WebChat behavior
+
+WebChat attaches to the **selected agent** and defaults to the agent’s main
+session. Because of this, WebChat lets you see cross‑channel context for that
+agent in one place.
+
+## Reply context
+
+Inbound replies include:
+- `ReplyToId`, `ReplyToBody`, and `ReplyToSender` when available.
+- Quoted context is appended to `Body` as a `[Replying to ...]` block.
+
+This is consistent across channels.
--- a/docker-compose/ez-assistant/docs/concepts/compaction.md
+++ b/docker-compose/ez-assistant/docs/concepts/compaction.md
@@ -0,0 +1,49 @@
+---
+summary: "Context window + compaction: how Moltbot keeps sessions under model limits"
+read_when:
+  - You want to understand auto-compaction and /compact
+  - You are debugging long sessions hitting context limits
+---
+# Context Window & Compaction
+
+Every model has a **context window** (max tokens it can see). Long-running chats accumulate messages and tool results; once the window is tight, Moltbot **compacts** older history to stay within limits.
+
+## What compaction is
+Compaction **summarizes older conversation** into a compact summary entry and keeps recent messages intact. The summary is stored in the session history, so future requests use:
+- The compaction summary
+- Recent messages after the compaction point
+
+Compaction **persists** in the session’s JSONL history.
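+
+The following sketches which entries a later request is built from. It assumes the summary is stored as its own history entry at the compaction point. `Entry` and `contextAfterCompaction` are hypothetical names.
+
+```typescript
+type Entry =
+  | { kind: "message"; role: "user" | "assistant"; text: string }
+  | { kind: "compaction"; summary: string };
+
+// The latest compaction summary plus everything after it.
+// With no compaction yet, the whole history is used.
+function contextAfterCompaction(history: Entry[]): Entry[] {
+  let last = -1;
+  for (let i = 0; i < history.length; i++) {
+    if (history[i].kind === "compaction") last = i;
+  }
+  return last === -1 ? history : history.slice(last);
+}
+```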
+
+## Configuration
+See [Compaction config & modes](/concepts/compaction) for the `agents.defaults.compaction` settings.
+
+## Auto-compaction (default on)
+When a session nears or exceeds the model’s context window, Moltbot triggers auto-compaction and may retry the original request using the compacted context.
+
+You’ll see:
+- `🧹 Auto-compaction complete` in verbose mode
+- `/status` showing `🧹 Compactions: <count>`
+
+Before compaction, Moltbot can run a **silent memory flush** turn to store
+durable notes to disk. See [Memory](/concepts/memory) for details and config.
+
+## Manual compaction
+Use `/compact` (optionally with instructions) to force a compaction pass:
+```
+/compact Focus on decisions and open questions
+```
+
+## Context window source
+Context window is model-specific. Moltbot uses the model definition from the configured provider catalog to determine limits.
+
+## Compaction vs pruning
+- **Compaction**: summarizes and **persists** in JSONL.
+- **Session pruning**: trims old **tool results** only, **in-memory**, per request.
+
+See [/concepts/session-pruning](/concepts/session-pruning) for pruning details.
+
+## Tips
+- Use `/compact` when sessions feel stale or context is bloated.
+- Large tool outputs are already truncated; pruning can further reduce tool-result buildup.
+- If you need a fresh slate, `/new` or `/reset` starts a new session id.
--- a/docker-compose/ez-assistant/docs/concepts/context.md
+++ b/docker-compose/ez-assistant/docs/concepts/context.md
@@ -0,0 +1,151 @@
+---
+summary: "Context: what the model sees, how it is built, and how to inspect it"
+read_when:
+  - You want to understand what “context” means in Moltbot
+  - You are debugging why the model “knows” something (or forgot it)
+  - You want to reduce context overhead (/context, /status, /compact)
+---
+# Context
+
+“Context” is **everything Moltbot sends to the model for a run**. It is bounded by the model’s **context window** (token limit).
+
+Beginner mental model:
+- **System prompt** (Moltbot-built): rules, tools, skills list, time/runtime, and injected workspace files.
+- **Conversation history**: your messages + the assistant’s messages for this session.
+- **Tool calls/results + attachments**: command output, file reads, images/audio, etc.
+
+Context is *not the same thing* as “memory”: memory can be stored on disk and reloaded later; context is what’s inside the model’s current window.
+
+## Quick start (inspect context)
+
+- `/status` → quick “how full is my window?” view + session settings.
+- `/context list` → what’s injected + rough sizes (per file + totals).
+- `/context detail` → deeper breakdown: per-file, per-tool schema sizes, per-skill entry sizes, and system prompt size.
+- `/usage tokens` → append per-reply usage footer to normal replies.
+- `/compact` → summarize older history into a compact entry to free window space.
+
+See also: [Slash commands](/tools/slash-commands), [Token use & costs](/token-use), [Compaction](/concepts/compaction).
+
+## Example output
+
+Values vary by model, provider, tool policy, and what’s in your workspace.
+
+### `/context list`
+
+```
+🧠 Context breakdown
+Workspace: <workspaceDir>
+Bootstrap max/file: 20,000 chars
+Sandbox: mode=non-main sandboxed=false
+System prompt (run): 38,412 chars (~9,603 tok) (Project Context 23,901 chars (~5,976 tok))
+
+Injected workspace files:
+- AGENTS.md: OK | raw 1,742 chars (~436 tok) | injected 1,742 chars (~436 tok)
+- SOUL.md: OK | raw 912 chars (~228 tok) | injected 912 chars (~228 tok)
+- TOOLS.md: TRUNCATED | raw 54,210 chars (~13,553 tok) | injected 20,962 chars (~5,241 tok)
+- IDENTITY.md: OK | raw 211 chars (~53 tok) | injected 211 chars (~53 tok)
+- USER.md: OK | raw 388 chars (~97 tok) | injected 388 chars (~97 tok)
+- HEARTBEAT.md: MISSING | raw 0 | injected 0
+- BOOTSTRAP.md: OK | raw 0 chars (~0 tok) | injected 0 chars (~0 tok)
+
+Skills list (system prompt text): 2,184 chars (~546 tok) (12 skills)
+Tools: read, edit, write, exec, process, browser, message, sessions_send, …
+Tool list (system prompt text): 1,032 chars (~258 tok)
+Tool schemas (JSON): 31,988 chars (~7,997 tok) (counts toward context; not shown as text)
+Tools: (same as above)
+
+Session tokens (cached): 14,250 total / ctx=32,000
+```
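
The `~N tok` figures in this sample match a rough heuristic of 4 characters per token, rounded up. For example, 38,412 / 4 = 9,603, and 211 / 4 = 52.75, which is shown as 53. This is an inference from the sample numbers, not a documented formula, and real tokenizers vary by model. The hypothetical `estimateTokens` helper below reproduces the heuristic for quick sanity checks:

```typescript
// Hypothetical helper: approximates the "~N tok" figures shown by /context,
// assuming ~4 characters per token, rounded up. Real tokenizers vary by model.
function estimateTokens(chars: number): number {
  return Math.ceil(chars / 4);
}

// Formats a size the way the sample output does, e.g. "1,742 chars (~436 tok)".
function formatSize(chars: number): string {
  return `${chars.toLocaleString("en-US")} chars (~${estimateTokens(chars).toLocaleString("en-US")} tok)`;
}

console.log(formatSize(38412)); // 38,412 chars (~9,603 tok)
console.log(formatSize(211)); // 211 chars (~53 tok)
```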
+
+### `/context detail`
+
+```
+🧠 Context breakdown (detailed)
+…
+Top skills (prompt entry size):
+- frontend-design: 412 chars (~103 tok)
+- oracle: 401 chars (~101 tok)
+… (+10 more skills)
+
+Top tools (schema size):
+- browser: 9,812 chars (~2,453 tok)
+- exec: 6,240 chars (~1,560 tok)
+… (+N more tools)
+```
+
+## What counts toward the context window
+
+Everything the model receives counts, including:
+- System prompt (all sections).
+- Conversation history.
+- Tool calls + tool results.
+- Attachments/transcripts (images/audio/files).
+- Compaction summaries and pruning artifacts.
+- Provider “wrappers” or hidden headers (not visible, still counted).
+
+## How Moltbot builds the system prompt
+
+The system prompt is **Moltbot-owned** and rebuilt each run. It includes:
+- Tool list + short descriptions.
+- Skills list (metadata only; see below).
+- Workspace location.
+- Time (UTC + converted user time if configured).
+- Runtime metadata (host/OS/model/thinking).
+- Injected workspace bootstrap files under **Project Context**.
+
+Full breakdown: [System Prompt](/concepts/system-prompt).
+
+## Injected workspace files (Project Context)
+
+By default, Moltbot injects a fixed set of workspace files (if present):
+- `AGENTS.md`
+- `SOUL.md`
+- `TOOLS.md`
+- `IDENTITY.md`
+- `USER.md`
+- `HEARTBEAT.md`
+- `BOOTSTRAP.md` (first-run only)
+
+Large files are truncated per-file using `agents.defaults.bootstrapMaxChars` (default `20000` chars). `/context` shows **raw vs injected** sizes and whether truncation happened.
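
Here is a minimal sketch of the raw-vs-injected bookkeeping, assuming a plain prefix cut at `bootstrapMaxChars`. The sample output above reports 20,962 injected chars for a truncated file against a 20,000 cap, so the real injector adds something this sketch does not model. `injectFile` and the `InjectedFile` shape are hypothetical.

```typescript
// Hypothetical shapes; not Moltbot's actual types.
type Status = "OK" | "TRUNCATED" | "MISSING";

interface InjectedFile {
  name: string;
  status: Status;
  rawChars: number;
  injectedChars: number;
  text: string;
}

// `content === undefined` models a file that does not exist in the workspace.
function injectFile(name: string, content: string | undefined, maxChars = 20000): InjectedFile {
  if (content === undefined) {
    return { name, status: "MISSING", rawChars: 0, injectedChars: 0, text: "" };
  }
  const truncated = content.length > maxChars;
  // Plain prefix cut; any marker text the real injector adds is not modeled.
  const text = truncated ? content.slice(0, maxChars) : content;
  return {
    name,
    status: truncated ? "TRUNCATED" : "OK",
    rawChars: content.length,
    injectedChars: text.length,
    text,
  };
}
```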
+
+## Skills: what’s injected vs loaded on-demand
+
+The system prompt includes a compact **skills list** (name + description + location). It is rebuilt into every run’s system prompt, so it has a real token cost (`/context list` reports it as “Skills list”).
+
+Skill instructions are *not* included by default. The model is expected to `read` the skill’s `SKILL.md` **only when needed**.
+
+## Tools: there are two costs
+
+Tools affect context in two ways:
+1) **Tool list text** in the system prompt (what you see as “Tooling”).
+2) **Tool schemas** (JSON). These are sent to the model so it can call tools. They count toward context even though you don’t see them as plain text.
+
+`/context detail` breaks down the biggest tool schemas so you can see what dominates.
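
To compare the two costs for your own tool definitions, measure the description line and the serialized schema separately. The sketch below assumes a simple `ToolDef` shape and a `- name: description` list line. Both are illustrative only, and the exact serialization Moltbot sends to each provider may differ.

```typescript
// Hypothetical tool definition shape for illustration only.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema object
}

interface ToolCost {
  name: string;
  listChars: number; // text visible in the system prompt's tool list
  schemaChars: number; // JSON schema sent alongside the prompt (not shown as text)
}

function toolCosts(tools: ToolDef[]): ToolCost[] {
  return tools
    .map((t) => ({
      name: t.name,
      listChars: `- ${t.name}: ${t.description}`.length,
      schemaChars: JSON.stringify({ name: t.name, description: t.description, parameters: t.parameters }).length,
    }))
    // Largest schemas first, like the "Top tools (schema size)" section.
    .sort((a, b) => b.schemaChars - a.schemaChars);
}
```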
+
+## Commands, directives, and “inline shortcuts”
+
+Slash commands are handled by the Gateway. There are a few different behaviors:
+- **Standalone commands**: a message that is only `/...` runs as a command.
+- **Directives**: `/think`, `/verbose`, `/reasoning`, `/elevated`, `/model`, `/queue` are stripped before the model sees the message.
+  - Directive-only messages persist session settings.
+  - Inline directives in a normal message act as per-message hints.
+- **Inline shortcuts** (allowlisted senders only): certain `/...` tokens inside a normal message can run immediately (example: “hey /status”), and are stripped before the model sees the remaining text.
+
+Details: [Slash commands](/tools/slash-commands).
+
+## Sessions, compaction, and pruning (what persists)
+
+What persists across messages depends on the mechanism:
+- **Normal history** persists in the session transcript; compaction can later fold older parts of it into a summary.
+- **Compaction** persists a summary into the transcript and keeps recent messages intact.
+- **Pruning** removes old tool results from the *in-memory* prompt for a run, but does not rewrite the transcript.
+
+Docs: [Session](/concepts/session), [Compaction](/concepts/compaction), [Session pruning](/concepts/session-pruning).
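
The key difference is which copy of the history changes. Below is a minimal sketch that assumes a simplified message shape and a hypothetical `keepLastToolResults` cutoff; the session-pruning doc describes the real policy. Pruning builds a new in-memory array for the run and leaves the stored transcript untouched.

```typescript
// Simplified, hypothetical message shape.
interface Msg {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Replaces all but the most recent `keepLastToolResults` tool results with a
// placeholder (placeholder text is illustrative). Returns a new array; the
// transcript passed in is never mutated.
function pruneForRun(transcript: readonly Msg[], keepLastToolResults: number): Msg[] {
  let toolSeen = 0;
  const out: Msg[] = [];
  // Walk newest to oldest so the most recent tool results are the ones kept.
  for (let i = transcript.length - 1; i >= 0; i--) {
    const m = transcript[i];
    if (m.role === "tool") {
      toolSeen++;
      if (toolSeen > keepLastToolResults) {
        out.push({ role: "tool", content: "[old tool result pruned]" });
        continue;
      }
    }
    out.push(m);
  }
  return out.reverse();
}
```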
+
+## What `/context` actually reports
+
+`/context` prefers the latest **run-built** system prompt report when available:
+- `System prompt (run)` = captured from the last embedded (tool-capable) run and persisted in the session store.
+- `System prompt (estimate)` = computed on the fly when no run report exists (or when running via a CLI backend that doesn’t generate the report).
+
+Either way, it reports sizes and top contributors; it does **not** dump the full system prompt or tool schemas.
--- a/docker-compose/ez-assistant/docs/concepts/group-messages.md
+++ b/docker-compose/ez-assistant/docs/concepts/group-messages.md
@@ -0,0 +1,78 @@
+---
+summary: "Behavior and config for WhatsApp group message handling (mentionPatterns are shared across surfaces)"
+read_when:
+  - Changing group message rules or mentions
+---
+# Group messages (WhatsApp web channel)
+
+Goal: let Clawd sit in WhatsApp groups, wake up only when pinged, and keep that thread separate from the personal DM session.
+
+Note: `agents.list[].groupChat.mentionPatterns` is now used by Telegram/Discord/Slack/iMessage as well; this doc focuses on WhatsApp-specific behavior. For multi-agent setups, set `agents.list[].groupChat.mentionPatterns` per agent (or use `messages.groupChat.mentionPatterns` as a global fallback).
+
+## What’s implemented (2025-12-03)
+- Activation modes: `mention` (default) or `always`. `mention` requires a ping (real WhatsApp @-mentions via `mentionedJids`, regex patterns, or the bot’s E.164 anywhere in the text). `always` wakes the agent on every message but it should reply only when it can add meaningful value; otherwise it returns the silent token `NO_REPLY`. Defaults can be set in config (`channels.whatsapp.groups`) and overridden per group via `/activation`. When `channels.whatsapp.groups` is set, it also acts as a group allowlist (include `"*"` to allow all).
+- Group policy: `channels.whatsapp.groupPolicy` controls whether group messages are accepted (`open|disabled|allowlist`). `allowlist` uses `channels.whatsapp.groupAllowFrom` (fallback: explicit `channels.whatsapp.allowFrom`). Default is `allowlist` (blocked until you add senders).
+- Per-group sessions: session keys look like `agent:<agentId>:whatsapp:group:<jid>` so commands such as `/verbose on` or `/think high` (sent as standalone messages) are scoped to that group; personal DM state is untouched. Heartbeats are skipped for group threads.
+- Context injection: **pending-only** group messages (default 50) that *did not* trigger a run are prefixed under `[Chat messages since your last reply - for context]`, with the triggering line under `[Current message - respond to this]`. Messages already in the session are not re-injected.
+- Sender surfacing: every group batch now ends with `[from: Sender Name (+E164)]` so the agent (Pi) knows who is speaking.
+- Ephemeral/view-once: we unwrap those before extracting text/mentions, so pings inside them still trigger.
+- Group system prompt: on the first turn of a group session (and whenever `/activation` changes the mode) we inject a short blurb into the system prompt like `You are replying inside the WhatsApp group "<subject>". Group members: Alice (+44...), Bob (+43...), … Activation: trigger-only … Address the specific sender noted in the message context.` If metadata isn’t available we still tell the agent it’s a group chat.
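
Here is a sketch of how the pending-history block could be assembled from the markers above. The marker strings, the `[from: …]` suffix, and the default limit of 50 come from this doc. The `GroupMsg` shape, the per-line `Name: text` layout, and the `buildGroupPrompt` name are assumptions, and the real formatter may differ in detail.

```typescript
// Hypothetical message shape for illustration.
interface GroupMsg {
  senderName: string;
  senderE164: string; // e.g. "+15551234567"
  text: string;
}

// Builds the text the agent sees for one triggered run. `pending` holds group
// messages that did NOT trigger a run since the last reply; only the newest
// `historyLimit` of them are injected (0 disables the history block).
function buildGroupPrompt(pending: GroupMsg[], current: GroupMsg, historyLimit = 50): string {
  const lines: string[] = [];
  const recent = historyLimit > 0 ? pending.slice(-historyLimit) : [];
  if (recent.length > 0) {
    lines.push("[Chat messages since your last reply - for context]");
    for (const m of recent) lines.push(`${m.senderName}: ${m.text}`);
    lines.push("");
  }
  lines.push("[Current message - respond to this]");
  lines.push(current.text);
  // Every batch ends with the sender marker so the agent knows who is speaking.
  lines.push(`[from: ${current.senderName} (${current.senderE164})]`);
  return lines.join("\n");
}
```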
+
+## Config example (WhatsApp)
+Add a `groupChat` block to `~/.clawdbot/moltbot.json` so display-name pings work even when WhatsApp strips the visual `@` in the text body:
+
+```json5
+{
+  channels: {
+    whatsapp: {
+      groups: {
+        "*": { requireMention: true }
+      }
+    }
+  },
+  agents: {
+    list: [
+      {
+        id: "main",
+        groupChat: {
+          historyLimit: 50,
+          mentionPatterns: [
+            "@?moltbot",
+            "\\+?15555550123"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+
+Notes:
+- The regexes are case-insensitive; they cover a display-name ping like `@moltbot` and the raw number with or without a leading `+`.
+- WhatsApp still sends canonical mentions via `mentionedJids` when someone taps the contact, so the number fallback is rarely needed but is a useful safety net.
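
The three mention signals described above can be combined into one check. This sketch assumes `mentionedJids` is an array of JIDs and that the bot's own JID and E.164 are known. The `wasMentioned` helper and its parameters are hypothetical. The number check is a plain substring match with no normalization of spaces or dashes.

```typescript
// Hypothetical inputs; field names mirror the doc, not a real API.
interface InboundGroupMessage {
  text: string;
  mentionedJids: string[]; // native WhatsApp @-mentions
}

function wasMentioned(
  msg: InboundGroupMessage,
  botJid: string,
  botE164: string,
  mentionPatterns: string[],
): boolean {
  // 1) Native mention: someone tapped the bot's contact.
  if (msg.mentionedJids.includes(botJid)) return true;
  // 2) Configured patterns, compiled as case-insensitive regexes.
  if (mentionPatterns.some((p) => new RegExp(p, "i").test(msg.text))) return true;
  // 3) The bot's number anywhere in the text, with or without the leading "+".
  const digits = botE164.replace(/^\+/, "");
  return digits.length > 0 && msg.text.includes(digits);
}
```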
+
+### Activation command (owner-only)
+
+Use the group chat command:
+- `/activation mention`
+- `/activation always`
+
+Only the owner number (from `channels.whatsapp.allowFrom`, or the bot’s own E.164 when unset) can change this. Send `/status` as a standalone message in the group to see the current activation mode.
+
+## How to use
+1) Add your WhatsApp account (the one running Moltbot) to the group.
+2) Say `@moltbot …` (or include the number). Only allowlisted senders can trigger it unless you set `groupPolicy: "open"`.
+3) The agent prompt will include recent group context plus the trailing `[from: …]` marker so it can address the right person.
+4) Session-level directives (`/verbose on`, `/think high`, `/new` or `/reset`, `/compact`) apply only to that group’s session; send them as standalone messages so they register. Your personal DM session remains independent.
+
+## Testing / verification
+- Manual smoke:
+  - Send an `@moltbot` ping (or any text matching your `mentionPatterns`) in the group and confirm a reply that references the sender name.
+  - Send a second ping and verify the history block is included then cleared on the next turn.
+- Check gateway logs (run with `--verbose`) to see `inbound web message` entries showing `from: <groupJid>` and the `[from: …]` suffix.
+
+## Known considerations
+- Heartbeats are intentionally skipped for groups to avoid noisy broadcasts.
+- Echo suppression uses the combined batch string; if you send identical text twice without mentions, only the first will get a response.
+- Group sessions appear as `agent:<agentId>:whatsapp:group:<jid>` entries in the session store (`~/.clawdbot/agents/<agentId>/sessions/sessions.json` by default); a missing entry just means the group hasn’t triggered a run yet.
+- Typing indicators in groups follow `agents.defaults.typingMode` (default: `message` when unmentioned).
--- a/docker-compose/ez-assistant/docs/concepts/groups.md
+++ b/docker-compose/ez-assistant/docs/concepts/groups.md
@@ -0,0 +1,344 @@
+---
+summary: "Group chat behavior across surfaces (WhatsApp/Telegram/Discord/Slack/Signal/iMessage/Microsoft Teams)"
+read_when:
+  - Changing group chat behavior or mention gating
+---
+# Groups
+
+Moltbot treats group chats consistently across surfaces: WhatsApp, Telegram, Discord, Slack, Signal, iMessage, Microsoft Teams.
+
+## Beginner intro (2 minutes)
+Moltbot “lives” on your own messaging accounts. There is no separate WhatsApp bot user.
+If **you** are in a group, Moltbot can see that group and respond there.
+
+Default behavior:
+- Groups are restricted (`groupPolicy: "allowlist"`).
+- Replies require a mention unless you explicitly disable mention gating.
+
+Translation: allowlisted senders can trigger Moltbot by mentioning it.
+
+> TL;DR
+> - **DM access** is controlled by `*.allowFrom`.
+> - **Group access** is controlled by `*.groupPolicy` + allowlists (`*.groups`, `*.groupAllowFrom`).
+> - **Reply triggering** is controlled by mention gating (`requireMention`, `/activation`).
+
+Quick flow (what happens to a group message):
+```
+groupPolicy? disabled -> drop
+groupPolicy? allowlist -> group allowed? no -> drop
+requireMention? yes -> mentioned? no -> store for context only
+otherwise -> reply
+```
+
+![Group message flow](/images/groups-flow.svg)
+
+If you want...
+
+| Goal | What to set |
+|------|-------------|
+| Allow all groups but only reply on @mentions | `groups: { "*": { requireMention: true } }` |
+| Disable all group replies | `groupPolicy: "disabled"` |
+| Only specific groups | `groups: { "<group-id>": { ... } }` (no `"*"` key) |
+| Only you can trigger in groups | `groupPolicy: "allowlist"`, `groupAllowFrom: ["+1555..."]` |
+
+## Session keys
+- Group sessions use `agent:<agentId>:<channel>:group:<id>` session keys (rooms/channels use `agent:<agentId>:<channel>:channel:<id>`).
+- Telegram forum topics add `:topic:<threadId>` to the group id so each topic has its own session.
+- Direct chats use the main session (or per-sender if configured).
+- Heartbeats are skipped for group sessions.
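
The key shapes above can be expressed as a small builder. This is a sketch for reading session-store entries, not Moltbot's own code. `groupSessionKey` and its parameters are hypothetical names.

```typescript
type GroupKind = "group" | "channel";

// Builds keys like "agent:main:telegram:group:-1001234567890:topic:42".
function groupSessionKey(
  agentId: string,
  channel: string,
  kind: GroupKind,
  id: string,
  telegramTopicId?: number,
): string {
  let key = `agent:${agentId}:${channel}:${kind}:${id}`;
  // Telegram forum topics get their own session per topic.
  if (channel === "telegram" && telegramTopicId !== undefined) {
    key += `:topic:${telegramTopicId}`;
  }
  return key;
}
```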
+
+## Pattern: personal DMs + public groups (single agent)
+
+This works well if your “personal” traffic is **DMs** and your “public” traffic is **groups**.
+
+Why: in single-agent mode, DMs typically land in the **main** session key (`agent:main:main`), while groups always use **non-main** session keys (`agent:main:<channel>:group:<id>`). If you enable sandboxing with `mode: "non-main"`, those group sessions run in Docker while your main DM session stays on-host.
+
+This gives you one agent “brain” (shared workspace + memory), but two execution postures:
+- **DMs**: full tools (host)
+- **Groups**: sandbox + restricted tools (Docker)
+
+> If you need truly separate workspaces/personas (“personal” and “public” must never mix), use a second agent + bindings. See [Multi-Agent Routing](/concepts/multi-agent).
+
+Example (DMs on host, groups sandboxed + messaging-only tools):
+
+```json5
+{
+  agents: {
+    defaults: {
+      sandbox: {
+        mode: "non-main", // groups/channels are non-main -> sandboxed
+        scope: "session", // strongest isolation (one container per group/channel)
+        workspaceAccess: "none"
+      }
+    }
+  },
+  tools: {
+    sandbox: {
+      tools: {
+        // If allow is non-empty, everything else is blocked (deny still wins).
+        allow: ["group:messaging", "group:sessions"],
+        deny: ["group:runtime", "group:fs", "group:ui", "nodes", "cron", "gateway"]
+      }
+    }
+  }
+}
+```
+
+Want “groups can only see folder X” instead of “no host access”? Keep `workspaceAccess: "none"` and mount only allowlisted paths into the sandbox:
+
+```json5
+{
+  agents: {
+    defaults: {
+      sandbox: {
+        mode: "non-main",
+        scope: "session",
+        workspaceAccess: "none",
+        docker: {
+          binds: [
+            // hostPath:containerPath:mode
+            "~/FriendsShared:/data:ro"
+          ]
+        }
+      }
+    }
+  }
+}
+```
+
+Related:
+- Configuration keys and defaults: [Gateway configuration](/gateway/configuration#agentsdefaultssandbox)
+- Debugging why a tool is blocked: [Sandbox vs Tool Policy vs Elevated](/gateway/sandbox-vs-tool-policy-vs-elevated)
+- Bind mounts details: [Sandboxing](/gateway/sandboxing#custom-bind-mounts)
+
+## Display labels
+- UI labels use `displayName` when available, formatted as `<channel>:<token>`.
+- `#room` is reserved for rooms/channels; group chats use `g-<slug>` (lowercase, spaces -> `-`, keep `#@+._-`).
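
Here is a sketch of the `g-<slug>` rule. It assumes that runs of whitespace collapse to a single `-` and that any character other than ASCII letters, digits, and `#@+._-` is dropped. The doc does not spell out either detail, and `groupLabelSlug` is a hypothetical name.

```typescript
function groupLabelSlug(displayName: string): string {
  const slug = displayName
    .trim()
    .toLowerCase()
    .replace(/\s+/g, "-") // spaces -> "-" (runs collapse: an assumption)
    .replace(/[^a-z0-9#@+._-]/g, ""); // keep letters, digits, and #@+._-
  return `g-${slug}`;
}
```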
+
+## Group policy
+Control how group/room messages are handled per channel:
+
+```json5
+{
+  channels: {
+    whatsapp: {
+      groupPolicy: "disabled", // "open" | "disabled" | "allowlist"
+      groupAllowFrom: ["+15551234567"]
+    },
+    telegram: {
+      groupPolicy: "disabled",
+      groupAllowFrom: ["123456789", "@username"]
+    },
+    signal: {
+      groupPolicy: "disabled",
+      groupAllowFrom: ["+15551234567"]
+    },
+    imessage: {
+      groupPolicy: "disabled",
+      groupAllowFrom: ["chat_id:123"]
+    },
+    msteams: {
+      groupPolicy: "disabled",
+      groupAllowFrom: ["user@org.com"]
+    },
+    discord: {
+      groupPolicy: "allowlist",
+      guilds: {
+        "GUILD_ID": { channels: { help: { allow: true } } }
+      }
+    },
+    slack: {
+      groupPolicy: "allowlist",
+      channels: { "#general": { allow: true } }
+    },
+    matrix: {
+      groupPolicy: "allowlist",
+      groupAllowFrom: ["@owner:example.org"],
+      groups: {
+        "!roomId:example.org": { allow: true },
+        "#alias:example.org": { allow: true }
+      }
+    }
+  }
+}
+```
+
+| Policy | Behavior |
+|--------|----------|
+| `"open"` | Groups bypass allowlists; mention-gating still applies. |
+| `"disabled"` | Block all group messages entirely. |
+| `"allowlist"` | Only allow groups/rooms that match the configured allowlist. |
+
+Notes:
+- `groupPolicy` is separate from mention-gating (which requires @mentions).
+- WhatsApp/Telegram/Signal/iMessage/Microsoft Teams: use `groupAllowFrom` (fallback: explicit `allowFrom`).
+- Discord: allowlist uses `channels.discord.guilds.<id>.channels`.
+- Slack: allowlist uses `channels.slack.channels`.
+- Matrix: allowlist uses `channels.matrix.groups` (room IDs, aliases, or names). Use `channels.matrix.groupAllowFrom` to restrict senders; per-room `users` allowlists are also supported.
+- Group DMs are controlled separately (`channels.discord.dm.*`, `channels.slack.dm.*`).
+- Telegram allowlist can match user IDs (`"123456789"`, `"telegram:123456789"`, `"tg:123456789"`) or usernames (`"@alice"` or `"alice"`); prefixes are case-insensitive.
+- Default is `groupPolicy: "allowlist"`; if your group allowlist is empty, group messages are blocked.
+
+Quick mental model (evaluation order for group messages):
+1) `groupPolicy` (open/disabled/allowlist)
+2) group allowlists (`*.groups`, `*.groupAllowFrom`, channel-specific allowlist)
+3) mention gating (`requireMention`, `/activation`)
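
The three-step evaluation order can be sketched as a single function. The config shape is simplified to one channel with sender-based `groupAllowFrom` (falling back to `allowFrom`) plus the `groups` keys. The names are hypothetical. Channel-specific allowlists such as Discord guilds or Slack channels are not modeled.

```typescript
type Decision = "drop" | "context-only" | "reply";

// Hypothetical, simplified per-channel config.
interface GroupConfig {
  groupPolicy?: "open" | "disabled" | "allowlist"; // default "allowlist"
  groupAllowFrom?: string[];
  allowFrom?: string[]; // fallback when groupAllowFrom is unset
  groups?: Record<string, { requireMention?: boolean }>;
}

interface GroupMessage {
  groupId: string;
  senderId: string;
  mentioned: boolean;
  canDetectMentions: boolean; // native mentions or mentionPatterns available
}

function evaluateGroupMessage(cfg: GroupConfig, msg: GroupMessage): Decision {
  const policy = cfg.groupPolicy ?? "allowlist";
  // 1) groupPolicy
  if (policy === "disabled") return "drop";
  // 2) allowlists (bypassed when the policy is "open")
  if (policy === "allowlist") {
    if (cfg.groups && !(msg.groupId in cfg.groups) && !("*" in cfg.groups)) return "drop";
    const senders = cfg.groupAllowFrom ?? cfg.allowFrom ?? [];
    if (!senders.includes(msg.senderId)) return "drop"; // empty list blocks everything
  }
  // 3) mention gating: per-group setting, then the "*" default, then true
  const requireMention =
    cfg.groups?.[msg.groupId]?.requireMention ?? cfg.groups?.["*"]?.requireMention ?? true;
  if (requireMention && msg.canDetectMentions && !msg.mentioned) return "context-only";
  return "reply";
}
```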
+
+## Mention gating (default)
+Group messages require a mention unless overridden per group. Defaults live per subsystem under `*.groups."*"`.
+
+Replying to a bot message counts as an implicit mention (when the channel supports reply metadata). This applies to Telegram, WhatsApp, Slack, Discord, and Microsoft Teams.
+
+```json5
+{
+  channels: {
+    whatsapp: {
+      groups: {
+        "*": { requireMention: true },
+        "123@g.us": { requireMention: false }
+      }
+    },
+    telegram: {
+      groups: {
+        "*": { requireMention: true },
+        "123456789": { requireMention: false }
+      }
+    },
+    imessage: {
+      groups: {
+        "*": { requireMention: true },
+        "123": { requireMention: false }
+      }
+    }
+  },
+  agents: {
+    list: [
+      {
+        id: "main",
+        groupChat: {
+          mentionPatterns: ["@clawd", "moltbot", "\\+15555550123"],
+          historyLimit: 50
+        }
+      }
+    ]
+  }
+}
+```
+
+Notes:
+- `mentionPatterns` are case-insensitive regexes.
+- Surfaces that provide explicit mentions still pass; patterns are a fallback.
+- Per-agent override: `agents.list[].groupChat.mentionPatterns` (useful when multiple agents share a group).
+- Mention gating is only enforced when mention detection is possible (native mentions or `mentionPatterns` are configured).
+- Discord defaults live in `channels.discord.guilds."*"` (overridable per guild/channel).
+- Group history context is wrapped uniformly across channels and is **pending-only** (messages skipped due to mention gating); use `messages.groupChat.historyLimit` for the global default and `channels.<channel>.historyLimit` (or `channels.<channel>.accounts.*.historyLimit`) for overrides. Set `0` to disable.
+
+## Group/channel tool restrictions (optional)
+Some channel configs support restricting which tools are available **inside a specific group/room/channel**.
+
+- `tools`: allow/deny tools for the whole group.
+- `toolsBySender`: per-sender overrides within the group (keys are sender IDs/usernames/emails/phone numbers depending on the channel). Use `"*"` as a wildcard.
+
+Resolution order (most specific wins):
+1) group/channel `toolsBySender` match
+2) group/channel `tools`
+3) default (`"*"`) `toolsBySender` match
+4) default (`"*"`) `tools`
+
+Example (Telegram):
+
+```json5
+{
+  channels: {
+    telegram: {
+      groups: {
+        "*": { tools: { deny: ["exec"] } },
+        "-1001234567890": {
+          tools: { deny: ["exec", "read", "write"] },
+          toolsBySender: {
+            "123456789": { alsoAllow: ["exec"] }
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+Notes:
+- Group/channel tool restrictions are applied in addition to global/agent tool policy (deny still wins).
+- Some channels use different nesting for rooms/channels (e.g., Discord `guilds.*.channels.*`, Slack `channels.*`, MS Teams `teams.*.channels.*`).
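
The resolution order above can be sketched as a first-match lookup. This models only *which* policy entry is selected. It does not model how a selected entry's `alsoAllow` combines with deny lists or with the global/agent tool policy. Types and names are hypothetical.

```typescript
// Hypothetical policy shape; fields mirror the config example above.
interface ToolPolicy {
  allow?: string[];
  deny?: string[];
  alsoAllow?: string[];
}

interface GroupToolConfig {
  tools?: ToolPolicy;
  toolsBySender?: Record<string, ToolPolicy>;
}

function senderMatch(cfg: GroupToolConfig | undefined, senderId: string): ToolPolicy | undefined {
  const bySender = cfg?.toolsBySender;
  return bySender?.[senderId] ?? bySender?.["*"]; // "*" is the sender wildcard
}

// Most specific wins: group sender > group tools > default sender > default tools.
function resolveGroupToolPolicy(
  groups: Record<string, GroupToolConfig>,
  groupId: string,
  senderId: string,
): ToolPolicy | undefined {
  const group = groups[groupId];
  const fallback = groups["*"];
  return (
    senderMatch(group, senderId) ??
    group?.tools ??
    senderMatch(fallback, senderId) ??
    fallback?.tools
  );
}
```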
+
+## Group allowlists
+When `channels.whatsapp.groups`, `channels.telegram.groups`, or `channels.imessage.groups` is configured, the keys act as a group allowlist. Use `"*"` to allow all groups while still setting default mention behavior.
+
+Common intents (copy/paste):
+
+1) Disable all group replies
+```json5
+{
+  channels: { whatsapp: { groupPolicy: "disabled" } }
+}
+```
+
+2) Allow only specific groups (WhatsApp)
+```json5
+{
+  channels: {
+    whatsapp: {
+      groups: {
+        "123@g.us": { requireMention: true },
+        "456@g.us": { requireMention: false }
+      }
+    }
+  }
+}
+```
+
+3) Allow all groups but require mention (explicit)
+```json5
+{
+  channels: {
+    whatsapp: {
+      groups: { "*": { requireMention: true } }
+    }
+  }
+}
+```
+
+4) Only the owner can trigger in groups (WhatsApp)
+```json5
+{
+  channels: {
+    whatsapp: {
+      groupPolicy: "allowlist",
+      groupAllowFrom: ["+15551234567"],
+      groups: { "*": { requireMention: true } }
+    }
+  }
+}
+```
+
+## Activation (owner-only)
+The owner can toggle activation per group:
+- `/activation mention`
+- `/activation always`
+
+Owner is determined by `channels.whatsapp.allowFrom` (or the bot’s self E.164 when unset). Send the command as a standalone message. Other surfaces currently ignore `/activation`.
+
+## Context fields
+Group inbound payloads set:
+- `ChatType=group`
+- `GroupSubject` (if known)
+- `GroupMembers` (if known)
+- `WasMentioned` (mention gating result)
+- Telegram forum topics also include `MessageThreadId` and `IsForum`.
+
+The agent system prompt includes a group intro on the first turn of a new group session. It reminds the model to respond like a human, avoid Markdown tables, and avoid typing literal `\n` sequences.
+
+## iMessage specifics
+- Prefer `chat_id:<id>` when routing or allowlisting.
+- List chats: `imsg chats --limit 20`.
+- Group replies always go back to the same `chat_id`.
+
+## WhatsApp specifics
+See [Group messages](/concepts/group-messages) for WhatsApp-only behavior (history injection, mention handling details).
--- a/docker-compose/ez-assistant/docs/concepts/markdown-formatting.md
+++ b/docker-compose/ez-assistant/docs/concepts/markdown-formatting.md
@@ -0,0 +1,132 @@
+---
+summary: "Markdown formatting pipeline for outbound channels"
+read_when:
+  - You are changing markdown formatting or chunking for outbound channels
+  - You are adding a new channel formatter or style mapping
+  - You are debugging formatting regressions across channels
+---
+# Markdown formatting
+
+Moltbot formats outbound Markdown by converting it into a shared intermediate
+representation (IR) before rendering channel-specific output. The IR keeps the
+source text intact while carrying style/link spans so chunking and rendering can
+stay consistent across channels.
+
+## Goals
+
+- **Consistency:** one parse step, multiple renderers.
+- **Safe chunking:** split text before rendering so inline formatting never
+  breaks across chunks.
+- **Channel fit:** map the same IR to Slack mrkdwn, Telegram HTML, and Signal
+  style ranges without re-parsing Markdown.
+
+## Pipeline
+
+1. **Parse Markdown -> IR**
+   - IR is plain text plus style spans (bold/italic/strike/code/spoiler) and link spans.
+   - Offsets are UTF-16 code units so Signal style ranges align with its API.
+   - Tables are parsed only when a channel opts into table conversion.
+2. **Chunk IR (format-first)**
+   - Chunking happens on the IR text before rendering.
+   - Inline formatting never breaks across chunks: a span that crosses a chunk boundary is sliced so each chunk carries its own closed piece of it.
+3. **Render per channel**
+   - **Slack:** mrkdwn tokens (bold/italic/strike/code), links as `<url|label>`.
+   - **Telegram:** HTML tags (`<b>`, `<i>`, `<s>`, `<code>`, `<pre><code>`, `<a href>`).
+   - **Signal:** plain text + `text-style` ranges; links become `label (url)` when label differs.
+
+## IR example
+
+Input Markdown:
+
+```markdown
+Hello **world** — see [docs](https://docs.molt.bot).
+```
+
+IR (schematic):
+
+```json
+{
+  "text": "Hello world — see docs.",
+  "styles": [
+    { "start": 6, "end": 11, "style": "bold" }
+  ],
+  "links": [
+    { "start": 18, "end": 22, "href": "https://docs.molt.bot" }
+  ]
+}
+```
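
To make the render step concrete, here is a minimal sketch that turns an IR of this shape into Telegram HTML. It uses its own simplified types rather than Moltbot's internal ones and assumes spans are non-empty and never partially overlap. It also escapes all text outside tags, as the gotchas below require. `renderTelegramHtml` is a hypothetical name.

```typescript
// Simplified IR types mirroring the schematic above (not Moltbot's internal types).
interface StyleSpan { start: number; end: number; style: "bold" | "italic" | "strike" | "code"; }
interface LinkSpan { start: number; end: number; href: string; }
interface MarkdownIR { text: string; styles: StyleSpan[]; links: LinkSpan[]; }

const TAGS: Record<StyleSpan["style"], string> = { bold: "b", italic: "i", strike: "s", code: "code" };

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

interface Marker { at: number; open: boolean; len: number; id: number; tag: string; }

// Assumes non-empty spans that are disjoint or properly nested (no partial overlaps).
function renderTelegramHtml(ir: MarkdownIR): string {
  const markers: Marker[] = [];
  let id = 0;
  const add = (start: number, end: number, openTag: string, closeTag: string) => {
    markers.push({ at: start, open: true, len: end - start, id, tag: openTag });
    markers.push({ at: end, open: false, len: end - start, id, tag: closeTag });
    id++;
  };
  for (const s of ir.styles) add(s.start, s.end, `<${TAGS[s.style]}>`, `</${TAGS[s.style]}>`);
  for (const l of ir.links) add(l.start, l.end, `<a href="${escapeHtml(l.href)}">`, "</a>");

  markers.sort((a, b) => {
    if (a.at !== b.at) return a.at - b.at;
    if (a.open !== b.open) return a.open ? 1 : -1; // close before open at the same offset
    // Opens: outer (longer) span first. Closes: inner (shorter) span first. Ties mirror by id.
    return a.open ? b.len - a.len || a.id - b.id : a.len - b.len || b.id - a.id;
  });

  let out = "";
  let pos = 0;
  for (const m of markers) {
    out += escapeHtml(ir.text.slice(pos, m.at)) + m.tag; // text outside tags is escaped
    pos = m.at;
  }
  return out + escapeHtml(ir.text.slice(pos));
}
```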
+
+## Where it is used
+
+- Slack, Telegram, and Signal outbound adapters render from the IR.
+- Other channels (WhatsApp, iMessage, MS Teams, Discord) still use plain text or
+  their own formatting rules, with Markdown table conversion applied before
+  chunking when enabled.
+
+## Table handling
+
+Markdown tables are not consistently supported across chat clients. Use
+`markdown.tables` to control conversion per channel (and per account).
+
+- `code`: render tables as code blocks (default for most channels).
+- `bullets`: convert each row into bullet points (default for Signal + WhatsApp).
+- `off`: disable table parsing and conversion; raw table text passes through.
+
+Config keys:
+
+```yaml
+channels:
+  discord:
+    markdown:
+      tables: code
+    accounts:
+      work:
+        markdown:
+          tables: "off"
+```
+
+## Chunking rules
+
+- Chunk limits come from channel adapters/config and are applied to the IR text.
+- Code fences are preserved as a single block with a trailing newline so channels
+  render them correctly.
+- List prefixes and blockquote prefixes are part of the IR text, so chunking
+  does not split mid-prefix.
+- Inline styles (bold/italic/strike/inline-code/spoiler) are never split across
+  chunks; the renderer reopens styles inside each chunk.
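
Here is a minimal sketch of format-first chunking. It splits the IR text at a fixed limit, then slices every span to the chunk it falls in and rebases its offsets, so each chunk can be rendered with balanced markup. The rules above also keep code fences and prefixes intact, but this sketch ignores that and cuts at a hard limit. `chunkIR` is a hypothetical name, not the real `chunkMarkdownIR`.

```typescript
interface Span { start: number; end: number; style: string; }
interface IRChunk { text: string; styles: Span[]; }

// Hard-limit chunking for illustration; assumes limit > 0. Cutting at a raw
// UTF-16 index can split a surrogate pair, which a real chunker should avoid.
function chunkIR(text: string, styles: Span[], limit: number): IRChunk[] {
  const chunks: IRChunk[] = [];
  for (let from = 0; from < text.length; from += limit) {
    const to = Math.min(from + limit, text.length);
    const sliced: Span[] = [];
    for (const s of styles) {
      const start = Math.max(s.start, from);
      const end = Math.min(s.end, to);
      // Keep only the part of the span inside [from, to), rebased to the chunk.
      if (start < end) sliced.push({ start: start - from, end: end - from, style: s.style });
    }
    chunks.push({ text: text.slice(from, to), styles: sliced });
  }
  return chunks;
}
```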
+
+If you need more on chunking behavior across channels, see
+[Streaming + chunking](/concepts/streaming).
+
+## Link policy
+
+- **Slack:** `[label](url)` -> `<url|label>`; bare URLs remain bare. Autolink
+  is disabled during parse to avoid double-linking.
+- **Telegram:** `[label](url)` -> `<a href="url">label</a>` (HTML parse mode).
+- **Signal:** `[label](url)` -> `label (url)` unless label matches the URL.
+
+## Spoilers
+
+Spoiler markers (`||spoiler||`) are parsed only for Signal, where they map to
+SPOILER style ranges. Other channels treat them as plain text.
+
+## How to add or update a channel formatter
+
+1. **Parse once:** use the shared `markdownToIR(...)` helper with channel-appropriate
+   options (autolink, heading style, blockquote prefix).
+2. **Render:** implement a renderer with `renderMarkdownWithMarkers(...)` and a
+   style marker map (or Signal style ranges).
+3. **Chunk:** call `chunkMarkdownIR(...)` before rendering; render each chunk.
+4. **Wire adapter:** update the channel outbound adapter to use the new chunker
+   and renderer.
+5. **Test:** add or update format tests and an outbound delivery test if the
+   channel uses chunking.
+
+## Common gotchas
+
+- Slack angle-bracket tokens (`<@U123>`, `<#C123>`, `<https://...>`) must be
+  preserved; escape raw HTML safely.
+- Telegram HTML requires escaping text outside tags to avoid broken markup.
+- Signal style ranges depend on UTF-16 offsets; do not use code point offsets.
+- Preserve trailing newlines for fenced code blocks so closing markers land on
+  their own line.
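
JavaScript string indices are already UTF-16 code units, so a TypeScript IR can use them directly for Signal. Offsets from a source that counts code points instead (for example, iterating with `for...of` or `Array.from`) need converting first. Here is a small sketch; `codePointToUtf16Offset` is a hypothetical helper.

```typescript
// Converts a code point index into a UTF-16 code unit offset for `text`.
function codePointToUtf16Offset(text: string, codePointIndex: number): number {
  let offset = 0;
  let cp = 0;
  for (const ch of text) { // for...of iterates by code point
    if (cp === codePointIndex) return offset;
    offset += ch.length; // 2 for astral characters such as most emoji, else 1
    cp++;
  }
  return offset; // an index at or past the end maps to text.length
}

const s = "👍 ok";
console.log(s.length); // 5 (UTF-16 code units), although there are 4 code points
console.log(codePointToUtf16Offset(s, 2)); // 3: "o" follows the 2-unit emoji and a space
```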
--- a/docker-compose/ez-assistant/docs/concepts/memory.md
+++ b/docker-compose/ez-assistant/docs/concepts/memory.md
@@ -0,0 +1,388 @@
+---
+summary: "How Moltbot memory works (workspace files + automatic memory flush)"
+read_when:
+  - You want the memory file layout and workflow
+  - You want to tune the automatic pre-compaction memory flush
+---
+# Memory
+
+Moltbot memory is **plain Markdown in the agent workspace**. The files are the
+source of truth; the model only "remembers" what gets written to disk.
+
+Memory search tools are provided by the active memory plugin (default:
+`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.
+
+## Memory files (Markdown)
+
+The default workspace layout uses two memory layers:
+
+- `memory/YYYY-MM-DD.md`
+  - Daily log (append-only).
+  - Read today + yesterday at session start.
+- `MEMORY.md` (optional)
+  - Curated long-term memory.
+  - **Only load in the main, private session** (never in group contexts).
+
+These files live under the workspace (`agents.defaults.workspace`, default
+`~/clawd`). See [Agent workspace](/concepts/agent-workspace) for the full layout.
+
+## When to write memory
+
+- Decisions, preferences, and durable facts go to `MEMORY.md`.
+- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
+- If someone says "remember this," write it down (do not keep it in RAM).
+- This area is still evolving. It helps to remind the model to store memories; it will know what to do.
+- If you want something to stick, **ask the bot to write it** into memory.
+
+## Automatic memory flush (pre-compaction ping)
+
+When a session is **close to auto-compaction**, Moltbot triggers a **silent,
+agentic turn** that reminds the model to write durable memory **before** the
+context is compacted. The default prompts explicitly say the model *may reply*,
+but usually `NO_REPLY` is the correct response so the user never sees this turn.
+
+This is controlled by `agents.defaults.compaction.memoryFlush`:
+
+```json5
+{
+  agents: {
+    defaults: {
+      compaction: {
+        reserveTokensFloor: 20000,
+        memoryFlush: {
+          enabled: true,
+          softThresholdTokens: 4000,
+          systemPrompt: "Session nearing compaction. Store durable memories now.",
+          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
+        }
+      }
+    }
+  }
+}
+```
+
+Details:
+- **Soft threshold**: flush triggers when the session token estimate crosses
+  `contextWindow - reserveTokensFloor - softThresholdTokens`.
+- **Silent** by default: prompts include `NO_REPLY` so nothing is delivered.
+- **Two prompts**: the reminder is sent both as a user prompt (`prompt`) and as text appended to the system prompt (`systemPrompt`).
+- **One flush per compaction cycle** (tracked in `sessions.json`).
+- **Workspace must be writable**: if the session runs sandboxed with
+  `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
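
To make the soft threshold concrete, assume a 200,000-token context window with the values from the example above. The flush then fires once the estimate reaches 200,000 - 20,000 - 4,000 = 176,000 tokens. The sketch below shows the trigger check. The `FlushInput` shape and `shouldFlushMemory` name are hypothetical. Treating "reaching" the threshold as crossing it is also an assumption.

```typescript
// Hypothetical inputs; names mirror the config keys above.
interface FlushInput {
  enabled: boolean;
  contextWindow: number;
  reserveTokensFloor: number;
  softThresholdTokens: number;
  sessionTokens: number; // current token estimate for the session
  flushedThisCycle: boolean; // at most one flush per compaction cycle
  workspaceWritable: boolean; // false when sandboxed with workspaceAccess "ro" or "none"
}

function shouldFlushMemory(i: FlushInput): boolean {
  if (!i.enabled || i.flushedThisCycle || !i.workspaceWritable) return false;
  const threshold = i.contextWindow - i.reserveTokensFloor - i.softThresholdTokens;
  return i.sessionTokens >= threshold;
}
```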
+
+For the full compaction lifecycle, see
+[Session management + compaction](/reference/session-management-compaction).
+
+## Vector memory search
+
+Moltbot can build a small vector index over `MEMORY.md` and `memory/*.md` so
+semantic queries can find related notes even when wording differs.
+
+Defaults:
+- Enabled by default.
+- Watches memory files for changes (debounced).
+- Uses remote embeddings by default. If `memorySearch.provider` is not set, Moltbot auto-selects:
+  1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
+  2. `openai` if an OpenAI key can be resolved.
+  3. `gemini` if a Gemini key can be resolved.
+  4. Otherwise memory search stays disabled until configured.
+- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
+- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
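
The auto-selection order can be sketched as a pure function over what is configured. Key resolution (auth profiles, `models.providers.*.apiKey`, environment variables) is reduced to booleans here. The names are hypothetical.

```typescript
type EmbeddingProvider = "local" | "openai" | "gemini";

interface ProviderInputs {
  explicitProvider?: EmbeddingProvider; // memorySearch.provider, if set
  localModelPath?: string; // memorySearch.local.modelPath
  localModelExists: boolean; // whether that file exists on disk
  hasOpenAIKey: boolean;
  hasGeminiKey: boolean;
}

// Returns undefined when memory search stays disabled until configured.
function selectEmbeddingProvider(i: ProviderInputs): EmbeddingProvider | undefined {
  if (i.explicitProvider) return i.explicitProvider;
  if (i.localModelPath && i.localModelExists) return "local";
  if (i.hasOpenAIKey) return "openai";
  if (i.hasGeminiKey) return "gemini";
  return undefined;
}
```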
+
+Remote embeddings **require** an API key for the embedding provider. Moltbot
+resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
+variables. Codex OAuth only covers chat/completions and does **not** satisfy
+embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
+`models.providers.google.apiKey`. When using a custom OpenAI-compatible endpoint,
+set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`).
+
+### Gemini embeddings (native)
+
+Set the provider to `gemini` to use the Gemini embeddings API directly:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "gemini",
+      model: "gemini-embedding-001",
+      remote: {
+        apiKey: "YOUR_GEMINI_API_KEY"
+      }
+    }
+  }
+}
+```
+
+Notes:
+- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
+- `remote.headers` lets you add extra headers if needed.
+- Default model: `gemini-embedding-001`.
+
+If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy),
+you can use the `remote` configuration with the OpenAI provider:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "openai",
+      model: "text-embedding-3-small",
+      remote: {
+        baseUrl: "https://api.example.com/v1/",
+        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
+        headers: { "X-Custom-Header": "value" }
+      }
+    }
+  }
+}
+```
+
+If you don't want to set an API key, use `memorySearch.provider = "local"`, and set
+`memorySearch.fallback = "none"` if a failed local setup should never fall back to a remote provider.
+
+Fallbacks:
+- `memorySearch.fallback` can be `openai`, `gemini`, `local`, or `none`.
+- The fallback provider is only used when the primary embedding provider fails.
+
+Batch indexing (OpenAI + Gemini):
+- Enabled by default for OpenAI and Gemini embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
+- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
+- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
+- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
+- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.
+
+Why OpenAI batch is fast + cheap:
+- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
+- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
+- See the OpenAI Batch API docs and pricing for details:
+  - https://platform.openai.com/docs/api-reference/batch
+  - https://platform.openai.com/pricing
+
+Config example:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "openai",
+      model: "text-embedding-3-small",
+      fallback: "openai",
+      remote: {
+        batch: { enabled: true, concurrency: 2 }
+      },
+      sync: { watch: true }
+    }
+  }
+}
+```
+
+Tools:
+- `memory_search` — returns snippets with file + line ranges.
+- `memory_get` — read memory file content by path.
+
+Local mode:
+- Set `agents.defaults.memorySearch.provider = "local"`.
+- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
+- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
+
+### How the memory tools work
+
+- `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned.
+- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
+- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.
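+
+To make the ~400-token target and 80-token overlap concrete, here is a line-based
+sliding-window chunker. It is an illustration under two assumptions: tokens are estimated at
+~4 characters each (not a real tokenizer), and chunks break on line boundaries so each one
+maps to a line range. `chunkMarkdown` and `estimateTokens` are hypothetical names.
+
+```typescript
+interface Chunk {
+  startLine: number; // 1-based, inclusive
+  endLine: number; // 1-based, inclusive
+  text: string;
+}
+
+// Rough estimate: ~4 characters per token (assumption, not a real tokenizer).
+function estimateTokens(s: string): number {
+  return Math.ceil(s.length / 4);
+}
+
+function chunkMarkdown(content: string, targetTokens = 400, overlapTokens = 80): Chunk[] {
+  const lines = content.split("\n");
+  const chunks: Chunk[] = [];
+  let start = 0;
+  while (start < lines.length) {
+    // Take at least one line, then keep adding lines while the chunk stays within target.
+    let end = start;
+    let tokens = 0;
+    do {
+      tokens += estimateTokens(lines[end]) + 1; // +1 for the newline
+      end++;
+    } while (end < lines.length && tokens + estimateTokens(lines[end]) + 1 <= targetTokens);
+    chunks.push({ startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
+    if (end >= lines.length) break;
+    // Step back over trailing lines until ~overlapTokens are repeated, always advancing.
+    let next = end;
+    let repeated = 0;
+    while (next > start + 1 && repeated < overlapTokens) {
+      next--;
+      repeated += estimateTokens(lines[next]) + 1;
+    }
+    start = next;
+  }
+  return chunks;
+}
+```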
+
+### What gets indexed (and when)
+
+- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
+- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
+- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
+- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, Moltbot automatically resets and reindexes the entire store.
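+
+One way to implement that reset check is to hash the identifying fields into a fingerprint
+and compare it with the value stored alongside the index. The field list and the SHA-256
+choice below are assumptions for illustration; `indexFingerprint` and `needsFullReindex` are
+hypothetical.
+
+```typescript
+import { createHash } from "node:crypto";
+
+// Hypothetical identity of an index: if any field changes, embeddings are not comparable.
+interface IndexIdentity {
+  provider: string; // e.g. "openai"
+  model: string; // e.g. "text-embedding-3-small"
+  endpoint: string; // base URL the embeddings came from
+  chunkTokens: number;
+  chunkOverlap: number;
+}
+
+// Serialize fields in a fixed order so the hash is stable, then digest.
+function indexFingerprint(id: IndexIdentity): string {
+  const canonical = JSON.stringify([id.provider, id.model, id.endpoint, id.chunkTokens, id.chunkOverlap]);
+  return createHash("sha256").update(canonical).digest("hex");
+}
+
+// A missing or different stored fingerprint means: reset and reindex everything.
+function needsFullReindex(stored: string | undefined, current: IndexIdentity): boolean {
+  return stored !== indexFingerprint(current);
+}
+```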
+
+### Hybrid search (BM25 + vector)
+
+When enabled, Moltbot combines:
+- **Vector similarity** (semantic match, wording can differ)
+- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)
+
+If full-text search is unavailable on your platform, Moltbot falls back to vector-only search.
+
+#### Why hybrid?
+
+Vector search is great at “this means the same thing”:
+- “Mac Studio gateway host” vs “the machine running the gateway”
+- “debounce file updates” vs “avoid indexing on every write”
+
+But it can be weak at exact, high-signal tokens:
+- IDs (`a828e60`, `b3b9895a…`)
+- code symbols (`memorySearch.query.hybrid`)
+- error strings (“sqlite-vec unavailable”)
+
+BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
+Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
+good results for both “natural language” queries and “needle in a haystack” queries.
+
+#### How we merge results (the current design)
+
+Implementation sketch:
+
+1) Retrieve a candidate pool from both sides:
+- **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
+- **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
+
+2) Convert BM25 rank into a 0..1-ish score:
+- `textScore = 1 / (1 + max(0, bm25Rank))`
+
+3) Union candidates by chunk id and compute a weighted score:
+- `finalScore = vectorWeight * vectorScore + textWeight * textScore`
+
+Notes:
+- `vectorWeight` and `textWeight` are normalized to sum to 1.0 during config resolution, so the weights behave as percentages.
+- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
+- If FTS5 can’t be created, we keep vector-only search (no hard failure).
+
+This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes.
+If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
+(min/max or z-score) before mixing.
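+
+The three merge steps translate almost directly into code. This TypeScript sketch assumes
+both candidate pools have already been fetched (vector hits with cosine scores, BM25 hits
+with ranks). A chunk missing from one pool contributes 0 for that signal, which is our
+assumption rather than documented behavior; `mergeHybrid` and its parameter names are
+hypothetical.
+
+```typescript
+interface VectorHit { id: string; score: number } // cosine similarity
+interface TextHit { id: string; rank: number } // FTS5 BM25 rank, lower is better
+
+// Step 2: map a BM25 rank onto a 0..1-ish score.
+function textScoreFromRank(bm25Rank: number): number {
+  return 1 / (1 + Math.max(0, bm25Rank));
+}
+
+// Step 3: union candidates by chunk id and compute the weighted score.
+function mergeHybrid(
+  vectorHits: VectorHit[],
+  textHits: TextHit[],
+  maxResults: number,
+  vectorWeight = 0.7,
+  textWeight = 0.3,
+): { id: string; score: number }[] {
+  // Normalize so the weights sum to 1.0 (assumes at least one weight is positive).
+  const total = vectorWeight + textWeight;
+  const vw = vectorWeight / total;
+  const tw = textWeight / total;
+
+  const scores = new Map<string, { vector: number; text: number }>();
+  for (const hit of vectorHits) {
+    scores.set(hit.id, { vector: hit.score, text: 0 });
+  }
+  for (const hit of textHits) {
+    const prev = scores.get(hit.id) ?? { vector: 0, text: 0 };
+    scores.set(hit.id, { ...prev, text: textScoreFromRank(hit.rank) });
+  }
+
+  return [...scores.entries()]
+    .map(([id, s]) => ({ id, score: vw * s.vector + tw * s.text }))
+    .sort((a, b) => b.score - a.score)
+    .slice(0, maxResults);
+}
+```
+
+A chunk that appears in both pools (strong keyword match plus moderate semantic match) can
+outrank a chunk with a higher cosine score alone, which is the recall/precision trade the
+weights control.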
+
+Config:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      query: {
+        hybrid: {
+          enabled: true,
+          vectorWeight: 0.7,
+          textWeight: 0.3,
+          candidateMultiplier: 4
+        }
+      }
+    }
+  }
+}
+```
+
+### Embedding cache
+
+Moltbot can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
+
+Config:
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      cache: {
+        enabled: true,
+        maxEntries: 50000
+      }
+    }
+  }
+}
+```
+
+### Session memory search (experimental)
+
+You can optionally index **session transcripts** and surface them via `memory_search`.
+This is gated behind an experimental flag.
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      experimental: { sessionMemory: true },
+      sources: ["memory", "sessions"]
+    }
+  }
+}
+```
+
+Notes:
+- Session indexing is **opt-in** (off by default).
+- Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort).
+- `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
+- Results still include snippets only; `memory_get` remains limited to memory files.
+- Session indexing is isolated per agent (only that agent’s session logs are indexed).
+- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
+
+Delta thresholds (defaults shown):
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      sync: {
+        sessions: {
+          deltaBytes: 100000,   // ~100 KB
+          deltaMessages: 50     // JSONL lines
+        }
+      }
+    }
+  }
+}
+```
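+
+A sketch of how those thresholds might gate background indexing. It assumes sync fires when
+**either** threshold is crossed and that each non-empty JSONL line counts as one message;
+both are assumptions, and `accumulateDelta` / `shouldSyncSession` are hypothetical names.
+
+```typescript
+interface SessionDelta {
+  pendingBytes: number; // bytes appended since the last sync
+  pendingMessages: number; // JSONL lines appended since the last sync
+}
+
+const DEFAULT_THRESHOLDS = { deltaBytes: 100_000, deltaMessages: 50 };
+
+// Add newly appended transcript text to the running delta.
+function accumulateDelta(delta: SessionDelta, appended: string): SessionDelta {
+  const lines = appended.split("\n").filter((line) => line.trim().length > 0);
+  return {
+    pendingBytes: delta.pendingBytes + new TextEncoder().encode(appended).length,
+    pendingMessages: delta.pendingMessages + lines.length,
+  };
+}
+
+// Assumed rule: crossing either threshold schedules a background sync.
+function shouldSyncSession(delta: SessionDelta, t = DEFAULT_THRESHOLDS): boolean {
+  return delta.pendingBytes >= t.deltaBytes || delta.pendingMessages >= t.deltaMessages;
+}
+```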
+
+### SQLite vector acceleration (sqlite-vec)
+
+When the sqlite-vec extension is available, Moltbot stores embeddings in a
+SQLite virtual table (`vec0`) and performs vector distance queries in the
+database. This keeps search fast without loading every embedding into JS.
+
+Configuration (optional):
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      store: {
+        vector: {
+          enabled: true,
+          extensionPath: "/path/to/sqlite-vec"
+        }
+      }
+    }
+  }
+}
+```
+
+Notes:
+- `enabled` defaults to true; when disabled, search falls back to in-process
+  cosine similarity over stored embeddings.
+- If the sqlite-vec extension is missing or fails to load, Moltbot logs the
+  error and continues with the JS fallback (no vector table).
+- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
+  or non-standard install locations).
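+
+For reference, the in-process fallback amounts to brute-force cosine similarity over every
+stored embedding, which is the work sqlite-vec moves into SQLite. A minimal sketch;
+`cosineSimilarity` and `topK` are hypothetical helpers, and returning 0 for a zero vector is
+our choice.
+
+```typescript
+interface StoredEmbedding {
+  id: string;
+  embedding: number[];
+}
+
+// cos(a, b) = dot(a, b) / (|a| * |b|)
+function cosineSimilarity(a: number[], b: number[]): number {
+  if (a.length !== b.length) throw new Error("embedding dimensions differ");
+  let dot = 0;
+  let normA = 0;
+  let normB = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    normA += a[i] * a[i];
+    normB += b[i] * b[i];
+  }
+  // Treat zero vectors as unrelated rather than dividing by zero.
+  if (normA === 0 || normB === 0) return 0;
+  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
+}
+
+// Brute force: score every stored embedding, then keep the k best.
+function topK(query: number[], stored: StoredEmbedding[], k: number): { id: string; score: number }[] {
+  return stored
+    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.embedding) }))
+    .sort((x, y) => y.score - x.score)
+    .slice(0, k);
+}
+```
+
+This is O(n × dimensions) per query and needs every embedding loaded into JS memory, which
+is why the vector table is preferred when the extension loads.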
+
+### Local embedding auto-download
+
+- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
+- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
+- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`.
+- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
+
+### Custom OpenAI-compatible endpoint example
+
+```json5
+agents: {
+  defaults: {
+    memorySearch: {
+      provider: "openai",
+      model: "text-embedding-3-small",
+      remote: {
+        baseUrl: "https://api.example.com/v1/",
+        apiKey: "YOUR_REMOTE_API_KEY",
+        headers: {
+          "X-Organization": "org-id",
+          "X-Project": "project-id"
+        }
+      }
+    }
+  }
+}
+```
+
+Notes:
+- `remote.*` takes precedence over `models.providers.openai.*`.
+- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.
--- a/docker-compose/ez-assistant/docs/concepts/messages.md
+++ b/docker-compose/ez-assistant/docs/concepts/messages.md
@@ -0,0 +1,143 @@
+---
+summary: "Message flow, sessions, queueing, and reasoning visibility"
+read_when:
+  - Explaining how inbound messages become replies
+  - Clarifying sessions, queueing modes, or streaming behavior
+  - Documenting reasoning visibility and usage implications
+---
+# Messages
+
+This page ties together how Moltbot handles inbound messages, sessions, queueing,
+streaming, and reasoning visibility.
+
+## Message flow (high level)
+
+```
+Inbound message
+  -> routing/bindings -> session key
+  -> queue (if a run is active)
+  -> agent run (streaming + tools)
+  -> outbound replies (channel limits + chunking)
+```
+
+Key knobs live in configuration:
+- `messages.*` for prefixes, queueing, and group behavior.
+- `agents.defaults.*` for block streaming and chunking defaults.
+- Channel overrides (`channels.whatsapp.*`, `channels.telegram.*`, etc.) for caps and streaming toggles.
+
+See [Configuration](/gateway/configuration) for full schema.
+
+## Inbound dedupe
+
+Channels can redeliver the same message after reconnects. Moltbot keeps a
+short-lived cache keyed by channel/account/peer/session/message id so duplicate
+deliveries do not trigger another agent run.
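+
+A minimal sketch of such a cache, assuming a fixed TTL and an injectable clock for testing.
+The class name, key fields, and TTL value are illustrative assumptions, not Moltbot's
+internals.
+
+```typescript
+interface InboundKey {
+  channel: string;
+  accountId: string;
+  peerId: string;
+  sessionKey: string;
+  messageId: string;
+}
+
+class InboundDedupeCache {
+  private readonly seen = new Map<string, number>(); // key -> expiry (ms)
+  private readonly ttlMs: number;
+  private readonly now: () => number;
+
+  constructor(ttlMs: number, now: () => number = Date.now) {
+    this.ttlMs = ttlMs;
+    this.now = now;
+  }
+
+  // True the first time a key is seen within the TTL window, false for redeliveries.
+  shouldProcess(k: InboundKey): boolean {
+    const t = this.now();
+    for (const [key, expiry] of this.seen) {
+      if (expiry <= t) this.seen.delete(key); // evict expired entries
+    }
+    const key = [k.channel, k.accountId, k.peerId, k.sessionKey, k.messageId].join("|");
+    if (this.seen.has(key)) return false;
+    this.seen.set(key, t + this.ttlMs);
+    return true;
+  }
+}
+```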
+
+## Inbound debouncing
+
+Rapid consecutive messages from the **same sender** can be batched into a single
+agent turn via `messages.inbound`. Debouncing is scoped per channel + conversation
+and uses the most recent message for reply threading/IDs.
+
+Config (global default + per-channel overrides):
+```json5
+{
+  messages: {
+    inbound: {
+      debounceMs: 2000,
+      byChannel: {
+        whatsapp: 5000,
+        slack: 1500,
+        discord: 1500
+      }
+    }
+  }
+}
+```
+
+Notes:
+- Debounce applies to **text-only** messages; media/attachments flush immediately.
+- Control commands bypass debouncing so they remain standalone.
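+
+The resolution rules above can be sketched as two small functions: one picks the debounce
+window (per-channel override first, then the global default), the other collapses a batch
+into one turn threaded to the most recent message. This is an illustration; the function
+names, the `InboundMessage` shape, and joining batched text with newlines are assumptions.
+
+```typescript
+interface InboundDebounceConfig {
+  debounceMs: number;
+  byChannel?: Record<string, number>;
+}
+
+interface InboundMessage {
+  id: string;
+  channel: string;
+  text: string;
+  hasMedia: boolean;
+  isControlCommand: boolean;
+}
+
+// Debounce window for a message; 0 means "flush immediately".
+function debounceDelayFor(cfg: InboundDebounceConfig, msg: InboundMessage): number {
+  if (msg.hasMedia || msg.isControlCommand) return 0; // media and commands skip batching
+  return cfg.byChannel?.[msg.channel] ?? cfg.debounceMs;
+}
+
+// A batch becomes one agent turn; the newest message supplies the reply/thread ID.
+function collapseBatch(batch: InboundMessage[]): { replyToId: string; text: string } {
+  if (batch.length === 0) throw new Error("empty batch");
+  return {
+    replyToId: batch[batch.length - 1].id,
+    text: batch.map((m) => m.text).join("\n"),
+  };
+}
+```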
+
+## Sessions and devices
+
+Sessions are owned by the gateway, not by clients.
+- Direct chats collapse into the agent main session key.
+- Groups/channels get their own session keys.
+- The session store and transcripts live on the gateway host.
+
+Multiple devices/channels can map to the same session, but history is not fully
+synced back to every client. Recommendation: use one primary device for long
+conversations to avoid divergent context. The Control UI and TUI always show the
+gateway-backed session transcript, so they are the source of truth.
+
+Details: [Session management](/concepts/session).
+
+## Inbound bodies and history context
+
+Moltbot separates the **prompt body** from the **command body**:
+- `Body`: prompt text sent to the agent. This may include channel envelopes and
+  optional history wrappers.
+- `CommandBody`: raw user text for directive/command parsing.
+- `RawBody`: legacy alias for `CommandBody` (kept for compatibility).
+
+When a channel supplies history, it uses a shared wrapper:
+- `[Chat messages since your last reply - for context]`
+- `[Current message - respond to this]`
+
+For **non-direct chats** (groups/channels/rooms), the **current message body** is prefixed with the
+sender label (same style used for history entries). This keeps real-time and queued/history
+messages consistent in the agent prompt.
+
+History buffers are **pending-only**: they include group messages that did *not*
+trigger a run (for example, mention-gated messages) and **exclude** messages
+already in the session transcript.
+
+Directive stripping only applies to the **current message** section so history
+remains intact. Channels that wrap history should set `CommandBody` (or
+`RawBody`) to the original message text and keep `Body` as the combined prompt.
+History buffers are configurable via `messages.groupChat.historyLimit` (global
+default) and per-channel overrides like `channels.slack.historyLimit` or
+`channels.telegram.accounts.<id>.historyLimit` (set `0` to disable).
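+
+A rough sketch of how a channel might assemble `Body` and `CommandBody` from these rules.
+The two wrapper strings come from the text above; the `sender: text` label format, the blank
+separator line, and the `buildInboundBodies` name are assumptions.
+
+```typescript
+interface ChatLine {
+  sender: string;
+  text: string;
+}
+
+function buildInboundBodies(
+  current: ChatLine,
+  pendingHistory: ChatLine[], // group messages that did not trigger a run
+  isDirect: boolean,
+  historyLimit: number, // 0 disables history
+): { Body: string; CommandBody: string } {
+  const label = (line: ChatLine) => `${line.sender}: ${line.text}`;
+  // Non-direct chats prefix the current message with the sender label.
+  const currentText = isDirect ? current.text : label(current);
+  const kept = historyLimit > 0 ? pendingHistory.slice(-historyLimit) : [];
+  const Body =
+    kept.length === 0
+      ? currentText
+      : [
+          "[Chat messages since your last reply - for context]",
+          ...kept.map(label),
+          "",
+          "[Current message - respond to this]",
+          currentText,
+        ].join("\n");
+  // CommandBody keeps the raw text so directive parsing never sees the wrappers.
+  return { Body, CommandBody: current.text };
+}
+```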
+
+## Queueing and followups
+
+If a run is already active, inbound messages can be queued, steered into the
+current run, or collected for a followup turn.
+
+- Configure via `messages.queue` (and `messages.queue.byChannel`).
+- Modes: `interrupt`, `steer`, `followup`, `collect`, plus backlog variants.
+
+Details: [Queueing](/concepts/queue).
+
+## Streaming, chunking, and batching
+
+Block streaming sends partial replies as the model produces text blocks.
+Chunking respects channel text limits and avoids splitting fenced code.
+
+Key settings:
+- `agents.defaults.blockStreamingDefault` (`on|off`, default off)
+- `agents.defaults.blockStreamingBreak` (`text_end|message_end`)
+- `agents.defaults.blockStreamingChunk` (`minChars|maxChars|breakPreference`)
+- `agents.defaults.blockStreamingCoalesce` (idle-based batching)
+- `agents.defaults.humanDelay` (human-like pause between block replies)
+- Channel overrides: `*.blockStreaming` and `*.blockStreamingCoalesce` (non-Telegram channels require explicit `*.blockStreaming: true`)
+
+Details: [Streaming + chunking](/concepts/streaming).
+
+## Reasoning visibility and tokens
+
+Moltbot can expose or hide model reasoning:
+- `/reasoning on|off|stream` controls visibility.
+- Reasoning content still counts toward token usage when produced by the model.
+- Telegram supports streaming reasoning into the draft bubble.
+
+Details: [Thinking + reasoning directives](/tools/thinking) and [Token use](/token-use).
+
+## Prefixes, threading, and replies
+
+Outbound message formatting is centralized in `messages`:
+- `messages.responsePrefix` (outbound prefix) and `channels.whatsapp.messagePrefix` (WhatsApp inbound prefix)
+- Reply threading via `replyToMode` and per-channel defaults
+
+Details: [Configuration](/gateway/configuration#messages) and channel docs.
--- a/docker-compose/ez-assistant/docs/concepts/model-failover.md
+++ b/docker-compose/ez-assistant/docs/concepts/model-failover.md
@@ -0,0 +1,139 @@
+---
+summary: "How Moltbot rotates auth profiles and falls back across models"
+read_when:
+  - Diagnosing auth profile rotation, cooldowns, or model fallback behavior
+  - Updating failover rules for auth profiles or models
+---
+
+# Model failover
+
+Moltbot handles failures in two stages:
+1) **Auth profile rotation** within the current provider.
+2) **Model fallback** to the next model in `agents.defaults.model.fallbacks`.
+
+This doc explains the runtime rules and the data that backs them.
+
+## Auth storage (keys + OAuth)
+
+Moltbot uses **auth profiles** for both API keys and OAuth tokens.
+
+- Secrets live in `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json` (legacy: `~/.clawdbot/agent/auth-profiles.json`).
+- Config `auth.profiles` / `auth.order` are **metadata + routing only** (no secrets).
+- Legacy import-only OAuth file: `~/.clawdbot/credentials/oauth.json` (imported into `auth-profiles.json` on first use).
+
+More detail: [/concepts/oauth](/concepts/oauth)
+
+Credential types:
+- `type: "api_key"` → `{ provider, key }`
+- `type: "oauth"` → `{ provider, access, refresh, expires, email? }` (+ `projectId`/`enterpriseUrl` for some providers)
+
+## Profile IDs
+
+OAuth logins create distinct profiles so multiple accounts can coexist.
+- Default: `provider:default` when no email is available.
+- OAuth with email: `provider:<email>` (for example `google-antigravity:user@gmail.com`).
+
+Profiles live in `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json` under `profiles`.
+
+## Rotation order
+
+When a provider has multiple profiles, Moltbot chooses an order like this:
+
+1) **Explicit config**: `auth.order[provider]` (if set).
+2) **Configured profiles**: `auth.profiles` filtered by provider.
+3) **Stored profiles**: entries in `auth-profiles.json` for the provider.
+
+If no explicit order is configured, Moltbot uses a round‑robin order:
+- **Primary key:** profile type (**OAuth before API keys**).
+- **Secondary key:** `usageStats.lastUsed` (oldest first, within each type).
+- **Cooldown/disabled profiles** are moved to the end, ordered by soonest expiry.
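+
+The default ordering rules can be written as a sort. This sketch assumes the list is already
+filtered to one provider and that `usageStats` timestamps are milliseconds (as in the
+`usageStats` examples in this doc). Treating a never-used profile as oldest and using the
+later of `cooldownUntil`/`disabledUntil` as a profile's expiry are our assumptions;
+`roundRobinOrder` is a hypothetical name.
+
+```typescript
+interface StoredProfile {
+  id: string;
+  type: "oauth" | "api_key";
+  lastUsed?: number; // usageStats.lastUsed (ms)
+  cooldownUntil?: number; // usageStats.cooldownUntil (ms)
+  disabledUntil?: number; // usageStats.disabledUntil (ms)
+}
+
+function roundRobinOrder(profiles: StoredProfile[], now: number): string[] {
+  // Assumption: a profile is blocked until the later of its cooldown/disabled expiry.
+  const blockedUntil = (p: StoredProfile) => Math.max(p.cooldownUntil ?? 0, p.disabledUntil ?? 0);
+  const ready = profiles.filter((p) => blockedUntil(p) <= now);
+  const blocked = profiles.filter((p) => blockedUntil(p) > now);
+
+  ready.sort((a, b) => {
+    // Primary key: OAuth before API keys.
+    if (a.type !== b.type) return a.type === "oauth" ? -1 : 1;
+    // Secondary key: oldest lastUsed first (never used sorts first).
+    return (a.lastUsed ?? 0) - (b.lastUsed ?? 0);
+  });
+  // Cooldown/disabled profiles go to the end, soonest expiry first.
+  blocked.sort((a, b) => blockedUntil(a) - blockedUntil(b));
+
+  return [...ready, ...blocked].map((p) => p.id);
+}
+```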
+
+### Session stickiness (cache-friendly)
+
+Moltbot **pins the chosen auth profile per session** to keep provider caches warm.
+It does **not** rotate on every request. The pinned profile is reused until:
+- the session is reset (`/new` / `/reset`)
+- a compaction completes (compaction count increments)
+- the profile is in cooldown/disabled
+
+Manual selection via `/model …@<profileId>` sets a **user override** for that session
+and is not auto‑rotated until a new session starts.
+
+Auto‑pinned profiles (selected by the session router) are treated as a **preference**:
+they are tried first, but Moltbot may rotate to another profile on rate limits/timeouts.
+User‑pinned profiles stay locked to that profile; if it fails and model fallbacks
+are configured, Moltbot moves to the next model instead of switching profiles.
+
+### Why OAuth can “look lost”
+
+If you have both an OAuth profile and an API key profile for the same provider, round‑robin can switch between them across messages unless pinned. To force a single profile:
+- Pin with `auth.order[provider] = ["provider:profileId"]`, or
+- Use a per-session override via `/model …` with a profile override (when supported by your UI/chat surface).
+
+## Cooldowns
+
+When a profile fails due to auth/rate‑limit errors (or a timeout that looks
+like rate limiting), Moltbot marks it in cooldown and moves to the next profile.
+Format/invalid‑request errors (for example Cloud Code Assist tool call ID
+validation failures) are treated as failover‑worthy and use the same cooldowns.
+
+Cooldowns use exponential backoff:
+- 1 minute
+- 5 minutes
+- 25 minutes
+- 1 hour (cap)
+
+State is stored in `auth-profiles.json` under `usageStats`:
+
+```json
+{
+  "usageStats": {
+    "provider:profile": {
+      "lastUsed": 1736160000000,
+      "cooldownUntil": 1736160600000,
+      "errorCount": 2
+    }
+  }
+}
+```
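+
+The documented schedule is equivalent to multiplying by 5 per consecutive failure and capping
+at 60 minutes. A sketch, assuming `errorCount` counts consecutive failures including the
+current one (that mapping is our assumption); `cooldownMs` and `markFailure` are
+hypothetical.
+
+```typescript
+const MINUTE_MS = 60_000;
+
+// errorCount: consecutive failures including this one (1-based).
+// 1 -> 1 min, 2 -> 5 min, 3 -> 25 min, 4+ -> 60 min (cap).
+function cooldownMs(errorCount: number): number {
+  const minutes = Math.min(5 ** Math.max(0, errorCount - 1), 60);
+  return minutes * MINUTE_MS;
+}
+
+interface ProfileUsageStats {
+  lastUsed?: number;
+  cooldownUntil?: number;
+  errorCount?: number;
+}
+
+// Record a failure and start a new cooldown based on the updated count.
+function markFailure(stats: ProfileUsageStats, now: number): ProfileUsageStats {
+  const errorCount = (stats.errorCount ?? 0) + 1;
+  return { ...stats, errorCount, cooldownUntil: now + cooldownMs(errorCount) };
+}
+```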
+
+## Billing disables
+
+Billing/credit failures (for example “insufficient credits” / “credit balance too low”) are treated as failover‑worthy, but they’re usually not transient. Instead of a short cooldown, Moltbot marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
+
+State is stored in `auth-profiles.json`:
+
+```json
+{
+  "usageStats": {
+    "provider:profile": {
+      "disabledUntil": 1736178000000,
+      "disabledReason": "billing"
+    }
+  }
+}
+```
+
+Defaults:
+- Billing backoff starts at **5 hours**, doubles per billing failure, and caps at **24 hours**.
+- Backoff counters reset if the profile hasn’t failed for **24 hours** (configurable).
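+
+The same idea with the billing defaults (5 hours, doubling, 24-hour cap, 24-hour reset
+window). Parameter names echo the `auth.cooldowns.*` keys listed under Related config, but
+`billingBackoffMs` and `effectiveFailureCount` are hypothetical.
+
+```typescript
+const HOUR_MS = 3_600_000;
+
+// failures: consecutive billing failures, including this one (1-based).
+function billingBackoffMs(failures: number, billingBackoffHours = 5, billingMaxHours = 24): number {
+  const hours = Math.min(billingBackoffHours * 2 ** Math.max(0, failures - 1), billingMaxHours);
+  return hours * HOUR_MS;
+}
+
+// Reset the counter once the profile has gone a full window without failing.
+function effectiveFailureCount(
+  previousFailures: number,
+  lastFailureAt: number,
+  now: number,
+  failureWindowHours = 24,
+): number {
+  return now - lastFailureAt >= failureWindowHours * HOUR_MS ? 0 : previousFailures;
+}
+```
+
+With the defaults this yields 5 h, 10 h, 20 h, then 24 h for every later failure.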
+
+## Model fallback
+
+If all profiles for a provider fail, Moltbot moves to the next model in
+`agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and
+timeouts that exhausted profile rotation (other errors do not advance fallback).
+
+When a run starts with a model override (hooks or CLI), fallbacks still end at
+`agents.defaults.model.primary` after trying any configured fallbacks.
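+
+The resulting candidate order can be sketched as below. Deduplicating repeated refs and the
+`FailureKind` labels are assumptions; `modelCandidates` and `advancesFallback` are
+hypothetical names.
+
+```typescript
+// Models tried for one run, in order.
+function modelCandidates(primary: string, fallbacks: string[], override?: string): string[] {
+  const ordered = override
+    ? [override, ...fallbacks, primary] // override runs still end at the configured primary
+    : [primary, ...fallbacks];
+  return [...new Set(ordered)]; // assumption: a repeated ref is only tried once
+}
+
+type FailureKind = "auth" | "rate_limit" | "timeout" | "other";
+
+// Only these failures (after profile rotation is exhausted) advance to the next model.
+function advancesFallback(kind: FailureKind): boolean {
+  return kind === "auth" || kind === "rate_limit" || kind === "timeout";
+}
+```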
+
+## Related config
+
+See [Gateway configuration](/gateway/configuration) for:
+- `auth.profiles` / `auth.order`
+- `auth.cooldowns.billingBackoffHours` / `auth.cooldowns.billingBackoffHoursByProvider`
+- `auth.cooldowns.billingMaxHours` / `auth.cooldowns.failureWindowHours`
+- `agents.defaults.model.primary` / `agents.defaults.model.fallbacks`
+- `agents.defaults.imageModel` routing
+
+See [Models](/concepts/models) for the broader model selection and fallback overview.
--- a/docker-compose/ez-assistant/docs/concepts/model-providers.md
+++ b/docker-compose/ez-assistant/docs/concepts/model-providers.md
@@ -0,0 +1,318 @@
+---
+summary: "Model provider overview with example configs + CLI flows"
+read_when:
+  - You need a provider-by-provider model setup reference
+  - You want example configs or CLI onboarding commands for model providers
+---
+# Model providers
+
+This page covers **LLM/model providers** (not chat channels like WhatsApp/Telegram).
+For model selection rules, see [/concepts/models](/concepts/models).
+
+## Quick rules
+
+- Model refs use `provider/model` (example: `opencode/claude-opus-4-5`).
+- If you set `agents.defaults.models`, it becomes the allowlist.
+- CLI helpers: `moltbot onboard`, `moltbot models list`, `moltbot models set <provider/model>`.
+
+## Built-in providers (pi-ai catalog)
+
+Moltbot ships with the pi‑ai catalog. These providers require **no**
+`models.providers` config; just set auth + pick a model.
+
+### OpenAI
+
+- Provider: `openai`
+- Auth: `OPENAI_API_KEY`
+- Example model: `openai/gpt-5.2`
+- CLI: `moltbot onboard --auth-choice openai-api-key`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "openai/gpt-5.2" } } }
+}
+```
+
+### Anthropic
+
+- Provider: `anthropic`
+- Auth: `ANTHROPIC_API_KEY` or `claude setup-token`
+- Example model: `anthropic/claude-opus-4-5`
+- CLI: `moltbot onboard --auth-choice token` (paste setup-token) or `moltbot models auth paste-token --provider anthropic`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "anthropic/claude-opus-4-5" } } }
+}
+```
+
+### OpenAI Code (Codex)
+
+- Provider: `openai-codex`
+- Auth: OAuth (ChatGPT)
+- Example model: `openai-codex/gpt-5.2`
+- CLI: `moltbot onboard --auth-choice openai-codex` or `moltbot models auth login --provider openai-codex`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "openai-codex/gpt-5.2" } } }
+}
+```
+
+### OpenCode Zen
+
+- Provider: `opencode`
+- Auth: `OPENCODE_API_KEY` (or `OPENCODE_ZEN_API_KEY`)
+- Example model: `opencode/claude-opus-4-5`
+- CLI: `moltbot onboard --auth-choice opencode-zen`
+
+```json5
+{
+  agents: { defaults: { model: { primary: "opencode/claude-opus-4-5" } } }
+}
+```
+
+### Google Gemini (API key)
+
+- Provider: `google`
+- Auth: `GEMINI_API_KEY`
+- Example model: `google/gemini-3-pro-preview`
+- CLI: `moltbot onboard --auth-choice gemini-api-key`
+
+### Google Vertex / Antigravity / Gemini CLI
+
+- Providers: `google-vertex`, `google-antigravity`, `google-gemini-cli`
+- Auth: Vertex uses gcloud ADC; Antigravity/Gemini CLI use their respective auth flows
+- Antigravity OAuth is shipped as a bundled plugin (`google-antigravity-auth`, disabled by default).
+  - Enable: `moltbot plugins enable google-antigravity-auth`
+  - Login: `moltbot models auth login --provider google-antigravity --set-default`
+- Gemini CLI OAuth is shipped as a bundled plugin (`google-gemini-cli-auth`, disabled by default).
+  - Enable: `moltbot plugins enable google-gemini-cli-auth`
+  - Login: `moltbot models auth login --provider google-gemini-cli --set-default`
+  - Note: you do **not** paste a client id or secret into `moltbot.json`. The CLI login flow stores
+    tokens in auth profiles on the gateway host.
+
+### Z.AI (GLM)
+
+- Provider: `zai`
+- Auth: `ZAI_API_KEY`
+- Example model: `zai/glm-4.7`
+- CLI: `moltbot onboard --auth-choice zai-api-key`
+  - Aliases: `z.ai/*` and `z-ai/*` normalize to `zai/*`
+
+### Vercel AI Gateway
+
+- Provider: `vercel-ai-gateway`
+- Auth: `AI_GATEWAY_API_KEY`
+- Example model: `vercel-ai-gateway/anthropic/claude-opus-4.5`
+- CLI: `moltbot onboard --auth-choice ai-gateway-api-key`
+
+### Other built-in providers
+
+- OpenRouter: `openrouter` (`OPENROUTER_API_KEY`)
+  - Example model: `openrouter/anthropic/claude-sonnet-4-5`
+- xAI: `xai` (`XAI_API_KEY`)
+- Groq: `groq` (`GROQ_API_KEY`)
+- Cerebras: `cerebras` (`CEREBRAS_API_KEY`)
+  - GLM models on Cerebras use ids `zai-glm-4.7` and `zai-glm-4.6`.
+  - OpenAI-compatible base URL: `https://api.cerebras.ai/v1`.
+- Mistral: `mistral` (`MISTRAL_API_KEY`)
+- GitHub Copilot: `github-copilot` (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` / `GITHUB_TOKEN`)
+
+## Providers via `models.providers` (custom/base URL)
+
+Use `models.providers` (or `models.json`) to add **custom** providers or
+OpenAI/Anthropic‑compatible proxies.
+
+### Moonshot AI (Kimi)
+
+Moonshot uses OpenAI-compatible endpoints, so configure it as a custom provider:
+
+- Provider: `moonshot`
+- Auth: `MOONSHOT_API_KEY`
+- Example model: `moonshot/kimi-k2-0905-preview`
+- Kimi K2 model IDs:
+  {/* moonshot-kimi-k2-model-refs:start */}
+  - `moonshot/kimi-k2-0905-preview`
+  - `moonshot/kimi-k2-turbo-preview`
+  - `moonshot/kimi-k2-thinking`
+  - `moonshot/kimi-k2-thinking-turbo`
+  {/* moonshot-kimi-k2-model-refs:end */}
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "moonshot/kimi-k2-0905-preview" } }
+  },
+  models: {
+    mode: "merge",
+    providers: {
+      moonshot: {
+        baseUrl: "https://api.moonshot.ai/v1",
+        apiKey: "${MOONSHOT_API_KEY}",
+        api: "openai-completions",
+        models: [{ id: "kimi-k2-0905-preview", name: "Kimi K2 0905 Preview" }]
+      }
+    }
+  }
+}
+```
+
+### Kimi Code
+
+Kimi Code uses a dedicated endpoint and key (separate from Moonshot):
+
+- Provider: `kimi-code`
+- Auth: `KIMICODE_API_KEY`
+- Example model: `kimi-code/kimi-for-coding`
+
+```json5
+{
+  env: { KIMICODE_API_KEY: "sk-..." },
+  agents: {
+    defaults: { model: { primary: "kimi-code/kimi-for-coding" } }
+  },
+  models: {
+    mode: "merge",
+    providers: {
+      "kimi-code": {
+        baseUrl: "https://api.kimi.com/coding/v1",
+        apiKey: "${KIMICODE_API_KEY}",
+        api: "openai-completions",
+        models: [{ id: "kimi-for-coding", name: "Kimi For Coding" }]
+      }
+    }
+  }
+}
+```
+
+### Qwen OAuth (free tier)
+
+Qwen provides OAuth access to Qwen Coder + Vision via a device-code flow.
+Enable the bundled plugin, then log in:
+
+```bash
+moltbot plugins enable qwen-portal-auth
+moltbot models auth login --provider qwen-portal --set-default
+```
+
+Model refs:
+- `qwen-portal/coder-model`
+- `qwen-portal/vision-model`
+
+See [/providers/qwen](/providers/qwen) for setup details and notes.
+
+### Synthetic
+
+Synthetic provides Anthropic-compatible models behind the `synthetic` provider:
+
+- Provider: `synthetic`
+- Auth: `SYNTHETIC_API_KEY`
+- Example model: `synthetic/hf:MiniMaxAI/MiniMax-M2.1`
+- CLI: `moltbot onboard --auth-choice synthetic-api-key`
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "synthetic/hf:MiniMaxAI/MiniMax-M2.1" } }
+  },
+  models: {
+    mode: "merge",
+    providers: {
+      synthetic: {
+        baseUrl: "https://api.synthetic.new/anthropic",
+        apiKey: "${SYNTHETIC_API_KEY}",
+        api: "anthropic-messages",
+        models: [{ id: "hf:MiniMaxAI/MiniMax-M2.1", name: "MiniMax M2.1" }]
+      }
+    }
+  }
+}
+```
+
+### MiniMax
+
+MiniMax is configured via `models.providers` because it uses custom endpoints:
+
+- MiniMax (Anthropic‑compatible): `--auth-choice minimax-api`
+- Auth: `MINIMAX_API_KEY`
+
+See [/providers/minimax](/providers/minimax) for setup details, model options, and config snippets.
+
+### Ollama
+
+Ollama is a local LLM runtime that provides an OpenAI-compatible API:
+
+- Provider: `ollama`
+- Auth: None required (local server)
+- Example model: `ollama/llama3.3`
+- Installation: https://ollama.ai
+
+```bash
+# Install Ollama, then pull a model:
+ollama pull llama3.3
+```
+
+```json5
+{
+  agents: {
+    defaults: { model: { primary: "ollama/llama3.3" } }
+  }
+}
+```
+
+Ollama is automatically detected when running locally at `http://127.0.0.1:11434/v1`. See [/providers/ollama](/providers/ollama) for model recommendations and custom configuration.
+
+### Local proxies (LM Studio, vLLM, LiteLLM, etc.)
+
+Example (OpenAI‑compatible):
+
+```json5
+{
+  agents: {
+    defaults: {
+      model: { primary: "lmstudio/minimax-m2.1-gs32" },
+      models: { "lmstudio/minimax-m2.1-gs32": { alias: "Minimax" } }
+    }
+  },
+  models: {
+    providers: {
+      lmstudio: {
+        baseUrl: "http://localhost:1234/v1",
+        apiKey: "LMSTUDIO_KEY",
+        api: "openai-completions",
+        models: [
+          {
+            id: "minimax-m2.1-gs32",
+            name: "MiniMax M2.1",
+            reasoning: false,
+            input: ["text"],
+            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+            contextWindow: 200000,
+            maxTokens: 8192
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+Notes:
+- For custom providers, `reasoning`, `input`, `cost`, `contextWindow`, and `maxTokens` are optional.
+  When omitted, Moltbot defaults to:
+  - `reasoning: false`
+  - `input: ["text"]`
+  - `cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }`
+  - `contextWindow: 200000`
+  - `maxTokens: 8192`
+- Recommended: set explicit values that match your proxy/model limits.
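+
+The defaulting rule amounts to filling each omitted field. A sketch; `withModelDefaults` is a
+hypothetical name, and `ModelEntry` only covers the fields shown in the example above.
+
+```typescript
+interface ModelCost {
+  input: number;
+  output: number;
+  cacheRead: number;
+  cacheWrite: number;
+}
+
+interface ModelEntry {
+  id: string;
+  name?: string;
+  reasoning?: boolean;
+  input?: string[];
+  cost?: ModelCost;
+  contextWindow?: number;
+  maxTokens?: number;
+}
+
+// Fill omitted optional fields with the documented defaults; explicit values win.
+function withModelDefaults(m: ModelEntry) {
+  return {
+    ...m,
+    reasoning: m.reasoning ?? false,
+    input: m.input ?? ["text"],
+    cost: m.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+    contextWindow: m.contextWindow ?? 200_000,
+    maxTokens: m.maxTokens ?? 8192,
+  };
+}
+```
+
+Using `??` per field (rather than spreading a defaults object underneath the entry) keeps an
+explicitly `undefined` field from overriding a default.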
+
+## CLI examples
+
+```bash
+moltbot onboard --auth-choice opencode-zen
+moltbot models set opencode/claude-opus-4-5
+moltbot models list
+```
+
+See also: [/gateway/configuration](/gateway/configuration) for full configuration examples.
--- a/docker-compose/ez-assistant/docs/concepts/models.md
+++ b/docker-compose/ez-assistant/docs/concepts/models.md
@@ -0,0 +1,202 @@
+---
+summary: "Models CLI: list, set, aliases, fallbacks, scan, status"
+read_when:
+  - Adding or modifying models CLI (models list/set/scan/aliases/fallbacks)
+  - Changing model fallback behavior or selection UX
+  - Updating model scan probes (tools/images)
+---
+# Models CLI
+
+See [/concepts/model-failover](/concepts/model-failover) for auth profile
+rotation, cooldowns, and how that interacts with fallbacks.
+Quick provider overview + examples: [/concepts/model-providers](/concepts/model-providers).
+
+## How model selection works
+
+Moltbot selects models in this order:
+
+1) **Primary** model (`agents.defaults.model.primary` or `agents.defaults.model`).
+2) **Fallbacks** in `agents.defaults.model.fallbacks` (in order).
+3) **Provider auth failover** happens inside a provider before moving to the
+   next model.
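The selection order can be pictured as two nested loops. The outer loop walks the models (primary first, then fallbacks). The inner loop walks the auth profiles for that model's provider. This is a minimal sketch that assumes a synchronous `attempt` callback and a hypothetical `profilesFor` lookup. The real runtime also applies the cooldown and rotation rules described in model failover.

```typescript
// Hypothetical types; the real runtime also tracks cooldowns (see model-failover).
interface ModelConfig {
  primary: string;
  fallbacks?: string[];
}

type Attempt = (modelRef: string, profileId: string) => boolean;

function providerOf(modelRef: string): string {
  const slash = modelRef.indexOf("/");
  return slash === -1 ? modelRef : modelRef.slice(0, slash);
}

// Returns the first (model, profile) pair that succeeds, or undefined if all fail.
function selectModel(
  config: ModelConfig,
  profilesFor: (provider: string) => string[],
  attempt: Attempt,
): { modelRef: string; profileId: string } | undefined {
  const candidates = [config.primary, ...(config.fallbacks ?? [])];
  for (const modelRef of candidates) {
    // Auth failover: rotate through this provider's profiles first...
    for (const profileId of profilesFor(providerOf(modelRef))) {
      if (attempt(modelRef, profileId)) return { modelRef, profileId };
    }
    // ...and only then move on to the next model in the list.
  }
  return undefined;
}
```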
+
+Related:
+- `agents.defaults.models` is the allowlist/catalog of models Moltbot can use (plus aliases).
+- `agents.defaults.imageModel` is used **only when** the primary model can’t accept images.
+- Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
+
+## Quick model picks (anecdotal)
+
+- **GLM**: a bit better for coding/tool calling.
+- **MiniMax**: better for writing and vibes.
+
+## Setup wizard (recommended)
+
+If you don’t want to hand-edit config, run the onboarding wizard:
+
+```bash
+moltbot onboard
+```
+
+It can set up model + auth for common providers, including an **OpenAI Codex
+subscription** (ChatGPT OAuth) and **Anthropic** (API key recommended; `claude
+setup-token` also supported).
+
+## Config keys (overview)
+
+- `agents.defaults.model.primary` and `agents.defaults.model.fallbacks`
+- `agents.defaults.imageModel.primary` and `agents.defaults.imageModel.fallbacks`
+- `agents.defaults.models` (allowlist + aliases + provider params)
+- `models.providers` (custom providers written into `models.json`)
+
+Model refs are normalized to lowercase. Provider aliases like `z.ai/*` normalize
+to `zai/*`.
+
+Provider configuration examples (including OpenCode Zen) live in
+[/gateway/configuration](/gateway/configuration#opencode-zen-multi-model-proxy).
+
+## “Model is not allowed” (and why replies stop)
+
+If `agents.defaults.models` is set, it becomes the **allowlist** for `/model` and for
+session overrides. When a user selects a model that isn’t in that allowlist,
+Moltbot returns:
+
+```
+Model "provider/model" is not allowed. Use /model to list available models.
+```
+
+This happens **before** a normal reply is generated, so the message can feel
+like it “didn’t respond.” The fix is to either:
+
+- Add the model to `agents.defaults.models`, or
+- Clear the allowlist (remove `agents.defaults.models`), or
+- Pick a model from `/model list`.
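The allowlist rule boils down to a key lookup that happens before any reply is generated. The sketch below uses a hypothetical `checkModelAllowed` helper and assumes refs are already normalized. The message text is the one quoted above.

```typescript
// Hypothetical check; `allowlist` mirrors the keys of agents.defaults.models.
function checkModelAllowed(
  modelRef: string,
  allowlist: Record<string, { alias?: string }> | undefined,
): { ok: true } | { ok: false; message: string } {
  // No allowlist configured: every model is allowed.
  if (allowlist === undefined) return { ok: true };
  if (Object.prototype.hasOwnProperty.call(allowlist, modelRef)) return { ok: true };
  // Rejected before the agent runs, which is why the chat seems to "not respond".
  return {
    ok: false,
    message: `Model "${modelRef}" is not allowed. Use /model to list available models.`,
  };
}
```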
+
+Example allowlist config:
+
+```json5
+{
+  agents: {
+    defaults: {
+      model: { primary: "anthropic/claude-sonnet-4-5" },
+      models: {
+        "anthropic/claude-sonnet-4-5": { alias: "Sonnet" },
+        "anthropic/claude-opus-4-5": { alias: "Opus" }
+      }
+    }
+  }
+}
+```
+
+## Switching models in chat (`/model`)
+
+You can switch models for the current session without restarting:
+
+```
+/model
+/model list
+/model 3
+/model openai/gpt-5.2
+/model status
+```
+
+Notes:
+- `/model` (and `/model list`) is a compact, numbered picker (model family + available providers).
+- `/model <#>` selects from that picker.
+- `/model status` is the detailed view (auth candidates and, when configured, provider endpoint `baseUrl` + `api` mode).
+- Model refs are parsed by splitting on the **first** `/`. Use `provider/model` when typing `/model <ref>`.
+- If the model ID itself contains `/` (OpenRouter-style), you must include the provider prefix (example: `/model openrouter/moonshotai/kimi-k2`).
+- If you omit the provider, Moltbot treats the input as an alias or a model for the **default provider** (only works when there is no `/` in the model ID).
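The parsing rules above, together with the lowercase and `z.ai` → `zai` normalization described under "Config keys", can be sketched as follows. `parseModelRef` and its alias map are hypothetical. Only the `z.ai` alias is taken from this page, and alias resolution for bare names is an assumption.

```typescript
// Hypothetical parser. Provider aliases beyond `z.ai` -> `zai` are not shown.
const PROVIDER_ALIASES: Record<string, string> = { "z.ai": "zai" };

interface ModelRef {
  provider: string;
  model: string;
}

function parseModelRef(
  input: string,
  defaultProvider: string,
  aliases: Record<string, string> = {},
): ModelRef {
  const raw = input.trim();
  // A bare name with no "/" may be a user-defined alias such as "Opus".
  const aliased = raw.includes("/") ? raw : (aliases[raw] ?? raw);
  const normalized = aliased.toLowerCase();
  const slash = normalized.indexOf("/");
  if (slash === -1) {
    // No provider prefix: treat as a model on the default provider.
    return { provider: defaultProvider, model: normalized };
  }
  // Split on the FIRST "/" only, so OpenRouter-style IDs keep their inner "/".
  const provider = normalized.slice(0, slash);
  return {
    provider: PROVIDER_ALIASES[provider] ?? provider,
    model: normalized.slice(slash + 1),
  };
}
```

For example, `openrouter/moonshotai/kimi-k2` parses to provider `openrouter` and model `moonshotai/kimi-k2`.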
+
+Full command behavior/config: [Slash commands](/tools/slash-commands).
+
+## CLI commands
+
+```bash
+moltbot models list
+moltbot models status
+moltbot models set <provider/model>
+moltbot models set-image <provider/model>
+
+moltbot models aliases list
+moltbot models aliases add <alias> <provider/model>
+moltbot models aliases remove <alias>
+
+moltbot models fallbacks list
+moltbot models fallbacks add <provider/model>
+moltbot models fallbacks remove <provider/model>
+moltbot models fallbacks clear
+
+moltbot models image-fallbacks list
+moltbot models image-fallbacks add <provider/model>
+moltbot models image-fallbacks remove <provider/model>
+moltbot models image-fallbacks clear
+```
+
+`moltbot models` (no subcommand) is a shortcut for `models status`.
+
+### `models list`
+
+Shows configured models by default. Useful flags:
+
+- `--all`: full catalog
+- `--local`: local providers only
+- `--provider <name>`: filter by provider
+- `--plain`: one model per line
+- `--json`: machine‑readable output
+
+### `models status`
+
+Shows the resolved primary model, fallbacks, image model, and an auth overview
+of configured providers. It also surfaces OAuth expiry status for profiles found
+in the auth store (warns within 24h by default). `--plain` prints only the
+resolved primary model.
+OAuth status is always shown (and included in `--json` output). If a configured
+provider has no credentials, `models status` prints a **Missing auth** section.
+JSON includes `auth.oauth` (warn window + profiles) and `auth.providers`
+(effective auth per provider).
+Use `--check` for automation (exit `1` when missing/expired, `2` when expiring).
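For scripting around `--check`, the exit codes can be modeled like this. The sketch is hypothetical. The profile shape is assumed, the 24h warn window is the documented default, and it assumes `1` takes precedence when one profile is missing or expired and another is merely expiring.

```typescript
// Hypothetical profile shape; `expires` is ms since epoch.
interface AuthProfileStatus {
  provider: string;
  expires?: number; // undefined = no expiry tracked (e.g. an API key)
  missing?: boolean; // configured provider with no credentials
}

const DEFAULT_WARN_WINDOW_MS = 24 * 60 * 60 * 1000;

// 0 = ok, 1 = missing or expired, 2 = expiring within the warn window.
function checkExitCode(
  profiles: AuthProfileStatus[],
  now: number,
  warnWindowMs: number = DEFAULT_WARN_WINDOW_MS,
): 0 | 1 | 2 {
  let code: 0 | 1 | 2 = 0;
  for (const p of profiles) {
    // Assumed precedence: a hard failure (1) beats a warning (2).
    if (p.missing || (p.expires !== undefined && p.expires <= now)) return 1;
    if (p.expires !== undefined && p.expires - now <= warnWindowMs) code = 2;
  }
  return code;
}
```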
+
+Preferred Anthropic auth is the Claude Code CLI setup-token (run anywhere; paste on the gateway host if needed):
+
+```bash
+claude setup-token
+moltbot models status
+```
+
+## Scanning (OpenRouter free models)
+
+`moltbot models scan` inspects OpenRouter’s **free model catalog** and can
+optionally probe models for tool and image support.
+
+Key flags:
+
+- `--no-probe`: skip live probes (metadata only)
+- `--min-params <b>`: minimum parameter size (billions)
+- `--max-age-days <days>`: skip older models
+- `--provider <name>`: provider prefix filter
+- `--max-candidates <n>`: fallback list size
+- `--set-default`: set `agents.defaults.model.primary` to the first selection
+- `--set-image`: set `agents.defaults.imageModel.primary` to the first image selection
+
+Probing requires an OpenRouter API key (from auth profiles or
+`OPENROUTER_API_KEY`). Without a key, use `--no-probe` to list candidates only.
+
+Scan results are ranked by:
+1) Image support
+2) Tool latency
+3) Context size
+4) Parameter count
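The ranking above maps naturally onto a multi-key comparator. In this sketch the field names are hypothetical. It also assumes that image support ranks first, lower tool latency is better, and a larger context and more parameters are better. The docs only give the key order.

```typescript
// Hypothetical candidate shape; field names are illustrative only.
interface ScanCandidate {
  id: string;
  supportsImages: boolean;
  toolLatencyMs: number; // lower is better; Infinity if tools unsupported/unprobed
  contextWindow: number;
  paramsB: number; // parameter count in billions
}

// Key order from the docs: images, tool latency, context size, parameter count.
function compareCandidates(a: ScanCandidate, b: ScanCandidate): number {
  if (a.supportsImages !== b.supportsImages) return a.supportsImages ? -1 : 1;
  if (a.toolLatencyMs !== b.toolLatencyMs) return a.toolLatencyMs - b.toolLatencyMs;
  if (a.contextWindow !== b.contextWindow) return b.contextWindow - a.contextWindow;
  return b.paramsB - a.paramsB;
}
```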
+
+Inputs:
+- OpenRouter `/models` list (filter `:free`)
+- Requires OpenRouter API key from auth profiles or `OPENROUTER_API_KEY` (see [/environment](/environment))
+- Optional filters: `--max-age-days`, `--min-params`, `--provider`, `--max-candidates`
+- Probe controls: `--timeout`, `--concurrency`
+
+When run in a TTY, you can select fallbacks interactively. In non‑interactive
+mode, pass `--yes` to accept defaults.
+
+## Models registry (`models.json`)
+
+Custom providers in `models.providers` are written into `models.json` under the
+agent directory (default `~/.clawdbot/agents/<agentId>/models.json`). This file
+is merged by default unless `models.mode` is set to `replace`.
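A minimal sketch of the two modes follows. It assumes merging is shallow per provider name, so config entries replace same-named providers wholesale. `resolveProviders` is a hypothetical name, and the real `models.json` layout may differ.

```typescript
// Hypothetical merge; the real file layout of models.json may differ.
type ProviderMap = Record<string, unknown>;

function resolveProviders(
  existing: ProviderMap,
  configured: ProviderMap,
  mode: "merge" | "replace" = "merge",
): ProviderMap {
  // replace: only providers from config survive.
  if (mode === "replace") return { ...configured };
  // merge (default): config entries override same-named entries; others are kept.
  return { ...existing, ...configured };
}
```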
--- a/docker-compose/ez-assistant/docs/concepts/multi-agent.md
+++ b/docker-compose/ez-assistant/docs/concepts/multi-agent.md
@@ -0,0 +1,354 @@
+---
+summary: "Multi-agent routing: isolated agents, channel accounts, and bindings"
+title: Multi-Agent Routing
+read_when: "You want multiple isolated agents (workspaces + auth) in one gateway process."
+status: active
+---
+
+# Multi-Agent Routing
+
+Goal: multiple *isolated* agents (separate workspace + `agentDir` + sessions), plus multiple channel accounts (e.g. two WhatsApps) in one running Gateway. Inbound is routed to an agent via bindings.
+
+## What is “one agent”?
+
+An **agent** is a fully scoped brain with its own:
+
+- **Workspace** (files, AGENTS.md/SOUL.md/USER.md, local notes, persona rules).
+- **State directory** (`agentDir`) for auth profiles, model registry, and per-agent config.
+- **Session store** (chat history + routing state) under `~/.clawdbot/agents/<agentId>/sessions`.
+
+Auth profiles are **per-agent**. Each agent reads from its own:
+
+```
+~/.clawdbot/agents/<agentId>/agent/auth-profiles.json
+```
+
+Main agent credentials are **not** shared automatically. Never reuse `agentDir`
+across agents (it causes auth/session collisions). If you want to share creds,
+copy `auth-profiles.json` into the other agent's `agentDir`.
+
+Skills are per-agent via each workspace’s `skills/` folder, with shared skills
+available from `~/.clawdbot/skills`. See [Skills: per-agent vs shared](/tools/skills#per-agent-vs-shared-skills).
+
+The Gateway can host **one agent** (default) or **many agents** side-by-side.
+
+**Workspace note:** each agent’s workspace is the **default cwd**, not a hard
+sandbox. Relative paths resolve inside the workspace, but absolute paths can
+reach other host locations unless sandboxing is enabled. See
+[Sandboxing](/gateway/sandboxing).
+
+## Paths (quick map)
+
+- Config: `~/.clawdbot/moltbot.json` (or `CLAWDBOT_CONFIG_PATH`)
+- State dir: `~/.clawdbot` (or `CLAWDBOT_STATE_DIR`)
+- Workspace: `~/clawd` (or `~/clawd-<agentId>`)
+- Agent dir: `~/.clawdbot/agents/<agentId>/agent` (or `agents.list[].agentDir`)
+- Sessions: `~/.clawdbot/agents/<agentId>/sessions`
+
+### Single-agent mode (default)
+
+If you do nothing, Moltbot runs a single agent:
+
+- `agentId` defaults to **`main`**.
+- Sessions are keyed as `agent:main:<mainKey>`.
+- Workspace defaults to `~/clawd` (or `~/clawd-<profile>` when `CLAWDBOT_PROFILE` is set).
+- State defaults to `~/.clawdbot/agents/main/agent`.
+
+## Agent helper
+
+Use the agent wizard to add a new isolated agent:
+
+```bash
+moltbot agents add work
+```
+
+Then add `bindings` (or let the wizard do it) to route inbound messages.
+
+Verify with:
+
+```bash
+moltbot agents list --bindings
+```
+
+## Multiple agents = multiple people, multiple personalities
+
+With **multiple agents**, each `agentId` becomes a **fully isolated persona**:
+
+- **Different phone numbers/accounts** (per channel `accountId`).
+- **Different personalities** (per-agent workspace files like `AGENTS.md` and `SOUL.md`).
+- **Separate auth + sessions** (no cross-talk unless explicitly enabled).
+
+This lets **multiple people** share one Gateway server while keeping their AI “brains” and data isolated.
+
+## One WhatsApp number, multiple people (DM split)
+
+You can route **different WhatsApp DMs** to different agents while staying on **one WhatsApp account**. Match on sender E.164 (like `+15551234567`) with `peer.kind: "dm"`. Replies still come from the same WhatsApp number (no per‑agent sender identity).
+
+Important detail: direct chats collapse to the agent’s **main session key**, so true isolation requires **one agent per person**.
+
+Example:
+
+```json5
+{
+  agents: {
+    list: [
+      { id: "alex", workspace: "~/clawd-alex" },
+      { id: "mia", workspace: "~/clawd-mia" }
+    ]
+  },
+  bindings: [
+    { agentId: "alex", match: { channel: "whatsapp", peer: { kind: "dm", id: "+15551230001" } } },
+    { agentId: "mia",  match: { channel: "whatsapp", peer: { kind: "dm", id: "+15551230002" } } }
+  ],
+  channels: {
+    whatsapp: {
+      dmPolicy: "allowlist",
+      allowFrom: ["+15551230001", "+15551230002"]
+    }
+  }
+}
+```
+
+Notes:
+- DM access control is **global per WhatsApp account** (pairing/allowlist), not per agent.
+- For shared groups, bind the group to one agent or use [Broadcast groups](/broadcast-groups).
+
+## Routing rules (how messages pick an agent)
+
+Bindings are **deterministic** and **most-specific wins**:
+
+1. `peer` match (exact DM/group/channel id)
+2. `guildId` (Discord)
+3. `teamId` (Slack)
+4. `accountId` match for a channel
+5. channel-level match (`accountId: "*"`)
+6. fallback to default agent (`agents.list[].default`, else first list entry, default: `main`)
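The precedence list can be read as a specificity score: every binding that matches the inbound message gets a tier, and the lowest tier wins. The types below are modeled on this page's config examples, but `tier` and `resolveAgent` are hypothetical helpers, not the gateway's code. Ties go to the earlier binding.

```typescript
// Hypothetical binding/inbound shapes, modeled on the config examples on this page.
type PeerKind = "dm" | "group" | "channel";

interface Binding {
  agentId: string;
  match: {
    channel: string;
    accountId?: string; // "*" or omitted = any account
    peer?: { kind: PeerKind; id: string };
    guildId?: string;
    teamId?: string;
  };
}

interface Inbound {
  channel: string;
  accountId: string;
  peer: { kind: PeerKind; id: string };
  guildId?: string;
  teamId?: string;
}

// Tier from the list above (1 = most specific), or undefined if the binding doesn't match.
function tier(b: Binding, msg: Inbound): number | undefined {
  const m = b.match;
  if (m.channel !== msg.channel) return undefined;
  const anyAccount = m.accountId === undefined || m.accountId === "*";
  if (!anyAccount && m.accountId !== msg.accountId) return undefined;
  if (m.peer) return m.peer.kind === msg.peer.kind && m.peer.id === msg.peer.id ? 1 : undefined;
  if (m.guildId !== undefined) return m.guildId === msg.guildId ? 2 : undefined;
  if (m.teamId !== undefined) return m.teamId === msg.teamId ? 3 : undefined;
  return anyAccount ? 5 : 4; // channel-level vs accountId match
}

function resolveAgent(bindings: Binding[], msg: Inbound, defaultAgent = "main"): string {
  let best: { agentId: string; tier: number } | undefined;
  for (const b of bindings) {
    const t = tier(b, msg);
    // Strict "<" keeps the earliest binding on ties.
    if (t !== undefined && (best === undefined || t < best.tier)) best = { agentId: b.agentId, tier: t };
  }
  return best?.agentId ?? defaultAgent;
}
```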
+
+## Multiple accounts / phone numbers
+
+Channels that support **multiple accounts** (e.g. WhatsApp) use `accountId` to identify
+each login. Each `accountId` can be routed to a different agent, so one server can host
+multiple phone numbers without mixing sessions.
+
+## Concepts
+
+- `agentId`: one “brain” (workspace, per-agent auth, per-agent session store).
+- `accountId`: one channel account instance (e.g. WhatsApp account `"personal"` vs `"biz"`).
+- `binding`: routes inbound messages to an `agentId` by `(channel, accountId, peer)` and optionally guild/team ids.
+- Direct chats collapse to `agent:<agentId>:<mainKey>` (per-agent “main”; `session.mainKey`).
+
+## Example: two WhatsApps → two agents
+
+`~/.clawdbot/moltbot.json` (JSON5):
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "home",
+        default: true,
+        name: "Home",
+        workspace: "~/clawd-home",
+        agentDir: "~/.clawdbot/agents/home/agent",
+      },
+      {
+        id: "work",
+        name: "Work",
+        workspace: "~/clawd-work",
+        agentDir: "~/.clawdbot/agents/work/agent",
+      },
+    ],
+  },
+
+  // Deterministic routing: first match wins (most-specific first).
+  bindings: [
+    { agentId: "home", match: { channel: "whatsapp", accountId: "personal" } },
+    { agentId: "work", match: { channel: "whatsapp", accountId: "biz" } },
+
+    // Optional per-peer override (example: send a specific group to work agent).
+    {
+      agentId: "work",
+      match: {
+        channel: "whatsapp",
+        accountId: "personal",
+        peer: { kind: "group", id: "1203630...@g.us" },
+      },
+    },
+  ],
+
+  // Off by default: agent-to-agent messaging must be explicitly enabled + allowlisted.
+  tools: {
+    agentToAgent: {
+      enabled: false,
+      allow: ["home", "work"],
+    },
+  },
+
+  channels: {
+    whatsapp: {
+      accounts: {
+        personal: {
+          // Optional override. Default: ~/.clawdbot/credentials/whatsapp/personal
+          // authDir: "~/.clawdbot/credentials/whatsapp/personal",
+        },
+        biz: {
+          // Optional override. Default: ~/.clawdbot/credentials/whatsapp/biz
+          // authDir: "~/.clawdbot/credentials/whatsapp/biz",
+        },
+      },
+    },
+  },
+}
+```
+
+## Example: WhatsApp daily chat + Telegram deep work
+
+Split by channel: route WhatsApp to a fast everyday agent and Telegram to an Opus agent.
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "chat",
+        name: "Everyday",
+        workspace: "~/clawd-chat",
+        model: "anthropic/claude-sonnet-4-5"
+      },
+      {
+        id: "opus",
+        name: "Deep Work",
+        workspace: "~/clawd-opus",
+        model: "anthropic/claude-opus-4-5"
+      }
+    ]
+  },
+  bindings: [
+    { agentId: "chat", match: { channel: "whatsapp" } },
+    { agentId: "opus", match: { channel: "telegram" } }
+  ]
+}
+```
+
+Notes:
+- If you have multiple accounts for a channel, add `accountId` to the binding (for example `{ channel: "whatsapp", accountId: "personal" }`).
+- To route a single DM/group to Opus while keeping the rest on chat, add a `match.peer` binding for that peer; peer matches always win over channel-wide rules.
+
+## Example: same channel, one peer to Opus
+
+Keep WhatsApp on the fast agent, but route one DM to Opus:
+
+```json5
+{
+  agents: {
+    list: [
+      { id: "chat", name: "Everyday", workspace: "~/clawd-chat", model: "anthropic/claude-sonnet-4-5" },
+      { id: "opus", name: "Deep Work", workspace: "~/clawd-opus", model: "anthropic/claude-opus-4-5" }
+    ]
+  },
+  bindings: [
+    { agentId: "opus", match: { channel: "whatsapp", peer: { kind: "dm", id: "+15551234567" } } },
+    { agentId: "chat", match: { channel: "whatsapp" } }
+  ]
+}
+```
+
+Peer bindings always win, so keep them above the channel-wide rule.
+
+## Family agent bound to a WhatsApp group
+
+Bind a dedicated family agent to a single WhatsApp group, with mention gating
+and a tighter tool policy:
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "family",
+        name: "Family",
+        workspace: "~/clawd-family",
+        identity: { name: "Family Bot" },
+        groupChat: {
+          mentionPatterns: ["@family", "@familybot", "@Family Bot"]
+        },
+        sandbox: {
+          mode: "all",
+          scope: "agent"
+        },
+        tools: {
+          allow: ["exec", "read", "sessions_list", "sessions_history", "sessions_send", "sessions_spawn", "session_status"],
+          deny: ["write", "edit", "apply_patch", "browser", "canvas", "nodes", "cron"]
+        }
+      }
+    ]
+  },
+  bindings: [
+    {
+      agentId: "family",
+      match: {
+        channel: "whatsapp",
+        peer: { kind: "group", id: "120363999999999999@g.us" }
+      }
+    }
+  ]
+}
+```
+
+Notes:
+- Tool allow/deny lists are **tools**, not skills. If a skill needs to run a
+  binary, ensure `exec` is allowed and the binary exists in the sandbox.
+- For stricter gating, set `agents.list[].groupChat.mentionPatterns` and keep
+  group allowlists enabled for the channel.
+
+## Per-Agent Sandbox and Tool Configuration
+
+Starting with v2026.1.6, each agent can have its own sandbox and tool restrictions:
+
+```json5
+{
+  agents: {
+    list: [
+      {
+        id: "personal",
+        workspace: "~/clawd-personal",
+        sandbox: {
+          mode: "off",  // No sandbox for personal agent
+        },
+        // No tool restrictions - all tools available
+      },
+      {
+        id: "family",
+        workspace: "~/clawd-family",
+        sandbox: {
+          mode: "all",     // Always sandboxed
+          scope: "agent",  // One container per agent
+          docker: {
+            // Optional one-time setup after container creation
+            setupCommand: "apt-get update && apt-get install -y git curl",
+          },
+        },
+        tools: {
+          allow: ["read"],                    // Only read tool
+          deny: ["exec", "write", "edit", "apply_patch"],    // Deny others
+        },
+      },
+    ],
+  },
+}
+```
+
+Note: `setupCommand` lives under `sandbox.docker` and runs once on container creation.
+Per-agent `sandbox.docker.*` overrides are ignored when the resolved scope is `"shared"`.
+
+**Benefits:**
+- **Security isolation**: Restrict tools for untrusted agents
+- **Resource control**: Sandbox specific agents while keeping others on host
+- **Flexible policies**: Different permissions per agent
+
+Note: `tools.elevated` is **global** and sender-based; it is not configurable per agent.
+If you need per-agent boundaries, use `agents.list[].tools` to deny `exec`.
+For group targeting, use `agents.list[].groupChat.mentionPatterns` so @mentions map cleanly to the intended agent.
+
+See [Multi-Agent Sandbox & Tools](/multi-agent-sandbox-tools) for detailed examples.
--- a/docker-compose/ez-assistant/docs/concepts/oauth.md
+++ b/docker-compose/ez-assistant/docs/concepts/oauth.md
@@ -0,0 +1,135 @@
+---
+summary: "OAuth in Moltbot: token exchange, storage, and multi-account patterns"
+read_when:
+  - You want to understand Moltbot OAuth end-to-end
+  - You hit token invalidation / logout issues
+  - You want setup-token or OAuth auth flows
+  - You want multiple accounts or profile routing
+---
+# OAuth
+
+Moltbot supports “subscription auth” via OAuth for providers that offer it (notably **OpenAI Codex (ChatGPT OAuth)**). For Anthropic subscriptions, use the **setup-token** flow. This page explains:
+
+- how the OAuth **token exchange** works (PKCE)
+- where tokens are **stored** (and why)
+- how to handle **multiple accounts** (profiles + per-session overrides)
+
+Moltbot also supports **provider plugins** that ship their own OAuth or API‑key
+flows. Run them via:
+
+```bash
+moltbot models auth login --provider <id>
+```
+
+## The token sink (why it exists)
+
+OAuth providers commonly mint a **new refresh token** during login/refresh flows. Some providers (or OAuth clients) can invalidate older refresh tokens when a new one is issued for the same user/app.
+
+Practical symptom:
+- you log in via Moltbot *and* via Claude Code / Codex CLI → one of them randomly gets “logged out” later
+
+To reduce that, Moltbot treats `auth-profiles.json` as a **token sink**:
+- the runtime reads credentials from **one place**
+- we can keep multiple profiles and route them deterministically
+
+## Storage (where tokens live)
+
+Secrets are stored **per-agent**:
+
+- Auth profiles (OAuth + API keys): `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`
+- Runtime cache (managed automatically; don’t edit): `~/.clawdbot/agents/<agentId>/agent/auth.json`
+
+Legacy import-only file (still supported, but not the main store):
+- `~/.clawdbot/credentials/oauth.json` (imported into `auth-profiles.json` on first use)
+
+All of the above also respect `$CLAWDBOT_STATE_DIR` (state dir override). Full reference: [/gateway/configuration](/gateway/configuration#auth-storage-oauth--api-keys)
+
+## Anthropic setup-token (subscription auth)
+
+Run `claude setup-token` on any machine, then paste it into Moltbot:
+
+```bash
+moltbot models auth setup-token --provider anthropic
+```
+
+If you generated the token elsewhere, paste it manually:
+
+```bash
+moltbot models auth paste-token --provider anthropic
+```
+
+Verify:
+
+```bash
+moltbot models status
+```
+
+## OAuth exchange (how login works)
+
+Moltbot’s interactive login flows are implemented in `@mariozechner/pi-ai` and wired into the wizards/commands.
+
+### Anthropic (Claude Pro/Max) setup-token
+
+Flow shape:
+
+1) run `claude setup-token`
+2) paste the token into Moltbot
+3) store as a token auth profile (no refresh)
+
+The wizard path is `moltbot onboard` → auth choice `setup-token` (Anthropic).
+
+### OpenAI Codex (ChatGPT OAuth)
+
+Flow shape (PKCE):
+
+1) generate PKCE verifier/challenge + random `state`
+2) open `https://auth.openai.com/oauth/authorize?...`
+3) try to capture callback on `http://127.0.0.1:1455/auth/callback`
+4) if callback can’t bind (or you’re remote/headless), paste the redirect URL/code
+5) exchange at `https://auth.openai.com/oauth/token`
+6) extract `accountId` from the access token and store `{ access, refresh, expires, accountId }`
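Steps 1 and 2 follow standard PKCE (RFC 7636, S256 method). Here is a generic Node sketch of those two steps, not the `@mariozechner/pi-ai` implementation. The `clientId` value is a placeholder, and the real request may carry extra parameters (such as scopes) that are not listed here.

```typescript
import { createHash, randomBytes } from "node:crypto";

// base64url without padding, as PKCE requires.
function base64url(buf: Buffer): string {
  return buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// S256: code_challenge = BASE64URL(SHA256(code_verifier)).
function pkceChallenge(verifier: string): string {
  return base64url(createHash("sha256").update(verifier).digest());
}

function buildAuthorizeUrl(clientId: string, redirectUri: string) {
  const verifier = base64url(randomBytes(32)); // 43 chars, within the allowed 43..128
  const state = base64url(randomBytes(16));
  const url = new URL("https://auth.openai.com/oauth/authorize");
  url.search = new URLSearchParams({
    response_type: "code",
    client_id: clientId, // placeholder; not Moltbot's real client id
    redirect_uri: redirectUri,
    code_challenge: pkceChallenge(verifier),
    code_challenge_method: "S256",
    state,
  }).toString();
  // Keep both locally: `state` is checked on callback, `verifier` is sent at token exchange.
  return { url: url.toString(), verifier, state };
}
```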
+
+Wizard path is `moltbot onboard` → auth choice `openai-codex`.
+
+## Refresh + expiry
+
+Profiles store an `expires` timestamp.
+
+At runtime:
+- if `expires` is in the future → use the stored access token
+- if expired → refresh (under a file lock) and overwrite the stored credentials
+
+The refresh flow is automatic; you generally don't need to manage tokens manually.
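The refresh rule can be sketched as a check-then-refresh function in which concurrent callers share a single refresh. In this sketch an in-process promise stands in for Moltbot's file lock, which is an assumption made for brevity. `getAccessToken` and `RefreshFn` are hypothetical names.

```typescript
// Hypothetical credential shape matching the stored fields above.
interface OAuthCreds {
  access: string;
  refresh: string;
  expires: number; // ms since epoch
}

type RefreshFn = (refreshToken: string) => Promise<OAuthCreds>;

// In-process stand-in for the file lock: concurrent callers await one refresh.
let inflight: Promise<OAuthCreds> | undefined;

async function getAccessToken(
  store: { creds: OAuthCreds },
  refresh: RefreshFn,
  now: () => number = Date.now,
): Promise<string> {
  if (store.creds.expires > now()) return store.creds.access; // still valid
  if (!inflight) {
    inflight = refresh(store.creds.refresh).then(
      (next) => {
        store.creds = next; // overwrite stored credentials, including the new refresh token
        inflight = undefined;
        return next;
      },
      (err: unknown) => {
        inflight = undefined;
        throw err;
      },
    );
  }
  return (await inflight).access;
}
```

Sharing one in-flight refresh matters because many providers rotate the refresh token, so two parallel refreshes could invalidate each other.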
+
+## Multiple accounts (profiles) + routing
+
+Two patterns:
+
+### 1) Preferred: separate agents
+
+If you want “personal” and “work” to never interact, use isolated agents (separate sessions + credentials + workspace):
+
+```bash
+moltbot agents add work
+moltbot agents add personal
+```
+
+Then configure auth per-agent (wizard) and route chats to the right agent.
+
+### 2) Advanced: multiple profiles in one agent
+
+`auth-profiles.json` supports multiple profile IDs for the same provider.
+
+Pick which profile is used:
+- globally via config ordering (`auth.order`)
+- per-session via `/model ...@<profileId>`
+
+Example (session override):
+- `/model Opus@anthropic:work`
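Splitting a session override into model and profile could look like the sketch below. It is hypothetical: it assumes the profile ID follows the last `@`, and the docs do not specify this.

```typescript
// Hypothetical: split "<model>@<profileId>" on the LAST "@".
function splitProfileOverride(arg: string): { model: string; profileId?: string } {
  const at = arg.lastIndexOf("@");
  // No "@", or nothing before/after it: treat the whole argument as the model.
  if (at <= 0 || at === arg.length - 1) return { model: arg };
  return { model: arg.slice(0, at), profileId: arg.slice(at + 1) };
}
```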
+
+How to see what profile IDs exist:
+- `moltbot channels list --json` (shows `auth[]`)
+
+Related docs:
+- [/concepts/model-failover](/concepts/model-failover) (rotation + cooldown rules)
+- [/tools/slash-commands](/tools/slash-commands) (command surface)
--- a/docker-compose/ez-assistant/docs/concepts/presence.md
+++ b/docker-compose/ez-assistant/docs/concepts/presence.md
@@ -0,0 +1,98 @@
+---
+summary: "How Moltbot presence entries are produced, merged, and displayed"
+read_when:
+  - Debugging the Instances tab
+  - Investigating duplicate or stale instance rows
+  - Changing gateway WS connect or system-event beacons
+---
+# Presence
+
+Moltbot “presence” is a lightweight, best‑effort view of:
+- the **Gateway** itself, and
+- **clients connected to the Gateway** (mac app, WebChat, CLI, etc.)
+
+Presence is used primarily to render the macOS app’s **Instances** tab and to
+provide quick operator visibility.
+
+## Presence fields (what shows up)
+
+Presence entries are structured objects with fields like:
+
+- `instanceId` (optional but strongly recommended): stable client identity (usually `connect.client.instanceId`)
+- `host`: human‑friendly host name
+- `ip`: best‑effort IP address
+- `version`: client version string
+- `deviceFamily` / `modelIdentifier`: hardware hints
+- `mode`: `ui`, `webchat`, `cli`, `backend`, `probe`, `test`, `node`, ...
+- `lastInputSeconds`: “seconds since last user input” (if known)
+- `reason`: `self`, `connect`, `node-connected`, `periodic`, ...
+- `ts`: last update timestamp (ms since epoch)
+
+## Producers (where presence comes from)
+
+Presence entries are produced by multiple sources and **merged**.
+
+### 1) Gateway self entry
+
+The Gateway always seeds a “self” entry at startup so UIs show the gateway host
+even before any clients connect.
+
+### 2) WebSocket connect
+
+Every WS client begins with a `connect` request. On successful handshake the
+Gateway upserts a presence entry for that connection.
+
+#### Why one‑off CLI commands don’t show up
+
+The CLI often connects for short, one‑off commands. To avoid spamming the
+Instances list, `client.mode === "cli"` is **not** turned into a presence entry.
+
+### 3) `system-event` beacons
+
+Clients can send richer periodic beacons via the `system-event` method. The mac
+app uses this to report host name, IP, and `lastInputSeconds`.
+
+### 4) Node connects (role: node)
+
+When a node connects over the Gateway WebSocket with `role: node`, the Gateway
+upserts a presence entry for that node (same flow as other WS clients).
+
+## Merge + dedupe rules (why `instanceId` matters)
+
+Presence entries are stored in a single in‑memory map:
+
+- Entries are keyed by a **presence key**.
+- The best key is a stable `instanceId` (from `connect.client.instanceId`) that survives restarts.
+- Keys are case‑insensitive.
+
+If a client reconnects without a stable `instanceId`, it may show up as a
+**duplicate** row.
+
+## TTL and bounded size
+
+Presence is intentionally ephemeral:
+
+- **TTL:** entries older than 5 minutes are pruned
+- **Max entries:** 200 (oldest dropped first)
+
+This keeps the list fresh and avoids unbounded memory growth.
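The rules above (skip CLI clients, case-insensitive keys, TTL, bounded size) can be combined into a small map-backed store, along with the loopback caveat described next. This is a hypothetical sketch, not the Gateway's implementation. `fallbackKey` stands in for whatever connection-derived key is used when `instanceId` is missing. It assumes that "oldest" means least recently updated.

```typescript
// Hypothetical entry shape (subset of the fields listed above).
interface PresenceEntry {
  instanceId?: string;
  host?: string;
  ip?: string;
  mode?: string;
  ts: number; // ms since epoch
}

const TTL_MS = 5 * 60 * 1000;
const MAX_ENTRIES = 200;

function isLoopback(ip: string): boolean {
  return ip === "::1" || ip.startsWith("127.") || ip.startsWith("::ffff:127.");
}

class PresenceStore {
  private entries = new Map<string, PresenceEntry>();

  upsert(entry: PresenceEntry, fallbackKey: string): void {
    if (entry.mode === "cli") return; // one-off CLI connections are not listed
    const key = (entry.instanceId ?? fallbackKey).toLowerCase(); // case-insensitive keys
    const prev = this.entries.get(key);
    // Don't let a tunnel's loopback address overwrite a previously known IP.
    const ip = entry.ip !== undefined && isLoopback(entry.ip) ? prev?.ip : entry.ip;
    this.entries.delete(key); // re-insert so Map order tracks recency
    this.entries.set(key, { ...prev, ...entry, ip });
  }

  list(now: number): PresenceEntry[] {
    for (const [key, e] of this.entries) {
      if (now - e.ts > TTL_MS) this.entries.delete(key); // prune stale rows
    }
    while (this.entries.size > MAX_ENTRIES) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest); // drop the least recently updated row
    }
    return [...this.entries.values()];
  }
}
```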
+
+## Remote/tunnel caveat (loopback IPs)
+
+When a client connects over an SSH tunnel / local port forward, the Gateway may
+see the remote address as `127.0.0.1`. To avoid overwriting a good client‑reported
+IP, loopback remote addresses are ignored.
+
+## Consumers
+
+### macOS Instances tab
+
+The macOS app renders the output of `system-presence` and applies a small status
+indicator (Active/Idle/Stale) based on the age of the last update.
+
+## Debugging tips
+
+- To see the raw list, call `system-presence` against the Gateway.
+- If you see duplicates:
+  - confirm clients send a stable `client.instanceId` in the handshake
+  - confirm periodic beacons use the same `instanceId`
+  - check whether the connection‑derived entry is missing `instanceId` (duplicates are expected)
--- a/docker-compose/ez-assistant/docs/concepts/queue.md
+++ b/docker-compose/ez-assistant/docs/concepts/queue.md
@@ -0,0 +1,77 @@
+---
+summary: "Command queue design that serializes inbound auto-reply runs"
+read_when:
+  - Changing auto-reply execution or concurrency
+---
+# Command Queue (2026-01-16)
+
+We serialize inbound auto-reply runs (all channels) through a tiny in-process queue to prevent multiple agent runs from colliding, while still allowing safe parallelism across sessions.
+
+## Why
+- Auto-reply runs can be expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
+- Serializing avoids competing for shared resources (session files, logs, CLI stdin) and reduces the chance of upstream rate limits.
+
+## How it works
+- A lane-aware FIFO queue drains each lane with a configurable concurrency cap (default 1 for unconfigured lanes; main defaults to 4, subagent to 8).
+- `runEmbeddedPiAgent` enqueues by **session key** (lane `session:<key>`) to guarantee only one active run per session.
+- Each session run is then queued into a **global lane** (`main` by default) so overall parallelism is capped by `agents.defaults.maxConcurrent`.
+- When verbose logging is enabled, queued runs emit a short notice if they waited more than ~2s before starting.
+- Typing indicators still fire immediately on enqueue (when supported by the channel) so user experience is unchanged while we wait our turn.
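This two-level scheme can be sketched with a tiny lane class. Each session gets a lane capped at 1, and its work then runs inside the shared `main` lane, whose cap plays the role of `agents.defaults.maxConcurrent`. `Lane`, `laneFor`, and `runForSession` are hypothetical names. This sketch never removes idle lanes.

```typescript
// Minimal FIFO lane: at most `max` tasks run at once, the rest wait in order.
class Lane {
  private readonly max: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(max: number) {
    this.max = max;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active++;
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .then(() => {
            this.active--;
            const next = this.waiting.shift();
            if (next) next(); // wake the oldest waiter
          });
      };
      if (this.active < this.max) start();
      else this.waiting.push(start);
    });
  }
}

const lanes = new Map<string, Lane>();

// `max` only applies when the lane is first created.
function laneFor(name: string, max: number): Lane {
  let lane = lanes.get(name);
  if (!lane) {
    lane = new Lane(max);
    lanes.set(name, lane);
  }
  return lane;
}

// One run per session (lane `session:<key>`), capped globally by the `main` lane.
function runForSession<T>(sessionKey: string, task: () => Promise<T>, maxConcurrent = 4): Promise<T> {
  const main = laneFor("main", maxConcurrent);
  return laneFor(`session:${sessionKey}`, 1).run(() => main.run(task));
}
```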
+
+## Queue modes (per channel)
+Inbound messages can steer the current run, wait for a followup turn, or do both:
+- `steer`: inject immediately into the current run (cancels pending tool calls after the next tool boundary). If not streaming, falls back to followup.
+- `followup`: enqueue for the next agent turn after the current run ends.
+- `collect`: coalesce all queued messages into a **single** followup turn (default). If messages target different channels/threads, they drain individually to preserve routing.
+- `steer-backlog` (aka `steer+backlog`): steer now **and** preserve the message for a followup turn.
+- `interrupt` (legacy): abort the active run for that session, then run the newest message.
+- `queue` (legacy alias): same as `steer`.
+
+Steer-backlog means you can get a followup response after the steered run, so
+streaming surfaces can look like duplicates. Prefer `collect`/`steer` if you want
+one response per inbound message.
+Send `/queue collect` as a standalone command (per-session) or set `messages.queue.byChannel.discord: "collect"`.
+
+Defaults (when unset in config):
+- All surfaces → `collect`
+
+Configure globally or per channel via `messages.queue`:
+
+```json5
+{
+  messages: {
+    queue: {
+      mode: "collect",
+      debounceMs: 1000,
+      cap: 20,
+      drop: "summarize",
+      byChannel: { discord: "collect" }
+    }
+  }
+}
+```
+
+## Queue options
+Options apply to `followup`, `collect`, and `steer-backlog` (and to `steer` when it falls back to followup):
+- `debounceMs`: wait for quiet before starting a followup turn (prevents “continue, continue”).
+- `cap`: max queued messages per session.
+- `drop`: overflow policy (`old`, `new`, `summarize`).
+
+Summarize keeps a short bullet list of dropped messages and injects it as a synthetic followup prompt.
+Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`.
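The overflow policies can be sketched as a pure function over a per-session queue. The sketch assumes that `old` and `summarize` both evict the oldest queued message, while `new` rejects the incoming one. The summary wording and the `enqueue`/`summarizeDropped` names are hypothetical.

```typescript
type DropPolicy = "old" | "new" | "summarize";

interface QueueState {
  messages: string[];
  dropped: string[]; // only filled by "summarize"
}

// Add a message to a per-session queue, enforcing `cap` with the chosen policy.
function enqueue(state: QueueState, msg: string, cap = 20, drop: DropPolicy = "summarize"): QueueState {
  if (state.messages.length < cap) return { ...state, messages: [...state.messages, msg] };
  if (drop === "new") return state; // reject the incoming message
  const [oldest, ...rest] = state.messages; // "old" and "summarize" evict the oldest
  return {
    messages: [...rest, msg],
    dropped: drop === "summarize" && oldest !== undefined ? [...state.dropped, oldest] : state.dropped,
  };
}

// Synthetic followup prompt listing what was dropped (wording is illustrative).
function summarizeDropped(dropped: string[]): string | undefined {
  if (dropped.length === 0) return undefined;
  return ["Messages dropped while queued:", ...dropped.map((m) => `- ${m}`)].join("\n");
}
```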
+
+## Per-session overrides
+- Send `/queue <mode>` as a standalone command to store the mode for the current session.
+- Options can be combined: `/queue collect debounce:2s cap:25 drop:summarize`
+- `/queue default` or `/queue reset` clears the session override.
+
+## Scope and guarantees
+- Applies to auto-reply agent runs across all inbound channels that use the gateway reply pipeline (WhatsApp web, Telegram, Slack, Discord, Signal, iMessage, webchat, etc.).
+- Default lane (`main`) is process-wide for inbound + main heartbeats; set `agents.defaults.maxConcurrent` to allow multiple sessions in parallel.
+- Additional lanes may exist (e.g. `cron`, `subagent`) so background jobs can run in parallel without blocking inbound replies.
+- Per-session lanes guarantee that only one agent run touches a given session at a time.
+- No external dependencies or background worker threads; pure TypeScript + promises.
+
+## Troubleshooting
+- If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
+- If you need queue depth, enable verbose logs and watch for queue timing lines.
--- a/docker-compose/ez-assistant/docs/concepts/retry.md
+++ b/docker-compose/ez-assistant/docs/concepts/retry.md
@@ -0,0 +1,60 @@
+---
+summary: "Retry policy for outbound provider calls"
+read_when:
+  - Updating provider retry behavior or defaults
+  - Debugging provider send errors or rate limits
+---
+# Retry policy
+
+## Goals
+- Retry per HTTP request, not per multi-step flow.
+- Preserve ordering by retrying only the current step.
+- Avoid duplicating non-idempotent operations.
+
+## Defaults
+- Attempts: 3
+- Max delay cap: 30000 ms
+- Jitter: 0.1 (10 percent)
+- Provider defaults:
+  - Telegram min delay: 400 ms
+  - Discord min delay: 500 ms
+
+## Behavior
+### Discord
+- Retries only on rate-limit errors (HTTP 429).
+- Uses Discord `retry_after` when available, otherwise exponential backoff.
+
+### Telegram
+- Retries on transient errors (429, timeout, connect/reset/closed, temporarily unavailable).
+- Uses `retry_after` when available, otherwise exponential backoff.
+- Markdown parse errors are not retried; they fall back to plain text.
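+The backoff behavior above can be sketched as follows. This is a minimal illustration with several assumptions: `attempts` counts the first try, jitter is applied symmetrically, and `retry_after` (in seconds) is used as-is up to the cap. `retryDelayMs` and `withRetry` are hypothetical helpers, not Moltbot exports.
+
+```typescript
+// Hypothetical shape mirroring the `retry` config keys.
+interface RetryPolicy {
+  attempts: number;   // assumed to count total attempts, including the first
+  minDelayMs: number;
+  maxDelayMs: number;
+  jitter: number;     // 0.1 = ±10 percent
+}
+
+// Delay before retry number `retry` (1 = first retry).
+function retryDelayMs(
+  policy: RetryPolicy,
+  retry: number,
+  retryAfterSeconds?: number,
+  random: () => number = Math.random,
+): number {
+  // Honor the provider's `retry_after` hint when present (capped at maxDelayMs).
+  if (retryAfterSeconds !== undefined) {
+    return Math.min(policy.maxDelayMs, retryAfterSeconds * 1000);
+  }
+  // Otherwise: exponential backoff from minDelayMs with symmetric jitter, clamped to [min, max].
+  const base = policy.minDelayMs * 2 ** (retry - 1);
+  const jittered = base * (1 + policy.jitter * (2 * random() - 1));
+  return Math.min(policy.maxDelayMs, Math.max(policy.minDelayMs, Math.round(jittered)));
+}
+
+// Retry one request; non-retryable errors (e.g. Markdown parse errors) are rethrown immediately.
+async function withRetry<T>(
+  policy: RetryPolicy,
+  send: () => Promise<T>,
+  isRetryable: (err: unknown) => boolean,
+  retryAfterOf: (err: unknown) => number | undefined = () => undefined,
+  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
+): Promise<T> {
+  for (let attempt = 1; ; attempt++) {
+    try {
+      return await send();
+    } catch (err) {
+      if (attempt >= policy.attempts || !isRetryable(err)) throw err;
+      await sleep(retryDelayMs(policy, attempt, retryAfterOf(err)));
+    }
+  }
+}
+```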
+
+## Configuration
+Set retry policy per provider in `~/.clawdbot/moltbot.json`:
+
+```json5
+{
+  channels: {
+    telegram: {
+      retry: {
+        attempts: 3,
+        minDelayMs: 400,
+        maxDelayMs: 30000,
+        jitter: 0.1
+      }
+    },
+    discord: {
+      retry: {
+        attempts: 3,
+        minDelayMs: 500,
+        maxDelayMs: 30000,
+        jitter: 0.1
+      }
+    }
+  }
+}
+```
+
+## Notes
+- Retries apply per request (message send, media upload, reaction, poll, sticker).
+- Composite flows do not retry completed steps.
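+To illustrate the per-request rule, here is a hypothetical `runComposite` that retries only the failing step of a multi-step send (for example, a media upload followed by a caption). Backoff delays are omitted for brevity; this is a sketch, not Moltbot's flow runner.
+
+```typescript
+// Hypothetical: each step is one provider request.
+type Step = () => Promise<void>;
+
+async function runComposite(steps: Step[], attemptsPerStep: number): Promise<void> {
+  for (const step of steps) {
+    for (let attempt = 1; ; attempt++) {
+      try {
+        await step();
+        break; // this step succeeded; it is never re-run
+      } catch (err) {
+        // Only the failing step is retried; earlier steps stay done.
+        if (attempt >= attemptsPerStep) throw err;
+      }
+    }
+  }
+}
+```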
--- a/docker-compose/ez-assistant/docs/concepts/session-pruning.md
+++ b/docker-compose/ez-assistant/docs/concepts/session-pruning.md
@@ -0,0 +1,104 @@
+---
+summary: "Session pruning: tool-result trimming to reduce context bloat"
+read_when:
+  - You want to reduce LLM context growth from tool outputs
+  - You are tuning agents.defaults.contextPruning
+---
+# Session Pruning
+
+Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).
+
+## When it runs
+- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
+- Only affects the messages sent to the model for that request.
+- Only active for Anthropic API calls (and OpenRouter Anthropic models).
+- For best results, match `ttl` to your model `cacheControlTtl`.
+- After a prune, the TTL window resets, so subsequent requests reuse the cache until `ttl` expires again.
+
+## Smart defaults (Anthropic)
+- **OAuth or setup-token** profiles: enable `cache-ttl` pruning and set heartbeat to `1h`.
+- **API key** profiles: enable `cache-ttl` pruning, set heartbeat to `30m`, and default `cacheControlTtl` to `1h` on Anthropic models.
+- If you set any of these values explicitly, Moltbot does **not** override them.
+
+## What this improves (cost + cache behavior)
+- **Why prune:** Anthropic prompt caching only applies within the TTL. If a session goes idle past the TTL, the next request re-caches the full prompt unless you trim it first.
+- **What gets cheaper:** pruning reduces the **cacheWrite** size for that first request after the TTL expires.
+- **Why the TTL reset matters:** once pruning runs, the cache window resets, so follow‑up requests can reuse the freshly cached prompt instead of re-caching the full history again.
+- **What it does not do:** pruning doesn’t add tokens or “double” costs; it only changes what gets cached on that first post‑TTL request.
+
+## What can be pruned
+- Only `toolResult` messages.
+- User + assistant messages are **never** modified.
+- The last `keepLastAssistants` assistant messages are protected; tool results after that cutoff are not pruned.
+- If there aren’t enough assistant messages to establish the cutoff, pruning is skipped.
+- Tool results containing **image blocks** are skipped (never trimmed/cleared).
+
+## Context window estimation
+Pruning uses an estimated context window (chars ≈ tokens × 4). The window size is resolved in this order:
+1) Model definition `contextWindow` (from the model registry).
+2) `models.providers.*.models[].contextWindow` override.
+3) `agents.defaults.contextTokens`.
+4) Default `200000` tokens.
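+The resolution order and character estimate can be sketched as below. `WindowSources` and the function names are hypothetical; only the order, the 4-chars-per-token ratio, and the `200000` fallback come from this page.
+
+```typescript
+interface WindowSources {
+  modelContextWindow?: number; // 1) model registry definition
+  providerOverride?: number;   // 2) models.providers.*.models[].contextWindow
+  agentContextTokens?: number; // 3) agents.defaults.contextTokens
+}
+
+const DEFAULT_CONTEXT_TOKENS = 200_000; // 4) fallback
+const CHARS_PER_TOKEN = 4;
+
+function resolveContextWindowTokens(src: WindowSources): number {
+  return (
+    src.modelContextWindow ??
+    src.providerOverride ??
+    src.agentContextTokens ??
+    DEFAULT_CONTEXT_TOKENS
+  );
+}
+
+// Estimated character budget used when deciding what to prune.
+function estimateCharBudget(src: WindowSources): number {
+  return resolveContextWindowTokens(src) * CHARS_PER_TOKEN;
+}
+```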
+
+## Mode
+### cache-ttl
+- Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).
+- When it runs, it applies the soft-trim and hard-clear behavior described below.
+
+## Soft vs hard pruning
+- **Soft-trim**: only for oversized tool results.
+  - Keeps head + tail, inserts `...`, and appends a note with the original size.
+  - Skips results with image blocks.
+- **Hard-clear**: replaces the entire tool result with `hardClear.placeholder`.
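+A minimal soft-trim sketch using the `softTrim` defaults listed under Defaults. It assumes “oversized” means longer than `maxChars`; `softTrim` as a function name and the size-note wording are hypothetical.
+
+```typescript
+interface SoftTrimConfig {
+  maxChars: number;
+  headChars: number;
+  tailChars: number;
+}
+
+// Hypothetical: keep head + tail of an oversized tool result and note the original size.
+function softTrim(text: string, cfg: SoftTrimConfig): string {
+  if (text.length <= cfg.maxChars) return text; // not oversized: leave untouched
+  const head = text.slice(0, cfg.headChars);
+  const tail = text.slice(text.length - cfg.tailChars);
+  return `${head}\n...\n${tail}\n[Soft-trimmed tool result; original size ${text.length} chars]`;
+}
+```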
+
+## Tool selection
+- `tools.allow` / `tools.deny` support `*` wildcards.
+- Deny wins.
+- Matching is case-insensitive.
+- Empty allow list => all tools allowed.
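+The matching rules above can be expressed as a small predicate. `globToRegExp` and `isToolPrunable` are hypothetical names. This sketch only covers `*` wildcards, deny-wins, case-insensitivity, and the empty-allow fallback.
+
+```typescript
+// `*` matches any run of characters; everything else is literal. Case-insensitive.
+function globToRegExp(pattern: string): RegExp {
+  const escaped = pattern
+    .split("*")
+    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
+    .join(".*");
+  return new RegExp(`^${escaped}$`, "i");
+}
+
+function isToolPrunable(tool: string, allow: string[] = [], deny: string[] = []): boolean {
+  const matches = (patterns: string[]) => patterns.some((p) => globToRegExp(p).test(tool));
+  if (matches(deny)) return false;     // deny wins
+  if (allow.length === 0) return true; // empty allow list => all tools allowed
+  return matches(allow);
+}
+```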
+
+## Interaction with other limits
+- Built-in tools already truncate their own output; session pruning is an extra layer that prevents long-running chats from accumulating too much tool output in the model context.
+- Compaction is separate: compaction summarizes and persists, pruning is transient per request. See [/concepts/compaction](/concepts/compaction).
+
+## Defaults (when enabled)
+- `ttl`: `"5m"`
+- `keepLastAssistants`: `3`
+- `softTrimRatio`: `0.3`
+- `hardClearRatio`: `0.5`
+- `minPrunableToolChars`: `50000`
+- `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }`
+- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
+
+## Examples
+Default (off):
+```json5
+{
+  agents: {
+    defaults: {
+      contextPruning: { mode: "off" }
+    }
+  }
+}
+```
+
+Enable TTL-aware pruning:
+```json5
+{
+  agents: {
+    defaults: {
+      contextPruning: { mode: "cache-ttl", ttl: "5m" }
+    }
+  }
+}
+```
+
+Restrict pruning to specific tools:
+```json5
+{
+  agents: {
+    defaults: {
+      contextPruning: {
+        mode: "cache-ttl",
+        tools: { allow: ["exec", "read"], deny: ["*image*"] }
+      }
+    }
+  }
+}
+```
+
+See config reference: [Gateway Configuration](/gateway/configuration)
--- a/docker-compose/ez-assistant/docs/concepts/session-tool.md
+++ b/docker-compose/ez-assistant/docs/concepts/session-tool.md
@@ -0,0 +1,171 @@
+---
+summary: "Agent session tools for listing sessions, fetching history, and sending cross-session messages"
+read_when:
+  - Adding or modifying session tools
+---
+
+# Session Tools
+
+Goal: small, hard-to-misuse tool set so agents can list sessions, fetch history, and send to another session.
+
+## Tool Names
+- `sessions_list`
+- `sessions_history`
+- `sessions_send`
+- `sessions_spawn`
+
+## Key Model
+- Main direct chat bucket is always the literal key `"main"` (resolved to the current agent’s main key).
+- Group chats use `agent:<agentId>:<channel>:group:<id>` or `agent:<agentId>:<channel>:channel:<id>` (pass the full key).
+- Cron jobs use `cron:<job.id>`.
+- Hooks use `hook:<uuid>` unless explicitly set.
+- Node sessions use `node-<nodeId>` unless explicitly set.
+
+`global` and `unknown` are reserved values and are never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
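+A hypothetical classifier that maps the key patterns above to the `kind` values used by `sessions_list`. This page does not cover how Moltbot classifies other keys, such as DM-scoped or sub-agent keys, so this sketch returns `other` for them.
+
+```typescript
+type SessionKind = "main" | "group" | "cron" | "hook" | "node" | "other";
+
+const RESERVED_KEYS = new Set(["global", "unknown"]);
+
+// Returns null for reserved keys, which are never listed.
+function classifySessionKey(key: string, mainKey = "main"): SessionKind | null {
+  if (RESERVED_KEYS.has(key)) return null;
+  if (key === "main") return "main";
+  const parts = key.split(":");
+  // agent:<agentId>:<mainKey>
+  if (parts.length === 3 && parts[0] === "agent" && parts[2] === mainKey) return "main";
+  // agent:<agentId>:<channel>:group:<id> or agent:<agentId>:<channel>:channel:<id>
+  if (parts[0] === "agent" && parts.length >= 5 && (parts[3] === "group" || parts[3] === "channel")) {
+    return "group";
+  }
+  if (key.startsWith("cron:")) return "cron";
+  if (key.startsWith("hook:")) return "hook";
+  if (key.startsWith("node-")) return "node";
+  return "other";
+}
+```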
+
+## sessions_list
+List sessions as an array of rows.
+
+Parameters:
+- `kinds?: string[]` filter: any of `"main" | "group" | "cron" | "hook" | "node" | "other"`
+- `limit?: number` max rows (defaults to the server default; the server clamps larger values, e.g. to 200)
+- `activeMinutes?: number` only sessions updated within N minutes
+- `messageLimit?: number` 0 = no messages (default 0); >0 = include last N messages
+
+Behavior:
+- `messageLimit > 0` fetches `chat.history` per session and includes the last N messages.
+- Tool results are filtered out in list output; use `sessions_history` for tool messages.
+- When running in a **sandboxed** agent session, session tools default to **spawned-only visibility** (see below).
+
+Row shape (JSON):
+- `key`: session key (string)
+- `kind`: `main | group | cron | hook | node | other`
+- `channel`: `whatsapp | telegram | discord | signal | imessage | webchat | internal | unknown`
+- `displayName` (group display label if available)
+- `updatedAt` (ms)
+- `sessionId`
+- `model`, `contextTokens`, `totalTokens`
+- `thinkingLevel`, `verboseLevel`, `systemSent`, `abortedLastRun`
+- `sendPolicy` (session override if set)
+- `lastChannel`, `lastTo`
+- `deliveryContext` (normalized `{ channel, to, accountId }` when available)
+- `transcriptPath` (best-effort path derived from store dir + sessionId)
+- `messages?` (only when `messageLimit > 0`)
+
+## sessions_history
+Fetch transcript for one session.
+
+Parameters:
+- `sessionKey` (required; accepts session key or `sessionId` from `sessions_list`)
+- `limit?: number` max messages (server clamps)
+- `includeTools?: boolean` (default false)
+
+Behavior:
+- `includeTools=false` filters `role: "toolResult"` messages.
+- Returns messages array in the raw transcript format.
+- When given a `sessionId`, Moltbot resolves it to the corresponding session key (missing ids error).
+
+## sessions_send
+Send a message into another session.
+
+Parameters:
+- `sessionKey` (required; accepts session key or `sessionId` from `sessions_list`)
+- `message` (required)
+- `timeoutSeconds?: number` (defaults to a positive wait; `0` = fire-and-forget)
+
+Behavior:
+- `timeoutSeconds = 0`: enqueue and return `{ runId, status: "accepted" }`.
+- `timeoutSeconds > 0`: wait up to N seconds for completion, then return `{ runId, status: "ok", reply }`.
+- If wait times out: `{ runId, status: "timeout", error }`. Run continues; call `sessions_history` later.
+- If the run fails: `{ runId, status: "error", error }`.
+- Announce delivery runs after the primary run completes and is best-effort; `status: "ok"` does not guarantee the announce was delivered.
+- Waits via gateway `agent.wait` (server-side) so reconnects don't drop the wait.
+- Agent-to-agent message context is injected for the primary run.
+- After the primary run completes, Moltbot runs a **reply-back loop**:
+  - Round 2+ alternates between requester and target agents.
+  - Reply exactly `REPLY_SKIP` to stop the ping‑pong.
+  - Max turns is `session.agentToAgent.maxPingPongTurns` (0–5, default 5).
+- Once the loop ends, Moltbot runs the **agent‑to‑agent announce step** (target agent only):
+  - Reply exactly `ANNOUNCE_SKIP` to stay silent.
+  - Any other reply is sent to the target channel.
+  - Announce step includes the original request + round‑1 reply + latest ping‑pong reply.
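+The reply-back loop and announce step can be sketched synchronously, with agents modeled as plain functions. `runAgentToAgent` and `AgentRun` are hypothetical, and the sketch assumes a `REPLY_SKIP` reply does not count as a turn.
+
+```typescript
+const REPLY_SKIP = "REPLY_SKIP";
+const ANNOUNCE_SKIP = "ANNOUNCE_SKIP";
+
+// Hypothetical stand-in for an agent run: prompt in, reply out.
+type AgentRun = (prompt: string) => string;
+
+interface AgentToAgentResult {
+  round1Reply: string;
+  latestReply: string;
+  pingPongTurns: number;
+  announcement: string | null; // null when the target replied ANNOUNCE_SKIP
+}
+
+function runAgentToAgent(
+  request: string,
+  requester: AgentRun,
+  target: AgentRun,
+  maxPingPongTurns = 5, // session.agentToAgent.maxPingPongTurns (0–5)
+): AgentToAgentResult {
+  // Round 1: the primary run on the target session.
+  const round1Reply = target(request);
+  let latestReply = round1Reply;
+  let pingPongTurns = 0;
+  // Round 2+: requester and target alternate, starting with the requester.
+  let speaker = requester;
+  while (pingPongTurns < maxPingPongTurns) {
+    const reply = speaker(latestReply);
+    if (reply === REPLY_SKIP) break; // an exact REPLY_SKIP stops the ping-pong
+    latestReply = reply;
+    pingPongTurns++;
+    speaker = speaker === requester ? target : requester;
+  }
+  // Announce step (target only): original request + round-1 reply + latest reply.
+  const announce = target([request, round1Reply, latestReply].join("\n"));
+  return {
+    round1Reply,
+    latestReply,
+    pingPongTurns,
+    announcement: announce === ANNOUNCE_SKIP ? null : announce,
+  };
+}
+```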
+
+## Channel Field
+- For groups, `channel` is the channel recorded on the session entry.
+- For direct chats, `channel` maps from `lastChannel`.
+- For cron/hook/node, `channel` is `internal`.
+- If missing, `channel` is `unknown`.
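+The same rules as a hypothetical resolver, assuming direct chats are the entries with kind `main` or `other`:
+
+```typescript
+type EntryKind = "main" | "group" | "cron" | "hook" | "node" | "other";
+
+interface SessionEntryLike {
+  kind: EntryKind;
+  channel?: string;     // channel recorded on the session entry
+  lastChannel?: string; // last channel used for a direct chat
+}
+
+// Hypothetical resolver for the `channel` field in `sessions_list` rows.
+function resolveChannelField(entry: SessionEntryLike): string {
+  if (entry.kind === "cron" || entry.kind === "hook" || entry.kind === "node") return "internal";
+  if (entry.kind === "group") return entry.channel ?? "unknown";
+  return entry.lastChannel ?? "unknown"; // direct chats
+}
+```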
+
+## Security / Send Policy
+Policy-based blocking by channel/chat type (not per session id).
+
+```json
+{
+  "session": {
+    "sendPolicy": {
+      "rules": [
+        {
+          "match": { "channel": "discord", "chatType": "group" },
+          "action": "deny"
+        }
+      ],
+      "default": "allow"
+    }
+  }
+}
+```
+
+Runtime override (per session entry):
+- `sendPolicy: "allow" | "deny"` (unset = inherit config)
+- Settable via `sessions.patch` or owner-only `/send on|off|inherit` (standalone message).
+
+Enforcement points:
+- `chat.send` / `agent` (gateway)
+- auto-reply delivery logic
+
+## sessions_spawn
+Spawn a sub-agent run in an isolated session and announce the result back to the requester chat channel.
+
+Parameters:
+- `task` (required)
+- `label?` (optional; used for logs/UI)
+- `agentId?` (optional; spawn under another agent id if allowed)
+- `model?` (optional; overrides the sub-agent model; invalid values error)
+- `runTimeoutSeconds?` (default `0`, no timeout; when greater than 0, aborts the sub-agent run after N seconds)
+- `cleanup?` (`delete|keep`, default `keep`)
+
+Allowlist:
+- `agents.list[].subagents.allowAgents`: list of agent ids allowed via `agentId` (`["*"]` to allow any). Default: only the requester agent.
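+A hypothetical allowlist check, assuming the requester agent is always allowed even when `allowAgents` is set without it:
+
+```typescript
+// Hypothetical check for `sessions_spawn` with an optional `agentId`.
+function canSpawnUnder(requesterId: string, agentId: string | undefined, allowAgents?: string[]): boolean {
+  const target = agentId ?? requesterId;   // no agentId: spawn under the requester
+  if (target === requesterId) return true; // default: the requester agent only
+  if (!allowAgents) return false;
+  return allowAgents.includes("*") || allowAgents.includes(target);
+}
+```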
+
+Discovery:
+- Use `agents_list` to discover which agent ids are allowed for `sessions_spawn`.
+
+Behavior:
+- Starts a new `agent:<agentId>:subagent:<uuid>` session with `deliver: false`.
+- Sub-agents default to the full tool set **minus session tools** (configurable via `tools.subagents.tools`).
+- Sub-agents are not allowed to call `sessions_spawn` (no sub-agent → sub-agent spawning).
+- Always non-blocking: returns `{ status: "accepted", runId, childSessionKey }` immediately.
+- After completion, Moltbot runs a sub-agent **announce step** and posts the result to the requester chat channel.
+- Reply exactly `ANNOUNCE_SKIP` during the announce step to stay silent.
+- Announce replies are normalized to `Status`/`Result`/`Notes`; `Status` comes from runtime outcome (not model text).
+- Sub-agent sessions are auto-archived after `agents.defaults.subagents.archiveAfterMinutes` (default: 60).
+- Announce replies include a stats line (runtime, tokens, sessionKey/sessionId, transcript path, and optional cost).
+
+## Sandbox Session Visibility
+
+Sandboxed sessions can use session tools, but by default they only see sessions they spawned via `sessions_spawn`.
+
+Config:
+
+```json5
+{
+  agents: {
+    defaults: {
+      sandbox: {
+        // default: "spawned"
+        sessionToolsVisibility: "spawned" // or "all"
+      }
+    }
+  }
+}
+```
--- a/docker-compose/ez-assistant/docs/concepts/session.md
+++ b/docker-compose/ez-assistant/docs/concepts/session.md
@@ -0,0 +1,150 @@
+---
+summary: "Session management rules, keys, and persistence for chats"
+read_when:
+  - Modifying session handling or storage
+---
+# Session Management
+
+Moltbot treats **one direct-chat session per agent** as primary. Direct chats collapse to `agent:<agentId>:<mainKey>` (default `main`), while group/channel chats get their own keys. The main key name is configurable via `session.mainKey`.
+
+Use `session.dmScope` to control how **direct messages** are grouped:
+- `main` (default): all DMs share the main session for continuity.
+- `per-peer`: isolate by sender id across channels.
+- `per-channel-peer`: isolate by channel + sender (recommended for multi-user inboxes).
+
+Use `session.identityLinks` to map provider-prefixed peer ids to a canonical identity so the same person shares a DM session across channels when using `per-peer` or `per-channel-peer`.
+
+## Gateway is the source of truth
+All session state is **owned by the gateway** (the “master” Moltbot). UI clients (macOS app, WebChat, etc.) must query the gateway for session lists and token counts instead of reading local files.
+
+- In **remote mode**, the session store you care about lives on the remote gateway host, not your Mac.
+- Token counts shown in UIs come from the gateway’s store fields (`inputTokens`, `outputTokens`, `totalTokens`, `contextTokens`). Clients do not parse JSONL transcripts to “fix up” totals.
+
+## Where state lives
+- On the **gateway host**:
+  - Store file: `~/.clawdbot/agents/<agentId>/sessions/sessions.json` (per agent).
+  - Transcripts: `~/.clawdbot/agents/<agentId>/sessions/<SessionId>.jsonl` (Telegram topic sessions use `.../<SessionId>-topic-<threadId>.jsonl`).
+- The store is a map `sessionKey -> { sessionId, updatedAt, ... }`. Deleting entries is safe; they are recreated on demand.
+- Group entries may include `displayName`, `channel`, `subject`, `room`, and `space` to label sessions in UIs.
+- Session entries include `origin` metadata (label + routing hints) so UIs can explain where a session came from.
+- Moltbot does **not** read legacy Pi/Tau session folders.
+
+## Session pruning
+Moltbot can trim **old tool results** from the in-memory context right before LLM calls (Anthropic profiles enable this by default).
+This does **not** rewrite JSONL history. See [/concepts/session-pruning](/concepts/session-pruning).
+
+## Pre-compaction memory flush
+When a session nears auto-compaction, Moltbot can run a **silent memory flush**
+turn that reminds the model to write durable notes to disk. This only runs when
+the workspace is writable. See [Memory](/concepts/memory) and
+[Compaction](/concepts/compaction).
+
+## Mapping transports → session keys
+- Direct chats follow `session.dmScope` (default `main`).
+  - `main`: `agent:<agentId>:<mainKey>` (continuity across devices/channels).
+    - Multiple phone numbers and channels can map to the same agent main key; they act as transports into one conversation.
+  - `per-peer`: `agent:<agentId>:dm:<peerId>`.
+  - `per-channel-peer`: `agent:<agentId>:<channel>:dm:<peerId>`.
+  - If `session.identityLinks` matches a provider-prefixed peer id (for example `telegram:123`), the canonical key replaces `<peerId>` so the same person shares a session across channels.
+- Group chats isolate state: `agent:<agentId>:<channel>:group:<id>` (rooms/channels use `agent:<agentId>:<channel>:channel:<id>`).
+  - Telegram forum topics append `:topic:<threadId>` to the group id for isolation.
+  - Legacy `group:<id>` keys are still recognized for migration.
+- Inbound contexts may still use `group:<id>`; the channel is inferred from `Provider` and normalized to the canonical `agent:<agentId>:<channel>:group:<id>` form.
+- Other sources:
+  - Cron jobs: `cron:<job.id>`
+  - Webhooks: `hook:<uuid>` (unless explicitly set by the hook)
+  - Node runs: `node-<nodeId>`
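+The key shapes above can be sketched as builder functions. `directChatKey`, `groupKey`, and `resolvePeer` are hypothetical, and the sketch assumes an unlinked peer keeps its raw id (without the provider prefix) in the key.
+
+```typescript
+type DmScope = "main" | "per-peer" | "per-channel-peer";
+
+interface KeyOptions {
+  agentId: string;
+  mainKey?: string;                         // session.mainKey (default "main")
+  dmScope?: DmScope;                        // session.dmScope (default "main")
+  identityLinks?: Record<string, string[]>; // canonical id -> provider-prefixed peer ids
+}
+
+// Replace a peer id with its canonical identity when `<channel>:<peerId>` is linked.
+function resolvePeer(channel: string, peerId: string, links: Record<string, string[]> = {}): string {
+  const prefixed = `${channel}:${peerId}`;
+  for (const [canonical, ids] of Object.entries(links)) {
+    if (ids.includes(prefixed)) return canonical;
+  }
+  return peerId;
+}
+
+function directChatKey(opts: KeyOptions, channel: string, peerId: string): string {
+  const { agentId, mainKey = "main", dmScope = "main", identityLinks } = opts;
+  if (dmScope === "main") return `agent:${agentId}:${mainKey}`;
+  const peer = resolvePeer(channel, peerId, identityLinks);
+  return dmScope === "per-peer"
+    ? `agent:${agentId}:dm:${peer}`
+    : `agent:${agentId}:${channel}:dm:${peer}`;
+}
+
+// Group key; Telegram forum topics append `:topic:<threadId>`.
+function groupKey(agentId: string, channel: string, groupId: string, topicThreadId?: string): string {
+  const base = `agent:${agentId}:${channel}:group:${groupId}`;
+  return topicThreadId !== undefined ? `${base}:topic:${topicThreadId}` : base;
+}
+```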
+
+## Lifecycle
+- Reset policy: sessions are reused until they expire, and expiry is evaluated on the next inbound message.
+- Daily reset: defaults to **4:00 AM local time on the gateway host**. A session is stale once its last update is earlier than the most recent daily reset time.
+- Idle reset (optional): `idleMinutes` adds a sliding idle window. When both daily and idle resets are configured, **whichever expires first** forces a new session.
+- Legacy idle-only: if you set `session.idleMinutes` without any `session.reset`/`resetByType` config, Moltbot stays in idle-only mode for backward compatibility.
+- Per-type overrides (optional): `resetByType` lets you override the policy for `dm`, `group`, and `thread` sessions (thread = Slack/Discord threads, Telegram topics, Matrix threads when provided by the connector).
+- Per-channel overrides (optional): `resetByChannel` overrides the reset policy for a channel (applies to all session types for that channel and takes precedence over `reset`/`resetByType`).
+- Reset triggers: exact `/new` or `/reset` (plus any extras in `resetTriggers`) start a fresh session id and pass the remainder of the message through. `/new <model>` accepts a model alias, `provider/model`, or provider name (fuzzy match) to set the new session model. If `/new` or `/reset` is sent alone, Moltbot runs a short “hello” greeting turn to confirm the reset.
+- Manual reset: delete specific keys from the store or remove the JSONL transcript; the next message recreates them.
+- Isolated cron jobs always mint a fresh `sessionId` per run (no idle reuse).
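+A minimal sketch of the staleness check, assuming both `atHour` and `idleMinutes` are evaluated against the gateway host's local clock. `isSessionStale` is hypothetical, and it ignores `mode`, `resetByType`, and `resetByChannel`.
+
+```typescript
+interface ResetPolicy {
+  atHour?: number;      // daily reset hour (local time)
+  idleMinutes?: number; // sliding idle window
+}
+
+// Most recent daily reset at `atHour` that is not in the future.
+function mostRecentDailyReset(now: Date, atHour: number): Date {
+  const reset = new Date(now.getFullYear(), now.getMonth(), now.getDate(), atHour, 0, 0, 0);
+  if (reset.getTime() > now.getTime()) reset.setDate(reset.getDate() - 1);
+  return reset;
+}
+
+// Whichever limit expires first makes the session stale.
+function isSessionStale(lastUpdated: Date, now: Date, policy: ResetPolicy): boolean {
+  if (policy.atHour !== undefined &&
+      lastUpdated.getTime() < mostRecentDailyReset(now, policy.atHour).getTime()) {
+    return true;
+  }
+  if (policy.idleMinutes !== undefined &&
+      now.getTime() - lastUpdated.getTime() > policy.idleMinutes * 60_000) {
+    return true;
+  }
+  return false;
+}
+```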
+
+## Send policy (optional)
+Block delivery for specific session types without listing individual ids.
+
+```json5
+{
+  session: {
+    sendPolicy: {
+      rules: [
+        { action: "deny", match: { channel: "discord", chatType: "group" } },
+        { action: "deny", match: { keyPrefix: "cron:" } }
+      ],
+      default: "allow"
+    }
+  }
+}
+```
+
+Runtime override (owner only):
+- `/send on` → allow for this session
+- `/send off` → deny for this session
+- `/send inherit` → clear override and use config rules
+
+Send these as standalone messages so they register.
+
+## Configuration (optional rename example)
+```json5
+// ~/.clawdbot/moltbot.json
+{
+  session: {
+    scope: "per-sender",      // keep group keys separate
+    dmScope: "main",          // DM continuity (set per-channel-peer for shared inboxes)
+    identityLinks: {
+      alice: ["telegram:123456789", "discord:987654321012345678"]
+    },
+    reset: {
+      // Defaults: mode=daily, atHour=4 (gateway host local time).
+      // If you also set idleMinutes, whichever expires first wins.
+      mode: "daily",
+      atHour: 4,
+      idleMinutes: 120
+    },
+    resetByType: {
+      thread: { mode: "daily", atHour: 4 },
+      dm: { mode: "idle", idleMinutes: 240 },
+      group: { mode: "idle", idleMinutes: 120 }
+    },
+    resetByChannel: {
+      discord: { mode: "idle", idleMinutes: 10080 }
+    },
+    resetTriggers: ["/new", "/reset"],
+    store: "~/.clawdbot/agents/{agentId}/sessions/sessions.json",
+    mainKey: "main",
+  }
+}
+```
+
+## Inspecting
+- `moltbot status` — shows store path and recent sessions.
+- `moltbot sessions --json` — dumps every entry (filter with `--active <minutes>`).
+- `moltbot gateway call sessions.list --params '{}'` — fetch sessions from the running gateway (use `--url`/`--token` for remote gateway access).
+- Send `/status` as a standalone message in chat to see whether the agent is reachable, how much of the session context is used, current thinking/verbose toggles, and when your WhatsApp web creds were last refreshed (helps spot relink needs).
+- Send `/context list` or `/context detail` to see what’s in the system prompt and injected workspace files (and the biggest context contributors).
+- Send `/stop` as a standalone message to abort the current run, clear queued followups for that session, and stop any sub-agent runs spawned from it (the reply includes the stopped count).
+- Send `/compact` (optional instructions) as a standalone message to summarize older context and free up window space. See [/concepts/compaction](/concepts/compaction).
+- JSONL transcripts can be opened directly to review full turns.
+
+## Tips
+- Keep the primary key dedicated to 1:1 traffic; let groups keep their own keys.
+- When automating cleanup, delete individual keys instead of the whole store to preserve context elsewhere.
+
+## Session origin metadata
+Each session entry records where it came from (best-effort) in `origin`:
+- `label`: human label (resolved from conversation label + group subject/channel)
+- `provider`: normalized channel id (including extensions)
+- `from`/`to`: raw routing ids from the inbound envelope
+- `accountId`: provider account id (when multi-account)
+- `threadId`: thread/topic id when the channel supports it
+
+The origin fields are populated for direct messages, channels, and groups. If a
+connector only updates delivery routing (for example, to keep a DM main session
+fresh), it should still provide inbound context so the session keeps the metadata
+UIs use to explain where it came from. Extensions can do this by sending `ConversationLabel`,
+`GroupSubject`, `GroupChannel`, `GroupSpace`, and `SenderName` in the inbound
+context and calling `recordSessionMetaFromInbound` (or passing the same context
+to `updateLastRoute`).
--- a/docker-compose/ez-assistant/docs/concepts/sessions.md
+++ b/docker-compose/ez-assistant/docs/concepts/sessions.md
@@ -0,0 +1,8 @@
+---
+summary: "Alias for session management docs"
+read_when:
+  - You looked for docs/sessions.md; canonical doc lives in docs/session.md
+---
+# Sessions
+
+Canonical session management docs live in [Session management](/concepts/session).
--- a/docker-compose/ez-assistant/docs/concepts/streaming.md
+++ b/docker-compose/ez-assistant/docs/concepts/streaming.md
@@ -0,0 +1,123 @@
+---
+summary: "Streaming + chunking behavior (block replies, draft streaming, limits)"
+read_when:
+  - Explaining how streaming or chunking works on channels
+  - Changing block streaming or channel chunking behavior
+  - Debugging duplicate/early block replies or draft streaming
+---
+# Streaming + chunking
+
+Moltbot has two separate “streaming” layers:
+- **Block streaming (channels):** emit completed **blocks** as the assistant writes. These are normal channel messages (not token deltas).
+- **Token-ish streaming (Telegram only):** update a **draft bubble** with partial text while generating; final message is sent at the end.
+
+There is **no real token streaming** to external channel messages today. Telegram draft streaming is the only partial-stream surface.
+
+## Block streaming (channel messages)
+
+Block streaming sends assistant output in coarse chunks as it becomes available.
+
+```
+Model output
+  └─ text_delta/events
+       ├─ (blockStreamingBreak=text_end)
+       │    └─ chunker emits blocks as buffer grows
+       └─ (blockStreamingBreak=message_end)
+            └─ chunker flushes at message_end
+                   └─ channel send (block replies)
+```
+Legend:
+- `text_delta/events`: model stream events (may be sparse for non-streaming models).
+- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
+- `channel send`: actual outbound messages (block replies).
+
+**Controls:**
+- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
+- Channel overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per channel.
+- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`.
+- `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
+- `agents.defaults.blockStreamingCoalesce`: `{ minChars?, maxChars?, idleMs? }` (merge streamed blocks before send).
+- Channel hard cap: `*.textChunkLimit` (e.g., `channels.whatsapp.textChunkLimit`).
+- Channel chunk mode: `*.chunkMode` (`length` is the default; `newline` first splits on blank lines, i.e. paragraph boundaries, then applies length chunking).
+- Discord soft cap: `channels.discord.maxLinesPerMessage` (default 17) splits tall replies to avoid UI clipping.
+
+**Boundary semantics:**
+- `text_end`: stream blocks as soon as chunker emits; flush on each `text_end`.
+- `message_end`: wait until assistant message finishes, then flush buffered output.
+
+`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
+
+## Chunking algorithm (low/high bounds)
+
+Block chunking is implemented by `EmbeddedBlockChunker`:
+- **Low bound:** don’t emit until buffer >= `minChars` (unless forced).
+- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
+- **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.
+- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.
+
+`maxChars` is clamped to the channel `textChunkLimit`, so you can’t exceed per-channel caps.
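+A simplified splitter that follows the break preference and bounds above. `chunkText` and `splitIndex` are hypothetical. They intentionally skip code-fence handling and streaming buffers and only show how a break is chosen within `[minChars, maxChars]`.
+
+```typescript
+type BreakPreference = "paragraph" | "newline" | "sentence" | "whitespace";
+
+// Fallback order; a hard break follows "whitespace".
+const BREAK_ORDER: BreakPreference[] = ["paragraph", "newline", "sentence", "whitespace"];
+const BREAK_PATTERNS: Record<BreakPreference, RegExp> = {
+  paragraph: /\n\n/,
+  newline: /\n/,
+  sentence: /[.!?](?=\s)/,
+  whitespace: /\s/,
+};
+
+// Index just past the last acceptable break in text[0, maxChars) that is >= minChars.
+function splitIndex(text: string, minChars: number, maxChars: number, pref: BreakPreference): number {
+  if (text.length <= maxChars) return text.length;
+  const head = text.slice(0, maxChars);
+  for (const kind of BREAK_ORDER.slice(BREAK_ORDER.indexOf(pref))) {
+    const re = new RegExp(BREAK_PATTERNS[kind].source, "g");
+    let best = -1;
+    for (let m = re.exec(head); m !== null; m = re.exec(head)) {
+      const end = m.index + m[0].length;
+      if (end >= minChars) best = end;
+    }
+    if (best > 0) return best;
+  }
+  return maxChars; // no acceptable break: hard split at the high bound
+}
+
+function chunkText(text: string, minChars: number, maxChars: number, pref: BreakPreference): string[] {
+  const chunks: string[] = [];
+  let rest = text;
+  while (rest.length > 0) {
+    const i = splitIndex(rest, minChars, maxChars, pref);
+    chunks.push(rest.slice(0, i).trimEnd());
+    rest = rest.slice(i).trimStart();
+  }
+  return chunks;
+}
+```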
+
+## Coalescing (merge streamed blocks)
+
+When block streaming is enabled, Moltbot can **merge consecutive block chunks**
+before sending them out. This reduces “single-line spam” while still providing
+progressive output.
+
+- Coalescing waits for **idle gaps** (`idleMs`) before flushing.
+- Buffers are capped by `maxChars` and will flush if they exceed it.
+- `minChars` prevents tiny fragments from sending until enough text accumulates
+  (final flush always sends remaining text).
+- Joiner is derived from `blockStreamingChunk.breakPreference`
+  (`paragraph` → `\n\n`, `newline` → `\n`, `sentence` → space).
+- Channel overrides are available via `*.blockStreamingCoalesce` (including per-account configs).
+- Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.
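+The coalescing rules can be simulated over timestamped blocks. This sketch treats an arrival gap of at least `idleMs` as an idle flush point. It does not split oversized single blocks, since that is the chunker's job. `coalesceBlocks` is hypothetical.
+
+```typescript
+interface TimedBlock {
+  text: string;
+  at: number; // arrival time in ms
+}
+
+interface CoalesceConfig {
+  minChars: number;
+  maxChars: number;
+  idleMs: number;
+  joiner: string; // e.g. "\n\n" for breakPreference "paragraph"
+}
+
+// Merge consecutive blocks, flushing on idle gaps or before maxChars would be exceeded.
+function coalesceBlocks(blocks: TimedBlock[], cfg: CoalesceConfig): string[] {
+  const sent: string[] = [];
+  let buffer = "";
+  let lastAt = -Infinity;
+  const flush = (): void => {
+    if (buffer) sent.push(buffer);
+    buffer = "";
+  };
+  for (const block of blocks) {
+    const idle = block.at - lastAt >= cfg.idleMs;
+    // An idle gap flushes only once at least minChars have accumulated.
+    if (buffer && idle && buffer.length >= cfg.minChars) flush();
+    const merged = buffer ? buffer + cfg.joiner + block.text : block.text;
+    if (buffer && merged.length > cfg.maxChars) {
+      flush(); // merging would exceed maxChars
+      buffer = block.text;
+    } else {
+      buffer = merged;
+    }
+    lastAt = block.at;
+  }
+  flush(); // final flush always sends the remaining text
+  return sent;
+}
+```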
+
+## Human-like pacing between blocks
+
+When block streaming is enabled, you can add a **randomized pause** between
+block replies (after the first block). This makes multi-bubble responses feel
+more natural.
+
+- Config: `agents.defaults.humanDelay` (override per agent via `agents.list[].humanDelay`).
+- Modes: `off` (default), `natural` (800–2500ms), `custom` (`minMs`/`maxMs`).
+- Applies only to **block replies**, not final replies or tool summaries.
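+A sketch of the delay choice, assuming a uniform random pause within the mode's range. The `HumanDelay` shape and `humanDelayMs` are a hypothetical rendering of the config keys above.
+
+```typescript
+type HumanDelay =
+  | { mode: "off" }
+  | { mode: "natural" } // 800–2500ms
+  | { mode: "custom"; minMs: number; maxMs: number };
+
+// Pause before block `blockIndex` (0-based); the first block is never delayed.
+function humanDelayMs(cfg: HumanDelay, blockIndex: number, random: () => number = Math.random): number {
+  if (blockIndex === 0 || cfg.mode === "off") return 0;
+  const [minMs, maxMs] = cfg.mode === "natural" ? [800, 2500] : [cfg.minMs, cfg.maxMs];
+  return Math.round(minMs + random() * (maxMs - minMs));
+}
+```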
+
+## “Stream chunks or everything”
+
+This maps to:
+- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). Non-Telegram channels also need `*.blockStreaming: true`.
+- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
+- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
+
+**Channel note:** For non-Telegram channels, block streaming is **off unless**
+`*.blockStreaming` is explicitly set to `true`. Telegram can stream drafts
+(`channels.telegram.streamMode`) without block replies.
+
+Config location reminder: the `blockStreaming*` defaults live under
+`agents.defaults`, not the root config.
+
+## Telegram draft streaming (token-ish)
+
+Telegram is the only channel with draft streaming:
+- Uses Bot API `sendMessageDraft` in **private chats with topics**.
+- `channels.telegram.streamMode: "partial" | "block" | "off"`.
+  - `partial`: draft updates with the latest stream text.
+  - `block`: draft updates in chunked blocks (same chunker rules).
+  - `off`: no draft streaming.
+- Draft chunk config (only for `streamMode: "block"`): `channels.telegram.draftChunk` (defaults: `minChars: 200`, `maxChars: 800`).
+- Draft streaming is separate from block streaming; block replies are off by default and only enabled by `*.blockStreaming: true` on non-Telegram channels.
+- Final reply is still a normal message.
+- `/reasoning stream` writes reasoning into the draft bubble (Telegram only).
+
+When draft streaming is active, Moltbot disables block streaming for that reply to avoid double-streaming.
+
+```
+Telegram (private + topics)
+  └─ sendMessageDraft (draft bubble)
+       ├─ streamMode=partial → update latest text
+       └─ streamMode=block   → chunker updates draft
+  └─ final reply → normal message
+```
+Legend:
+- `sendMessageDraft`: Telegram draft bubble (not a real message).
+- `final reply`: normal Telegram message send.
--- a/docker-compose/ez-assistant/docs/concepts/system-prompt.md
+++ b/docker-compose/ez-assistant/docs/concepts/system-prompt.md
@@ -0,0 +1,110 @@
+---
+summary: "What the Moltbot system prompt contains and how it is assembled"
+read_when:
+  - Editing system prompt text, tools list, or time/heartbeat sections
+  - Changing workspace bootstrap or skills injection behavior
+---
+# System Prompt
+
+Moltbot builds a custom system prompt for every agent run and injects it into that run. The prompt is **Moltbot-owned** and does not use the pi-coding-agent default prompt.
+
+## Structure
+
+The prompt is intentionally compact and uses fixed sections:
+
+- **Tooling**: current tool list + short descriptions.
+- **Skills** (when available): tells the model how to load skill instructions on demand.
+- **Moltbot Self-Update**: how to run `config.apply` and `update.run`.
+- **Workspace**: working directory (`agents.defaults.workspace`).
+- **Documentation**: local path to Moltbot docs (repo or npm package) and when to read them.
+- **Workspace Files (injected)**: indicates bootstrap files are included below.
+- **Sandbox** (when enabled): indicates sandboxed runtime, sandbox paths, and whether elevated exec is available.
+- **Current Date & Time**: the user's time zone, when known (no live clock, to keep the prompt cache-stable).
+- **Reply Tags**: optional reply tag syntax for supported providers.
+- **Heartbeats**: heartbeat prompt and ack behavior.
+- **Runtime**: host, OS, node, model, repo root (when detected), thinking level (one line).
+- **Reasoning**: current visibility level + /reasoning toggle hint.
+
+## Prompt modes
+
+Moltbot can render smaller system prompts for sub-agents. The runtime sets a
+`promptMode` for each run (not a user-facing config):
+
+- `full` (default): includes all sections above.
+- `minimal`: used for sub-agents; omits **Skills**, **Memory Recall**, **Moltbot
+  Self-Update**, **Model Aliases**, **User Identity**, **Reply Tags**,
+  **Messaging**, **Silent Replies**, and **Heartbeats**. Tooling, Workspace,
+  Sandbox, Current Date & Time (when known), Runtime, and injected context stay
+  available.
+- `none`: returns only the base identity line.
+
+When `promptMode=minimal`, extra injected prompts are labeled **Subagent
+Context** instead of **Group Chat Context**.
+
+## Workspace bootstrap injection
+
+Bootstrap files are trimmed and appended under **Project Context** so the model sees identity and profile context without needing explicit reads:
+
+- `AGENTS.md`
+- `SOUL.md`
+- `TOOLS.md`
+- `IDENTITY.md`
+- `USER.md`
+- `HEARTBEAT.md`
+- `BOOTSTRAP.md` (only on brand-new workspaces)
+
+Large files are truncated with a marker. The max per-file size is controlled by
+`agents.defaults.bootstrapMaxChars` (default: 20000). Missing files inject a
+short missing-file marker.
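+The truncation rule can be sketched as follows. The heading format and marker strings are hypothetical. The sketch also assumes “trimmed” means surrounding whitespace is stripped before the size check.
+
+```typescript
+const DEFAULT_BOOTSTRAP_MAX_CHARS = 20_000; // agents.defaults.bootstrapMaxChars
+
+// Hypothetical: render one bootstrap file for the Project Context section.
+function renderBootstrapFile(name: string, content: string | undefined, maxChars = DEFAULT_BOOTSTRAP_MAX_CHARS): string {
+  if (content === undefined) return `## ${name}\n[missing file]`;
+  const body = content.trim();
+  if (body.length <= maxChars) return `## ${name}\n${body}`;
+  const omitted = body.length - maxChars;
+  return `## ${name}\n${body.slice(0, maxChars)}\n[truncated: ${omitted} chars omitted]`;
+}
+```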
+
+Internal hooks can intercept this step via `agent:bootstrap` to mutate or replace
+the injected bootstrap files (for example swapping `SOUL.md` for an alternate persona).
+
+To inspect how much each injected file contributes (raw vs injected, truncation, plus tool schema overhead), use `/context list` or `/context detail`. See [Context](/concepts/context).
+
+## Time handling
+
+The system prompt includes a dedicated **Current Date & Time** section when the
+user timezone is known. To keep the prompt cache-stable, it now only includes
+the **time zone** (no dynamic clock or time format).
+
+Use `session_status` when the agent needs the current time; the status card
+includes a timestamp line.
+
+Configure with:
+
+- `agents.defaults.userTimezone`
+- `agents.defaults.timeFormat` (`auto` | `12` | `24`)
+
+See [Date & Time](/date-time) for full behavior details.
+
+## Skills
+
+When eligible skills exist, Moltbot injects a compact **available skills list**
+(`formatSkillsForPrompt`) that includes the **file path** for each skill. The
+prompt instructs the model to use `read` to load the SKILL.md at the listed
+location (workspace, managed, or bundled). If no skills are eligible, the
+Skills section is omitted.
+
+```
+<available_skills>
+  <skill>
+    <name>...</name>
+    <description>...</description>
+    <location>...</location>
+  </skill>
+</available_skills>
+```
+
+This keeps the base prompt small while still enabling targeted skill usage.
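+
+The list format above could be produced by something like the following sketch. `formatSkillsList` and `SkillEntry` are hypothetical stand-ins for `formatSkillsForPrompt`, and XML-escaping each field is an assumption of this sketch so that a description containing `<` or `&` cannot break the markup.
+
+```ts
+// Hypothetical sketch of the <available_skills> shape; not the real formatSkillsForPrompt.
+interface SkillEntry {
+  name: string;
+  description: string;
+  location: string; // path to the skill's SKILL.md
+}
+
+// Escape XML special characters; "&" must be replaced first.
+function escapeXml(text: string): string {
+  return text
+    .replace(/&/g, "&amp;")
+    .replace(/</g, "&lt;")
+    .replace(/>/g, "&gt;")
+    .replace(/"/g, "&quot;")
+    .replace(/'/g, "&apos;");
+}
+
+function formatSkillsList(skills: SkillEntry[]): string {
+  // No eligible skills: the Skills section is omitted entirely.
+  if (skills.length === 0) return "";
+  const items = skills.map((s) =>
+    [
+      "  <skill>",
+      `    <name>${escapeXml(s.name)}</name>`,
+      `    <description>${escapeXml(s.description)}</description>`,
+      `    <location>${escapeXml(s.location)}</location>`,
+      "  </skill>",
+    ].join("\n"),
+  );
+  return ["<available_skills>", ...items, "</available_skills>"].join("\n");
+}
+```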
+
+## Documentation
+
+When available, the system prompt includes a **Documentation** section that points to the
+local Moltbot docs directory (either `docs/` in the repo workspace or the bundled npm
+package docs) and also notes the public mirror, source repo, community Discord, and
+ClawdHub (https://clawdhub.com) for skills discovery. The prompt instructs the model to consult local docs first
+for Moltbot behavior, commands, configuration, or architecture, and to run
+`moltbot status` itself when possible (asking the user only when it lacks access).
--- a/docker-compose/ez-assistant/docs/concepts/timezone.md
+++ b/docker-compose/ez-assistant/docs/concepts/timezone.md
@@ -0,0 +1,89 @@
+---
+summary: "Timezone handling for agents, envelopes, and prompts"
+read_when:
+  - You need to understand how timestamps are normalized for the model
+  - Configuring the user timezone for system prompts
+---
+
+# Timezones
+
+Moltbot standardizes timestamps so the model sees a **single reference time**.
+
+## Message envelopes (local by default)
+
+Inbound messages are wrapped in an envelope like:
+
+```
+[Provider ... 2026-01-05 16:26 PST] message text
+```
+
+The timestamp in the envelope is **host-local by default**, with minute precision.
+
+You can override this with:
+
+```json5
+{
+  agents: {
+    defaults: {
+      envelopeTimezone: "local", // "utc" | "local" | "user" | IANA timezone
+      envelopeTimestamp: "on", // "on" | "off"
+      envelopeElapsed: "on" // "on" | "off"
+    }
+  }
+}
+```
+
+- `envelopeTimezone: "utc"` uses UTC.
+- `envelopeTimezone: "user"` uses `agents.defaults.userTimezone` (falls back to host timezone).
+- Use an explicit IANA timezone (e.g., `"Europe/Vienna"`) to pin envelopes to one zone regardless of host or user settings.
+- `envelopeTimestamp: "off"` removes absolute timestamps from envelope headers.
+- `envelopeElapsed: "off"` removes elapsed time suffixes (the `+2m` style).
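+
+To make the options concrete, here is a sketch that maps an `envelopeTimezone` setting to an IANA zone and formats a minute-precision timestamp with the standard `Intl.DateTimeFormat` API. `resolveEnvelopeZone` and `formatEnvelopeTime` are hypothetical helpers, not Moltbot functions, and the exact envelope layout Moltbot emits may differ.
+
+```ts
+// Hypothetical helpers; only the setting values ("utc" | "local" | "user" | IANA) come from the docs.
+function resolveEnvelopeZone(setting: string, userTimezone?: string): string {
+  const host = Intl.DateTimeFormat().resolvedOptions().timeZone;
+  if (setting === "utc") return "UTC";
+  if (setting === "local") return host;
+  if (setting === "user") return userTimezone ?? host; // falls back to the host zone
+  return setting; // explicit IANA name, e.g. "Europe/Vienna"
+}
+
+// Formats "YYYY-MM-DD HH:mm <zone>" in the given zone, dropping seconds.
+function formatEnvelopeTime(date: Date, timeZone: string): string {
+  const parts = new Intl.DateTimeFormat("en-US", {
+    timeZone,
+    year: "numeric",
+    month: "2-digit",
+    day: "2-digit",
+    hour: "2-digit",
+    minute: "2-digit",
+    hourCycle: "h23", // 00-23, so midnight never renders as "24"
+    timeZoneName: "short", // e.g. "PST", "GMT+1", "UTC"
+  }).formatToParts(date);
+  // Pick fields by type so the locale's ordering and separators don't matter.
+  const get = (type: Intl.DateTimeFormatPartTypes): string =>
+    parts.find((p) => p.type === type)?.value ?? "";
+  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")} ${get("timeZoneName")}`;
+}
+```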
+
+### Examples
+
+**Local (default):**
+
+```
+[Signal Alice +1555 2026-01-18 00:19 PST] hello
+```
+
+**Fixed timezone:**
+
+```
+[Signal Alice +1555 2026-01-18 06:19 GMT+1] hello
+```
+
+**Elapsed time:**
+
+```
+[Signal Alice +1555 +2m 2026-01-18T05:19Z] follow-up
+```
+
+## Tool payloads (raw provider data + normalized fields)
+
+Tool calls (`channels.discord.readMessages`, `channels.slack.readMessages`, etc.) return **raw provider timestamps**.
+We also attach normalized fields for consistency:
+
+- `timestampMs` (UTC epoch milliseconds)
+- `timestampUtc` (ISO 8601 UTC string)
+
+Raw provider fields are preserved.
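+
+A minimal sketch of attaching the normalized fields without touching the raw ones. It assumes two raw formats, Slack-style `seconds.micros` strings and ISO 8601 strings (as Discord uses); `toEpochMs` and `withNormalizedTimestamp` are hypothetical names, not Moltbot functions.
+
+```ts
+// Hypothetical sketch; the raw formats handled here are assumptions.
+interface NormalizedTimestamps {
+  timestampMs: number; // UTC epoch milliseconds
+  timestampUtc: string; // ISO 8601 UTC string
+}
+
+function toEpochMs(raw: string): number {
+  // Slack-style "1730000000.123456": parse digits directly to avoid float rounding.
+  if (/^\d+(\.\d+)?$/.test(raw)) {
+    const [secs, frac = ""] = raw.split(".");
+    return Number(secs) * 1000 + Number(frac.padEnd(3, "0").slice(0, 3));
+  }
+  // Otherwise treat it as an ISO 8601 string.
+  const ms = Date.parse(raw);
+  if (Number.isNaN(ms)) throw new Error(`unparseable timestamp: ${raw}`);
+  return ms;
+}
+
+function withNormalizedTimestamp<T extends object>(
+  message: T,
+  rawTimestamp: string,
+): T & NormalizedTimestamps {
+  const timestampMs = toEpochMs(rawTimestamp);
+  // The spread copies every raw provider field unchanged.
+  return { ...message, timestampMs, timestampUtc: new Date(timestampMs).toISOString() };
+}
+```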
+
+## User timezone for the system prompt
+
+Set `agents.defaults.userTimezone` to tell the model the user's local time zone. If it is
+unset, Moltbot resolves the **host timezone at runtime** (no config write).
+
+```json5
+{
+  agents: { defaults: { userTimezone: "America/Chicago" } }
+}
+```
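+
+The runtime fallback can be sketched with the standard `Intl` API. `resolveUserTimezone` and `isValidTimeZone` are hypothetical names, and treating an invalid configured zone like an unset one is an assumption of this sketch, not documented behavior.
+
+```ts
+// Hypothetical sketch of the userTimezone fallback; not Moltbot's implementation.
+function isValidTimeZone(tz: string): boolean {
+  try {
+    // Intl throws a RangeError for unknown IANA names.
+    new Intl.DateTimeFormat("en-US", { timeZone: tz });
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+function resolveUserTimezone(configured?: string): string {
+  if (configured && isValidTimeZone(configured)) return configured;
+  // Unset: read the host zone at runtime; nothing is written back to config.
+  return Intl.DateTimeFormat().resolvedOptions().timeZone;
+}
+```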
+
+The system prompt includes:
+- `Current Date & Time` section with local time and timezone
+- `Time format: 12-hour` or `24-hour`
+
+You can control the prompt format with `agents.defaults.timeFormat` (`auto` | `12` | `24`).
+
+See [Date & Time](/date-time) for the full behavior and examples.
--- a/docker-compose/ez-assistant/docs/concepts/typebox.md
+++ b/docker-compose/ez-assistant/docs/concepts/typebox.md
@@ -0,0 +1,281 @@
+---
+summary: "TypeBox schemas as the single source of truth for the gateway protocol"
+read_when:
+  - Updating protocol schemas or codegen
+---
+# TypeBox as protocol source of truth
+
+Last updated: 2026-01-10
+
+TypeBox is a TypeScript-first schema library. We use it to define the **Gateway
+WebSocket protocol** (handshake, request/response, server events). Those schemas
+drive **runtime validation**, **JSON Schema export**, and **Swift codegen** for
+the macOS app. One source of truth; everything else is generated.
+
+If you want the higher-level protocol context, start with
+[Gateway architecture](/concepts/architecture).
+
+## Mental model (30 seconds)
+
+Every Gateway WS message is one of three frames:
+
+- **Request**: `{ type: "req", id, method, params }`
+- **Response**: `{ type: "res", id, ok, payload | error }`
+- **Event**: `{ type: "event", event, payload, seq?, stateVersion? }`
+
+The first frame **must** be a `connect` request. After that, clients can call
+methods (e.g. `health`, `send`, `chat.send`) and subscribe to events (e.g.
+`presence`, `tick`, `agent`).
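+
+For orientation, the three frame shapes can be written as a plain TypeScript discriminated union. This sketch is not generated from the TypeBox schemas and its field types are assumptions; `parseFrame` and `isConnectRequest` are hypothetical helpers showing how the `type` discriminator narrows a frame and how a first-frame check might look.
+
+```ts
+// Hand-written approximation of the frames; the real source of truth is schema.ts.
+type RequestFrame = { type: "req"; id: string; method: string; params?: unknown };
+type ResponseFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown };
+type EventFrame = {
+  type: "event";
+  event: string;
+  payload?: unknown;
+  seq?: number;
+  stateVersion?: unknown;
+};
+type GatewayFrame = RequestFrame | ResponseFrame | EventFrame;
+
+// Parse raw JSON and narrow it by the `type` discriminator.
+function parseFrame(raw: string): GatewayFrame | null {
+  let value: unknown;
+  try {
+    value = JSON.parse(raw);
+  } catch {
+    return null; // not JSON at all
+  }
+  if (typeof value !== "object" || value === null) return null;
+  const v = value as Record<string, unknown>;
+  if (v.type === "req" && typeof v.id === "string" && typeof v.method === "string") {
+    return value as RequestFrame;
+  }
+  if (v.type === "res" && typeof v.id === "string" && typeof v.ok === "boolean") {
+    return value as ResponseFrame;
+  }
+  if (v.type === "event" && typeof v.event === "string") return value as EventFrame;
+  return null; // unknown or malformed frame
+}
+
+// The first frame on a connection must be a `connect` request.
+function isConnectRequest(frame: GatewayFrame | null): boolean {
+  return frame !== null && frame.type === "req" && frame.method === "connect";
+}
+```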
+
+Connection flow (minimal):
+
+```
+Client                    Gateway
+  |---- req:connect --------->|
+  |<---- res:hello-ok --------|
+  |<---- event:tick ----------|
+  |---- req:health ---------->|
+  |<---- res:health ----------|
+```
+
+Common methods + events:
+
+| Category | Examples | Notes |
+| --- | --- | --- |
+| Core | `connect`, `health`, `status` | `connect` must be first |
+| Messaging | `send`, `poll`, `agent`, `agent.wait` | side-effects need `idempotencyKey` |
+| Chat | `chat.history`, `chat.send`, `chat.abort`, `chat.inject` | WebChat uses these |
+| Sessions | `sessions.list`, `sessions.patch`, `sessions.delete` | session admin |
+| Nodes | `node.list`, `node.invoke`, `node.pair.*` | Gateway WS + node actions |
+| Events | `tick`, `presence`, `agent`, `chat`, `health`, `shutdown` | server push |
+
+Authoritative list lives in `src/gateway/server.ts` (`METHODS`, `EVENTS`).
+
+## Where the schemas live
+
+- Source: `src/gateway/protocol/schema.ts`
+- Runtime validators (AJV): `src/gateway/protocol/index.ts`
+- Server handshake + method dispatch: `src/gateway/server.ts`
+- Node client: `src/gateway/client.ts`
+- Generated JSON Schema: `dist/protocol.schema.json`
+- Generated Swift models: `apps/macos/Sources/MoltbotProtocol/GatewayModels.swift`
+
+## Current pipeline
+
+- `pnpm protocol:gen`
+  - writes JSON Schema (draft‑07) to `dist/protocol.schema.json`
+- `pnpm protocol:gen:swift`
+  - generates Swift gateway models
+- `pnpm protocol:check`
+  - runs both generators and verifies the output is committed
+
+## How the schemas are used at runtime
+
+- **Server side**: every inbound frame is validated with AJV. The handshake only
+  accepts a `connect` request whose params match `ConnectParams`.
+- **Client side**: the JS client validates event and response frames before
+  using them.
+- **Method surface**: the Gateway advertises the supported `methods` and
+  `events` in `hello-ok`.
+
+## Example frames
+
+Connect (first message):
+
+```json
+{
+  "type": "req",
+  "id": "c1",
+  "method": "connect",
+  "params": {
+    "minProtocol": 2,
+    "maxProtocol": 2,
+    "client": {
+      "id": "moltbot-macos",
+      "displayName": "macos",
+      "version": "1.0.0",
+      "platform": "macos 15.1",
+      "mode": "ui",
+      "instanceId": "A1B2"
+    }
+  }
+}
+```
+
+Hello-ok response:
+
+```json
+{
+  "type": "res",
+  "id": "c1",
+  "ok": true,
+  "payload": {
+    "type": "hello-ok",
+    "protocol": 2,
+    "server": { "version": "dev", "connId": "ws-1" },
+    "features": { "methods": ["health"], "events": ["tick"] },
+    "snapshot": { "presence": [], "health": {}, "stateVersion": { "presence": 0, "health": 0 }, "uptimeMs": 0 },
+    "policy": { "maxPayload": 1048576, "maxBufferedBytes": 1048576, "tickIntervalMs": 30000 }
+  }
+}
+```
+
+Request + response:
+
+```json
+{ "type": "req", "id": "r1", "method": "health" }
+```
+
+```json
+{ "type": "res", "id": "r1", "ok": true, "payload": { "ok": true } }
+```
+
+Event:
+
+```json
+{ "type": "event", "event": "tick", "payload": { "ts": 1730000000 }, "seq": 12 }
+```
+
+## Minimal client (Node.js)
+
+Smallest useful flow: connect + health.
+
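+```ts
+import { WebSocket } from "ws";
+
+// Gateway address used in this example; point it at your own Gateway.
+const ws = new WebSocket("ws://127.0.0.1:18789");
+
+ws.on("open", () => {
+  // The first frame must be a `connect` request.
+  ws.send(JSON.stringify({
+    type: "req",
+    id: "c1",
+    method: "connect",
+    params: {
+      // The range must include the server's PROTOCOL_VERSION, or the handshake is rejected.
+      minProtocol: 3,
+      maxProtocol: 3,
+      client: {
+        id: "cli",
+        displayName: "example",
+        version: "dev",
+        platform: "node",
+        mode: "cli"
+      }
+    }
+  }));
+});
+
+ws.on("message", (data) => {
+  const msg = JSON.parse(String(data));
+  // Only responses matter here; events such as `tick` are ignored.
+  if (msg.type !== "res") return;
+  if (msg.id === "c1") {
+    if (!msg.ok) {
+      console.error("connect rejected:", msg.error);
+      ws.close();
+      return;
+    }
+    // hello-ok received: the connection is ready for method calls.
+    ws.send(JSON.stringify({ type: "req", id: "h1", method: "health" }));
+  } else if (msg.id === "h1") {
+    console.log("health:", msg.payload);
+    ws.close();
+  }
+});
+```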
+
+## Worked example: add a method end‑to‑end
+
+Example: add a new `system.echo` request that returns `{ ok: true, text }`.
+
+1) **Schema (source of truth)**
+
+Add to `src/gateway/protocol/schema.ts`:
+
+```ts
+export const SystemEchoParamsSchema = Type.Object(
+  { text: NonEmptyString },
+  { additionalProperties: false },
+);
+
+export const SystemEchoResultSchema = Type.Object(
+  { ok: Type.Boolean(), text: NonEmptyString },
+  { additionalProperties: false },
+);
+```
+
+Add both to `ProtocolSchemas` and export types:
+
+```ts
+  SystemEchoParams: SystemEchoParamsSchema,
+  SystemEchoResult: SystemEchoResultSchema,
+```
+
+```ts
+export type SystemEchoParams = Static<typeof SystemEchoParamsSchema>;
+export type SystemEchoResult = Static<typeof SystemEchoResultSchema>;
+```
+
+2) **Validation**
+
+In `src/gateway/protocol/index.ts`, export an AJV validator:
+
+```ts
+export const validateSystemEchoParams =
+  ajv.compile<SystemEchoParams>(SystemEchoParamsSchema);
+```
+
+3) **Server behavior**
+
+Add a handler in `src/gateway/server-methods/system.ts`:
+
+```ts
+export const systemHandlers: GatewayRequestHandlers = {
+  "system.echo": ({ params, respond }) => {
+    const text = String(params.text ?? "");
+    respond(true, { ok: true, text });
+  },
+};
+```
+
+No extra registration is needed in `src/gateway/server-methods.ts`, which already merges
+`systemHandlers`. Do add `"system.echo"` to `METHODS` in `src/gateway/server.ts`.
+
+4) **Regenerate**
+
+```bash
+pnpm protocol:check
+```
+
+5) **Tests + docs**
+
+Add a server test in `src/gateway/server.*.test.ts` and note the method in docs.
+
+## Swift codegen behavior
+
+The Swift generator emits:
+
+- `GatewayFrame` enum with `req`, `res`, `event`, and `unknown` cases
+- Strongly typed payload structs/enums
+- `ErrorCode` values and `GATEWAY_PROTOCOL_VERSION`
+
+Unknown frame types are preserved as raw payloads for forward compatibility.
+
+## Versioning + compatibility
+
+- `PROTOCOL_VERSION` lives in `src/gateway/protocol/schema.ts`.
+- Clients send `minProtocol` + `maxProtocol`; the server rejects mismatches.
+- The Swift models keep unknown frame types to avoid breaking older clients.
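+
+The version range check can be sketched like this, assuming the server accepts a client when its single `PROTOCOL_VERSION` falls inside `[minProtocol, maxProtocol]`. `negotiateProtocol` is a hypothetical name; the server's actual rule and its error response are not shown here.
+
+```ts
+// Hypothetical sketch of protocol range negotiation.
+interface ProtocolRange {
+  minProtocol: number;
+  maxProtocol: number;
+}
+
+// Returns the version to speak, or null when the server would reject the client.
+function negotiateProtocol(serverVersion: number, range: ProtocolRange): number | null {
+  if (range.minProtocol > range.maxProtocol) return null; // malformed range
+  const inRange = serverVersion >= range.minProtocol && serverVersion <= range.maxProtocol;
+  return inRange ? serverVersion : null;
+}
+```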
+
+## Schema patterns and conventions
+
+- Most objects use `additionalProperties: false` for strict payloads.
+- `NonEmptyString` is the default for IDs and method/event names.
+- The top-level `GatewayFrame` uses a **discriminator** on `type`.
+- Methods with side effects usually require an `idempotencyKey` in params
+  (example: `send`, `poll`, `agent`, `chat.send`).
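+
+Idempotency keys are a general pattern; the sketch below shows both halves of it under stated assumptions. `buildSideEffectRequest` and `IdempotencyCache` are hypothetical, and this is not how the Gateway stores keys. A production cache would also need expiry; this one keeps every key.
+
+```ts
+import { randomUUID } from "node:crypto";
+
+// Client side: build the frame once and resend the same object (same key) on retry.
+function buildSideEffectRequest(id: string, method: string, params: Record<string, unknown>) {
+  return { type: "req" as const, id, method, params: { ...params, idempotencyKey: randomUUID() } };
+}
+
+// Server side (generic pattern): remember each key's result so a retry has no second effect.
+class IdempotencyCache<R> {
+  private results = new Map<string, R>();
+
+  run(key: string, effect: () => R): R {
+    if (this.results.has(key)) return this.results.get(key) as R; // duplicate: replay result
+    const result = effect();
+    this.results.set(key, result);
+    return result;
+  }
+}
+```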
+
+## Live schema JSON
+
+Generated JSON Schema is in the repo at `dist/protocol.schema.json`. The
+published raw file is typically available at:
+
+- https://raw.githubusercontent.com/moltbot/moltbot/main/dist/protocol.schema.json
+
+## When you change schemas
+
+1) Update the TypeBox schemas.
+2) Run `pnpm protocol:check`.
+3) Commit the regenerated schema + Swift models.
--- a/docker-compose/ez-assistant/docs/concepts/typing-indicators.md
+++ b/docker-compose/ez-assistant/docs/concepts/typing-indicators.md
@@ -0,0 +1,59 @@
+---
+summary: "When Moltbot shows typing indicators and how to tune them"
+read_when:
+  - Changing typing indicator behavior or defaults
+---
+# Typing indicators
+
+Typing indicators are sent to the chat channel while a run is active. Use
+`agents.defaults.typingMode` to control **when** typing starts and `typingIntervalSeconds`
+to control **how often** it refreshes.
+
+## Defaults
+When `agents.defaults.typingMode` is **unset**, Moltbot keeps the legacy behavior:
+- **Direct chats**: typing starts immediately once the model loop begins.
+- **Group chats with a mention**: typing starts immediately.
+- **Group chats without a mention**: typing starts only when message text begins streaming.
+- **Heartbeat runs**: typing is disabled.
+
+## Modes
+Set `agents.defaults.typingMode` to one of:
+- `never` — no typing indicator, ever.
+- `instant` — start typing **as soon as the model loop begins**, even if the run
+  later returns only the silent reply token.
+- `thinking` — start typing on the **first reasoning delta** (requires
+  `reasoningLevel: "stream"` for the run).
+- `message` — start typing on the **first non-silent text delta** (ignores
+  the `NO_REPLY` silent token).
+
+From latest (or never) to earliest start:
+`never` → `message` → `thinking` → `instant`
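+
+The start rules can be summarized as a small decision function. This is a sketch, not Moltbot's implementation: `shouldStartTyping`, `legacyTypingMode`, and the `RunEvent` shape are hypothetical. Checking the accumulated text against prefixes of `NO_REPLY` is an assumption for handling a silent token that streams in over several deltas.
+
+```ts
+// Hypothetical sketch of when typing starts; names and event shapes are illustrative.
+type TypingMode = "never" | "instant" | "thinking" | "message";
+
+type RunEvent =
+  | { kind: "loop-start" }
+  | { kind: "reasoning-delta" }
+  | { kind: "text-delta"; textSoFar: string }; // accumulated reply text so far
+
+const SILENT_TOKEN = "NO_REPLY";
+
+function shouldStartTyping(mode: TypingMode, event: RunEvent, isHeartbeat: boolean): boolean {
+  // Heartbeat runs never show typing, regardless of mode.
+  if (isHeartbeat || mode === "never") return false;
+  if (mode === "instant") return event.kind === "loop-start";
+  // Fires only if the run streams reasoning (reasoningLevel: "stream").
+  if (mode === "thinking") return event.kind === "reasoning-delta";
+  // "message": wait for visible text that cannot still turn out to be the silent token.
+  if (event.kind !== "text-delta") return false;
+  const text = event.textSoFar.trim();
+  return text !== "" && !SILENT_TOKEN.startsWith(text);
+}
+
+// Legacy behavior when typingMode is unset (heartbeats are handled above).
+function legacyTypingMode(chat: "direct" | "group", mentioned: boolean): TypingMode {
+  return chat === "direct" || mentioned ? "instant" : "message";
+}
+```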
+
+## Configuration
+```json5
+{
+  agents: {
+    defaults: {
+      typingMode: "thinking",
+      typingIntervalSeconds: 6
+    }
+  }
+}
+```
+
+You can override mode or cadence per session:
+```json5
+{
+  session: {
+    typingMode: "message",
+    typingIntervalSeconds: 4
+  }
+}
+```
+
+## Notes
+- `message` mode won’t show typing for silent-only replies (e.g. the `NO_REPLY`
+  token used to suppress output).
+- `thinking` only fires if the run streams reasoning (`reasoningLevel: "stream"`).
+  If the model doesn’t emit reasoning deltas, typing won’t start.
+- Heartbeats never show typing, regardless of mode.
+- `typingIntervalSeconds` controls the **refresh cadence**, not the start time.
+  The default is 6 seconds.
--- a/docker-compose/ez-assistant/docs/concepts/usage-tracking.md
+++ b/docker-compose/ez-assistant/docs/concepts/usage-tracking.md
@@ -0,0 +1,30 @@
+---
+summary: "Usage tracking surfaces and credential requirements"
+read_when:
+  - You are wiring provider usage/quota surfaces
+  - You need to explain usage tracking behavior or auth requirements
+---
+# Usage tracking
+
+## What it is
+- Pulls usage/quota directly from each provider's usage endpoint.
+- Provider usage is never estimated; only the windows the provider reports are shown.
+
+## Where it shows up
+- `/status` in chats: emoji‑rich status card with session tokens + estimated cost (API key only). Provider usage shows for the **current model provider** when available.
+- `/usage off|tokens|full` in chats: per-response usage footer (OAuth shows tokens only).
+- `/usage cost` in chats: local cost summary aggregated from Moltbot session logs.
+- CLI: `moltbot status --usage` prints a full per-provider breakdown.
+- CLI: `moltbot channels list` prints the same usage snapshot alongside provider config (use `--no-usage` to skip).
+- macOS menu bar: “Usage” section under Context (only if available).
+
+## Providers + credentials
+- **Anthropic (Claude)**: OAuth tokens in auth profiles.
+- **GitHub Copilot**: OAuth tokens in auth profiles.
+- **Gemini CLI**: OAuth tokens in auth profiles.
+- **Antigravity**: OAuth tokens in auth profiles.
+- **OpenAI Codex**: OAuth tokens in auth profiles (accountId used when present).
+- **MiniMax**: API key (coding plan key; `MINIMAX_CODE_PLAN_KEY` or `MINIMAX_API_KEY`); uses the 5‑hour coding plan window.
+- **z.ai**: API key via env/config/auth store.
+
+Usage is hidden if no matching OAuth/API credentials exist.
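+
+The visibility rule can be sketched as a lookup from provider to the credential kind its usage surface needs, taken from the list above. Provider ids such as `github-copilot`, the `StoredCredential` shape, and `usageVisibleProviders` are hypothetical; real auth profiles carry more fields.
+
+```ts
+// Hypothetical sketch; provider ids and credential shapes are illustrative.
+type CredentialKind = "oauth" | "apiKey";
+
+// Credential each provider's usage tracking needs, per the list above.
+const USAGE_CREDENTIAL: Record<string, CredentialKind> = {
+  anthropic: "oauth",
+  "github-copilot": "oauth",
+  "gemini-cli": "oauth",
+  antigravity: "oauth",
+  "openai-codex": "oauth",
+  minimax: "apiKey",
+  zai: "apiKey",
+};
+
+interface StoredCredential {
+  provider: string;
+  kind: CredentialKind;
+}
+
+// Providers whose usage can be shown; everything else stays hidden.
+function usageVisibleProviders(creds: StoredCredential[]): string[] {
+  const matching = creds
+    .filter((c) => USAGE_CREDENTIAL[c.provider] === c.kind)
+    .map((c) => c.provider);
+  return Array.from(new Set(matching)); // dedupe, keeping first-seen order
+}
+```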