Architecture

Analysis Date: 2026-04-17

Pattern Overview

Overall: Docker Compose service orchestration with no compiled application source code. All containerized services are pre-built images configured via environment variables and volume mounts.

Key Characteristics:
- Fully local-first, zero cloud dependency
- All inter-service communication stays inside the Docker network
- One service (Ollama) runs on the host machine, accessed via host.docker.internal
- PostgreSQL replaces the default SQLite backend for production-grade persistence
- AI metadata enrichment is a satellite layer: it augments Paperless-ngx via REST API without modifying the core storage system

Services

broker (Redis 8):
- Purpose: Message broker and task queue for Paperless-ngx's document processing pipeline
- Image: docker.io/library/redis:8
- Port: 6379 (internal only, not exposed to host)
- Persistent volume: redisdata:/data
- Depended on by: webserver

db (PostgreSQL 18):
- Purpose: Primary relational database for all Paperless-ngx application data (documents, tags, correspondents, document types, users)
- Image: docker.io/library/postgres:18
- Port: 5432 (internal only, not exposed to host)
- Persistent volume: pgdata:/var/lib/postgresql
- Credentials: set via POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD (all default to paperless in docker-compose.yml)
- Depended on by: webserver

webserver (Paperless-ngx):
- Purpose: Core document management system that handles file ingestion, OCR, full-text search, storage, and the REST API
- Image: ghcr.io/paperless-ngx/paperless-ngx:latest
- Port: 8000:8000 (exposed to host)
- Volumes:
  - data:/usr/src/paperless/data (application data and search index)
  - media:/usr/src/paperless/media (stored document files)
  - ./export:/usr/src/paperless/export (export directory; host-mounted, gitignored)
  - ./consume:/usr/src/paperless/consume (document inbox for file-drop ingestion; host-mounted, gitignored)
- Configuration: docker-compose.env (not in git, must be created manually) + inline environment (PAPERLESS_REDIS, PAPERLESS_DBHOST)
- Depends on: db, broker
- Depended on by: paperless-ai

paperless-ai:
- Purpose: AI enrichment satellite. Polls Paperless-ngx via REST API, sends document text to Ollama for analysis, and writes generated metadata (title, tags, correspondent, document type, date) back via the API
- Image: clusterzx/paperless-ai
- Container name: paperless-ai (explicit)
- Port: 3000:3000 (exposed to host, configurable via PAPERLESS_AI_PORT env var)
- Persistent volume: paperless-ai_data:/app/data (stores its own config, including .env with API token and Ollama settings)
- Security hardening: cap_drop: ALL, no-new-privileges: true
- RAG integration: RAG_SERVICE_URL=http://webserver:8000, RAG_SERVICE_ENABLED=true
- Depends on: webserver
- Reaches Ollama via: http://host.docker.internal:11434 (host machine, not containerized)

Ollama (host process, not containerized):
- Purpose: Local LLM inference β€” runs llama3.2 (and llama2) for document analysis
- Runs on: host machine (GPU access)
- Port: 11434 on host
- Started with: ollama serve (must be running before docker compose up)
- Reached from containers via: http://host.docker.internal:11434
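
Because Ollama must already be running before the stack starts, a quick host-side probe can confirm both that it answers and that the expected model is pulled. The sketch below is a minimal illustration, not part of this stack: the function names are hypothetical, and it assumes the probe runs on the host (hence localhost rather than host.docker.internal). It uses Ollama's /api/tags endpoint, which lists locally available models.

```python
import json
import urllib.request

# Assumption: run from the host; containers would use host.docker.internal.
OLLAMA_URL = "http://localhost:11434"


def model_available(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` matches a model name in an /api/tags payload.

    Ollama reports names with a tag suffix (e.g. "llama3.2:latest"), so a
    bare name such as "llama3.2" matches any tag of that model.
    """
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> dict:
    """Query Ollama's model list; raises URLError if Ollama is not reachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

A typical use would be `model_available(fetch_tags(), "llama3.2")` before running docker compose up; a connection error at that point means ollama serve is not running.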

Data Flow

Document Ingestion:

  1. User drops file into ./consume/ directory or uploads via web UI at http://localhost:8000
  2. webserver detects file, runs OCR (Tesseract), extracts text and metadata
  3. Document stored in media volume; record written to PostgreSQL via db
  4. Task queued through broker (Redis) for async processing steps
  5. Full-text index updated in data volume
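
The file-drop path in step 1 can be scripted. The following is a minimal sketch under stated assumptions: `drop_into_consume` and the staging directory are hypothetical helpers, not part of Paperless-ngx, and the staging directory is assumed to live on the same filesystem as ./consume so that the final rename is atomic. Staging first is a cautious habit that keeps a half-copied file from ever appearing in the inbox; whether the consumer strictly needs it is not something this document establishes.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(source: Path, consume_dir: Path, staging_dir: Path) -> Path:
    """Copy `source` into `consume_dir` via `staging_dir`; return the final path.

    The copy is written to the staging directory first and then renamed into
    the consume directory, so the inbox only ever sees complete files.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    consume_dir.mkdir(parents=True, exist_ok=True)

    staged = staging_dir / source.name
    shutil.copyfile(source, staged)

    final_path = consume_dir / source.name
    os.replace(staged, final_path)  # atomic only within one filesystem
    return final_path
```

For example, `drop_into_consume(Path("scan.pdf"), Path("./consume"), Path("./.staging"))` leaves the original file in place and moves the staged copy into the inbox.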

AI Enrichment (every 30 minutes):

  1. paperless-ai cron job fires (SCAN_INTERVAL=*/30 * * * *)
  2. Fetches unprocessed documents from http://webserver:8000/api using PAPERLESS_API_TOKEN
  3. Sends document text to Ollama at http://host.docker.internal:11434 with model llama3.2
  4. Ollama returns structured metadata: title, tags, correspondent, document type, date
  5. paperless-ai writes metadata back to Paperless-ngx via REST API
  6. ChromaDB/RAGZ creates vector embeddings (SentenceTransformer) stored in paperless-ai_data volume
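
Steps 3 and 4 can be illustrated with a small sketch of the Ollama round trip. This is not paperless-ai's actual implementation: the prompt wording, the `FIELDS` tuple, and the function names are assumptions made for illustration. The request shape (model, prompt, stream, format) follows Ollama's /api/generate endpoint, whose non-streaming reply carries the generated text in a `response` field.

```python
import json
import urllib.request

OLLAMA_URL = "http://host.docker.internal:11434"
# Assumed key names for the metadata listed in step 4.
FIELDS = ("title", "tags", "correspondent", "document_type", "date")


def build_ollama_request(text: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming /api/generate request that asks for JSON output."""
    prompt = (
        "Return a JSON object with the keys "
        + ", ".join(FIELDS)
        + " describing this document:\n\n"
        + text
    )
    body = {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_metadata(ollama_reply: dict) -> dict:
    """Keep only the known metadata keys from an /api/generate reply."""
    generated = json.loads(ollama_reply["response"])
    return {key: generated[key] for key in FIELDS if key in generated}


def analyse(text: str) -> dict:
    """Full round trip; requires a reachable Ollama instance."""
    with urllib.request.urlopen(build_ollama_request(text), timeout=120) as resp:
        return parse_metadata(json.load(resp))
```

Filtering the reply down to known keys is a defensive choice: a local model may add fields that were not requested, and those should not be written back to Paperless-ngx.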

State Management:
- Document records and metadata: PostgreSQL (pgdata volume)
- Task queue state: Redis (redisdata volume)
- Raw document files: Docker media volume
- Application/search index data: Docker data volume
- AI config and vector embeddings: Docker paperless-ai_data volume

Entry Points

Document upload (human):
- Web UI: http://localhost:8000
- File drop: ./consume/ directory (host-mounted)

AI dashboard:
- Web UI: http://localhost:3000
- Initial setup: http://localhost:3000/setup

Admin / API:
- Paperless-ngx REST API: http://localhost:8000/api
- Django shell: docker exec -it paperless-webserver-1 python3 manage.py shell
- Token generation: docker exec paperless-webserver-1 python3 manage.py shell -c "..."
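
Once a token exists, the REST API can be called from the host. The sketch below shows how such requests might be built with the standard library; the helper names are hypothetical, and the example assumes the conventional Paperless-ngx routes /api/documents/ and /api/documents/<id>/ with token authentication via the `Authorization: Token ...` header.

```python
import json
import urllib.request

API_ROOT = "http://localhost:8000/api"


def list_documents_request(token: str, page_size: int = 25) -> urllib.request.Request:
    """Build a GET request for one page of documents."""
    return urllib.request.Request(
        f"{API_ROOT}/documents/?page_size={page_size}",
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def update_document_request(token: str, doc_id: int, fields: dict) -> urllib.request.Request:
    """Build a PATCH request that changes only the given fields of a document."""
    return urllib.request.Request(
        f"{API_ROOT}/documents/{doc_id}/",
        data=json.dumps(fields).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

Either request can be sent with `urllib.request.urlopen(request)`. From inside the Compose network the same calls would target http://webserver:8000/api instead of localhost.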

Network Topology

All four containerized services share the default Docker Compose bridge network. Service-to-service communication uses Docker DNS names (webserver, db, broker).

Ollama is the only component outside the Docker network. It is reached from paperless-ai using host.docker.internal:11434.

Critical networking constraint: network_mode: bridge must NOT be set on paperless-ai. Doing so isolates it from the Compose network and breaks host.docker.internal resolution, silently preventing Ollama access.

Error Handling

Silent failure modes (documented in .claude/memory/project_coldstart_bug.md):

  1. Missing docker-compose.env: services start but Paperless-ngx is misconfigured; no error is visible on port 8000
  2. Ollama not running on host: paperless-ai starts successfully but AI enrichment silently fails at each scan interval
  3. network_mode: bridge on paperless-ai: container starts and the web UI works, but all Ollama calls fail

Pre-flight checklist (must verify before docker compose up):
1. docker-compose.env exists with ADMIN_USER, ADMIN_PASSWORD, SECRET_KEY
2. Ollama is running: ollama serve
3. network_mode: bridge is NOT present in docker-compose.yml for paperless-ai
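
The first checklist item can be automated with a few lines. This is a minimal sketch: the function name is hypothetical, and the key names are copied from the checklist as written. Paperless-ngx itself reads its settings under a PAPERLESS_ prefix (for example PAPERLESS_SECRET_KEY), so the tuple should be adjusted to whatever spelling the actual docker-compose.env uses.

```python
# Key names as listed in the checklist; adjust to match the real file.
REQUIRED = ("ADMIN_USER", "ADMIN_PASSWORD", "SECRET_KEY")


def missing_env_keys(env_text: str, required: tuple = REQUIRED) -> list:
    """Return the required keys that are absent or empty in KEY=VALUE text."""
    present = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():
            present.add(key.strip())
    return [key for key in required if key not in present]
```

Reading the file with `Path("docker-compose.env").read_text()` and passing the result in gives an empty list when the file is complete; a FileNotFoundError at that point is itself the first silent failure mode made visible.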

Planned Extensions

Nullfeld-Integration: FPGA x SoC star maps layer (not yet implemented)

Vector-Bridge to Eule/Qdrant: Cross-system RAG connecting this stack to the crumbforest.org Qdrant instance (not yet implemented)

Custom Fields: Paperless-ngx custom field configuration (not yet done)

