Offline-First Mode Quickstart Guide 🔌
Version: V14.0.0+
Prerequisites: Python 3.10+, 8GB+ RAM (16GB recommended)
Boring-Gemini V14.0 introduces a true Offline-First architecture. This guide helps you set up a fully autonomous local development environment with zero internet dependency.
1. Quick Setup
Step 1: Install Dependencies
Offline mode requires llama-cpp-python for local inference.
# Install with local support extras
pip install boring-aicoding[local]
# Or manually
pip install llama-cpp-python
GPU Acceleration: If you have an NVIDIA GPU, install with CUDA support:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
Step 2: Download a Model
Use the built-in CLI to download a recommended GGUF model.
# List recommended models
boring model list
# Download a balanced model (e.g., Llama-3-8B-Quantized)
boring model download --name "llama-3-8b-instruct-q4_k_m.gguf"
Models are stored in ~/.boring/models/.
Step 3: Enable Offline Mode
You can enable offline mode globally or per session.
Option A: CLI Toggle (Persistent)
Option B: Environment Variable (Temporary)
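The exact invocations are not shown above, so the sketch below uses assumed names: a `boring config` subcommand for Option A and a `BORING_OFFLINE` environment variable for Option B. Check `boring --help` for the real spellings.

```shell
# Option A (persistent) -- hypothetical subcommand; verify with `boring --help`:
#   boring config set offline true
# Option B (temporary) -- hypothetical variable name, applies to this shell only:
export BORING_OFFLINE=1
echo "BORING_OFFLINE=$BORING_OFFLINE"
```

The environment-variable form is handy in CI or one-off sessions, since it disappears when the shell exits.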
2. Verification
Run the doctor command to verify your offline status.
Output should show:
5. Offline Mode
- Status: ENABLED
6. Local LLM Models
- Models: 1 available
- llama-3-8b-instruct-q4_k_m.gguf
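As an extra sanity check, you can confirm the model file from Step 2 actually landed in the models directory. The directory path comes from Step 2; the listing logic is just a sketch:

```shell
# List downloaded GGUF models, if the directory exists yet
MODELS_DIR="${HOME}/.boring/models"
if [ -d "$MODELS_DIR" ]; then
  ls "$MODELS_DIR"
else
  echo "No models directory yet; run 'boring model download' first."
fi
```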
3. How it Works
When Offline Mode is active:
- Network Cutoff: All external API calls (Gemini, OpenAI, Anthropic) are blocked.
- Local Inference: The Agent automatically routes LLM requests to your local GGUF model.
- Local Tools: Only local tools are loaded (File Ops, Local RAG, Shell). Web search tools are disabled.
- Local RAG: Queries use SentenceTransformers (local embeddings) and ChromaDB (local vector store).
Fallback Behavior
If Offline Mode is ON but no local model is loaded, the system fails gracefully with an error suggesting that you run boring model download.
4. Performance Tuning
Create a .env file in your project to tune performance:
# .env
BORING_LOCAL_MODEL_PATH=~/.boring/models/my-custom-model.gguf
BORING_LOCAL_CTX_WINDOW=8192
BORING_LOCAL_GPU_LAYERS=35 # Offload layers to GPU
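If you prefer to script the setup, the same file can be written from the shell. The sketch below copies the two numeric settings from the fragment above; add your own BORING_LOCAL_MODEL_PATH line, since that path is specific to your machine:

```shell
# Write the tuning variables into a project-local .env
cat > .env <<'EOF'
BORING_LOCAL_CTX_WINDOW=8192
BORING_LOCAL_GPU_LAYERS=35
EOF
cat .env
```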
Last updated: V14.0.0