# 🤖 Agentic LLM Hub
Self-hosted AI agent platform with multi-provider LLM aggregation, advanced reasoning engines, and integrated development environment.
## 📋 Overview
Agentic LLM Hub is a production-ready, self-hosted platform that unifies multiple LLM providers behind a single API gateway. It features intelligent request routing, multiple reasoning engines (ReAct, Plan-and-Execute, Reflexion), persistent memory via vector storage, and a complete web-based IDE with AI assistance.
### Why Agentic LLM Hub?
- 🔌 Multi-Provider Aggregation — Seamlessly route between 7+ LLM providers with automatic failover
- 🧠 Advanced Reasoning — Choose from multiple reasoning modes based on task complexity
- 💰 Cost Optimization — Free tier providers prioritized, paid providers as fallback
- 🖥️ Complete Workspace — Browser-based VS Code with AI assistant integration
- 🔧 Extensible — MCP tool ecosystem for custom integrations
- 📊 Persistent Memory — Vector-based conversation history and knowledge storage
## ✨ Features
| Component | Technology | Purpose |
|---|---|---|
| LLM Gateway | LiteLLM | Unified API for 7+ providers with load balancing |
| Reasoning Engine | Custom implementation | ReAct, Plan-and-Execute, Reflexion modes |
| Vector Memory | ChromaDB | Persistent embeddings & conversation history |
| Cache Layer | Redis | Response caching & session management |
| Web IDE | code-server | VS Code in browser with Continue.dev |
| Chat Interface | Open WebUI | Modern conversational UI |
| Tool Gateway | MCP Gateway | Extensible tool ecosystem |
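The tool gateway publishes its catalog as an OpenAPI schema (see the MCP Tools entry under Access Points). As a quick way to see what's registered, here is a minimal sketch; it assumes the gateway follows the usual FastAPI convention of serving the machine-readable schema at `/openapi.json`, which is not confirmed above:

```python
import requests

# Assumption: the MCP Gateway serves a FastAPI-style schema at /openapi.json.
# Adjust the path if your deployment differs.
schema = requests.get("http://localhost:8001/openapi.json").json()

# Print each registered tool endpoint and its HTTP methods.
for path, methods in schema["paths"].items():
    print(f"{path}: {', '.join(m.upper() for m in methods)}")
```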
### Supported Providers
| Provider | Free Tier | Best For |
|---|---|---|
| Groq | 20 RPM | Speed, quick coding |
| Mistral | 1B tokens/month | High volume processing |
| OpenRouter | 50 req/day | Universal fallback access |
| Anthropic Claude | $5 trial | Complex reasoning |
| Moonshot Kimi | $5 signup | Coding, 128K context |
| DeepSeek | Pay-as-you-go | Cheapest reasoning |
| Cohere | 1K calls/month | Embeddings |
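All of these providers sit behind the LiteLLM gateway, which speaks the OpenAI wire format. To confirm which models the gateway is currently routing to, a minimal sketch (assuming the gateway port from the Access Points table below and the `MASTER_KEY` from your `.env` as the bearer token):

```python
import requests

# Assumption: the MASTER_KEY from .env authorizes requests against
# LiteLLM's OpenAI-compatible /v1/models endpoint.
headers = {"Authorization": "Bearer sk-agent-xxx"}
models = requests.get("http://localhost:4000/v1/models", headers=headers).json()

for model in models["data"]:
    print(model["id"])
```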
## 🚀 Quick Start

### Prerequisites
- OS: Debian 12, Ubuntu 22.04+, or Proxmox LXC
- RAM: 4GB minimum (8GB recommended with IDE)
- Storage: 20GB free space
- Docker & Docker Compose
### Installation
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/llm-hub.git
cd llm-hub

# 2. Configure environment
cp .env.example .env
nano .env  # Add your API keys

# 3. Deploy
make setup
make start
```
Or use scripts directly:
```bash
chmod +x setup.sh start.sh
./setup.sh && ./start.sh full
```
## 🌐 Access Points
Once running, access the services at:
| Service | URL | Description |
|---|---|---|
| 📝 Web UI | http://localhost:3000 | Chat interface with Open WebUI |
| 💻 VS Code IDE | http://localhost:8443 | Full IDE with AI assistant |
| 🔌 Agent API | http://localhost:8080/v1 | Main API endpoint |
| ⚡ LiteLLM | http://localhost:4000 | LLM Gateway & model management |
| 🔧 MCP Tools | http://localhost:8001/docs | Tool OpenAPI documentation |
| 🧠 ChromaDB | http://localhost:8000 | Vector memory dashboard |
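To confirm everything came up, you can probe each port from the table. A minimal sketch; it assumes the Agent API and LiteLLM expose `/health` (as used in Troubleshooting below) and that the other services answer a plain GET on their root path, so even a 401 or a redirect counts as "up":

```python
import requests

SERVICES = {
    "Web UI": "http://localhost:3000",
    "VS Code IDE": "http://localhost:8443",
    "Agent API": "http://localhost:8080/health",
    "LiteLLM": "http://localhost:4000/health",
    "MCP Tools": "http://localhost:8001/docs",
    "ChromaDB": "http://localhost:8000",
}

for name, url in SERVICES.items():
    try:
        # Any HTTP response (even 401) means the container is listening.
        code = requests.get(url, timeout=3).status_code
        print(f"{name:12} {url:35} -> HTTP {code}")
    except requests.RequestException as exc:
        print(f"{name:12} {url:35} -> DOWN ({type(exc).__name__})")
```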
## 🧠 Reasoning Modes
Choose the right reasoning strategy for your task:
| Mode | Description | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `react` | Iterative thought-action loops | ⚡ Fast | ⭐⭐⭐ Medium | Simple Q&A, debugging |
| `plan_execute` | Plan first, then execute | 🚀 Medium | ⭐⭐⭐⭐ High | Multi-step tasks, automation |
| `reflexion` | Self-correcting with verification | 🐢 Slow | ⭐⭐⭐⭐⭐ Very High | Code review, critical analysis |
| `auto` | Automatic mode selection | ⚡ Variable | ⭐⭐⭐⭐ Adaptive | General purpose |
Set your default mode in `.env`:

```bash
DEFAULT_REASONING=auto
```
Or specify per-request:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Refactor this codebase",
    "reasoning_mode": "reflexion"
  }'
```
## 🛠️ Configuration

### 1. API Keys

Edit `.env` and add at least one provider:
```bash
# Required: At least one LLM provider
GROQ_API_KEY_1=gsk_xxx
MISTRAL_API_KEY=your_key

# Recommended: Multiple providers for redundancy
ANTHROPIC_API_KEY=sk-ant-xxx
MOONSHOT_API_KEY=sk-xxx
OPENROUTER_API_KEY=sk-or-xxx
```
### 2. Security

Change default passwords in `.env`:
```bash
# Code-Server access
IDE_PASSWORD=your-secure-password
IDE_SUDO_PASSWORD=your-admin-password

# API access
MASTER_KEY=sk-agent-$(openssl rand -hex 16)
```
### 3. Advanced Settings
```bash
# Enable self-reflection
ENABLE_REFLECTION=true

# Maximum iterations per request
MAX_ITERATIONS=10

# Knowledge graph (requires more RAM)
ENABLE_KNOWLEDGE_GRAPH=false
```
## 💻 Usage Examples

### Python SDK
```python
import requests

API_URL = "http://localhost:8080/v1"
API_KEY = "sk-agent-xxx"

response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "message": "Create a Python script to fetch weather data",
        "reasoning_mode": "plan_execute",
        "session_id": "my-session-001",
    },
)

result = response.json()
print(result["response"])
print(f"Steps taken: {len(result['steps'])}")
```
### cURL
```bash
# Simple query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum computing",
    "reasoning_mode": "react"
  }'

# Complex task with history
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Build a REST API with FastAPI",
    "reasoning_mode": "plan_execute",
    "max_iterations": 15
  }'
```
### OpenAI-Compatible API
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-agent-xxx",
)

response = client.chat.completions.create(
    model="agent/orchestrator",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"reasoning_mode": "auto"},
)
```
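Whether the endpoint also implements OpenAI-style streaming is not covered above, but if it does, standard client usage would look like this (streaming support is an assumption):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-agent-xxx")

# stream=True asks for server-sent events; tokens arrive as deltas.
stream = client.chat.completions.create(
    model="agent/orchestrator",
    messages=[{"role": "user", "content": "Summarize the ReAct pattern."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```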
## 📊 Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                      Agentic LLM Hub                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Web UI    │  │  VS Code IDE │  │     API      │       │
│  │   (:3000)    │  │   (:8443)    │  │   (:8080)    │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
│         │                 │                 │               │
│         └─────────────────┼─────────────────┘               │
│                           │                                 │
│              ┌────────────┴────────────┐                    │
│              │       Agent Core        │                    │
│              │   (Reasoning Engines)   │                    │
│              └────────────┬────────────┘                    │
│                           │                                 │
│       ┌───────────────────┼───────────────────┐             │
│       │                   │                   │             │
│  ┌────┴────┐        ┌─────┴─────┐      ┌──────┴──────┐      │
│  │  Redis  │        │  LiteLLM  │      │  ChromaDB   │      │
│  │ (Cache) │        │ (Gateway) │      │  (Memory)   │      │
│  │ (:6379) │        │  (:4000)  │      │  (:8000)    │      │
│  └─────────┘        └─────┬─────┘      └─────────────┘      │
│                           │                                 │
│             ┌─────────────┼─────────────┐                   │
│             │             │             │                   │
│         ┌───┴────┐   ┌────┴───┐    ┌────┴───┐               │
│         │  Groq  │   │ Claude │    │ Mistral│               │
│         └───┬────┘   └────┬───┘    └────┬───┘               │
│             │             │             │                   │
│          ┌──┴───┐     ┌───┴────┐    ┌───┴────┐              │
│          │ Kimi │     │DeepSeek│    │  ...   │              │
│          └──────┘     └────────┘    └────────┘              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
## 🧰 Management Commands

Use the `Makefile` for common operations:
```bash
make setup      # Initial setup
make start      # Start all services (full profile)
make start-ide  # Start with IDE only
make stop       # Stop all services
make logs       # View logs
make status     # Check service status
make update     # Pull latest and update images
make backup     # Backup data directories
make clean      # Remove containers (data preserved)
```
Or use Docker Compose profiles:
```bash
# Core services only
docker-compose up -d

# Full stack with IDE and UI
docker-compose --profile ide --profile ui up -d

# With MCP tools
docker-compose --profile mcp up -d
```
## 📚 Documentation
- Setup Guide — Detailed installation and configuration
- API Reference — Complete API documentation with examples
- Provider Guide — Provider setup and rate limits
## 🔄 Updates
Update to the latest version:
```bash
# Automatic update
make update

# Or manually
git pull origin main
docker-compose pull
docker-compose up -d
```
## 🐛 Troubleshooting

### Common Issues
| Issue | Solution |
|---|---|
| Docker fails in LXC | Enable nesting: `features: nesting=1,keyctl=1` |
| Port conflicts | Edit `docker-compose.yml` port mappings |
| Permission denied | Run `chown -R 1000:1000 workspace/` |
| API not responding | Check `docker-compose logs agent-core` |
| Out of memory | Increase swap or reduce to core services only |
### Health Checks
```bash
# Check API health
curl http://localhost:8080/health

# Check LiteLLM
curl http://localhost:4000/health

# View all logs
make logs

# Check container status
docker-compose ps
```
## 🛡️ Security Considerations

- Change default passwords in `.env` before deploying
- Use HTTPS in production (reverse proxy recommended)
- Restrict network access to admin ports (8080, 8443)
- Rotate API keys regularly
- Review provider rate limits to prevent unexpected costs
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Submit a pull request
## 📄 License
This project is licensed under the MIT License.
---

*Built with ❤️ for the self-hosting community*