# 🤖 Agentic LLM Hub

Self-hosted AI agent platform with multi-provider LLM aggregation, advanced reasoning engines, and integrated development environment.


**Quick Start** · **Features** · **Documentation** · **API Reference**


## 📋 Overview

Agentic LLM Hub is a production-ready, self-hosted platform that unifies multiple LLM providers behind a single API gateway. It features intelligent request routing, multiple reasoning engines (ReAct, Plan-and-Execute, Reflexion), persistent memory via vector storage, and a complete web-based IDE with AI assistance.

### Why Agentic LLM Hub?

- 🔌 **Multi-Provider Aggregation** — Seamlessly route between 7+ LLM providers with automatic failover
- 🧠 **Advanced Reasoning** — Choose from multiple reasoning modes based on task complexity
- 💰 **Cost Optimization** — Free-tier providers prioritized, paid providers as fallback
- 🖥️ **Complete Workspace** — Browser-based VS Code with AI assistant integration
- 🔧 **Extensible** — MCP tool ecosystem for custom integrations
- 📊 **Persistent Memory** — Vector-based conversation history and knowledge storage

## Features

| Component | Technology | Purpose |
|---|---|---|
| LLM Gateway | LiteLLM | Unified API for 7+ providers with load balancing |
| Reasoning Engine | Custom implementation | ReAct, Plan-and-Execute, Reflexion modes |
| Vector Memory | ChromaDB | Persistent embeddings & conversation history |
| Cache Layer | Redis | Response caching & session management |
| Web IDE | code-server | VS Code in browser with Continue.dev |
| Chat Interface | Open WebUI | Modern conversational UI |
| Tool Gateway | MCP Gateway | Extensible tool ecosystem |

### Supported Providers

| Provider | Free Tier | Best For |
|---|---|---|
| Groq | 20 RPM | Speed, quick coding |
| Mistral | 1B tokens/month | High-volume processing |
| OpenRouter | 50 req/day | Universal fallback access |
| Anthropic Claude | $5 trial | Complex reasoning |
| Moonshot Kimi | $5 signup | Coding, 128K context |
| DeepSeek | Pay-as-you-go | Cheapest reasoning |
| Cohere | 1K calls/month | Embeddings |
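
The free-tier-first policy can be pictured with a small sketch. Everything below is illustrative: the provider order, the `call_provider` stub, and the error handling are stand-ins, not the hub's actual router (real routing is handled by LiteLLM's configuration).

```python
# Illustrative free-tier-first failover. NOT the hub's real router
# (LiteLLM handles routing); names and ordering here are hypothetical.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    free_tier: bool

PROVIDERS = [                      # cheapest options first
    Provider("groq", True),
    Provider("mistral", True),
    Provider("openrouter", True),
    Provider("anthropic", False),  # paid fallback
]

def call_provider(provider: Provider, prompt: str) -> str:
    """Stub standing in for a real per-provider API call."""
    raise ConnectionError(f"{provider.name} unavailable (stub)")

def route(prompt: str) -> str:
    last_error = None
    for provider in PROVIDERS:     # walk the list until one succeeds
        try:
            return call_provider(provider, prompt)
        except Exception as exc:   # rate limit, outage, bad key...
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```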

## 🚀 Quick Start

### Prerequisites

- **OS:** Debian 12, Ubuntu 22.04+, or Proxmox LXC
- **RAM:** 4GB minimum (8GB recommended with IDE)
- **Storage:** 20GB free space
- Docker & Docker Compose

### Installation

```bash
# 1. Clone the repository
git clone https://github.com/yourusername/llm-hub.git
cd llm-hub

# 2. Configure environment
cp .env.example .env
nano .env  # Add your API keys

# 3. Deploy
make setup
make start
```

Or use scripts directly:

```bash
chmod +x setup.sh start.sh
./setup.sh && ./start.sh full
```

## 🌐 Access Points

Once running, access the services at:

| Service | URL | Description |
|---|---|---|
| 📝 Web UI | http://localhost:3000 | Chat interface with Open WebUI |
| 💻 VS Code IDE | http://localhost:8443 | Full IDE with AI assistant |
| 🔌 Agent API | http://localhost:8080/v1 | Main API endpoint |
| LiteLLM | http://localhost:4000 | LLM Gateway & model management |
| 🔧 MCP Tools | http://localhost:8001/docs | Tool OpenAPI documentation |
| 🧠 ChromaDB | http://localhost:8000 | Vector memory dashboard |
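
The ChromaDB service on port 8000 speaks the standard ChromaDB HTTP API, so you can poke at the vector memory directly with the `chromadb` Python client (`pip install chromadb`). Collection names depend on how the hub organizes its data, so this sketch just checks connectivity and lists whatever exists:

```python
# Quick check that the vector memory is reachable; collection names
# depend on the hub's internal layout, so we only list what is there.
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())          # server timestamp if ChromaDB is up
print(client.list_collections())   # collections created by the hub
```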

## 🧠 Reasoning Modes

Choose the right reasoning strategy for your task:

| Mode | Description | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `react` | Iterative thought-action loops | Fast | Medium | Simple Q&A, debugging |
| `plan_execute` | Plan first, then execute | Medium | High | Multi-step tasks, automation |
| `reflexion` | Self-correcting with verification | 🐢 Slow | Very High | Code review, critical analysis |
| `auto` | Automatic mode selection | Variable | Adaptive | General purpose |
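
To make the table concrete, here is a toy sketch of the `react` pattern: the model alternates Thought, Action, and Observation until it signals a final answer. This is a schematic illustration only, not the hub's engine; the helper names and scripted "model" are invented for the demo.

```python
# Schematic ReAct loop: alternate Thought -> Action -> Observation until
# the model signals it is done. Illustrative only, NOT the hub's engine.
def react_loop(question, llm, tools, max_iterations=10):
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):
        thought, action, arg = llm(transcript)   # model picks the next step
        transcript += f"Thought: {thought}\n"
        if action == "finish":                   # final answer reached
            return arg
        observation = tools[action](arg)         # run the chosen tool
        transcript += f"Action: {action}({arg})\nObservation: {observation}\n"
    return "stopped: hit max_iterations"

# Toy demo with a scripted "model" and a single calculator tool:
def toy_llm(transcript):
    if "Observation:" in transcript:
        return "I have the result", "finish", "2 + 2 = 4"
    return "I should calculate", "calc", "2 + 2"

print(react_loop("What is 2 + 2?", toy_llm, {"calc": lambda e: str(eval(e))}))
```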

Set your default mode in `.env`:

```bash
DEFAULT_REASONING=auto
```

Or specify per-request:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Refactor this codebase",
    "reasoning_mode": "reflexion"
  }'
```

## 🛠️ Configuration

### 1. API Keys

Edit `.env` and add at least one provider:

```bash
# Required: At least one LLM provider
GROQ_API_KEY_1=gsk_xxx
MISTRAL_API_KEY=your_key

# Recommended: Multiple providers for redundancy
ANTHROPIC_API_KEY=sk-ant-xxx
MOONSHOT_API_KEY=sk-xxx
OPENROUTER_API_KEY=sk-or-xxx
```

### 2. Security

Change the default passwords in `.env`:

```bash
# Code-Server access
IDE_PASSWORD=your-secure-password
IDE_SUDO_PASSWORD=your-admin-password

# API access. Generate the key in your shell first and paste the literal
# value; Docker Compose does not expand $(...) inside .env files:
#   echo "sk-agent-$(openssl rand -hex 16)"
MASTER_KEY=sk-agent-<paste-generated-hex>
```

### 3. Advanced Settings

```bash
# Enable self-reflection
ENABLE_REFLECTION=true

# Maximum iterations per request
MAX_ITERATIONS=10

# Knowledge graph (requires more RAM)
ENABLE_KNOWLEDGE_GRAPH=false
```
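
These knobs appear to pair with the reasoning modes above: `ENABLE_REFLECTION` presumably gates the self-correction pass used by `reflexion`, while `MAX_ITERATIONS` caps the thought-action loop, so long `plan_execute` tasks may warrant a higher value (it can also be overridden per request, as in the cURL example below).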

## 💻 Usage Examples

### Python SDK

```python
import requests

API_URL = "http://localhost:8080/v1"
API_KEY = "sk-agent-xxx"

response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "message": "Create a Python script to fetch weather data",
        "reasoning_mode": "plan_execute",
        "session_id": "my-session-001"
    }
)

result = response.json()
print(result["response"])
print(f"Steps taken: {len(result['steps'])}")
```

### cURL

```bash
# Simple query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum computing",
    "reasoning_mode": "react"
  }'

# Complex task with a higher iteration budget
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Build a REST API with FastAPI",
    "reasoning_mode": "plan_execute",
    "max_iterations": 15
  }'
```

### OpenAI-Compatible API

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-agent-xxx"
)

response = client.chat.completions.create(
    model="agent/orchestrator",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"reasoning_mode": "auto"}  # hub-specific parameter
)

print(response.choices[0].message.content)
```
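
`extra_body` is the OpenAI Python SDK's standard escape hatch for parameters outside the official schema, so existing OpenAI-based code should only need its `base_url` and `api_key` swapped to point at the hub.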

## 📊 Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Agentic LLM Hub                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │   Web UI     │  │  VS Code IDE │  │    API       │     │
│  │  (:3000)     │  │   (:8443)    │  │  (:8080)     │     │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘     │
│         │                 │                 │              │
│         └─────────────────┼─────────────────┘              │
│                           │                                │
│              ┌────────────┴────────────┐                  │
│              │     Agent Core          │                  │
│              │  (Reasoning Engines)    │                  │
│              └────────────┬────────────┘                  │
│                           │                                │
│       ┌───────────────────┼───────────────────┐           │
│       │                   │                   │            │
│  ┌────┴────┐       ┌─────┴─────┐      ┌──────┴──────┐     │
│  │  Redis  │       │  LiteLLM  │      │  ChromaDB   │     │
│  │ (Cache) │       │ (Gateway) │      │  (Memory)   │     │
│  │(:6379)  │       │  (:4000)  │      │   (:8000)   │     │
│  └─────────┘       └─────┬─────┘      └─────────────┘     │
│                          │                                 │
│            ┌─────────────┼─────────────┐                  │
│            │             │             │                   │
│      ┌─────┴──┐    ┌─────┴──┐   ┌─────┴──┐               │
│      │  Groq  │    │ Claude │   │ Mistral│               │
│      └───┬────┘    └───┬────┘   └───┬────┘               │
│          │             │             │                     │
│      ┌───┴──┐     ┌────┴───┐   ┌────┴───┐                │
│      │ Kimi │     │DeepSeek│   │  ...   │                │
│      └──────┘     └────────┘   └────────┘                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
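
Reading top to bottom: a request enters through the Web UI, the IDE, or the API; the Agent Core selects a reasoning engine, checks Redis for cached responses and ChromaDB for relevant memory, and dispatches model calls through LiteLLM, which picks a provider and fails over when one is unavailable.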

## 🧰 Management Commands

Use the Makefile for common operations:

```bash
make setup      # Initial setup
make start      # Start all services (full profile)
make start-ide  # Start with IDE only
make stop       # Stop all services
make logs       # View logs
make status     # Check service status
make update     # Pull latest and update images
make backup     # Backup data directories
make clean      # Remove containers (data preserved)
```

Or use Docker Compose profiles:

```bash
# Core services only
docker-compose up -d

# Full stack with IDE and UI
docker-compose --profile ide --profile ui up -d

# With MCP tools
docker-compose --profile mcp up -d
```

## 📚 Documentation


## 🔄 Updates

Update to the latest version:

```bash
# Automatic update
make update

# Or manually
git pull origin main
docker-compose pull
docker-compose up -d
```

## 🐛 Troubleshooting

### Common Issues

| Issue | Solution |
|---|---|
| Docker fails in LXC | Enable nesting: `features: nesting=1,keyctl=1` |
| Port conflicts | Edit port mappings in `docker-compose.yml` |
| Permission denied | Run `chown -R 1000:1000 workspace/` |
| API not responding | Check `docker-compose logs agent-core` |
| Out of memory | Increase swap or reduce to core services only |

### Health Checks

```bash
# Check API health
curl http://localhost:8080/health

# Check LiteLLM
curl http://localhost:4000/health

# View all logs
make logs

# Check container status
docker-compose ps
```

## 🛡️ Security Considerations

1. **Change default passwords** in `.env` before deploying
2. **Use HTTPS** in production (reverse proxy recommended)
3. **Restrict network access** to admin ports (8080, 8443)
4. **Rotate API keys** regularly
5. **Review provider rate limits** to prevent unexpected costs

## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request

## 📄 License

This project is licensed under the MIT License.


[⬆ Back to Top](#-agentic-llm-hub)

*Built with ❤️ for the self-hosting community*