# 🤖 Agentic LLM Hub
Self-hosted AI agent platform with multi-provider LLM aggregation, advanced reasoning engines, and integrated development environment.
## 📋 Overview
Agentic LLM Hub is a production-ready, self-hosted platform that unifies multiple LLM providers behind a single API gateway. It features intelligent request routing, multiple reasoning engines (ReAct, Plan-and-Execute, Reflexion), persistent memory via vector storage, and a complete web-based IDE with AI assistance.
### Why Agentic LLM Hub?
- 🔌 Multi-Provider Aggregation — Seamlessly route between 7+ LLM providers with automatic failover
- 🧠 Advanced Reasoning — Choose from multiple reasoning modes based on task complexity
- 💰 Cost Optimization — Free tier providers prioritized, paid providers as fallback
- 🖥️ Complete Workspace — Browser-based VS Code with AI assistant integration
- 🔧 Extensible — MCP tool ecosystem for custom integrations
- 📊 Persistent Memory — Vector-based conversation history and knowledge storage
## ✨ Features
| Component | Technology | Purpose |
|---|---|---|
| LLM Gateway | LiteLLM | Unified API for 7+ providers with load balancing |
| Reasoning Engine | Custom implementation | ReAct, Plan-and-Execute, Reflexion modes |
| Vector Memory | ChromaDB | Persistent embeddings & conversation history |
| Cache Layer | Redis | Response caching & session management |
| Web IDE | code-server | VS Code in browser with Continue.dev |
| Chat Interface | Open WebUI | Modern conversational UI |
| Tool Gateway | MCP Gateway | Extensible tool ecosystem |
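The tool gateway publishes its catalog as an OpenAPI schema (see the MCP Tools entry under Access Points). As a quick way to see what's registered, here is a minimal sketch; it assumes the gateway follows the usual FastAPI convention of serving the machine-readable schema at `/openapi.json`, which is not confirmed above:

```python
import requests

# Assumption: the MCP Gateway serves a FastAPI-style schema at /openapi.json.
# Adjust the path if your deployment differs.
schema = requests.get("http://localhost:8001/openapi.json").json()

# Print each registered tool endpoint and its HTTP methods.
for path, methods in schema["paths"].items():
    print(f"{path}: {', '.join(m.upper() for m in methods)}")
```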
### Supported Providers
| Provider | Free Tier | Best For |
|---|---|---|
| Groq | 20 RPM | Speed, quick coding |
| Mistral | 1B tokens/month | High volume processing |
| OpenRouter | 50 req/day | Universal fallback access |
| Anthropic Claude | $5 trial | Complex reasoning |
| Moonshot Kimi | $5 signup | Coding, 128K context |
| DeepSeek | Pay-as-you-go | Cheapest reasoning |
| Cohere | 1K calls/month | Embeddings |
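All of these providers sit behind the LiteLLM gateway, which speaks the OpenAI wire format. To confirm which models the gateway is currently routing to, a minimal sketch (assuming the gateway port from the Access Points table below and the `MASTER_KEY` from your `.env` as the bearer token):

```python
import requests

# Assumption: the MASTER_KEY from .env authorizes requests against
# LiteLLM's OpenAI-compatible /v1/models endpoint.
headers = {"Authorization": "Bearer sk-agent-xxx"}
models = requests.get("http://localhost:4000/v1/models", headers=headers).json()

for model in models["data"]:
    print(model["id"])
```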
## 🚀 Quick Start

### Prerequisites
- OS: Debian 12, Ubuntu 22.04+, or Proxmox LXC
- RAM: 4GB minimum (8GB recommended with IDE)
- Storage: 20GB free space
- Docker & Docker Compose
### Installation
```bash
# 1. Clone the repository
git clone https://github.com/yourusername/llm-hub.git
cd llm-hub

# 2. Configure environment
cp .env.example .env
nano .env  # Add your API keys

# 3. Deploy
make setup
make start
```
Or use scripts directly:
```bash
chmod +x setup.sh start.sh
./setup.sh && ./start.sh full
```
## 🌐 Access Points
Once running, access the services at:
| Service | URL | Description |
|---|---|---|
| 📝 Web UI | http://localhost:3000 | Chat interface with Open WebUI |
| 💻 VS Code IDE | http://localhost:8443 | Full IDE with AI assistant |
| 🔌 Agent API | http://localhost:8080/v1 | Main API endpoint |
| ⚡ LiteLLM | http://localhost:4000 | LLM Gateway & model management |
| 🔧 MCP Tools | http://localhost:8001/docs | Tool OpenAPI documentation |
| 🧠 ChromaDB | http://localhost:8000 | Vector memory dashboard |
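To confirm everything came up, you can probe each port from the table. A minimal sketch; it assumes the Agent API and LiteLLM expose `/health` (as used in Troubleshooting below) and that the other services answer a plain GET on their root path, so even a 401 or a redirect counts as "up":

```python
import requests

SERVICES = {
    "Web UI": "http://localhost:3000",
    "VS Code IDE": "http://localhost:8443",
    "Agent API": "http://localhost:8080/health",
    "LiteLLM": "http://localhost:4000/health",
    "MCP Tools": "http://localhost:8001/docs",
    "ChromaDB": "http://localhost:8000",
}

for name, url in SERVICES.items():
    try:
        # Any HTTP response (even 401) means the container is listening.
        code = requests.get(url, timeout=3).status_code
        print(f"{name:12} {url:35} -> HTTP {code}")
    except requests.RequestException as exc:
        print(f"{name:12} {url:35} -> DOWN ({type(exc).__name__})")
```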
## 🧠 Reasoning Modes
Choose the right reasoning strategy for your task:
| Mode | Description | Speed | Accuracy | Best For |
|---|---|---|---|---|
| `react` | Iterative thought-action loops | ⚡ Fast | ⭐⭐⭐ Medium | Simple Q&A, debugging |
| `plan_execute` | Plan first, then execute | 🚀 Medium | ⭐⭐⭐⭐ High | Multi-step tasks, automation |
| `reflexion` | Self-correcting with verification | 🐢 Slow | ⭐⭐⭐⭐⭐ Very High | Code review, critical analysis |
| `auto` | Automatic mode selection | ⚡ Variable | ⭐⭐⭐⭐ Adaptive | General purpose |
Set your default mode in `.env`:

```bash
DEFAULT_REASONING=auto
```
Or specify per-request:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Refactor this codebase",
    "reasoning_mode": "reflexion"
  }'
```
## 🛠️ Configuration

### 1. API Keys

Edit `.env` and add at least one provider:
```bash
# Required: At least one LLM provider
GROQ_API_KEY_1=gsk_xxx
MISTRAL_API_KEY=your_key

# Recommended: Multiple providers for redundancy
ANTHROPIC_API_KEY=sk-ant-xxx
MOONSHOT_API_KEY=sk-xxx
OPENROUTER_API_KEY=sk-or-xxx
```
### 2. Security

Change default passwords in `.env`:
```bash
# Code-Server access
IDE_PASSWORD=your-secure-password
IDE_SUDO_PASSWORD=your-admin-password

# API access
MASTER_KEY=sk-agent-$(openssl rand -hex 16)
```
### 3. Advanced Settings
```bash
# Enable self-reflection
ENABLE_REFLECTION=true

# Maximum iterations per request
MAX_ITERATIONS=10

# Knowledge graph (requires more RAM)
ENABLE_KNOWLEDGE_GRAPH=false
```
## 💻 Usage Examples

### Python SDK
```python
import requests

API_URL = "http://localhost:8080/v1"
API_KEY = "sk-agent-xxx"

response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "message": "Create a Python script to fetch weather data",
        "reasoning_mode": "plan_execute",
        "session_id": "my-session-001",
    },
)

result = response.json()
print(result["response"])
print(f"Steps taken: {len(result['steps'])}")
```
### cURL
```bash
# Simple query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain quantum computing",
    "reasoning_mode": "react"
  }'

# Complex task with history
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-agent-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Build a REST API with FastAPI",
    "reasoning_mode": "plan_execute",
    "max_iterations": 15
  }'
```
### OpenAI-Compatible API
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-agent-xxx",
)

response = client.chat.completions.create(
    model="agent/orchestrator",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"reasoning_mode": "auto"},
)
```
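Whether the endpoint also implements OpenAI-style streaming is not covered above, but if it does, standard client usage would look like this (streaming support is an assumption):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-agent-xxx")

# stream=True asks for server-sent events; tokens arrive as deltas.
stream = client.chat.completions.create(
    model="agent/orchestrator",
    messages=[{"role": "user", "content": "Summarize the ReAct pattern."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```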
## 📊 Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                      Agentic LLM Hub                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Web UI    │  │  VS Code IDE │  │     API      │       │
│  │   (:3000)    │  │   (:8443)    │  │   (:8080)    │       │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘       │
│         │                 │                 │               │
│         └─────────────────┼─────────────────┘               │
│                           │                                 │
│              ┌────────────┴────────────┐                    │
│              │       Agent Core        │                    │
│              │   (Reasoning Engines)   │                    │
│              └────────────┬────────────┘                    │
│                           │                                 │
│       ┌───────────────────┼───────────────────┐             │
│       │                   │                   │             │
│  ┌────┴────┐        ┌─────┴─────┐      ┌──────┴──────┐      │
│  │  Redis  │        │  LiteLLM  │      │  ChromaDB   │      │
│  │ (Cache) │        │ (Gateway) │      │  (Memory)   │      │
│  │ (:6379) │        │  (:4000)  │      │  (:8000)    │      │
│  └─────────┘        └─────┬─────┘      └─────────────┘      │
│                           │                                 │
│             ┌─────────────┼─────────────┐                   │
│             │             │             │                   │
│         ┌───┴────┐   ┌────┴───┐    ┌────┴───┐               │
│         │  Groq  │   │ Claude │    │ Mistral│               │
│         └───┬────┘   └────┬───┘    └────┬───┘               │
│             │             │             │                   │
│          ┌──┴───┐     ┌───┴────┐    ┌───┴────┐              │
│          │ Kimi │     │DeepSeek│    │  ...   │              │
│          └──────┘     └────────┘    └────────┘              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
## 🧰 Management Commands

Use the `Makefile` for common operations:
```bash
make setup      # Initial setup
make start      # Start all services (full profile)
make start-ide  # Start with IDE only
make stop       # Stop all services
make logs       # View logs
make status     # Check service status
make update     # Pull latest and update images
make backup     # Backup data directories
make clean      # Remove containers (data preserved)
```
Or use Docker Compose profiles:
```bash
# Core services only
docker-compose up -d

# Full stack with IDE and UI
docker-compose --profile ide --profile ui up -d

# With MCP tools
docker-compose --profile mcp up -d
```
## 📚 Documentation
- Setup Guide — Detailed installation and configuration
- API Reference — Complete API documentation with examples
- Provider Guide — Provider setup and rate limits
## 🔄 Updates
Update to the latest version:
```bash
# Automatic update
make update

# Or manually
git pull origin main
docker-compose pull
docker-compose up -d
```
## 🐛 Troubleshooting

### Common Issues
| Issue | Solution |
|---|---|
| Docker fails in LXC | Enable nesting: `features: nesting=1,keyctl=1` |
| Port conflicts | Edit `docker-compose.yml` port mappings |
| Permission denied | Run `chown -R 1000:1000 workspace/` |
| API not responding | Check `docker-compose logs agent-core` |
| Out of memory | Increase swap or reduce to core services only |
### Health Checks
```bash
# Check API health
curl http://localhost:8080/health

# Check LiteLLM
curl http://localhost:4000/health

# View all logs
make logs

# Check container status
docker-compose ps
```
## 🛡️ Security Considerations

- Change default passwords in `.env` before deploying
- Use HTTPS in production (reverse proxy recommended)
- Restrict network access to admin ports (8080, 8443)
- Rotate API keys regularly
- Review provider rate limits to prevent unexpected costs
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Submit a pull request
## 📄 License
This project is licensed under the MIT License.
---

*Built with ❤️ for the self-hosting community*