# Connect Any LLM to Bytebot with LiteLLM
LiteLLM acts as a unified proxy that lets you use 100+ LLM providers with Bytebot, including Azure OpenAI, AWS Bedrock, Anthropic, Hugging Face, Ollama, and more. This guide shows you how to set up LiteLLM with Bytebot.
## Why Use LiteLLM?

- **100+ LLM Providers**: Use Azure, AWS, GCP, Anthropic, OpenAI, Cohere, and local models
- **Cost Tracking**: Monitor spending across all providers in one place
- **Load Balancing**: Distribute requests across multiple models and providers
- **Fallback Models**: Automatic failover when primary models are unavailable
## Quick Start with Bytebot's Built-in LiteLLM Proxy

Bytebot includes a pre-configured LiteLLM proxy service that makes it easy to use any LLM provider. Here's how to set it up:
### Use Docker Compose with Proxy

The easiest way is to use the proxy-enabled Docker Compose file:

```bash
# Clone Bytebot
git clone https://github.com/bytebot-ai/bytebot.git
cd bytebot

# Set up your API keys in docker/.env
cat > docker/.env << EOF
# Add any combination of these keys
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-key-here
GEMINI_API_KEY=your-key-here
EOF

# Start Bytebot with the LiteLLM proxy
docker-compose -f docker/docker-compose.proxy.yml up -d
```
This automatically:

- Starts the `bytebot-llm-proxy` service on port 4000
- Configures the agent to use the proxy via `BYTEBOT_LLM_PROXY_URL`
- Makes all configured models available through the proxy
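To confirm the proxy came up and loaded your keys, you can check the service directly. The service name `bytebot-llm-proxy` comes from the proxy Compose file; the health endpoint below is the one current LiteLLM builds expose, so it may differ on older versions:

```bash
# Check that the proxy container is running
docker-compose -f docker/docker-compose.proxy.yml ps bytebot-llm-proxy

# Tail its logs if a model or key seems to be missing
docker-compose -f docker/docker-compose.proxy.yml logs -f bytebot-llm-proxy

# Liveness probe exposed by LiteLLM (endpoint name may vary across versions)
curl http://localhost:4000/health/liveliness
```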
### Customize Model Configuration

To add custom models or providers, edit the LiteLLM config:

```yaml
# packages/bytebot-llm-proxy/litellm-config.yaml
model_list:
  # Add Azure OpenAI
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-deployment
      api_base: https://your-resource.openai.azure.com/
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

  # Add AWS Bedrock
  - model_name: claude-bedrock
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_region_name: us-east-1

  # Add local models via Ollama
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3:70b
      api_base: http://host.docker.internal:11434
```
Then rebuild:

```bash
docker-compose -f docker/docker-compose.proxy.yml up -d --build
```
### Verify Models are Available

The Bytebot agent automatically queries the proxy for available models:

```bash
# Check available models through the Bytebot API
curl http://localhost:9991/tasks/models

# Or directly from the LiteLLM proxy
curl http://localhost:4000/model/info
```
The UI will show all available models in the model selector.
## How It Works

### Architecture

### Key Components
**`bytebot-llm-proxy` service**: A LiteLLM instance running in Docker that:

- Runs on port 4000 within the Bytebot network
- Uses the config from `packages/bytebot-llm-proxy/litellm-config.yaml`
- Inherits API keys from environment variables

**Agent integration**: The Bytebot agent:

- Checks for the `BYTEBOT_LLM_PROXY_URL` environment variable
- If set, queries the proxy at `/model/info` for available models (see the example after this list)
- Routes all LLM requests through the proxy

**Pre-configured models**: Out-of-the-box support for:

- Anthropic: Claude Opus 4, Claude Sonnet 4
- OpenAI: GPT-4.1, GPT-4o
- Google: Gemini 2.5 Pro, Gemini 2.5 Flash
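For a quick look at what the agent will discover, you can query `/model/info` yourself. This is a sketch that assumes `jq` is installed and that your proxy requires a master key (omit the header if you haven't set one); the exact response shape may vary between LiteLLM versions:

```bash
# List the model names the agent will show in the model selector
curl -s http://localhost:4000/model/info \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" | jq -r '.data[].model_name'
```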
## Provider Configurations

### Azure OpenAI
```yaml
model_list:
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/gpt-4o-deployment-name
      api_base: https://your-resource.openai.azure.com/
      api_key: your-azure-key
      api_version: "2024-02-15-preview"

  - model_name: azure-gpt-4o-vision
    litellm_params:
      model: azure/gpt-4o-deployment-name
      api_base: https://your-resource.openai.azure.com/
      api_key: your-azure-key
      api_version: "2024-02-15-preview"
      supports_vision: true
```
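Rather than hard-coding `your-azure-key`, you can keep the secret in `docker/.env` and reference it from the config as `api_key: os.environ/AZURE_API_KEY`, the same pattern the quick-start config uses. A minimal sketch, assuming the built-in proxy passes this variable through to LiteLLM:

```bash
# Append Azure credentials to docker/.env so the proxy inherits them
cat >> docker/.env << EOF
AZURE_API_KEY=your-azure-key
EOF
```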
### AWS Bedrock

```yaml
model_list:
  - model_name: claude-bedrock
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_region_name: us-east-1
      # Uses AWS credentials from the environment

  - model_name: llama-bedrock
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_region_name: us-east-1
```
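Bedrock needs AWS credentials in the proxy's environment (or an attached IAM role). A sketch assuming you supply static keys via `docker/.env`; `AWS_REGION_NAME` is the region variable LiteLLM's Bedrock integration typically reads:

```bash
# Provide AWS credentials to the proxy environment
cat >> docker/.env << EOF
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION_NAME=us-east-1
EOF
```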
### Google Vertex AI

```yaml
model_list:
  - model_name: gemini-vertex
    litellm_params:
      model: vertex_ai/gemini-1.5-pro
      vertex_project: your-gcp-project
      vertex_location: us-central1
      # Uses GCP credentials from the environment
```
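Vertex AI authenticates through Application Default Credentials. A sketch assuming you mount a service-account JSON into the proxy container (the mount itself is an assumption and would need a volume added to the Compose file) and point `GOOGLE_APPLICATION_CREDENTIALS` at it:

```bash
# Tell the proxy where the mounted service-account key lives
cat >> docker/.env << EOF
GOOGLE_APPLICATION_CREDENTIALS=/app/gcp-service-account.json
EOF
```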
### Local Models (Ollama)

```yaml
model_list:
  - model_name: local-llama
    litellm_params:
      model: ollama/llama3:70b
      api_base: http://ollama:11434

  - model_name: local-mixtral
    litellm_params:
      model: ollama/mixtral:8x7b
      api_base: http://ollama:11434
```
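The proxy only routes requests; the models must already exist on the Ollama host. Assuming Ollama is reachable at the `api_base` above, pull them first:

```bash
# Pull the models referenced in the config onto the Ollama host
ollama pull llama3:70b
ollama pull mixtral:8x7b
```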
### Hugging Face

```yaml
model_list:
  - model_name: hf-llama
    litellm_params:
      model: huggingface/meta-llama/Meta-Llama-3-70B-Instruct
      api_key: hf_your_token
```
## Advanced Features

### Load Balancing

Distribute requests across multiple providers:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: sk-openai-key

  - model_name: gpt-4o  # Same name for load balancing
    litellm_params:
      model: azure/gpt-4o
      api_base: https://your-resource.openai.azure.com/
      api_key: azure-key

router_settings:
  routing_strategy: "least-busy"  # or "simple-shuffle", "latency-based-routing", "usage-based-routing"
```
### Fallback Models

Configure automatic failover:

```yaml
model_list:
  - model_name: primary-model
    litellm_params:
      model: claude-3-5-sonnet-20241022
      api_key: sk-ant-key

  - model_name: fallback-model
    litellm_params:
      model: gpt-4o
      api_key: sk-openai-key

router_settings:
  # When primary-model fails, retry the request against fallback-model
  fallbacks: [{"primary-model": ["fallback-model"]}]
  # Select "primary-model" in Bytebot; failed calls are retried on the fallback
```
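To sanity-check the failover, you can call the proxy directly with the primary model name. A sketch assuming a master key of `sk-litellm-master` and LiteLLM's OpenAI-compatible routes; if the primary provider errors, LiteLLM retries against `fallback-model`:

```bash
# Send a request that is eligible for fallback
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-master" \
  -H "Content-Type: application/json" \
  -d '{"model": "primary-model", "messages": [{"role": "user", "content": "hello"}]}'
```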
### Cost Controls

Set spending limits and track usage:

```yaml
general_settings:
  master_key: sk-litellm-master
  database_url: "postgresql://user:pass@localhost:5432/litellm"

  # Budget limits
  max_budget: 100          # $100 monthly limit
  budget_duration: "30d"

  # Per-model limits
  model_max_budget:
    gpt-4o: 50
    claude-3-5-sonnet: 50

litellm_settings:
  callbacks: ["langfuse"]  # For detailed tracking
```
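With a `master_key` and `database_url` configured, you can also issue budget-capped virtual keys instead of sharing the master key. A sketch using LiteLLM's key-generation endpoint; parameter names may differ slightly between versions:

```bash
# Create a virtual key limited to $10 over 30 days
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-litellm-master" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 10, "budget_duration": "30d"}'
```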
### Rate Limiting

Prevent API overuse:

```yaml
model_list:
  - model_name: rate-limited-gpt
    litellm_params:
      model: gpt-4o
      api_key: sk-key
      rpm: 100      # Requests per minute
      tpm: 100000   # Tokens per minute
```
## Alternative Setup: External LiteLLM Proxy

If you prefer to run LiteLLM separately or have an existing LiteLLM deployment:

### Option 1: Modify docker-compose.yml

```yaml
# docker-compose.yml (without the built-in proxy)
services:
  bytebot-agent:
    environment:
      # Point to your external LiteLLM instance
      - BYTEBOT_LLM_PROXY_URL=http://your-litellm-server:4000
      # ... rest of config
```
### Option 2: Use Environment Variable

```bash
# Set the proxy URL before starting
export BYTEBOT_LLM_PROXY_URL=http://your-litellm-server:4000

# Start normally
docker-compose -f docker/docker-compose.yml up -d
```
### Option 3: Run Standalone LiteLLM

```bash
# Run your own LiteLLM instance
docker run -d \
  --name litellm-external \
  -p 4000:4000 \
  -v $(pwd)/custom-config.yaml:/app/config.yaml \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  ghcr.io/berriai/litellm:main \
  --config /app/config.yaml

# Then start Bytebot with:
export BYTEBOT_LLM_PROXY_URL=http://localhost:4000
docker-compose up -d
```
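Before pointing Bytebot at the standalone instance, it's worth a quick smoke test against the OpenAI-compatible model list:

```bash
# Verify the external proxy is reachable and serving models
# (add an Authorization header if you configured a master_key)
curl http://localhost:4000/v1/models
```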
## Kubernetes Setup

Deploy with Helm:

```yaml
# litellm-values.yaml
replicaCount: 2

image:
  repository: ghcr.io/berriai/litellm
  tag: main

service:
  type: ClusterIP
  port: 4000

config:
  model_list:
    - model_name: claude-3-5-sonnet
      litellm_params:
        model: claude-3-5-sonnet-20241022
        api_key: ${ANTHROPIC_API_KEY}
  general_settings:
    master_key: ${LITELLM_MASTER_KEY}
```

Then, in your Bytebot `values.yaml`:

```yaml
agent:
  openai:
    enabled: true
    apiKey: "${LITELLM_MASTER_KEY}"
    baseUrl: "http://litellm:4000/v1"
    model: "claude-3-5-sonnet"
```
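To install the chart with those values, something like the following works. The chart location is an assumption (BerriAI publishes a litellm-helm OCI chart); substitute your own registry or a vendored chart path if yours differs:

```bash
# Chart location is an assumption; adjust to wherever you host the LiteLLM chart
helm install litellm oci://ghcr.io/berriai/litellm-helm -f litellm-values.yaml
```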
## Monitoring & Debugging

### LiteLLM Dashboard

Access metrics and logs:

```bash
# Port-forward to the dashboard
kubectl port-forward svc/litellm 4000:4000

# Access at http://localhost:4000/ui
# Log in with your master_key
```
### Debug Requests

Enable detailed logging:

```yaml
litellm_settings:
  debug: true
  detailed_debug: true

general_settings:
  master_key: sk-key
  store_model_in_db: true  # Store request history
```
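If you run the proxy outside Docker, the same verbosity can be turned on from the CLI. A sketch assuming the `litellm` package is installed locally:

```bash
# Start the proxy with verbose, per-request logging
litellm --config packages/bytebot-llm-proxy/litellm-config.yaml --detailed_debug
```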
### Common Issues

Check that the model name matches your config exactly:

```bash
curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-key"
```

Verify the master key in both LiteLLM and Bytebot:

```bash
# Test LiteLLM directly
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model", "messages": [{"role": "user", "content": "test"}]}'
```

Check latency per provider:

```yaml
router_settings:
  routing_strategy: "latency-based-routing"
  enable_pre_call_checks: true
```
## Best Practices

### Model Selection for Bytebot

Choose models with strong vision capabilities for the best results. Recommended:

- Claude 3.5 Sonnet (best overall)
- GPT-4o (good vision + reasoning)
- Gemini 1.5 Pro (large context)
Tune the router for Bytebot's long-running, vision-heavy tasks:

```yaml
# Optimize for Bytebot workloads
router_settings:
  routing_strategy: "latency-based-routing"
  cooldown_time: 60      # Seconds before retrying a failed provider
  num_retries: 2
  request_timeout: 600   # 10 minutes for complex tasks

# Cache repeated requests
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    host: "redis"
    port: 6379
    ttl: 3600            # 1 hour
```
### Security

```yaml
general_settings:
  master_key: ${LITELLM_MASTER_KEY}

  # IP allowlist
  allowed_ips: ["10.0.0.0/8", "172.16.0.0/12"]

  # Audit logging
  store_model_in_db: true

  # Encryption
  encrypt_keys: true

  # Headers to forward
  forward_headers: ["X-Request-ID", "X-User-ID"]
```
## Next Steps

**Pro tip:** Start with a single provider, then add more as needed. LiteLLM makes it easy to switch or combine models without changing the Bytebot configuration.