## Deploy this version

Docker:

```shell
docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-v1.70.1-stable
```

Pip:

```shell
pip install litellm==1.70.1
```
## Key Highlights

LiteLLM v1.70.1-stable is live now. Here are the key highlights of this release:

- **Gemini Realtime API**: You can now call Gemini's Live API via the OpenAI `/v1/realtime` API
- **Spend Logs Retention Period**: Delete spend logs older than a configured retention period
- **PII Masking 2.0**: Easily configure masking or blocking of specific PII/PHI entities on the UI
## Gemini Realtime API

This release adds support for calling Gemini's realtime models (e.g. `gemini-2.0-flash-live`) via OpenAI's `/v1/realtime` API. This is great for developers, as it lets them switch from OpenAI to Gemini by changing only the model name.

Key highlights:

- Support for text + audio input/output
- Support for setting session configurations (modality, instructions, activity detection) in the OpenAI format
- Support for logging + usage tracking for realtime sessions

This is currently supported via Google AI Studio. VertexAI support is planned for the coming week.
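To illustrate the provider switch, here is a minimal sketch of pointing an OpenAI-format realtime client at a LiteLLM proxy. The helper function, base URL, and model names are illustrative assumptions, not LiteLLM internals; the session payload follows the OpenAI realtime event format.

```python
# Illustrative sketch (not LiteLLM internals): building the OpenAI-format
# realtime websocket URL against a LiteLLM proxy. Switching from OpenAI to
# Gemini is just a model-name change; model names below are examples.

def realtime_url(base_url: str, model: str) -> str:
    """Build the websocket URL for the proxy's OpenAI-compatible /v1/realtime route."""
    ws_base = base_url.replace("https://", "wss://").replace("http://", "ws://")
    return f"{ws_base}/v1/realtime?model={model}"

openai_url = realtime_url("http://localhost:4000", "gpt-4o-realtime-preview")
gemini_url = realtime_url("http://localhost:4000", "gemini-2.0-flash-live")

# Session configuration (modalities, instructions) uses the OpenAI event format:
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "instructions": "You are a helpful assistant.",
    },
}
```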
## Spend Logs Retention Period

This release lets you delete LiteLLM spend logs older than a configured retention period. Since LiteLLM can now store the raw request/response in the logs, pruning old entries keeps the database performant in production.
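As a sketch, this can be set in the proxy config; the `maximum_spend_logs_retention_period` key name follows the feature described in this release, but verify the exact key and value format against the docs before relying on it:

```yaml
# proxy config sketch - key name assumed from this release's docs
general_settings:
  maximum_spend_logs_retention_period: "30d"  # delete spend logs older than 30 days
```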
## PII Masking 2.0

This release improves our Presidio PII integration. As a Proxy Admin, you now have the ability to:

- Mask or block specific entities (e.g., block medical licenses while masking other entities like emails).
- Monitor guardrails in production. LiteLLM Logs now show the guardrail run, the entities it detected, and its confidence score for each entity.
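As an illustration of per-entity behavior, a guardrail config might look like the sketch below; the guardrail name is arbitrary, and the `pii_entities_config` shape is an assumption to be checked against the Presidio guardrail docs:

```yaml
# guardrail config sketch - field names assumed, verify against the docs
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
      pii_entities_config:
        EMAIL_ADDRESS: "MASK"      # mask emails in the request
        MEDICAL_LICENSE: "BLOCK"   # reject requests containing medical licenses
```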
## New Models / Updated Models

- Gemini (VertexAI + Google AI Studio)
- Google AI Studio
  - `/realtime`
    - Gemini Multimodal Live API support
    - Audio input/output support, optional param mapping, accurate usage calculation - PR
- VertexAI
  - `/chat/completion`
    - Fix llama streaming error, where the model response was nested in the returned streaming chunk - PR
- Ollama
  - `/chat/completion`
    - Structured responses fix - PR
- Bedrock
- Nvidia NIM
  - `/chat/completion`
    - Add `tools`, `tool_choice`, `parallel_tool_calls` support - PR
- Novita AI
  - New provider added for `/chat/completion` routes - PR
- Azure
  - `/image/generation`
    - Fix Azure DALL-E 3 call with a custom model name - PR
- Cohere
  - `/embeddings`
    - Migrate embedding to use `/v2/embed` - adds support for the `output_dimensions` param - PR
- Anthropic
  - `/chat/completion`
    - Web search tool support - native + OpenAI format - Get Started
- VLLM
  - `/embeddings`
    - Support embedding input as a list of integers
- OpenAI
  - `/chat/completion`
    - Fix b64 file data input handling - Get Started
    - Add `supports_pdf_input` to all vision models - PR
## LLM API Endpoints

- Responses API
  - Fix delete API support - PR
- Rerank API
  - `/v2/rerank` now registered as `llm_api_route` - enabling non-admins to call it - PR
## Spend Tracking Improvements

- `/chat/completion`, `/messages`, `/audio/transcription`, `/embeddings`
- Azure AI - Add cohere embed v4 pricing - PR
## Management Endpoints / UI

- Models
  - Ollama - add api base param to UI
- Logs
  - Add team id, key alias, key hash filters on logs - https://github.com/BerriAI/litellm/pull/10831
  - Guardrail tracing now in Logs UI - https://github.com/BerriAI/litellm/pull/10893
- Teams
  - Patch for updating team info when the team is in an org and members are not - https://github.com/BerriAI/litellm/pull/10835
- Guardrails
  - Add Bedrock, Presidio, Lakera guardrails on UI - https://github.com/BerriAI/litellm/pull/10874
  - See guardrail info page - https://github.com/BerriAI/litellm/pull/10904
  - Allow editing guardrails on UI - https://github.com/BerriAI/litellm/pull/10907
- Test Key
  - Select guardrails to test on UI
## Logging / Alerting Integrations

- StandardLoggingPayload
  - Log any `x-` headers in requester metadata - Get Started
  - Guardrail tracing now in standard logging payload - Get Started
- Generic API Logger
  - Support passing application/json header
- Arize Phoenix
- PagerDuty
  - PagerDuty is now a free feature - PR
- Alerting
  - Sending Slack alerts on virtual key/user/team updates is now free - PR
## Guardrails

- Guardrails
  - New `/apply_guardrail` endpoint for directly testing a guardrail - PR
- Lakera
  - `/v2` endpoints support - PR
- Presidio
- Aim Security
  - Support for anonymization in Aim Guardrails - PR
## Performance / Loadbalancing / Reliability improvements

- Allow overriding all constants using a `.env` variable - PR
- Maximum retention period for spend logs
## General Proxy Improvements

- Authentication
  - Handle `Bearer $LITELLM_API_KEY` in the `x-litellm-api-key` custom header - PR
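In other words, the header value is accepted with or without a `Bearer ` scheme prefix. A toy sketch of that normalization (illustrative only, not LiteLLM's actual implementation):

```python
def extract_key(header_value: str) -> str:
    """Strip an optional 'Bearer ' prefix from an x-litellm-api-key header value.

    Illustrative sketch of the behavior described above, not LiteLLM's code.
    """
    prefix = "Bearer "
    return header_value[len(prefix):] if header_value.startswith(prefix) else header_value

# Both forms resolve to the same key:
# extract_key("Bearer sk-1234") and extract_key("sk-1234")
```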
- New Enterprise pip package - `litellm-enterprise` - fixes issue where the `enterprise` folder was not found when using the pip package
- Proxy CLI
  - Add `models import` command - PR
- OpenWebUI
  - Configure LiteLLM to parse user headers from Open Web UI
- LiteLLM Proxy w/ LiteLLM SDK
  - Option to force/always use the LiteLLM proxy when calling via the LiteLLM SDK
## New Contributors
- @imdigitalashish made their first contribution in PR #10617
- @LouisShark made their first contribution in PR #10688
- @OscarSavNS made their first contribution in PR #10764
- @arizedatngo made their first contribution in PR #10654
- @jugaldb made their first contribution in PR #10805
- @daikeren made their first contribution in PR #10781
- @naliotopier made their first contribution in PR #10077
- @damienpontifex made their first contribution in PR #10813
- @Dima-Mediator made their first contribution in PR #10789
- @igtm made their first contribution in PR #10814
- @shibaboy made their first contribution in PR #10752
- @camfarineau made their first contribution in PR #10629
- @ajac-zero made their first contribution in PR #10439
- @damgem made their first contribution in PR #9802
- @hxdror made their first contribution in PR #10757
- @wwwillchen made their first contribution in PR #10894
## Demo Instance

Here's a Demo Instance to test changes:

- Instance: https://demo.litellm.ai/
- Login Credentials:
  - Username: admin
  - Password: sk-1234