AI Cost Save

Model Cost Pages

High-intent pricing pages for users already comparing OpenAI, Claude, DeepSeek, and more.

Live model pricing (rates in USD per 1K tokens)

Claude Haiku 4.5

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.0008 | Output: 0.004

Claude Opus 4.5

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.005 | Output: 0.025

Claude Opus 4.6

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.005 | Output: 0.025

Claude Sonnet 4.5

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.003 | Output: 0.015

Claude Sonnet 4.6

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.003 | Output: 0.015

DeepSeek-chat

DeepSeek models known for cost-efficient reasoning and coding-focused performance.

Input: 0.00014 | Output: 0.00028

DeepSeek-reasoner

DeepSeek models known for cost-efficient reasoning and coding-focused performance.

Input: 0.002 | Output: 0.004

Doubao-lite

General-purpose model suitable for text generation and reasoning in common API workflows.

Input: 0.00004 | Output: 0.00008

Doubao-pro

General-purpose model suitable for text generation and reasoning in common API workflows.

Input: 0.0001 | Output: 0.0003

Gemini 2.5 Flash

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0003 | Output: 0.0025

Gemini 2.5 Flash Lite

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0001 | Output: 0.0004

Gemini 3.1 Flash Image Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0003 | Output: 0.0025

Gemini 3.1 Flash Lite Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.00025 | Output: 0.0015

Gemini 3.1 Pro Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.002 | Output: 0.01

Gemini 3.1 Pro Preview Custom Tools

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.002 | Output: 0.012

Gemini 3 Flash Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0005 | Output: 0.003

GPT-4.1

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.002 | Output: 0.008

GPT-4.1 mini

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0004 | Output: 0.0016

GPT-4.1 nano

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0001 | Output: 0.0004

GPT-4o

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0025 | Output: 0.01

GPT-4o mini

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00015 | Output: 0.0006

GPT-5.2

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.2-Codex

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.3 Chat

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.3-Codex

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.4

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0025 | Output: 0.015

GPT-5.4 Pro

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.03 | Output: 0.18

GPT-5 Nano

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00005 | Output: 0.0004

Kimi-k2-0711-preview

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00222

Kimi-k2-0905-preview

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00222

Kimi-k2.5

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.0001 | Output: 0.00292

Kimi-k2-thinking

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00222

Kimi-k2-thinking-turbo

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00806

kimi-k2-turbo-preview

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00806

Qwen3.5-Flash

Alibaba Cloud Qwen models optimized for general chat and Chinese language scenarios.

Input: 0.00004 | Output: 0.00008

Qwen3.5-Plus

Alibaba Cloud Qwen models optimized for general chat and Chinese language scenarios.

Input: 0.00012 | Output: 0.00024

Qwen3-max

Alibaba Cloud Qwen models optimized for general chat and Chinese language scenarios.

Input: 0.0006 | Output: 0.0018
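
The rates above can be turned into a per-request cost estimate. A minimal sketch, assuming the listed figures are USD per 1K tokens and using two rates copied from the table (the function name is illustrative, not part of any SDK):

```python
# Per-1K-token rates (USD), copied from the pricing table above.
RATES = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "deepseek-chat": {"input": 0.00014, "output": 0.00028},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call: input and output tokens are billed
    separately, each at (tokens / 1000) * per-1K rate."""
    r = RATES[model]
    return (input_tokens / 1000) * r["input"] + (output_tokens / 1000) * r["output"]

# 2,000 prompt tokens + 500 completion tokens on GPT-4o:
# 2.0 * 0.0025 + 0.5 * 0.01 = 0.005 + 0.005 = 0.01 USD per call
print(round(call_cost("gpt-4o", 2000, 500), 6))
```

Note that input and output must be tracked separately: output tokens are typically several times more expensive per token than input tokens.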

Guides and comparisons

Claude: cost per token (what to track)

Estimate Claude costs by separating input and output tokens and accounting for call volume.

DeepSeek API pricing (cost factors)

How to estimate DeepSeek costs and compare value as you optimize prompts and retries.

GPT-4 vs Claude: cost and how to choose

A simple framework for choosing between GPT-4 and Claude based on total billed tokens.

Kimi pricing (what to track)

Understand Kimi pricing through your workflow's input/output tokens and call volume to keep spending under control.

OpenAI: cost per token (a practical guide)

Understand OpenAI costs and estimate the per-token cost of input and output.

Qwen API cost (per token & practical estimation)

Estimate Qwen costs from input/output tokens and real call volume, then optimize where the waste hides.
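
The estimation framework the guides describe (input/output rates times average tokens per call, times call volume) can be sketched as a monthly-spend comparison. The rates are taken from the table above; the workload numbers are hypothetical:

```python
# Project monthly spend for a model from per-1K-token rates (USD),
# average tokens per call, and monthly call volume.
def monthly_cost(rate_in, rate_out, avg_in_tokens, avg_out_tokens, calls_per_month):
    per_call = (avg_in_tokens / 1000) * rate_in + (avg_out_tokens / 1000) * rate_out
    return per_call * calls_per_month

# Hypothetical workload: 1,200 input + 400 output tokens per call, 50k calls/month.
workload = dict(avg_in_tokens=1200, avg_out_tokens=400, calls_per_month=50_000)

gpt4o  = monthly_cost(0.0025, 0.01,  **workload)   # GPT-4o rates from the table
sonnet = monthly_cost(0.003,  0.015, **workload)   # Claude Sonnet 4.5 rates

print(f"GPT-4o: ${gpt4o:,.2f}/mo, Sonnet: ${sonnet:,.2f}/mo")
```

Running the same workload through two rate pairs makes the trade-off concrete before any optimization of prompts, retries, or output length.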