AI Cost Save

Model Cost Pages

High-intent pricing pages for users already comparing OpenAI, Claude, DeepSeek, and more.

Live model pricing

All prices are in USD per 1,000 tokens (input | output).

Claude Haiku 4.5

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.0008 | Output: 0.004

Claude Opus 4.5

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.005 | Output: 0.025

Claude Opus 4.6

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.005 | Output: 0.025

Claude Sonnet 4.5

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.003 | Output: 0.015

Claude Sonnet 4.6

Anthropic Claude models focused on long-context reasoning and stable enterprise usage.

Input: 0.003 | Output: 0.015

DeepSeek-chat

DeepSeek models known for cost-efficient reasoning and coding-focused performance.

Input: 0.00014 | Output: 0.00028

DeepSeek-reasoner

DeepSeek models known for cost-efficient reasoning and coding-focused performance.

Input: 0.002 | Output: 0.004

Doubao-lite

General-purpose model suitable for text generation and reasoning in common API workflows.

Input: 0.00004 | Output: 0.00008

Doubao-pro

General-purpose model suitable for text generation and reasoning in common API workflows.

Input: 0.0001 | Output: 0.0003

Gemini 2.5 Flash

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0003 | Output: 0.0025

Gemini 2.5 Flash Lite

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0001 | Output: 0.0004

Gemini 3.1 Flash Image Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0003 | Output: 0.0025

Gemini 3.1 Flash Lite Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.00025 | Output: 0.0015

Gemini 3.1 Pro Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.002 | Output: 0.01

Gemini 3.1 Pro Preview Custom Tools

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.002 | Output: 0.012

Gemini 3 Flash Preview

Google Gemini models for text, multimodal workloads, and high-throughput inference.

Input: 0.0005 | Output: 0.003

GPT-4.1

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.002 | Output: 0.008

GPT-4.1 mini

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0004 | Output: 0.0016

GPT-4.1 nano

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0001 | Output: 0.0004

GPT-4o

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0025 | Output: 0.01

GPT-4o mini

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00015 | Output: 0.0006

GPT-5.2

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.2-Codex

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.3 Chat

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.3-Codex

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00175 | Output: 0.014

GPT-5.4

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.0025 | Output: 0.015

GPT-5.4 Pro

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.03 | Output: 0.18

GPT-5 Nano

OpenAI general-purpose text and multimodal models for chat, tools, and content generation.

Input: 0.00005 | Output: 0.0004

Kimi-k2-0711-preview

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00222

Kimi-k2-0905-preview

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00222

Kimi-k2.5

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.0001 | Output: 0.00292

Kimi-k2-thinking

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00222

Kimi-k2-thinking-turbo

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00806

Kimi-k2-turbo-preview

Moonshot Kimi models designed for long-context processing and Chinese-language Q&A.

Input: 0.00014 | Output: 0.00806

Qwen3.5-Flash

Alibaba Cloud Qwen models optimized for general chat and Chinese language scenarios.

Input: 0.00004 | Output: 0.00008

Qwen3.5-Plus

Alibaba Cloud Qwen models optimized for general chat and Chinese language scenarios.

Input: 0.00012 | Output: 0.00024

Qwen3-Max

Alibaba Cloud Qwen models optimized for general chat and Chinese language scenarios.

Input: 0.0006 | Output: 0.0018
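As a worked example of how the figures above translate into real spend, here is a minimal Python sketch. It assumes the listed prices are USD per 1,000 tokens; the model names, token counts, and call volume are illustrative, not measurements.

```python
# Hypothetical cost estimate using two entries from the list above,
# assuming prices are USD per 1,000 tokens.
PRICES = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "deepseek-chat": {"input": 0.00014, "output": 0.00028},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single API call in USD."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example workload: 10,000 calls/month, ~1,200 input and ~400 output tokens each.
monthly = 10_000 * call_cost("gpt-4o-mini", 1200, 400)
print(f"${monthly:.2f}/month")  # prints $4.20/month
```

The same function applied to DeepSeek-chat prices shows how quickly cheaper per-token rates compound at volume.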

Guides and comparisons

Claude Cost Per Token (What to Track)

A clear way to estimate Claude costs, including input vs output tokens and workflow call volume.

DeepSeek API Pricing (Cost Drivers)

How to estimate DeepSeek costs and compare value across tasks, especially when you optimize prompts and reduce retries.

GPT-4 vs Claude Cost: How to Decide

A decision framework for choosing between GPT-4 and Claude based on real workflow token usage.

Kimi API Pricing (What to Track)

Understand Kimi pricing with input/output tokens and workflow call volume so you can control spend.

OpenAI Cost Per Token (Practical Guide)

Understand OpenAI cost drivers and how to estimate cost per token for input and output.

Qwen API Cost (Cost Per Token & Practical Estimate)

Estimate Qwen API cost using input/output tokens and real workflow call volume, then optimize where waste hides.
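The guides above share one core move: compare models on your own workload, not on headline rates. A small sketch of that comparison, using two prices from the tables above (assumed USD per 1,000 tokens) and illustrative average token counts:

```python
# Sketch: compare the cost of 1,000 calls across two models for the
# same workload. Prices are taken from the lists above and assumed to
# be USD per 1,000 tokens; avg_in/avg_out are hypothetical averages.
def cost_per_1k_calls(input_price: float, output_price: float,
                      avg_in: int = 1500, avg_out: int = 500) -> float:
    per_call = (avg_in / 1000) * input_price + (avg_out / 1000) * output_price
    return 1000 * per_call

claude_sonnet = cost_per_1k_calls(0.003, 0.015)  # Claude Sonnet 4.5
gpt_41 = cost_per_1k_calls(0.002, 0.008)         # GPT-4.1
print(f"Claude Sonnet 4.5: ${claude_sonnet:.2f} per 1,000 calls")
print(f"GPT-4.1:           ${gpt_41:.2f} per 1,000 calls")
```

Because output tokens cost several times more than input tokens on most models, trimming verbose outputs usually moves the bill more than trimming prompts.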