Question 1

How is AI API pricing calculated?

Accepted Answer

Most AI APIs charge per token (one token is roughly 0.75 English words). You pay separately for input tokens (your prompt) and output tokens (the response), and output tokens usually cost more than input tokens. Rates are usually quoted per 1 million tokens.
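The arithmetic behind per-token billing can be sketched as follows. This is a minimal illustration: the function name `request_cost` and the rates in the usage line are placeholders chosen for the example, not any provider's actual prices.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Return the cost in dollars of one API call.

    Rates are dollars per 1 million tokens, billed separately
    for input (prompt) and output (response) tokens.
    """
    total = input_tokens * input_rate_per_m + output_tokens * output_rate_per_m
    return total / 1_000_000


# Hypothetical rates: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
cost = request_cost(2_000, 500, 0.50, 1.50)
print(f"${cost:.6f}")
```

Multiplying the per-request figure by expected monthly request volume gives a first estimate of a monthly bill.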

Question 2

What is the cheapest AI API?

Accepted Answer

Self-hosted open-source models such as Llama, run on a rented GPU, are typically the cheapest option (roughly $0.10–0.50 per 1M tokens), provided the GPU is kept well utilized. Among commercial APIs, GPT-4o Mini and Gemini Pro are among the most affordable for high volumes.

Question 3

What is batch processing for AI APIs?

Accepted Answer

Many providers offer discounts of around 50% for batch requests, where you submit many requests at once and accept results within 24 hours rather than in real time. This is ideal for offline workloads.
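A quick way to estimate the effect of a batch discount is shown below. The helper name `batch_cost` and the $200 figure are illustrative assumptions. The 50% default mirrors the discount mentioned above, but actual discounts vary by provider.

```python
def batch_cost(realtime_cost: float, discount: float = 0.50) -> float:
    """Return the cost of a workload when run through a discounted batch API.

    `discount` is a fraction between 0 and 1 (0.50 means 50% off).
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return realtime_cost * (1.0 - discount)


# Hypothetical offline job that would cost $200 at real-time rates.
print(f"${batch_cost(200.0):.2f}")
```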

Question 4

What is prompt caching and how does it save money?

Accepted Answer

Prompt caching stores the processed version of your system prompt or context. If the same prefix is sent repeatedly, the provider charges at a reduced rate (up to 90% off) for cached tokens.
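The saving from caching can be estimated with a sketch like the one below. It assumes the cached prefix is billed at a flat discount off the normal input rate and ignores any extra fee a provider may charge for writing to the cache. The function name and the $1.00 rate are hypothetical.

```python
def input_cost_with_caching(cached_tokens: int, fresh_tokens: int,
                            input_rate_per_m: float,
                            cache_discount: float = 0.90) -> float:
    """Return the input-token cost in dollars for one request.

    `cached_tokens` are prefix tokens served from the cache and billed
    at the discounted rate; `fresh_tokens` are billed at the full rate.
    """
    cached_rate = input_rate_per_m * (1.0 - cache_discount)
    total = cached_tokens * cached_rate + fresh_tokens * input_rate_per_m
    return total / 1_000_000


# Hypothetical: 10,000-token cached system prompt, 200 new tokens,
# $1.00 per 1M input tokens.
with_cache = input_cost_with_caching(10_000, 200, 1.00)
without_cache = input_cost_with_caching(0, 10_200, 1.00)
print(f"with cache: ${with_cache:.4f}, without: ${without_cache:.4f}")
```

The longer the shared prefix relative to the new part of each prompt, the closer the overall input saving gets to the full cache discount.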

Question 5

How much does it cost to run your own LLM on a GPU?

Accepted Answer

An A100 GPU cloud instance runs $3–5/hr. At a capacity of 500 requests/hr, that is $0.006–0.010 per request, assuming the instance is kept fully busy. At high volumes (50K+ requests/month), self-hosting can be significantly cheaper than commercial APIs.
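The per-request figure is simply the hourly price divided by throughput, as the sketch below shows. The second helper is an added assumption: it covers the case where the instance is rented around the clock (720 hours per month here) but serves fewer requests than its capacity, which raises the effective cost. Both function names are hypothetical.

```python
def cost_per_request_at_capacity(hourly_rate: float,
                                 requests_per_hour: float) -> float:
    """Cost per request when the GPU runs at full throughput."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return hourly_rate / requests_per_hour


def effective_cost_per_request(hourly_rate: float, monthly_requests: int,
                               hours_per_month: float = 720.0) -> float:
    """Cost per request when the instance is rented all month
    but only `monthly_requests` are actually served."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return hourly_rate * hours_per_month / monthly_requests


# Figures from the answer above: $3-5/hr, 500 requests/hr.
low = cost_per_request_at_capacity(3.0, 500)
high = cost_per_request_at_capacity(5.0, 500)
print(f"at capacity: ${low:.3f} to ${high:.3f} per request")

# Same $3/hr instance rented 24/7 but serving only 50,000 requests a month.
print(f"at 50K req/month: ${effective_cost_per_request(3.0, 50_000):.4f}")
```

Comparing the second number with a commercial API's per-request cost shows whether self-hosting pays off at a given volume.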

AI Cost Calculator
