Choosing a model

| Provider | Model | Description | Notes | Takeaway |
|----------|-------|-------------|-------|----------|
| OpenAI | GPT-4.1 | Good general-purpose model | 1 million token context length | Good models for general-purpose use |
| OpenAI | GPT-4.1-mini | Faster, cheaper, and dumber version of GPT-4.1 | | |
| OpenAI | GPT-4.1-nano | Even faster, cheaper, and dumber | | |
| OpenAI | o3 | Better at complex math and coding | Slower and more expensive | |
| OpenAI | o4-mini | Reasoning model, not as strong as o3 | Cheaper than GPT-4.1 | |
| OpenAI | API access | Available via OpenAI or Azure | | |
| Anthropic | Claude 3.7 Sonnet | Good general-purpose model | Best for code generation | Best model for code generation |
| Anthropic | Claude 3.5 Sonnet v2 | Older but still excellent | Some prefer it to 3.7 | |
| Anthropic | Claude 3.5 Haiku | Faster, cheaper, and dumber | | |
| Anthropic | API access | Available via Anthropic or AWS Bedrock | | |
| Google | Gemini 2.0 Flash | Very fast | 1 million token context length | Largest context length; good for big inputs |
| Google | Gemini 2.0 Pro | Smarter than Flash | 2 million token context length | |
| LLaMA | LLaMA 3.1 405b | Text-only | 229GB; not as smart as the best closed models | Good for on-premise/local use |
| LLaMA | LLaMA 3.2 90b | Text + vision | 55GB | |
| LLaMA | LLaMA 3.2 11b | Text + vision | 7.9GB; can run on a MacBook | |
| LLaMA | Open weights + API access | Can run locally | Via Ollama, OpenRouter, Groq, AWS Bedrock | |
| DeepSeek | DeepSeek R1 671b | Uses chain of thought | 404GB; claimed performance similar to OpenAI o1 | |
| DeepSeek | DeepSeek R1 32b | Smaller variant | 20GB; not actually the DeepSeek architecture; significantly worse | |
| DeepSeek | DeepSeek R1 70b | Mid-size variant | 43GB; not actually the DeepSeek architecture | |
| DeepSeek | Open weights + API access | Can run locally | Via DeepSeek, OpenRouter | |
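One practical upside of this landscape is that swapping models is often just a one-line change: OpenAI's hosted API and local runners like Ollama both speak the same OpenAI-compatible chat protocol. Below is a minimal sketch in Python, assuming the `openai` package is installed, an `OPENAI_API_KEY` environment variable is set for the hosted call, and an Ollama server is running locally with a model already pulled (e.g. `ollama pull llama3.2`). The model names and prompt are illustrative, not prescriptive.

```python
from openai import OpenAI

# Hosted: OpenAI's API. The client reads OPENAI_API_KEY from the environment.
hosted = OpenAI()
resp = hosted.chat.completions.create(
    model="gpt-4.1-mini",  # a cheap/fast tier from the table above
    messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
)
print(resp.choices[0].message.content)

# Local: Ollama exposes an OpenAI-compatible endpoint on port 11434, so the
# same client works with a different base_url. The api_key is required by the
# client library but ignored by Ollama.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = local.chat.completions.create(
    model="llama3.2",  # a small LLaMA 3.2 variant; names follow Ollama's registry
    messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
)
print(resp.choices[0].message.content)
```

The same pattern extends to the other providers in the table: aggregators such as OpenRouter and Groq also expose OpenAI-compatible endpoints, so trying a different model from the table is usually just a matter of changing `base_url` and `model`.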

Table values taken from Joe Cheng’s LLM Quickstart and converted into a table using ChatGPT.