Choosing a model
Provider | Model | Description | Notes | Takeaway |
---|---|---|---|---|
OpenAI | GPT-4.1 | Good general-purpose model | 1 million token context length | Good models for general-purpose use |
OpenAI | GPT-4.1-mini | Faster, cheaper, and dumber version of GPT-4.1 | ||
OpenAI | GPT-4.1-nano | Even faster, cheaper, and dumber | ||
OpenAI | o3 | Better at complex math and coding | Slower and more expensive | |
OpenAI | o4-mini | Reasoning model, not as strong as o3 | Cheaper than GPT-4.1 | |
OpenAI | API access | Available from OpenAI or via Azure | See the OpenAI sketch below the table ||
Anthropic | Claude 3.7 Sonnet | Good general-purpose model | | Best model for code generation |
Anthropic | Claude 3.5 Sonnet v2 | Older but still excellent | Some prefer it to 3.7 | |
Anthropic | Claude 3.5 Haiku | Faster, cheaper, and dumber | ||
Anthropic | API access | Available from Anthropic or via AWS Bedrock | See the Anthropic sketch below the table ||
Google | Gemini 2.0 Flash | Very fast | 1 million token context length | Largest context lengths; good for large inputs |
Google | Gemini 2.0 Pro | Smarter than Flash | 2 million token context length ||
LLaMA | LLaMA 3.1 405b | Text-only | 229GB, not as smart as the best closed models | Good for on-premises/local use |
LLaMA | LLaMA 3.2 90b | Text + vision | 55GB | |
LLaMA | LLaMA 3.2 11b | Text + vision | 7.9GB, can run on a MacBook | |
LLaMA | Open weights + API access | Can run locally | Via Ollama, OpenRouter, Groq, AWS Bedrock; see the Ollama sketch below the table ||
DeepSeek | DeepSeek R1 671b | Uses chain of thought | 404GB, claimed similar performance to OpenAI o1 | |
DeepSeek | DeepSeek R1 32b | Smaller distilled variant | 20GB, distilled onto Qwen rather than the DeepSeek architecture; significantly worse ||
DeepSeek | DeepSeek R1 70b | Mid-size distilled variant | 43GB, distilled onto LLaMA rather than the DeepSeek architecture ||
DeepSeek | Open weights + API access | Can run locally | Via DeepSeek, OpenRouter; see the DeepSeek sketch below the table ||
Table values taken from Joe Cheng’s LLM Quickstart and converted into a table using ChatGPT.
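
To make the API rows above concrete, the sketches below show a minimal call to each provider in Python. Model ID strings and endpoints are assumptions drawn from public documentation and may need updating; each sketch expects the relevant API key in an environment variable. First, the OpenAI chat completions API via the `openai` package:

```python
# Minimal OpenAI chat completion sketch.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4.1",  # or "gpt-4.1-mini", "gpt-4.1-nano", "o3", "o4-mini"
    messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
)
print(response.choices[0].message.content)
```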
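
The Anthropic Messages API looks similar but requires an explicit `max_tokens`; the model alias here is an assumption, so check Anthropic's current model list:

```python
# Minimal Anthropic Messages API sketch.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY automatically

message = client.messages.create(
    model="claude-3-7-sonnet-latest",  # assumed alias for Claude 3.7 Sonnet
    max_tokens=1024,                   # required by the Messages API
    messages=[{"role": "user", "content": "Write a Python one-liner that reverses a string."}],
)
print(message.content[0].text)
```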
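
For the LLaMA rows, one local option is Ollama, which the table mentions. Here is a minimal sketch with the `ollama` Python client; the model tag is an assumption (taken to correspond to the 11b text + vision model) and must be pulled first with `ollama pull`:

```python
# Minimal local-inference sketch using the Ollama Python client.
# Assumes Ollama is installed and running, `pip install ollama`, and
# `ollama pull llama3.2-vision` has been run (assumed tag for the 11b model).
import ollama

response = ollama.chat(
    model="llama3.2-vision",
    messages=[{"role": "user", "content": "Explain open-weights models in one paragraph."}],
)
print(response["message"]["content"])
```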
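
DeepSeek's hosted API is OpenAI-compatible, so the same `openai` package works with a different base URL; the endpoint and model ID here are assumptions from DeepSeek's documentation:

```python
# Minimal DeepSeek API sketch over the OpenAI-compatible endpoint.
# Assumes `pip install openai` and DEEPSEEK_API_KEY set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed ID for the R1 reasoning model
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(response.choices[0].message.content)
```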