Choosing a model

| Provider | Model | Description | Notes | Takeaway |
|----------|-------|-------------|-------|----------|
| OpenAI | GPT-4.1 | Good general-purpose model | 1 million token context length | Good models for general-purpose use |
| OpenAI | GPT-4.1-mini | Faster, cheaper, and dumber version of GPT-4.1 | | |
| OpenAI | GPT-4.1-nano | Even faster, cheaper, and dumber | | |
| OpenAI | o3 | Better at complex math and coding | Slower and more expensive | |
| OpenAI | o4-mini | Reasoning model, not as strong as o3 | Cheaper than GPT-4.1 | |
| OpenAI | API access | Available via OpenAI or Azure | | |
| Anthropic | Claude 3.7 Sonnet | Good general-purpose model | Best for code generation | Best model for code generation |
| Anthropic | Claude 3.5 Sonnet v2 | Older but still excellent | Some prefer it to 3.7 | |
| Anthropic | Claude 3.5 Haiku | Faster, cheaper, and dumber | | |
| Anthropic | API access | Available via Anthropic or AWS Bedrock | | |
| Google | Gemini 2.0 Flash | Very fast | 1 million token context length | Largest context length; good for big inputs |
| Google | Gemini 2.0 Pro | Smarter than Flash | 2 million token context length | |
| LLaMA | LLaMA 3.1 405b | Text-only | 229GB; not as smart as the best closed models | Good for on-premise/local use |
| LLaMA | LLaMA 3.2 90b | Text + vision | 55GB | |
| LLaMA | LLaMA 3.2 11b | Text + vision | 7.9GB; can run on a MacBook | |
| LLaMA | Open weights + API access | Can run locally | Via Ollama, OpenRouter, Groq, AWS Bedrock | |
| DeepSeek | DeepSeek R1 671b | Uses chain of thought | 404GB; claimed performance similar to OpenAI o1 | |
| DeepSeek | DeepSeek R1 32b | Smaller variant | 20GB; not actually the DeepSeek architecture; significantly worse | |
| DeepSeek | DeepSeek R1 70b | Mid-size variant | 43GB; not actually the DeepSeek architecture | |
| DeepSeek | Open weights + API access | Can run locally | Via DeepSeek, OpenRouter | |
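One practical upside of this landscape is that swapping models is often just a one-line change: OpenAI's hosted API and local runners like Ollama both speak the same OpenAI-compatible chat protocol. Below is a minimal sketch in Python, assuming the `openai` package is installed, an `OPENAI_API_KEY` environment variable is set for the hosted call, and an Ollama server is running locally with a model already pulled (e.g. `ollama pull llama3.2`). The model names and prompt are illustrative, not prescriptive.

```python
from openai import OpenAI

# Hosted: OpenAI's API. The client reads OPENAI_API_KEY from the environment.
hosted = OpenAI()
resp = hosted.chat.completions.create(
    model="gpt-4.1-mini",  # a cheap/fast tier from the table above
    messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
)
print(resp.choices[0].message.content)

# Local: Ollama exposes an OpenAI-compatible endpoint on port 11434, so the
# same client works with a different base_url. The api_key is required by the
# client library but ignored by Ollama.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = local.chat.completions.create(
    model="llama3.2",  # a small LLaMA 3.2 variant; names follow Ollama's registry
    messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
)
print(resp.choices[0].message.content)
```

The same pattern extends to the other providers in the table: aggregators such as OpenRouter and Groq also expose OpenAI-compatible endpoints, so trying a different model from the table is usually just a matter of changing `base_url` and `model`.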

Table values taken from Joe Cheng’s LLM Quickstart and converted into a table using ChatGPT.