Most AI tools today run on the same few large language models (LLMs), such as GPT, Claude, and Gemini. These models are the engines behind the apps we use. Each has its own strengths, limitations, and quirks. I’ve tested them all myself, and based on my research and experience, I’ve ranked them by how useful and accessible they are.
Here’s the summary list:
Rank | Model | Why It Ranks Here | Where to Get It | Paid/Free |
---|---|---|---|---|
#1 | GPT-4 | Most well-rounded: excellent reasoning, creativity, tools (vision, code), and app integrations. | chat.openai.com, API via OpenAI or Azure | Paid (Free GPT-3.5) |
#2 | Claude 3 | Near GPT-4 in IQ, best for long documents, very safe and coherent outputs. | claude.ai (Free & Pro), Anthropic API | Free + Pro |
#3 | Gemini 1.5 | Massive context window (1M tokens), strong research and multimodal capacity. | gemini.google.com, Google Cloud Vertex AI | Free + Advanced tier |
#4 | LLaMA 3 | High-performance open model, instruction-tuned, and backed by Meta’s distribution channels. | Hugging Face, Meta AI | Free (open weights) |
#5 | DeepSeek | Strong code/math model, bilingual (English/Chinese), rapidly improving, and open. | deepseek.com, Hugging Face | Free (open weights) |
#6 | Mistral | Powerful, open-weight, fast — great for devs, but less refined without fine-tuning. | Hugging Face, Together.ai | Free (open weights) |
#7 | Command R+ | Best-in-class for RAG and enterprise document Q&A — niche but excellent. | cohere.com, Hugging Face | Free + API pricing |
#8 | Grok | Unique, trend-aware personality — fun, fast, but limited in serious tasks or depth. | X (Twitter), for Premium subscribers | Free with X Premium |
Here’s the detailed breakdown:
#1: GPT-4 (by OpenAI)
Who made it: OpenAI, backed by Microsoft
Trained on: Web data, books, code, conversations, and human feedback
Core features:
- Memory (Pro only, still limited)
- Web browsing (GPT-4 Turbo)
- Code interpreter, file upload, image analysis
- Custom GPTs (build your own bots)
Other Insights:
- What it’s good at: Writing, explaining, coding, summarizing — a strong all-around model
- Where it falls short: Doesn’t know real-time info unless browsing is enabled
- Best use case: Daily tasks, idea generation, tutoring, and writing help
- Where to use it: chat.openai.com (Free for GPT-3.5, $20/month for GPT-4)
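If you’d rather hit the API than the chat UI, here’s a minimal sketch of what a GPT-4 request looks like, using only the standard library. The endpoint and message format follow OpenAI’s chat-completions API, the helper name and prompt are mine, and the details are worth verifying against the current API reference:

```python
# Sketch of a chat-completions request body for GPT-4 (stdlib only).
# The message format follows OpenAI's chat-completions API; the prompt
# and helper name are illustrative.
import json

def build_request(prompt: str, model: str = "gpt-4") -> dict:
    """Assemble the JSON body for a chat-completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

payload = json.dumps(build_request("Explain LLMs in one paragraph."))
# POST this to https://api.openai.com/v1/chat/completions with the header
# "Authorization: Bearer $OPENAI_API_KEY" (or use the official openai SDK).
```

The same request shape works against Azure’s hosted OpenAI endpoints, which is why the table lists both.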
#2: Claude 3 (by Anthropic)
Who made it: Anthropic, founded by former OpenAI researchers
Trained on: Public data, books, academic content, and code — safety-aligned and instruction-tuned
Core features:
- Long context (200,000+ tokens)
- Clear, structured, professional tone
- Strong reasoning and math skills
- No memory yet, but consistent responses
Other Insights:
- What it’s good at: Reading long PDFs, legal or policy documents, thoughtful writing
- Where it falls short: Can be overly cautious or neutral in tone
- Best use case: Research, professional writing, strategy work
- Where to use it: claude.ai (Free and Pro tiers)
#3: Gemini 1.5 (by Google)
Who made it: Google DeepMind
Trained on: Web content, code, books, YouTube transcripts, images, and other multimodal inputs
Core features:
- Up to 1 million token context
- Multimodal: can handle text, code, images, video
- Strong math, logic, and science capabilities
- Integrated with Google Workspace
Other Insights:
- What it’s good at: Technical writing, scientific content, multimodal projects, summarizing long material
- Where it falls short: UI feels beta, and some answers may lag behind GPT-4 in polish
- Best use case: Deep research, long-form summarization, technical and academic work
- Where to use it: gemini.google.com (Free and Advanced tiers)
#4: LLaMA 3 (by Meta)
Who made it: Meta AI
Trained on: Public web content, books, code, and multilingual data
Core features:
- High-performance open models (8B and 70B)
- Instruction-tuned for general use
- Multilingual and reasoning capable
- Open for commercial use
Other Insights:
- What it’s good at: Building chatbots, research apps, AI-powered assistants
- Where it falls short: No official self-serve API from Meta; running it yourself requires setup
- Best use case: Developers or teams building scalable AI solutions
- Where to use it: Hugging Face, Meta AI
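Since there’s no one-click hosted version, here’s a rough sketch of what “requires setup” means in practice. The model id and chat-template tokens follow Meta’s published model card, but treat them as assumptions to double-check; the helper names are mine:

```python
# Sketch of running LLaMA 3 locally. The special tokens below follow the
# chat template in Meta's model card; verify against the current card.
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a raw prompt in LLaMA 3's chat template."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def generate(prompt: str) -> str:
    # Heavy optional dependency: `pip install transformers torch`, plus
    # accepting Meta's license on Hugging Face before the weights download.
    from transformers import pipeline  # deferred import: large dependency
    pipe = pipeline("text-generation",
                    model="meta-llama/Meta-Llama-3-8B-Instruct")
    return pipe(prompt, max_new_tokens=200)[0]["generated_text"]
```

In practice the `transformers` pipeline applies the chat template for you; the explicit version above just shows what’s happening under the hood.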
#5: DeepSeek (by DeepSeek AI)
Who made it: DeepSeek AI, based in China
Trained on: Bilingual datasets, math/code repositories, and open web data
Core features:
- Great at logic, math, and code generation
- Bilingual (Chinese and English)
- Free to use and open-source
- Strong benchmarks in technical tasks
Other Insights:
- What it’s good at: Programming, reasoning, math-heavy tasks
- Where it falls short: Less instruction tuning and polish for general writing
- Best use case: Developers, engineers, and code assistants
- Where to use it: deepseek.com, Hugging Face
#6: Mistral (by Mistral AI)
Who made it: Mistral AI, based in Paris
Trained on: Open web data, code, multilingual corpora
Core features:
- Lightweight and fast
- Mixtral model uses mixture-of-experts for performance
- Open weights, good for local deployment
- Strong for RAG and dev workflows
Other Insights:
- What it’s good at: Custom deployment, developer workflows, fast inference
- Where it falls short: Not as polished without fine-tuning, lacks a UI
- Best use case: Developers building AI products, chatbots, or tools
- Where to use it: Hugging Face, Together.ai
#7: Command R+ (by Cohere)
Who made it: Cohere, based in Toronto, Canada
Trained on: Diverse corpora tuned for retrieval-augmented generation (RAG) and enterprise Q&A
Core features:
- Optimized for document retrieval and search
- Open weights available
- Good accuracy on domain-specific queries
- API-first enterprise integration
Other Insights:
- What it’s good at: Searching and summarizing internal knowledge, building RAG systems
- Where it falls short: Not designed for general chat or creative tasks
- Best use case: Knowledge bases, internal search, smart business tools
- Where to use it: cohere.com, Hugging Face
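To make “RAG” concrete, here’s a toy sketch of the loop Command R+ is tuned for: retrieve the most relevant snippets, then ask the model to answer only from them. Retrieval here is naive word overlap purely for illustration, and the sample documents are made up; a real system would use embeddings and Cohere’s API:

```python
import re

# Toy retrieval-augmented generation (RAG) loop. Retrieval is naive word
# overlap for illustration only; the sample documents are invented.
def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only these documents:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am to 5pm on weekdays.",
    "Shipping is free on orders over 50 dollars.",
]
print(build_rag_prompt("What is the refund policy?", docs))
```

The grounding step is the point: the model answers from retrieved text rather than from memory, which is what makes Command R+ a fit for internal knowledge bases.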
#8: Grok (by xAI)
Who made it: xAI, founded by Elon Musk, integrated with X (Twitter)
Trained on: X data, public web content, and trending events
Core features:
- Pulls in real-time info from X
- Sarcastic and informal tone
- Integrated inside the X platform
- Fast, edgy, sometimes opinionated
Other Insights:
- What it’s good at: Commentary on trending topics, pop culture, light conversational use
- Where it falls short: Lacks depth, accuracy, and general utility beyond X
- Best use case: Quick takes, current events, infotainment
- Where to use it: Available to X Premium subscribers
Final Thoughts
As AI models continue to improve, the focus is shifting. It’s no longer just about which LLM is the most powerful on paper – it’s about how we use them in the real world. Whether you’re writing code, drafting emails, researching complex topics, or building tools on top of open models, what matters most is how well a model fits your workflow.
I believe LLMs are becoming a commodity. Many are free, fast, and increasingly open. The real power now lies in application: in how we integrate, combine, and deploy these tools to solve real problems.
I’ll keep testing, comparing, and updating this page as things change.