Anthropic released Claude 4 Opus last month, and OpenAI followed with GPT-5 two weeks later. I have been testing both daily for work, and here are my honest impressions after real usage, not benchmark hype.
Claude 4 Opus:
- The extended thinking is genuinely different. You can watch it reason through multi-step problems in a way that feels less like autocomplete and more like a colleague working through something.
- Coding ability is a significant jump. I gave it a 400-line React component with a subtle race condition and it found it in one pass. Claude 3.5 Sonnet would have missed it.
- The 1M token context window is real. I loaded an entire codebase and asked it to trace a bug across 12 files. It did it correctly.
- Weakness: still hallucinates on niche library APIs. If you are using something with under 1000 GitHub stars, verify everything.
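For readers wondering what a "subtle race condition" in a React component typically looks like: the original 400-line component was not shared, but the classic version of this bug is a stale-response race, where two overlapping async requests resolve out of order and the older one overwrites the newer result. A minimal, framework-free sketch (the `searchBuggy`/`searchFixed` names and the fake fetch are illustrative, not from the post):

```javascript
// Simulate a network call with a controllable delay.
function fakeFetch(query, delayMs) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`results for ${query}`), delayMs)
  );
}

// Buggy: whichever request resolves LAST wins, even if it is stale.
async function searchBuggy(state, query, delayMs) {
  const result = await fakeFetch(query, delayMs);
  state.display = result; // no check that this request is still the latest
}

// Fixed: tag each request with an id and ignore out-of-date responses.
async function searchFixed(state, query, delayMs) {
  const id = ++state.latestId;
  const result = await fakeFetch(query, delayMs);
  if (id === state.latestId) state.display = result;
}

async function demo() {
  // User types "ab", then "abc"; the older request happens to be slower.
  const buggy = { display: null };
  await Promise.all([
    searchBuggy(buggy, "ab", 50),
    searchBuggy(buggy, "abc", 10),
  ]);

  const fixed = { display: null, latestId: 0 };
  await Promise.all([
    searchFixed(fixed, "ab", 50),
    searchFixed(fixed, "abc", 10),
  ]);

  return { buggy: buggy.display, fixed: fixed.display };
}
```

Running `demo()` shows the buggy state ending up with the stale `"ab"` results while the fixed state keeps `"abc"`. In a real React component the same fix is usually done with a cleanup flag or `AbortController` inside `useEffect`.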
GPT-5:
- Multimodal is where it shines. Image understanding, audio processing, and video analysis in a single model. I fed it a whiteboard photo from a meeting and it generated accurate user stories.
- The reasoning mode is competitive with Claude but feels more rigid. It follows instructions precisely but sometimes misses the spirit of what you are asking.
- The voice mode is genuinely useful now. I had a 20-minute conversation debugging an architecture problem while driving.
- Weakness: the pricing. GPT-5 Pro is $200/month. That is a tough sell when Claude 4 does 80% of the same things at a lower price.
My workflow now:
- Claude 4 for coding, writing, and analysis
- GPT-5 for multimodal tasks and voice interactions
- Still using Gemini 2.5 for quick throwaway queries because the free tier is generous
Sources:
- Anthropic blog — Claude 4 release notes
- OpenAI blog — GPT-5 technical report
- Personal usage across 4 weeks of daily testing
What is your AI stack in 2026?
Hot take: the benchmarks do not matter anymore. Every frontier model is good enough for 95% of tasks. The differentiator is UX and integration, not raw capability.