AI & Machine Learning

New AI model dropped — first impressions of Claude 4 Opus and GPT-5

Anthropic released Claude 4 Opus last month, and OpenAI followed with GPT-5 two weeks later. I have been testing both daily for work; here are my honest impressions from real usage, not benchmark hype.

Claude 4 Opus:

  • The extended thinking is genuinely different. You can watch it reason through multi-step problems in a way that feels less like autocomplete and more like a colleague working through something.
  • Coding ability is a significant jump. I gave it a 400-line React component with a subtle race condition and it found it in one pass. Claude 3.5 Sonnet would have missed it.
  • The 1M token context window is real. I loaded an entire codebase and asked it to trace a bug across 12 files. It did it correctly.
  • Weakness: still hallucinates on niche library APIs. If you are using something with under 1000 GitHub stars, verify everything.
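
The "verify everything" advice above can be partly automated. As a minimal sketch (my own helper, not anything from Anthropic or OpenAI), you can at least confirm that a model-suggested function actually exists in the installed library before trusting it:

```python
import importlib

def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if the dotted attribute path exists in the module.

    Catches hallucinated function names; it does NOT validate
    signatures or behavior, so it is only a first-pass filter.
    """
    try:
        obj = importlib.import_module(module_name)
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError):
        return False
    return True

# A real stdlib call the model might suggest:
print(api_exists("json", "dumps"))       # True
# A plausible-sounding call that does not exist:
print(api_exists("json", "dumps_fast"))  # False
```

This only checks existence, so you still need to read the docs for anything obscure, but it catches the most common failure mode: a confidently invented function name.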

GPT-5:

  • Multimodal is where it shines. Image understanding, audio processing, and video analysis in a single model. I fed it a whiteboard photo from a meeting and it generated accurate user stories.
  • The reasoning mode is competitive with Claude but feels more rigid. It follows instructions precisely but sometimes misses the spirit of what you are asking.
  • The voice mode is genuinely useful now. I had a 20-minute conversation debugging an architecture problem while driving.
  • Weakness: the pricing. GPT-5 Pro is $200/month. That is a tough sell when Claude 4 does 80% of the same things at a lower price.

My workflow now:

  • Claude 4 for coding, writing, and analysis
  • GPT-5 for multimodal tasks and voice interactions
  • Still using Gemini 2.5 for quick throwaway queries because the free tier is generous

Sources:

  • Anthropic blog — Claude 4 release notes
  • OpenAI blog — GPT-5 technical report
  • Personal usage across 4 weeks of daily testing

What is your AI stack in 2026?

Source: Community Report (automated). Published: Apr 4, 2026, 2:39 AM

5 Comments

Hot take: the benchmarks do not matter anymore. Every frontier model is good enough for 95% of tasks. The differentiator is UX and integration, not raw capability.

I work at a DFW defense contractor and we are not allowed to use any of these. Everything has to be on-prem models. The gap between what we use internally and what is available commercially is painful.

The pricing war is heating up. Gemini 2.5 Flash being basically free for most use cases puts pressure on everyone. Google is playing the long game.

GPT-5 voice mode while driving is the sleeper feature nobody talks about. I plan my entire day during my commute on 635 now. It is like having a chief of staff in my car.

Claude 4 Opus genuinely changed my workflow. I am a backend engineer at a DFW fintech company and the codebase understanding is insane. Loaded our entire repo and it found a production bug we had been hunting for two weeks.