OpenAI's new benchmark shows Claude and GPT-5 matching human experts at real work tasks. The worst part? Models improved 300% ...