Sebastian Crossa is the Co-founder of ZeroEval (YC S25), a platform to measure and optimize the quality of AI agents.
Coral Protocol’s multi-agent system achieved high performance on the GAIA Benchmark, with internal testing indicating a potential 34% performance gain. This result suggests an alternative to vertical ...
There are a lot of AI models, and it can be tricky to know which are best. Tech companies often use "benchmarks" to measure how an AI model performs. But industry observers are becoming increasingly ...
OpenAI's AI coding agent, Codex, can now spend anywhere from a few seconds to several hours on a task, thanks to a new, ...
A popular benchmark for measuring the performance of artificial intelligence models could be flawed, a group of Meta ...