News
OpenAI's gpt-oss models deliver real-world performance without requiring expensive infrastructure. Do hallucination scores ...
According to the official statement, both models also perform strongly on tool use, few-shot function calling, CoT reasoning (as seen in results on the Tau-Bench agentic evaluation suite), and HealthBench ...