Coding Language Performance Bench

News

Self-invoking code benchmarks help you decide which LLMs to use for ...

As large language models (LLMs) continue to improve at coding, the benchmarks used to evaluate their performance are steadily becoming less useful. That’s because though many LLMs have similar ...

OpenAI unveils GPT-5, a new flagship AI model with high accuracy and coding power

GPT-5 is rolling out today as the new default model for signed-in ChatGPT users, replacing GPT-4o. It auto-switches between ...

15 days ago

Free Qwen 3 Coder AI Coding Assistant: Insanely Powerful and Open Source

Discover Qwen 3 Coder, Alibaba’s open-source LLM with 480B parameters, transforming AI coding with speed, precision, and ...

InfoWorld, 4 days ago

Why benchmarks are key to AI progress

Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world ...

Claude Opus 4.1 AI Released: Code, Translate and Solve Advanced Problems

Explore Claude Opus 4.1, Anthropic’s groundbreaking new AI model with advanced coding, multilingual, and problem-solving ...

Design And Reuse, 14 days ago

SWE-Bench-C Evaluation Framework

Software engineering (SWE) encompasses a wide range of activities including requirements analysis, design, code development, testing, deployment, and maintenance. These tasks constitute a significant ...

2 days ago on MSN

OpenAI unveils GPT-5 model, featuring improved coding and problem-solving chops

The powerful new multi-modal model boosts reasoning, code generation, and real-time inference, and will be available to both free and paid users. OpenAI on Thursday unveiled its highly anticipated GPT ...

OpenAI, Anthropic release new reasoning-optimized language models

OpenAI’s new models, gpt-oss-120b and gpt-oss-20b, are available under an open-source license. Anthropic, for its part, ...

VentureBeat, 10 months ago

Microsoft’s GRIN-MoE AI model takes on coding and math, beating ...

GRIN MoE, Microsoft’s new AI model, achieves high performance on the MMLU benchmark with just 6.6 billion activated parameters, outperforming comparable models like Mixtral and LLaMA 3 70B.
