News

As large language models (LLMs) continue to improve at coding, the benchmarks used to evaluate their performance are steadily becoming less useful. That’s because, though many LLMs have similar ...
GPT-5 is rolling out today as the new default model for signed-in ChatGPT users, replacing GPT-4o. It auto-switches between ...
Discover Qwen 3 Coder, Alibaba’s open-source LLM with 480B parameters, transforming AI coding with speed, precision, and ...
Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world ...
Explore Claude Opus 4.1, Anthropic’s groundbreaking new AI model with advanced coding, multilingual, and problem-solving ...
Software engineering (SWE) encompasses a wide range of activities including requirements analysis, design, code development, testing, deployment, and maintenance. These tasks constitute a significant ...
The company also launched a command-line tool based on Gemini Code, optimized for agentic coding and compatible with popular ...
While Dębiak won 500,000 yen and survived his ordeal better than the legendary steel driver, the AtCoder World Tour Finals ...
GRIN MoE, Microsoft’s new AI model, achieves high performance on the MMLU benchmark with just 6.6 billion activated parameters, outperforming comparable models like Mixtral and LLaMA 3 70B.
OpenAI’s new models, gpt-oss-120b and gpt-oss-20b, are available under an open-source license. Anthropic, for its part, ...
But one version of *Scan caused a memory regression of 8% in Speedometer2 browser performance benchmark tests. *Scan in the render process regressed memory consumption by about 12%, Google notes.