
Study Accuses LM Arena of Helping Top AI Labs Game Its Benchmark

Thursday, May 1, 2025, 03:00 PM, from Slashdot
An anonymous reader shares a report: A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals.

According to the authors, LM Arena allowed a few industry-leading AI companies, including Meta, OpenAI, Google, and Amazon, to privately test several variants of their AI models and then withhold the scores of the lowest performers. This made it easier for those companies to claim a top spot on the platform's leaderboard, while the same opportunity was not afforded to every firm, the authors say.
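The statistical effect behind this kind of gaming can be illustrated with a small simulation. This is not LM Arena's actual scoring system; it is a hypothetical sketch in which benchmark scores are modeled as a lab's true skill plus random noise, and a lab that privately tests many variants reports only its best result:

```python
import random

def reported_score(true_skill: float, n_private_variants: int,
                   noise: float = 5.0, seed: int = 0) -> float:
    """Simulate private testing: score n variants of the same model
    (true skill plus random variation) and report only the best one."""
    rng = random.Random(seed)
    scores = [rng.gauss(true_skill, noise) for _ in range(n_private_variants)]
    return max(scores)

# Two labs with identical underlying skill (an Elo-like rating of 1200):
# one submits a single model, the other privately tests 20 variants
# and publishes only its best score.
single = reported_score(1200, n_private_variants=1, seed=42)
best_of_20 = reported_score(1200, n_private_variants=20, seed=42)
print(f"single submission:        {single:.1f}")
print(f"best of 20 private tests: {best_of_20:.1f}")
```

Because the maximum of many noisy draws is systematically higher than a single draw, the lab allowed more private tests looks stronger on the leaderboard even when the underlying models are no better.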

'Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others,' said Cohere's VP of AI research and co-author of the study, Sara Hooker, in an interview with TechCrunch. 'This is gamification.' Further reading: Meta Got Caught Gaming AI Benchmarks.

Read more of this story at Slashdot.
https://slashdot.org/story/25/05/01/0525208/study-accuses-lm-arena-of-helping-top-ai-labs-game-its-b...

News copyright owned by their original publishers | Copyright © 2004 - 2025 Zicos / 440Network