LLM Leaderboard Platform

Liz

1. Platforms for LLM Evals

1.1. LMSYS

Organization:
LMSYS and UC Berkeley SkyLab

Evaluation method:
Chatbot Arena, a crowdsourced, randomized battle platform.
It evaluates LLMs by human preference in the real world.
Users ask any question to two anonymous models (e.g., ChatGPT, Gemini, Claude, Llama) and vote for the better one.

Evaluation result:
Arena Elo
Elo, short for Elo rating system, is named after its inventor, Hungarian-American physicist Arpad Elo. It was originally developed for ranking chess players in the 1960s.
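To make the rating idea concrete, here is a minimal sketch of the classic Elo update applied to pairwise votes between two models. The K-factor of 32, the starting rating of 1000, and the names `expected_score` and `update_elo` are assumptions chosen for illustration; the leaderboard's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won the vote, 0.0 if B won, and 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the leaderboard.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical votes: (model A, model B, score for A).
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b", 1.0), ("model_a", "model_b", 1.0), ("model_a", "model_b", 0.0)]
for a, b, score in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score)
```

Because each win against an equally rated opponent moves the rating by half of K, a model that keeps winning against higher-rated opponents climbs faster than one that only beats weaker ones.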

Website:
https://chat.lmsys.org/
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

1.2. LiveBench

Organization:
Abacus.AI

Properties:

LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses.
Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored accurately and automatically, without the use of an LLM judge.
LiveBench currently contains a set of 18 diverse tasks across 6 categories, and its maintainers plan to release new, harder tasks over time.
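The idea of scoring against objective ground truth without an LLM judge can be sketched as follows. This is a generic exact-match illustration, not LiveBench's actual scoring code; the function names `normalize` and `score_answers` and the sample question IDs are hypothetical, and real tasks would likely need task-specific comparison logic.

```python
def normalize(answer: str) -> str:
    """Lowercase the answer and collapse whitespace so formatting noise is ignored."""
    return " ".join(answer.strip().lower().split())


def score_answers(predictions: dict[str, str], ground_truth: dict[str, str]) -> float:
    """Return the fraction of questions whose prediction matches the ground truth.

    Questions with no prediction count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one question")
    correct = sum(
        normalize(predictions.get(question_id, "")) == normalize(truth)
        for question_id, truth in ground_truth.items()
    )
    return correct / len(ground_truth)


# Hypothetical data: one correct answer, one wrong answer.
truth = {"q1": "paris", "q2": "42"}
preds = {"q1": "  Paris ", "q2": "41"}
accuracy = score_answers(preds, truth)
```

Since the comparison is deterministic, the same predictions always receive the same score, which is the property that makes automatic grading of hard questions possible.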

Website:
https://livebench.ai/

1.3. Fine-tuning Index

The Fine-tuning Leaderboard compares the performance of GPT-4 with popular open-source models that Predibase fine-tuned across a series of tasks.

Remarkably, most of the fine-tuned open-source models surpass GPT-4, with Llama-3, Phi-3, and Zephyr demonstrating the strongest performance.

Website:
https://predibase.com/fine-tuning-index

1.4. SuperCLUE

A leaderboard for domestic (Chinese) LLMs.

Website:
https://www.superclueai.com/

2. LLM Organizations and Products

Organization	Product	Open Source	Location
Foreign
OpenAI	GPT	Closed	US, UK
Google	Gemini/Bard/Gemma/PaLM	Open	-
Anthropic	Claude	Closed	US, UK
Meta	Llama/Alpaca	Open	-
Microsoft	Phi/WizardLM/Bing	Open	-
Mistral	Mistral/Mixtral	Open	US, France
HuggingFace	Zephyr	Open	-
Cohere	Command R	Open	-
NousResearch	Nous/OpenHermes	Open	-
LMSYS	Vicuna/FastChat	-	-
Reka AI	Reka	Open	US, UK, Singapore
Nvidia	Nemotron/NV/ChipNeMo	Open	-
Nexusflow	Starling	Open	Palo Alto, CA
Databricks/MosaicML	DBRX/Dolly/MPT	Open	Many
OpenChat	OpenChat	-	-
Snowflake	Snowflake	Closed	-
UC Berkeley	Starling/Koala/Gorilla	Closed	-
Perplexity AI	pplx	Closed	-
Cognitive Computations	Dolphin	Open	Personal
Upstage AI	SOLAR	Open	Korea
TII	Falcon	Open	Abu Dhabi, UAE
Together AI	StripedHyena	Open	San Francisco
Allen AI	Tulu/OLMo	Open	Seattle, WA, United States
Nomic AI	GPT4All	Open	New York
RWKV	RWKV	Open	-
OpenAssistant	OpenAssistant	Open	-
Stability AI	StableLM	Open	Canada
Bloomberg	BloombergGPT	Closed	US, UK
inflection.ai	Inflection	Closed	San Francisco Bay Area
xAI (Elon Musk)	Grok	Closed	San Francisco Bay Area, California, US
Scale	Scale	Closed	San Francisco
Character AI	Character	Closed	Menlo Park, CA
Domestic
Alibaba	Qwen	Open	Hangzhou
Tsinghua/Zhipu AI	GLM/ChatGLM	Open	Beijing
Baichuan	Baichuan	Open	Beijing
ModelBest	CPM	Open	Beijing
01 AI	Yi	Open	Beijing
DeepSeek AI	DeepSeek	Open	Hangzhou
Colossal AI	Colossal	Open	Beijing
Moonshot	Moonshot	Closed	Beijing
Step	Step	Closed	Shanghai
MiniMax	ABAB	Closed	Shanghai
Baidu	ERNIE	Closed	Beijing
SenseTime	SenseChat	Closed	Shanghai
Bytedance	Doubao/Coze	Closed	Beijing
Tencent	Hunyuan	Closed	Shenzhen
360	360gpt	Closed	Beijing
XVERSE	XVERSE	Open	Shenzhen
