Awesome AI Benchmarks & Evaluation
Evaluation tools, benchmark datasets, leaderboards, frameworks, and resources for assessing model performance across reasoning, safety, robustness, multimodality, RAG, LLMs, and traditional ML tasks.
Repository: https://github.com/awesomelistsio/awesome-ai-benchmarks-evaluation