Awesome AI Benchmarks & Evaluation
Evaluation tools, benchmark datasets, leaderboards, frameworks, and resources for assessing model performance across reasoning, safety, robustness, multimodality, RAG, LLMs, and traditional ML tasks.
Repository: https://github.com/awesomelistsio/awesome-ai-benchmarks-evaluation