A growing collection of curated awesome lists covering programming, AI, science, sustainability, digital rights, and beyond.

Awesome AI Benchmarks & Evaluation

Evaluation tools, benchmark datasets, leaderboards, frameworks, and other resources for assessing the performance of LLMs, RAG systems, and traditional ML models across reasoning, safety, robustness, and multimodal tasks.