“We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.
State-of-the-art AIs get <10% accuracy and are highly overconfident.”
@ai_risk @scaleai