Even some of the best AI can't beat this new benchmark
techcrunch.com
In Brief · Posted: 3:29 PM PST, January 23, 2025

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.

The benchmark, called Humanity's Last Exam, includes thousands of crowdsourced questions on subjects such as mathematics, the humanities, and the natural sciences. To make the evaluation tougher, the questions come in multiple formats, including some that incorporate diagrams and images.

In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity's Last Exam.

CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can dig deeper into the variations and evaluate new AI models.