Google's Gemini 3 scores 48.4% on a difficult AI reasoning exam, but this does not indicate artificial general intelligence

livescience.com — February 27, 2026 at 09:01 PM UTC

Google's Gemini 3 scored 48.4% on "Humanity's Last Exam," a difficult AI test, but the result does not signal artificial general intelligence. The PhD-level exam, launched in January 2025, comprises 2,500 questions across more than 100 subjects and is designed to resist simple internet retrieval and to test deep reasoning. Human experts score around 90% on the exam, which aims to measure AI capabilities against human-level knowledge across diverse domains.

With a significance score of 4.8, this story ranks in the top 2.1% of today's 29,588 analyzed articles.
