USA Times
Acing this new AI exam — which its creators say is the toughest in the world — might point to the first signs of AGI
Science

By News Room | February 27, 2026

Researchers at the Center for AI Safety and Scale AI have published “Humanity’s Last Exam” — a test designed to measure how close today’s most powerful artificial intelligence (AI) models are to meeting or exceeding human-level knowledge across several domains.

The test was launched in January 2025, but scientists outlined the framework and their thinking behind its design for the first time in a new study published Jan. 28 in the journal Nature. It contains a corpus of 2,500 questions across more than 100 subjects, with input from more than 1,000 subject-matter experts from 500 institutions across 50 countries.

The exam consists of multiple-choice and short-answer questions, each of which has a known solution that is “unambiguous and easily verifiable but cannot be quickly answered by internet retrieval.”


At launch, the researchers tested OpenAI’s GPT-4o and o1 models, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet and DeepSeek R1. OpenAI’s o1 system notched the top spot with a score of just 8.3%.

Despite this poor performance, the researchers wrote at the time that “given the rapid pace of AI development, it is plausible that models could exceed 50% accuracy on HLE by the end of 2025.”

As of Feb. 12, 2026, the highest score achieved so far is 48.4%, set by Google’s Gemini 3 Deep Think. Human experts, meanwhile, score around 90% in their respective domains.

Testing the smartest machines in the world

Humanity’s Last Exam was intentionally designed to be extremely difficult for AI models. During early development, the researchers put out a global call for submissions from subject matter experts across numerous domains.

The researchers enforced strict submission criteria requiring questions to be precise, unambiguous, solvable and non-searchable. They didn’t want models to cheat by performing a simple web search, and they didn’t want any question to already appear online, since that would increase the likelihood that a given model had seen the answer in its training dataset.

Each question submitted was then fed to the AI models. The team automatically rejected any questions the models could answer correctly.

More than 70,000 submissions were attempted, resulting in approximately 13,000 questions that stumped LLMs. These were then vetted by a team of subject matter experts, approved by the research team, and presented to the scientific community for open feedback.
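
The automatic rejection step described above can be pictured with a minimal Python sketch. It is not the researchers' code: the `Model` type, the `normalize` and `keep_unsolved` helpers, the exact-match comparison, and the toy questions are all assumptions made for illustration. It also assumes a question is rejected as soon as any one model answers it correctly, which the article does not spell out.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "model" is any function from a question prompt to an answer string.
Model = Callable[[str], str]


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def keep_unsolved(questions: List[Dict[str, str]], models: List[Model]) -> List[Dict[str, str]]:
    """Keep only the questions that none of the given models answers correctly."""
    survivors = []
    for question in questions:
        expected = normalize(question["answer"])
        # Assumption: one correct answer from any model is enough to reject the question.
        solved = any(normalize(model(question["prompt"])) == expected for model in models)
        if not solved:
            survivors.append(question)
    return survivors


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    questions = [
        {"prompt": "What is 2 + 2?", "answer": "4"},
        {"prompt": "A question no toy model can answer", "answer": "some expert answer"},
    ]
    always_four: Model = lambda prompt: "4"
    always_unsure: Model = lambda prompt: "I am not sure"
    for q in keep_unsolved(questions, [always_four, always_unsure]):
        print(q["prompt"])
```

Only questions that survive a filter of this kind would move on to the human review stages.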


Ultimately, the researchers narrowed the total submissions down to 2,500 questions that generally fall within the realm of PhD-level testing.
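
For a rough sense of how selective this funnel was, the reported figures can be turned into percentages. The sketch below treats "more than 70,000" as exactly 70,000 and "approximately 13,000" as exactly 13,000, so the results are approximations (and, where 70,000 is the denominator, upper bounds); the `share` helper is a hypothetical name used only here.

```python
def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


submitted = 70_000  # "more than 70,000" in the article, so treated as a lower bound
stumped = 13_000    # "approximately 13,000" questions that stumped the models
final = 2_500       # questions in the published exam

print(f"Stumped the models: {share(stumped, submitted):.1f}% of submissions")
print(f"Survived expert review: {share(final, stumped):.1f}% of stumping questions")
print(f"Reached the final exam: {share(final, submitted):.1f}% of submissions")
```

Under these assumptions, fewer than one in five submissions stumped the models, and fewer than one in 25 reached the final exam.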

An example of a trivia question in the exam is: “In Greek mythology, who was Jason’s maternal great-grandfather?”

Meanwhile, an example physics question describes a block that slides without friction along a horizontal rail while attached to a rigid, massless rod of unknown length, and asks how the forces involved relate to one another during the motion.

The breadth of questions and scope of subjects covered by Humanity’s Last Exam sets it apart from similar benchmarking tools, its creators say.

Common tests, such as the Massive Multitask Language Understanding (MMLU) dataset, which was authored with participation from Center for AI Safety founder Dan Hendrycks, only test a small subset of expert-level domain knowledge, primarily focusing on coding and mathematics.

Even state-of-the-art benchmarks such as Francois Chollet’s ARC-AGI suite struggle to outpace the memorization and searchability problems that the creators of Humanity’s Last Exam suggest the new test addresses. Google’s Gemini 3 Deep Think, for example, achieved 84.6% on the ARC-AGI-2 benchmark just a week after failing to reach 50% on HLE.

The ultimate prize is general intelligence

Humanity’s Last Exam likely represents the AI world’s best attempt to date at measuring the broad-spectrum capabilities of modern AI models relative to human experts, but the study’s authors state plainly that a high score on HLE would not, on its own, indicate the arrival of artificial general intelligence (AGI).

“High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or artificial general intelligence,” the scientists said in the study.

“Doing well on HLE is a necessary, but not a sufficient criterion to say that machines have reached true intelligence,” Manuel Schottdorf, a neuroscientist at the University of Delaware’s Department of Psychological and Brain Sciences, said in a recent statement. Schottdorf is one of the many experts who had a question accepted into the HLE corpus.

“They will have to be good enough to solve these questions, but that as a fact alone can’t allow us to conclude that machines are truly intelligent.”

© 2026 USA Times. All Rights Reserved.