AI reasoning models like o3 and R1 generate up to 50 times more CO₂ than conventional LLMs

The more accurate we try to make AI models, the bigger their carbon footprint — with some prompts producing up to 50 times more carbon dioxide emissions than others, a new study has revealed.

Reasoning models, such as Anthropic’s Claude, OpenAI’s o3 and DeepSeek’s R1, are specialized large language models (LLMs) that dedicate more time and computing power to produce more accurate responses than their predecessors.

Yet, aside from some impressive results, these models have been shown to face severe limitations in their ability to crack complex problems. Now, a team of researchers has highlighted another constraint on the models’ performance — their exorbitant carbon footprint. They published their findings June 19 in the journal Frontiers in Communication.

“The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach, with explicit reasoning processes significantly driving up energy consumption and carbon emissions,” study first author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences in Germany, said in a statement. “We found that reasoning-enabled models produced up to 50 times more CO₂ emissions than concise response models.”

To answer the prompts given to them, LLMs break up language into tokens, word chunks that are converted into a string of numbers before being fed into neural networks. These neural networks are tuned on training data to estimate the probabilities of certain patterns appearing. They then use these probabilities to generate responses.
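The tokenization and probability steps described above can be illustrated with a toy sketch. It is not how any of the models in the study work internally: real LLMs use learned subword tokenizers and neural networks with billions of parameters. The functions `tokenize` and `softmax`, the whitespace splitting, and the example scores are all assumptions made for illustration.

```python
import math

def tokenize(text, vocab):
    """Map each whitespace-separated chunk to an integer id.

    Toy stand-in for a real subword tokenizer: unseen chunks are
    added to the vocabulary as they appear.
    """
    ids = []
    for chunk in text.lower().split():
        if chunk not in vocab:
            vocab[chunk] = len(vocab)
        ids.append(vocab[chunk])
    return ids

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = {}
ids = tokenize("the cat sat on the mat", vocab)
print(ids)  # [0, 1, 2, 3, 0, 4]

# Hypothetical scores for three candidate next tokens.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The repeated word "the" maps to the same id both times, and the highest-scoring candidate ends up with the highest probability.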

Reasoning models further attempt to boost accuracy using a process known as “chain-of-thought.” This is a technique that works by breaking down one complex problem into smaller, more digestible intermediary steps that follow a logical flow, mimicking how humans might arrive at the conclusion to the same problem.


However, these models have significantly higher energy demands than conventional LLMs, posing a potential economic bottleneck for companies and users wishing to deploy them. Yet, despite some research into the environmental impacts of growing AI adoption more generally, comparisons between the carbon footprints of different models remain relatively rare.

The cost of reasoning

To examine the CO₂ emissions produced by different models, the scientists behind the new study asked 14 LLMs 1,000 questions across different topics. The different models had between 7 and 72 billion parameters.

The computations were performed using the Perun framework (which analyzes LLM performance and the energy it requires) on an NVIDIA A100 GPU. The team then converted energy usage into CO₂ emissions by assuming that each kilowatt-hour of energy produces 480 grams of CO₂.
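The energy-to-emissions conversion described above is a single multiplication. The sketch below uses the 480 grams per kilowatt-hour figure from the article; the function name `energy_to_co2_grams` and the example energy value are hypothetical and do not come from the study.

```python
GRAMS_CO2_PER_KWH = 480.0  # emission factor stated in the article

def energy_to_co2_grams(energy_kwh, grams_per_kwh=GRAMS_CO2_PER_KWH):
    """Convert measured energy use (kWh) into grams of CO2."""
    if energy_kwh < 0:
        raise ValueError("energy_kwh must be non-negative")
    return energy_kwh * grams_per_kwh

# Hypothetical measurement, not a figure from the study.
example_kwh = 0.25
print(energy_to_co2_grams(example_kwh))  # 120.0
```

Passing a different `grams_per_kwh` shows how the same workload would score on a cleaner or dirtier energy grid.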

Their results show that, on average, reasoning models generated 543.5 tokens per question, compared with just 37.7 tokens for more concise models. These extra tokens, which amount to more computation, meant that the more accurate reasoning models produced more CO₂.
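To put the two averages side by side, the short calculation below divides one by the other. The token counts are the figures reported above; the function name `token_ratio` is hypothetical. The result describes output length only and is not itself an emissions figure.

```python
REASONING_TOKENS = 543.5  # average tokens per question, reasoning models
CONCISE_TOKENS = 37.7     # average tokens per question, concise models

def token_ratio(reasoning_tokens, concise_tokens):
    """Return how many times more tokens the reasoning models generated."""
    return reasoning_tokens / concise_tokens

ratio = token_ratio(REASONING_TOKENS, CONCISE_TOKENS)
print(f"{ratio:.1f}x")  # 14.4x
```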

The most accurate model was the 72 billion parameter Cogito model, which answered 84.9% of the benchmark questions correctly. Cogito produced three times the CO₂ emissions of similarly sized models designed to generate more concise answers.

“Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies,” said Dauner. “None of the models that kept emissions below 500 grams of CO₂ equivalent [total greenhouse gases released] achieved higher than 80% accuracy on answering the 1,000 questions correctly.”

But the issues go beyond accuracy. Questions that needed longer reasoning times, such as those in algebra or philosophy, caused emissions to spike six times higher than straightforward look-up queries did.

The researchers’ calculations also show that the emissions depended on the models that were chosen. To answer 60,000 questions, DeepSeek’s 70 billion parameter R1 model would produce the CO₂ emitted by a round-trip flight between New York and London. Alibaba Cloud’s 72 billion parameter Qwen 2.5 model, however, would be able to answer these with similar accuracy rates for a third of the emissions.

The study’s findings aren’t definitive; emissions may vary depending on the hardware used and the energy grids that supply its power, the researchers emphasized. But they should prompt AI users to think before they deploy the technology, the researchers noted.

“If users know the exact CO₂ cost of their AI-generated outputs, such as casually turning themselves into an action figure, they might be more selective and thoughtful about when and how they use these technologies,” Dauner said.
