Science

Biomedical researchers are using AI to write and run code. What are the risks?

By News Room | April 6, 2026

As the general public has embraced large language models (LLMs) such as ChatGPT, Claude and Gemini, scientists have been exploring how these artificial intelligence (AI) tools could enhance medical research.

Some argue that LLMs could dramatically boost researchers’ efficiency in completing certain types of medical studies, and research published in February in the journal Cell Reports Medicine exemplifies that vision for the technology.

The study used massive datasets of patient biomedical information to predict the risk of preterm birth in a given pregnancy. These types of predictions have been a powerful AI use case for years and were already possible with more traditional machine learning techniques that predate LLMs. But this study was notable in that LLMs enabled junior researchers — a graduate student and a high school student — to efficiently generate highly accurate code.


That code predicted a baby’s gestational age at birth and the likelihood of preterm birth. The AI’s output matched and, in one case, even beat analyses from expert teams who had used human-generated code to crunch the same data.

“What I saw with junior scientists here and how effective they could be truly inspired and amazed me,” said study co-author Marina Sirota, interim director of the Baker Computational Health Sciences Institute at the University of California, San Francisco.

One big promise of LLMs is to lower the barrier for researchers to produce code and conduct complex analyses — but it comes with risks. As AI quickly improves, researchers must grapple with myriad questions. What guardrails need to be established to ensure AI’s accuracy? How do we measure its output? And how will the role of human researchers evolve as these systems gain prominence?

How AI prediction works

Sirota’s team drew on data used in the Dialogue for Reverse Engineering Assessments and Methods (DREAM) Challenges, international competitions in which teams of scientists tackle complex biomedical problems using shared datasets.

The open-source datasets included blood transcriptomics, which looks at RNA, a molecule that reflects which genes are active in the body; epigenetic information from placental cells, describing chemical tags that sit “on top of” DNA and control which genes can be switched on; and microbiome data describing the bacteria present in vaginal fluid samples.

These data points were flagged with the type of sample they came from — blood, placental tissue or vaginal fluid — and labeled with outcomes of interest, namely gestational age and preterm birth. Machine learning algorithms can then be trained to spot links between a sample’s measurements and its label. For example, they may reveal that microbiome samples with certain mixes of bacteria often come from people who have given birth early.

Once trained on a subset of data, the algorithm can be tested on samples that lack labels, to see if it can predict the label that should be there. For instance, it should flag samples with bacterial mixes similar to those in the training data linked to a higher risk of preterm birth.
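This train-then-test loop can be sketched with synthetic data. This is a minimal illustration, not the study's actual pipeline: the features, labels, and choice of model here are all invented.

```python
# Illustrative sketch: train a classifier on synthetic "microbiome-like"
# features, then test it on held-out samples whose labels it has not seen.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 200 samples x 10 hypothetical bacterial-abundance features
X = rng.random((200, 10))
# Synthetic label: pretend high values of feature 0 track preterm birth
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0.7).astype(int)

# Train on one subset of the data, hold the rest out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
# Predict the labels that "should be there" for the unseen samples
predictions = model.predict(X_test)
```

The held-out predictions can then be compared against the true labels to score the model, which is the evaluation step described next.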


The final step is to evaluate the models’ accuracy and compare them. “Accuracy” in the context of machine learning has a specific definition: the number of correct predictions divided by the total number of predictions.
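In code, that definition is just a ratio of correct predictions to total predictions (a minimal sketch; the example labels are made up):

```python
def accuracy(predicted, actual):
    """Correct predictions divided by total predictions."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# 4 of these 5 predictions match the true labels -> accuracy of 0.8
print(accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))  # 0.8
```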

Human- vs. AI-generated code

The DREAM Challenge was aimed at uncovering links between these medical metrics and the risk of preterm birth. Some risk factors, including having infections during pregnancy, are already well known. But the DREAM Challenge wanted to see what signals might be gleaned from clinical samples, like blood.

It’s the kind of work that normally demands months of effort from trained bioinformaticians. But instead of writing the analysis code themselves, the junior researchers in the recent study gave each of eight LLMs a single prompt describing the data available and the labeling task at hand: predicting gestational age or preterm birth.

LLMs tested

  • ChatGPT o3-mini-high
  • ChatGPT 4o
  • DeepSeek R1
  • Gemini 2.0 Flash Thinking Experimental
  • Qwen 2.5 Coder
  • Llama 3.2
  • Phi-4
  • DeepSeek-R1-Distill-Qwen

With this simple prompting, four of the eight models — DeepSeek R1, Gemini, and ChatGPT’s o3-mini-high and 4o — produced code that ran successfully. The best performer, OpenAI’s o3-mini-high, was as accurate as the original human DREAM Challenge teams. For one task, which involved estimating gestational age from epigenetic data, it was more accurate than humans had been.
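For illustration, here is a sketch of what such a single descriptive prompt might look like. The wording, the function name `build_prompt`, and the data descriptions are invented for this example, not taken from the study.

```python
# Hypothetical one-shot prompt construction, loosely modeled on the idea of
# describing the available data and the labeling task in a single message.
def build_prompt(data_description: str, target: str) -> str:
    return (
        "You are assisting with a biomedical machine-learning analysis.\n"
        f"Available data: {data_description}\n"
        f"Task: write complete, runnable Python code that trains a model to "
        f"predict {target}, evaluates it on held-out data, and reports accuracy."
    )

prompt = build_prompt(
    "blood transcriptomics, placental epigenetics, and vaginal microbiome tables",
    "gestational age at birth",
)
```

The resulting string would then be sent, unmodified, to each model under comparison, so that every LLM starts from the same task description.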

What’s more, the junior researchers generated results in about three months and submitted a manuscript describing them within six months, whereas the same process took the original DREAM Challenge teams years.

“We got lucky with the review process here, but six months to generate the results and write the paper is pretty incredible, especially for a junior scientist,” Sirota told Live Science.

Preterm birth, defined as birth before 37 completed weeks of pregnancy, affects roughly 11% of infants worldwide. Babies born too early are at higher risk than full-term babies for a host of health troubles, including but not limited to problems affecting their brains, eyes and digestive systems. Being able to predict which pregnant patients are more likely to give birth early could mean closer monitoring and treatments to protect the baby and make full-term birth more likely, experts say.

Beyond writing code

The data used in the Cell Reports Medicine paper started “in good shape,” Sirota noted, in tables that AI could easily read. “But we can speed that up as well — the cleaning part and normalization of data — with generative AI,” she said.

Sirota’s team is now exploring other LLM applications, including a new tool called Chat PTB (short for “preterm birth”) that they’ve developed. The ChatGPT-based tool is embedded in papers published by the March of Dimes research network, part of a nonprofit aimed at improving maternal and infant health. Instead of manually combing through this literature, researchers can now query Chat PTB and get synthesized answers with references — a task that used to take hours, compressed into seconds.

But tools like Chat PTB and the code-writing approach in Sirota’s study represent only the first wave. AI-enhanced medical research is moving toward “agentic” AI, meaning systems that don’t just respond to a single prompt but instead carry out multistep research workflows with increasing autonomy.

How might AI affect the workflow of biomedical research? (Image credit: Getty Images/Moor Studio)

Instead of responding with only text, an agentic system is capable of checking and iterating on its own work until it reaches its objective. It can also take action on a user’s behalf, like searching the internet and running code, rather than just writing it.

That shift toward greater AI autonomy and less human oversight brings both enormous potential and serious risk. In a January study published in the journal Nature Biomedical Engineering, researchers evaluated LLMs on 293 coding tasks drawn from 39 published biomedical studies, initially allowing the LLMs to come up with workflows on their own. They found that the overall accuracy came in below 40%.

Their solution was to separate planning from execution: They had the AI produce a step-by-step analysis plan that a human researcher reviewed before any code got written. The approach boosted the accuracy to 74%.
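That plan-review-execute separation can be sketched as a simple gate. This is my own minimal framing of the idea, not the paper's implementation: the function names and plan steps below are invented.

```python
# Sketch of separating planning from execution: the model drafts a plan,
# a human reviews it, and execution proceeds only after approval.
def run_with_review(draft_plan, human_approves, execute):
    plan = draft_plan()            # AI proposes a step-by-step analysis plan
    if not human_approves(plan):   # researcher inspects the steps first
        raise RuntimeError("Plan rejected; revise before any code is written")
    return execute(plan)           # only an approved plan is carried out

result = run_with_review(
    draft_plan=lambda: ["load data", "split train/test", "fit model", "report accuracy"],
    human_approves=lambda plan: "report accuracy" in plan,
    execute=lambda plan: f"executed {len(plan)} steps",
)
```

The key design point is that the intermediate plan is a human-readable artifact, so supervision happens before any code runs rather than after.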

“The goal is not to ask researchers to blindly trust an AI system,” study co-author Zifeng Wang, who was a doctoral student at the University of Illinois Urbana-Champaign at the time of the study, told Live Science in an email.

Instead, the goal is to “design frameworks where the reasoning, planning, and intermediate steps are visible enough that researchers can supervise and validate the process,” said Wang, who is a co-founder of Keiji AI.

Why safeguards matter

These risks don’t mean researchers should shy away from AI, but they do need to apply the same rigor to AI-generated work that they would to any other collaborator’s output, scientists caution.

“The question is not whether LLMs accelerate science or create ‘AI slop,'” Ian McCulloh, a professor of computer science at Johns Hopkins University’s Whiting School of Engineering, told Live Science in an email. “The question is how we leverage this powerful technology within the scientific method.”

But McCulloh also cautioned against holding AI to an impossible standard. People tend to assume AI is error-prone and downplay human error, he said, when, in reality, both humans and machines make mistakes. He anecdotally described a consulting client who lamented AI’s 15% miss rate on a certain task, not realizing his human employees’ miss rate was 25%.

“The goal of AI is not perfection,” McCulloh said, “but to do better than people.”

That effort will involve agreeing on how to measure AI’s success. Dr. Ethan Goh, a physician-researcher at Stanford University, pointed out that health care still lacks standardized benchmarks for evaluating AI’s performance. Goh recently published a randomized trial in JAMA Network Open that studied how LLMs influence doctors’ reasoning in determining diagnoses.

Because LLMs are trained on such a vast amount of data, “benchmarks are so expensive to produce,” Goh told Live Science. What’s more, he said, AI improves so quickly that most commercial models start beating the few benchmarks that exist and rapidly render them useless. Amid these challenges, Goh’s team at Stanford’s AI Research and Science Evaluation (ARISE) Healthcare Network is working to develop such standards by the end of this year.

For all the uncertainty around standards and safeguards, the researchers who spoke with Live Science shared a common conviction: AI belongs in the lab, but not unsupervised.

“We have to be careful not to forget what we know in terms of the scientific process,” Sirota said. “But I think the opportunity is tremendous.”
