Biomedical researchers are using AI to write and run code. What are the risks?
Science

By News Room | April 6, 2026

As the general public has embraced large language models (LLMs) such as ChatGPT, Claude and Gemini, scientists have been exploring how these artificial intelligence (AI) tools could enhance medical research.

Some argue that LLMs could dramatically boost researchers’ efficiency in completing certain types of medical studies, and research published in February in the journal Cell Reports Medicine exemplifies that vision for the technology.

The study used massive datasets of patient biomedical information to predict the risk of preterm birth in a given pregnancy. These types of predictions have been a powerful AI use case for years, and were possible with traditional machine-learning methods that predate LLMs. But this study was notable in that LLMs enabled junior researchers — a graduate student and a high school student — to efficiently generate highly accurate code.


That code predicted a baby’s gestational age at birth and the likelihood of preterm birth. The AI’s output matched and, in one case, even beat analyses from expert teams who had used human-generated code to crunch the same data.

“What I saw with junior scientists here and how effective they could be truly inspired and amazed me,” said study co-author Marina Sirota, interim director of the Baker Computational Health Sciences Institute at the University of California, San Francisco.

One big promise of LLMs is to lower the barrier for researchers to produce code and conduct complex analyses — but it comes with risks. As AI quickly improves, researchers must grapple with myriad questions. What guardrails need to be established to ensure AI’s accuracy? How do we measure its output? And how will the role of human researchers evolve as these systems gain prominence?

How AI prediction works

Sirota’s team drew on data used in the Dialogue for Reverse Engineering Assessments and Methods (DREAM) Challenges, international competitions in which teams of scientists tackle complex biomedical problems using shared datasets.

The open-source datasets included blood transcriptomics, which measures RNA, a molecule that reflects which genes are active in the body; epigenetic information from placental cells, describing chemical tags that sit “on top of” DNA and control which genes can be switched on; and microbiome data describing the bacteria present in vaginal fluid samples.

These data points were tagged with the type of sample they came from — blood, placental tissue or vaginal fluid — and labeled with outcomes of interest, namely gestational age and preterm birth. Machine learning algorithms can then be trained to spot links between a sample’s molecular profile and its label. For example, they may reveal that microbiome samples with certain mixes of bacteria often come from people who have given birth early.

Once trained on a subset of data, the algorithm can be tested on samples that lack labels, to see if it can predict the label that should be there. For instance, it should flag samples with bacterial mixes similar to those in the training data linked to a higher risk of preterm birth.
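
The train-then-predict loop described above can be sketched in a few lines. Everything here is invented for illustration — the feature (a single bacterial fraction), the labels and the threshold rule are toy stand-ins, not the LLM-generated models or real microbiome data from the study.

```python
# Toy sketch of the train/test workflow: learn a pattern from labeled
# samples, then predict labels for held-out samples. All values invented.

# Each training sample: (fraction of a hypothetical risk-linked bacterium, label)
# label 1 = preterm birth, 0 = full-term
train = [(0.72, 1), (0.65, 1), (0.80, 1), (0.20, 0), (0.15, 0), (0.30, 0)]

def fit_threshold(samples):
    """'Train' by finding the midpoint between each class's mean feature value."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Flag an unlabeled sample as high-risk (1) if it exceeds the threshold."""
    return 1 if x > threshold else 0

threshold = fit_threshold(train)
# Held-out, unlabeled samples: the model predicts the label that should be there.
print([predict(threshold, x) for x in (0.75, 0.18)])  # -> [1, 0]
```

Real models use thousands of features and far more sophisticated algorithms, but the logic is the same: patterns learned from labeled training data are applied to unseen samples.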



The final step is to evaluate the models’ accuracy and compare them. “Accuracy” in the context of machine learning has a specific definition: the number of correct predictions divided by the total number of predictions.
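
That definition translates directly into code. The prediction and label lists below are made up for illustration:

```python
# "Accuracy" as defined above: correct predictions divided by total predictions.

def accuracy(predicted, actual):
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Invented example: model got 3 of 4 predictions right.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # -> 0.75
```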

Human- vs. AI-generated code

The DREAM Challenge was aimed at uncovering links between these medical metrics and the risk of preterm birth. Some risk factors, including having infections during pregnancy, are already well known. But the DREAM Challenge wanted to see what signals might be gleaned from clinical samples, like blood.

It’s the kind of work that normally demands months of effort from trained bioinformaticians. But instead of writing the analysis code themselves, the junior researchers in the recent study gave each of eight LLMs a single prompt describing the data available and the labeling task at hand: predicting gestational age or preterm birth.

LLMs tested

  • ChatGPT o3-mini-high
  • ChatGPT 4o
  • DeepSeek R1
  • Gemini 2.0 Flash Thinking Exp
  • Qwen 2.5 Coder
  • Llama 3.2
  • Phi-4
  • DeepSeek-R1-Distill-Qwen

With this simple prompting, four of the eight models — DeepSeek R1, Gemini, and ChatGPT’s o3-mini-high and 4o — produced code that ran successfully. The best performer, OpenAI’s o3-mini, was as accurate as the original human DREAM Challenge teams. For one task, which involved estimating gestational age from epigenetic data, it was more accurate than humans had been.

What’s more, the junior researchers generated results in about three months and submitted a manuscript within six, whereas the same process took the original DREAM Challenge teams years.

“We got lucky with the review process here, but six months to generate the results and write the paper is pretty incredible, especially for a junior scientist,” Sirota told Live Science.

Preterm birth, before 37 completed weeks of pregnancy, affects roughly 11% of infants worldwide. Babies born too early are at higher risk than full-term babies for a host of health troubles, including problems affecting their brains, eyes and digestive systems. Being able to predict which pregnant patients are more likely to give birth early could mean closer monitoring and treatments to protect the baby and make full-term birth more likely, experts say.

Beyond writing code

The data used in the Cell Reports Medicine paper started “in good shape,” Sirota noted, in tables that AI could easily read. “But we can speed that up as well — the cleaning part and normalization of data — with generative AI,” she said.

Sirota’s team is now exploring other LLM applications, including a new tool called Chat PTB (short for “preterm birth”) that they’ve developed. The ChatGPT-based tool is embedded in papers published by the March of Dimes research network, part of a nonprofit aimed at improving maternal and infant health. Instead of manually combing through this literature, researchers can now query Chat PTB and get synthesized answers with references — a task that used to take hours, compressed into seconds.

But tools like Chat PTB and the code-writing approach in Sirota’s study represent only the first wave. AI-enhanced medical research is moving toward “agentic” AI, meaning systems that don’t respond to only one prompt but instead carry out multistep research workflows with increasing autonomy.

How might AI affect the workflow of biomedical research? (Image credit: Getty Images/Moor Studio)

Instead of responding with only text, an agentic system is capable of checking and iterating on its own work until it reaches its objective. It can also take action on a user’s behalf, like searching the internet and running code, rather than just writing it.

That shift toward greater AI autonomy and less human oversight brings both enormous potential and serious risk. In a January study published in the journal Nature Biomedical Engineering, researchers evaluated LLMs on 293 coding tasks drawn from 39 published biomedical studies, initially allowing the LLMs to come up with workflows on their own. They found that the overall accuracy came in below 40%.

Their solution was to separate planning from execution: They had the AI produce a step-by-step analysis plan that a human researcher reviewed before any code got written. The approach boosted the accuracy to 74%.
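
The idea of inserting a human checkpoint between planning and execution can be pictured as below. The step names and the review rule are hypothetical stand-ins invented for this sketch — this is not the published framework’s code.

```python
# Hypothetical sketch of separating planning from execution: an AI-proposed
# plan is reviewed by a human before any step runs. Step names and the
# approval logic are invented for illustration.

proposed_plan = [
    "load expression matrix",
    "normalize counts",
    "delete raw data",        # a step a human reviewer would strike out
    "train classifier",
]

def human_review(plan):
    """Stand-in for a researcher removing unsafe or incorrect steps."""
    return [step for step in plan if "delete" not in step]

def execute(plan):
    """Run only the approved steps (here, simply record each one)."""
    return [f"ran: {step}" for step in plan]

approved = human_review(proposed_plan)
print(execute(approved))  # only the three approved steps run
```

The design point is that the plan exists as reviewable data before any code is generated or executed, which is what lets a human catch errors early.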


“The goal is not to ask researchers to blindly trust an AI system,” study co-author Zifeng Wang, who was a doctoral student at the University of Illinois Urbana-Champaign at the time of the study, told Live Science in an email.

Instead, the goal is to “design frameworks where the reasoning, planning, and intermediate steps are visible enough that researchers can supervise and validate the process,” said Wang, who is a co-founder of Keiji AI.

Why safeguards matter

These risks don’t mean researchers should shy away from AI, but they do need to apply the same rigor to AI-generated work that they would to any other collaborator’s output, scientists caution.

“The question is not whether LLMs accelerate science or create ‘AI slop,'” Ian McCulloh, a professor of computer science at Johns Hopkins University’s Whiting School of Engineering, told Live Science in an email. “The question is how we leverage this powerful technology within the scientific method.”

But McCulloh also cautioned against holding AI to an impossible standard. People tend to assume AI is error-prone and downplay human error, he said, when, in reality, both humans and machines make mistakes. He anecdotally described a consulting client who lamented AI’s 15% miss rate on a certain task, not realizing his human employees’ miss rate was 25%.

“The goal of AI is not perfection,” McCulloh said, “but to do better than people.”

That effort will involve agreeing on how to measure AI’s success. Dr. Ethan Goh, a physician-researcher at Stanford University, pointed out that health care still lacks standardized benchmarks for evaluating AI’s performance. Goh recently published a randomized trial in JAMA Network Open that studied how LLMs influence doctors’ reasoning in determining diagnoses.

Because LLMs are trained on such a vast amount of data, “benchmarks are so expensive to produce,” Goh told Live Science. What’s more, he said, AI improves so quickly that most commercial models start beating the few benchmarks that exist and rapidly render them useless. Amid these challenges, Goh’s team at Stanford’s AI Research and Science Evaluation (ARISE) Healthcare Network is working to develop such standards by the end of this year.

For all the uncertainty around standards and safeguards, the researchers who spoke with Live Science shared a common conviction: AI belongs in the lab, but not unsupervised.

“We have to be careful not to forget what we know in terms of the scientific process,” Sirota said. “But I think the opportunity is tremendous.”
