Don’t give up your day job, because a new study suggests that artificial intelligence (AI) is funnier than you.
In the study, which was designed to test the co-creative capabilities of large language models (LLMs), internet memes created by OpenAI’s GPT-4o model were, on average, rated funnier, more creative and more shareable than those created by humans, or by humans with chatbot assistance. However, when it came to the quality of top-rated memes, human-generated humor still prevailed.
The findings were uploaded Jan. 20 to the arXiv preprint server and presented at the 30th International Conference on Intelligent User Interfaces, which took place March 24 to 27 in Cagliari, Italy.
Commenting on the results on the social network Bluesky, Ethan Mollick, professor and co-director of the generative AI lab at the Wharton School of the University of Pennsylvania, said: “I regret to announce that the meme Turing Test has been passed.”
The original Turing Test was proposed in 1950 by British mathematician Alan Turing as a benchmark for machine intelligence: if a human judge couldn’t distinguish between a human and a machine in conversation, the machine could be said to exhibit human-level intelligence.
While the study didn’t assess whether AI-generated memes were indistinguishable from those made by humans, it does raise interesting questions about how we evaluate creativity — especially as participants often rated AI-generated content more favorably.
Macheme learning
The researchers, from KTH Royal Institute of Technology, LMU Munich and TU Darmstadt, hadn’t set out to demonstrate the comedic capabilities of AI. Instead, they set out to explore co-creativity, specifically how LLMs can support humans with creative tasks like joke-writing.
They identified meme creation, with its mix of cultural references, sarcasm and low-stakes performance pressure, as the perfect test case. Memes typically take the form of captioned images that riff on familiar situations or pop culture. They’ve become a type of shared internet shorthand, used to make jokes or respond to current events in an easily digestible and often irreverent format.
“The complexity of humor makes it a rich area for exploring the dynamics of co-creativity, as collaborators must navigate these nuances to produce content that resonates with others,” the researchers wrote in the paper.
The experiment involved two parts. In the first, researchers recruited 124 participants and assigned them to one of two groups: one working alone and the other working with an AI chatbot assistant.
Participants were then given three rounds to write captions on the topics of work, food and sports for classic meme templates, including Fry from Futurama, Doge and Boromir (“One does not simply walk into Mordor”). Those in the AI-assisted group could use a chatbot to brainstorm ideas but were responsible for selecting the best ideas and creating the final memes.
The human-only group created 335 memes, while 307 were produced by human-AI hybrid teams. An additional 150 memes were generated by GPT-4o for comparison.
A second group of 98 people then rated the memes on how funny, creative and shareable they were. The memes were randomized so raters didn’t know who or what had made them. On average, the AI-generated memes came out on top in all three categories.
“Interestingly, memes created entirely by AI performed better than both human-only and human-AI collaborative memes in all areas on average,” the researchers wrote in the paper. “However, when looking at the top-performing memes, human-created ones were better in humor, while human-AI collaborations stood out in creativity and shareability.”
In other words, while the AI-generated memes scored highest on average, the memes identified as being “the funniest” were more often than not created by humans.
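How a group can win on average yet lose at the top is easy to see with a toy example. The short Python sketch below uses invented ratings on an arbitrary 1-to-7 scale; they are not data from the study, and the names `ratings`, `average_score` and `top_score` are hypothetical. It simply shows a consistent performer beating a hit-or-miss one on the mean while losing on the single best score.

```python
from statistics import mean

# Hypothetical ratings on an arbitrary 1-7 scale, invented for illustration.
# These are NOT figures from the study.
ratings = {
    "ai_only": [5, 5, 5, 5, 5, 5],      # consistently decent
    "human_only": [2, 3, 3, 4, 7, 7],   # hit-or-miss, with a few standouts
}


def average_score(scores):
    """Mean rating across every meme in a group."""
    return mean(scores)


def top_score(scores):
    """Highest rating achieved by any single meme in a group."""
    return max(scores)


# Pick the winning group under each way of keeping score.
best_on_average = max(ratings, key=lambda group: average_score(ratings[group]))
best_at_the_top = max(ratings, key=lambda group: top_score(ratings[group]))

print(best_on_average)  # ai_only
print(best_at_the_top)  # human_only
```

With these made-up numbers, the consistent group averages 5 against roughly 4.3, but the hit-or-miss group owns the only 7s. The study's actual scoring method may differ; the sketch only illustrates why the two headline results do not contradict each other.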
Content regeneration
The researchers credited the AI’s strong average scores to the fact that LLMs are trained on huge volumes of internet content, making them good at mimicking broadly popular humor, but not so good at landing a real zinger of a punchline. “LLMs appeal to a broad taste in humor, but humans can be wittier still,” they wrote.
The study also examined the impact of AI assistance on productivity and perceived effort. Participants working with the chatbot generated more ideas than those working alone, but this didn’t always translate to funnier content.
According to the researchers, this is because LLMs can help with idea generation but don’t necessarily raise the bar on creative quality. This is particularly true for humor, which the researchers said requires “timing, cultural context, shared knowledge, and the ability to subvert expectations.”
The researchers concluded: “While LLMs can generate humorous and contextually appropriate memes, they often face challenges in capturing nuanced cultural references and emotional subtleties inherent in human creativity. While AI can boost productivity and create content that appeals to a broad audience, human creativity remains crucial for content that connects on a deeper level.”