Catching bad content in the age of AI

Why haven’t tech companies improved at content moderation?

Tate Ryan-Mosley

May 15, 2023

Stephanie Arnett/MITTR | Getty Images

This article is from The Technocrat, MIT Technology Review's weekly tech policy newsletter about power, politics, and Silicon Valley.

In the last 10 years, Big Tech has become really good at some things: language, prediction, personalization, archiving, text parsing, and data crunching. But it’s still surprisingly bad at catching, labeling, and removing harmful content. One simply needs to recall the spread of conspiracy theories about elections and vaccines in the United States over the past two years to understand the real-world damage this causes.

And the discrepancy raises some questions. Why haven’t tech companies improved at content moderation? Can they be forced to? And will new advances in AI improve our ability to catch bad information?

When they’ve been hauled in front of Congress to account for spreading hatred and misinformation, tech companies have mostly blamed the inherent complexity of language for their shortcomings. Executives at Meta, Twitter, and Google say it’s hard to interpret context-dependent hate speech on a large scale and in different languages. One favorite refrain of Mark Zuckerberg is that tech companies shouldn’t be on the hook for solving all the world’s political problems.

Most companies currently use a combination of technology and human content moderators (whose work is undervalued, as reflected in their meager pay packets).

At Facebook, for example, artificial intelligence currently spots 97% of the content removed from the platform.

However, AI is not very good at interpreting nuance and context, says Renee DiResta, the research manager at the Stanford Internet Observatory, so it’s not possible for it to entirely replace human content moderators, who are not always great at interpreting those things either.

Cultural context and language can also pose challenges, because most automated content moderation systems were trained with English data and don’t function well with other languages.

Hany Farid, a professor at the University of California, Berkeley, School of Information, has a more obvious explanation. “Content moderation has not kept up with the threats because it is not in the financial interest of the tech companies,” says Farid. “This is all about greed. Let’s stop pretending this is about anything other than money.”

And the lack of federal regulation in the US means it’s very hard for victims of online abuse to hold platforms financially responsible.

Content moderation seems to be a never-ending war between tech companies and bad actors. Tech companies roll out rules to police content; bad actors find out how to evade them by doing things like posting with emojis or deliberate misspellings to avoid detection. The companies then try to close the loopholes, the perpetrators find new ones, and on and on it goes.
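To make the cat-and-mouse dynamic concrete, here is a minimal sketch of a keyword filter and one round of loophole-closing. Everything in it is an illustrative assumption rather than any platform's actual system: the blocked term, the character substitution table, and the function names are all hypothetical.

```python
import unicodedata

# Hypothetical placeholder list; real systems use far richer signals than keywords.
BLOCKED_TERMS = {"scam"}

# Assumed substitution table for common look-alike characters.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def naive_match(text: str) -> bool:
    """First-generation rule: flag a post if any word is exactly a blocked term."""
    return any(word in BLOCKED_TERMS for word in text.lower().split())


def normalize(text: str) -> str:
    """Undo look-alike characters, then drop emojis, digits, and punctuation."""
    text = unicodedata.normalize("NFKD", text).lower().translate(LOOKALIKES)
    return "".join(ch for ch in text if ch.isalpha() or ch.isspace())


def normalized_match(text: str) -> bool:
    """Second-generation rule: apply the same check after normalization."""
    return any(word in BLOCKED_TERMS for word in normalize(text).split())


evasive_post = "total $c4m"
print(naive_match(evasive_post))       # the misspelling slips past the first rule
print(normalized_match(evasive_post))  # the patched rule catches it
```

The patched rule still misses a post that spaces the letters out, such as "s c a m", which is the next loophole in the cycle the paragraph above describes.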

Now, enter the large language model …

That’s difficult enough as it stands. But it’s soon likely to become a lot harder, thanks to the emergence of generative AI and large language models like ChatGPT. The technology has problems—for example, its propensity to confidently make things up and present them as facts—but one thing is clear: AI is getting better at language … like, a lot better.

So what does that mean for content moderation?

DiResta and Farid both say it’s too early to tell how things will shake out, but both seem wary. Though many of the bigger systems like GPT-4 and Bard have built-in content moderation filters, they still can be coaxed to produce undesirable outputs, like hate speech or instructions for how to build a bomb.

Generative AI could allow bad actors to run convincing disinformation campaigns at a much greater scale and speed. That’s a scary prospect, especially given the dire inadequacy of methods for identifying and labeling AI-generated content.

But on the flip side, the newest large language models are much better at interpreting text than previous AI systems. In theory, they could be used to boost automated content moderation.

To make that work, though, tech companies would need to invest in retooling large language models for that specific purpose. And while some companies, like Microsoft, have begun to research this, there hasn’t been much notable activity.

“I am skeptical that we will see any improvements to content moderation, even though we’ve seen many tech advances that should make it better,” says Farid.

Large language models still struggle with context, which means they probably won’t be able to interpret the nuance of posts and images as well as human moderators. Scalability and specificity across different cultures also raise questions. “Do you deploy one model for any particular type of niche? Do you do it by country? Do you do it by community?... It’s not a one-size-fits-all problem,” says DiResta.

New tools for new tech

Whether generative AI ends up being more harmful or helpful to the online information sphere may, to a large extent, depend on whether tech companies can come up with good, widely adopted tools to tell us whether content is AI-generated or not.

That’s quite a technical challenge, and DiResta tells me that the detection of synthetic media is likely to be a high priority. This includes methods like digital watermarking, which embeds a bit of code that serves as a sort of permanent mark to flag that the attached piece of content was made by artificial intelligence. Automated tools for detecting posts generated or manipulated by AI are appealing because, unlike watermarking, they don’t require the creator of the AI-generated content to proactively label it as such. That said, current tools that try to do this have not been particularly good at identifying machine-made content.

Some companies have even proposed cryptographic signatures that use math to securely log information like how a piece of content originated, but, like watermarking, this would rely on voluntary disclosure.
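As a rough illustration of what such a signature involves, the sketch below binds a piece of content to a label describing its origin, so that later changes to either can be detected. It is a simplified stand-in built on stated assumptions: it uses a shared-secret HMAC from Python's standard library, whereas real provenance proposals would more likely use public-key signatures, and the key, the record fields, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical secret held by the tool that produced the content.
# Assumption: a shared secret stands in for a real public-key signing scheme.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_content(content: bytes, origin: str) -> dict:
    """Build a provenance record: content hash, stated origin, and a tag over both."""
    record = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "origin": origin,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["tag"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_content(content: bytes, record: dict) -> bool:
    """Return True only if the content and its origin label are unchanged since signing."""
    claimed = {"sha256": record.get("sha256"), "origin": record.get("origin")}
    if claimed["sha256"] != hashlib.sha256(content).hexdigest():
        return False
    payload = json.dumps(claimed, sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))


image_bytes = b"example image bytes"
record = sign_content(image_bytes, origin="made-with-generative-ai")
print(verify_content(image_bytes, record))
print(verify_content(image_bytes + b"edited", record))
```

The check only works if the creator attaches the record in the first place, which is why this approach depends on voluntary disclosure.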

The newest version of the European Union’s AI Act, which was proposed just this week, requires companies that use generative AI to inform users when content is indeed machine-generated. We’re likely to hear much more about these sorts of emerging tools in the coming months as demand for transparency around AI-generated content increases.

What else I’m reading

The EU could be on the verge of banning facial recognition in public places, as well as predictive policing algorithms. If it goes through, this ban would be a major achievement for the movement against face recognition, which has lost momentum in the US in recent months.
On Tuesday, Sam Altman, the CEO of OpenAI, will testify to the US Congress as part of a hearing about AI oversight following a bipartisan dinner the evening before. I’m looking forward to seeing how fluent US lawmakers are in artificial intelligence and whether anything tangible comes out of the meeting, but my expectations aren’t sky high.
Last weekend, Chinese police arrested a man for using ChatGPT to spread fake news. China banned ChatGPT in February as part of a slate of stricter laws around the use of generative AI. This appears to be the first resulting arrest.

What I learned this week

Misinformation is a big problem for society, but there seems to be a smaller audience for it than you might imagine. Researchers from the Oxford Internet Institute examined over 200,000 Telegram posts and found that although misinformation crops up a lot, most users don’t seem to go on to share it.

In their paper, they conclude that “contrary to popular received wisdom, the audience for misinformation is not a general one, but a small and active community of users.” Telegram is relatively unmoderated, but the research suggests that perhaps there is to some degree an organic, demand-driven effect that keeps bad information in check.