Content Moderation Techniques

Before we engage in a VSD analysis of content moderation, it is helpful to have some context about how content moderation systems have been implemented in the past. Let’s take a moment to discuss the techniques that are commonly used to implement content moderation on online platforms.

Policy

Content moderation often starts with the development of written and unwritten policies. Policy documents are typically a mix of broad value statements (e.g., commitments to free expression) and prohibitions against specific classes of content and behavior. These policies serve three purposes:

  1. To inform people who use a platform what the rules and expectations are when they use the service.
  2. To outline the actions the platform will take when the policy is violated.
  3. To guide specific content moderation decisions and the development of content moderation systems.

In addition to public documentation, platforms often have extensive, non-public content moderation documentation as well. Further, the decision to moderate content, and the development of moderation systems, will necessarily be guided by unwritten policies, norms, values, and engineering decisions made by employees and executives.

Automation

Given the scale of large online platforms, moderating content is a challenge that necessitates scalable solutions. The largest platforms invest heavily in automated systems that attempt to implement content moderation policies at massive scale. Simple approaches may rely on mechanisms like keyword blocklists: if a person attempts to post content that contains a blocked term, they may be shown a warning message (“are you sure you want to post this?”), the system may refuse to accept the content (“you may not post content that violates our policies”), or the content may be silently flagged for additional scrutiny by human moderators.
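
To make this concrete, the sketch below implements a bare-bones blocklist check; the blocked terms, the three possible actions, and the function name are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical terms and associated actions, for illustration only.
BLOCKLIST = {
    "badword1": "warn",    # show "are you sure you want to post this?"
    "badword2": "reject",  # refuse to accept the content
    "badword3": "flag",    # silently queue for human review
}

def check_post(text: str) -> str:
    """Return the most severe action triggered by a proposed post."""
    severity = {"accept": 0, "warn": 1, "flag": 2, "reject": 3}
    action = "accept"
    for word in re.findall(r"\w+", text.lower()):
        triggered = BLOCKLIST.get(word)
        if triggered and severity[triggered] > severity[action]:
            action = triggered
    return action

print(check_post("This post contains badword2"))  # -> "reject"
```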

More sophisticated approaches to automated moderation leverage machine learning: a vast corpus of previously moderated content is used to train a predictive model, and that model is then used to assess whether new pieces of content resemble previously observed unacceptable content. For example, a social network might train a machine learning model to identify hate speech by leveraging a corpus of ground-truth hate speech that was previously identified by human moderators. These machine learning-based systems sound good in theory: they are far more sophisticated than simple blocklists yet remain highly scalable, and they have a veneer of neutrality because they appear not to rely on the judgment of human moderators.
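
As a rough sketch of how such a pipeline might be assembled, the snippet below trains a toy classifier on a handful of hand-labeled posts using TF-IDF features and logistic regression (via scikit-learn); the example posts, labels, and modeling choices are illustrative assumptions, not a description of any platform’s actual system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus of previously moderated posts (hypothetical labels:
# 1 = removed by human moderators, 0 = allowed to remain).
posts = [
    "you people are subhuman and should disappear",
    "great game last night, what a comeback",
    "go back to where you came from",
    "does anyone have a good banana bread recipe?",
]
labels = [1, 0, 1, 0]

# Train a simple TF-IDF + logistic regression classifier on the
# ground-truth moderation decisions.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# Score new, unseen content; posts scoring above some threshold might
# be removed automatically or routed to human review.
new_post = "what a comeback, great game"
print(model.predict_proba([new_post])[0][1])  # probability of "removed"
```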

Although automated content moderation systems are widely used in practice, they have a number of shortcomings. First, not all forms of data are equally amenable to automated analysis: text is relatively straightforward to parse and analyze, but images, video, and other forms of rich media are very challenging to analyze. Machine learning techniques are only just approaching the point where they are sophisticated enough to analyze rich media, and they cannot yet do so reliably and robustly. Second, even for “simpler” kinds of data like text, automated systems still struggle to handle the full breadth of variation inherent in human language. Sarcasm and jokes; the broad historical and cultural context surrounding a given piece of text; non-English and non-European languages; shifts in the meaning of words over time; and wholly new words and idioms can all trip up machine learning models. Finally, even though machine learning systems do not involve human judgment once they are deployed, they may still exhibit bias. For example, researchers have shown that machine learning-based content moderation systems are often biased against minorities, mistakenly flagging benign speech from African American users. Ironically, these are the very groups that the systems are ostensibly supposed to be protecting.

Crowdsourcing

Given the shortcomings of automated content moderation systems, many platforms layer on additional moderation systems that are powered by humans. One way to achieve this at scale with low cost is to rely on the users of the platforms themselves to moderate content. For example, Reddit uses a hierarchical moderation scheme, where the platform itself defines and enforces content policies across all subreddits, while individual subreddits are policed by human moderators chosen from among the community who define and enforce subreddit-specific policies. Or consider platforms like Facebook and Twitter: they provide mechanisms for people to mark content as “good” (e.g., using likes and hearts), as well as mechanisms for people to report content that they believe is harmful or in violation of the platforms’ policies.
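
To illustrate the layered structure of a Reddit-style scheme, the sketch below combines hypothetical platform-wide rules with rules specific to an individual community; the rule names and data structures are assumptions made for illustration only.

```python
# Hypothetical rule sets, for illustration only: the platform enforces a
# global policy everywhere, and each community's volunteer moderators
# layer their own rules on top of it.
PLATFORM_RULES = ["no_spam", "no_harassment"]
COMMUNITY_RULES = {
    "r/aww": ["images_of_animals_only"],
    "r/history": ["no_memes", "cite_sources"],
}

def applicable_rules(community: str) -> list[str]:
    """Return every rule a post in the given community must satisfy."""
    return PLATFORM_RULES + COMMUNITY_RULES.get(community, [])

print(applicable_rules("r/history"))
# -> ['no_spam', 'no_harassment', 'no_memes', 'cite_sources']
```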

Crowdsourcing has a number of shortcomings. First, it is reactive: people have to notice and report harmful content, at which point it has already been seen by many people and caused untold distress. Second, crowdsourcing has problems with uniformity: people have a wide variety of opinions about what kinds of content are harmful or unacceptable, so the content that gets reported may or may not actually violate the platforms’ policies. For example, are photos of breastfeeding unacceptable nudity akin to pornography, or are they images of perfectly natural behavior that is to be celebrated? Third, crowdsourced content moderation systems can be weaponized by bad actors: malicious individuals may disingenuously report content or people whom they do not like in order to get the platform to censor them.

In short, any data collected from crowdsourced moderation systems must be approached with extreme caution before it is acted upon. Crowdsourcing is potentially useful as a warning system, but it must be backed up by more robust systems of review that assess the quality and veracity of the reports.
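
As one way to picture crowdsourced reports serving as a warning system rather than a verdict, the sketch below counts distinct reporters per post and escalates a post to a human review queue only once a threshold is crossed; the threshold and data structures are illustrative assumptions, not any platform’s actual design.

```python
from collections import defaultdict

# Hypothetical threshold: reports alone never remove content; they only
# escalate it once enough distinct users have flagged it.
REVIEW_THRESHOLD = 5

reporters: dict[str, set[str]] = defaultdict(set)
review_queue: list[str] = []

def handle_report(post_id: str, reporter_id: str) -> None:
    """Record a report and escalate the post to human review if warranted."""
    reporters[post_id].add(reporter_id)  # duplicate reports from one user don't count
    if len(reporters[post_id]) == REVIEW_THRESHOLD:
        # Escalate exactly once; human reviewers then assess whether the
        # reports are legitimate or an attempt to weaponize the system.
        review_queue.append(post_id)

for i in range(6):
    handle_report("post-123", f"user-{i}")
print(review_queue)  # -> ['post-123']
```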

Professional, Human Review

A final approach to content moderation leverages the labor of tens of thousands of behind-the-scenes, professional human moderators who review content. For example, Facebook employs on the order of 15,000 human moderators as of 2020, and it routinely commits to hire more in the wake of content moderation scandals. Unlike crowdsourced moderators, professional moderators can be trained to improve the uniformity and consistency of their decisions, although this task may be complicated by constant revisions to the platforms’ content moderation guidelines.

Professional content moderation takes a heavy toll on the workers who perform it. Professional moderators are exposed to a never-ending stream of the most vile content on the internet, and many suffer psychological harm as a result. The welfare of these workers must be taken into account and adequate support must be provided to them.