Google’s new ‘toxicity’-detecting software signals progress in online speech classifiers

Dangerous Speech, like hate speech or pornography, is difficult to define in a precise or objective way. It’s hard enough to get two people to agree consistently on which speech fits into any of these categories – and therefore even harder to train a machine to classify such speech reliably. Researchers from Jigsaw – a technology incubator within Google – have nonetheless made interesting progress: they asked people to rate millions of online comments from sites like Wikipedia and The New York Times, and trained software on those judgments to measure “toxicity.”
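
To make the general technique concrete – supervised learning over human-labeled comments – here is a minimal sketch in Python. The model family (TF-IDF features with logistic regression) is a simple stand-in, not Perspective’s actual architecture, which Jigsaw has not described here, and the tiny labeled dataset is invented for illustration.

```python
# A minimal sketch of supervised toxicity classification: train on comments
# that human raters have labeled, then surface a probability as a score.
# TF-IDF + logistic regression is an illustrative stand-in, NOT Perspective's
# actual model, and this tiny dataset is invented for the example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Thanks for the thoughtful article.",
    "You people are all idiots.",
    "I disagree, but that's a fair point.",
    "Get lost, nobody wants you here.",
]
labels = [0, 1, 0, 1]  # 0 = rated not toxic, 1 = rated toxic, by human judges

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

# The class-1 probability plays the role of a 'toxicity' score.
print(model.predict_proba(["What a stupid comment."])[0][1])
```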

Last week, Jigsaw opened that software, called Perspective, to everyone: now you can type in any English word or phrase, and you’ll be told how similar the software ‘thinks’ it is to comments that people rated as toxic. The Jigsaw researchers defined toxic language, for this purpose, as “a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.”
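
Perspective can also be queried programmatically. Below is a minimal sketch of such a request, assuming you have obtained an API key from Google (YOUR_API_KEY is a placeholder); the endpoint and request shape reflect the public v1alpha1 API at the time of writing.

```python
# Sketch of a Perspective API request; YOUR_API_KEY is a placeholder for a
# real key obtained from Google. Endpoint and payload follow the public
# v1alpha1 API at the time of writing.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity_score(text):
    """Return Perspective's toxicity estimate (0.0 to 1.0) for an English comment."""
    body = json.dumps({
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }).encode("utf-8")
    request = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

print(toxicity_score("You are a wonderful person."))  # typically scores low
print(toxicity_score("You are a complete idiot."))    # typically scores high
```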

Perspective’s definition of toxicity overlaps with, but is distinct from, online harassment and Dangerous Speech. An act of harassment or Dangerous Speech might be considered rude, but not every rude comment is harassing or dangerous. Perspective’s purpose is therefore not to identify a particular category of speech but rather to measure the perceived civility of a post. For example, a polite expression of Dangerous Speech might be rated less toxic than a disrespectful message of peace.

The software has produced some perplexing results as people have experimented with it, perhaps because it can only assess language similar to the language (of comments) with which it was ‘trained.’ Despite this, Perspective’s development signals the possibility that software could classify Dangerous Speech automatically in the future. To that end, we collaborated with Prof. Derek Ruths in 2015-16 to create machine learning software to detect hateful speech on subreddits (online message boards) known for misogynist, racist, and fat-shaming content. That was an early, imperfect effort, but we believe a classifier for Dangerous Speech is possible. If misused, however, such a classifier – like Perspective itself – could lead to censorship of legitimate speech, in at least two ways.

First, it would be tempting for companies to use automatic classifiers to decide which content to take down or suppress, since software is so much cheaper than people. So far, major media and social media companies like Facebook and Twitter have relied on their users to report objectionable content, which is then reviewed by people those companies hire, who decide whether to take it down. The New York Times employs 14 people to decide which readers’ comments to delete – but this is so laborious that the Times permits comments on only 10% of the articles it publishes. Now the newspaper is experimenting with using Perspective as a first-pass filter. Human reviewers no longer read every comment; instead, Perspective feeds moderators the ones it has identified as similar to comments that humans rated as toxic. The humans have the final say, but only on the content the software feeds them. This is worrisome, since such software has trouble understanding context, new slang, and misspellings.
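
To make that workflow concrete, here is a hedged sketch of what a first-pass filter of this kind might look like. The threshold and the routing logic are our assumptions for illustration, not a description of the Times’ actual system, and toxicity_score() is the query function sketched above.

```python
# Illustrative first-pass moderation filter: our guess at the general shape,
# NOT the Times' actual pipeline. Reuses toxicity_score() from the sketch above.
REVIEW_THRESHOLD = 0.8  # assumed cutoff; a real deployment would tune this

def triage(comments):
    """Split comments into (needs_human_review, auto_approved) lists."""
    needs_review, auto_approved = [], []
    for comment in comments:
        if toxicity_score(comment) >= REVIEW_THRESHOLD:
            needs_review.append(comment)   # a moderator makes the final call
        else:
            auto_approved.append(comment)  # never read by a human
    return needs_review, auto_approved
```

The concern above maps directly onto this structure: anything scored below the threshold is never read by a person, so toxic comments disguised by new slang or misspellings can pass straight through, while the software’s misreadings of context determine which comments reach human attention at all.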

The second danger to freedom of expression comes from the fact that classifiers could be built to screen out any sort of speech, including political dissent or other valuable, controversial speech. To its credit, Jigsaw has been exceptionally transparent about its efforts, as the researcher Nathan Matias has noted:

Jigsaw/Wikimedia fully documented their progress, with freely licensed data and even version history. Jigsaw/Wikimedia did community engagement, sharing their early results with the Wikimedia community at their annual Wikimania conference. Jigsaw/Wikimedia published notes on the fairness/bias issues in their algorithm. That may be the first time ever.

This transparency may help to forestall overuse or misuse of Perspective, at least in the short run. Jared Cohen, Jigsaw’s founder, has also said that Perspective is not intended to moderate online spaces on its own. Reducing dangerous and harmful content online – and even improving the civility of discourse, as Jigsaw aims to do – are important goals. While the tools are not yet ready, Perspective is a thought-provoking step toward future solutions.