Is the very thing that is connecting the world also tearing us apart?
What is going on? In 2017, a Pew Research Center study reported that 41 percent of Americans had experienced online harassment, up from 35 percent in 2014. A more recent survey by the Anti-Defamation League found that more than half of Americans – 53 percent – said they experienced hate speech and harassment online in 2018.
- Why and how is the internet being inundated with harmful content?
- Why is the concerned crowd growing?
- What is the origin of the problem, and can we solve it?
“In simple terms, harmful content is anything online which causes a person distress or harm. This encompasses a huge amount of content and can be very subjective depending on who is doing the viewing; what may be harmful to one person might not be considered an issue by someone else.” (SWGfL)
To start sorting out this important topic, we spoke with Jesse McCrosky, Senior Data Scientist at Mozilla, a nonprofit with the mission to keep the internet open and healthy. In his day-to-day work, Jesse designs metrics that guide product development. “I’m trying to understand the difference between what we want to measure and what we can measure.” Aside from his work at Mozilla, Jesse has taken a personal interest in the ethical dimensions of the use of big data and the ways it may influence our societies.
We have summarized the most salient points below, but we strongly recommend listening to the full-length interview to fully benefit from Jesse’s insights.
Click below to hear the complete interview, conducted by Dessie Maliaka, Innovation Manager at HeroX.
Q. In one of the articles on your blog, you mentioned harmful online content as a problem. How pervasive is this problem? What specific forms does it take and who is spreading this harmful content?
How pervasive is the problem? It’s a difficult question. The thing to remember about harmful content is that it’s inherently contextual and subjective. Whether a piece of content is harmful or not depends on many factors, including, for example, the purpose of the content and the target audience.
For this reason, it’s difficult to measure the scale of the problem. Of course, large content platforms usually give a definition of harmful content (often outlined in their respective terms of use), specifying what is and is not allowed on their platforms. Still, drawing a clear line is extremely difficult. And then there is such a thing as “borderline content,” which makes defining harmful content nearly impossible.
Many specific forms of harmful content exist, and there are multiple actors involved. One example, described by U.S. researcher Kate Starbird, is the manipulation of the 2016 U.S. presidential election. In this case, harmful content, mostly on Twitter, was spread by a “content farm” of Russian trolls targeting both the right and left extremes of the U.S. political spectrum.
Q. It appears that large content platforms are fully aware that they’re spreading harmful content. Is that a fair thing to say?
Absolutely. However, whether this is their fault is another question. Both in Europe and the United States, certain government regulations exist that essentially provide those platforms with a “license to work.” The basic idea is that they are just platforms that let users publish their own content, without the platforms being responsible or liable for that content. After all, this is the basic prerequisite of the internet, isn’t it? If, say, Facebook were legally responsible for every published post, it couldn’t exist in the first place, because it simply can’t monitor every piece of content on the site.
On the other hand, this abrogation of responsibility on the part of the platforms has, perhaps, gone too far – and now there is a public backlash. Many governments, especially in Europe, are concerned about the spread of harmful content and are considering options to make platforms directly responsible for the content they publish. In my opinion, this goes too far as well. What it all means, perhaps, is that it’s time for the platforms to take more responsibility.
But there is a big question that really burns me: is harmful content so abundant because recommendation engines employed by the platforms amplify this content and make it worse – or because, simply speaking, there are a lot of terrible people creating this content? In other words, is this a technological problem or a social one?
In my view, it’s both - and this makes addressing the problem even more complicated.
Q. Is there a way back? Can we return to an internet that is not so toxic?
Yes, I do believe there is a path to improvement. A big part of the problem is information asymmetry. Content platforms hold a lot of data, and they obviously know to what degree their recommendation engines and their policies make the problem of harmful content worse. I believe that the answer could be government regulations requiring the content platforms to open up and reveal the data fed into their recommendation engines. I want the platforms to become more transparent.
But I want to clarify one point. I don’t want governments to decide which content is “good” or “bad.” It’s not that the government should “fix” the problem; rather, it should create a structure that incentivizes businesses to solve the problem themselves.
Q. To follow up on what you’ve just said: whose responsibility is it – the platforms’ or the users’ – to act on the knowledge of how the collected data is treated?
I think it’s the platforms’ responsibility, and here is why. The truth is – and I’m speaking from the position of a data scientist who ran experiments back at Google – that platforms have the power to affect user behavior, in a sense to control it, just by changing designs and layouts.
And if we refuse to accept this fact and keep insisting that people have to make their own choices – to click on whatever they want and consume whatever they choose – this strips the platforms of any form of responsibility at all. They will keep saying: users get what they want. But if you leave all the responsibility for making choices with users, they’ll be easily manipulated by the platforms. You need a larger agent that can shape the platforms’ behavior – and governments should step up and play this role.
Q. The solution you propose sounds like a long-term one. Can we do anything in the short term? Is there a band-aid of sorts to mitigate the problem?
I think education is key. The better educated people are and the stronger their analytical skills, the better equipped they’ll be to protect themselves from the negative effects of harmful content.
But let’s not forget that the spread of harmful content is a problem at the social level, not the individual level. I think harmful content exists because of the divisiveness in our societies, because we don’t share enough values and don’t care enough about each other. And unless we change that, it won’t really matter if a few individuals protect themselves with some clever hacks.
Q. Long-term “social” solutions are obviously key to solving the problem of harmful content. At the same time, do you see any technical solutions to the problem?
Sure. Take Firefox, the browser made by Mozilla. Recently, a few privacy protection features have been added to Firefox. For example, tracking protection makes it much harder for third parties to track users’ identities as they browse the internet. (If you're curious, this article explains the issue well.)
And an intermediate solution could simply be becoming conscious of the content you’re consuming. Constantly ask yourself: why am I doing what I’m doing? I know it’s a tall order for many people. It’s often difficult for me, too.
Q. There seems to be another solution that looks simple and intuitive to many people: filtering out the “bad” content. Why can’t we just use some technical means to get rid of harmful stuff?
Sure, it’s technically possible. For example, there are browser extensions, like ad blockers, that filter out certain unwanted content. Similar extensions could, in principle, be designed to deal with harmful content in the same way.
The problem here is who’s going to decide which content is “good” or “bad.” Say a user goes on YouTube to watch a video. Who’s going to tell that user which video he or she should watch next? I think there is no one we can trust to decide which “bad” content should be filtered out. Here in Finland, we tend to trust our government, but there are certainly governments around the world that we wouldn’t want to give censorship powers to.
Sure, certain content (e.g., hate speech) is considered illegal in some jurisdictions, but there is so much content that filtering it becomes simply impractical. Can we use artificial intelligence? Perhaps. But do we want to censor content based on the decision of an algorithm? That’s a big question.
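To make the filtering idea Jesse describes a little more concrete, here is a minimal sketch of what the content script of such a filtering extension might look like. Everything in it is an assumption for illustration: the blocklist phrases are placeholders, and treating `<article>` elements as “posts” is just a common convention on many sites, not a universal rule.

```typescript
// Minimal content-script sketch for a hypothetical filtering extension.
// The blocklist phrases and the assumption that posts live inside <article>
// elements are illustrative only – this is not a description of any real product.

// Phrases the user has chosen to filter out (placeholders).
const blocklist: string[] = ["example unwanted phrase", "another unwanted phrase"];

// Returns true if the text contains any blocklisted phrase (case-insensitive).
function isUnwanted(text: string): boolean {
  const lower = text.toLowerCase();
  return blocklist.some((phrase) => lower.includes(phrase.toLowerCase()));
}

// Hides any post-like element whose visible text matches the blocklist.
function filterPosts(root: ParentNode): void {
  root.querySelectorAll("article").forEach((post) => {
    if (isUnwanted(post.textContent ?? "")) {
      post.style.display = "none";
    }
  });
}

// Filter what is already on the page, then keep watching for content added
// later (e.g., by infinite scroll).
filterPosts(document);
new MutationObserver(() => filterPosts(document)).observe(document.body, {
  childList: true,
  subtree: true,
});
```

Even this toy example runs straight into the questions Jesse raises: someone has to decide what goes on the blocklist, and naive keyword matching both misses genuinely harmful content and over-blocks legitimate discussion of the same topics.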
Q. As you know, HeroX is a crowdsourcing platform. And, as we know, you have some experience participating in crowdsourcing projects. If you had a chance to address a crowd of a million people, what would you ask them to do to solve the problem of harmful content?
I can think of three things. Firstly, we know that content platforms such as YouTube and Facebook employ recommendation algorithms to decide what content to show to their users. I have a hypothesis that the optimization of those algorithms leads to the amplification of harmful content. But it’s only a hypothesis, and it’s very difficult to prove or disprove without having the data the platforms have. I’d therefore ask the crowd to suggest a natural experiment or a data collection approach capable of determining whether the optimization of the recommendation algorithms amplifies harmful content.
Secondly – as we discussed earlier – it’s impractical to design a browser tool that could tell you which content is “good” or “bad.” But perhaps we can create a browser extension that provides users with contextual education, helping them evaluate for themselves the content they’re consuming? In fact, platforms have gradually started doing this. Could the crowd come up with some suggestions to help strengthen this budding trend?
Lastly, can we create a framework to evaluate whether tech companies are making reasonable efforts to control the spread of harmful content? There are a lot of variables here – and I don’t want to get too technical – but I’m thinking about ways to show that the companies are making real improvements or, on the contrary, are not doing enough to control the spread of harmful content.
We at HeroX really loved the topics proposed by Jesse. Anyone interested in exploring potential challenges?
As we thanked Jesse for sharing his insights, we asked him to say a few parting words. This is what he said:
“The technology isn’t inherently evil, but people should be conscious of what they are doing and what content they’re consuming. Protect yourself by understanding the choices you’re making. And remember that harmful content is first and foremost a social, not technical, problem. It reflects the divisiveness of our societies. As a society, we must learn to be sympathetic and compassionate and try to understand even people we disagree with. And this is the best way to fight back against what has been done to us through technology.”
More about Jesse McCrosky
Jesse has been working in data science for the last 10 years. Currently at Mozilla, he previously worked with Google, Statistics Canada, and a number of academic and government researchers as an independent statistical consultant. Over the course of his career, he has developed a deep concern about the unintended consequences that the optimization of online services and products can have for our society. Now, at Mozilla, he has the opportunity to tackle these problems head on. Although passionate about his work, he values his non-work life and enjoys spending time with his family and pursuing interests in music, dance, yoga, and other areas. He lives in Helsinki, Finland, with his wife and daughter.
Visit Jesse McCrosky’s blog “Wrong, but useful”
About the Author, Eugene Ivanov
Eugene Ivanov began his career as a molecular biologist working in academia and the biotech industry in Russia, France, and the United States. Between 2003 and 2012, he worked for InnoCentive, one of the world's first commercial crowdsourcing platforms. Currently an author and innovation consultant, he lives in Framingham, Massachusetts with his wife, dog, and cat. He writes the Innovation Observer blog and can be followed @eivanov101. In his spare time, he ballroom dances with his wife, practices boxing, and watches football (a.k.a. soccer in the United States).
Eugene Ivanov (our Contributing Writer & Community Partner) has written a book about crowdsourcing! Here's a link to find his book: "We the People of the Crowd…: Real-life stories about crowdsourcing told by an innovation consultant"
Visit Eugene Ivanov's blog "innovation observer"