15 November 2022
“Fuck all refugees! Just let them drown.” and “Refugees are like rats. They must be killed.” are just two examples of the hate speech against refugees that is rife on the Internet. Social media has given more people a voice in the public debate, but it has also made it easier to hide anonymously behind a screen and write cruel things.
Hate speech consists of public statements that spread or incite hatred, discrimination or hostility against a particular group, often a minority, and such speech has been shown to lead to physical violence in real life.
It is therefore crucial to detect hate speech on social media in order to actively fight hatred directed at specific groups. This was exactly the challenge that the UN Refugee Agency presented to 25-year-old Frederik Gaasdal Jensen, Associate Consultant in Delegate’s Data & AI team, and his thesis partner, Henry Alexander Stoll, when they started their master’s thesis.
Frederik Gaasdal Jensen
Associate Consultant, Delegate
To tackle this, Frederik and his partner collected 12 datasets, all based on roughly the same understanding of what the term “hate speech” means. These datasets were then used to train different types of Bidirectional Encoder Representations from Transformers (BERT) models to recognise when a tweet or other piece of text contains hate speech.
A common challenge for machine learning models is understanding context. As a result, some models tend to classify a text as hateful based on individual hateful words in a sentence:
“We as humans can understand the context of a joke, but a model doesn’t always understand that. In the same way, it will have a hard time understanding whether you are defending refugees on social media, for example by writing ‘refugees are not stupid’. That is why we used so-called BERT models, which are designed to better understand the context of a text. So it is not a single word that determines whether something is ‘hate speech’; each word is assessed in the context of the surrounding words,” explains Frederik.
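To make the failure mode Frederik describes concrete, here is a minimal Python sketch of a naive keyword-based classifier. It is not the thesis model; the word list HATEFUL_WORDS and the function keyword_flag are hypothetical and chosen only for illustration. Because it looks at each word in isolation, it flags a sentence that defends refugees just as readily as a hostile one.

```python
import re

# Hypothetical word list, for illustration only; not taken from the thesis.
HATEFUL_WORDS = {"stupid", "rats", "drown"}


def keyword_flag(text: str) -> bool:
    """Naive baseline: flag a text if any listed word appears, ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in HATEFUL_WORDS for token in tokens)


for sentence in ["Refugees are stupid", "Refugees are not stupid"]:
    print(sentence, "->", keyword_flag(sentence))
```

Both sentences are flagged, because the word “stupid” triggers the rule regardless of the “not” in front of it. A context-aware model such as BERT instead builds a representation of each word from the words around it, which is what gives it a chance of telling the two sentences apart.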
The purpose of the finished model is not to flag and handle hate speech in individual posts, but rather to give insight into the general attitude towards refugees at a given time. This allows the UN and other relevant agencies to monitor and analyse developments and trends in hate speech in connection with various refugee crises, such as those caused by the wars in Ukraine and Syria. Here, for example, it is interesting to look at how the various neighbouring countries view the reception of refugees. Based on this, the UN can adapt its communication accordingly, so that online hate speech hopefully does not escalate into physical violence.
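The shift from individual posts to overall trends can be illustrated with a small aggregation step. The sketch below assumes that a classifier has already labelled each post as hateful or not; the record layout (country, month, is_hateful), the function hate_share_by_group and the sample data are all hypothetical and not taken from the thesis. It computes the share of posts flagged as hateful per country and month, the kind of figure an analyst could track over time.

```python
from collections import defaultdict

# Hypothetical record layout: (country, month, is_hateful) from some classifier.
Post = tuple[str, str, bool]


def hate_share_by_group(posts: list[Post]) -> dict[tuple[str, str], float]:
    """Return the fraction of posts flagged as hateful for each (country, month)."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    hateful: dict[tuple[str, str], int] = defaultdict(int)
    for country, month, is_hateful in posts:
        key = (country, month)
        totals[key] += 1
        if is_hateful:
            hateful[key] += 1
    return {key: hateful[key] / totals[key] for key in totals}


# Invented sample data, for illustration only.
sample = [
    ("PL", "2022-03", True),
    ("PL", "2022-03", False),
    ("PL", "2022-03", False),
    ("PL", "2022-04", False),
    ("DE", "2022-03", True),
]

for (country, month), share in sorted(hate_share_by_group(sample).items()):
    print(f"{country} {month}: {share:.0%}")
```

Aggregating like this trades per-post precision for a broader signal: occasional misclassifications tend to matter less when the aim is to see whether the overall share rises or falls around events such as a new refugee crisis, as long as the errors are not systematic.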
Working on the thesis project has given Frederik a new perspective on his education and his working life:
“Making a difference isn’t always just about sending some money or volunteering. It can be something like this, too. Hate speech can lead to physical violence, so it’s important to do something about it. Being able to use my education and knowledge to create a tool like this is incredibly meaningful.”
The thesis is titled “Detecting Social Media Hate Speech Surrounding Refugees using State-of-the-Art Deep Learning Methods”, and Frederik and Henry are working on publishing it as a research paper.