Frederik and his partner therefore collected 12 datasets, all based on roughly the same definition of the term “hate speech”. These datasets were then used to train different types of Bidirectional Encoder Representations from Transformers (BERT) models to recognise when a tweet or other piece of text contains hate speech.
One general challenge for machine learning models is understanding context. As a result, some models tend to classify a text as hateful based on individual hateful words in a sentence:
“We as humans can understand the context of a joke, but a model doesn’t always understand that. In the same way, it will have a hard time understanding when you defend refugees on social media, for example by writing ‘refugees are not stupid’. Therefore, we have made use of so-called BERT models, which are designed to better understand the context of a text. Thus, it is not a single word that determines whether something is a case of ‘hate speech’; it is assessed in the context of the surrounding words,” explains Frederik.
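The problem Frederik describes can be illustrated with a deliberately naive keyword filter. The sketch below is not the model from the thesis; the word list and the function name `keyword_flag` are hypothetical and chosen only to show why judging single words in isolation goes wrong.

```python
# Hypothetical word list, for illustration only (not taken from the thesis).
HATEFUL_WORDS = {"stupid"}


def keyword_flag(text: str) -> bool:
    """Flag a text if any single word is on the list, ignoring all context."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return any(w in HATEFUL_WORDS for w in words)


attack = "Refugees are stupid"
defence = "Refugees are not stupid"

print(keyword_flag(attack))   # flagged
print(keyword_flag(defence))  # also flagged, although the sentence defends refugees
```

Because the filter never looks at the word “not”, it treats both sentences the same way. A context-aware model such as BERT instead scores the sentence as a whole, which is the property the project relies on.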
The purpose of the finished model is not to flag and handle “hate speech” in individual posts, but rather to give greater insight into the general attitude towards refugees at a given time. The UN and other relevant agencies can thus monitor and analyse developments and trends in hate speech in connection with various refugee crises, for example those caused by the wars in Ukraine or Syria. Here it is interesting, for instance, to look at how the various neighbouring countries view the reception of refugees. Based on this, the UN can shape its communication accordingly, so that hate speech on the internet hopefully does not develop into physical violence.
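One way to picture this kind of trend monitoring is to aggregate per-post predictions into a share per month. This is a minimal sketch under stated assumptions: the function `monthly_hate_share`, the input format and the example posts are all invented for illustration, and the article does not describe how the actual aggregation is done.

```python
from collections import defaultdict


def monthly_hate_share(posts):
    """Return {"YYYY-MM": share of posts flagged as hate speech}.

    `posts` is an iterable of (iso_date, is_hate) pairs, where `is_hate`
    is the (assumed) output of a classifier for a single post.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for iso_date, is_hate in posts:
        month = iso_date[:7]  # "2022-03-01" -> "2022-03"
        totals[month] += 1
        flagged[month] += int(is_hate)
    return {month: flagged[month] / totals[month] for month in sorted(totals)}


# Invented example data, not real measurements.
example_posts = [
    ("2022-02-03", False),
    ("2022-02-11", False),
    ("2022-02-20", True),
    ("2022-02-27", False),
    ("2022-03-01", True),
    ("2022-03-02", False),
]

for month, share in monthly_hate_share(example_posts).items():
    print(month, f"{share:.0%}")
```

The individual posts disappear from the output; only the development over time remains, which matches the stated aim of following general trends rather than moderating single posts.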
Working on the thesis project has given Frederik a new perspective on his education and his working life:
“Being able to make a difference is not always just being able to send some money or volunteer. It’s something like this, too. ‘Hate speech’ can lead to physical violence, so it’s important to do something about it. The fact that I can use my education and knowledge to create a tool like this is incredibly meaningful.”
The thesis is titled “Detecting Social Media Hate Speech Surrounding Refugees using State-of-the-Art Deep Learning Methods”, and Frederik and Henry are working on publishing it as a research paper.