Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification
- Authors
- Fan Yang, Xiaochang Peng, Gargi Ghosh, Eider Moore, Goran Predovic
- Number of Authors
- 5
- Title
- Exploring Deep Multimodal Fusion of Text and Photo for Hate Speech Classification
- Year of Publication
- 2019
- Abstract
- Interactions among users on social network platforms are usually positive, constructive and insightful. However, sometimes people also get exposed to objectionable content such as hate speech, bullying, and verbal abuse. Most social platforms have explicit policy against hate speech because it creates an environment of intimidation and exclusion, and in some cases may promote real-world violence. As users' interactions on today's social networks involve multiple modalities, such as texts, images and videos, in this paper we explore the challenge of automatically identifying hate speech with deep multimodal technologies, extending previous research which mostly focuses on the text signal alone. We present a number of fusion approaches to integrate text and photo signals. We show that augmenting text with image embedding information immediately leads to a boost in performance, while applying additional attention fusion methods brings further improvement.
- URL
- https://research.facebook.com/file/2400404176758343/Exploring-Deep-Multimodal-Fusion-of-Text-and-Photo-for-Hate-Speech-Classification.pdf
- DOI
- https://doi.org/10.18653/v1/W19-3502
- Article Accessibility
- Open access
- Field
- Artificial Intelligence, Natural Language Processing & Speech
- Method
- The paper applies deep multimodal fusion of text and photo signals to hate speech classification on social networks, aiming to improve detection by integrating information from both modalities. Several fusion techniques are explored: simple concatenation, bilinear transformation, gated summation, and attention mechanisms.
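The fusion techniques named above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the embedding dimension, the parameter names (`W_g`, `b_g`, `W_b`), and the random initialization are assumptions for demonstration; in the paper these would be learned parameters operating on encoder outputs.

```python
import numpy as np

# Hypothetical embedding size; the paper's actual dimensions are not given here.
d = 4
rng = np.random.default_rng(0)

text_emb = rng.standard_normal(d)   # stands in for a text-encoder output
image_emb = rng.standard_normal(d)  # stands in for an image-encoder output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 1) Simple concatenation: stack the two modality vectors.
fused_concat = np.concatenate([text_emb, image_emb])

# 2) Gated summation: a sigmoid gate (learned in practice) weighs
#    how much of each modality flows into the fused vector.
W_g = rng.standard_normal((d, 2 * d))  # illustrative "learned" gate weights
b_g = np.zeros(d)
gate = sigmoid(W_g @ fused_concat + b_g)
fused_gated = gate * text_emb + (1.0 - gate) * image_emb

# 3) Bilinear transformation: every output unit is a bilinear form
#    text^T W_b[:, :, k] image over the two modality vectors.
W_b = rng.standard_normal((d, d, d))  # illustrative bilinear tensor
fused_bilinear = np.einsum('i,ijk,j->k', text_emb, W_b, image_emb)

print(fused_concat.shape)    # (8,)
print(fused_gated.shape)     # (4,)
print(fused_bilinear.shape)  # (4,)
```

The attention-based variants the paper reports on (e.g., attention fusion with sparsemax) follow the same pattern of combining the two modality vectors, but compute data-dependent weights instead of a single gate.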
- Use Case
- N/A
- Objectives of the Article
- The objectives of the article are to explore the challenge of automatically identifying hate speech using deep multimodal technologies, extend previous research focused on text signals alone, and provide insights into improving the detection of hate speech content on social platforms.
- Research Question(s)/Hypotheses/Conclusion
- The research question revolves around how deep multimodal fusion of text and photo signals can enhance the automatic identification of hate speech on social networks.
- The hypothesis is that by combining information from text and photo modalities through deep multimodal fusion techniques, the performance of hate speech classification can be significantly improved.
- The conclusions highlight the effectiveness of augmenting text with image embedding information and applying attention fusion methods in boosting the performance of hate speech classification. The study demonstrates that fusion techniques like attention with deep cloning, sparsemax, and symmetric gate offer the best results in identifying hate speech content on social networks.
- Theoretical Framework/Authors
- The theoretical framework involves deep multimodal fusion techniques for hate speech classification. Key works cited include Tong et al. (2017), Mishra et al. (2018), Gunasekara and Nejadgholi (2018), Kshirsagar et al. (2018), Magu and Luo (2018), and Sahlgren et al. (2018).
- Key Concepts
- Hate speech, Multimodal fusion, Text and photo signals
- Data Collected (source type)
- Text and photos from social media posts
- Definition of Emotions
- No
- Scale of Experimentation (volume of accounts)
- Not available
- Associated Technologies
- Deep learning, Multimodal fusion techniques
- Mention of Ethics
- No
- Communicative Purpose
- The ultimate objective is to develop more effective methods for detecting hate speech.