GoEmotions: A Dataset of Fine-Grained Emotions
- Authors
- Demszky, Dorottya; Movshovitz-Attias, Dana; Ko, Jeongwoo; Cowen, Alan; Nemade, Gaurav; Ravi, Sujith
- Number of authors
- 6
- Title
- GoEmotions: A Dataset of Fine-Grained Emotions
- Year of publication
- 2020
- Reference (APA)
- Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., & Ravi, S. (2020). GoEmotions: A Dataset of Fine-Grained Emotions. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4040–4054. https://doi.org/10.18653/v1/2020.acl-main.372
- Keywords
- ND
- URL
- https://aclanthology.org/2020.acl-main.372
- doi
- https://doi.org/10.18653/v1/2020.acl-main.372
- Article accessibility
- Open access
- Field
- Natural Language Processing
- Content type (theoretical, applied, methodological)
- Applied
- Method
- We compiled GoEmotions, the largest human-annotated dataset of Reddit comments.
  We design our emotion taxonomy considering related work in psychology and coverage in our data.
  We include a thorough analysis of the annotated data and the quality of the annotations, via Principal Preserved Component Analysis (Cowen et al., 2019b).
  Build an emotion classification model.
  We provide a strong baseline for modeling fine-grained emotion classification.
  We conduct transfer learning experiments with existing emotion benchmarks.
- Use case
- ND
- Objectives of the article
- In the past decade, NLP researchers made available several datasets for language-based emotion classification [...]. However, existing available datasets (1) are mostly small, containing up to several thousand instances, and (2) cover a limited emotion taxonomy, with coarse classification into Ekman (Ekman, 1992b) or Plutchik (Plutchik, 1980).
  Create a large-scale, consistently labeled emotion dataset over a fine-grained taxonomy, with demonstrated high-quality annotations.
- Research question(s)/Hypotheses/Conclusion
- Research question(s): Need for a large-scale, consistently labeled emotion dataset over a fine-grained taxonomy, with demonstrated high-quality annotations.
- Hypothesis(es):
- Conclusion(s): We present GoEmotions, a large, manually annotated, carefully curated dataset for fine-grained emotion prediction. We provide a detailed data analysis, demonstrating the reliability of the annotations for the full taxonomy. We show the generalizability of the data across domains and taxonomies via transfer learning experiments. We build a strong baseline by fine-tuning a BERT model; however, the results suggest much room for future improvement.
- Theoretical framework/Authors
- Emotion Datasets (Bostan and Klinger, 2018; CrowdFlower, 2016)
- Emotion Taxonomy (Ekman, 1992a; Russell, 2003; Cowen et al., 2019a; Cowen and Keltner, 2017; Cowen et al., in press; Cowen and Keltner, 2019; Cowen et al., 2019b; Cowen et al., 2018)
- Emotion Classification Models (Mohammad, 2018; Devlin et al., 2019)
- Key concepts
- Sentiment analysis
- “Semantic space” of emotion
- Data collected (source type)
- We use a Reddit data dump originating in the reddit-data-tools project, which contains comments from 2005 (the start of Reddit) to January 2019. We select subreddits with at least 10k comments and remove deleted and non-English comments.
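The selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (dicts with `subreddit` and `body` keys), the `[deleted]`/`[removed]` placeholders, and the choice to count comments before removing deleted ones are all assumptions made for illustration. Language filtering is left out because it would need an external language identifier.

```python
from collections import Counter

# Threshold taken from the description ("at least 10k comments").
MIN_COMMENTS = 10_000

# Assumed placeholder strings for deleted comments; a real dump may differ.
DELETED_MARKERS = {"[deleted]", "[removed]"}


def select_comments(comments, min_comments=MIN_COMMENTS):
    """Keep comments from sufficiently large subreddits, dropping deleted ones.

    `comments` is an iterable of dicts with hypothetical keys
    "subreddit" and "body".
    """
    comments = list(comments)
    # Count comments per subreddit before any filtering (an assumption).
    counts = Counter(c["subreddit"] for c in comments)
    kept = []
    for c in comments:
        if counts[c["subreddit"]] < min_comments:
            continue  # subreddit is below the size threshold
        if c["body"].strip() in DELETED_MARKERS:
            continue  # comment text is only a deletion placeholder
        kept.append(c)
    return kept
```

Passing a smaller `min_comments` makes the function easy to try on a handful of records.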
- Definition of emotions
- Explains their taxonomy
- Evokes Ekman's taxonomy
- Lists and defines the 27 emotions
- Scale of experiment (number of accounts)
- Our dataset is composed of 58K Reddit comments, labeled for one or more of 27 emotion(s) or Neutral.
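Because each comment can carry one or more labels, the annotations are naturally represented as multi-hot vectors. The sketch below shows that encoding under stated assumptions: `LABELS` is a small illustrative subset rather than the full taxonomy of 27 emotions plus Neutral, and `multi_hot` is a hypothetical helper, not part of any released tooling.

```python
# Illustrative subset only; the full taxonomy has 27 emotions plus Neutral.
LABELS = ["admiration", "anger", "joy", "neutral"]


def multi_hot(assigned, labels=LABELS):
    """Return a 0/1 vector with a 1 for every label assigned to a comment."""
    unknown = set(assigned) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in assigned else 0 for name in labels]
```

A comment annotated with both "joy" and "admiration" gets two 1s in its vector, which is what distinguishes this setup from single-label classification.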
- Associated technologies
- Principal Preserved Component Analysis (Cowen et al., 2019b)
- Mention of ethics
- ND
- Communicative purpose
- Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks.
  Our taxonomy includes a large number of positive, negative, and ambiguous emotion categories, making it suitable for downstream conversation understanding tasks that require a subtle understanding of emotion expression, such as the analysis of customer feedback or the enhancement of chatbots.
- Summary
- Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.
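To make the "average F1-score across our proposed taxonomy" concrete, here is a minimal sketch of an unweighted per-label (macro) average over multi-label predictions. Reading "average" as a macro average is an assumption about the summary above, and the function names are hypothetical; this is not the authors' evaluation code.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores.

    `y_true` and `y_pred` are lists of equal-length 0/1 vectors,
    one vector per example and one position per label.
    """
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / n_labels
```

Under this kind of averaging every label counts equally, so rare emotions with low per-label F1 pull the overall score down as much as frequent ones.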
- Site pages
- Content
Part of GoEmotions: A Dataset of Fine-Grained Emotions