A comparison of classifiers for detecting emotion from speech
- Author(s)
- Shafran, I.; Mohri, M.
- Number of authors
- 2
- Title
- A comparison of classifiers for detecting emotion from speech
- Year of publication
- 2005
- Reference (APA)
- Shafran, I., & Mohri, M. (2005). A comparison of classifiers for detecting emotion from speech. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 1, I-341–I-344. https://doi.org/10.1109/ICASSP.2005.1415120
- Keywords
- ND
- URL
- https://ieeexplore.ieee.org/document/1415120
- doi
- https://doi.org/10.1109/ICASSP.2005.1415120
- Article accessibility
- Open access
- Field
- Machine Intelligence
- Content type (theoretical / applied / methodological)
- Applied
- Method
-
This paper compares several techniques for detecting emotion by evaluating their performance on a common corpus of speech data collected from a deployed customer-care application (HMIHY 0300).
Emotion detection classifiers can use diverse information sources, e.g., acoustic or lexical information. To use a common set of input features, we compared classifiers using spoken words as the input.
We present a comparison of three classification algorithms that we have implemented: two popular classifiers from the literature modeling the word content via n-gram sequences, one based on an interpolated language model [6] and another on a mutual information-based (MI-based) feature-selection approach [9, 8], and compare them with a discriminant kernel-based technique that we recently adopted [2, 13].
- Use case
- Data extracted from a customer service system (AT&T "How May I Help You")
- Objectives of the article
-
This paper compares several techniques for detecting emotion by evaluating their performance on a common corpus of speech data collected from a deployed customer-care application (HMIHY 0300).
This paper presents three classifiers: two popular classifiers from the literature that model word content via n-gram sequences, one based on an interpolated language model and another on a mutual information-based feature-selection approach, and compares them with a discriminant kernel-based technique that the authors recently adopted.
- Research question(s) / Hypotheses / Conclusion
- Research question(s) : Several techniques for detecting emotion from speech have been recently described. But the relative performance of these techniques has not been measured since the experiments reported by the authors were carried out on distinct corpora. The results reported are not always indicative of the performance of these techniques in real-world applications. Some are based on the unrealistic assumption that the word transcription of the spoken utterance is given in advance. Others are derived from experiments with speech data produced by professional actors expressing distinct emotion categories.
- Hypothesis(es) : This paper compares several techniques for detecting emotion by evaluating their performance on a common corpus of speech data collected from a deployed customer-care application (HMIHY 0300). [...] To use a common set of input features, we compared classifiers using spoken words as the input. We present a comparison of three classification algorithms that we have implemented.
-
Conclusion(s): The results show that our kernel-based classifier achieves an accuracy of 80.6%, and outperforms both the interpolated language model classifier, which achieved a classification accuracy of 70.1%, and the classifier using mutual information-based feature selection (78.8%).
[...]
The results reflect the performance of these classifiers in a real-world task since the data used in our experiments was extracted from a deployed customer-care system (HMIHY 0300). They demonstrate that the discriminant classifier based on rational kernels outperforms the two other popular classification techniques.
- Theoretical framework / Authors
- Detecting emotion from speech (Batliner et al., 2000; Dellaert, Polzin, & Waibel, 1996; Devillers & Vasilescu, 2003; Devillers, Vasilescu, & Lamel, 2003; Lee & Narayanan, 2004; Lee, Narayanan, & Pieraccini, 2002; Petrushin, 2000; Polzin & Waibel, 1998; Shafran, Riley, & Mohri, 2003)
- Classifiers using spoken words as the input (Cortes, Haffner, & Mohri, 2004; Devillers, Vasilescu, & Lamel, 2003; Lee & Narayanan, 2004; Lee, Narayanan, & Pieraccini, 2002; Shafran, Riley, & Mohri, 2003)
- Key concepts
- Sentiment analysis
- Data collected (source type)
-
We evaluated their performance on data extracted from a deployed customer-care system, the AT&T "How May I Help You" system (HMIHY 0300).
The corpus used consisted of utterances from speakers. The emotion category of the speaker for each utterance was originally tagged into one of seven emotion categories [Shafran, Riley, and Mohri, 2003]. For this study, they were grouped into only two categories – negative and non-negative [Lee and Narayanan, 2004]. The utterances were presented to human annotators in the order of occurrence, so the annotators had the advantage of knowing the context beyond the utterance being labeled.
- Definition of emotions
- No definition
- Use of sentiment categories/groups
- Negative vs. non-negative labeling (seven original emotion categories grouped into two)
- Scale of experimentation (data volume)
-
The corpus used consisted of 5147 utterances from 1854 speakers.
On average, the utterances were about 15 words long. A subset of 448 utterances, on which two human labelers were in full agreement, was used for testing.
- Associated technologies
- N-gram sequences
- Interpolated Language Model Classifier
- Mutual Information-based Feature-Selection Classifier
- Kernel-Based Discriminant Classifier
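The mutual information-based feature selection listed above can be sketched as a toy example: score each vocabulary word by the mutual information between its presence in an utterance and the negative/non-negative label, then keep the highest-scoring words as features. The utterances, labels, and cutoff below are invented for illustration; they are not from the HMIHY 0300 corpus or the authors' implementation.

```python
import math
from collections import Counter

# Toy labeled utterances (hypothetical; not from the HMIHY 0300 corpus).
data = [
    ("this is terrible i want a refund", "negative"),
    ("you people are useless", "negative"),
    ("i need to check my account balance", "non-negative"),
    ("please transfer me to billing", "non-negative"),
]

def mutual_information(word, data):
    """Mutual information (in nats) between a word's presence and the label."""
    n = len(data)
    joint = Counter()  # counts of (word_present, label) pairs
    for text, label in data:
        joint[(word in text.split(), label)] += 1
    mi = 0.0
    for (present, label), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == label) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

# Rank the vocabulary and keep the most informative words as features.
vocab = sorted({w for text, _ in data for w in text.split()})
top_features = sorted(vocab, key=lambda w: mutual_information(w, data),
                      reverse=True)[:5]
print(top_features)
```

Words that occur in only one class score above zero, while words spread evenly across both classes score zero, so the ranking discards uninformative vocabulary before classification.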
- Mention of ethics
- ND
- Communicative purpose
- Accurate detection of emotion from speech has clear benefits for the design of more natural human-machine speech interfaces or for the extraction of useful information from large quantities of speech data. It can help design more natural spoken-dialog systems than those currently deployed in call centers or used in tutoring systems. The speaker’s emotion can be exploited by the system’s dialog manager to provide more suitable responses, thereby achieving better task completion rates. Emotion detection can also be used to rapidly identify relevant speech or multimedia documents from a large data set.
- Abstract
- Accurate detection of emotion from speech has clear benefits for the design of more natural human-machine speech interfaces or for the extraction of useful information from large quantities of speech data. The task consists of assigning, out of a fixed set, an emotion category, e.g., anger, fear, or satisfaction, to a speech utterance. In recent work, several classifiers have been proposed for automatic detection of a speaker's emotion using spoken words as the input. These classifiers were designed independently and tested on separate corpora, making it difficult to compare their performance. This paper presents three classifiers, two popular classifiers from the literature modeling the word content via n-gram sequences, one based on an interpolated language model, another on a mutual information-based feature-selection approach, and compares them with a discriminant kernel-based technique that we recently adopted. We have implemented these three classification algorithms and evaluated their performance by applying them to a corpus collected from a spoken-dialog system that was widely deployed across the USA. The results show that our kernel-based classifier achieves an accuracy of 80.6%, and outperforms both the interpolated language model classifier, which achieved a classification accuracy of 70.1%, and the classifier using mutual information-based feature selection (78.8%).