Sentiment summarization: evaluating and learning user preferences
- Author(s)
- Lerman, Kevin; Blair-Goldensohn, Sasha; McDonald, Ryan
- Number of authors
- 3
- Title
- Sentiment summarization: evaluating and learning user preferences
- Year of publication
- 2009
- Reference (APA)
- Lerman, K., Blair-Goldensohn, S., & McDonald, R. (2009). Sentiment summarization: Evaluating and learning user preferences. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, 514‑522. https://aclanthology.org/E09-1059
- Keywords
- ND
- URL
- https://aclanthology.org/E09-1059
- Article accessibility
- Open access
- Field
- Natural Language Processing
- Content type (theoretical / applied / methodological)
- Applied
- Method
-
Our initial set of experiments was run over the three opinion-based summarization systems: SM, SMAC, and SAM.
We evaluated summary performance for reviews of consumer electronics.
- Use case
- ND
- Article objectives
- We present the results of a large-scale, end-to-end human evaluation of three sentiment summarization models applied to user reviews of consumer products.
- Research question(s)/Hypotheses/Conclusion
- Research question(s): While this means that users have plenty of information on which to base their purchasing decisions, in practice this is often too much information for a user to absorb. To alleviate this information overload, research on systems that automatically aggregate and summarize opinions has been gaining interest. Evaluating these systems has been a challenge, however, due to the number of human judgments required to draw meaningful conclusions.
- Hypothesis(es) : The study presented here differs from Carenini et al. in many respects: First, our evaluation is over different extractive summarization systems in an attempt to understand what model properties are correlated with human preference irrespective of presentation; Secondly, our evaluation is on a larger scale including hundreds of judgments by hundreds of raters; Finally, we take a major next step and show that it is possible to automatically learn significantly improved models by leveraging data collected in a large-scale evaluation.
- Conclusion(s) : Our results indicated that humans prefer sentiment informed summaries over a simple baseline. This shows the usefulness of modeling sentiment and aspects when summarizing opinions. However, the evaluations also show no strong preference between different sentiment summarizers. A detailed analysis of the results led us to take the next step in this line of research – leveraging preference data gathered in human evaluations to automatically learn new summarization models. These new learned models show large improvements in preference prediction accuracy over the previous single best model.
- Theoretical framework/Authors
- Systems that automatically aggregate and summarize opinions (Hu and Liu, 2004a; Hu and Liu, 2004b; Gamon et al., 2005; Popescu and Etzioni, 2005; Carenini et al., 2005; Carenini et al., 2006; Zhuang et al., 2006; Blair-Goldensohn et al., 2008)
- Human evaluation of these systems (McKeown et al., 2005; Carenini and Cheung, 2008; Carenini et al., 2006)
- Sentiment summarization (Jindal and Liu, 2006; Stoyanov and Cardie, 2008; Hu and Liu, 2004a; Carenini et al., 2006; Choi et al., 2005)
- Key concepts
- Sentiment analysis
- Data collected (source type)
-
We evaluated summary performance for reviews of consumer electronics. [...] We gathered reviews for electronics products from several online review aggregators. The products covered a variety of electronics, such as MP3 players, digital cameras, printers, wireless routers, and video game systems. All summaries were roughly equal in length to avoid length-based rater bias.
In each experiment, two summaries of the same product were placed side by side in random order. Raters were also shown an overall rating, R, for each product, and were asked to express their preference for one summary over the other. Raters were free to choose any rating, but were specifically instructed that their rating should account for a summary's representativeness of the overall set of reviews. Raters were also asked to provide a brief comment justifying their rating.
- Definition of emotions
- Methodological explanation of their classification (quantitative, mathematical)
- No definition
- Positive and negative labeling
- Experiment scale (volume of accounts)
-
Reviews for 165 electronics products from several online review aggregators (each product had a minimum of four reviews and up to a maximum of nearly 3000).
The mean number of reviews per product was 148, and the median was 70. We ran each of our algorithms over the review corpus and generated summaries for each product with K = 650.
In total we ran four experiments, for a combined 1980 rater judgments.
- Associated technologies
- 3 extractive sentiment summarization systems:
- Sentiment Match (SM)
- Sentiment Match + Aspect Coverage (SMAC)
- Sentiment-Aspect Match (SAM)
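To make the "sentiment match" idea concrete, here is a minimal Python sketch, not the paper's implementation: an SM-style extractive summarizer greedily selects sentences whose aggregate sentiment best matches the product's overall rating. The sentence texts, sentiment scores, and `budget` parameter are invented for illustration.

```python
# Hedged sketch of a Sentiment Match (SM)-style extractive summarizer:
# greedily pick sentences so the summary's mean sentiment stays close
# to the product's overall rating.

def sentiment_match_summary(sentences, sentiments, overall, budget=3):
    """Select up to `budget` sentences minimizing the gap between the
    summary's mean sentiment and the overall product rating."""
    chosen, chosen_scores = [], []
    remaining = list(zip(sentences, sentiments))
    for _ in range(min(budget, len(remaining))):
        def gap(pair):
            # Mean sentiment of the summary if this sentence were added.
            mean = (sum(chosen_scores) + pair[1]) / (len(chosen_scores) + 1)
            return abs(mean - overall)
        best = min(remaining, key=gap)
        remaining.remove(best)
        chosen.append(best[0])
        chosen_scores.append(best[1])
    return chosen

# Toy review sentences with illustrative sentiment scores in [-1, 1].
summary = sentiment_match_summary(
    ["Battery life is great", "Screen scratches easily",
     "Decent sound for the price"],
    [0.9, -0.6, 0.4],
    overall=0.2, budget=2)
print(summary)  # → ['Decent sound for the price', 'Screen scratches easily']
```

The SMAC and SAM variants described in the paper additionally reward aspect coverage and per-aspect sentiment agreement, which would add further terms to the selection objective.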
- Mention of ethics
- ND
- Communicative purpose
-
The growth of the Internet as a commerce medium, and particularly the Web 2.0 phenomenon of user-generated content, have resulted in the proliferation of massive numbers of product, service and merchant reviews. While this means that users have plenty of information on which to base their purchasing decisions, in practice this is often too much information for a user to absorb. To alleviate this information overload, research on systems that automatically aggregate and summarize opinions has been gaining interest.
We can thus conclude that the data gathered in human preference evaluation experiments, such as the one presented here, have a beneficial secondary use as training data for constructing a new, more accurate summarizer.
- Summary
- We present the results of a large-scale, end-to-end human evaluation of various sentiment summarization models. The evaluation shows that users have a strong preference for summarizers that model sentiment over non-sentiment baselines, but have no broad overall preference between any of the sentiment-based models. However, an analysis of the human judgments suggests that there are identifiable situations where one summarizer is generally preferred over the others. We exploit this fact to build a new summarizer by training a ranking SVM model over the set of human preference judgments that were collected during the evaluation, which results in a 30% relative reduction in error over the previous best summarizer.
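The ranking-SVM step can be sketched with the standard pairwise reduction: each human judgment "summary A preferred over summary B" becomes a constraint that A's score exceed B's, trained with a hinge loss on feature differences. This is a minimal illustrative sketch, not the authors' model; the two toy features (sentiment match, aspect coverage) and all data are invented.

```python
# Hedged sketch of learning from pairwise preference judgments
# (Joachims-style ranking SVM, reduced to hinge loss on feature
# differences and trained by simple gradient steps).

def train_rank(pairs, dim, epochs=100, lr=0.1, reg=0.01):
    """Each pair (x_win, x_lose) means raters preferred the summary
    with features x_win over the one with features x_lose."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_win, x_lose in pairs:
            diff = [a - b for a, b in zip(x_win, x_lose)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1:  # hinge violation: push w toward diff
                w = [wi + lr * (di - reg * wi) for wi, di in zip(w, diff)]
    return w

# Toy preference data over [sentiment-match, aspect-coverage] features.
pairs = [([0.9, 0.8], [0.2, 0.3]),   # sentiment-informed beat baseline
         ([0.7, 0.9], [0.6, 0.1])]
w = train_rank(pairs, dim=2)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
assert score([0.9, 0.8]) > score([0.2, 0.3])  # learned ranking respects data
```

At prediction time the learned weight vector scores candidate summaries, and the highest-scoring one is shown, which is how preference data collected in an evaluation can feed back into a better summarizer.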
- Site pages
- Content
Part of: Sentiment summarization: evaluating and learning user preferences