Discovering Fine-Grained Sentiment with Latent Variable Structured Prediction Models

Authors: Täckström, Oscar; McDonald, Ryan
Number of authors: 2
Title: Discovering Fine-Grained Sentiment with Latent Variable Structured Prediction Models
Year of publication: 2011
Reference (APA): Täckström, O., & McDonald, R. (2011). Discovering Fine-Grained Sentiment with Latent Variable Structured Prediction Models. Advances in Information Retrieval, 368–374. https://doi.org/10.1007/978-3-642-20161-5_37
Keywords: N/A
URL: https://link.springer.com/chapter/10.1007/978-3-642-20161-5_37
doi: https://doi.org/10.1007/978-3-642-20161-5_37
Article accessibility: Restricted
Field: Natural Language Processing
Content type (theoretical / applied / methodological): Applied
Method: We propose a model that learns to analyze fine-grained sentiment strictly from coarse annotations. Such a model can leverage the plethora of labeled documents from multiple domains available on the web. In particular, we focus on sentence-level sentiment (or polarity) analysis.
The model we present is based on hidden conditional random fields (HCRFs) [10], a well-studied latent variable structured learning model that has been used previously in speech and vision.
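To make the HCRF idea concrete, the following is a simplified sketch of how the probability of a document label can be computed by summing over every possible assignment of latent sentence labels. It assumes a linear chain of sentence labels in which each sentence label and each transition is scored jointly with the document label (written `y_d` here); the score tables `emit` and `trans`, the helper names, and the toy values are hypothetical and are not the features or parameterization used in the paper. In an HCRF trained with document labels only, the quantity maximized is typically this marginal log-likelihood, log p(y_d | x).

```python
import math

# Label set used at both the document and the sentence level.
LABELS = ["POS", "NEG", "NEU"]


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_potential_sum(doc_label, emit, trans):
    """Forward algorithm with the document label held fixed.

    Returns the log of the sum, over every latent sequence of sentence
    labels, of exp(total score).
    emit[i][(d, s)]  : toy score for sentence i taking label s under document label d
    trans[(d, s, t)] : toy score for sentence label s followed by t under document label d
    """
    alpha = {s: emit[0][(doc_label, s)] for s in LABELS}
    for i in range(1, len(emit)):
        alpha = {
            t: emit[i][(doc_label, t)]
            + logsumexp([alpha[s] + trans[(doc_label, s, t)] for s in LABELS])
            for t in LABELS
        }
    return logsumexp(list(alpha.values()))


def doc_label_log_probs(emit, trans):
    """log p(y_d | x): marginalize out the sentence labels, then normalize over y_d."""
    scores = {d: log_potential_sum(d, emit, trans) for d in LABELS}
    log_z = logsumexp(list(scores.values()))
    return {d: scores[d] - log_z for d in LABELS}


# Hypothetical two-sentence review: both sentences lean positive, sentence
# labels that agree with the document label get a bonus, and repeated
# sentence labels get a small transition bonus.
base = [{"POS": 2.0, "NEG": 0.0, "NEU": 0.0}, {"POS": 1.0, "NEG": 0.0, "NEU": 0.0}]
emit = [
    {(d, s): b[s] + (1.0 if s == d else 0.0) for d in LABELS for s in LABELS}
    for b in base
]
trans = {(d, s, t): (0.3 if s == t else 0.0) for d in LABELS for s in LABELS for t in LABELS}

log_probs = doc_label_log_probs(emit, trans)
# Training on a review labeled POS would minimize this negative log-likelihood.
loss = -log_probs["POS"]
```

Because the sum over latent labelings is computed with the forward algorithm, the cost grows linearly with the number of sentences instead of exponentially, which is what makes learning from document labels alone tractable.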
Use case: N/A
Article objectives: In this paper we investigate the use of latent variable structured prediction models for fine-grained sentiment analysis in the common situation where only coarse-grained supervision is available. Specifically, we show how sentence-level sentiment labels can be effectively learned from document-level supervision using hidden conditional random fields (HCRFs).
Research question(s)/Hypotheses/Conclusion: Research question(s): This study inverts the evaluation and attempts to assess the accuracy of the latent structure induced from the observed coarse signal; Hypothesis(es): One could argue that learning fine-grained sentiment from document-level labels is the more relevant question, for multiple reasons: 1) document-level annotations are the most common naturally observed sentiment signal, e.g., star-rated consumer reviews, and 2) document-level sentiment analysis is too coarse for most applications, especially those that rely on aggregation and summarization across fine-grained topics [3]; Conclusion(s): In this paper we showed that latent variable structured prediction models can effectively learn fine-grained sentiment from coarse-grained supervision. Empirically, reductions in error of up to 20% were observed relative to both lexicon-based and machine-learning baselines. In the common case when document labels are also available at test time, we observed error reductions close to 30% and over 20%, respectively, relative to the same baselines. In the latter case, our model reduces errors relative to the strongest baseline by 8%.
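The percentages above are most naturally read as relative error reductions, that is, the fraction of a baseline's errors that the model removes. This reading is an assumption, since the record does not state the formula; the function name and the error rates in the example are hypothetical, not figures from the paper.

```python
def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Fraction of the baseline's errors removed by the model:
    (baseline_error - model_error) / baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - model_error) / baseline_error


# Hypothetical error rates: going from 50% to 40% error is a 20% relative
# reduction, even though the absolute drop is only 10 percentage points.
print(round(relative_error_reduction(0.50, 0.40), 4))  # prints 0.2
```

The distinction matters when comparing figures: a relative reduction of 20% can correspond to a small absolute gain when the baseline error is already low.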
Theoretical framework/Authors: Sentiment analysis (Pang and Lee, 2008); Lexicon-based sentiment analysis (Hatzivassiloglou and McKeown, 1997; Kim and Hovy, 2004; Turney, 2002); Machine learning for sentiment analysis (Pang, Lee and Vaithyanathan, 2002); Limitations of both approaches (Hu and Liu, 2004; Wilson, Wiebe and Hoffmann, 2005); Hidden conditional random fields (Quattoni et al., 2007); Latent-variable structured learning models (Nakagawa, Inui and Kurohashi, 2010; Yessenalina, Yue and Cardie, 2010)
Key concepts: Supervised learning; Sentiment analysis
Data collected (source type): For our experiments we constructed a large balanced corpus of consumer reviews from a range of domains.
A training set was created by sampling reviews from five different domains: books, DVDs, electronics, music and video games. Document sentiment labels were obtained by labeling one- and two-star reviews as negative (NEG), three-star reviews as neutral (NEU), and four- and five-star reviews as positive (POS).
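The star-to-label rule above can be written as a small function. The name `stars_to_label`, the assumption that ratings are integers from 1 to 5, and the error handling are illustrative additions, not part of the paper.

```python
def stars_to_label(stars: int) -> str:
    """Map a 1-5 star rating to a document sentiment label:
    1-2 stars -> NEG, 3 stars -> NEU, 4-5 stars -> POS."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return "NEG"
    if stars == 3:
        return "NEU"
    return "POS"


labels = [stars_to_label(s) for s in range(1, 6)]
```

Here `labels` evaluates to `["NEG", "NEG", "NEU", "POS", "POS"]`.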
Definition of emotions: No definition; negative, neutral and positive labels
Experiment scale (number of accounts): A training set was created by sampling a total of 143,580 positive, negative and neutral reviews (roughly 1.5 million sentences in total).

To study the impact of training set size, additional training sets, denoted S and M, were created by sampling 1,500 and 15,000 documents, respectively, from the full training set, denoted L.
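The record does not say how S and M were drawn. Assuming uniform sampling without replacement from L, with S and M drawn independently (they could equally be nested, and a per-label stratified draw would be needed to guarantee class balance), the construction could be sketched as follows; `subsample`, the seeds, and the integer stand-ins for reviews are hypothetical.

```python
import random


def subsample(documents, size, seed=0):
    """Draw `size` documents uniformly at random, without replacement."""
    if size > len(documents):
        raise ValueError("sample size exceeds the number of documents")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(documents, size)


# Integer IDs as a stand-in for the 143,580 reviews in the full set L.
full_L = list(range(143_580))
train_S = subsample(full_L, 1_500, seed=1)
train_M = subsample(full_L, 15_000, seed=2)
```

Using a dedicated `random.Random` instance keeps the draw reproducible without touching the global random state.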

A smaller, separate test set of 294 reviews was constructed by the same procedure. This set consists of 97 positive, 98 neutral and 99 negative reviews.
Associated technologies: Latent variable structured prediction models; Hidden conditional random fields
Mention of ethics: N/A
Communicative purpose: The model we employed, a hidden conditional random field, leaves open a number of further avenues for investigating weak prior knowledge in fine-grained sentiment analysis, most notably semi-supervised learning when small samples of data annotated with fine-grained information are available.
Abstract: In this paper we investigate the use of latent variable structured prediction models for fine-grained sentiment analysis in the common situation where only coarse-grained supervision is available. Specifically, we show how sentence-level sentiment labels can be effectively learned from document-level supervision using hidden conditional random fields (HCRFs) [10]. Experiments show that this technique reduces sentence classification errors by 22% relative to using a lexicon and 13% relative to machine-learning baselines.
