GLA in MediaEval 2018 Emotional Impact of Movies Task
- Authors
- Sun, Jennifer J.; Liu, Ting; Prasad, Gautam
- Number of authors
- 3
- Title
- GLA in MediaEval 2018 Emotional Impact of Movies Task
- Year of publication
- 2019
- Reference (APA)
- Sun, J. J., Liu, T., & Prasad, G. (2019, November). GLA in MediaEval 2018 Emotional Impact of Movies Task. MediaEval’18 Multimedia Benchmark Workshop, 29-31 October 2018, Sophia Antipolis, France. https://doi.org/10.48550/arXiv.1911.12361
- Keywords
- ND
- URL
- http://arxiv.org/abs/1911.12361
- doi
- https://doi.org/10.48550/arXiv.1911.12361
- Article accessibility
- Open access
- Field
- Machine Intelligence
- Machine Perception
- Content type (theoretical, applicative, methodological)
- Applicative
- Method
-
This task, using the LIRIS-ACCEDE dataset, enables researchers to compare different approaches for predicting viewer impact from movies. Our approach leverages image, audio, and face-based features computed using pre-trained neural networks. These features were computed over time and modeled using a gated recurrent unit (GRU) based network followed by a mixture of experts model to compute multiclass predictions. We smoothed these predictions using a Butterworth filter for our final result.
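The pipeline described above (per-second features modeled by a GRU, combined by a mixture of experts, then low-pass smoothed) could be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the GRU outputs are stand-in random arrays, the expert/gate weights are random, and the Butterworth order and cutoff are assumptions rather than the paper's values.

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)

# Stand-in for per-second GRU outputs over one movie: T timesteps, D hidden units.
T, D, n_experts = 300, 16, 4
h = rng.standard_normal((T, D))

# Mixture of experts head: each expert maps hidden state -> (valence, arousal),
# and a softmax gate weights the experts at each timestep.
W_experts = rng.standard_normal((n_experts, D, 2)) * 0.1
W_gate = rng.standard_normal((D, n_experts)) * 0.1

expert_out = np.einsum("td,edk->tek", h, W_experts)        # (T, n_experts, 2)
gate_logits = h @ W_gate                                   # (T, n_experts)
gate = np.exp(gate_logits - gate_logits.max(axis=1, keepdims=True))
gate /= gate.sum(axis=1, keepdims=True)                    # rows sum to 1
pred = np.einsum("te,tek->tk", gate, expert_out)           # (T, 2)

# Final step: smooth the per-second predictions with a low-pass Butterworth
# filter applied forward and backward (zero phase shift).
b, a = butter(N=2, Wn=0.1)               # illustrative order and cutoff
smoothed = filtfilt(b, a, pred, axis=0)

print(pred.shape, smoothed.shape)        # (300, 2) (300, 2)
```

The forward-backward `filtfilt` call avoids the lag a one-directional filter would introduce into the per-second valence/arousal curves.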
We first used pre-trained networks to extract features. To model the temporal aspects of the data, the methods we evaluated included long short-term memory (LSTM), gated recurrent unit (GRU), and temporal convolutional network (TCN) models.
- Use case(s)
- ND
- Article objectives
- Towards a better understanding of viewer impact, we present our methods for the MediaEval 2018 Emotional Impact of Movies Task to predict the expected valence and arousal continuously in movies. This task, using the LIRIS-ACCEDE dataset, enables researchers to compare different approaches for predicting viewer impact from movies.
- Research question(s)/Hypotheses/Conclusion
- Research question(s): Movies can cause viewers to experience a range of emotions, from sadness to relief to happiness. Viewers can feel the impact of movies, but it is difficult to predict this impact automatically. [...] The Emotional Impact of Movies Task provides participants with a common dataset for predicting the expected emotional impact from videos. We focused on the first subtask in the challenge: predicting the expected valence and arousal continuously (every second) in movies.
-
Hypothesis(es): Our method’s novelty lies in the unique set of features we extracted, including image, audio, and face features (capitalizing on transfer learning), along with our model setup, which combines a GRU with a mixture of experts.
We approached the valence and arousal prediction as a multivariate regression problem. Our objective is to minimize the multilabel sigmoid cross-entropy loss, which could allow the model to use potential relationships between the two dimensions for regression.
-
Conclusion(s): We found that precomputed features modeling image, audio, and face in concert with GRUs provided the optimal performance in predicting the expected valence and arousal in movies for this task. [...] We found some evidence that recurrent models performed better than TCN. [...] The pre-computed features we used to model image, audio, and face information showed better performance when compared with the VGG16+openSMILE baseline.
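The multilabel sigmoid cross-entropy objective mentioned in the hypotheses could be sketched as below. This is an assumed formulation, not the authors' code: it presumes the continuous valence/arousal targets have been rescaled into [0, 1] so they can serve as soft labels, and the function name and toy values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_sigmoid_xent(logits, targets):
    """Sigmoid cross-entropy summed over the two output dimensions
    (valence, arousal) and averaged over timesteps. Targets are
    assumed to be rescaled into [0, 1]."""
    p = sigmoid(logits)
    eps = 1e-12  # numerical safety for the logs
    per_dim = -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))
    return per_dim.sum(axis=-1).mean()

# Toy batch: 4 timesteps, 2 dimensions (valence, arousal), targets in (0, 1).
targets = np.array([[0.6, 0.3], [0.5, 0.5], [0.8, 0.2], [0.4, 0.7]])
good_logits = np.log(targets / (1 - targets))  # sigmoid(good_logits) == targets
bad_logits = -good_logits                      # deliberately inverted

print(multilabel_sigmoid_xent(good_logits, targets)
      < multilabel_sigmoid_xent(bad_logits, targets))  # True
```

Because the loss sums over both output dimensions before averaging, gradients flow through a shared representation for valence and arousal, which is one way a model could exploit correlations between the two.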
- Theoretical framework/Authors
- References to articles by previous MediaEval participants (Dellandréa et al., 2018; Jin et al., 2017; Liu, Gu, and Ko, 2017)
- Key concepts
- Emotion prediction
- Data collected (type, source)
- The dataset provided by the task is the LIRIS-ACCEDE dataset [3, 4], which is annotated with self-reported valence and arousal every second from multiple annotators.
- Definition of emotions
- No definition
- Scale of experimentation (volume of accounts)
- We optimized the hyperparameters of our models to have the best performance on the validation set, which consists of 13 movies from the development set. We then trained our models on the entire development set to run inference on the test set. Our setup used a batch size of 512.
- Associated technologies
- Transfer learning
- Inception network pre-trained on ImageNet
- AudioSet: a VGG-inspired model pre-trained on YouTube-8M
- An Inception based architecture trained on faces
- Artificial neural network (LSTM, GRU)
- Temporal Model (TCN)
- Mention of ethics
- ND
- Communicational purpose
- Movies can cause viewers to experience a range of emotions, from sadness to relief to happiness. Viewers can feel the impact of movies, but it is difficult to predict this impact automatically. The ability to automatically predict movie evoked emotions is helpful for a variety of use cases [7] in video understanding.
- Abstract
- The visual and audio information from movies can evoke a variety of emotions in viewers. Towards a better understanding of viewer impact, we present our methods for the MediaEval 2018 Emotional Impact of Movies Task to predict the expected valence and arousal continuously in movies. This task, using the LIRIS-ACCEDE dataset, enables researchers to compare different approaches for predicting viewer impact from movies. Our approach leverages image, audio, and face based features computed using pre-trained neural networks. These features were computed over time and modeled using a gated recurrent unit (GRU) based network followed by a mixture of experts model to compute multiclass predictions. We smoothed these predictions using a Butterworth filter for our final result. Our method enabled us to achieve top performance in three evaluation metrics in the MediaEval 2018 task.
- Site pages
- Content
Part of GLA in MediaEval 2018 Emotional Impact of Movies Task