In one embodiment, a method includes identifying an emotion associated with an identified first object in one or more input images; selecting, based on the emotion, a mask from a set of masks, where the mask specifies one or more mask effects; and, for each of the input images, applying the mask to the input image. Applying the mask includes generating graphical features based on the identified first object, or on a second object in the input images, according to instructions specified by the mask effects, and incorporating the graphical features into an output image. The emotion may be identified based on graphical features of the identified first object, which may include facial features. The mask may be selected from a lookup table that maps the identified emotion to the mask.
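A minimal Python sketch of the lookup-based selection and mask application follows. The emotion labels, mask names, and effect parameters are illustrative assumptions, not details from the embodiment, and the rendering step is only recorded rather than drawn.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MaskEffect:
    name: str                                  # e.g. "overlay_sparkles" (hypothetical)
    params: Dict[str, float] = field(default_factory=dict)

@dataclass
class Mask:
    mask_id: str
    effects: List[MaskEffect] = field(default_factory=list)

# Lookup table mapping an identified emotion to a mask (illustrative entries).
EMOTION_TO_MASK: Dict[str, Mask] = {
    "happy": Mask("party", [MaskEffect("overlay_sparkles", {"density": 0.8})]),
    "sad":   Mask("rain",  [MaskEffect("overlay_raindrops", {"opacity": 0.5})]),
}

def select_mask(emotion: str) -> Mask:
    """Select a mask for the identified emotion, defaulting to no effects."""
    return EMOTION_TO_MASK.get(emotion, Mask("neutral"))

def apply_mask(input_image: List[List[int]], mask: Mask) -> dict:
    """Apply each mask effect to the input image.

    A real implementation would render graphical features onto the frame at the
    detected object's location; here we only record which features would be
    composited into the output image.
    """
    generated_features = [f"{e.name}({e.params})" for e in mask.effects]
    return {"frame": input_image, "features": generated_features}

# Usage: an emotion of "happy" was identified in a frame; pick and apply its mask.
mask = select_mask("happy")
output = apply_mask([[0, 0], [0, 0]], mask)
print(output["features"])
```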
A dynamic text-to-speech (TTS) process and system are described. In response to receiving a command to provide information to a user, a device retrieves the information and determines user and environment attributes, including: (i) the distance between the device and the user when the user uttered the command; and (ii) voice features of the user. Based on these attributes, the device determines the user's likely mood and the likely environment in which the user and user device are located. An audio output template is selected that matches the user's likely mood and voice features and is compatible with that environment. The retrieved information is converted into an audio signal using the selected audio output template and output by the device.
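The template-selection step might be sketched as follows, a minimal example assuming hypothetical mood labels, noise thresholds, and gain values; a real system would derive these from the measured user and environment attributes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AudioTemplate:
    name: str
    mood: str            # mood the template is tuned for (assumed label set)
    max_noise_db: float  # loudest ambient noise the template still suits
    volume_gain: float   # gain applied when synthesizing the audio signal

# Illustrative template library; values are placeholders.
TEMPLATES: List[AudioTemplate] = [
    AudioTemplate("calm_quiet", mood="calm",     max_noise_db=40, volume_gain=0.8),
    AudioTemplate("calm_loud",  mood="calm",     max_noise_db=80, volume_gain=1.4),
    AudioTemplate("urgent",     mood="stressed", max_noise_db=80, volume_gain=1.6),
]

def select_template(likely_mood: str, ambient_noise_db: float,
                    distance_m: float) -> AudioTemplate:
    """Pick the first template matching the user's likely mood that is also
    compatible with the environment (ambient noise plus distance to the user)."""
    for t in TEMPLATES:
        # Farther users effectively need a template rated for louder output.
        effective_noise = ambient_noise_db + 3.0 * distance_m
        if t.mood == likely_mood and effective_noise <= t.max_noise_db:
            return t
    return TEMPLATES[-1]  # fall back to the loudest template

print(select_template("calm", ambient_noise_db=35, distance_m=1.0).name)  # calm_quiet
```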
Systems and methods for emoji prediction and visual sentiment analysis are provided. An example system performs a computer-implemented method that predicts emoji for, or analyzes the sentiment of, an input image. The method includes receiving an image, generating an emoji embedding for the image, and generating a sentiment label for the image using the emoji embedding. The emoji embedding may be generated using a machine learning model.
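A minimal sketch of the two-stage pipeline follows: an emoji embedding is produced for an image, and a sentiment label is derived from that embedding. The emoji vocabulary, scores, and thresholds are illustrative assumptions, and the trivial normalization stands in for the trained machine learning model.

```python
from typing import Dict, List

EMOJI_VOCAB = ["grinning", "crying", "angry", "screaming"]            # assumed vocabulary
EMOJI_SENTIMENT: Dict[str, float] = {
    "grinning": 1.0, "crying": -0.6, "angry": -1.0, "screaming": -0.4,
}

def embed_image(image_features: List[float]) -> List[float]:
    """Stand-in for a learned model: map image features to a distribution
    over the emoji vocabulary (here, a simple normalization)."""
    total = sum(image_features) or 1.0
    return [f / total for f in image_features]

def sentiment_from_embedding(emoji_embedding: List[float]) -> str:
    """Derive a sentiment label as the embedding-weighted emoji sentiment."""
    score = sum(p * EMOJI_SENTIMENT[e] for p, e in zip(emoji_embedding, EMOJI_VOCAB))
    return "positive" if score > 0.1 else "negative" if score < -0.1 else "neutral"

embedding = embed_image([0.7, 0.1, 0.1, 0.1])   # features that mostly resemble "grinning"
print(sentiment_from_embedding(embedding))      # positive
```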
Meetings held in virtual environments can allow participants to conveniently express emotions to a meeting organizer and/or other participants. The avatar representing a meeting participant can be enhanced to include an expression symbol selected by that participant. The participant can choose among a set of expression symbols offered for the meeting.
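One way such per-meeting symbol selection could be modeled is sketched below; the symbol set, participant names, and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Avatar:
    participant: str
    expression_symbol: Optional[str] = None   # symbol currently shown on the avatar

@dataclass
class Meeting:
    # Set of expression symbols offered for this meeting (illustrative choices).
    offered_symbols: List[str] = field(default_factory=lambda: ["thumbs_up", "clap", "question", "celebrate"])
    avatars: List[Avatar] = field(default_factory=list)

    def set_expression(self, participant: str, symbol: str) -> None:
        """Attach a symbol to the participant's avatar, but only if the symbol
        is among those offered for this meeting."""
        if symbol not in self.offered_symbols:
            raise ValueError(f"{symbol} is not offered for this meeting")
        for avatar in self.avatars:
            if avatar.participant == participant:
                avatar.expression_symbol = symbol
                return

meeting = Meeting(avatars=[Avatar("alice"), Avatar("bob")])
meeting.set_expression("alice", "clap")
print(meeting.avatars[0])   # Avatar(participant='alice', expression_symbol='clap')
```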