Described techniques may be utilized to receive a transcription stream including text transcribed from speech, and to receive a summary request for a summary to be provided on a display of a device. Extracted text may be identified from the transcribed text in response to the summary request. The extracted text may be processed using a summarization machine learning (ML) model to obtain a summary of the extracted text, and the summary may be displayed on the display of the device. When an image is captured, an augmented summary may be generated that includes the image together with a visual indication of one or more of an emotion, an entity, or an intent associated with the image, the summary, or the extracted text.
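As an illustration of this flow, the following is a minimal Python sketch. The names TranscriptionStream, summarize_model, and detect_signals are hypothetical stand-ins for the transcription source, the summarization ML model, and the emotion/entity/intent detectors; this is not the disclosed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionStream:
    """Accumulates text transcribed from speech (hypothetical stand-in)."""
    segments: list[str] = field(default_factory=list)

    def append(self, text: str) -> None:
        self.segments.append(text)

    def extract(self, last_n: int = 10) -> str:
        # Extracted text: here, simply the most recent segments available
        # when the summary request arrives.
        return " ".join(self.segments[-last_n:])


def summarize_model(text: str) -> str:
    # Stand-in for a summarization ML model (e.g., a seq2seq transformer):
    # naively return the first sentence of the extracted text.
    first = text.split(". ")[0]
    return first if first.endswith(".") else first + "."


def detect_signals(text: str) -> dict:
    # Stand-in for the emotion / entity / intent detectors used to build
    # the augmented summary.
    return {"emotion": "neutral", "entities": [], "intent": "inform"}


def handle_summary_request(stream: TranscriptionStream,
                           image: bytes | None = None) -> dict:
    extracted = stream.extract()
    summary = summarize_model(extracted)
    result = {"summary": summary}
    if image is not None:
        # Augmented summary: the captured image plus visual indications of
        # detected emotion, entities, and intent.
        result["augmented"] = {"image": image, **detect_signals(extracted)}
    return result


stream = TranscriptionStream()
stream.append("The team reviewed the launch plan. Marketing starts Monday.")
print(handle_summary_request(stream, image=b"<jpeg bytes>"))
```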
An example system and method elicit reviews and opinions from users via an online system or a web crawl. Opinions on topics are processed in real time to determine orientation. Each topic is analyzed sentence by sentence to find a central tendency of user orientation toward the topic. Automatic topic orientation is used to provide a common comparable rating value between reviewers and potentially other systems on similar topics. Facets of the topics are extracted via a submission/acquisition process to determine the key variables of interest for users.
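A minimal sketch of the sentence-by-sentence orientation scoring, assuming a toy word lexicon and a simple integer score scale that are not part of the described system; the mean over sentences serves as the central tendency used as a comparable rating value.

```python
import re
from statistics import mean

# Illustrative lexicon only; a real system would learn or curate these.
POSITIVE = {"great", "good", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "unreliable"}


def sentence_orientation(sentence: str) -> int:
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)


def topic_orientation(review_text: str) -> float:
    # Analyze the review sentence by sentence and return the mean score
    # as the central tendency of user orientation toward the topic.
    sentences = [s for s in re.split(r"[.!?]+", review_text) if s.strip()]
    return mean(sentence_orientation(s) for s in sentences) if sentences else 0.0


print(topic_orientation("Battery life is great. The screen is poor."))  # 0.0
```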
Emoticons or other images are inserted into text messages during chat sessions without leaving the chat session by entering an input sequence on an input area of a touchscreen on an electronic device, thereby causing an emoticon library to be presented to a user. The user selects an emoticon, and the emoticon library either closes automatically or closes after the user enters a closing input sequence. The opening and closing input sequences are, for example, any combination of swipes and taps along or on the input area. Users are also able to add content to chat sessions and to generate mood messages for chat sessions.
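The gesture handling could be pictured as a small state machine like the sketch below; the specific opening and closing sequences and the EmoticonPicker name are assumptions for illustration only.

```python
# Illustrative gesture sequences; any combination of swipes and taps
# along or on the input area could serve the same role.
OPEN_SEQUENCE = ("swipe_up", "tap")
CLOSE_SEQUENCE = ("swipe_down",)


class EmoticonPicker:
    def __init__(self):
        self.visible = False
        self.recent_inputs: list[str] = []

    def handle_input(self, gesture: str) -> None:
        self.recent_inputs.append(gesture)
        if tuple(self.recent_inputs[-len(OPEN_SEQUENCE):]) == OPEN_SEQUENCE:
            self.visible = True   # present the emoticon library
        elif tuple(self.recent_inputs[-len(CLOSE_SEQUENCE):]) == CLOSE_SEQUENCE:
            self.visible = False  # close without leaving the chat session

    def select(self, emoticon: str, message: str) -> str:
        # Insert the chosen emoticon and close the library automatically.
        self.visible = False
        return message + " " + emoticon


picker = EmoticonPicker()
picker.handle_input("swipe_up")
picker.handle_input("tap")        # library is now visible
print(picker.select("🙂", "See you soon"))
```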
Implementations described herein relate to causing emoji(s) that are associated with a given emotion class expressed by a spoken utterance to be visually rendered for presentation to a user at a display of a client device of the user. Processor(s) of the client device may receive audio data that captures the spoken utterance, process the audio data to generate textual data that is predicted to correspond to the spoken utterance, and cause a transcription of the textual data to be visually rendered for presentation to the user via the display. Further, the processor(s) may determine, based on processing the textual data, whether the spoken utterance expresses a given emotion class. In response to determining that the spoken utterance expresses the given emotion class, the processor(s) may cause emoji(s) that are stored in association with the given emotion class to be visually rendered for presentation to the user via the display.
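A minimal sketch of the rendering decision, assuming placeholder speech_to_text and classify_emotion functions in place of the on-device ASR and emotion-classification models, and a toy mapping of emoji stored in association with emotion classes.

```python
EMOJI_BY_EMOTION = {
    "joy": ["😀", "🎉"],
    "sadness": ["😢"],
    "anger": ["😠"],
}


def speech_to_text(audio_data: bytes) -> str:
    # Placeholder for processing audio data into predicted textual data.
    return "we won the game tonight"


def classify_emotion(text: str) -> str | None:
    # Placeholder for an emotion classifier over the textual data.
    if any(w in text for w in ("won", "great", "love")):
        return "joy"
    return None


def render_transcription(audio_data: bytes) -> dict:
    text = speech_to_text(audio_data)
    display = {"transcription": text}
    emotion = classify_emotion(text)
    if emotion is not None:
        # Only render emoji when the utterance expresses a known emotion class.
        display["emoji"] = EMOJI_BY_EMOTION.get(emotion, [])
    return display


print(render_transcription(b"..."))
```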
Systems and methods for capturing media content in accordance with viewer expression are disclosed. In some implementations, a method is performed at a computer system having one or more processors and memory storing one or more programs for execution by the one or more processors. The method includes: (1) while a media content item is being presented to a user, capturing a momentary reaction of the user; (2) comparing the captured user reaction with one or more previously captured reactions of the user; (3) identifying the user reaction as one of a plurality of reaction types based on the comparison; (4) identifying the portion of the media content item corresponding to the momentary reaction; and (5) storing an association between the identified user reaction and the portion of the media content item.
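One way to picture steps (1) through (5) is the sketch below, which assumes reactions are reduced to small feature vectors and compared against previously captured reference reactions by distance; the vector representation and threshold are illustrative assumptions, not the disclosed method.

```python
import math

# Previously captured reactions of this user, keyed by reaction type.
REFERENCE_REACTIONS = {
    "laugh": [0.9, 0.1],
    "frown": [0.1, 0.8],
}

associations: list[dict] = []


def classify_reaction(features: list[float]) -> str | None:
    # Compare the captured reaction with previously captured reactions and
    # pick the closest reaction type, if it is close enough.
    best_type, best_ref = min(REFERENCE_REACTIONS.items(),
                              key=lambda kv: math.dist(features, kv[1]))
    return best_type if math.dist(features, best_ref) < 0.5 else None


def on_reaction_captured(features: list[float], playback_position_s: float) -> None:
    reaction_type = classify_reaction(features)
    if reaction_type is not None:
        # Store the association between the reaction and the media portion.
        associations.append({"reaction": reaction_type,
                             "portion_start_s": playback_position_s})


on_reaction_captured([0.85, 0.15], playback_position_s=732.0)
print(associations)  # [{'reaction': 'laugh', 'portion_start_s': 732.0}]
```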
Systems, methods, and non-transitory computer-readable media can determine one or more chunks for a content item to be captioned. Each chunk can include one or more terms that describe at least a portion of the subject matter captured in the content item. One or more sentiments can be determined based on the subject matter captured in the content item. One or more emotions can be determined for the content item. At least one emoted caption can be generated for the content item based at least in part on the one or more chunks, sentiments, and emotions. The emoted caption can include at least one term that conveys an emotion represented by the subject matter captured in the content item.
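A minimal sketch of caption composition, assuming chunks, a sentiment label, and an emotion label have already been determined upstream; the emotive-term lookup table is an illustrative assumption.

```python
# Illustrative mapping from (sentiment, emotion) to an emotion-conveying term.
EMOTIVE_TERMS = {
    ("positive", "joy"): "what a blast",
    ("positive", "calm"): "so peaceful",
    ("negative", "sadness"): "heartbreaking",
}


def emoted_caption(chunks: list[str], sentiment: str, emotion: str) -> str:
    # Each chunk describes part of the subject matter; the emotive term
    # conveys the emotion the subject matter represents.
    emotive = EMOTIVE_TERMS.get((sentiment, emotion), "")
    base = ", ".join(chunks)
    return f"{base} ({emotive})" if emotive else base


print(emoted_caption(["friends at the beach", "sunset"], "positive", "joy"))
# friends at the beach, sunset (what a blast)
```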
Systems, methods, and non-transitory computer-readable media can obtain a conversation of a user in a chat application associated with a system, where the conversation includes one or more utterances by the user. An analysis of the one or more utterances by the user can be performed. A sentiment associated with the conversation can be determined based on a machine learning model, wherein the machine learning model is trained based on a plurality of features including demographic information associated with users.
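A minimal sketch of such a classifier, assuming a toy feature layout in which demographic information (here, an age bucket) is appended to text-derived counts; scikit-learn's LogisticRegression and the toy training data are stand-ins, not the model or features described.

```python
from sklearn.linear_model import LogisticRegression

POSITIVE_WORDS = {"thanks", "great", "love"}
NEGATIVE_WORDS = {"annoyed", "bad", "hate"}


def featurize(utterances: list[str], age_bucket: int) -> list[float]:
    text = " ".join(utterances).lower()
    pos = sum(w in text for w in POSITIVE_WORDS)
    neg = sum(w in text for w in NEGATIVE_WORDS)
    # Demographic information enters as an extra feature alongside the
    # text-derived counts.
    return [float(pos), float(neg), float(age_bucket)]


# Toy training set: (utterances, age_bucket) -> sentiment label (1 = positive).
X = [featurize(["thanks, that was great"], 2),
     featurize(["this is bad, I'm annoyed"], 3),
     featurize(["love it"], 1),
     featurize(["I hate this"], 2)]
y = [1, 0, 1, 0]

model = LogisticRegression().fit(X, y)
conversation = ["hey", "thanks for the quick reply, great service"]
print(model.predict([featurize(conversation, age_bucket=2)]))  # expected: [1]
```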
Systems, methods, and non-transitory computer-readable media can acquire a set of media content items. A mood indication can be acquired. A soundtrack can be identified based on the mood indication. A video content item can be dynamically generated in real-time based on the set of media content items and the mood indication. The video content item can include the soundtrack.
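A minimal sketch of the selection and assembly step, assuming a static mood-to-soundtrack table and a simple timeline structure; actual real-time video rendering is outside the scope of the sketch.

```python
# Illustrative mood-to-soundtrack mapping; a real system would identify
# soundtracks from a larger catalog.
SOUNDTRACKS_BY_MOOD = {
    "upbeat": "track_upbeat_01.mp3",
    "nostalgic": "track_nostalgic_04.mp3",
    "calm": "track_calm_02.mp3",
}


def generate_video(media_items: list[str], mood: str,
                   seconds_per_item: float = 3.0) -> dict:
    soundtrack = SOUNDTRACKS_BY_MOOD.get(mood, SOUNDTRACKS_BY_MOOD["calm"])
    timeline = [{"media": item, "start_s": i * seconds_per_item}
                for i, item in enumerate(media_items)]
    # The resulting video content item includes the mood-matched soundtrack.
    return {"timeline": timeline, "soundtrack": soundtrack, "mood": mood}


print(generate_video(["beach.jpg", "clip1.mp4", "sunset.jpg"], mood="upbeat"))
```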