
AI Hears Your Anger in 1.2 Seconds

Friday, February 8, 2019, 04:35 PM, from Slashdot
MIT Media Lab spinoff Affectiva's neural network, SoundNet, can classify anger from audio data in as little as 1.2 seconds, regardless of the speaker's language -- only slightly longer than the time it takes humans to perceive anger. From a report: Affectiva's researchers describe the system in a newly published paper ('Transfer Learning From Sound Representations For Anger Detection in Speech') [PDF] on the preprint server arXiv.org. It builds on the company's wide-ranging efforts to establish emotional profiles from both speech and facial data, which this year spawned an AI in-car system, co-developed with Nuance, that detects signs of driver fatigue from camera feeds. In December 2017, the company launched its Speech API, which uses voice to recognize laughter, anger, and other emotional events, along with voice volume, tone, speed, and pauses.
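
Affectiva has not published the Speech API's internals, but the feature list hints at standard prosodic analysis. As a rough, hypothetical sketch of how such features can be computed, the following Python snippet uses the open-source librosa library; the file name, sample rate, and silence threshold are placeholder assumptions, not anything taken from Affectiva.

    # Sketch: extracting a few prosodic features of the sort a speech-emotion
    # system might use (volume, pitch, pauses). Not Affectiva's API; this uses
    # the open-source librosa library, and "speech.wav" is a placeholder file.
    import librosa
    import numpy as np

    y, sr = librosa.load("speech.wav", sr=16000, mono=True)

    # Volume: root-mean-square energy per frame.
    rms = librosa.feature.rms(y=y)[0]

    # Tone: fundamental frequency (pitch) via the pYIN tracker.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )

    # Pauses: gaps between non-silent intervals (threshold is a guess).
    intervals = librosa.effects.split(y, top_db=30)
    pauses = [
        (end_prev, start_next)
        for (_, end_prev), (start_next, _) in zip(intervals[:-1], intervals[1:])
    ]

    print(f"mean volume (RMS): {rms.mean():.4f}")
    print(f"mean pitch (Hz):   {np.nanmean(f0):.1f}")
    print(f"number of pauses:  {len(pauses)}")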

SoundNet consists of a convolutional neural network -- a type of neural network more commonly applied to visual imagery -- trained on a video dataset. To teach it to recognize anger in speech, the team first pretrained it on a large body of general audio data -- two million videos, or just over a year's worth of continuous footage -- with ground-truth labels produced by another model. They then fine-tuned it on a smaller dataset, IEMOCAP, which contains 12 hours of annotated audiovisual emotion data, including video, speech, and text transcriptions.
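
That two-stage recipe -- pretrain a CNN on abundant, weakly labeled audio, then fine-tune it on a small, carefully labeled emotion corpus -- is a standard transfer-learning pattern. The PyTorch sketch below illustrates the pattern only; the tiny 1D CNN, layer sizes, label counts, and dummy data are illustrative assumptions, not Affectiva's actual architecture or training setup.

    # Minimal transfer-learning sketch (illustrative, not Affectiva's model):
    # pretrain a small 1D CNN on plentiful weakly labeled audio, then freeze
    # its convolutional "body" and fine-tune a new head on scarce anger labels.
    import torch
    import torch.nn as nn

    class AudioCNN(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            # Convolutional body over raw 16 kHz waveforms, shape [batch, 1, samples].
            self.body = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
                nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # collapse the time axis to one vector
            )
            self.head = nn.Linear(32, num_classes)

        def forward(self, x):
            return self.head(self.body(x).squeeze(-1))

    # Stage 1: pretrain on many weakly labeled clips (labels from another model).
    model = AudioCNN(num_classes=1000)   # e.g. broad "sound event" pseudo-labels
    # ... training loop over the large corpus would run here ...

    # Stage 2: transfer. Freeze the body, replace the head for anger detection.
    for p in model.body.parameters():
        p.requires_grad = False
    model.head = nn.Linear(32, 2)        # angry vs. not angry

    optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # Fine-tune on 1.2-second clips (19,200 samples at 16 kHz); dummy data here.
    clip = torch.randn(8, 1, 19200)      # batch of 8 placeholder waveforms
    labels = torch.randint(0, 2, (8,))
    optimizer.zero_grad()
    loss = loss_fn(model(clip), labels)
    loss.backward()
    optimizer.step()

Freezing the pretrained body and training only the new head is the cheapest variant of fine-tuning; in practice, the whole network is often unfrozen at a lower learning rate once the new head has converged.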

Read more of this story at Slashdot.
rss.slashdot.org/~r/Slashdot/slashdot/~3/bIjBIptkboc/ai-hears-your-anger-in-12-seconds