Shammur Absar Chowdhury, Giuseppe Riccardi
ICASSP, IEEE 2017, New Orleans, USA.
Publication year: 2017

ABSTRACT

Research on overlapping speech has long been motivated by the need to model human-machine interaction for dialog systems and conversation analysis. To gain deeper insight into the interlocutors’ intentions behind the interaction, we need to understand the type of overlap. Overlapping speech signals the interlocutor’s intention to grab the floor. This act can be competitive or non-competitive, which either signals a problem or indicates assistance in communication. In this paper, we present a Deep Learning approach to modeling competitiveness in overlapping speech using acoustic and lexical features and their combination. We compare a fully-connected feed-forward neural network (DNN) to Support Vector Machine (SVM) models on real call center human-human conversations. We observe that feature combination with the DNN (significantly) outperforms the SVM models, both the individual feature baselines and the feature combination model, by 4% and 2%, respectively.
