Shammur Absar Chowdhury, Arindam Ghosh, Evgeny Stepanov, Ali Orkan Bayer, Giuseppe Riccardi and Ioannis Klasinas
Interspeech-2014, Singapore
Publication year: 2014

Abstract

The development of a natural language speech application requires semantic annotation. Moreover, multilingual porting of speech applications increases the cost and complexity of the annotation task. In this paper we address the problem of transferring the semantic annotation of a source language corpus to a low-resource target language via crowdsourcing. Current crowdsourcing approaches face several problems. First, the available crowdsourcing platforms have a skewed distribution of language speakers. Second, speech applications require domain-specific knowledge. Third, the lack of reference target language annotation makes crowdsourcing worker control very difficult. We address these issues on the task of cross-language transfer of domain-specific semantic annotation from an Italian spoken language corpus to Greek via targeted crowdsourcing. The issue of domain knowledge transfer is addressed by priming the workers with the source language concepts. The lack of reference annotation is handled with a consensus-based annotation algorithm. The quality of annotation transfer is assessed using source language references and inter-annotator agreement. We demonstrate that the proposed computational methodology is viable and achieves acceptable annotation quality.
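
The abstract does not describe the consensus-based annotation algorithm in detail, so the following is only a minimal sketch of the general idea, assuming a plain majority vote over the labels that several workers assign to the same item. The function names, the `min_agreement` threshold, and the example concept labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_label(labels, min_agreement=0.5):
    """Return the most frequent label if its vote share exceeds
    min_agreement; otherwise return None (no consensus).

    Assumption: simple majority voting, not the paper's actual algorithm.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None


def observed_agreement(labels):
    """Fraction of worker pairs that chose the same label for one item.

    Returns None when fewer than two labels are given, since pairwise
    agreement is undefined in that case.
    """
    n = len(labels)
    if n < 2:
        return None
    counts = Counter(labels)
    agreeing_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return agreeing_pairs / total_pairs


# Hypothetical example: three workers label the same target-language segment.
votes = ["city", "city", "date"]
winner = consensus_label(votes)
pairwise = observed_agreement(votes)
```

In this sketch an item with no label above the threshold yields `None`, which could be used to route the item back for additional judgments; whether the paper does anything similar is not stated in the abstract.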

 
