Selected Projects

QVoice: Voice Technology in Education

The primary goal of the QVoice project is to advance speech technology for automatic spoken language learning, targeting both non-native adults and children.
Central to our research is strengthening multilingual speech capabilities, with a particular emphasis on Arabic and English languages.

L1-Background Handling: QVoice investigates research strategies for recognizing and managing diverse accents, dialects, and speaking styles within a unified phonetic framework.

Augmentation Techniques: We will explore a range of augmentation methods to enrich models of L2 and children’s speech.

Multilingual and Multimodal Integration: The project aims to integrate information from learners’ L1 backgrounds and multiple input modalities.
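One common augmentation method for low-resource speech (such as L2 and children's speech) is speed perturbation, which resamples the waveform to simulate faster or slower speakers. As a minimal sketch of the idea (not the project's actual pipeline), a numpy-only version using linear interpolation:

```python
import numpy as np

def speed_perturb(wav: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform by `factor` via linear interpolation.

    factor > 1.0 speeds the audio up (fewer output samples);
    factor < 1.0 slows it down (more output samples).
    """
    n_out = int(round(len(wav) / factor))
    # Positions in the original signal to sample at.
    src_pos = np.linspace(0, len(wav) - 1, num=n_out)
    return np.interp(src_pos, np.arange(len(wav)), wav)

# Example: a 1-second 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
fast = speed_perturb(tone, 1.1)   # ~10% shorter
slow = speed_perturb(tone, 0.9)   # ~11% longer
```

In practice, toolkits such as Kaldi and torchaudio typically apply this at factors like 0.9, 1.0, and 1.1, effectively tripling the training data.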

Website: https://qvoice.qcri.org/


Multilingual, Dialectal and Code-switching ASR

The goal of the project is to design cutting-edge speech technology focusing on recognizing speech from the diverse array of Arabic dialects found across the MENA region. By embracing the rich tapestry of languages and the profound cultural influences present in the Middle East and North Africa, this project aims to bridge gaps in communication.
It targets not only Modern Standard Arabic (MSA) and dialectal speech but also the intricate nuances of code-switching, where speakers fluidly transition between languages and dialects.

Explainable Speech Models

Deep neural networks are inherently opaque and challenging to interpret. Unlike with hand-crafted feature-based models, it is difficult to understand which concepts these networks learn and how those concepts interact. This understanding is crucial not only for debugging but also for ensuring fairness in ethical decision-making.

The project aims to conduct layer and neuron-wise analyses, probing for speaker, language, and channel properties among others to understand the capabilities of the network and ensure transparent and ethical decision-making in speech models.
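A standard tool for this kind of layer-wise analysis is a linear probe: a simple classifier trained on a layer's frozen activations to predict a property such as speaker or language identity; high probe accuracy suggests the property is linearly encoded at that layer. A minimal sketch on synthetic activations (the 64-dim vectors and the binary "language" label are illustrative stand-ins, not the project's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden activations from one layer of a speech
# encoder: 64-dim vectors with a binary property (e.g. language ID)
# linearly encoded in one dimension plus noise.
n, d = 2000, 64
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d))
acts[:, 0] += 2.0 * labels  # the property "leaks" into dimension 0

def train_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Held-out accuracy measures how linearly decodable the property is
# from this layer's representations.
w, b = train_probe(acts[:1500], labels[:1500])
preds = (acts[1500:] @ w + b) > 0
accuracy = (preds == labels[1500:]).mean()
```

Running the same probe against every layer, and inspecting the learned weights per neuron, gives the layer- and neuron-wise picture the project describes.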


Understanding Human Conversational Dynamics

The project seeks to unravel the complexities of human-to-human dialogue, especially in telephony conversations. Here we dive deep into challenges such as turn-taking, overlapping speech (where two or more participants speak simultaneously), and the nuanced verbal and non-verbal cues that predict conversation outcomes. By decoding these dynamics, our goal is to refine AI systems, enabling them to navigate and contribute to real-world conversations more effectively and naturally.

Community and Platforms

  • Bangla NLP

Co-founded the BNLP community. The goal is to enrich NLP and speech resources for Bangla.

More details on the BNLP website

  • MyVoice

Co-founded the MyVoice Platform to facilitate dialectal speech research, enrich the technology, and give researchers access to the resources.

More details on the MyVoice Platform

  • ArabicSpeech

Actively maintaining the ArabicSpeech community, releasing speech resources to enrich Arabic speech research.

More details on ArabicSpeech

Previous Projects