Crowdsourcing-generated and Crowdsourcing-labeled Dataset for Speech Quality Prediction (en)

Day / Time: 21.03.2024, 09:00-09:20
Room: 11/13
Type: Regular talk
Abstract: Models for speech quality prediction are commonly trained on speech data that has been recorded in a studio environment, artificially degraded by target impairments, and then judged by listeners in a laboratory test under quiet conditions. This approach offers maximum control over the process but lacks realistic acoustic environments and recording/playback devices. In this paper, we describe an approach in which speech files are collected under different background noise conditions in a crowdsourcing environment, degraded by common network impairments, and then rated, again by crowdworkers, in realistic acoustic conditions. We present the recording instructions and procedure as well as the automatic screening algorithm, analyze the resulting speech files with respect to the acoustic target scenes, and report on the crowdsourcing rating procedure and its outcome. The dataset will be made openly available to the research community and will be used in the future to assess the performance of popular speech quality prediction models.