Binaural Speech Enhancement Based on Deep Fusion Layers (en)
Abstract:
This work introduces a method for enhancing speech in binaural hearing scenarios by integrating latent features from monaural speech enhancement algorithms through "fusion layers." Inspired by multi-task learning techniques and physiological binaural mechanisms, these layers compute Hadamard products between latent spaces, which improves noise reduction by sharing information across processing stages. The study shows that the fusion model fits synthetic data better than linear models, balances activation variance, and exploits data redundancy to improve training. Fusion layers also enhance speech more effectively than the baseline model, although too many fused features may reduce intelligibility. The results support parameterized sharing of latent representations within fusion layers as a way to use information from both listening sides. Overall, the study highlights the potential of deep fusion layers to enhance binaural speech while keeping the number of trainable parameters unchanged and improving generalization.
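As a rough illustration of the idea, the sketch below fuses the latent feature vectors of a left-ear and a right-ear model through their Hadamard (element-wise) product. It is a minimal sketch under stated assumptions, not the architecture used in this work: the function names `hadamard` and `fusion_layer`, the blending rule, and the sharing parameter `alpha` are hypothetical and were chosen only to show how parameterized sharing could look.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def fusion_layer(z_left, z_right, alpha=0.5):
    """Hypothetical fusion layer (an assumption, not the authors' definition).

    Each side keeps a (1 - alpha) share of its own latent features and
    receives an alpha share of the Hadamard product of both sides.
    alpha = 0 leaves both branches untouched; alpha = 1 replaces both
    with the shared product.
    """
    shared = hadamard(z_left, z_right)
    fused_left = [(1 - alpha) * l + alpha * s for l, s in zip(z_left, shared)]
    fused_right = [(1 - alpha) * r + alpha * s for r, s in zip(z_right, shared)]
    return fused_left, fused_right


# Toy latent features for one time frame (made-up values for illustration).
z_left = [0.5, -1.0, 2.0]
z_right = [2.0, 3.0, -0.5]
fused_left, fused_right = fusion_layer(z_left, z_right, alpha=0.5)
```

In this sketch the layer adds no weights beyond the single scalar `alpha`, which is one way a fusion step could share information between the two sides without increasing the size of the monaural models.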