

Proposal for a Master Thesis

Topic:

Exploring Attention Models for Speech Enhancement

Description:

In the field of machine learning, recurrent neural networks (RNNs) using long short-term memory (LSTM) have achieved impressive results in natural language processing, text prediction, speech enhancement, and machine translation. An important development is the attention mechanism [1], which allows a neural network to assign weights to its input. This way, the parts of the input that the network should pay attention to can be weighted more strongly than unimportant parts. In machine translation, this means emphasizing names, places, and important verbs while neglecting filler words such as articles and conjunctions.
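To make this concrete, the following minimal NumPy sketch computes additive attention weights in the style of [1]; the dimensions and weight matrices are illustrative choices, not taken from the paper.

import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(query, keys, W_q, W_k, v):
    """Additive (Bahdanau-style) attention, cf. [1].

    query: current decoder state, shape (d_q,)
    keys:  encoder states over time, shape (T, d_k)
    Returns the context vector and the attention weights.
    """
    # Score each encoder state against the current decoder state.
    scores = np.tanh(query @ W_q + keys @ W_k) @ v   # shape (T,)
    # Normalize the scores to attention weights that sum to one.
    weights = softmax(scores)                        # shape (T,)
    # The context is the attention-weighted sum of the encoder states.
    context = weights @ keys                         # shape (d_k,)
    return context, weights

# Toy example: 5 time frames with 8-dimensional encoder states.
rng = np.random.default_rng(0)
T, d_q, d_k, d_a = 5, 4, 8, 16
query = rng.standard_normal(d_q)
keys = rng.standard_normal((T, d_k))
W_q = rng.standard_normal((d_q, d_a))
W_k = rng.standard_normal((d_k, d_a))
v = rng.standard_normal(d_a)
context, weights = additive_attention(query, keys, W_q, W_k, v)
print("attention weights:", np.round(weights, 3))

Since the weights sum to one, the context vector is a convex combination of the encoder states, with the largest weights on the frames the model attends to.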

In this thesis project, a literature review shall be conducted to assess the potential of attention models for speech enhancement. This includes collecting different applications and developments based on attention models. Based on the literature survey, an algorithm showcasing the power of attention should be chosen, implemented, and compared to a non-attention-based equivalent, e.g., [2]. From there, possible variations of the model with respect to speech enhancement can be explored.
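As a purely illustrative starting point (a sketch under simplifying assumptions, not the architecture of [2]), such a comparison could use a mask-based LSTM enhancer in TensorFlow/Keras with a switchable dot-product self-attention layer; all layer sizes and names are hypothetical.

import tensorflow as tf
from tensorflow.keras import layers

def build_enhancer(n_freq=257, use_attention=True):
    """Toy LSTM mask estimator for single-channel speech enhancement.

    Input:  noisy magnitude spectrogram, shape (frames, n_freq).
    Output: masked (enhanced) magnitudes; the mask lies in [0, 1].
    Illustrative baseline only, not the model of [2].
    """
    noisy = layers.Input(shape=(None, n_freq))
    h = layers.LSTM(256, return_sequences=True)(noisy)
    if use_attention:
        # Dot-product self-attention over time frames lets each frame
        # weight the temporal context it should "pay attention" to.
        h = layers.Attention()([h, h])
    mask = layers.Dense(n_freq, activation="sigmoid")(h)
    enhanced = layers.Multiply()([noisy, mask])
    return tf.keras.Model(noisy, enhanced)

model = build_enhancer(use_attention=True)
model.compile(optimizer="adam", loss="mse")  # target: clean magnitudes
model.summary()

Setting use_attention=False yields the non-attention baseline with otherwise identical layers, so both variants can be trained and compared under the same conditions.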

[1] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” arXiv:1409.0473 [cs, stat], May 2016.

[2] X. Hao, C. Shan, Y. Xu, S. Sun, and L. Xie, “An Attention-based Neural Network Approach for Single Channel Speech Enhancement,” in 2019 IEEE Int. Conf. Acoust., Speech and Sig. Process. (ICASSP), pp. 6895–6899, May 2019.

 


Professor:

Prof. Dr.-Ing. Walter Kellermann

Supervisor:

Annika Briegleb, M.Sc., room 05.021 (Cauerstr. 7), annika.briegleb@fau.de

Prerequisites:

Fundamental understanding of recurrent neural networks; Python programming, ideally including TensorFlow

Available:

Immediately