We have an open PhD position in analysis and control techniques for deep-learning spoken language applications. This vacancy is part of the InDeep consortium project. Open until filled.
Speech processing is increasingly done via end-to-end rather than modular models: this makes it hard to understand what drives a model's decisions in general, and specifically why it fails when it does. In the context of Automatic Speech Recognition (ASR), end-to-end typically means that the spectrogram is the input and the transcription is the output. Other types of end-to-end speech models span an even greater distance between input and output.
Given the opacity of such end-to-end models, it is desirable to develop and test methods for analyzing the intermediate representations they learn and for interpreting the decisions they make. The objective of this work package is to develop and test methods for manipulating the intermediate representations learned by end-to-end speech-understanding models, in order to make it possible for users to debug them, control them, and explain their output.