Call for PhD applications at the Machine Listening Lab

The Machine Listening Lab is welcoming PhD applications for September 2021 entry. Applicants of all nationalities can apply across different funding schemes and PhD programmes. Current PhD funding opportunities for September 2021 entry include:

Applicants are encouraged to contact prospective supervisors before submitting their application – please send an email to your selected supervisors with your CV and draft research proposal.

Suggested PhD topics offered by Machine Listening Lab academics as part of the AIM PhD Programme include:

Deep learning for automatic singer identification

Supervisor: Dr Bhusan Chettri

The goal of this project is to develop advanced end-to-end models that automatically identify the singer in a given music recording [1, 2]. Such automatic systems can be used for organising, browsing, and retrieving music collections. A closely related task from the speech community is speaker identification, where novel deep learning methods have improved the state of the art in recent years. This project aims to leverage these recent developments in speaker identification (for example, SincNet [3] and x-vectors [4]) for the singer identification task, and to study whether such methods can be optimised for polyphonic music signals.
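
To make the direction concrete, below is a minimal PyTorch sketch of an end-to-end singer-identification model operating on raw waveforms. It is only an illustration, not SincNet or an x-vector system: the plain learnable 1-D convolutional frontend stands in for SincNet's constrained sinc filters, the average pooling stands in for x-vector statistics pooling, and the number of singers, excerpt length and sampling rate are placeholder assumptions.

```python
# Illustrative sketch only: a raw-waveform CNN mapping a mono excerpt to singer logits.
# NUM_SINGERS, the excerpt length and the sampling rate are hypothetical.
import torch
import torch.nn as nn

NUM_SINGERS = 50  # hypothetical size of the singer vocabulary

class RawWaveformSingerID(nn.Module):
    def __init__(self, num_singers=NUM_SINGERS):
        super().__init__()
        # Learnable time-domain filterbank (SincNet would constrain these to sinc filters).
        self.frontend = nn.Conv1d(1, 64, kernel_size=251, stride=10)
        self.encoder = nn.Sequential(
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, stride=2), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, num_singers)

    def forward(self, waveform):             # waveform: (batch, samples)
        x = self.frontend(waveform.unsqueeze(1))
        x = self.encoder(x)
        embedding = x.mean(dim=-1)           # simple temporal pooling (x-vectors use statistics pooling)
        return self.classifier(embedding)

model = RawWaveformSingerID()
excerpt = torch.randn(4, 16000 * 3)          # four 3-second excerpts at 16 kHz
logits = model(excerpt)                      # (4, NUM_SINGERS)
```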

The candidate should have experience in at least one of the following areas: audio signal processing, machine learning, deep learning frameworks (Kaldi/TensorFlow/Keras/PyTorch), and programming (e.g. Python, C/C++, Matlab). A background in music is a plus, but not a requirement.

References
[1] A. Mesaros, T. Virtanen and A. Klapuri, “Singer identification in polyphonic music using vocal separation and pattern recognition methods,” in Proc. of ISMIR, 2007.
[2] X. Zhang et al., “Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data,” arXiv:2004.04371, 2020.
[3] M. Ravanelli and Y. Bengio, “Speaker Recognition from Raw Waveform with SincNet,” in Proc. of SLT, 2018.
[4] D. Snyder, D. Garcia-Romero, G. Sell, D. Povey and S. Khudanpur, “X-Vectors: Robust DNN Embeddings for Speaker Recognition,” in Proc. of ICASSP, 2018.

Music interestingness in the brain

Supervisor: Dr Huy Phan
in collaboration with Aarhus University

Measuring the interestingness of a song while one is listening to it will not only shed light on individual music perception, enabling personalised music recommendation, but also open up the possibility of using songs as brain stimuli. This project aims to automatically measure the interestingness of music in the brain using Ear-EEG.

An Ear-EEG device will be used to measure the brain signal (EEG) in the ear canals while one is listening to a song; machine learning algorithms (potentially deep neural networks) will then map the recorded EEG signal to an interestingness measure. Data collection will be carried out, and a cohort of young, healthy subjects will be recruited for this purpose. These data will allow us to explore different machine learning algorithms and techniques for interestingness modelling. Personalisation and multi-modal modelling, which combines music information (either raw signals or high-level musical features, e.g. melody or genre) and the EEG, will also be investigated. This is a joint project with the Centre for Ear-EEG, Aarhus University, and the candidate is expected to work with academics in both C4DM and the Centre for Ear-EEG.
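
As a rough illustration of the modelling step only (the channel count, window length, sampling rate and availability of behavioural interestingness ratings are all assumptions, not project specifications), a recurrent regressor mapping Ear-EEG windows to a scalar score might look like this in PyTorch:

```python
# Hedged sketch: windows of multi-channel Ear-EEG -> scalar interestingness score.
# Channel count, window length and the labels are placeholders.
import torch
import torch.nn as nn

class EEGInterestingnessRegressor(nn.Module):
    def __init__(self, num_channels=6, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=num_channels, hidden_size=hidden_size,
                          num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, eeg):                  # eeg: (batch, time, channels)
        _, h = self.rnn(eeg)
        return self.head(h[-1]).squeeze(-1)  # one interestingness score per window

model = EEGInterestingnessRegressor()
window = torch.randn(8, 250, 6)              # e.g. 1-second windows at 250 Hz, 6 in-ear electrodes
scores = model(window)                       # (8,) predicted interestingness
loss = nn.functional.mse_loss(scores, torch.rand(8))  # against hypothetical behavioural ratings
```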

Meta-learning for music data

Supervisor: Dr Emmanouil Benetos

Meta-learning, or “learning to learn”, is an emerging area in the broader field of machine learning. Contrary to conventional machine learning approaches where a particular task is solved using a fixed learning algorithm, the main aim of meta-learning is to learn and improve the learning algorithm itself, so that it can absorb information from one task and generalise across unseen tasks. Meta-learning has various uses in machine learning applications, for example in cases where large datasets are unavailable or when we would like to rapidly learn something about a new task without training our model from scratch. It is also closely related to other emerging machine learning concepts, such as multi-task learning, transfer learning, few-shot learning, and self-supervised learning amongst others. While meta-learning has seen a dramatic rise in research interest in recent years, its principles have seen limited adoption at the intersection of music and AI research.

This PhD project will investigate methods for meta-learning applied to music data, such as audio recordings or music scores. The successful candidate will investigate, propose and develop novel machine learning methods and software tools for meta-learning, and will apply them to address tasks related to music and audio data analysis. This will result in methods that can rapidly learn from limited music data, or in methods that can learn from one task and generalise to other, unseen tasks related to music and audio data analysis.
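
As one concrete (but not prescribed) instance of the idea, the sketch below implements a Reptile-style meta-learning loop in PyTorch: the model is repeatedly adapted to sampled few-shot episodes and its initialisation is nudged towards the adapted weights, so that it can later adapt quickly to unseen tasks. The episode sampler and feature dimensionality are toy placeholders standing in for real few-shot music classification tasks.

```python
# A minimal Reptile-style meta-learning loop (one of several meta-learning algorithms,
# not a method specified by the project). Tasks and features are toy stand-ins.
import copy
import torch
import torch.nn as nn

def sample_task(num_classes=5, shots=4, feat_dim=128):
    """Placeholder: returns a toy few-shot episode (features, labels)."""
    x = torch.randn(num_classes * shots, feat_dim)
    y = torch.arange(num_classes).repeat_interleave(shots)
    return x, y

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 5))
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for meta_iter in range(100):
    x, y = sample_task()
    learner = copy.deepcopy(model)                      # task-specific copy of the shared initialisation
    opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                        # adapt to the sampled episode
        opt.zero_grad()
        nn.functional.cross_entropy(learner(x), y).backward()
        opt.step()
    with torch.no_grad():                               # Reptile meta-update: move towards adapted weights
        for p, p_adapted in zip(model.parameters(), learner.parameters()):
            p += meta_lr * (p_adapted - p)
```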

Suggested PhD topics for studentships in Computer Science or Electronic Engineering programmes include:

Scalable audio event detection and localisation for domestic acoustic monitoring

Supervisor: Dr Huy Phan

Audio event detection and localisation, a highly active research topic, jointly addresses the “what” and “where” questions about occurring sound events. It would enable a wide range of novel applications, particularly domestic acoustic monitoring for healthcare. In this application, the target acoustic environments often differ from house to house, causing reverberation mismatch, particularly when a system is deployed in an entirely new environment. This aspect remains uncharted in current methods for audio event detection and localisation, hindering scalable deployment and robustness. This project aims to evaluate the robustness of state-of-the-art methods and to propose new machine learning (potentially deep learning) and inference methods to address this limitation. Furthermore, beyond being robust to environmental mismatch, such a scalable system should be self-adaptive to the resources available on a target device (e.g. IoT and mobile devices) and able to detect events of interest as early as possible.
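
For orientation only, one common architecture family for this problem is a convolutional-recurrent network with separate “what” and “where” output heads. The sketch below assumes multichannel mel-spectrogram input, an illustrative class count and Cartesian direction-of-arrival targets, none of which are fixed by the project.

```python
# Illustrative joint detection-and-localisation model (CRNN with two heads).
# Class count, microphone channels and feature shapes are assumptions.
import torch
import torch.nn as nn

class SELDNet(nn.Module):
    def __init__(self, num_classes=10, num_channels=4, n_mels=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(num_channels, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 4)),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 4)),
        )
        self.rnn = nn.GRU(64 * (n_mels // 16), 128, batch_first=True, bidirectional=True)
        self.event_head = nn.Linear(256, num_classes)     # "what": per-frame event activity
        self.doa_head = nn.Linear(256, num_classes * 3)   # "where": per-frame xyz direction per class

    def forward(self, spec):                # spec: (batch, channels, frames, mels)
        x = self.cnn(spec)                  # (batch, 64, frames, mels/16)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.rnn(x)
        return torch.sigmoid(self.event_head(x)), torch.tanh(self.doa_head(x))

model = SELDNet()
events, doa = model(torch.randn(2, 4, 100, 64))   # 2 clips, 4 mics, 100 frames, 64 mel bands
```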


Voice and language analysis for personality disorder detection

Supervisor: Dr Huy Phan

Bipolar Disorder (BD) and Borderline Personality Disorder (BPD) are two major mental health disorders that can seriously affect the lives of patients. An early and correct diagnosis of these disorders is of paramount importance for early intervention and treatment. This project aims to develop new methods to recognise these mental health disorders. We have interviewed a significant number of participants (both healthy participants and patients) and collected a voice database; the interviews have also been transcribed. This database allows us to explore different machine learning methods, particularly deep learning, to analyse the bimodal data (voice and language) for disorder recognition. Another important aspect of this project is that we are interested not only in recognition but also in identifying acoustic and linguistic markers that are relevant to, and hopefully underpin, these disorders.
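
A minimal sketch of the bimodal idea, under heavy assumptions (precomputed audio and transcript embeddings of unspecified origin, and a three-way healthy/BD/BPD label set), is a simple late-fusion classifier:

```python
# Hedged sketch of late fusion over a voice embedding and a transcript embedding.
# Embedding extractors, dimensionalities and the label set are assumptions.
import torch
import torch.nn as nn

class BimodalClassifier(nn.Module):
    def __init__(self, audio_dim=192, text_dim=768, num_classes=3):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, 128)
        self.text_proj = nn.Linear(text_dim, 128)
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, audio_emb, text_emb):
        # audio_emb: embedding of the interview audio (e.g. from a pretrained speech model)
        # text_emb: sentence/document embedding of the transcript
        fused = torch.cat([self.audio_proj(audio_emb), self.text_proj(text_emb)], dim=-1)
        return self.classifier(fused)

model = BimodalClassifier()
logits = model(torch.randn(4, 192), torch.randn(4, 768))   # (4, 3) class logits
```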

End-to-end learning for fake speech detection

Supervisor: Dr Bhusan Chettri

Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Although ASV is among the most convenient means of biometric authentication, its robustness and security in the face of spoofing attacks (or presentation attacks) are of growing concern and are now well acknowledged by the speech community. The vulnerability of ASV systems to spoofing attacks is an important problem to solve because it poses a serious threat to the security of such systems. Spoofing attack methods include (a) text-to-speech synthesis; (b) voice conversion; (c) impersonation; and (d) playing back speech recordings (replay attacks).

Although many researchers have studied the application of deep learning methods to this topic, the majority of these works discard phase information, and limited work has been carried out on modelling from raw audio. Phase information may provide complementary information and improve detection performance. To that end, this project focuses on exploring the potential of learning from raw waveforms in an end-to-end setting. The project will explore both supervised and unsupervised learning paradigms, using the benchmark spoofing datasets released by the ASVspoof community [1].

[1] https://www.asvspoof.org/
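
For illustration, a minimal end-to-end detector of this kind could be a 1-D convolutional network over the raw waveform with a binary bonafide/spoof head. The sampling rate, utterance length and layer sizes below are assumptions, and the same encoder could in principle also be pretrained without labels.

```python
# Hedged sketch: end-to-end classifier on the raw waveform, so both magnitude and
# phase information reach the model. Input length, sampling rate and layer sizes are assumptions.
import torch
import torch.nn as nn

class RawWaveformSpoofDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=128, stride=32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.head = nn.Linear(64, 2)           # bonafide vs spoofed

    def forward(self, waveform):               # (batch, samples)
        x = self.encoder(waveform.unsqueeze(1))
        return self.head(x.mean(dim=-1))       # pooled embedding -> logits

detector = RawWaveformSpoofDetector()
logits = detector(torch.randn(8, 16000 * 4))   # 8 four-second utterances at 16 kHz
```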

Towards generalised fake speech detector using representation learning

Supervisor: Dr Bhusan Chettri

Voice biometric systems use automatic speaker verification (ASV) technology for user authentication. Although ASV is among the most convenient means of biometric authentication, its robustness and security in the face of spoofing attacks (or presentation attacks) are of growing concern and are now well acknowledged by the speech community. The vulnerability of ASV systems to spoofing attacks is an important problem to solve because it poses a serious threat to the security of such systems. Spoofing attack methods include (a) text-to-speech synthesis; (b) voice conversion; (c) impersonation; and (d) playing back speech recordings (replay attacks).

While there has been remarkable progress in developing fake speech countermeasures that perform well under intra-dataset test conditions, such countermeasures fail dramatically when tested under real-world conditions. Furthermore, a countermeasure trained (and tuned) to detect text-to-speech (TTS) and voice conversion (VC) attacks performs poorly when tested on replay spoofing attack detection, and vice versa. From a practical implementation perspective, however, it is not possible to know the type of attack beforehand. Hence, there is a need to design a countermeasure that generalises across all attack conditions. To that end, this project focuses on representation learning from raw audio that generalises across attack conditions, using the benchmark spoofing datasets released by the ASVspoof community [1].

[1] https://www.asvspoof.org/
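
One possible ingredient, offered only as an illustration rather than the project's prescribed method, is a contrastive (NT-Xent / SimCLR-style) objective over two augmented views of the same utterance, used to pretrain attack-agnostic representations before a small labelled classifier is attached:

```python
# Hedged sketch of an NT-Xent contrastive loss over two views of the same utterances.
# The encoder producing z1 and z2, and the augmentations, are left unspecified.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same utterances."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)         # (2B, dim), unit-norm
    sim = z @ z.t() / temperature                               # cosine similarity matrix
    batch = z1.size(0)
    mask = torch.eye(2 * batch, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))                  # ignore self-similarity
    targets = torch.cat([torch.arange(batch, 2 * batch), torch.arange(batch)])
    return F.cross_entropy(sim, targets)                        # positive pair = other view

loss = nt_xent_loss(torch.randn(16, 128), torch.randn(16, 128))
```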

Multi-task learning for music information retrieval

Supervisor: Dr Emmanouil Benetos

Music signals and music representations incorporate and express several concepts: pitches, onsets/offsets, chords, beats, instrument identities, sound sources, and key, to name but a few. In the field of music information retrieval, methods for automatically extracting information from audio typically focus on isolated concepts and tasks, thus ignoring the interdependencies and connections between musical concepts. Recent advances in machine and deep learning have shown the potential of multi-task learning (MTL), where multiple learning tasks are solved at the same time while exploiting commonalities and differences across tasks. This research project will investigate methods for multi-task learning for music information retrieval. The successful candidate will investigate, propose and develop novel machine learning methods and software tools for jointly estimating multiple musical concepts from complex audio signals. This will result in improved learning efficiency and prediction accuracy when compared to task-specific models, and will help gain a deeper understanding of the connections between musical concepts.
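
As an illustration of hard parameter sharing (one common MTL formulation, not the only one this project could pursue), the sketch below uses a single shared encoder over a spectrogram with separate heads for a few example MIR tasks; the task list, class counts and input shape are assumptions made for the sketch.

```python
# Illustrative hard-parameter-sharing MTL model: one shared encoder, several task heads.
# Tasks, class counts and input shape are assumptions.
import torch
import torch.nn as nn

class MultiTaskMIRModel(nn.Module):
    def __init__(self, n_mels=128):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 2)),
        )
        feat = 64 * (n_mels // 4)
        self.pitch_head = nn.Linear(feat, 88)   # frame-level pitch activations (piano range)
        self.chord_head = nn.Linear(feat, 25)   # 24 chords + "no chord"
        self.beat_head = nn.Linear(feat, 1)     # frame-level beat activation

    def forward(self, spec):                    # spec: (batch, frames, mels)
        x = self.shared(spec.unsqueeze(1))      # (batch, 64, frames, mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)    # (batch, frames, feat)
        return {"pitch": self.pitch_head(x),
                "chord": self.chord_head(x),
                "beat": self.beat_head(x)}

outputs = MultiTaskMIRModel()(torch.randn(2, 200, 128))
# A joint objective would simply sum (or weight) the per-task losses over these outputs.
```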

Sound recognition in everyday environments

Supervisor: Dr Emmanouil Benetos

The emerging field of sound scene analysis refers to the development of software systems for automatically recognising everyday sounds and the environment/context of a recording. Applications of sound scene analysis include smart homes, urban planning, audio-based security/surveillance, indexing of sound archives, and acoustic ecology. This project will focus on recognising sounds from everyday environments. You will carry out research and develop computational methods suitable for detecting overlapping sound events from noisy and complex audio recorded in urban environments. In this project you will be based in the Machine Listening Lab in the Centre for Digital Music, developing new methods and software tools based on signal processing and machine learning theory.
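
As a brief note on the output formulation typically used for overlapping events (the shapes below are illustrative): each frame and class gets an independent sigmoid activation trained with binary cross-entropy, so several events can be active simultaneously.

```python
# Frame-wise multi-label formulation for overlapping sound events; shapes are illustrative.
import torch
import torch.nn.functional as F

frames, num_classes = 500, 15
logits = torch.randn(4, frames, num_classes)                      # raw outputs of any frame-level model
targets = torch.randint(0, 2, (4, frames, num_classes)).float()   # multi-hot: events may overlap
loss = F.binary_cross_entropy_with_logits(logits, targets)
active = torch.sigmoid(logits) > 0.5                              # per-frame event decisions
```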
