Call for PhD applications at the Machine Listening Lab – 2023 entry

The Machine Listening Lab is welcoming PhD applications for September 2023 entry. Applicants of all nationalities can apply across different funding schemes and PhD programmes. Current PhD funding opportunities for September 2023 entry include:

Applicants are encouraged to contact prospective supervisors before submitting their application – please send an email to your selected supervisors with your CV and draft research proposal.

Suggested PhD topics offered by Machine Listening Lab academics include:

Deep learning for low-resource music
(for PhD in Artificial Intelligence and Music)

Supervisor: Dr. Emmanouil Benetos
in collaboration with Bytedance

The field of music information retrieval (MIR) has been growing for more than 20 years, with recent advances in deep learning having revolutionised the way machines can make sense of music data. At the same time, the MIR community is constrained by the data available, and most methods are focused on extracting information from mainstream music styles (mostly pop, rock, and classical music), using predefined sets of commonly used musical instruments and, where relevant, assuming high-resource languages for singing voice analysis. Inspired by recent developments in the field of speech technology for low-resource languages, this PhD project will investigate and develop deep learning methods for making sense of music data in low-resource conditions, whether these refer to under-represented music styles, new musical instruments, or low-resource singing corpora. Methods based on few-shot and zero-shot learning will be investigated, along with methods for open-set recognition, meta-learning, and semi-supervised learning, applied to various MIR tasks including but not limited to music tagging, music transcription, lyrics recognition, audio matching, and cover song detection.

The successful candidate will investigate, propose and develop novel methods for analysing low-resource music corpora, resulting in models that can rapidly learn or adapt from small or unlabelled datasets of under-represented music styles, musical instruments, or sung languages.
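
As a rough illustration of the few-shot setting mentioned above, the sketch below computes a prototypical-network episode loss over pre-computed clip embeddings; the embedding dimension, episode sizes, and random data are placeholder assumptions, not part of the project description:

    # A minimal sketch of few-shot classification with prototypical networks,
    # assuming fixed-size audio clip embeddings are already available.
    import torch
    import torch.nn.functional as F

    def prototypical_loss(support, support_labels, query, query_labels, n_classes):
        """support: (n_support, d) embeddings; query: (n_query, d) embeddings."""
        # Class prototypes: mean embedding of the support examples per class.
        prototypes = torch.stack(
            [support[support_labels == c].mean(dim=0) for c in range(n_classes)]
        )
        # Negative squared Euclidean distance to each prototype acts as the class score.
        logits = -torch.cdist(query, prototypes) ** 2
        return F.cross_entropy(logits, query_labels)

    # Toy episode: 3 classes, 5 support and 4 query clips per class, 128-d embeddings.
    d = 128
    support = torch.randn(15, d)
    support_labels = torch.arange(3).repeat_interleave(5)
    query = torch.randn(12, d)
    query_labels = torch.arange(3).repeat_interleave(4)
    print(prototypical_loss(support, support_labels, query, query_labels, n_classes=3).item())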


MLLab at ISMIR 2022

On 4-8 December 2022, several MLLab researchers will participate in the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022). ISMIR is the leading conference in the field of music informatics, and is currently the top-cited publication for Music & Musicology (source: Google Scholar). This year ISMIR will take place in Bengaluru, India, and online.

As in previous years, the Machine Listening Lab will have a strong presence at the conference, both in terms of numbers and overall impact.

In the Technical Programme, the following papers are authored/co-authored by MLLab members:

Among the Special Sessions, MLLab PhD student Lele Liu will join the special session on PhD in MIR: Challenges and Opportunities as a panellist.

In the Late-breaking/Demo (LBD) session, the following extended abstracts are authored/co-authored by MLLab members:


MLLab at DCASE 2022

On 3-4 November, several Machine Listening Lab researchers will participate in the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022). The workshop aims to provide a venue for researchers working on the computational analysis of sound scenes and events to present and discuss their results, and is organised in conjunction with the DCASE 2022 Challenge.

As in previous years, the Machine Listening Lab will have a strong presence at the workshop, both in terms of numbers and overall impact. The following papers presented at DCASE 2022 are authored or co-authored by MLLab members:

On challenge organisation, MLLab PhD students Inês Nolasco and Shubhr Singh, MLLab alumna and research visitor Veronica Morfi, and MLLab alumnus Dan Stowell are all involved in the organisation of the DCASE 2022 Challenge task on Few-shot Bioacoustic Event Detection, focusing on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations.

See you all at DCASE!


MLLab student joins the Turing

A Machine Listening Lab PhD student has been given an enrichment award by the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence, enabling them to join the Institute and interact with its researchers and wider community in the 2022/23 academic year.

Specifically, MLLab PhD student Jiawen Huang has been offered an Enrichment Placement Award for the project “Real-Time Audio-to-Lyrics Alignment for Polyphonic Music”.

Congratulations to Jiawen! For the full story on enrichment awards for Queen Mary doctoral students, please read the QMUL news item.


MLLab at ICASSP 2022

On 7-13 & 22-27 May, several MLLab researchers will participate in the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022). ICASSP is the leading conference in the field of signal processing and the flagship event of the IEEE Signal Processing Society.

As in previous years, the Machine Listening Lab will have a strong presence at the conference, both in terms of numbers and overall impact. The following papers presented at ICASSP 2022 are authored or co-authored by MLLab members:

See you all at ICASSP!


Call for PhD applications at the Machine Listening Lab – 2022 entry

The Machine Listening Lab is welcoming PhD applications for September 2022 entry. Applicants of all nationalities can apply across different funding schemes and PhD programmes. Current PhD funding opportunities for September 2022 entry include:

Applicants are encouraged to contact prospective supervisors before submitting their application – please send an email to your selected supervisors with your CV and draft research proposal.

Suggested PhD topics offered by Machine Listening Lab academics include:


Computational analysis of chick vocalisations: from categorisation to live feedback
(for PhD in Computer Science funded by a QMUL Principal’s studentship)

Supervisors: Dr. Emmanouil Benetos and Dr. Elisabetta Versace

The assessment of animals’ emotional state and welfare is a central issue for behavioural neuroscience and ethical farming. Animal vocalisations provide a rich set of information on the inner state of animals and can be used to influence animal behaviour in industrial settings, such as chicken farms. However, there is limited research on using vocalisations for monitoring animal welfare, and on the use of audio technologies for automatic welfare assessment of poultry in industrial settings.

In this project, we will develop computational methods to automatically categorise vocalisations of domestic chickens, infer their emotional state, provide live feedback, and identify stimuli that can improve animal welfare. The PhD project will build upon pilot work led by the supervisors [i] on automatic recognition of chick calls using machine learning and signal processing methods. The project will lead to computationally efficient and robust machine learning methods and systems for automatically monitoring poultry welfare from audio, and will investigate research questions related to poultry development, behaviour, and well-being in industrial settings. Prospective candidates should be curious, self-motivated and have experience in one or more of the following: Bioacoustics, Cognitive Science, Artificial Intelligence/Machine Learning, Digital Signal Processing.

[i] C. Wang, E. Benetos, S. Wang, and E. Versace, “Joint Scattering for Automatic Chick Call Recognition”, 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), submitted.
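
As a simple illustration of audio-based call classification (not the joint scattering front end used in the pilot work [i]), the sketch below pairs log-mel features with a small CNN classifier; the call categories, clip length, and sample rate are placeholder assumptions:

    # A minimal, illustrative chick-call classifier: log-mel features + tiny CNN.
    import torch
    import torch.nn as nn
    import torchaudio

    N_CALL_TYPES = 4          # number of call categories (assumed for illustration)
    SAMPLE_RATE = 16000

    frontend = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE, n_fft=1024, hop_length=256, n_mels=64
    )
    classifier = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.Linear(16, N_CALL_TYPES),
    )

    waveform = torch.randn(1, SAMPLE_RATE)          # 1-second dummy clip
    features = torch.log1p(frontend(waveform))      # (1, 64, frames) log-mel features
    logits = classifier(features.unsqueeze(0))      # add batch dim -> (1, N_CALL_TYPES)
    print(logits.shape)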


Discourse structure recognition for broadcast summarisation
(for PhD in Computer Science funded by a China Scholarship Council studentship)

Supervisors: Prof. Matthew Purver and Dr. Huy Phan

This project will investigate new methods for inferring and detecting dialogue and narrative structure in broadcast data, primarily TV programmes, to enable more effective and meaningful semantic search and linking, and/or the generation of concise, meaningful summaries. Current methods for search and summarization are based around models of word and phrase meaning: these are effective when the key information is expressed through verbal content, and our recent work has shown that e.g. news broadcast segmentation can be improved by incorporating recent advances in NLP methods (Ghinassi, DataTV 2021). However, in many domains this is not the case: political debates need to be understood as networks of linked but contrasting opinions; dramas as often non-linear plotlines with key events and changes. Effective models of these formats must therefore include information about these elements of narrative/discourse structure, and learn to detect them in real datasets.

Methods for recognising and inferring narrative structure are available from related work on text understanding and summarization (including work from our lab, Droog-Hayes et al, ICSC 2019). This project will extend them to be suitable for spoken broadcast data and to use state-of-the-art neural network NLP methods, following our recent work in topic segmentation (Ghinassi, 2021) and related recent work in text segmentation (Angelov, 2020; Lukasik et al., 2021), while integrating information from multiple modalities including text, audio and video, again following our recent advances in multimodal modelling (Ghinassi et al., in prep.; Rohanian et al., Interspeech 2020). It will then build on this to develop models for jointly learning to generate compressed summaries while detecting key events and points in the discourse (discussion points, key characters, plot changes etc.), by adapting recent neural methods for dialogue act detection and summarization (Goo & Chen, 2018; Goo et al., 2018) and combining them with suitable graph structures.
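
As a schematic illustration of embedding-based segmentation (a TextTiling-style similarity-dip detector, rather than the neural architectures the project will develop), the sketch below places topic boundaries where the similarity between adjacent sentence embeddings dips; the embeddings and threshold are placeholder assumptions:

    # Boundary detection from similarity dips between adjacent sentence embeddings.
    import numpy as np

    def segment_boundaries(embeddings: np.ndarray, depth_threshold: float = 0.1):
        """Return indices i where a topic boundary is placed between sentence i and i+1."""
        # Cosine similarity between each pair of adjacent sentence embeddings.
        e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = np.sum(e[:-1] * e[1:], axis=1)
        boundaries = []
        for i in range(1, len(sims) - 1):
            # Depth score: how much the similarity at i dips relative to its neighbours.
            depth = 0.5 * (sims[i - 1] - sims[i]) + 0.5 * (sims[i + 1] - sims[i])
            if depth > depth_threshold:
                boundaries.append(i)
        return boundaries

    emb = np.random.randn(20, 384)    # 20 sentences, 384-d embeddings (dummy data)
    print(segment_boundaries(emb))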


Multitask modelling for overlapping sound sources
(for PhD in Artificial Intelligence and Music)

Supervisor: Dr. Huy Phan

Overlapping sound sources are a major source of error in audio modelling systems. Examples include polyphonic audio events in audio event detection and polyphonic music in multi-instrument transcription. In a deep-learning context, the most common approach to dealing with event overlaps is to treat the modelling task as a multi-label classification problem. By doing this, we inherently consider multiple one-vs.-rest classification problems, which are jointly solved by a single (i.e. shared) network. This project investigates framing the task as a multi-class classification problem by considering each possible label combination as one class. To circumvent the large number of classes arising from combinatorial explosion, decomposition of the label space will be explored to form multiple groups of category labels and yield a multi-task problem in a divide-and-conquer fashion, where each task is a multi-class classification problem. Network architectures will then be devised for multi-task modelling. The approach will be validated on databases with a high degree of overlap between sound sources, for polyphonic audio event detection and polyphonic music transcription.
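
As a minimal illustration of the label-combination idea, the sketch below maps each group's multi-hot labels to a single class index and attaches one multi-class head per group to a shared backbone; the label groups, backbone, and feature dimensions are toy assumptions, not a proposed architecture:

    # Label-space decomposition: one multi-class task per label group.
    import torch
    import torch.nn as nn

    groups = [["piano", "guitar"], ["drums", "bass", "voice"]]

    # Each group becomes one multi-class task over all label combinations in that group,
    # so a 2-label group yields 2**2 = 4 classes and a 3-label group 2**3 = 8 classes.
    n_classes_per_task = [2 ** len(g) for g in groups]

    encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())            # shared backbone
    heads = nn.ModuleList([nn.Linear(64, c) for c in n_classes_per_task])

    def combination_class(multi_hot):
        """Encode a group's multi-hot label vector as a single class index (binary code)."""
        return sum(int(b) << i for i, b in enumerate(multi_hot))

    # e.g. within the second group, the combination {drums, voice} -> class 1 + 4 = 5
    print(combination_class([1, 0, 1]))

    x = torch.randn(8, 128)                                           # batch of feature vectors
    shared = encoder(x)
    task_logits = [head(shared) for head in heads]                    # one softmax problem per group
    print([t.shape for t in task_logits])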


Personalized scientific-grade sleep monitoring in home environments
(for PhD in Computer Science funded by a China Scholarship Council studentship)

Supervisor: Dr. Huy Phan

Good sleep is crucial for maintaining one’s mental and physical health, while sleep disorders are linked with a plethora of ailments, such as cardiovascular diseases, dementia, and depression, to name a few. Accurate and cost-effective monitoring of sleep not only has great medical value but also allows individuals to self-assess and self-manage their sleep. Its importance gives rise to increasing demand for bringing sleep monitoring from the lab to home environments for longitudinal monitoring. Existing commercial devices have limited value for scientific purposes, such as assessing sleep disorders or studying sleep under conditions like dementia. Novel wearable in-ear EEG devices, whose sensors fit neatly inside the ear canal to measure brain activity, hold great potential for home-based use. Furthermore, they have been shown to be comparable to lab-based polysomnography for sleep scoring.

However, there are two data-related challenges with these devices. First, manually labelling a large amount of their data is difficult and expensive, as it requires polysomnography data recorded in parallel. Second, ear-EEG data exhibit strong “trait-like characteristics” specific to individuals, as well as significant night-to-night variation in an individual’s sleep. This project aims to develop personalized deep learning methods to overcome these challenges. Domain adaptation methods will be explored to transfer knowledge from a large polysomnography database to ear-EEG. Regularization techniques will be developed given a potentially small amount of labelled ear-EEG data. Semi-supervised/unsupervised domain adaptation will also be explored to leverage unlabelled data. Furthermore, we will investigate continual learning methods to keep a personalized model up to date with the possible “concept drift” caused by long-term changes in a person’s sleep patterns. While these methods are required to learn from sequential, and potentially small, data, they also need to overcome catastrophic forgetting.
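
One possible instantiation of the personalization and regularization ideas above is sketched below: a model pretrained on a large polysomnography corpus is fine-tuned on a small amount of labelled ear-EEG data with an L2-SP-style penalty that keeps the adapted weights close to the pretrained ones; the architecture, data, and hyperparameters are placeholder assumptions:

    # Fine-tuning with an L2-SP-style penalty towards the pretrained weights.
    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_STAGES = 5                                # Wake, N1, N2, N3, REM
    pretrained = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, N_STAGES))
    personal = copy.deepcopy(pretrained)        # start from the source-domain weights
    source_params = [p.detach().clone() for p in pretrained.parameters()]

    optimizer = torch.optim.Adam(personal.parameters(), lr=1e-4)
    x, y = torch.randn(32, 256), torch.randint(0, N_STAGES, (32,))   # dummy ear-EEG epochs

    for _ in range(10):
        optimizer.zero_grad()
        loss = F.cross_entropy(personal(x), y)
        # Penalise deviation from the pretrained starting point (L2-SP regularisation).
        loss = loss + 1e-3 * sum(
            ((p - p0) ** 2).sum() for p, p0 in zip(personal.parameters(), source_params)
        )
        loss.backward()
        optimizer.step()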


Resource-efficient models for music understanding
(for PhD in Artificial Intelligence and Music)

Supervisors: Dr. Emmanouil Benetos and Prof. Phillip Stanley-Marbell

State-of-the-art models for music understanding and music information research are often very hard to run on small and embedded devices such as mobile phones, single-board computers, and other microprocessors. At the same time, the computational cost, footprint, and environmental impact of building and deploying deep learning models for music understanding are constantly increasing. This PhD project will investigate methods for creating resource-efficient models for music understanding, applied to various tasks in music information research that involve music audio data, such as automatic music transcription, audio fingerprinting, or music tagging. Methods to be investigated can include but are not limited to sparse training, network pruning, binary neural networks, post-training quantisation, and knowledge distillation.

The successful candidate will investigate, propose and develop novel machine learning methods and software tools for resource-efficient music understanding, and will apply them to address tasks of their choice within the wider field of music information research. This will result in models that can be deployed on small or embedded devices, or run offline with drastically reduced learning and inference times and computational resources.
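
As a minimal illustration of one of the listed techniques (knowledge distillation), the sketch below trains a small student to match the temperature-softened outputs of a larger teacher on a single-label classification task; the models, label space, and data are placeholder assumptions:

    # Knowledge distillation: soft teacher targets + the usual supervised loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_TAGS = 50
    teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, N_TAGS))
    student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, N_TAGS))

    def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
        # KL between student and teacher distributions at temperature T.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, targets)
        return alpha * soft + (1 - alpha) * hard

    x = torch.randn(16, 128)
    targets = torch.randint(0, N_TAGS, (16,))
    with torch.no_grad():
        t_logits = teacher(x)
    print(distillation_loss(student(x), t_logits, targets).item())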


Self-supervision in machine listening
(for PhD in Artificial Intelligence and Music)

Supervisor: Dr. Emmanouil Benetos
in collaboration with Bytedance

Self-supervised learning methods aim to provide an alternative to supervised representation learning, eliminating the need for large annotated datasets. Self-supervision has advanced rapidly in recent years, with applications across several modalities, and is ideally suited to machine listening and music understanding tasks, which have historically been data-deprived compared to other domains. This PhD project will investigate methods for self-supervised learning applied to various tasks in music information research that involve music audio data, such as automatic music transcription, audio fingerprinting, or music tagging. Methods to be investigated can include but are not limited to contrastive self-supervised learning, formulation of appropriate pretext tasks, transferability to downstream tasks, and links between self-supervised and semi-supervised learning for music understanding.

The successful candidate will investigate, propose and develop novel self-supervised representation learning methods and software tools for music understanding, and will apply them to address tasks of their choice within the wider field of music information research. This will result in models that can learn from unlabelled data while performing comparably to or surpassing supervised learning methods.
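
As a minimal illustration of contrastive self-supervised learning, the sketch below computes an NT-Xent (InfoNCE-style) loss over two augmented "views" of each audio clip; the encoder outputs are stand-ins and all shapes are placeholder assumptions:

    # NT-Xent contrastive loss over two views of the same batch of clips.
    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, temperature=0.1):
        """z1, z2: (batch, d) embeddings of two views of the same clips."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        z = torch.cat([z1, z2], dim=0)                     # (2B, d)
        sim = z @ z.t() / temperature                      # scaled cosine similarities
        sim.fill_diagonal_(float("-inf"))                  # exclude self-pairs
        batch = z1.shape[0]
        # The positive for view i is the other view of the same clip.
        targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
        return F.cross_entropy(sim, targets)

    z1 = torch.randn(8, 128)   # embeddings of 8 clips under augmentation A (dummy)
    z2 = torch.randn(8, 128)   # embeddings of the same clips under augmentation B (dummy)
    print(nt_xent(z1, z2).item())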


Sound source separation and localisation
(for PhD in Computer Science funded by a China Scholarship Council studentship)

Supervisors: Dr. Emmanouil Benetos and Prof. Mark Sandler

Audio and music source separation has been an active area of research, with applications in sound recording & production, broadcasting, and audio consumption. In recent years, deep learning techniques have dominated the topic, along with the use of priors for informed source separation. While the research community has mostly focused on single-channel source separation, real-world professional studio applications make use of several microphones used in concert, whose outputs are combined (or mixed) to create a composite signal.

This PhD project will investigate and propose machine learning methods for sound source separation and localisation (SSSL), which will separate a mixture recording into its constituent sources in the context of a studio setup involving multiple microphones, including spot and main feeds, while at the same time using information on the spatial location of sound sources to improve separation performance. The proposed research will lead to a new paradigm for informed sound source separation, where prior information is provided both by spot mics related to constituent sources and by providing or automatically inferring the spatial location of sound sources in the scene. The project will draw knowledge from and contribute knowledge to the fields of machine learning, digital signal processing, and acoustics, and will advance the broader fields of audio & music technology and signal separation. Upon completion, the project will provide new high-fidelity and computationally efficient algorithms and models for separating and enhancing sources in studio practice.
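
As an illustrative sketch of spot-mic-informed separation (one possible instantiation, not the project's proposed method), the network below predicts a time-frequency mask for one source from the main-mix spectrogram stacked with the corresponding spot-mic spectrogram; all shapes and the architecture are assumptions:

    # Mask estimation conditioned on a spot-mic magnitude spectrogram.
    import torch
    import torch.nn as nn

    F_BINS, T_FRAMES = 513, 100

    class InformedMasker(nn.Module):
        def __init__(self):
            super().__init__()
            # Main-mix and spot-mic magnitude spectrograms are stacked as two channels.
            self.net = nn.Sequential(
                nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.Sigmoid(),
            )

        def forward(self, mix_mag, spot_mag):
            x = torch.stack([mix_mag, spot_mag], dim=1)    # (batch, 2, F, T)
            return self.net(x).squeeze(1)                  # mask in [0, 1], (batch, F, T)

    model = InformedMasker()
    mix = torch.rand(4, F_BINS, T_FRAMES)
    spot = torch.rand(4, F_BINS, T_FRAMES)
    estimate = model(mix, spot) * mix                      # masked estimate of the source
    print(estimate.shape)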


Using Signal-Informed Source Separation (SISS) principles to improve instrument separation from legacy recordings
(for PhD in Artificial Intelligence and Music)

Supervisors: Prof. Mark Sandler and Dr. Emmanouil Benetos

The recently proposed Signal-Informed Source Separation (SISS) paradigm from C4DM belongs to the broader category of Informed Source Separation (ISS), with the unique and specific attribute of using one audio signal to inform the separation of another. The informing source is a close approximation of a coherent component in the mixture. A current AIM PhD is examining this paradigm for live ensemble recordings, where the spot mic signal informs the separation of the main mix. An alternative viewpoint is to treat the informing signal as a caricature of its corresponding component in the main feed. This suggests that we should investigate other caricatures for musical instrument separation and modification, especially for re-mixing and up-mixing of legacy commercial recordings. For example, a session musician could play the guitar line in the Beatles’ “She Loves You” to separate Harrison’s part, or it could be rendered from a MIDI transcription. Preliminary, confirmatory evidence for this approach appears in [1, 2], which explore crude implementations with good outcomes but do not develop the approach further.

This PhD will develop skills in deep learning, especially architectures employing conditioning, as well as novel cost functions, perhaps incorporating physical models of the instruments to be separated. Applicants would benefit from a background in Machine Learning and DSP, coupled with knowledge of modern music recording, processing and mixing techniques.

[1] P. Smaragdis & G. J. Mysore, ‘Separation by “humming”: User-guided sound extraction from monophonic mixtures’, in IEEE WASPAA, 2009.
[2] Y. Li et al, ‘Learning to Denoise Historical Music’, in ISMIR, 2020.
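
As an illustrative sketch of caricature-conditioned separation (one possible conditioning mechanism, not the project's proposed method), the model below summarises a caricature signal into a conditioning vector and applies FiLM-style modulation to the mixture frames before estimating a mask; the architecture and shapes are placeholder assumptions:

    # FiLM-style conditioning of a separator on a "caricature" reference signal.
    import torch
    import torch.nn as nn

    class FiLMSeparator(nn.Module):
        def __init__(self, feat_dim=513, cond_dim=64):
            super().__init__()
            self.cond_encoder = nn.GRU(feat_dim, cond_dim, batch_first=True)
            self.gamma = nn.Linear(cond_dim, feat_dim)
            self.beta = nn.Linear(cond_dim, feat_dim)
            self.separator = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

        def forward(self, mix_frames, caricature_frames):
            # Summarise the caricature into a single conditioning vector.
            _, h = self.cond_encoder(caricature_frames)    # h: (1, batch, cond_dim)
            h = h.squeeze(0)
            # Feature-wise modulation of the mixture frames, then mask estimation.
            modulated = self.gamma(h).unsqueeze(1) * mix_frames + self.beta(h).unsqueeze(1)
            return self.separator(modulated) * mix_frames  # masked estimate

    model = FiLMSeparator()
    mix = torch.rand(2, 100, 513)          # (batch, frames, frequency bins)
    caricature = torch.rand(2, 100, 513)
    print(model(mix, caricature).shape)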


CfP: EURASIP JASMP special issue on Recent Advances in Computational Sound Scene Analysis

EURASIP Journal on Audio, Speech and Music Processing
https://asmp-eurasipjournals.springeropen.com/ssoundscene

Special issue on Recent Advances in Computational Sound Scene Analysis
Deadline: 1st April 2022

Topics of interest include but are not limited to:

  • Methodology: signal processing, machine learning, auditory perception, taxonomies, and ontologies related to sound scenes and events
  • Tasks and applications: acoustic scene classification, sound event detection and localization, sound source separation, audio tagging, audio captioning, detection of rare sound events, anomaly audio event detection, computational bioacoustic scene analysis, urban soundscape analysis, and cross-modal analysis (e.g. audio recognition/analysis with information from video, texts, image, language, etc.)
  • Machine learning methodologies for sound scene analysis: self-supervised learning, few-shot learning, meta-learning, generative models, explainable machine learning, continual learning, curriculum learning, active learning, multi-task learning, and attention mechanisms
  • Human-centered sound scene analysis: human-computer interaction and interfaces, user-centered evaluation, visualization of audio events and scenes, and user annotation
  • Evaluation, datasets, software tools, and reproducibility in computational sound scene and event analysis
  • Ethics and policy: legal and societal aspects of computational sound scene analysis; ethical and privacy issues related to designing, implementing and deploying sound scene analysis systems; privacy-preserving sound scene analysis; federated learning for sound scene analysis
  • Performance metrics: studies for developing effective evaluation metrics and tools for related tasks in audio scene analysis, event detection, and audio tagging

The EURASIP Journal on Audio, Speech, and Music Processing recognizes novel contributions of the following types within its area:

  • Empirical Research: Data-driven research, new experimental results, and new data sets
  • Methodology: New theory and methods for the processing of speech, audio, and music signals
  • Software: New software implementations and toolboxes for speech, audio, and music processing
  • Review: Timely and comprehensive overview and tutorial material covering recent developments within the field

Submission instructions:
https://asmp-eurasipjournals.springeropen.com/submission-guidelines

Guest Editors:
Jakob Abeßer, Fraunhofer IDMT, Germany
Emmanouil Benetos, Queen Mary University of London, UK
Annamaria Mesaros, Tampere University, Finland
Wenwu Wang, University of Surrey, UK



MLLab at Interspeech 2021

The Machine Listening Lab will be participating in the 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021), taking place on 30 August – 3 September both online and in Brno, Czech Republic. The following papers will be presented by MLLab members:


MLLab students and staff to join the Alan Turing Institute

Two MLLab PhD students and three MLLab academics will join the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence, in Autumn 2021.

The following MLLab PhD students will join the Turing as Enrichment students in 2021/22:

  • Lele Liu – Enrichment project: Cross-domain automatic music audio-to-score transcription
  • Ilaria Manco – Enrichment project: Multimodal deep learning for music information retrieval

The Turing’s Enrichment scheme offers students enrolled on a doctoral programme at a UK university an opportunity to boost their research project with a placement at the Turing for up to 12 months.

The following MLLab academics have been appointed Turing Fellows in 2021/22:

Turing Fellows are scholars with proven research excellence in data science, artificial intelligence or a related field whose research would be significantly enhanced through active involvement with the Turing network of universities and partners.


MLLab at IJCNN 2021

On 18-22 July 2021, MLLab researchers will participate virtually in the IEEE International Joint Conference on Neural Networks (IJCNN 2021), the flagship conference of the IEEE Computational Intelligence Society and the International Neural Network Society.

The following papers authored/co-authored by MLLab members will be presented at IJCNN 2021:

  • MusCaps: Generating Captions for Music Audio
    Ilaria Manco, Emmanouil Benetos, Elio Quinton and Gyorgy Fazekas
    Paper
  • Revisiting the Onsets and Frames Model with Additive Attention
    Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos and Dorien Herremans
    Paper