Representation learning for music classification and retrieval: bridging the gap between natural language and music semantics / Minz Won

By: Minz Won
Contributor(s):
Material type: Computer file
Language: English
[Barcelona]: Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions, 2021
Description: 1 online resource (200 pages)
Content type:
  • Text (visual)
Media type:
  • electronic
Subject(s):
Genre/Form:
Online resources:
Summary: The explosion of digital music has dramatically changed our music consumption behavior. Massive digital music libraries are now available through streaming platforms. Since the amount of information available to an individual listener has increased greatly, it is nearly impossible to go through the entire catalog exhaustively. As a result, we need robust knowledge management systems more than ever. Recent advances in deep learning have enabled data-driven music representation learning for classification and retrieval. However, there is still a gap between machine-learned representations and the human understanding of music. This dissertation aims to reduce this semantic gap in order to support listeners' interaction with music information through advanced algorithmic assistance. To this end, we tackle three main challenges in representation learning: model architecture design, scalability, and multimodality. Firstly, we carefully review previous deep representation models and propose new architectures that improve the learned representations both qualitatively and quantitatively. The newly proposed models are more flexible, interpretable, and powerful than previous ones. Secondly, training schemes beyond supervised learning are explored as a way to achieve scalability. Transfer learning, semi-supervised learning, and self-supervised learning approaches are addressed in detail; transfer learning and semi-supervised methods are applied to enhance music representation learning. Finally, metric learning is proposed as a way to bridge music audio representations and natural language semantics, forming a multi-modal embedding space. This facilitates music retrieval using arbitrary tags beyond a fixed vocabulary and makes it possible to match music to text stories based on mood. Although our work focuses on bridging music and natural language semantics, we believe the proposed approaches generalize to other modalities. All implementations from this thesis are available as open source for reproducibility. The knowledge gained throughout this thesis has been put into practice and grounded in research internships and collaborations with multiple industry partners.
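The multi-modal metric-learning idea mentioned in the summary can be illustrated with a small sketch: one encoder for audio and one for tags are trained so that matching audio-tag pairs land close together in a shared embedding space, which enables retrieval by arbitrary text queries through nearest-neighbour search. The sketch below is illustrative only and is not the code from the dissertation; the encoder architectures, the 128-dimensional embedding size, the vocabulary size, and the toy tensors are all assumptions.

```python
# Minimal sketch (assumed, not the thesis implementation): joint audio-text
# metric learning with a triplet loss, so a tag embedding and the embedding
# of a matching audio clip end up close in one shared space.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128  # dimensionality of the shared embedding space (assumed)

class AudioEncoder(nn.Module):
    """Stand-in audio branch: maps a mel-spectrogram to the shared space."""
    def __init__(self, embed_dim=EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, embed_dim),
        )
    def forward(self, spec):                    # spec: (batch, 1, n_mels, time)
        return F.normalize(self.net(spec), dim=-1)

class TagEncoder(nn.Module):
    """Stand-in text branch: maps a tag index (e.g. 'happy') to the same space."""
    def __init__(self, vocab_size=1000, embed_dim=EMBED_DIM):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, embed_dim)
    def forward(self, tag_ids):                 # tag_ids: (batch,)
        return F.normalize(self.emb(tag_ids), dim=-1)

audio_enc, tag_enc = AudioEncoder(), TagEncoder()
triplet = nn.TripletMarginLoss(margin=0.2)

# Toy batch: anchor tag, a positive clip that matches it, a negative clip that does not.
tags      = torch.randint(0, 1000, (8,))
pos_audio = torch.randn(8, 1, 96, 128)
neg_audio = torch.randn(8, 1, 96, 128)

loss = triplet(tag_enc(tags), audio_enc(pos_audio), audio_enc(neg_audio))
loss.backward()  # in practice this runs inside a normal training loop
```

Normalizing both branches to unit length keeps the two modalities on a comparable scale, so the same distance (or cosine similarity) can rank clips for any tag at retrieval time.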
Holdings
  Item type: E-documents (local and remote)
  Current library: Biblioteca CDAEM
  Collection: Monografías
  Shelving location: Música
  Call number: DOCS E M (M) - e102
  URL: Link to the resource
  Status: Available
  Barcode: 0025353344748

Supervisors: Xavier Serra i Casals; Horacio Saggion, Dept. of Information and Communication Technologies.

Defense date: 01-07-2022


