Problem Identification:
Proposed Solution:
MELONS Framework: The paper introduces MELONS, a framework designed to generate melodies with long-term structure.
Graph Representation: It employs a graph representation of musical structure, capturing eight types of bar-level relations.
<8 types of bar-level relations>
from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
<Graph figures1>
from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
<Graph figures2>
from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
Methodology:
Structure Generation: Creating the underlying structure of the melody.
Each edge of the structure graph is written as a token triple (i, j, t): a previous bar j is linked to the current bar i by a relation of type t.
Structure Conditional Melody Generation: Generating the actual melody based on the pre-defined structure.
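To make the (i, j, t) notation concrete, here is a minimal Python sketch of how a structure graph could be stored as triples and flattened into a token list. The `Edge` class, the `flatten_edges` helper, and the `i=`/`j=`/`t=` token strings are hypothetical names chosen for illustration; the paper's actual tokenization may differ. Only the two relation names quoted later in these notes are used.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    """One bar-level relation (hypothetical container, not from the paper)."""
    current_bar: int  # i
    related_bar: int  # j, must come before i
    relation: str     # t


def flatten_edges(edges: List[Edge]) -> List[str]:
    """Flatten (i, j, t) triples into one token list, ordered by current bar."""
    tokens: List[str] = []
    for e in sorted(edges, key=lambda e: (e.current_bar, e.related_bar)):
        if e.related_bar >= e.current_bar:
            raise ValueError("related bar j must precede current bar i")
        tokens.extend([f"i={e.current_bar}", f"j={e.related_bar}", f"t={e.relation}"])
    return tokens


# Toy graph: bar 2 relates to bar 1, bar 4 relates to bar 0.
edges = [Edge(4, 0, "transposition"), Edge(2, 1, "rhythmic_sequence")]
tokens = flatten_edges(edges)
```

Sorting by the current bar keeps the token stream in the order in which bars would be generated, which is one plausible choice for an autoregressive model.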
Framework Overview
Encoding scheme: Compound Word, with each token grouping the attributes (type, beat, tempo, pitch, duration, relation type).
Sequences for training: {[start of the melody context], (content of the melody context), [start of the related bar], (content of the related bar), [relation type], [start of the target bar], (content of the target bar)}.
<Example of sequence>
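The training sequence layout listed above can be sketched as a simple concatenation. This is an illustrative assumption, not the authors' code: `build_training_sequence` is a hypothetical helper, and the note tokens such as `"note_C4"` are placeholders standing in for real Compound Word tokens.

```python
from typing import List, Sequence


def build_training_sequence(
    context: Sequence[str],
    related_bar: Sequence[str],
    relation_type: str,
    target_bar: Sequence[str],
) -> List[str]:
    """Concatenate the segments in the order given in the notes."""
    return (
        ["[start of the melody context]"] + list(context)
        + ["[start of the related bar]"] + list(related_bar)
        + [f"[{relation_type}]"]
        + ["[start of the target bar]"] + list(target_bar)
    )


# Placeholder tokens; a real pipeline would use Compound Word tokens instead.
seq = build_training_sequence(
    context=["note_C4", "note_E4"],
    related_bar=["note_G4"],
    relation_type="transposition",
    target_bar=["note_A4"],
)
```

At generation time, everything up to and including the `[start of the target bar]` marker would act as the prompt, and the model would be asked to continue with the target bar content.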
This part consists of two modules:
<Figures of the generation framework>
from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
Experiment Section
Datasets
POP909 Dataset:
The POP909 dataset, as shown in the image below, consists of 909 MIDI files based on piano performances. These performances are divided into melody and accompaniment, making it an ideal dataset for extracting monophonic melodies.
Wikifonia Dataset:
The Wikifonia dataset comprises 6,673 MusicXML files containing melody data. The image below shows the data rendered as sheet music.
Model Architecture
Quantitative Evaluation
The bar relations in the generated music were compared with the ratios of bar relations in the two primary datasets (POP909 and Wikifonia) as well as an additional dataset.
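As a rough sketch of how such relation ratios could be computed for any set of pieces, the snippet below counts relation labels and normalizes them. The `relation_ratios` helper and the toy label list are assumptions for illustration; they do not reproduce the paper's data or numbers.

```python
from collections import Counter
from typing import Dict, Iterable


def relation_ratios(relations: Iterable[str]) -> Dict[str, float]:
    """Return the share of each relation type among all labelled relations."""
    counts = Counter(relations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rel: n / total for rel, n in counts.items()}


# Toy labels only, not taken from POP909, Wikifonia, or the paper.
toy_relations = [
    "transposition", "rhythmic_sequence",
    "rhythmic_sequence", "transposition",
]
ratios = relation_ratios(toy_relations)
```

Running the same function on the generated music and on each reference dataset would give distributions that can be compared type by type.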
The evaluation measured accuracy: the proportion of generated bars that correctly follow the given relation.
“Among all the relations, rhythmic sequence owns the highest accuracy(92.88%) while transposition shows the lowest performance(77.33%)” from the paper.
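A per-relation accuracy of this kind could be tallied as in the sketch below. It assumes each generated bar has already been checked against its requested relation by some external rule, giving a (relation type, is_correct) pair; `accuracy_by_relation` is a hypothetical helper and the data are toy values, not the paper's results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_relation(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of generated bars that satisfy the requested relation, per type."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for relation, is_correct in results:
        totals[relation] += 1
        hits[relation] += int(is_correct)
    return {rel: hits[rel] / totals[rel] for rel in totals}


# Toy outcomes only: 3 of 4 transposition bars and 2 of 2 rhythmic bars correct.
toy_results = [
    ("transposition", True), ("transposition", True),
    ("transposition", False), ("transposition", True),
    ("rhythmic_sequence", True), ("rhythmic_sequence", True),
]
acc = accuracy_by_relation(toy_results)
```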
Listening Test Settings
Discussion Questions for Lab Members on the MELONS Paper