Why this paper?

I’m currently working on a new encoding method, and bar-level repetition could be applied to mine too.
To check how similarity between measures is measured.
I want to see how they design experiments to support their ideas, especially when they need to compare generation models.

About MELONS

Problem Identification:
- Generating full-song melodies with a clear and coherent long-term structure is challenging.
Proposed Solution:
- MELONS Framework: The paper introduces MELONS, a framework designed to generate melodies with long-term structure.
- Graph Representation: It employs a graph representation of musical structure, capturing eight types of bar-level relations.
  
  <8 types of bar-level relations>
  
  from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
  
  <Graph figures1>
  
  from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
  
  <Graph figures2>
  
  from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
Methodology:
- Multi-Step Generation: The melody generation process is divided into two main steps:
  1. Structure Generation: Creating the underlying structure of the melody.
    
    Each relation is a token triple (i, j, t): an edge from a previous bar j to the current bar i with relation type t.
  2. Structure Conditional Melody Generation: Generating the actual melody based on the pre-defined structure.
    - Framework Overview
      
      Encoding scheme: Compound Word (CP), with sub-tokens (type, beat, tempo, pitch, duration, relation type)
      
      Sequences for training: {[start of the melody context], (content of the melody context),[start of the related bar], (content of the related bar), [relation type], [start of the target bar], (content of the target bar)}.
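The training sequence layout above can be sketched as a small helper. This is only an illustration under assumptions: the marker token names (`[CTX]`, `[REL_BAR]`, `[TGT_BAR]`) and the function name are hypothetical, and plain strings stand in for the Compound Word tokens the paper actually uses.

```python
from typing import List

# Hypothetical marker tokens; the names are illustrative, not the paper's vocabulary.
CTX_START = "[CTX]"      # start of the melody context
REL_START = "[REL_BAR]"  # start of the related bar
TGT_START = "[TGT_BAR]"  # start of the target bar


def build_training_sequence(
    context: List[str],
    related_bar: List[str],
    relation_type: str,
    target_bar: List[str],
) -> List[str]:
    """Concatenate context, related bar, relation type, and target bar in order."""
    return [
        CTX_START, *context,
        REL_START, *related_bar,
        f"[{relation_type}]",
        TGT_START, *target_bar,
    ]


# Placeholder strings stand in for real Compound Word tokens.
example = build_training_sequence(["c1", "c2"], ["r1"], "repetition", ["t1"])
print(example)
```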
      
      <Example of sequence>
      
      Structure Generation
      1. Modeling the Structure Graph:
        
        Inspired by GraphRNN, the structure graph is modeled as a sequence of relations (i, j, t).
      2. Auto-regressive Transformer Model: This model predicts the next edge in the sequence, one token triple at a time, until an EOS (End of Song) token is predicted, indicating the end of the structure.
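The edge-by-edge structure generation described above might look roughly like the loop below. It is a minimal sketch, not the paper's implementation: `predict_next` is a hypothetical stand-in for the Transformer, `None` stands in for the EOS token, and `toy_predictor` is a made-up rule used only to make the example runnable.

```python
from typing import Callable, List, Optional, Tuple

Relation = Tuple[int, int, str]  # (i, j, t): current bar i, previous bar j, relation type t


def generate_structure(
    seed: List[Relation],
    predict_next: Callable[[List[Relation]], Optional[Relation]],
    max_edges: int = 512,
) -> List[Relation]:
    """Append predicted edges one at a time until the predictor signals EOS."""
    relations = list(seed)
    for _ in range(max_edges):  # safety cap so a bad predictor cannot loop forever
        nxt = predict_next(relations)
        if nxt is None:  # None plays the role of the EOS token here
            break
        relations.append(nxt)
    return relations


def toy_predictor(history: List[Relation]) -> Optional[Relation]:
    """Made-up rule: each new bar repeats the bar four bars earlier, stopping after bar 12."""
    next_bar = history[-1][0] + 1 if history else 1
    if next_bar > 12:
        return None
    return (next_bar, next_bar - 4, "repetition")


structure = generate_structure([(5, 1, "repetition")], toy_predictor)
print(structure)
```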
      Melody Generation
      
      This part consists of two modules:
      1. Unconditional Music Generation Module:
        
        An auto-regressive Transformer model trained on original melodies.
        
        It generates new bars without any conditioning.
      2. Conditional Generation Module:
        
        For each relation (i,j,t), the model generates the melody for the i-th bar conditioned on the j-th bar and the relation type t.
      Detailed Generation Algorithm
      1. Initialization:
        
        Input: An 8-bar melody motif M={b1,b2,…,b8}.
        
        Extract the structure of M into a relation list R.
      2. Structure Generation:
        
        Predict the next relation in the structure graph autoregressively and append it to R.
      3. Melody Generation:
        
        If bar i has no relation → generate the i-th bar bi using the unconditional generation module.
        
        Otherwise → generate the i-th bar bi using the conditional generation module, conditioned on the 8 context measures and the related measure.
      <Figures of the generation framework>
      
      from the paper “MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH”, Yi Zou et al.
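The three steps of the generation algorithm above can be sketched as one loop. Several things here are assumptions made for illustration: all function names are hypothetical, a bar with no relation is encoded as `(i, None, None)`, the context is taken to be the last 8 bars generated so far, and the unconditional module is passed that same context as a prefix. The toy stubs at the bottom only exist to make the sketch runnable.

```python
from typing import Callable, List, Optional, Tuple

Bar = List[str]
# (i, j, t); j and t are None when bar i has no relation (an assumed encoding).
Relation = Tuple[int, Optional[int], Optional[str]]


def generate_song(
    motif: List[Bar],
    motif_relations: List[Relation],
    predict_next_relation: Callable[[List[Relation]], Optional[Relation]],
    unconditional: Callable[[List[Bar]], Bar],
    conditional: Callable[[List[Bar], Bar, str], Bar],
    context_len: int = 8,
    max_bars: int = 256,
) -> Tuple[List[Bar], List[Relation]]:
    bars = list(motif)
    relations = list(motif_relations)
    while len(bars) < max_bars:
        rel = predict_next_relation(relations)  # step 2: structure generation
        if rel is None:  # EOS
            break
        relations.append(rel)
        _, j, t = rel
        context = bars[-context_len:]  # assumed: the most recent 8 bars
        if j is None or t is None:  # step 3a: no relation, unconditional module
            bars.append(unconditional(context))
        else:  # step 3b: conditional module, using related bar j (1-indexed)
            bars.append(conditional(context, bars[j - 1], t))
    return bars, relations


# Toy stand-ins for the trained models.
script: List[Relation] = [(9, 1, "repetition"), (10, None, None)]
motif = [[f"b{k}"] for k in range(1, 9)]
song, rels = generate_song(
    motif,
    [],
    lambda r: script[len(r)] if len(r) < len(script) else None,
    lambda ctx: ["new"],
    lambda ctx, related, t: list(related),
)
print(song[8], song[9])
```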
Experiment Section
- Datasets
  1. POP909 Dataset:
    
    The POP909 dataset, as shown in the image below, consists of 909 MIDI files based on piano performances. These performances are divided into melody and accompaniment, making it an ideal dataset for extracting monophonic melodies.
    - example data
  2. Wikifonia Dataset:
    
    The Wikifonia dataset comprises 6,673 MusicXML files of melody data. An example is shown as sheet music in the image below.
    - example data
- Model Architecture
  1. Structure Generation Model:
    - 4 layers, 4 heads, 256 dimensions.
    - Inner size: 1024, which is relatively small.
  2. Melody Generation Model:
    - 6 layers, 8 heads, 512 dimensions.
    - Inner size: 2048.
    - CP encoding was used, with a different embedding size for each sub-token type.
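To get a feel for how small these models are, the two configurations can be compared with a rough parameter estimate. This is a back-of-the-envelope sketch, not a figure from the paper: it uses the common approximation of 4·d² weights for attention plus 2·d·d_ff for the feed-forward block per layer, and ignores embeddings, biases, layer norms, and the output heads. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    n_layers: int
    n_heads: int
    d_model: int
    d_ff: int  # inner (feed-forward) size

    def approx_layer_params(self) -> int:
        """Rough weight count for the Transformer layers only."""
        attention = 4 * self.d_model ** 2          # Q, K, V, and output projections
        feed_forward = 2 * self.d_model * self.d_ff  # two linear maps
        return self.n_layers * (attention + feed_forward)


structure_cfg = TransformerConfig(n_layers=4, n_heads=4, d_model=256, d_ff=1024)
melody_cfg = TransformerConfig(n_layers=6, n_heads=8, d_model=512, d_ff=2048)

print(structure_cfg.approx_layer_params())  # about 3.1M
print(melody_cfg.approx_layer_params())     # about 18.9M
```

Under this approximation the melody model has about six times as many layer weights as the structure model.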
- Quantitative Evaluation
  - The bar relations in the generated music were compared with the ratios of bar relations in the two primary datasets (POP909 and Wikifonia) as well as an additional dataset.
    - ratio comparison
  - The evaluation also measured accuracy: the proportion of generated bars that correctly follow their given relation.
    
    “Among all the relations, rhythmic sequence owns the highest accuracy(92.88%) while transposition shows the lowest performance(77.33%)” from the paper.
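Both quantitative measures above can be sketched in a few lines. The sketch rests on assumptions: notes are `(onset, pitch, duration)` tuples, the function names are hypothetical, and the two checkers are simplified guesses at what "repetition" and "rhythmic sequence" mean, not the paper's exact definitions.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Note = Tuple[int, int, int]      # (onset, pitch, duration), an assumed representation
Bar = List[Note]
Relation = Tuple[int, int, str]  # (i, j, t), bars are 1-indexed


def relation_ratios(relations: List[Relation]) -> Dict[str, float]:
    """Share of each relation type among all relations."""
    counts = Counter(t for _, _, t in relations)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def relation_accuracy(
    bars: List[Bar],
    relations: List[Relation],
    checkers: Dict[str, Callable[[Bar, Bar], bool]],
) -> Dict[str, float]:
    """Per-type fraction of bars that actually satisfy their assigned relation."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for i, j, t in relations:
        totals[t] += 1
        if checkers[t](bars[j - 1], bars[i - 1]):
            hits[t] += 1
    return {t: hits[t] / totals[t] for t in totals}


def same_rhythm(a: Bar, b: Bar) -> bool:
    """Simplified check: identical onsets and durations, pitches may differ."""
    return [(o, d) for o, _, d in a] == [(o, d) for o, _, d in b]


# Simplified, assumed definitions of two relation types.
checkers = {"repetition": lambda a, b: a == b, "rhythmic_sequence": same_rhythm}

bars = [
    [(0, 60, 4), (4, 62, 4)],
    [(0, 60, 4), (4, 62, 4)],  # exact copy of bar 1
    [(0, 64, 4), (4, 65, 4)],  # same rhythm as bar 1, different pitches
    [(0, 60, 8)],              # unrelated to bar 1
]
relations = [(2, 1, "repetition"), (3, 1, "rhythmic_sequence"), (4, 1, "repetition")]

ratios = relation_ratios(relations)
accuracy = relation_accuracy(bars, relations, checkers)
print(ratios, accuracy)
```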
- Listening Test Settings
  1. Model settings:
    - Structure Graph Source:
      - Real Music (R): Structure graphs extracted from real music.
      - Generated (no mark): Structure graphs generated by the structure generation network.
    - Types of Relations in Structure Graph:
      - Basic (B): Relations limited to simple repetition and rhythmic sequence, as in PopMNet.
      - Sophisticated (no mark): The more complex set of relations as described in Section 2.1 of the paper.
  2. Selection of Motifs:
    - 10 motifs were randomly selected from the testing set.
  3. Generation of Melodies:
    - Sets of melodies corresponding to the experimental systems were generated based on these motifs.
  4. Participants:
    - 12 professional musicians were invited to participate as listeners.
  5. Evaluation:
  - Listening test results
Discussion Questions for Lab Members on the MELONS Paper
