Training a Neural Network on MIDI data with Magenta and Python
Since I started learning how to code, one thing that has always fascinated me is the concept of computers creating music. With Magenta, a Python library that makes it easier to process music and image data, this can be done more easily than before. Magenta has pre-trained example models you can use to generate music, as seen in a previous blog post, but it's a lot more fun to create your own.
Let's walk through how to use Magenta to train a neural network on a set of music data from classic Nintendo games to generate new Nintendo-sounding tunes.
Installing Magenta
First we need to install Magenta, which can be done using pip. Make sure you create a virtual environment before installing. I am using Python 3.6.5, but Magenta is compatible with both Python 2 and 3.
Magenta is a pretty big library with a good number of dependencies, so installing it might take a bit of time. Run the following command to install Magenta in your virtual environment:
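As a minimal sketch, assuming Magenta is published on PyPI under the package name magenta, the install can also be driven from Python so that it targets the interpreter of your active virtual environment. The helper names pip_install_command and is_installed are hypothetical.

```python
import importlib.util
import sys


def pip_install_command(package="magenta"):
    """Build a pip command that targets the interpreter running this script.

    Using sys.executable keeps the install inside the active virtual
    environment. The package name "magenta" is an assumption.
    """
    return [sys.executable, "-m", "pip", "install", package]


def is_installed(module_name="magenta"):
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Install with:", " ".join(pip_install_command()))
    print("Magenta already installed:", is_installed())
```

To run the install rather than print it, you could pass the list to subprocess.run(pip_install_command(), check=True).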
Alternatively, if you want to install Magenta globally you can use the following shell commands to run an install script created by the Magenta team to simplify things:
This will give you access to both the Magenta and TensorFlow Python modules for development, as well as scripts to work with all of the models that Magenta has available. For this post, we're going to be using Magenta's Polyphony recurrent neural network model because we want to generate music that has multiple simultaneous notes.
Building your dataset
Before being able to train a neural network, you're going to need some data to work with. Magenta is good at working with MIDI files, so here is a set I created of 1285 songs in MIDI format from classic Nintendo games. Extract them into a directory of your choosing.
A collection of MIDI files is a great place to start, but we're going to have to convert them into "NoteSequences", a fast and efficient data format that's easier to work with for training.
From the root directory of your project, run the following terminal command, replacing input_dir with the directory where your MIDI dataset is:
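Here is a minimal sketch of that conversion step, assuming the convert_dir_to_note_sequences script that ships with Magenta and an output path of /tmp/notesequences.tfrecord. Both are assumptions on my part, so check the script's --help output if your version differs. The helper convert_midi_command is a hypothetical name, and it only assembles the command so you can inspect it before running it.

```python
import shlex


def convert_midi_command(input_dir, output_file="/tmp/notesequences.tfrecord"):
    """Build the convert_dir_to_note_sequences command line.

    --recursive asks the script to also pick up MIDI files in
    subdirectories. The default output path is an assumption based on
    the file name mentioned in this post.
    """
    return [
        "convert_dir_to_note_sequences",
        "--input_dir=" + input_dir,
        "--output_file=" + output_file,
        "--recursive",
    ]


if __name__ == "__main__":
    # "nintendo_midi" is a placeholder; use your own dataset directory.
    args = convert_midi_command("nintendo_midi")
    print(" ".join(shlex.quote(arg) for arg in args))
```

Passing the returned list to subprocess.run(args, check=True) would execute the conversion.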
This will create a notesequences.tfrecord file in a tmp directory, which will be used in the next step.
Creating SequenceExamples
Now we need to create "SequenceExamples" to be fed to the model during training and evaluation. Each SequenceExample will contain a sequence of inputs and labels that represent a polyphonic sequence.
Run the command below to extract polyphonic sequences from your NoteSequences and save them as SequenceExamples. Two collections of SequenceExamples will be generated, one for training and one for evaluation. The fraction of SequenceExamples in the evaluation set is determined by the --eval_ratio argument. With a value of 0.10, 10% of the extracted polyphonic tracks will be saved in the eval collection, and 90% will be saved in the training collection.
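A minimal sketch of this step, assuming the polyphony_rnn_create_dataset script with its --input, --output_dir and --eval_ratio flags. The paths are assumptions that follow the file names used in this post, and create_dataset_command is a hypothetical helper that only builds the argument list.

```python
import shlex


def create_dataset_command(
    note_sequences="/tmp/notesequences.tfrecord",
    output_dir="/tmp/polyphony_rnn/sequence_examples",
    eval_ratio=0.10,
):
    """Build the polyphony_rnn_create_dataset command line.

    eval_ratio is the fraction of extracted tracks sent to the eval set.
    The default paths are assumptions, not values confirmed by the post.
    """
    if not 0.0 <= eval_ratio <= 1.0:
        raise ValueError("eval_ratio must be between 0 and 1")
    return [
        "polyphony_rnn_create_dataset",
        "--input=" + note_sequences,
        "--output_dir=" + output_dir,
        "--eval_ratio={:.2f}".format(eval_ratio),
    ]


if __name__ == "__main__":
    args = create_dataset_command()
    print(" ".join(shlex.quote(arg) for arg in args))
```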
After this command finishes, there will be two files in the tmp/polyphony_rnn/sequence_examples directory, one called training_poly_tracks.tfrecord and one called eval_poly_tracks.tfrecord, to be used for training and evaluation respectively.
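To get a feel for what a ratio-based split does, here is a toy illustration. It is not Magenta's implementation, and I am assuming a simple per-item random assignment, so the eval share comes out close to the ratio rather than exactly equal to it. The function split_by_ratio and the track names are hypothetical.

```python
import random


def split_by_ratio(items, eval_ratio=0.10, seed=0):
    """Randomly assign each item to the eval set with probability eval_ratio.

    A toy stand-in for a training/eval split; the seed makes it repeatable.
    """
    rng = random.Random(seed)
    training, evaluation = [], []
    for item in items:
        if rng.random() < eval_ratio:
            evaluation.append(item)
        else:
            training.append(item)
    return training, evaluation


if __name__ == "__main__":
    tracks = ["track_{}".format(i) for i in range(1000)]
    training, evaluation = split_by_ratio(tracks)
    print(len(training), "training /", len(evaluation), "eval")
```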
Once it's finished you are ready to train your model!
Training and evaluation
This step takes by far the longest amount of time and can be difficult to get right, but with time and experimentation it can be a lot of fun. Let's start a training job using the attention configuration that comes with Magenta.
In the following command for training your model:
- --run_dir is the directory where checkpoints and TensorBoard data will be stored.
- --sequence_example_file is the TFRecord file of SequenceExamples that will be fed to the model.
- --num_training_steps is an optional parameter for how many update steps to take before exiting the training loop. By default, training will run continuously until manually terminated.
- --hparams is another optional parameter that specifies the hyperparameters you want to use. In this example, we're going to use a custom batch size of 64 instead of the default of 128 to reduce memory usage. A larger batch size can potentially cause out-of-memory issues when training larger models. We'll also use a 3-layer RNN with 128 units each, instead of the default of 256 each. This will make our model train faster.
With these parameters in mind, run this command with the appropriate values to begin training:
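A minimal sketch of the training command with the parameters described above, assuming the polyphony_rnn_train script and the hyperparameter names batch_size and rnn_layer_sizes. The run directory, the training file path and the step count of 20000 are hypothetical values, and the helper names are my own.

```python
import shlex


def format_hparams(batch_size, rnn_layer_sizes):
    """Format hyperparameters as a single string for the --hparams flag."""
    layers = ",".join(str(size) for size in rnn_layer_sizes)
    return "batch_size={},rnn_layer_sizes=[{}]".format(batch_size, layers)


def train_command(run_dir, sequence_example_file, hparams, num_training_steps=None):
    """Build the polyphony_rnn_train command line.

    num_training_steps is optional; leaving it out means training keeps
    running until it is stopped manually.
    """
    args = [
        "polyphony_rnn_train",
        "--run_dir=" + run_dir,
        "--sequence_example_file=" + sequence_example_file,
        "--hparams=" + hparams,
    ]
    if num_training_steps is not None:
        args.append("--num_training_steps={}".format(num_training_steps))
    return args


if __name__ == "__main__":
    # Batch size 64 and three layers of 128 units, as described in the post.
    hparams = format_hparams(batch_size=64, rnn_layer_sizes=[128, 128, 128])
    args = train_command(
        run_dir="/tmp/polyphony_rnn/logdir/run1",  # hypothetical path
        sequence_example_file=(
            "/tmp/polyphony_rnn/sequence_examples/training_poly_tracks.tfrecord"
        ),
        hparams=hparams,
        num_training_steps=20000,  # hypothetical value
    )
    print(" ".join(shlex.quote(arg) for arg in args))
```

Quoting with shlex.quote matters here because the square brackets in the --hparams value would otherwise be interpreted by some shells.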
This can potentially take a very long time, so try to have this running while you are away from your plugged-in computer, or even on a machine in the cloud. If you have enough compute power, you can try using larger layer sizes for better results. It's fun to experiment with different parameters to see how they affect the results your model generates.
You can also run an eval job in parallel by keeping all the values in the previous command the same, except for --sequence_example_file, which should point to the separate set of eval polyphonic tracks, and adding the --eval flag. This isn't necessary to generate music, but can help if you want greater insight into how your model is performing.
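As a sketch of that idea, the eval command can be derived from a training command by swapping the SequenceExample file and appending --eval. The function to_eval_command and the sample paths are hypothetical, and the training arguments below are just an example list.

```python
def to_eval_command(train_args, eval_file):
    """Derive an eval command from a list of training arguments.

    Every argument is kept as-is except --sequence_example_file, which is
    pointed at the eval set, and --eval is appended at the end.
    """
    eval_args = []
    for arg in train_args:
        if arg.startswith("--sequence_example_file="):
            arg = "--sequence_example_file=" + eval_file
        eval_args.append(arg)
    eval_args.append("--eval")
    return eval_args


if __name__ == "__main__":
    # Example training arguments; the paths are hypothetical.
    train_args = [
        "polyphony_rnn_train",
        "--run_dir=/tmp/polyphony_rnn/logdir/run1",
        "--sequence_example_file="
        "/tmp/polyphony_rnn/sequence_examples/training_poly_tracks.tfrecord",
        "--hparams=batch_size=64,rnn_layer_sizes=[128,128,128]",
    ]
    eval_file = "/tmp/polyphony_rnn/sequence_examples/eval_poly_tracks.tfrecord"
    for arg in to_eval_command(train_args, eval_file):
        print(arg)
```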
Generating polyphonic tracks with your model
After you've let your model train for a bit, you can test it out using the latest checkpoint file of your trained model. This can be done either during or after training.
Run the following command to generate 10 polyphonic tracks using the model you just trained. The --hparams and --run_dir values should be the same as in the command you used to run the training job:
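A minimal sketch of the generation command, assuming the polyphony_rnn_generate script and the flags discussed in this section. The output directory, the primer chord and the value of 128 steps are hypothetical, and generate_command is a helper name of my own.

```python
import shlex


def generate_command(
    run_dir,
    hparams,
    output_dir,
    primer_pitches,
    num_outputs=10,
    num_steps=128,
    condition_on_primer=True,
    inject_primer_during_generation=False,
):
    """Build the polyphony_rnn_generate command line.

    primer_pitches is a list of MIDI pitch numbers, written out as a
    bracketed list string. Boolean flags are passed as "true" or "false".
    """
    pitches = "[" + ",".join(str(pitch) for pitch in primer_pitches) + "]"
    return [
        "polyphony_rnn_generate",
        "--run_dir=" + run_dir,
        "--hparams=" + hparams,
        "--output_dir=" + output_dir,
        "--num_outputs={}".format(num_outputs),
        "--num_steps={}".format(num_steps),
        "--primer_pitches=" + pitches,
        "--condition_on_primer=" + str(condition_on_primer).lower(),
        "--inject_primer_during_generation="
        + str(inject_primer_during_generation).lower(),
    ]


if __name__ == "__main__":
    args = generate_command(
        run_dir="/tmp/polyphony_rnn/logdir/run1",  # same as in training
        hparams="batch_size=64,rnn_layer_sizes=[128,128,128]",  # same as in training
        output_dir="/tmp/polyphony_rnn/generated",  # hypothetical path
        primer_pitches=[67, 64, 60],  # hypothetical chord
    )
    print(" ".join(shlex.quote(arg) for arg in args))
```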
The --primer_pitches value we're using here represents a chord with a quarter-note duration, and can be replaced by a melody by using the --primer_melody argument instead. --num_steps determines how long the generated music will be.
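To make those two values more concrete, here is a small sketch that names the notes in a primer chord and estimates how many bars a given step count covers. It assumes the common convention that MIDI pitch 60 is C4, and it assumes four steps per quarter note in 4/4 time, which I have not confirmed for this model. The example chord and both function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch):
    """Convert a MIDI pitch number to a note name, treating 60 as C4."""
    if not 0 <= pitch <= 127:
        raise ValueError("MIDI pitches range from 0 to 127")
    octave = pitch // 12 - 1
    return NOTE_NAMES[pitch % 12] + str(octave)


def steps_to_bars(num_steps, steps_per_quarter=4, quarters_per_bar=4):
    """Estimate the length in bars, assuming the given step resolution."""
    return num_steps / (steps_per_quarter * quarters_per_bar)


if __name__ == "__main__":
    chord = [67, 64, 60]  # hypothetical primer: a C major triad
    print([midi_to_name(pitch) for pitch in chord])
    print(steps_to_bars(128), "bars")
```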
If the --condition_on_primer argument is set to true, then the RNN will receive the primer as its input before it begins generating a new sequence. This is useful if you're using the primer pitches as a chord to establish the key. If --inject_primer_during_generation is true, the primer will be injected as a part of the generated sequence. This is useful if you want to harmonize an existing melody.
You typically want only one of these values to be true depending on the scenario, but it can be fun to switch these values around and see how it affects the generated music.
Here are some of my favorite tunes that my model generated.
What do I do once I'm happy with my model?
Once you've messed around with the parameters (or maybe used a massively powerful supercomputer for training) and are happy enough with the results, you can create a bundle file to make your model more portable and easy to work with. This can be done by calling the same polyphony_rnn_generate command from the previous section, but with some slightly different parameters:
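A minimal sketch of the bundling call, assuming the --bundle_file and --save_generator_bundle flags of polyphony_rnn_generate. The run directory and the bundle path are hypothetical, and bundle_command is a helper name of my own.

```python
import shlex


def bundle_command(run_dir, hparams, bundle_file):
    """Build a polyphony_rnn_generate command that saves a bundle file.

    run_dir and hparams should match the training job so the latest
    checkpoint can be loaded before it is written to bundle_file.
    """
    return [
        "polyphony_rnn_generate",
        "--run_dir=" + run_dir,
        "--hparams=" + hparams,
        "--bundle_file=" + bundle_file,
        "--save_generator_bundle",
    ]


if __name__ == "__main__":
    args = bundle_command(
        run_dir="/tmp/polyphony_rnn/logdir/run1",  # hypothetical path
        hparams="batch_size=64,rnn_layer_sizes=[128,128,128]",
        bundle_file="/tmp/polyphony_rnn.mag",  # hypothetical path
    )
    print(" ".join(shlex.quote(arg) for arg in args))
```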
This will save your model based on the latest checkpoint to whatever file you specified in the --bundle_file parameter.
Now you can send your model to all of your friends so they can make music with it! You can also send it to me if you want. Feel free to reach out with any questions or to show off any cool projects related to artificial creativity that you build or find out about:
- Email: Sagnew@twilio.com
- Twitter: @Sagnewshreds
- GitHub: Sagnew
- Twitch (streaming live code): Sagnewshreds