Generating Nintendo Music Over the Phone in Python with Magenta and Twilio

October 04, 2019
Written by
Sam Agnew
Twilion


Artificial creativity is amazing, and I am always blown away by what people accomplish with neural networks. Lots of developers have used Python libraries like Magenta to create awesome music.

I used Magenta to train a neural network on a bunch of classic Nintendo music, and I thought it would be fun to hack around with that and make a Twilio phone number you can call to hear computer-generated NES music. If you're in the US, try calling (424) 398-8637 (424-39-TUNES) to hear!

Let's walk through how to build a phone number that plays computer-generated music using Magenta.

Installing dependencies

First we need to install Magenta, which can be done using pip. Make sure you create a virtual environment before moving forward. I am using Python 3.6.5, but Magenta is compatible with both Python 2 and 3.

Run the following command to install Magenta in your virtual environment. It's a pretty big library with a good number of dependencies, so it might take a bit of time:

pip install magenta==1.1.6

This will give you access to both the Magenta and TensorFlow Python modules for development, as well as scripts to work with all of the models that Magenta has available. For this post, we're going to be using Magenta's polyphony recurrent neural network model.

To install the other dependencies run the following command:

pip install twilio==6.31.1 flask==1.1.1 rq==1.1.0 tensorflow==1.14.0

We're going to be using the Twilio Python library to handle incoming phone calls, Flask as our web framework, and RQ as a task queue for generating the music asynchronously without slowing down the web requests.
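
To make the task-queue idea concrete, here is a minimal in-process sketch of the same pattern using only the standard library. It is a stand-in, not how RQ works internally: RQ hands jobs to a separate worker process through Redis, whereas this sketch uses a thread and `queue.Queue`. The names `slow_job`, `handle_request`, `jobs`, and `results` are hypothetical.

```python
import queue
import threading
import time

jobs = queue.Queue()
results = {}


def slow_job(call_id):
    """Stand-in for music generation: pretend to work, then record a result."""
    time.sleep(0.1)
    results[call_id] = '{}.wav'.format(call_id)


def worker():
    """Pull jobs off the queue forever and run them one at a time."""
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


def handle_request(call_id):
    """Queue the slow work and return right away, like the web route will."""
    jobs.put((slow_job, (call_id,)))
    return 'Please wait...'


# A daemon thread plays the role of the separate worker process.
threading.Thread(target=worker, daemon=True).start()
```

The point of the pattern is that `handle_request` returns immediately while the slow work finishes in the background, which is what keeps the web request fast.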

In order for RQ to work, you'll also need to install Redis on your machine. That can be done with the following commands using wget:

wget http://download.redis.io/releases/redis-5.0.5.tar.gz
tar xzf redis-5.0.5.tar.gz
cd redis-5.0.5
make

Run Redis in a separate terminal window on the default port with the command src/redis-server from the directory where it's installed.

We have all our Python dependencies, but we also need to make sure a program called Fluidsynth is installed. Once the NES MIDIs are generated, this is what we'll use to convert them to audio to play over the phone.

This can be done with Homebrew by running brew install fluidsynth or on Ubuntu with:

sudo apt-get update
sudo apt-get install fluidsynth
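
Before moving on, it can save some debugging time to confirm that the command-line tools this post relies on are actually on your PATH. The following is a small sketch using only the standard library. `missing_tools` is a hypothetical helper, and the tool names are simply the commands used later in this post.

```python
import shutil

# Command names taken from this post; adjust if your install names differ.
REQUIRED_TOOLS = ['fluidsynth', 'polyphony_rnn_generate']


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` from a Python shell with your virtual environment activated should return an empty list once both Magenta and Fluidsynth are installed.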

Handling incoming Twilio phone calls with Flask

Since we're trying to create a computer-generated Nintendo music hotline, let's start by writing some code to handle phone calls. In the root directory of your project, create a file called app.py and add the following code to it:

import os

from flask import Flask, request, send_from_directory
from twilio.twiml.voice_response import VoiceResponse


# Folder where the generated music will be.
UPLOAD_FOLDER = '/path/to/uploads'

app = Flask(__name__)


@app.route('/call', methods=['GET', 'POST'])
def call():
    resp = VoiceResponse()
    resp.pause()
    resp.say('Please wait while we generate some new Nintendo music for you.')

    # Wait until we asynchronously update the call after music is generated.
    resp.pause(length=100)

    return str(resp)


@app.route('/uploads/<filename>', methods=['GET', 'POST'])
def uploaded_file(filename):
    return send_from_directory(UPLOAD_FOLDER, filename)

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)

This is a Flask app with two routes. /call handles incoming Twilio phone calls by generating TwiML with a VoiceResponse. The TwiML answers the call by saying "Please wait while we generate some new Nintendo music for you" and then holds for 100 seconds before hanging up. Later on we are going to add some code to generate music and to asynchronously update the phone call to play the music once it's done, but we'll leave it like this for now.
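
For reference, the TwiML this route returns is a small XML document. The sketch below rebuilds roughly the same structure with the standard library's `xml.etree.ElementTree`, so you can see its shape without running Flask. It is an approximation for illustration only: the exact string the Twilio helper library emits (XML declaration, whitespace) may differ, and `build_hold_twiml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_hold_twiml(message, hold_seconds=100):
    """Build XML shaped like the /call response: pause, say, long pause."""
    response = ET.Element('Response')
    ET.SubElement(response, 'Pause')
    say = ET.SubElement(response, 'Say')
    say.text = message
    ET.SubElement(response, 'Pause', {'length': str(hold_seconds)})
    return ET.tostring(response, encoding='unicode')


twiml = build_hold_twiml(
    'Please wait while we generate some new Nintendo music for you.')
print(twiml)
```

In the real app you should keep using `VoiceResponse`, which builds this document for you.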

/uploads is a route that takes a filename and serves it as static content. This is where we are going to serve the generated .wav files from later on. Create a directory somewhere in your project where you want to keep all of the sound files generated by your code, and update the value of UPLOAD_FOLDER to point to that directory.

Run your Flask app with python app.py from the directory your code is in, and visit http://localhost:5000/call to see something similar to this screenshot. I inspected the elements on the page to show the TwiML our /call route is serving.

/call route on your Flask app

Setting up your Twilio phone number

Before being able to handle phone calls, you’ll need a Twilio phone number. You can buy a phone number here (it’s free if you’re using the number to test your code during development).

Your Flask app will need to be visible from the internet in order for Twilio to send requests to it when phone calls come in. We will use ngrok for this, which you’ll need to install if you don’t have it. In your terminal run the following command:

ngrok http 5000

If you’ve just installed ngrok and that previous command didn’t work, you might have to run ./ngrok http 5000 from the directory that the ngrok executable is in. Once ngrok is running, you should see a black screen with a forwarding URL on it, which is the URL we will point a phone number at next.

Ngrok forwarding url

Open the phone number configuration screen in your Twilio console. Scroll down to the “a call comes in” field, and paste the URL ngrok generated for you with a /call at the end. It should look something like this:

Twilio phone number configuration

This means that whenever someone calls your Twilio phone number, Twilio will send an HTTP POST request to the /call route of your Flask app, which is running on port 5000 behind your ngrok URL. Make sure you remember to hit save at the bottom of the page!

With your Flask app running, call your Twilio phone number to hear the robot voice. You can hang up afterwards since nothing else will happen for now.

Generating Nintendo music with Magenta

You have a Twilio phone number answering phone calls, so now it's time to write some code to generate Nintendo tunes! I already trained a model on a set of 1285 songs from NES games that you can use. Grab the bundle file here, and save it to the same directory as app.py.

With this bundle file, you'll be able to use Magenta to generate MIDI files, but in order to play them over the phone they need to be converted to audio files. We are going to use Fluidsynth to convert the MIDI files to audio using this Famicom soundfont which will make them sound like old Nintendo games.

Create a file called music_generator.py and add the following code to it:

import os
import subprocess


GENERATION_DIR = '/path/to/generated/midi/directory'

soundfont = '/path/to/Famicom.sf2'
rnn_model = '/path/to/nes_rnn.mag'


# Takes a path to a midi and creates a .wav file with the Famicom soundfont.
def to_audio(midi_file, output_file):
    subprocess.call(['fluidsynth', '-T', 'wav', '-F',
                    output_file, '-ni', soundfont, midi_file])


# Uses the NES RNN model to generate a new MIDI file for a phone call.
def generate_midi(call_sid):
    output_dir = '{}/{}'.format(GENERATION_DIR, call_sid)
    args = ['polyphony_rnn_generate',
            '--bundle_file', rnn_model,
            '--output_dir', output_dir,
            '--num_outputs', '1',
            '--num_steps', '256']
    subprocess.call(args)
    midi_file = os.listdir(output_dir)[0]

    return '{}/{}'.format(output_dir, midi_file)


def generate_nes_music(call_sid, output_file):
    midi_file_path = generate_midi(call_sid)
    to_audio(midi_file_path, output_file)

This code is running a subprocess to execute the polyphony_rnn_generate shell script that Magenta provides you, and then taking the MIDI file produced by that script and converting it to audio with another subprocess running a Fluidsynth terminal command.

Replace the variables at the top of the file with paths to where you want the MIDIs to be generated, where you saved the Famicom soundfont, and where you saved the NES RNN model that you downloaded.
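
One fragile spot in the code above is `os.listdir(output_dir)[0]`, which returns entries in arbitrary order and may pick up a stale file if an output directory is reused. A slightly more defensive variant, sketched here under the assumption that the generator writes files ending in `.mid`, picks the most recently modified MIDI file instead. `newest_midi` is a hypothetical helper, not part of Magenta.

```python
import glob
import os


def newest_midi(output_dir):
    """Return the path of the most recently modified .mid file in output_dir."""
    midi_files = glob.glob(os.path.join(output_dir, '*.mid'))
    if not midi_files:
        raise FileNotFoundError(
            'No .mid files found in {}'.format(output_dir))
    return max(midi_files, key=os.path.getmtime)
```

If you adopt something like this, `generate_midi` could return `newest_midi(output_dir)` directly. Raising an error when nothing was generated also makes a failed Magenta run easier to spot than an `IndexError`.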

Open a Python shell and run the following Python code to test it out:

import music_generator
music_generator.generate_nes_music('test', 'test.wav')

This should generate a sound file called test.wav in the directory where you ran the code. Open it to see what it sounds like! If you receive an error, you might need to install a different version of the TensorFlow module.
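
If you would rather sanity-check the result from Python than open it in a player, the standard library's `wave` module can report how long the clip is. This is a sketch that assumes Fluidsynth wrote a plain PCM WAV file, which is the format `wave` can read. If it raises `wave.Error`, the file is probably in a different format. `wav_duration_seconds` is a hypothetical helper.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())
```

For example, `wav_duration_seconds('test.wav')` should return a positive number of seconds. A result of zero would suggest the MIDI-to-audio step produced an empty file.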

Adding music generation to your Twilio phone calls

Now that we have code to generate music and code to handle phone calls, we need to bring the two together.

Open up app.py and replace the contents with the following code, where the changes are highlighted:

import os

from flask import Flask, request, send_from_directory
from redis import Redis
from rq import Queue
from twilio.twiml.voice_response import VoiceResponse

import music_generator


# Folder where the generated music will be.
UPLOAD_FOLDER = '/path/to/uploads'

# Open a Redis connection on the default port
q = Queue(connection=Redis())
app = Flask(__name__)


@app.route('/call', methods=['POST'])
def call():
    call_sid = request.form['CallSid']

    # The file we want the final .wav file to be saved to.
    output_file = '{}/{}.wav'.format(UPLOAD_FOLDER, call_sid)

    # Queue up an async task to generate the music.
    # After this task is completed, the phone call will be updated.
    q.enqueue(music_generator.generate_nes_music, call_sid, output_file)

    resp = VoiceResponse()
    resp.pause()
    resp.say('Please wait while we generate some new Nintendo music for you.')

    # Wait until we asynchronously update the call after music is generated.
    resp.pause(length=100)

    return str(resp)


@app.route('/play_music', methods=['POST'])
def play():
    call_sid = request.form['CallSid']

    resp = VoiceResponse()
    resp.play('/uploads/{}.wav'.format(call_sid))
    return str(resp)


@app.route('/uploads/<filename>', methods=['GET', 'POST'])
def uploaded_file(filename):
    return send_from_directory(UPLOAD_FOLDER, filename)

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)

In this updated code, we're adding the Redis Queue logic to the phone call route. When a phone call is received, we're queueing up an asynchronous task to execute the code from music_generator.py to create new Nintendo music, and then responding to Twilio's webhook request with the same TwiML as before.
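
Because /call now accepts only POST requests and reads `CallSid` from the form data, opening it in a browser no longer works. To exercise the route without placing a real call, you could send a form-encoded POST yourself. The sketch below does that with the standard library. The function names, the base URL, and the SID value are placeholders, and only the one field the route reads is included, whereas a real Twilio webhook carries many more.

```python
import urllib.parse
import urllib.request


def build_webhook_request(base_url, call_sid):
    """Build a POST request carrying the one form field the /call route reads."""
    body = urllib.parse.urlencode({'CallSid': call_sid}).encode('utf-8')
    return urllib.request.Request(
        base_url.rstrip('/') + '/call',
        data=body,
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
        method='POST',
    )


def send_webhook(base_url, call_sid):
    """Send the request and return the response body (the TwiML) as text."""
    request = build_webhook_request(base_url, call_sid)
    with urllib.request.urlopen(request) as response:
        return response.read().decode('utf-8')
```

With Flask and Redis running, `print(send_webhook('http://localhost:5000', 'CAtest'))` should print the TwiML. Keep in mind that even a made-up SID queues a real generation job.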

We're also adding another route to handle the logic of playing the new music that was generated. After the music generation is finished, we want to asynchronously update the call that is in progress and tell it to play the song instead of continuing to pause. The filename of the song is the call_sid and it will be in the directory you specify in the UPLOAD_FOLDER variable.

Now all that's left is to update the code in music_generator.py to update the TwiML that is running the phone call once the music is finished being created. Replace its contents with the following code, with the changes highlighted again:

import os
import subprocess

from twilio.rest import Client


APP_URL = 'https://your-ngrok-url.ngrok.io'
GENERATION_DIR = '/path/to/generated/midi/directory'

soundfont = '/path/to/Famicom.sf2'
rnn_model = '/path/to/nes_rnn.mag'

# Create a Twilio Client object.
# Don't forget! Set environment variables with your Account SID and auth token.
client = Client()


def to_audio(midi_file, output_file):
    subprocess.call(['fluidsynth', '-T', 'wav', '-F',
                    output_file, '-ni', soundfont, midi_file])


def generate_midi(call_sid):
    output_dir = '{}/{}'.format(GENERATION_DIR, call_sid)

    args = ['polyphony_rnn_generate',
            '--bundle_file', rnn_model,
            '--output_dir', output_dir,
            '--num_outputs', '1',
            '--num_steps', '256']
    subprocess.call(args)
    midi_file = os.listdir(output_dir)[0]

    return '{}/{}'.format(output_dir, midi_file)


def generate_nes_music(call_sid, output_file):
    midi_file_path = generate_midi(call_sid)
    to_audio(midi_file_path, output_file)

    # After the music is generated, update the phone call to play it.
    client.calls(call_sid).update(url='{}/play_music'.format(APP_URL))

We're now making a call to the Twilio REST API to update the phone call that kicked off the task of generating music. After the generated MIDI is converted to audio, this line of code tells Twilio to redirect the TwiML flow of the call to a new URL. Don't forget to set the APP_URL variable to your own ngrok URL, and make sure the other variables have the right values as well!
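
One small pitfall: if `APP_URL` ends with a slash, `'{}/play_music'.format(APP_URL)` produces a double slash in the path, which may not match the Flask route. A tiny helper sidesteps that. `play_music_url` is a hypothetical name used only for this sketch.

```python
def play_music_url(app_url):
    """Join the public base URL and the /play_music route with one slash."""
    return '{}/play_music'.format(app_url.rstrip('/'))
```

You could then pass `play_music_url(APP_URL)` as the `url` argument when updating the call.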

You'll also need to set the environment variables TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN with the credentials from your Twilio Console. The Twilio Client() object will automatically check these values.
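
Since both the Flask app and the RQ worker import music_generator.py, the variables need to be set in every terminal that runs one of them. A quick way to check a given shell is sketched below. `missing_credentials` is a hypothetical helper, and it only tests that the variables are non-empty, not that the credentials are valid.

```python
import os

REQUIRED_VARS = ('TWILIO_ACCOUNT_SID', 'TWILIO_AUTH_TOKEN')


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

An empty list from `missing_credentials()` means both variables are visible to Python in that shell.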

We're almost ready, but before testing the app out, you're going to need to kick off an RQ Worker process. Open another terminal window and run the following command:

rqworker

If you receive any errors, make sure the correct virtual environment is activated and that you also have a Redis server running (src/redis-server from the directory where Redis is installed).
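
If you are not sure whether Redis is actually up, you can check whether anything is listening on its default port, 6379. This sketch only tests that a TCP connection succeeds, so it cannot tell Redis apart from another service on the same port. `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host='localhost', port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open()` returns False, start Redis before launching the worker.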

With all of that done, run python app.py again to kick off your Flask app, and try calling to see if you hear any Nintendo music. It might take a bit of time to generate the music depending on how good your computer is, but that's why we are doing it asynchronously!

Ready for more adventure?

Now that you have a Twilio phone number that plays Nintendo music generated from my model, you can try training your own! If you want a larger set of classic Nintendo tunes to train on, check out the NESMDB project. Or you can have fun and try a whole different set of data entirely. Whatever you decide to do, I hope you have as much fun with this as I have.

I can't wait to see (and hear) what you build. Feel free to reach out with any questions, or to show off any cool artificial-creativity projects you work on or find out about.