How to Handle Incoming WhatsApp Audio Messages in Go

January 20, 2025
Written by
Popoola Temitope
Contributor
Opinions expressed by Twilio contributors are their own

When you build customer support systems, automated transcription services, or voice-driven data collection platforms on top of messaging apps, accepting voice recordings lets users communicate with your application without having to type their messages.

In this tutorial, you'll learn how to handle incoming WhatsApp audio messages and transcribe them into text in a Go application using Twilio and AssemblyAI.

Prerequisites

To follow along with this tutorial, you’ll need the following:

  • Go 1.22 or above
  • A Twilio account (free or paid); if you don't have one yet, create a new account
  • An AssemblyAI account
  • Ngrok installed on your computer, and an ngrok account
  • Your preferred text editor or IDE
  • Prior experience with developing in Go would be ideal but is not required

Create a new Go project

To get started, let’s create a new Go project. Open your terminal, navigate to the directory where you want to create the project, and run the commands below.

mkdir twilio-whatsapp-voice
cd twilio-whatsapp-voice
go mod init twilio-whatsapp-voice

After running the commands above, open the project in your preferred code editor or IDE.

Install the required dependencies

Install the Twilio Go Helper Library

To make it significantly easier for the application to interact with the Twilio WhatsApp API, install the Twilio Go Helper Library using the command below.

go get github.com/twilio/twilio-go

Install the AssemblyAI Go SDK

To transcribe incoming WhatsApp audio messages to text, we'll use AssemblyAI, a speech-to-text service. To simplify interacting with AssemblyAI, we'll use their Go SDK.

Run the command below to install it.

go get github.com/AssemblyAI/assemblyai-go-sdk

Store your credentials as environment variables

Now, we'll store the API credentials in a .env file so that they'll be accessible as environment variables. To do this, create a .env file in your project folder and add the following variables:

TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_WHATSAPP_NUMBER=whatsapp:+<your_twilio_whatsapp_number>
ASSEMBLY_AI_API_KEY=your_assembly_ai_api_key
BASE_URL=<ngrok-forwarding-URL>

Next, let's install the Godotenv package to load the environment variables into the application using the command below.

go get github.com/joho/godotenv
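
A missing or empty variable tends to surface later as a confusing authentication or transcription error, so it can help to check all five values once at startup. The sketch below is one way to do that using only the standard library; missingEnv and requiredKeys are hypothetical names, not part of godotenv, and the check assumes the .env values have already been loaded into the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredKeys lists the variables defined in the tutorial's .env file.
var requiredKeys = []string{
	"TWILIO_ACCOUNT_SID",
	"TWILIO_AUTH_TOKEN",
	"TWILIO_WHATSAPP_NUMBER",
	"ASSEMBLY_AI_API_KEY",
	"BASE_URL",
}

// missingEnv (hypothetical helper) returns the keys that are unset or blank.
// lookup is injected so the check can be exercised without touching the
// real environment; in the application it would be os.LookupEnv.
func missingEnv(keys []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, k := range keys {
		if v, ok := lookup(k); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredKeys, os.LookupEnv); len(missing) > 0 {
		fmt.Printf("missing environment variables: %s\n", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all required environment variables are set")
}
```

In the real application, a check like this would run right after the .env file is loaded, so the program stops with a clear message instead of failing on the first request.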

Retrieve the required credentials

Retrieve your Twilio credentials

To retrieve your Twilio API credentials, log in to your Twilio Console dashboard. You will find your Twilio Account SID and Auth Token under the Account Info section, as shown in the screenshot below.

Twilio console displaying account SID, auth token, phone numbers, and helpful links.

Retrieve your AssemblyAI API key

To obtain your AssemblyAI API key, log in to your AssemblyAI dashboard. You will find your API key as shown in the screenshot below.

Screenshot of AssemblyAI website showing Python code and an API key highlighted in red at the bottom right.

Create the application logic

Now, let’s create the application's core logic for processing incoming WhatsApp audio messages. Specifically, we need to set up a webhook URL to handle these messages from Twilio. In the project's root directory, create a file named main.go and add the following code to it.

package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	aai "github.com/AssemblyAI/assemblyai-go-sdk"
	"github.com/joho/godotenv"
	"github.com/twilio/twilio-go"
	openapi "github.com/twilio/twilio-go/rest/api/v2010"
)

var (
	accountSid         string
	authToken          string
	apiKey             string
	baseURL            string
	twilioWhatsAppFrom string
)

func init() {
	if err := godotenv.Load(); err != nil {
		log.Fatalf("Error loading .env file: %v", err)
	}

	accountSid = os.Getenv("TWILIO_ACCOUNT_SID")
	authToken = os.Getenv("TWILIO_AUTH_TOKEN")
	apiKey = os.Getenv("ASSEMBLY_AI_API_KEY")
	baseURL = os.Getenv("BASE_URL")
	twilioWhatsAppFrom = os.Getenv("TWILIO_WHATSAPP_NUMBER")
}

func downloadFile(URL, fileName string) error {
	uploadDir := "uploads"
	if err := os.MkdirAll(uploadDir, os.ModePerm); err != nil {
		return fmt.Errorf("failed to create directory: %w", err)
	}
	req, err := http.NewRequest("GET", URL, nil)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}

	req.SetBasicAuth(accountSid, authToken)
	client := &http.Client{}
	response, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to download file: %w", err)
	}
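	defer response.Body.Close()
	// Reject error responses so that an error page is never saved as audio.
	if response.StatusCode != http.StatusOK {
		return fmt.Errorf("failed to download file: unexpected status %s", response.Status)
	}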

	filePath := filepath.Join(uploadDir, fileName)
	out, err := os.Create(filePath)
	if err != nil {
		return fmt.Errorf("failed to create file: %w", err)
	}
	defer out.Close()
	if _, err = io.Copy(out, response.Body); err != nil {
		return fmt.Errorf("failed to write file: %w", err)
	}
	
	fmt.Printf("File successfully downloaded and saved to: %s\n", filePath)
	return nil
}

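func transcribeAudio(filePath string) (string, error) {
	client := aai.NewClient(apiKey)
	ctx := context.Background()
	// AssemblyAI fetches the audio from this public URL, which the /uploads/ file server exposes.
	// Building it from filePath keeps the extension correct for .mp3 and .wav files too.
	audioURL := fmt.Sprintf("%s/%s", baseURL, filepath.ToSlash(filePath))
	transcript, err := client.Transcripts.TranscribeFromURL(ctx, audioURL, nil)
	if err != nil {
		return "", fmt.Errorf("failed to transcribe audio: %w", err)
	}
	// Text is a pointer, so guard against a nil value before dereferencing it.
	if transcript.Text == nil {
		return "", fmt.Errorf("transcription returned no text")
	}
	return *transcript.Text, nil
}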

func main() {
	client := twilio.NewRestClientWithParams(twilio.ClientParams{
		Username: accountSid,
		Password: authToken,
	})

	http.Handle("/uploads/", http.StripPrefix("/uploads/", http.FileServer(http.Dir("uploads"))))
	http.HandleFunc("/webhook", func(w http.ResponseWriter, r *http.Request) {
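		err := r.ParseForm()
		if err != nil {
			// Reject this request only; log.Fatal would shut down the whole server.
			log.Printf("Error parsing form: %v", err)
			http.Error(w, "invalid form data", http.StatusBadRequest)
			return
		}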

		from := r.FormValue("From")
		body := r.FormValue("Body")
		numMedia := r.FormValue("NumMedia")
		fmt.Printf("Received message from %s: %s\n", from, body)

		if numMedia != "0" {
			mediaURL := r.FormValue("MediaUrl0")
			mediaType := r.FormValue("MediaContentType0")
			fmt.Printf("Media URL: %s\n", mediaURL)
			if mediaType == "audio/ogg" || mediaType == "audio/mpeg" || mediaType == "audio/wav" {
				fileName := "received_audio"
				if mediaType == "audio/ogg" {
					fileName += ".ogg"
				} else if mediaType == "audio/mpeg" {
					fileName += ".mp3"
				} else if mediaType == "audio/wav" {
					fileName += ".wav"
				}
				filePath := filepath.Join("uploads", fileName)
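				err := downloadFile(mediaURL, fileName)
				if err != nil {
					// Fail this request only; log.Fatalf would shut down the whole server.
					log.Printf("Error downloading audio file: %v\n", err)
					http.Error(w, "failed to download audio", http.StatusInternalServerError)
					return
				}
				fmt.Printf("Audio file downloaded: %s\n", fileName)

				transcription, err := transcribeAudio(filePath)
				if err != nil {
					log.Printf("Error transcribing audio file: %v\n", err)
					http.Error(w, "failed to transcribe audio", http.StatusInternalServerError)
					return
				}
				fmt.Printf("Transcription: %s\n", transcription)
				params := &openapi.CreateMessageParams{}
				params.SetTo(from)
				params.SetFrom(twilioWhatsAppFrom)
				params.SetBody(fmt.Sprintf("Transcription: %s", transcription))
				_, err = client.Api.CreateMessage(params)
				if err != nil {
					log.Printf("Error sending transcription: %v\n", err)
					http.Error(w, "failed to send transcription", http.StatusInternalServerError)
					return
				}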
			} else {
				fmt.Fprintf(w, "Unsupported media type: %s", mediaType)
			}
		} else {
			fmt.Fprintf(w, "No media attached to the message")
		}
	})

	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	log.Printf("Server started at http://localhost:%s\n", port)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}

Here is the breakdown of the above code:

  • The init() function loads the environment variables and sets up the global credential variables
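  • The downloadFile() function downloads the audio file from the Twilio media URL, authenticating with your Account SID and Auth Token, and saves it in the uploads directory
  • The transcribeAudio() function passes the public URL of the downloaded audio file to AssemblyAI for transcription and returns the transcribed text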
  • The main() function sets up an HTTP server that handles incoming requests

Connect the app to the Twilio WhatsApp Sandbox

Let’s configure our Twilio sandbox to forward incoming WhatsApp messages to our application. To do this, go to your Twilio Console dashboard, and navigate to Explore products > Messaging > Try it out > Send a WhatsApp message, as shown in the screenshot below.

Twilio console screen showing WhatsApp Sandbox connection process with QR code and sandbox settings.

On the Try WhatsApp page, copy your Twilio WhatsApp number and send the displayed join message to that number, as shown in the screenshot below.

Twilio Sandbox confirmation message for WhatsApp integration displayed in a chat window.

Next, open the .env file and replace the placeholder <your_twilio_whatsapp_number> with your actual Twilio WhatsApp number, keeping the whatsapp:+ prefix.

Start the application

Let's start the application development server. You can do this by running the command below.

go run main.go

You will see the application running on localhost listening on port 8080.

Set up a Twilio WhatsApp Webhook

When Twilio receives an incoming message, it forwards the message details to your application's webhook URL. For the webhook URL to work, you have to make the application accessible over the internet using ngrok. To do this, open another terminal and run the following command:

ngrok http 8080

The command above will generate a Forwarding URL. Copy it as shown in the terminal below.

Terminal displaying Ngrok session status with a live HTTP URL and connection details.

Now, on the Twilio Try WhatsApp page, click on the Sandbox Settings option and configure the sandbox settings as follows.

  • When a message comes in: add the generated ngrok forwarding URL and append /webhook
  • Method: POST

After setting the configuration, click the Save button, as shown in the screenshot below, to save your changes.

Screenshot of Twilio console showing WhatsApp Sandbox Configuration with fields for URL and methods, and Save button highlighted.

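Next, in the .env file, replace the placeholder <ngrok-forwarding-URL> with the forwarding URL you copied. Then stop the application and start it again with go run main.go, because the .env file is only read at startup.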
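
BASE_URL matters because AssemblyAI fetches the audio over the internet: the application serves the uploads directory at /uploads/, and the transcription request points at that public address. The sketch below shows how such a URL could be assembled from the base URL and the local file path; publicAudioURL is a hypothetical helper and the ngrok address is a placeholder.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// publicAudioURL (hypothetical helper) joins the public base URL with the
// relative path of a downloaded file, tolerating a trailing slash on the
// base URL and converting OS-specific separators to forward slashes.
func publicAudioURL(baseURL, filePath string) string {
	base := strings.TrimRight(baseURL, "/")
	rel := strings.TrimLeft(filepath.ToSlash(filePath), "/")
	return base + "/" + rel
}

func main() {
	// Placeholder forwarding URL; use the one printed by ngrok.
	fmt.Println(publicAudioURL("https://example.ngrok.app/", filepath.Join("uploads", "received_audio.ogg")))
}
```

Trimming the slashes means the result is the same whether or not BASE_URL was pasted with a trailing slash.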

Test the application

To test the application, open WhatsApp and send a voice note to your Twilio WhatsApp number. You should receive a reply with your voice note transcribed to text, as shown in the screenshot below.

A WhatsApp chat with Twilio Sandbox showing a voice message and text responses.
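
If you test with several voice notes, keep in mind that the handler in main.go always saves to received_audio plus an extension, so each new message overwrites the previous file, and it matches MediaContentType0 against three exact strings. A table-driven variant, sketched here as an assumption rather than as the tutorial's method, could name each file after the message's MessageSid form field and tolerate content types that carry parameters. audioExtensions, isAlnum, and audioFileName are hypothetical names.

```go
package main

import (
	"fmt"
	"mime"
)

// audioExtensions maps the content types the tutorial accepts to file extensions.
var audioExtensions = map[string]string{
	"audio/ogg":  ".ogg",
	"audio/mpeg": ".mp3",
	"audio/wav":  ".wav",
}

// isAlnum reports whether s is non-empty and contains only ASCII letters
// and digits, which keeps request-supplied values safe to use in file names.
func isAlnum(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		isDigit := r >= '0' && r <= '9'
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		if !isDigit && !isLetter {
			return false
		}
	}
	return true
}

// audioFileName returns a per-message file name for a supported audio
// content type, or false when the type or the message ID is not usable.
func audioFileName(messageSID, contentType string) (string, bool) {
	if !isAlnum(messageSID) {
		return "", false
	}
	// ParseMediaType lowercases the type and strips parameters such as "codecs=opus".
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "", false
	}
	ext, ok := audioExtensions[mediaType]
	if !ok {
		return "", false
	}
	return messageSID + ext, true
}

func main() {
	// "MM123" is a placeholder for the MessageSid form value.
	name, ok := audioFileName("MM123", "audio/ogg; codecs=opus")
	fmt.Println(name, ok)
}
```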

That is how to handle incoming Twilio WhatsApp audio messages in Go

In this tutorial, you learned how to handle incoming WhatsApp audio messages in Go using the Twilio WhatsApp API and AssemblyAI. The application converts WhatsApp audio messages into readable text and responds to users with the transcriptions.

Popoola Temitope is a mobile developer and a technical writer who loves writing about frontend technologies. He can be reached on LinkedIn.