Chat with any PDF Document using Twilio, OpenAI, and Langchain
Time to read: 13 minutes
In this tutorial, you will learn how to build a WhatsApp chatbot application that will allow you to upload a PDF document and retrieve information from it. You are going to use a PDF document containing a few waffle recipes, but what you will learn here can be used with any PDF document.
To build this application, you will use the Twilio Programmable Messaging API for responding to WhatsApp messages. You will combine a framework named LangChain and the OpenAI API to process PDFs, create OpenAI embeddings, store the embeddings in a vector store, and select models for answering user queries related to the information contained in the document embeddings.
Embeddings are numeric representations of text, capturing semantic meaning and aiding in various natural language processing tasks. A vector store is a storage and retrieval structure for vector embeddings; it is commonly used in tasks such as information retrieval and similarity search, enhancing machines' contextual understanding of textual data.
The Twilio Programmable Messaging API is a service that allows developers to programmatically send and receive SMS, MMS, and WhatsApp messages from their applications.
LangChain is a language model-driven framework for context-aware applications that leverage language models for reasoning and decision-making. It connects models to context sources and facilitates reasoning for dynamic responses and actions.
The OpenAI API is a service that provides access to OpenAI's language models and artificial intelligence capabilities for natural language processing tasks.
By the end of this tutorial, you will have a chatbot that allows you to chat with any PDF document:
Tutorial Requirements:
To follow this tutorial, you will need the following components:
- Node.js (v18.18.1+) and npm installed.
- Ngrok installed and the auth token set.
- A free Twilio account.
- A free Ngrok account.
- An OpenAI account.
- This PDF file stored on a device that has access to a WhatsApp client. This document was originally downloaded from the Breville.com website and contains 4 waffle recipes. (Click on the Download raw file button to download the file)
Setting up the environment
In this section, you will create the project directory, initialize a Node.js application, and install the required packages.
Open a terminal window and navigate to a suitable location for your project. Run the following commands to create the project directory and navigate into it:
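For example, using an arbitrary directory name (any name you like works here; chat-with-document is only an illustration):

```bash
mkdir chat-with-document
cd chat-with-document
```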
Use the following command to create a directory named documents, where the chatbot will store the PDF document that the user wants to retrieve information from:
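For example:

```bash
mkdir documents
```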
Run the following command to create a new Node.js project:
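For example, accepting the default project settings:

```bash
npm init -y
```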
Now, use the following command to install the packages needed to build this application:
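A command along these lines installs all of the packages listed below (exact versions are left to npm):

```bash
npm install twilio express body-parser dotenv node-fetch langchain pdf-parse hnswlib-node
```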
With the command above you installed the following packages:
- twilio: a package that allows you to interact with the Twilio API. It will be used to send WhatsApp messages to the user.
- express: a minimal and flexible Node.js back-end web application framework that simplifies the creation of web applications and APIs. It will be used to serve the Twilio WhatsApp chatbot.
- body-parser: an Express body-parsing middleware. It will be used to parse the URL-encoded request bodies sent to the Express application.
- dotenv: a Node.js package that allows you to load environment variables from a .env file into process.env. It will be used to retrieve the Twilio and OpenAI API credentials that you will soon store in a .env file.
- node-fetch: a Node.js library for making HTTP requests to external resources. It will be used to download the PDF documents sent to the chatbot.
- langchain: the LangChain framework for building context-aware applications that use language models for reasoning and dynamic responses. It will allow an AI model to retrieve information from a document.
- pdf-parse: a Node.js library for extracting text content and metadata from PDF files. It will be used under the hood by a LangChain module to retrieve the text from the document containing the recipes.
- hnswlib-node: a package that provides Node.js bindings for HNSWLib, an in-memory vector store that can be saved to a file. It will be used to store the document information in a format suited for AI models.
Collecting and storing your credentials
In this section, you will collect and store your Twilio and OpenAI credentials that will allow you to interact with the Twilio and OpenAI APIs.
Twilio credentials
Open a new browser tab and log in to your Twilio Console. Once you are on your console copy the Account SID and Auth Token, create a new file named .env in your project’s root directory, and store these credentials in it:
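The file might look like the sketch below, with the placeholders replaced by the values copied from your console (the variable names match the ones read later in server.js):

```text
TWILIO_ACCOUNT_SID=<your Twilio Account SID>
TWILIO_AUTH_TOKEN=<your Twilio Auth Token>
```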
OpenAI credentials
Open a new browser tab and log in to your OpenAI account. When prompted to select a page, click the button that says API. Once you are logged in, click the button in the top right corner labeled Personal or Business (depending on your account type) to open a dropdown menu, then click the View API Keys button in this menu to navigate to the API keys page.
On the API keys page, click the Create new Secret Key button to generate a new API Key.
Once the API key is generated, copy it and store it in the .env file as the value for OPENAI_API_KEY:
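Your .env file should now also contain a line along these lines:

```text
OPENAI_API_KEY=<your OpenAI API key>
```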
Creating the chatbot
In this section, you'll create a WhatsApp chatbot application that can handle user messages, provide responses, and store incoming documents in the documents directory.
In the project’s root directory create a file named server.js and add the following code to it:
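A minimal sketch of this initial code, assuming CommonJS requires; the port and middleware configuration follow the description below:

```javascript
const express = require('express');
const bodyParser = require('body-parser');
const twilio = require('twilio');
const fs = require('fs');
require('dotenv').config();

// Create the Express application and set the port the server will listen on.
const app = express();
const port = 3000;

// Parse JSON and URL-encoded request bodies.
app.use(express.json());
app.use(bodyParser.urlencoded({ extended: false }));
```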
The code begins by importing the express, body-parser, twilio, fs, and dotenv packages needed to create and serve a Twilio WhatsApp chatbot capable of receiving and storing documents.

After importing the packages, the code sets up an Express server, sets the port to 3000, and configures the json and body-parser middlewares to parse JSON and URL-encoded request bodies.
Add the following code to the bottom of the server.js file:
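A sketch of this step, using the environment variable names stored earlier in the .env file:

```javascript
// Create a Twilio client using the credentials stored in the .env file.
const twilioClient = twilio(
  process.env.TWILIO_ACCOUNT_SID,
  process.env.TWILIO_AUTH_TOKEN
);
```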
Here, the Twilio API credentials (TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN) are retrieved from the environment variables and used to create a new Twilio client instance, which is then stored in a constant named twilioClient.
Add the following code below the twilioClient constant:
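One way to write this function; the parameter order follows the description below, and the exact wording of the log line is an assumption:

```javascript
// Send a WhatsApp message through the Twilio API and log its SID on success.
async function sendMessage(message, to, from) {
  const response = await twilioClient.messages.create({
    body: message,
    to: to,
    from: from,
  });
  console.log('Message sent:', response.sid);
}
```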
The code above defines a JavaScript function named sendMessage that is responsible for sending a WhatsApp message using the Twilio WhatsApp API.

The function takes as parameters the message that should be sent, the recipient's phone number, and the sender's phone number, then uses the twilioClient.messages.create() method alongside these parameters to create and send the WhatsApp message. The message SID is printed to the console if the message is successfully sent.
Add the following code below the sendMessage() function:
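A sketch of this function, assuming node-fetch v3 (which is ESM-only and therefore imported dynamically) and the documents directory created earlier:

```javascript
// Download the PDF file found at mediaUrl and save it as documents/document.pdf.
async function saveDocument(mediaUrl) {
  const { default: fetch } = await import('node-fetch');
  const filepath = './documents/document.pdf';

  return new Promise((resolve) => {
    fetch(mediaUrl)
      .then((response) => {
        const fileStream = fs.createWriteStream(filepath);
        response.body.pipe(fileStream);
        fileStream.on('finish', () => resolve(true));
        fileStream.on('error', () => resolve(false));
      })
      .catch((error) => {
        console.error(error);
        resolve(false);
      });
  });
}
```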
Here, the code defines an asynchronous function named saveDocument that takes a media file URL as a parameter. This function is responsible for downloading the PDF file from that URL and saving it as document.pdf in the documents directory.

The code begins by dynamically importing the node-fetch module and setting the path where the document will be saved. Next, the code returns a promise in which the fetch function is used to download the PDF file, and the fs.createWriteStream(filepath) method is used to save the file in the specified path. If the file is downloaded and stored successfully, the function returns true. However, if an error occurs, the function returns false.
Add the following code below the saveDocument() function:
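A sketch of this function; the exact reply strings are placeholders, but the "/start" check and the question acknowledgement follow the description below:

```javascript
// Build a reply based on the text of the incoming message.
async function handleIncomingMessage(req) {
  const { Body } = req.body;
  let message = '';

  if (Body.includes('/start')) {
    // Ask the user to upload the PDF document they want to chat with.
    message = 'Please send the PDF document that you would like to chat with.';
  } else {
    // Acknowledge the question by repeating it back to the user.
    message = `You asked: "${Body}"`;
  }

  return message;
}
```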
The code above defines an asynchronous function named handleIncomingMessage() that takes a request object containing the incoming message as a parameter. This function is responsible for handling incoming messages and formulating responses based on their content.

First, the code retrieves the incoming message body, stores it in a constant named Body, and then defines a variable named message which holds the text that gets sent back to the user.

Next, the code checks if the message contains the string "/start". If it does, the response message prompts the user to send the PDF document they want to chat with, and the function returns that response message.

If the message does not contain the string "/start", the code assumes that the incoming message contains a question, acknowledges the question by repeating it in the response message, and returns the response message.
Add the following code below the handleIncomingMessage() function:
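A sketch of this route; note that the else branch is left empty here and will be filled in with the document-handling code in the next step:

```javascript
// Webhook that Twilio calls whenever the chatbot receives a WhatsApp message.
app.post('/incomingMessage', async (req, res) => {
  const { To, Body, From } = req.body;
  let message = '';

  if (req.body.MediaUrl0 === undefined) {
    // The message contains no document, so build a text reply.
    message = await handleIncomingMessage(req);
    // To and From switch places: the user is now the receiver, the chatbot the sender.
    await sendMessage(message, From, To);
    return res.status(200).send('OK');
  } else {
    // The document-handling code will be added here in the next step.
  }
});
```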
This code defines an Express.js route that handles incoming HTTP POST requests at the path '/incomingMessage'. This route is responsible for receiving the WhatsApp messages sent by the user, distinguishing between text messages and document uploads, and generating appropriate response messages.
The code begins by retrieving the message recipient, body, and sender from the request body, storing them in the To, Body, and From variables respectively, and then defines a variable named message where it will store the message that will be sent back to the user.

Next, the code checks if the MediaUrl0 property in the request body is undefined, suggesting that the message does not contain a document. If that is the case, the code calls the handleIncomingMessage function, passing the request object as an argument, to generate a response message. This response message is assigned to the message variable.

Lastly, the code calls the sendMessage function to send the response message back to the user and returns an HTTP status code of 200, indicating that the request was successfully processed.

Take note of how, when calling the sendMessage function, the To and From variables switch places, since the user who sent the message is now the receiver and the chatbot the sender.
Add the following code inside the else statement:
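The else branch might look like the following sketch; the reply strings are placeholders, and the flow follows the description below:

```javascript
    // Let the user know the document is being processed.
    message = 'Please wait, your document is being processed...';
    await sendMessage(message, From, To);

    // Download and save the document sent by the user.
    const wasDocumentSaved = await saveDocument(req.body.MediaUrl0);

    if (!wasDocumentSaved) {
      message = 'Failed to save your document, please try again.';
      await sendMessage(message, From, To);
      return res.status(200).send('OK');
    }

    message = 'Document saved successfully.';
    await sendMessage(message, From, To);
    return res.status(200).send('OK');
```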
The code inside the else statement will run if 'MediaUrl0' is defined, indicating that a document is being uploaded. The code begins by assigning a "Please wait" message to the message variable, indicating that the chatbot is processing the document.

Next, the sendMessage() function is called to send the "Please wait" message to the user who uploaded the document. The saveDocument() function is then called to download and save the document from the URL provided in 'MediaUrl0'. The boolean value returned is stored in a constant named wasDocumentSaved, indicating whether the document was successfully saved.

If the document was not saved, an error message is assigned to the message variable, and the sendMessage() function is called to send the error message. If the document was successfully saved, a "Document saved" message is assigned to the message variable, and the sendMessage() function is called to send this success message.
After handling the incoming message, the route function ends by returning an HTTP status code of 200, indicating that the request was successfully processed.
Add the following code to the bottom of your server.js file:
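This closing snippet could look like the following:

```javascript
// Start the Express server.
app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});
```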
Here, the Express server is started using the app.listen() method on port 3000. When the server starts, a message stating that the server is running is printed to the console.
Running the chatbot and making it publicly accessible
In this section, you will run the express application to serve the chatbot, use Ngrok to make the application publicly accessible, and configure the Twilio WhatsApp settings in the Twilio console.
Go back to your terminal and run the following command to start the application:
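Assuming no start script was added to package.json, the application can be started directly with Node:

```bash
node server.js
```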
Open another tab in the terminal and run the following command to expose the application:
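Since the server listens on port 3000, the Ngrok command would be:

```bash
ngrok http 3000
```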
Copy the https Forwarding URL provided by Ngrok. Go back to your Twilio Console main page, click on the Develop tab, click on Messaging, click on Try it out, then click on Send a WhatsApp message to navigate to the WhatsApp Sandbox page.
Once you are on the Sandbox page, scroll down and follow the instructions to connect to the Twilio sandbox. The instructions will ask you to send a specific message to a Twilio Sandbox WhatsApp Number.
After following the connect instructions, scroll back up and click on the button with the text Sandbox settings to navigate to the WhatsApp Sandbox settings page. Once on the Sandbox settings, paste the Ngrok https URL in the “When a message comes in” field followed by /incomingMessage, set the method to POST, click on the Save button, and now your WhatsApp bot should be able to receive messages. Ensure your URL looks like the one below:
Open a WhatsApp client, send a message with any text, and the chatbot will send a reply with the text you sent. Send a message with the text /start and the chatbot will prompt you to send a PDF document. Send the PDF document containing the waffle recipes and the chatbot will send a reply stating that the document was saved.
Before moving to the next section, go back to the terminal tab running the application and stop the application.
Generating the embeddings
In this section, you will load the document that you wish the chatbot to understand, generate embeddings for the PDF document, and store the embeddings in a vector store.
In the project’s root directory create a file named embeddingsGenerator.js and add the following code to it:
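A sketch of this code; the import paths assume the LangChain subpath exports available when this tutorial was written and may differ in newer LangChain releases:

```javascript
const { PDFLoader } = require('langchain/document_loaders/fs/pdf');
const { OpenAIEmbeddings } = require('langchain/embeddings/openai');
const { HNSWLib } = require('langchain/vectorstores/hnswlib');
require('dotenv').config();

// OpenAI API key loaded from the .env file.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
```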
The code starts by importing the PDFLoader, OpenAIEmbeddings, and HNSWLib modules from the langchain library. Additionally, the code imports the dotenv library.

The PDFLoader module will be used to load the PDF document that you want to chat with. The OpenAIEmbeddings module will be used to generate embeddings compatible with OpenAI models. The HNSWLib module will be used alongside the hnswlib-node library to store the embeddings.

The code then stores the OpenAI API key in a constant named OPENAI_API_KEY.
Add the following code below the OPENAI_API_KEY constant:
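A sketch of this function, following the steps described below (the log messages are placeholders):

```javascript
// Generate embeddings for documents/document.pdf and save them to the embeddings folder.
async function generateAndStoreEmbeddings() {
  try {
    // Load the PDF document stored by the chatbot.
    const loader = new PDFLoader('./documents/document.pdf');
    const docs = await loader.load();

    // Create a vector store from the document using OpenAI embeddings.
    const vectorStore = await HNSWLib.fromDocuments(
      docs,
      new OpenAIEmbeddings({ openAIApiKey: OPENAI_API_KEY })
    );

    // Save the vector store for later use.
    await vectorStore.save('./embeddings');

    console.log('Embeddings created successfully');
    return true;
  } catch (error) {
    console.error(error);
    return false;
  }
}
```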
Here, an async function named generateAndStoreEmbeddings, enclosed within a try-catch block, is defined. This function is responsible for generating and storing document embeddings.

Inside the function, the code begins by creating a new instance of the PDFLoader class, passing the document.pdf file path as an argument. It then uses the load() method on the PDFLoader instance to load the specified PDF document. After loading the document, it creates a vector store using the HNSWLib.fromDocuments method. This method creates a vector representation of the document, using the HNSWLib vector store and the OpenAIEmbeddings. The vector store is then saved in a folder named "embeddings" in your project's root directory for future use.

Lastly, if this entire process is successful, a message stating that the embeddings were created is printed to the console and the function returns true. However, if an error occurs, the error is printed to the console and the function returns false.
Add the following code below the generateAndStoreEmbeddings() function:
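Assuming CommonJS exports, these two lines might look like this:

```javascript
generateAndStoreEmbeddings();

module.exports = { generateAndStoreEmbeddings };
```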
The first line of code above calls the generateAndStoreEmbeddings() function and the second line exports this function.
Go back to your terminal and use the following command to run this file:
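For example:

```bash
node embeddingsGenerator.js
```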
After executing the command above, a folder named embeddings containing the document's embeddings will be created in your project's root directory.
Before moving to the next section, comment out the generateAndStoreEmbeddings() function call:
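That is, the call should now look like this:

```javascript
// generateAndStoreEmbeddings();
```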
Retrieving information from the document
In this section, you will use the document’s embeddings alongside an OpenAI model to retrieve information.
In the project’s root directory create a file named inference.js and add the following code to it:
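A sketch of this code; as before, the import paths assume the LangChain version current when the tutorial was written:

```javascript
const { OpenAI } = require('langchain/llms/openai');
const { HNSWLib } = require('langchain/vectorstores/hnswlib');
const { OpenAIEmbeddings } = require('langchain/embeddings/openai');
const { RetrievalQAChain } = require('langchain/chains');
require('dotenv').config();

// OpenAI API key loaded from the .env file.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;

// Create the OpenAI model instance used to answer questions.
const model = new OpenAI({
  openAIApiKey: OPENAI_API_KEY,
  modelName: 'gpt-3.5-turbo',
});
```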
The code starts by importing the OpenAI, HNSWLib, OpenAIEmbeddings, and RetrievalQAChain modules from the langchain library. Additionally, the code imports the dotenv library. The RetrievalQAChain module is designed to streamline and simplify the process of building a retrieval-based question-answering system, where answers are retrieved from stored representations of documents or text data.

Next, the code stores the OpenAI API key in a constant named OPENAI_API_KEY. It then creates an instance of the OpenAI model using the OpenAI class from the langchain library. The model is specified with the name gpt-3.5-turbo. This model is designed for natural language processing and generation.
Add the following code below the model constant:
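A sketch of this function; RetrievalQAChain.fromLLM is one way to build the chain from the model and the vector store's retriever, and the error message string is a placeholder:

```javascript
// Answer a question using the stored document embeddings and the OpenAI model.
async function ask(question) {
  try {
    // Load the vector store containing the document embeddings.
    const vectorStore = await HNSWLib.load(
      './embeddings',
      new OpenAIEmbeddings({ openAIApiKey: OPENAI_API_KEY })
    );

    // Create a retrieval-based question-answering chain.
    const chain = RetrievalQAChain.fromLLM(model, vectorStore.asRetriever());

    // Run the chain with the question as the query.
    const result = await chain.call({ query: question });
    console.log(result);
    return result.text;
  } catch (error) {
    console.error(error);
    return 'Failed to retrieve the information, please try again.';
  }
}
```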
The code defines an async function named ask, enclosed within a try-catch block. This function takes a question as an argument and is responsible for performing a question-answering task using the OpenAI model.

Inside this function, the code loads a vector store from the "embeddings" folder. This is the vector store containing the recipe document's embeddings that were created in the previous section.

Next, it creates a RetrievalQAChain using the OpenAI model and the vector store. This chain is set up to handle the question-answering process. The function then calls the chain.call() method with the question as the query and awaits the result. If this entire process is successful, the result is printed to the console and the function returns the value stored in the result's text property. However, if an error occurs, the error is printed to the console and the function returns a message stating that the model failed to retrieve the information.
Add the following code below the ask() function:
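For example (the question wording is an illustration):

```javascript
const question = 'How long does it take to prepare each recipe?';

ask(question);

module.exports = { ask };
```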
The first line of code above defines a constant named question that holds a string that will be used to ask the model how long it takes to prepare each recipe. The second line calls the ask() function and passes the question as an argument. The third line exports the ask() function.
Go back to your terminal and use the following command to run this file:
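For example:

```bash
node inference.js
```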
After executing the command above you should see the following output:
The output above shows that the model is now able to retrieve information from the recipes document.
Before moving to the next section, comment out the question constant and the ask() function call:
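That is, only the export should remain active:

```javascript
// const question = 'How long does it take to prepare each recipe?';

// ask(question);

module.exports = { ask };
```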
Chat with document
In this section, you will integrate the embeddings generation and query features created in the previous two sections into the chatbot to allow users to retrieve information from a document.
Open the server.js file and add the following code below the twilioClient constant declaration located around line 14:
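Assuming CommonJS requires, the two imports might look like this:

```javascript
const { generateAndStoreEmbeddings } = require('./embeddingsGenerator.js');
const { ask } = require('./inference.js');
```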
Here, the code uses destructuring to import the generateAndStoreEmbeddings() and ask() functions from the embeddingsGenerator.js and inference.js files respectively.
Go to the handleIncomingMessage() function located around line 49 and replace the code in the else statement with the following:
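The else statement would now delegate the question to the ask() function, roughly as follows:

```javascript
  } else {
    // Use the AI model and the stored embeddings to answer the question.
    message = await ask(Body);
  }
```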
The highlighted code calls the ask function to use an AI model to retrieve information from the document and stores the returned value in the message variable.
Go to the /incomingMessage route handler located around line 63 and replace the last three lines inside the else statement with the following:
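A sketch of this replacement, following the flow described below (the reply strings are placeholders):

```javascript
    // Generate and store the embeddings for the document that was just saved.
    const wasEmbeddingsGenerated = await generateAndStoreEmbeddings();

    if (!wasEmbeddingsGenerated) {
      message = 'Failed to generate the document embeddings, please try again.';
      await sendMessage(message, From, To);
      return res.status(200).send('OK');
    }

    message = 'Document embeddings were generated, you can now ask questions about the document.';
    await sendMessage(message, From, To);
    return res.status(200).send('OK');
```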
The added code begins by calling the generateAndStoreEmbeddings() function to generate the document embeddings and store them. The boolean value returned is stored in a variable named wasEmbeddingsGenerated, indicating whether the embeddings were generated and stored.

If the embeddings were not generated and stored, an error message is assigned to the message variable, and the sendMessage() function is called to send the error message. If the embeddings were successfully generated and stored, a message stating this is assigned to the message variable, and the sendMessage() function is called to send this success message.
After sending the message, the route function ends by returning an HTTP status code of 200, indicating that the request was successfully processed.
Go back to the terminal and run the following command to start the chatbot application:
Return to your WhatsApp client, send a message with the text /start, and the chatbot will prompt you to send a PDF document. Send the PDF document containing the waffle recipes, and the chatbot will send a reply stating that the document embeddings were generated. Send a message containing a question about the PDF document, and the chatbot will send a reply with the desired information.
Conclusion
In this tutorial, you learned how to create a WhatsApp chatbot capable of retrieving information from a PDF document containing waffle recipes. You've learned how to leverage the Twilio Programmable Messaging API for message handling, integrate LangChain and the OpenAI API to process PDFs, generate and store document embeddings, and select appropriate models to respond to user queries based on the document's content.
The code for the entire application is available in the following repository https://github.com/CSFM93/twilio-chat-with-document.
Carlos Mucuho is a Mozambican geologist turned developer who enjoys using programming to bring ideas into reality. https://twitter.com/CarlosMucuho