How Twilio SendGrid and Messaging reliably delivered over 55B messages during Cyber Week 2022
With Black Friday/Cyber Monday in the rearview mirror, we can all release a collective breath. After all, you spent months (if not the last year) preparing your messaging and email platforms for this one week. As we reflect on the year, we want to give a detailed look into how we architected our messaging and email platforms to handle record volume.
The biggest week in business communications
When it comes to business communications during Cyber Week, email is the longstanding cornerstone, with channels like SMS, MMS, and WhatsApp continuing to rise in popularity. Businesses delivering seamless customer engagement experiences turn to a combination of email and messaging, where the use case and consumer preference dictate the channel. Whether it's logins, notifications, promotions, password resets, or policy updates, consumers today expect to be met on the right channel at the right time. That's why it's critical to have a scalable and reliable multichannel communications strategy.
To prepare for Cyber Week, the Messaging and Email Product teams spent months architecting a scalable and reliable platform to enable our customers to engage with consumers on the channels they prefer. Reliably delivering over 50 billion emails and almost 4 billion messages over Cyber Week takes a full year of planning. Below, we look at how our engineers and product teams prepare for this monumental week, which metrics matter most during this time of year, and why throughput management and deliverability should be top of mind for next year's Cyber Week.
Preparing for SMS & MMS deliverability at scale
- Forecasting & Best Practices (6 months out)
- Identifying top senders - We identify our highest-volume senders (100-200 customers) who send the most messages during the holiday season, begin analyzing their inbound and outbound traffic, and review how they performed during the previous year's holiday. This allows us to paint a data-driven picture of the required network capacity in the current year.
- Forecasting total volume - After we have identified our top senders, we take into account our ongoing traffic and how it will coincide with Cyber Week traffic. Even though Cyber Week focuses on notifications and promotional SMS, you can’t forget about other traffic like customer care and one-time passcodes.
- Determining product fit - Once we have a detailed, data-driven understanding of our top customers, their sending patterns, and their use cases, we assess their eligibility for advanced throughput products in beta. These new throughput products provide even better deliverability and latency performance by allowing more granular control over how phone numbers and the overall messaging solution handle throughput and delivery of messages.
- Partnering for Scale (3 months out)
- Engagement with top customers - After months of data collection and forecasting, we engage with our customers on throughput allocation (scroll down to see how we approach throughput and how important it is for success).
- Partnering with the Industry - We actively engage with our Carrier partners to report our forecasts. Just like us, the Carriers are focused on making sure that their customers receive the messages they opt in to, and in a timely manner. Carriers use our forecasts to justify additional investment in their messaging infrastructure, which unlocks more throughput for Twilio, you, and your customers.
- Compliance - Over the last year, we have seen a shift in the US messaging ecosystem that has placed more importance on messaging compliance. With A2P 10DLC, Short Codes, and Toll-Free numbers all requiring registration and verification to send high volumes of messages, we partnered with our customers to ensure that they were using the best sender types and were properly registered and verified. This meant engaging with them on best practices around opt-in / opt-out, providing tools to make SMS campaign registrations easier, and monitoring verified versus unverified traffic on our network.
- Testing & Refining (1 month out)
- Infrastructure Scaling - Twilio Engineering assesses the forecasts made earlier in the year and the input gathered from customers on planned holiday messaging. The team then scales up Twilio’s Messaging infrastructure to ensure the platform can safely process the increased holiday traffic.
- Black Friday/Cyber Monday Runbook Creation - We define our team’s roles and responsibilities and make sure that every possible scenario is covered in our On-call Runbooks. The Runbooks also walk through common issues that may occur during Cyber Week and how on-call teams should resolve them to maximize message deliverability.
- Heightened Awareness Period (3 weeks out)
- Twilio Messaging Gameday - At Twilio we run a Messaging “Gameday” which simulates common scenarios detailed in the Black Friday/Cyber Monday Runbooks. We review Runbook remediation steps and ensure that the steps taken are appropriate.
- Final testing and HAP begins - Engineering teams perform final scaling of systems, as appropriate, and Twilio enters a Code Freeze, which is also known as a Heightened Awareness Period (HAP).
- Cyber Week
- Active monitoring - We maintain a suite of monitoring tools that allows our on-call team to proactively respond to and mitigate messaging incidents before they affect large amounts of Twilio traffic. While total messaging volume is a fun metric to see, here is a list of key metrics we look at to paint the full picture of a successful Cyber Week:
- Throughput Utilization (MPS) - We compare our MPS forecasts vs the Peak Hour MPS we see during the day to understand how close we are to our forecast.
- Queue Latencies - Queue latency is the amount of time between a message being enqueued into Twilio and leaving the Twilio Platform. We look at this metric at three percentiles: the 99th (P99), 95th (P95), and 90th (P90).
- Errors - We also look at error rates and analyze the number of errors, how they were responded to, and how they impacted our network in order to better understand how we performed.
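To make the three monitoring metrics above concrete, here is a minimal Python sketch of how they could be computed from raw samples. This is an illustration only, not Twilio's monitoring tooling: the function names, the nearest-rank percentile method, and the sample numbers in the usage note are all assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. (Assumed method; other definitions exist.)"""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def queue_latency_summary(latencies):
    """P90, P95, and P99 of queue latency (time from enqueue to leaving the platform)."""
    return {f"p{p}": percentile(latencies, p) for p in (90, 95, 99)}


def throughput_utilization(peak_hour_mps, forecast_mps):
    """Observed peak-hour MPS as a fraction of the forecast MPS."""
    return peak_hour_mps / forecast_mps


def error_rate(failed, attempted):
    """Share of attempted messages that ended in an error."""
    return failed / attempted if attempted else 0.0
```

For example, with hypothetical latency samples of 1 through 100 milliseconds, `queue_latency_summary(list(range(1, 101)))` returns `{"p90": 90, "p95": 95, "p99": 99}`, and `throughput_utilization(4500, 5000)` returns `0.9`, meaning the peak hour reached 90% of the forecast.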
Throughput and Infrastructure for SMS & MMS
A critical part of using our Programmable Messaging API to send SMS or MMS at scale is how a business manages its throughput. There are common misconceptions about how throughput and queuing play a role in your messaging solution. Until recently, the majority of companies used Sender-Based Throughput, where capacity is dictated by the available ecosystem capacity and the senders (phone numbers).
To send billions of SMS and MMS messages in a single weekend, we needed to move on from this legacy approach to throughput management. That is how Twilio’s Account-Based Throughput was born. Account-Based Throughput configures a single messages-per-second (MPS) limit for each sender type (short code, toll-free, A2P 10DLC) per channel, rather than for individual numbers. Ultimately, this product provides two unique value props for our customers scaling their messaging solutions:
- It provides Twilio customers with the optimal level of throughput based on their sending habits, use cases, and phone number inventory.
- It reduces the over-provisioning of phone numbers while creating better protections for our customers' sending habits and for the downstream capacity of Twilio’s platform. This enables us to maintain a healthy network, resulting in more efficient sending for our customers.
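The difference between limiting per number and limiting per sender type can be sketched with a token bucket. The Python below is a simplified illustration under our own assumptions, not how Account-Based Throughput is actually implemented: the class names, the limits, and the choice of a token bucket are all hypothetical.

```python
class TokenBucket:
    """Allows roughly `mps` messages per second, with bursts of up to `mps`."""

    def __init__(self, mps, now=0.0):
        self.mps = mps
        self.tokens = float(mps)  # start with one second's worth of capacity
        self.updated = now

    def try_send(self, now):
        # Refill in proportion to elapsed time, capped at the per-second limit.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.mps, self.tokens + elapsed * self.mps)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a real system would queue the message, not drop it


class AccountThroughput:
    """One shared bucket per sender type, however many numbers are sending."""

    def __init__(self, limits_mps):
        self.buckets = {
            sender_type: TokenBucket(mps)
            for sender_type, mps in limits_mps.items()
        }

    def try_send(self, sender_type, from_number, now):
        # from_number is deliberately ignored: capacity belongs to the
        # account's sender type, not to the individual phone number.
        return self.buckets[sender_type].try_send(now)
```

With hypothetical limits such as `AccountThroughput({"short_code": 3, "toll_free": 1})`, three messages sent at the same instant from three different short codes all pass, and a fourth waits until the bucket refills. Adding more numbers does not add capacity in this model, which is why over-provisioning numbers stops being useful.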
Preparing for email deliverability at scale
Just like the messaging team, the SendGrid Email team sees its highest volume of emails sent during this time of the year. This year we reached our highest volume in a single day: 8.96 billion emails. To achieve this volume, we spent months preparing our infrastructure for scale. Here’s how we did it.
- Forecasting & Best Practices - Our Product and Engineering teams spend the majority of the year preparing for these volumes by running data-driven forecasts and engaging with our customers on best practices. Because this is the most stressful time of the year for retail customers, our network has to withstand incredible amounts of pressure while still delivering time-sensitive messages. We work with individual customers on forecasted email volumes and send times, allowing us to actively predict the health of the SendGrid network.
- Deliverability & Send Time Optimization - During Cyber Week, emails are especially time sensitive, and the deliverability and throughput of your emails are as important as (if not more important than) the total number of emails sent. To achieve high deliverability for our customers, we collaborate with them throughout the year to align on strategies such as sending at off-peak intervals within the hour to stay top of mind and top of inbox. You can see our peak hours throughout a single day during Cyber Week. This leads to healthier deliverability and a more effective email program.
- Testing and Repetition - We work with our customers year-round to optimize their sending through trial and error, which gives us time to reassess how recipients and inbox providers respond. During Cyber Week we focus on more than total volume; here are a few metrics we keep an eye on throughout the hectic week.
- Median end-to-end time - This metric shows how quickly we can get an email through our platform and into an inbox, and during Cyber Week our median end-to-end time was 2.2 seconds.
- Open rates by email provider - It's a good idea to track open and click rates by mailbox provider so you can make adjustments when you see abnormalities in your data. When you see underperformance with a mailbox provider, a quick adjustment (often sending less mail to that provider) is almost always the right thing to do.
- Performance of subject lines - Subject line performance directly influences open rates and click-through rates, which are the ultimate goal of an email campaign. We analyzed our data this year to give you a picture of which subject lines performed the best.
- 96% of our total emails sent had subject lines of 1-15 words
- 95% of our total emails sent had subject lines of 12-84 characters
- 2-word subject lines had the best open rate of 11%
- 75-character subject lines had the best open rate of 11.46%
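As a rough illustration of tracking open rates by mailbox provider, the Python sketch below aggregates per-provider open rates and flags providers that fall well below the rest. It is not SendGrid's analytics pipeline: the function names, the input shape, the provider names, and the "half of the median" threshold are all assumptions you would tune against your own data.

```python
import statistics


def open_rates_by_provider(events):
    """events: iterable of (provider, delivered, opened) tuples,
    for example one tuple per campaign and provider."""
    totals = {}
    for provider, delivered, opened in events:
        d, o = totals.get(provider, (0, 0))
        totals[provider] = (d + delivered, o + opened)
    # Skip providers with no delivered mail to avoid dividing by zero.
    return {p: o / d for p, (d, o) in totals.items() if d > 0}


def underperforming(rates, tolerance=0.5):
    """Providers whose open rate is below `tolerance` times the median
    open rate across all providers (an assumed, adjustable rule)."""
    median = statistics.median(rates.values())
    return sorted(p for p, rate in rates.items() if rate < tolerance * median)
```

For example, given hypothetical providers with open rates of 11%, 10%, and 2%, the median is 10%, so only the 2% provider falls below the 5% cutoff and is flagged as a candidate for reduced sending.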
How to better architect your communication solution for 2023
We are already getting started preparing for 2023. If you are too, here are best practices for a successful sending season:
- Deep dive into throughput - Whether you are focusing on Messaging, Email, or both, concentrating your efforts on architecting your platform for optimized throughput and deliverability is key. To see how Postscript, an exclusive Twilio customer and SMS Marketing giant, thinks about throughput, you can read an interview with their CEO on prepping for Cyber Week.
- Get compliant now - In the US messaging ecosystem, compliance is the first step towards building a fully verified and scalable messaging platform. You can read our US SMS Compliance Guide to make sure you are following best practices when sending messages to your customers.
- Better sending habits and analyzing data - Conduct a retrospective on this past year with your product and engineering teams, and make a habit of reviewing the performance of your campaigns by time and day and analyzing what you learn. This will provide insight into how to schedule your emails and messages during Cyber Week to optimize your ROI.
At Twilio, we prepare all year to make sure that our platforms are ready for your volume during the busiest retail weekend of the year. Our ultimate goal is for you to be able to send your SMS/MMS and emails with confidence. We recommend starting your planning early this year and if you want to see how you can partner with Twilio, jump in and start building.