Get notified of new magazine issues using web scraping and SMS with C# .NET
As a Raspberry Pi fan, I like to read The MagPi Magazine, which is freely available as PDFs. The problem is I tend to forget to download it manually every month, so I decided to automate the process. If Raspberry Pi is not your thing, you should be able to modify the demo application to work for any periodical publication that offers free downloads.
Prerequisites
You'll need the following things in this tutorial:
- A free Twilio account
- A Twilio Phone Number with SMS/MMS capabilities.
- An OS that supports .NET (Windows/macOS/Linux)
- .NET 6.0 SDK (newer and older versions may work too)
- A code editor or IDE (Recommended: Visual Studio Code with the C# plugin, Visual Studio, or JetBrains Rider)
Project Overview
First, let’s understand what the demo intends to achieve. The components involved and the workflow look like this:
- The worker service reads a database to get the latest issue it has already sent a notification for.
- The worker service fetches the website for the magazine and gets the latest issue number. Then, it compares the latest issue number in the database to the latest issue number on the website. If the numbers are equal, it means there is no new issue. If the latest issue number on the website is greater, then there is a new issue. If there is no new issue, the worker service goes to sleep. If there is a new issue, it gets the cover image and the direct link URLs from the magazine’s website.
- The worker service calls the Twilio API to send an SMS/MMS message.
- Twilio sends the message to the user.
- The worker service updates its database with the latest issue to avoid duplicate messages.
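The comparison at the heart of this workflow can be sketched in a few lines. The names below (IssueCheck, Decide, NextAction) are hypothetical and are not part of the demo project; they only illustrate the equal/greater rule described in the list above.

```csharp
// Hypothetical helper illustrating the comparison step; not part of the demo project.
public enum NextAction { Sleep, Notify }

public static class IssueCheck
{
    // storedIssue:  latest issue number recorded in the database.
    // websiteIssue: latest issue number scraped from the magazine's website.
    public static NextAction Decide(int storedIssue, int websiteIssue) =>
        websiteIssue > storedIssue ? NextAction.Notify : NextAction.Sleep;
}
```

With these definitions, IssueCheck.Decide(120, 121) yields NextAction.Notify, while IssueCheck.Decide(121, 121) yields NextAction.Sleep.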
Project Implementation
Let’s start by creating the worker service by running the following commands:
Create the Data Layer
First, let’s look into the data layer. The only piece of information that needs to be stored is the latest issue number that the application processed.
Create a folder inside your project named Data. Then, create a file LatestMagazineIssue.cs, that contains a model class for your data. Add the following code:
Then, create a new file IMagazineIssueRepository.cs in the Data folder that holds a repository interface to outline the data operations you’re going to use. Add the following code to the file:
The next step is to decide how to store the data. The requirements of this project are very straightforward, so you don’t need a full-fledged database; a simple JSON file will suffice. Go ahead and create a JSON file named db.json under the Data directory. Update its contents as shown below:
Then, create another file named JsonMagazineIssueRepository.cs in the Data folder, which will contain the JSON-file-backed implementation of the previous interface. Update the code as shown below:
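As a rough illustration of how the model, the interface, and a JSON-backed repository could fit together, here is a minimal sketch. The property name IssueNumber, the method names GetLatestIssueAsync and SaveLatestIssueAsync, and the constructor taking a plain file path are all assumptions made for this example; the demo project's actual members and JSON layout may differ.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Assumed model shape: a single integer property.
public class LatestMagazineIssue
{
    public int IssueNumber { get; set; }
}

// Assumed interface members (hypothetical names).
public interface IMagazineIssueRepository
{
    Task<LatestMagazineIssue> GetLatestIssueAsync();
    Task SaveLatestIssueAsync(LatestMagazineIssue issue);
}

public class JsonMagazineIssueRepository : IMagazineIssueRepository
{
    private readonly string _path;

    // Assumption: the path is passed directly instead of via a settings class.
    public JsonMagazineIssueRepository(string path) => _path = path;

    public async Task<LatestMagazineIssue> GetLatestIssueAsync()
    {
        // Treat a missing file as "no issue processed yet".
        if (!File.Exists(_path)) return new LatestMagazineIssue();

        var json = await File.ReadAllTextAsync(_path);
        return JsonSerializer.Deserialize<LatestMagazineIssue>(json) ?? new LatestMagazineIssue();
    }

    public Task SaveLatestIssueAsync(LatestMagazineIssue issue)
    {
        // Indented output keeps the file easy to inspect and edit by hand.
        var options = new JsonSerializerOptions { WriteIndented = true };
        return File.WriteAllTextAsync(_path, JsonSerializer.Serialize(issue, options));
    }
}
```

Keeping the repository behind an interface means the JSON file could later be swapped for a real database without touching the rest of the worker.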
The JsonMagazineIssueRepository only needs one parameter: the path to the JSON file. You can encapsulate it in a simple class. Create DatabaseSettings.cs under the Data directory with the following code:
Then update your appsettings.json file so that it looks like this:
Finally, for this stage, update Program.cs as shown below:
Lines 7 to 9 are where you register your services with their concrete implementations in the DI container. Then the IMagazineIssueRepository service is retrieved to get the latest magazine issue and print it to the console.
Line 12 is commented out temporarily to make the implementation/debugging phase easier. As of now, you don’t need to worry about scheduling. That will come later. So, for now, run the application with dotnet run.
And confirm your output looks like this:
Now that you have a working data layer, move on to the next section, where you will do some HTML parsing.
Parse the HTML of the Magazine Page
You need to get three things from the magazine website:
- The latest issue number
- The URL of the magazine (PDF or other formats)
- The URL of the cover image (Optional)
Every magazine tracker will work differently, but you can combine the requirements above in a single interface so that all the trackers work in a similar fashion.
Create IMagazineTrackerService.cs for the interface and update its code as shown below:
All your trackers must implement the IMagazineTrackerService interface.
Now, implement your first tracker by creating a file MagPiTrackerService.cs with the following dummy implementation:
To do the HTML parsing, you will use a library called AngleSharp. It makes the whole process a lot easier, and it can be added to your project via NuGet by running:
Now, take a look at where to find the latest issue number. The easiest way to find the latest issue number is by going to the issues page, which looks like this at the time of this writing:
Look at the source of the page (right-click and choose Show/View Page Source, depending on your browser). If you search for the phrase “The MagPi issue 121 out now” (replace the number with the one you see on your screen), you should find the relevant area, which looks something like this:
This page contains the latest issue number and the URL of the cover image. To parse this page, update the MagPiTrackerService code as shown below:
After loading the page with AngleSharp, you have to write a CSS selector to get the element you’re interested in. In this example, the latest issue number is obtained from the href attribute of the anchor element (by parsing the number that follows the last ‘/’ character). Similarly, the cover URL is parsed from the src attribute of the img element.
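The "number after the last slash" step can be illustrated on its own, independently of AngleSharp. This is a minimal sketch; the helper name IssueLinkParser and the assumed href shape (something like /issues/121) are illustrative and not taken from the demo project.

```csharp
// Hypothetical helper: extracts the issue number from an href such as "/issues/121".
public static class IssueLinkParser
{
    // Returns true and sets issueNumber when the text after the last '/' is an integer.
    public static bool TryParseIssueNumber(string href, out int issueNumber)
    {
        issueNumber = 0;
        if (string.IsNullOrWhiteSpace(href)) return false;

        var trimmed = href.TrimEnd('/');            // tolerate a trailing slash
        var lastSlash = trimmed.LastIndexOf('/');   // -1 when there is no slash at all
        var tail = trimmed[(lastSlash + 1)..];      // the whole string when no slash was found
        return int.TryParse(tail, out issueNumber);
    }
}
```

Using a Try-style method avoids an exception if the site layout changes and the link no longer ends in a number; the caller can then skip the run instead of crashing.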
To test the latest version, update the Program.cs file as shown below:
Now the IMagazineTrackerService is also configured as a service and retrieved from the service provider. Then tracker.GetLatestIssueNumber and tracker.GetLatestIssueCoverUrl are used to scrape the data and print it.
Run the application, and you should see an output that looks like this:
The third and final piece of information you need is the link to the PDF file. If you click on the “Download Free PDF” link, you get redirected to https://magpi.raspberrypi.com/issues/121/pdf, which looks like this:
If you click on the "No thanks, take me to the free PDF" link, you get redirected to https://magpi.raspberrypi.com/issues/121/pdf/download, and your download starts automatically. This is done by placing an iframe on the page and setting its src to the URL of the PDF.
If you look at the source code of the download page and search for “iframe”, you should find the relevant code, which looks like this:
To parse this URL, update the MagPiTrackerService.GetIssuePdfUrl method as shown below:
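The core of that lookup, reading the src attribute of the first iframe with AngleSharp, might look like the sketch below. The class and method names are hypothetical, and the example parses an HTML string that is passed in rather than downloading the page; to load a live URL you would typically configure the context with Configuration.Default.WithDefaultLoader() and pass the address to OpenAsync.

```csharp
#nullable enable
using System.Threading.Tasks;
using AngleSharp;

// Hypothetical helper; the demo's GetIssuePdfUrl method may be structured differently.
public static class IframeSourceReader
{
    // Parses the given HTML and returns the src of the first iframe, or null if there is none.
    public static async Task<string?> GetFirstIframeSrcAsync(string html)
    {
        var context = BrowsingContext.New(Configuration.Default);

        // Build a document from in-memory HTML instead of making a network request.
        using var document = await context.OpenAsync(req => req.Content(html));

        return document.QuerySelector("iframe")?.GetAttribute("src");
    }
}
```

Returning null when no iframe is found lets the caller decide whether to skip the notification or send it without a direct download link.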
Update the test code in Program.cs to test only the latest change:
Run the application and confirm you can see the same URL you saw in the download page source:
Set up Twilio to Send SMS Notifications
Before implementing the actual notification mechanism, create a new interface to ensure all notification channels work the same. Create a file named INotificationService.cs and update its code like this:
In the demo project, you will implement SMS/MMS notifications using Twilio Programmable SMS.
Now that you have all the information, you need to deliver it to Twilio so that you can get SMS notifications on your mobile device. To achieve this, first add the Twilio SDK to your project by running:
You will need your Account SID and Auth Token to be able to talk to the Twilio API. You can find both of these on the welcome page in the account info section when you log in to the Twilio Console:
To store these values, you can use environment variables or a vault service, but for local development, you can use dotnet user secrets. First, you need to initialize user secrets by running:
Then, create two new user secrets called Twilio:AccountSid and Twilio:AuthToken and set the values:
Create a new file called SmsService.cs and add the following code:
The SMS message needs to be sent from your Twilio phone number (which you can find right below the Account SID and Auth Token on the Twilio Console welcome page).
The code checks whether coverUrl has a value because some Twilio Phone Numbers don’t support MMS. For example, Twilio Phone Numbers from the United Kingdom (UK) do not support MMS, so my UK number could only send plain SMS. If you are not able to send MMS messages, set coverUrl to an empty string in your worker service, like this:
Alternatively, you can create a boolean setting such as includeCoverUrl to manage this behaviour.
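One way such a flag could work is sketched below. The helper name MediaUrlSelector is hypothetical, and the sketch assumes the media URLs are eventually handed to the Twilio helper library as a list of Uri values; an empty list then means "send a plain SMS".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the demo project.
public static class MediaUrlSelector
{
    // Returns the media URLs to attach. An empty list means no media should be attached.
    public static List<Uri> Select(string coverUrl, bool includeCoverUrl)
    {
        var media = new List<Uri>();

        // Attach the cover only when the flag is on and the value is a valid absolute URL.
        if (includeCoverUrl && Uri.TryCreate(coverUrl, UriKind.Absolute, out var uri))
        {
            media.Add(uri);
        }

        return media;
    }
}
```

With this in place, switching between SMS and MMS becomes a configuration change rather than a code change.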
To store both from and to phone numbers, update appsettings.json like this:
Create a file called SmsSettings.cs with the following class:
Finally, update Program.cs to reflect these changes:
Time to test the final version (which also updates the database with the latest issue number). Run the application, and you should receive an SMS/MMS on your phone.
My UK Twilio Phone Number doesn’t support MMS. If I try to set the coverUrl to the image URL, I get the following exception:
So I set the coverUrl to an empty string as discussed previously, and the SMS I receive on my phone looks like this:
And when I tap on the link, I get this:
To test the MMS feature, I purchased a US Twilio Phone Number and sent the same message with the actual coverUrl (meaning I reverted the code to its original version: var coverUrl = await _magazineTrackerService.GetLatestIssueCoverUrl();).
When I send the message from the US phone number, I get this message:
It shows the text, the full URL to the PDF and a shortened URL of the cover image.
In my case, I prefer the original message. Depending on your phone, carrier and the messaging app you use, your experience may vary. I’d recommend playing around with splitting up the notification into multiple messages, such as sending the text in one message and the cover image in another or sending text, cover image, and URL all in different messages. Try it out and decide which format you like the most.
Schedule the Worker Service
You have a working application, but it only functions when you run it manually. To automate the process, move the code into the Worker.cs class shown below:
This way, you can remove all the previous test code and initializations and Program.cs becomes very concise:
Now run the application again (reset the database first to a value lower than the latest issue number), and you should receive an SMS/MMS; your database should be updated with the latest issue number, and your service should wait for 1 hour and then run the code again. You can, of course, change how often you would like to check for new issues by changing the delay.
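The run, wait, repeat pattern described above can be sketched without any hosting dependencies. PeriodicRunner and its parameters are hypothetical names used only for this illustration; in a worker service, the same loop would typically live in the worker's ExecuteAsync method and use the stopping token the host provides.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating the scheduling loop; not part of the demo project.
public static class PeriodicRunner
{
    // Runs checkOnce, waits for the interval, and repeats until cancellation is requested.
    public static async Task RunAsync(
        Func<CancellationToken, Task> checkOnce,
        TimeSpan interval,
        CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await checkOnce(stoppingToken);

            try
            {
                // For an hourly check, interval would be TimeSpan.FromHours(1).
                await Task.Delay(interval, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                break; // shutdown was requested while waiting
            }
        }
    }
}
```

Passing the token to Task.Delay matters: without it, a shutdown request would have to wait for the full interval to elapse before the loop could exit.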
Conclusion
My favorite projects are the ones that I develop to solve a real problem of mine. This one was a small issue, but I like the idea of automating something that I’d otherwise forget. Even though there is only one implementation of a magazine tracker service, you can adapt the existing code for your favorite publication. As long as you add a new class that implements the same interface, you can replace the registration code in Program.cs and your application will start fetching that magazine. The same goes for the notification: you can replace SMS/MMS with email using SendGrid, or with WhatsApp.
If you'd like to keep learning, I recommend taking a look at these articles:
- How to send vCards with WhatsApp using C# and .NET
- Send Emails with C#, Handlebars templating, and Dynamic Email Templates
- Render Emails Using Razor Templating
Volkan Paksoy is a software developer with more than 15 years of experience, focusing mainly on C# and AWS. He’s a home lab and self-hosting fan who loves to spend his personal time developing hobby projects with Raspberry Pi, Arduino, LEGO and everything in-between. You can follow his personal blogs on software development at devpower.co.uk and cloudinternals.net.