How To Automate News Scraping With Python And APIs

What is mediastack, and What Makes it a Reliable Resource?

mediastack collects and aggregates news data from thousands of global sources. It then converts that data into a standardized, machine-readable form and delivers it to developers as lightweight JSON.

In addition, the mediastack API lets you search millions of news articles in real time and access historical news data across multiple categories. These include business, health, entertainment, and sports, as well as many more.

 

apilayer, an Austrian technology company that develops a variety of reliable application programming interfaces (APIs), built and maintains mediastack. In addition to mediastack, apilayer focuses on providing affordable APIs for developers and startups. If you want to see more, you can browse all their available products here. Most importantly, all apilayer APIs are highly stable and compatible with each other. For example, mediastack pairs well with the apilayer geolocation API.

For more on apilayer and its commitment to quality, read apilayer becomes a founding member of the API3 Alliance here:

https://blog.apilayer.com/apilayer-becomes-a-founding-member-of-the-api3-alliance/

If you want to learn how to web scrape with apilayer, then please refer to this article:

https://blog.apilayer.com/easily-spider-websites-using-node-js-and-powerful-rest-apis/

Next, if you want to scrape or monitor Search Engine results, check out these articles:

https://blog.apilayer.com/easily-monitor-google-search-results-using-node-js/
https://blog.apilayer.com/easily-build-your-own-search-engine-using-a-javascript-api/

This article covers the API endpoints and their available options in detail, along with integration guides for a scripting language (Python), an interactive web-based notebook (Jupyter Notebook), and desktop applications (Delphi).

Finally, take a look at this article if you are interested in Node.js implementation:

https://blog.apilayer.com/how-to-create-a-news-web-app-in-node-js-with-a-media-api/

Why Should You Use mediastack to Provide the Latest News Data?

mediastack began in 2017 as an internal sports news aggregation feed. Over the years, however, mediastack has grown into one of the most popular one-stop shops for news data. This is because it consistently offers thousands of relevant news and blog articles per day.

Today, mediastack is the trusted news data resource for more than 2,000 happy companies worldwide. It powers their live news feeds, data analytics platforms, and trend analysis applications around the globe.

Here are just some of the reasons why mediastack is the trusted news data API resource for more than 2,000 major companies worldwide:

  • mediastack gathers information from over 7,500 reputable news sources around the world.
  • It publishes thousands of relevant news and blog articles every day.
  • mediastack powers global live news feeds, data analytics platforms, and trend analysis applications.

In addition to that, tens of thousands of individuals, universities, and non-profit organizations rely on the API for reliable, real-time news data.

Whether you run a non-profit organization, a small business resources website, a community news aggregator website, or simply want to run an RSS feed on your blog, the mediastack API gives you the same sources of news data that the big players use on a daily basis.

What Features Does the mediastack API Offer?

The following is a list of the mediastack API’s features:

1. Live News

You can access the full set of available real-time news articles using a simple API request to the mediastack API’s news endpoint.

HTTP GET Request Parameters:

Object Description
access_key [Required] Specify your unique API access key, which is shown when you log in to your account dashboard.
sources [Optional] Include or exclude one or multiple comma-separated news sources. For example: To include CNN, but exclude BBC: &sources=cnn,-bbc
categories [Optional] Include or exclude one or multiple comma-separated news categories. For example: To include business, but exclude sports: &categories=business,-sports.
Available categories: general (Uncategorized News), business (Business News), entertainment (Entertainment News), health (Health News), science (Science News), sports (Sports News), technology (Technology News)
countries [Optional] Include or exclude one or multiple comma-separated countries. For example: To include Australia, but exclude the US: &countries=au,-us.
Available countries: See all supported countries
languages [Optional] Include or exclude one or multiple comma-separated languages. Example: To include English, but exclude German: &languages=en,-de.
Available languages: ar (Arabic), de (German), en (English), es (Spanish), fr (French), he (Hebrew), it (Italian), nl (Dutch), no (Norwegian), pt (Portuguese), ru (Russian), se (Swedish), and zh (Chinese)
keywords [Optional] Search for words or phrases. You can also exclude words that you do not want to appear in your search results. Example: To search for “New movies 2021” but exclude “Matrix”: &keywords=new movies 2021 -matrix
date [Optional] Specify a date or date range. Example: &date=2020-01-01 for news on Jan 1st and &date=2020-12-24,2020-12-31 for news between Dec 24th and 31st.
sort [Optional] Specify sorting order. Available values: published_desc (default), published_asc, popularity
limit [Optional] Specify a pagination limit (number of results per page) for your API request. The default limit value is 25, the maximum allowed limit value is 100.
offset [Optional] Specify a pagination offset value for your API request. Example: An offset value of 100 combined with a limit value of 10 would skip the first 100 results and return the next 10. The default value is 0, starting with the first available result.
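
As a rough illustration of how these parameters combine, here is a small Python sketch that assembles a live news request URL. The endpoint address `http://api.mediastack.com/v1/news`, the helper names `signed_list` and `build_news_url`, and the placeholder key are assumptions made for this example, so check them against your own account dashboard. Plain HTTP is used because HTTPS is listed among the paid-tier features.

```python
from urllib.parse import urlencode

# Assumed endpoint address; confirm it in your mediastack dashboard.
NEWS_ENDPOINT = "http://api.mediastack.com/v1/news"


def signed_list(include=(), exclude=()):
    """Join values in the include/exclude form used above, e.g. 'cnn,-bbc'."""
    return ",".join(list(include) + ["-" + item for item in exclude])


def build_news_url(access_key, **filters):
    """Build a GET URL, skipping filters that were left as None or ''."""
    params = {"access_key": access_key}
    params.update({k: v for k, v in filters.items() if v not in (None, "")})
    # safe="," keeps the list separator readable instead of encoding it as %2C.
    return NEWS_ENDPOINT + "?" + urlencode(params, safe=",")


url = build_news_url(
    "YOUR_ACCESS_KEY",  # placeholder, not a working key
    sources=signed_list(include=["cnn"], exclude=["bbc"]),
    categories=signed_list(include=["business"], exclude=["sports"]),
    languages="en",
    limit=10,
)
print(url)
```

Keeping the minus-sign logic in one helper means the same function can serve the sources, categories, countries, and languages parameters.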

2. Historical News

If you subscribe to the Standard Plan or higher, you can access historical news data by specifying a historical date using the API’s date parameter in YYYY-MM-DD format.

HTTP GET Request Parameters:

Object Description
access_key [Required] Specify your unique API access key, which is shown when you log in to your account dashboard.
date [Optional] Specify a date or date range. Example: &date=2020-01-01 for news on Jan 1st and &date=2020-12-24,2020-12-31 for news between Dec 24th and 31st.
sources [Optional] Include or exclude one or multiple comma-separated news sources. Example: To include CNN, but exclude BBC: &sources=cnn,-bbc
categories [Optional] Include or exclude one or multiple comma-separated news categories. Example: To include business, but exclude sports: &categories=business,-sports.
Available categories: general (Uncategorized News), business (Business News), entertainment (Entertainment News), health (Health News), science (Science News), sports (Sports News), technology (Technology News)
countries [Optional] Include or exclude one or multiple comma-separated countries. Example: To include Australia, but exclude the US: &countries=au,-us.
Available countries: See all supported countries
languages [Optional] Include or exclude one or multiple comma-separated languages. Example: To include English, but exclude German: &languages=en,-de.
Available languages: ar (Arabic), de (German), en (English), es (Spanish), fr (French), he (Hebrew), it (Italian), nl (Dutch), no (Norwegian), pt (Portuguese), ru (Russian), se (Swedish), and zh (Chinese)
keywords [Optional] Include or exclude one or multiple comma-separated search keywords. Example: To include the keyword “virus”, but exclude “corona”: &keywords=virus,-corona
sort [Optional] Specify sorting order. Available values: published_desc (default), published_asc, popularity
limit [Optional] Specify a pagination limit (number of results per page) for your API request. The default limit value is 25, the maximum allowed limit value is 100.
offset [Optional] Specify a pagination offset value for your API request. Example: An offset value of 100 combined with a limit value of 10 would skip the first 100 results and return the next 10. The default value is 0, starting with the first available result.
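
The date parameter accepts either a single day or two days separated by a comma. A minimal sketch of a formatter for both cases follows; the helper name `date_filter` is hypothetical and only illustrates the YYYY-MM-DD format described above.

```python
from datetime import date


def date_filter(start, end=None):
    """Format one day, or a start,end range, for the `date` parameter."""
    if end is None:
        return start.isoformat()
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return start.isoformat() + "," + end.isoformat()


single_day = date_filter(date(2020, 1, 1))
holiday_week = date_filter(date(2020, 12, 24), date(2020, 12, 31))
print(single_day)    # 2020-01-01
print(holiday_week)  # 2020-12-24,2020-12-31
```

Working with `date` objects rather than hand-typed strings makes it harder to send a malformed date by accident.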

3. News Sources

Using the sources endpoint together with a series of search and filter parameters, you can search all news sources the mediastack API supports. The API also returns all available source metadata. This includes the source IDs used to specify sources when requesting live or historical news.

HTTP GET Request Parameters:

Object Description
access_key [Required] Specify your unique API access key, which is shown when you log in to your account dashboard.
search [Required] Specify one or multiple search keywords.
countries [Optional] Include or exclude one or multiple comma-separated countries. Example: To include Australia, but exclude the US: &countries=au,-us.
Available countries: See all supported countries
languages [Optional] Include or exclude one or multiple comma-separated languages. Example: To include English, but exclude German: &languages=en,-de.
Available languages: ar (Arabic), de (German), en (English), es (Spanish), fr (French), he (Hebrew), it (Italian), nl (Dutch), no (Norwegian), pt (Portuguese), ru (Russian), se (Swedish), and zh (Chinese)
categories [Optional] Include or exclude one or multiple comma-separated news categories. Example: To include business, but exclude sports: &categories=business,-sports.
Available categories: general (Uncategorized News), business (Business News), entertainment (Entertainment News), health (Health News), science (Science News), sports (Sports News), technology (Technology News)
limit [Optional] Specify a pagination limit (number of results per page) for your API request. The default limit value is 25, the maximum allowed limit value is 100.
offset [Optional] Specify a pagination offset value for your API request. Example: An offset value of 100 combined with a limit value of 10 would skip the first 100 results and return the next 10. The default value is 0, starting with the first available result.
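
To show how a source lookup might feed a later news request, the sketch below builds a sources URL and pulls the IDs out of a response. The endpoint address `http://api.mediastack.com/v1/sources` and the helper names are assumptions, and the sample payload is hand-written for illustration rather than real API output.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint address; confirm it in your mediastack dashboard.
SOURCES_ENDPOINT = "http://api.mediastack.com/v1/sources"


def build_sources_url(access_key, search, **filters):
    """Build a GET URL for a source search; `search` is required."""
    params = {"access_key": access_key, "search": search, **filters}
    return SOURCES_ENDPOINT + "?" + urlencode(params, safe=",")


def source_ids(payload):
    """Collect the `id` of every source listed under `data`."""
    return [source["id"] for source in payload.get("data", [])]


# Hand-written sample with the `id` and `name` fields; not real API output.
sample = json.loads('{"data": [{"id": "cnn", "name": "CNN"}, {"id": "bbc", "name": "BBC"}]}')

print(build_sources_url("YOUR_ACCESS_KEY", "news", countries="us", limit=5))
print(",".join(source_ids(sample)))  # ready to pass as the sources parameter
```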

 

How can I Get Started with the mediastack API?

First, get your API Credentials here, and set up your subscription plan:

You can also monitor your usage via this dashboard.

API Access Key & Authentication

Next, to check if everything is working properly, simply run this URL in your favorite web browser:

You will get this API response inside your browser:

Here is the output you will get if you test it with Postman:
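
The same check can also be scripted. The following is a minimal sketch, assuming the news endpoint lives at `http://api.mediastack.com/v1/news` and that a successful response carries a `data` list; the function names `smoke_test_url` and `key_works` are made up for this example.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint address; confirm it in your mediastack dashboard.
NEWS_ENDPOINT = "http://api.mediastack.com/v1/news"


def smoke_test_url(access_key):
    """Smallest useful request: one article with default filters."""
    return NEWS_ENDPOINT + "?" + urlencode({"access_key": access_key, "limit": 1})


def key_works(access_key, fetch=urlopen):
    """Return True when the response body contains a `data` list."""
    try:
        with fetch(smoke_test_url(access_key)) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except HTTPError:
        return False  # the server answered with an error status
    return isinstance(payload.get("data"), list)


print(smoke_test_url("YOUR_ACCESS_KEY"))
# Call key_works("YOUR_ACCESS_KEY") once a real key is in place.
```

The `fetch` argument exists only so the function can be exercised without network access by passing in a stand-in for `urlopen`.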

Next, here are the API Response Objects:

API Response Objects

Response Object Description
pagination > limit Returns your pagination limit value.
pagination > offset Returns your pagination offset value.
pagination > count Returns the results count on the current page.
pagination > total Returns the total count of results available.
data > id Returns the source ID of the given news source. This is also the ID you need to pass to the API’s sources parameter when requesting live or historical news data.
data > name Returns the name of the given news source.
data > author Returns the name of the author of the given news article.
data > title Returns the title text of the given news article.
data > description Returns the description text of the given news article.
data > url Returns the URL leading to the given news article.
data > image Returns an image URL associated with the given news article.
data > category Returns the category associated with the given news article.
data > language Returns the language the given news article is in.
data > country Returns the country code associated with the given news article.
data > published_at Returns the exact timestamp the given news article was published.
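
As an illustration of how the pagination and data objects might be used together, the sketch below works out the next offset to request and lists headlines. The sample payload is hand-written to mirror the fields above and is not real API output; the helper names are hypothetical.

```python
def next_offset(pagination):
    """Return the offset of the next page, or None when no results remain."""
    seen = pagination["offset"] + pagination["count"]
    return seen if seen < pagination["total"] else None


def headlines(payload):
    """Pair each article's publication timestamp with its title."""
    return [(a["published_at"], a["title"]) for a in payload["data"]]


# Hand-written sample shaped like the response objects; not real API output.
sample = {
    "pagination": {"limit": 2, "offset": 0, "count": 2, "total": 5},
    "data": [
        {"title": "First headline", "url": "https://example.com/1",
         "published_at": "2020-12-24T08:00:00+00:00"},
        {"title": "Second headline", "url": "https://example.com/2",
         "published_at": "2020-12-24T07:30:00+00:00"},
    ],
}

for published_at, title in headlines(sample):
    print(published_at, title)
print(next_offset(sample["pagination"]))  # 2
```

Requesting again with the returned value as the offset parameter, until the function returns None, walks through every page of results.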

How can I Collect News Automatically with Python and mediastack?

Use the following code to make an API request using Python that gets the latest news, excludes the general and sports categories, and limits the result to 10 articles:
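
A minimal sketch of such a request, using only the Python standard library, might look like this. The endpoint address and the `MEDIASTACK_ACCESS_KEY` environment variable name are assumptions made for the example; the script only contacts the API when that variable is set.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint address; confirm it in your mediastack dashboard.
NEWS_ENDPOINT = "http://api.mediastack.com/v1/news"


def latest_news_url(access_key):
    """Latest news, excluding general and sports, limited to 10 articles."""
    params = {
        "access_key": access_key,
        "categories": "-general,-sports",  # a leading minus excludes a category
        "sort": "published_desc",          # newest first
        "limit": 10,
    }
    return NEWS_ENDPOINT + "?" + urlencode(params, safe=",")


def fetch_latest_news(access_key, timeout=10):
    """Send the GET request and decode the JSON body."""
    with urlopen(latest_news_url(access_key), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical variable name; export it in your shell before running.
access_key = os.environ.get("MEDIASTACK_ACCESS_KEY")
if access_key:
    news = fetch_latest_news(access_key)
    for article in news.get("data", []):
        print(article["published_at"], article["title"])
else:
    print("Set MEDIASTACK_ACCESS_KEY to send the request.")
```

Reading the key from the environment keeps it out of the script, which helps if you later share the file or notebook.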

Here are the API responses inside the PyScripter IDE:

 

How can I Develop a Web-Based News Scraper with mediastack and Jupyter Notebook?

You can easily build an interactive web-based news collector/scraper by running the Python scripts from the previous section inside Jupyter Notebook. If you have never used Jupyter Notebook before, visit the installation guide on the official website of Project Jupyter.
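
Inside a notebook, raw JSON is hard to scan, so one option is to turn the articles into a small HTML table. The sketch below is only one way to do that; the function name `articles_to_html` is hypothetical and the sample articles are hand-written rather than real API output.

```python
from html import escape


def articles_to_html(articles):
    """Render articles as an HTML table with linked titles."""
    rows = "".join(
        '<tr><td>{}</td><td><a href="{}">{}</a></td></tr>'.format(
            escape(a["published_at"]), escape(a["url"]), escape(a["title"])
        )
        for a in articles
    )
    return "<table><tr><th>Published</th><th>Title</th></tr>" + rows + "</table>"


# Hand-written sample using fields from the response objects; not real output.
sample_articles = [
    {"title": "First headline", "url": "https://example.com/1",
     "published_at": "2020-12-24T08:00:00+00:00"},
    {"title": "Markets & more", "url": "https://example.com/2",
     "published_at": "2020-12-24T07:30:00+00:00"},
]

table_html = articles_to_html(sample_articles)
print(table_html)
# In a notebook cell, display it with:
#   from IPython.display import HTML
#   HTML(table_html)
```

Escaping every field before it goes into the markup keeps stray characters in titles or URLs from breaking the table.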

Here is the result:

 

Bonus: API Request using RAD Studio REST Debugger & Create Desktop Based News Apps

How can I Make an API Request to mediastack using Delphi REST Debugger?

Download Delphi REST Debugger here:

Choose the GET method, and send the request to the following URL:

Next, after getting the API responses you need, you can create Desktop apps based on it using Delphi. Please refer to these articles to get started:

https://blogs.embarcadero.com/using-apilayer-rest-apis-from-delphi/
https://blogs.embarcadero.com/how-to-build-a-powerful-app-for-live-news-and-more/
https://blogs.embarcadero.com/how-to-create-a-cross-platform-news-app-with-delphi/

Are You Ready to Build Your Own News Scraper Apps?

As you can see, the mediastack REST API can be called from any platform or programming language you work with. In this article, we showed you a basic demo of how to access all available live news from Python scripts, a web-based notebook, and desktop applications.

Take advantage of the free tier on mediastack. We strongly recommend upgrading your subscription plan, especially if you need more powerful features. Our paid tiers support HTTPS Encryption, Historical Data, Commercial Use, Technical Support, and Custom Solutions. You can also contact us for a custom solution. We can’t wait to see what you build with our REST API!

Head over and sign up for free to start integrating resourceful live news data into your apps today!