
9 Ways To Do Advanced News Scraping With Python And APIs


What is mediastack, and what makes it so reliable?

The mediastack API collects and aggregates news data from thousands of global sources. It converts that data into a standardized, machine-readable form and delivers it to developers as lightweight JSON.

The mediastack API can also search millions of news articles in real time and access historical news data across multiple categories, including business, health, entertainment, sports, and more. Another API that pairs well with mediastack is the weather API by weatherstack.

 

This article outlines in detail more advanced ways to use the mediastack API with Python.

For basic mediastack API use with Python, take a look at this article:

and take a look at this article if you are interested in Node.js implementation:

 

Why should you use mediastack to provide live news data?

mediastack started in 2017 as an internal sports news aggregation feed and has since grown into one of the most popular one-stop shops for news data. This is because it consistently offers thousands of relevant news and blog articles per day.

Today, mediastack is the trusted news data resource of more than 2,000 happy companies worldwide. It powers live news feeds, data analytics platforms, and trend analysis applications around the globe.

Here are some of the reasons why:

  • It gathers information from over 7,500 reputable news sources around the world.
  • It publishes thousands of relevant news and blog articles every day.
  • Finally, it powers many global live news feeds, data analytics platforms, and trend analysis applications.

In addition, tens of thousands of individuals, universities, and non-profit organizations rely on the API for a reliable source of real-time news data.

Whether you run a non-profit organization, a small business resources website, a community news aggregator website, or simply want to run an RSS feed on your blog, the mediastack API gives you the same sources of news data that the big players use daily.

What API features does mediastack provide?

The following is a list of the mediastack API’s features:

1. Live News

Access a full set of available real-time news articles using a simple API request to the mediastack API’s news endpoint.

HTTP GET Request Parameters:

Object Description
access_key [Required] Use this parameter to specify your unique API access key, which is shown when you log in to your account dashboard.
sources [Optional] Use this parameter to include or exclude one or multiple comma-separated news sources. Example: To include CNN, but exclude BBC: &sources=cnn,-bbc
categories [Optional] Use this parameter to include or exclude one or multiple comma-separated news categories. Example: To include business, but exclude sports: &categories=business,-sports.
Available categories: general (Uncategorized News), business (Business News), entertainment (Entertainment News), health (Health News), science (Science News), sports (Sports News), technology (Technology News)
countries [Optional] Use this parameter to include or exclude one or multiple comma-separated countries. Example: To include Australia, but exclude the US: &countries=au,-us.

Available countries: See all supported countries
languages [Optional] Use this parameter to include or exclude one or multiple comma-separated languages. Example: To include English, but exclude German: &languages=en,-de.

Available languages: ar (Arabic), de (German), en (English), es (Spanish), fr (French), he (Hebrew), it (Italian), nl (Dutch), no (Norwegian), pt (Portuguese), ru (Russian), se (Swedish), and zh (Chinese)
keywords [Optional] Use this parameter to search for sentences. You can also exclude words that you do not want to appear in your search results. Example: To search for “New movies 2021” but exclude “Matrix”: &keywords=new movies 2021 -matrix
date [Optional] Use this parameter to specify a date or date range. Example: &date=2020-01-01 for news on Jan 1st and &date=2020-12-24,2020-12-31 for news between Dec 24th and 31st.
sort [Optional] Use this parameter to specify sorting order. Available values: published_desc (default), published_asc, popularity
limit [Optional] Use this parameter to specify a pagination limit (number of results per page) for your API request. The default limit value is 25, the maximum allowed limit value is 100.
offset [Optional] Use this parameter to specify a pagination offset value for your API request. Example: An offset value of 100 combined with a limit value of 10 would show results 100-110. The default value is 0, starting with the first available result.
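The parameters above can be combined into a single request URL. The following is a minimal sketch, not an official client: the endpoint URL is an assumption taken from mediastack's public documentation (it does not appear in the text above), and build_news_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: endpoint as published in mediastack's documentation.
# Plans with HTTPS encryption would use https:// instead.
NEWS_ENDPOINT = "http://api.mediastack.com/v1/news"


def build_news_url(access_key, **filters):
    """Return a request URL; filters set to None are left out."""
    params = {"access_key": access_key}
    params.update(
        {name: value for name, value in filters.items() if value is not None}
    )
    # safe="," keeps the comma-separated include/exclude lists readable.
    return f"{NEWS_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_news_url(
    "YOUR_ACCESS_KEY",
    sources="cnn,-bbc",              # include CNN, exclude BBC
    categories="business,-sports",   # include business, exclude sports
    countries="au,-us",
    languages="en,-de",
    keywords=None,                   # skipped because it is None
    sort="published_desc",
    limit=10,
)
print(url)
```

Replace YOUR_ACCESS_KEY with the key from your account dashboard before opening the URL.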

2. Historical News

If you subscribe to the Standard Plan or higher, you can access historical news data by specifying a historical date using the API’s date parameter in YYYY-MM-DD format.

HTTP GET Request Parameters:

Object Description
access_key [Required] Use this parameter to specify your unique API access key, which is shown when you log in to your account dashboard.
date [Optional] Use this parameter to specify a date or date range. Example: &date=2020-01-01 for news on Jan 1st and &date=2020-12-24,2020-12-31 for news between Dec 24th and 31st.
sources [Optional] Use this parameter to include or exclude one or multiple comma-separated news sources. Example: To include CNN, but exclude BBC: &sources=cnn,-bbc
categories [Optional] Use this parameter to include or exclude one or multiple comma-separated news categories. Example: To include business, but exclude sports: &categories=business,-sports.

Available categories: general (Uncategorized News), business (Business News), entertainment (Entertainment News), health (Health News), science (Science News), sports (Sports News), technology (Technology News)
countries [Optional] Use this parameter to include or exclude one or multiple comma-separated countries. Example: To include Australia, but exclude the US: &countries=au,-us.

Available countries: See all supported countries
languages [Optional] Use this parameter to include or exclude one or multiple comma-separated languages. Example: To include English, but exclude German: &languages=en,-de.

Available languages: ar (Arabic), de (German), en (English), es (Spanish), fr (French), he (Hebrew), it (Italian), nl (Dutch), no (Norwegian), pt (Portuguese), ru (Russian), se (Swedish), and zh (Chinese)
keywords [Optional] Use this parameter to include or exclude one or multiple comma-separated search keywords. Example: To include the keyword “virus”, but exclude “corona”: &keywords=virus,-corona
sort [Optional] Use this parameter to specify sorting order. Available values: published_desc (default), published_asc, popularity
limit [Optional] Use this parameter to specify a pagination limit (number of results per page) for your API request. The default limit value is 25, the maximum allowed limit value is 100.
offset [Optional] Use this parameter to specify a pagination offset value for your API request. Example: An offset value of 100 combined with a limit value of 10 would show results 100-110. The default value is 0, starting with the first available result.
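The date parameter accepts either a single day or a comma-separated range, both in YYYY-MM-DD format. A small sketch of formatting that value from Python date objects follows; date_param is a hypothetical helper name, not part of the API.

```python
from datetime import date


def date_param(start, end=None):
    """Format a single day or a start,end range for the date parameter."""
    if end is None or end == start:
        return start.isoformat()
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"{start.isoformat()},{end.isoformat()}"


print(date_param(date(2020, 1, 1)))                        # 2020-01-01
print(date_param(date(2020, 12, 24), date(2020, 12, 31)))  # 2020-12-24,2020-12-31
```

The returned string can be passed as the value of date alongside access_key and any other filters.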

 

3. News Sources

Using the sources endpoint together with a series of search and filter parameters, you can search all news sources supported by the mediastack API. The API returns all available source metadata, including the source ID required to define sources when requesting live or historical news.

HTTP GET Request Parameters:

Object Description
access_key [Required] Use this parameter to specify your unique API access key, which is shown when you log in to your account dashboard.
search [Required] Use this parameter to specify one or multiple search keywords.
countries [Optional] Use this parameter to include or exclude one or multiple comma-separated countries. Example: To include Australia, but exclude the US: &countries=au,-us.
Available countries: See all supported countries
languages [Optional] Use this parameter to include or exclude one or multiple comma-separated languages. Example: To include English, but exclude German: &languages=en,-de.
Available languages: ar (Arabic), de (German), en (English), es (Spanish), fr (French), he (Hebrew), it (Italian), nl (Dutch), no (Norwegian), pt (Portuguese), ru (Russian), se (Swedish), and zh (Chinese)
categories [Optional] Use this parameter to include or exclude one or multiple comma-separated news categories. Example: To include business, but exclude sports: &categories=business,-sports.
Available categories: general (Uncategorized News), business (Business News), entertainment (Entertainment News), health (Health News), science (Science News), sports (Sports News), technology (Technology News)
limit [Optional] Use this parameter to specify a pagination limit (number of results per page) for your API request. The default limit value is 25, the maximum allowed limit value is 100.
offset [Optional] Use this parameter to specify a pagination offset value for your API request. Example: An offset value of 100 combined with a limit value of 10 would show results 100-110. The default value is 0, starting with the first available result.
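Once source IDs are known, they can be turned into the include/exclude value expected by the sources parameter of the news endpoint. The sketch below uses hypothetical records that only assume the id and name fields; sources_param and find_source_ids are illustrative helper names.

```python
# Hypothetical records shaped like entries returned by the sources endpoint;
# only the "id" and "name" fields are assumed here.
sample_sources = [
    {"id": "cnn", "name": "CNN"},
    {"id": "bbc", "name": "BBC"},
]


def find_source_ids(sources, name_fragment):
    """Return the IDs of sources whose name contains name_fragment."""
    fragment = name_fragment.lower()
    return [s["id"] for s in sources if fragment in s["name"].lower()]


def sources_param(include=(), exclude=()):
    """Build a value such as 'cnn,-bbc' (a leading '-' means exclude)."""
    return ",".join([*include, *(f"-{source_id}" for source_id in exclude)])


value = sources_param(
    include=find_source_ids(sample_sources, "cnn"),
    exclude=["bbc"],
)
print(value)  # cnn,-bbc
```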

 

How can I get started with the mediastack API?

First, get your API Credentials here, and set up your subscription plan:

 

You can also monitor your usage using this dashboard.

API Access Key & Authentication

Next, to check if everything is working properly, just run this URL in your favorite web browser:

 

Here is the output if you test it with Postman:

 

Here are the API Response Objects and their descriptions:

Response Object Description
pagination > limit Returns your pagination limit value.
pagination > offset Returns your pagination offset value.
pagination > count Returns the results count on the current page.
pagination > total Returns the total count of results available.
data > id Returns the source ID of the given news source. This is also the ID you need to pass to the API’s sources parameter when requesting live or historical news data.
data > name Returns the name of the given news source.
data > author Returns the name of the author of the given news article.
data > title Returns the title text of the given news article.
data > description Returns the description text of the given news article.
data > url Returns the URL leading to the given news article.
data > image Returns an image URL associated with the given news article.
data > category Returns the category associated with the given news article.
data > language Returns the language the given news article is in.
data > country Returns the country code associated with the given news article.
data > published_at Returns the exact timestamp the given news article was published.
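To illustrate how these response objects fit together, here is a sketch that reads a response already parsed from JSON. The sample values are hypothetical and only follow the field names in the table above; has_more_pages and next_offset are illustrative helpers.

```python
# Hypothetical response; real values come from the API.
sample_response = {
    "pagination": {"limit": 2, "offset": 0, "count": 2, "total": 250},
    "data": [
        {
            "author": "Jane Doe",
            "title": "Example headline one",
            "description": "Short summary of the first article.",
            "url": "https://example.com/articles/1",
            "image": None,
            "category": "business",
            "language": "en",
            "country": "us",
            "published_at": "2021-03-01T08:15:00+00:00",
        },
        {
            "author": None,
            "title": "Example headline two",
            "description": "Short summary of the second article.",
            "url": "https://example.com/articles/2",
            "image": None,
            "category": "technology",
            "language": "en",
            "country": "gb",
            "published_at": "2021-03-01T08:10:00+00:00",
        },
    ],
}


def has_more_pages(pagination):
    """True if results remain beyond the current page."""
    return pagination["offset"] + pagination["count"] < pagination["total"]


def next_offset(pagination):
    """Offset to request in order to get the following page."""
    return pagination["offset"] + pagination["count"]


page = sample_response["pagination"]
for article in sample_response["data"]:
    print(article["published_at"], article["category"], article["title"])
print("more pages:", has_more_pages(page), "next offset:", next_offset(page))
```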

How can I collect news using Python Requests?

Requests is a simple yet elegant HTTP library that lets us execute standard HTTP requests. We will use it to interact with the mediastack API.

Requests is one of the most downloaded Python packages today, pulling in around 14M downloads/week. According to GitHub, 500,000+ repositories depend on Requests, so it is a well-tested choice for this job.

Use the following code to make a GET request with Python’s Requests that fetches the latest news and limits the result to 10 news items:
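A minimal sketch of such a request is shown here. It is not the article's original listing: the endpoint URL is an assumption based on mediastack's public documentation, and latest_news_params and fetch_latest_news are hypothetical helper names. Requests must be installed first (pip install requests).

```python
# Assumption: endpoint as published in mediastack's documentation.
NEWS_ENDPOINT = "http://api.mediastack.com/v1/news"


def latest_news_params(access_key, limit=10):
    """Query parameters for the latest news, capped at the API maximum of 100."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return {"access_key": access_key, "limit": limit}


def fetch_latest_news(access_key, limit=10, timeout=10):
    """Send the GET request and return the decoded JSON response."""
    # Imported here so the helper above works without the dependency.
    import requests  # third-party: pip install requests

    response = requests.get(
        NEWS_ENDPOINT,
        params=latest_news_params(access_key, limit),
        timeout=timeout,
    )
    response.raise_for_status()  # raise on HTTP 4xx/5xx
    return response.json()


# Example call (needs a valid key and network access):
# news = fetch_latest_news("YOUR_ACCESS_KEY", limit=10)
# for article in news["data"]:
#     print(article["title"])
```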

Here is an excerpt of the API responses on Python IDE:

 

How can I collect news with specific categories using Python and mediastack?

To collect news within a specific category, add the ‘categories’ GET request parameter to the URL:
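As a sketch of this step, the helper below validates category names against the list of available categories given earlier and builds the parameter value. categories_param is a hypothetical helper name; the resulting dictionary can be passed as params to requests.get.

```python
AVAILABLE_CATEGORIES = {
    "general", "business", "entertainment", "health",
    "science", "sports", "technology",
}


def categories_param(*categories):
    """Join category names; a leading '-' marks a category to exclude."""
    for category in categories:
        name = category[1:] if category.startswith("-") else category
        if name not in AVAILABLE_CATEGORIES:
            raise ValueError(f"unknown category: {name!r}")
    return ",".join(categories)


params = {
    "access_key": "YOUR_ACCESS_KEY",
    "categories": categories_param("business", "technology"),
    "limit": 10,
}
print(params["categories"])  # business,technology
```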

Here is an excerpt of the Business and Technology news on Python IDE:

 

How can I collect news in a specific language using only Python and mediastack?

To collect news in a specific language (e.g. English), add the ‘languages’ GET request parameter to the URL:
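The sketch below shows the parameter and a quick sanity check on the language field of the returned items. The sample articles are hypothetical stand-ins for the data list of a real response, and only_language is an illustrative helper.

```python
# Parameters for an English-only request (pass as params to requests.get).
params = {"access_key": "YOUR_ACCESS_KEY", "languages": "en", "limit": 10}

# Hypothetical articles standing in for the "data" list of a response.
sample_articles = [
    {"title": "Markets rally after rate decision", "language": "en"},
    {"title": "Börsen legen nach Zinsentscheid zu", "language": "de"},
    {"title": "New chip plant announced", "language": "en"},
]


def only_language(articles, code):
    """Keep the articles whose language field equals code."""
    return [a for a in articles if a.get("language") == code]


english = only_language(sample_articles, "en")
print(len(english), "of", len(sample_articles), "sample articles are in English")
```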

Here is an excerpt of the English news on Python IDE:

 

How can I collect news from specific sources with Python and mediastack?

To collect news from a specific source only (e.g. BBC), add the ‘sources’ GET request parameter to the URL:

Here is an excerpt of the news from BBC on Python IDE:

 

How can I collect thousands or more news items with Python and mediastack?

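Set the limit parameter in the URL. Here we have specified 100, which is the maximum allowed per request. To collect more than that, repeat the request with increasing offset values.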
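Paging with limit and offset can be sketched as follows. page_windows and fetch_many are hypothetical helpers, and fetch_page stands for any function you supply that sends one request with the given parameters and returns the decoded JSON (for example a thin wrapper around requests.get). Each page counts as a separate API call against your plan.

```python
def page_windows(total_wanted, page_size=100):
    """Yield (offset, limit) pairs that together cover total_wanted results."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    for offset in range(0, total_wanted, page_size):
        yield offset, min(page_size, total_wanted - offset)


def fetch_many(access_key, total_wanted, fetch_page):
    """Collect up to total_wanted items via fetch_page(params) -> response dict."""
    items = []
    for offset, limit in page_windows(total_wanted):
        response = fetch_page(
            {"access_key": access_key, "limit": limit, "offset": offset}
        )
        page = response.get("data", [])
        items.extend(page)
        if len(page) < limit:  # fewer items than requested: no more results
            break
    return items


print(list(page_windows(250)))  # [(0, 100), (100, 100), (200, 50)]
```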

Here is an excerpt of 100 real-time news items:

 

How to collect specific news items by index number?

After collecting hundreds or thousands of news items, you can access individual items by their index number using Python’s list indexing. Remember that indexes start at 0, so the first item is at index 0 and the 11th at index 10. Here are some examples:

Request the first news item:

or the 11th news item:

or even the 100th news item:
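The three lookups above can be sketched like this. The news dictionary is a hypothetical stand-in for a decoded API response holding 100 items under its data key.

```python
# Stand-in for response.json(): 100 hypothetical items instead of live data.
news = {"data": [{"title": f"Sample article {n}"} for n in range(1, 101)]}

first = news["data"][0]       # 1st item: index 0
eleventh = news["data"][10]   # 11th item: index 10
hundredth = news["data"][99]  # 100th item: index 99

print(first["title"])      # Sample article 1
print(eleventh["title"])   # Sample article 11
print(hundredth["title"])  # Sample article 100
```

Requesting an index beyond the number of returned items raises IndexError, so check len(news["data"]) first when the result count is uncertain.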

 

How to get specific individual data using API response objects?

Run the following code if you want to get specific individual data for a specific news item (e.g. the 11th news item):
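A minimal sketch of reading individual fields is shown here, using hypothetical sample data in place of a live response. The field names follow the response objects listed earlier; make_article and pick_fields are illustrative helpers.

```python
def make_article(n):
    """Build a hypothetical article with the documented response fields."""
    return {
        "author": f"Author {n}",
        "title": f"Sample article {n}",
        "description": f"Summary of sample article {n}.",
        "url": f"https://example.com/articles/{n}",
        "category": "technology",
        "language": "en",
        "country": "us",
        "published_at": "2021-03-01T08:15:00+00:00",
    }


def pick_fields(article, fields):
    """Return only the requested fields; missing ones become None."""
    return {field: article.get(field) for field in fields}


news = {"data": [make_article(n) for n in range(1, 12)]}
eleventh = news["data"][10]  # the 11th news item

print(eleventh["title"])
print(eleventh["url"])
print(pick_fields(eleventh, ("author", "category", "published_at")))
```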

 

How to scrape hundreds of news items and save them into a CSV file?

If you want to save all the scraped news into a CSV file for archiving or further research purposes, use the following code:
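One way to do this with Python's standard csv module is sketched below. It is not the article's original listing: the column selection in FIELDS and the helper names are assumptions, and the sample rows are hypothetical. In practice you would pass the data list of a real response to save_articles_csv.

```python
import csv
import io

# Assumed column selection; adjust to the fields you need.
FIELDS = ["title", "author", "category", "language", "country",
          "published_at", "url"]


def write_articles_csv(articles, file, fields=FIELDS):
    """Write a header plus one row per article; return the row count."""
    writer = csv.DictWriter(file, fieldnames=fields)
    writer.writeheader()
    count = 0
    for article in articles:
        writer.writerow({field: article.get(field, "") for field in fields})
        count += 1
    return count


def save_articles_csv(articles, path, fields=FIELDS):
    """Save articles to path as UTF-8 CSV."""
    # newline="" lets the csv module control line endings.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        return write_articles_csv(articles, handle, fields)


# Hypothetical sample rows, previewed in memory instead of on disk.
sample = [
    {"title": "Chips, clouds and capital", "author": "Jane Doe",
     "category": "technology", "language": "en", "country": "us",
     "published_at": "2021-03-01T08:15:00+00:00",
     "url": "https://example.com/articles/1"},
    {"title": "Markets open higher", "author": None,
     "category": "business", "language": "en", "country": "gb",
     "published_at": "2021-03-01T08:10:00+00:00",
     "url": "https://example.com/articles/2"},
]
buffer = io.StringIO()
rows_written = write_articles_csv(sample, buffer)
print(buffer.getvalue())
```

Titles containing commas are quoted automatically by the csv module, and a missing author is written as an empty cell.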

The above code will collect 100 business and technology news items, and store them into a CSV file.

Here is the output inside the CSV file:

If you want to collect thousands or even millions of news items, or populate your apps with them, we strongly recommend you upgrade your plan here.

How can I develop web-based news scraper apps with mediastack and Jupyter Notebook?

You can run all the above scripts inside Jupyter Notebook. Here is an excerpt of the results in your default web browser:

 

Check out the full source code here!

Are you ready to build your own advanced news scraper apps?

As you can see, the mediastack REST API provides rich news data and API features that you can connect to any platform or programming language you work with. In this article, we showed you some advanced tricks for using mediastack with Python.

Take advantage of the free tier on mediastack and upgrade your subscription plan if you need more powerful features (HTTPS Encryption, Historical Data, Commercial Use, Technical Support, and Custom Solutions). You can also contact us for a custom solution. We can’t wait to see what you build with our REST API!

Head over and sign up for free to start integrating live news data into your apps today!
