How To Easily Scrape Any Web Page Using Proxies And NodeJS

Do you want to turn web pages into actionable data? ScrapeStack is powered by one of the most powerful web scraping engines on the market — offering the #1 solution for all your scraping requirements in one place.

Tap into our extensive pool of 35+ million datacenter and residential IP addresses across dozens of global ISPs, supporting real devices, smart retries and IP rotation. Choose from 100+ supported global locations to send your web scraping API requests or simply use random geo-targets — supporting a series of major cities worldwide. Scrape the web at scale at an unparalleled speed and enjoy advanced features like concurrent API requests, CAPTCHA solving, browser support and JS rendering.

In this article we will look at how you can use ScrapeStack and NodeJS to build a web spider and extract data. Let’s get started!

What is ScrapeStack & why do I need this for web scraping?

The ScrapeStack API helps you avoid problems with geolocation, IP blocks, CAPTCHAs, and more. With the REST API, you can scrape web pages at scale without having to configure request methods in your program.

How can I use the ScrapeStack API?

To scrape a web page with this API, call the API's endpoint and append the URL you would like to scrape along with your API access key.

Furthermore, there are other optional parameters you can use. For instance, the premium_proxy feature is available in the premium subscription. This option fetches data through a pool of 35+ million IP addresses worldwide. By default, it does not reuse the same IP address to get data.
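To make the request format concrete, here is a minimal sketch that assembles such a request URL with the URL class built into Node.js. The helper name buildScrapeUrl and the placeholder key are illustrative, and passing 1 as the value of premium_proxy is an assumption; check the ScrapeStack documentation for the values your plan accepts.

```javascript
// Sketch: assemble a ScrapeStack request URL from an access key, a target URL,
// and optional parameters. buildScrapeUrl is a hypothetical helper name.
function buildScrapeUrl(accessKey, targetUrl, options = {}) {
  const endpoint = new URL('http://api.scrapestack.com/scrape');
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', targetUrl); // percent-encoded automatically
  for (const [name, value] of Object.entries(options)) {
    endpoint.searchParams.set(name, String(value));
  }
  return endpoint.toString();
}

const requestUrl = buildScrapeUrl(
  'YOUR_ACCESS_KEY', // placeholder, replace with your own key
  'https://www.muminjon.com/projects-grid-cards.html',
  { premium_proxy: 1 } // assumed value for enabling the option
);
console.log(requestUrl);
```

Building the query string this way keeps the target URL correctly encoded, which matters as soon as the page you scrape has its own query parameters.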

Is it possible to scrape forms or API endpoints?

Yes, and this feature is available on all subscription types. You can scrape API endpoints or forms directly by sending API requests via the HTTP POST or PUT methods.
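As a rough sketch of what a POST request through the API could look like, the snippet below prepares the request and sends it with the fetch API built into Node.js 18+. It assumes the access key and target URL stay in the query string while the request body is forwarded to the target; the example.com endpoint, the query field, and both helper names are hypothetical.

```javascript
// Sketch: prepare a POST request for ScrapeStack to forward to a form or API endpoint.
function buildPostRequest(accessKey, targetUrl, formFields) {
  const endpoint = new URL('http://api.scrapestack.com/scrape');
  endpoint.searchParams.set('access_key', accessKey);
  endpoint.searchParams.set('url', targetUrl);
  return {
    url: endpoint.toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams(formFields).toString(), // form-encoded body
    },
  };
}

// Sends the prepared request; not called here, so running this makes no network call.
async function sendPost(prepared) {
  const response = await fetch(prepared.url, prepared.init);
  return response.text();
}

const postRequest = buildPostRequest(
  'YOUR_ACCESS_KEY',            // placeholder key
  'https://example.com/search', // hypothetical form endpoint
  { query: 'node' }             // hypothetical form field
);
console.log(postRequest.init.method, postRequest.init.body);
```

To actually submit the form, call sendPost(postRequest) with a real key and handle the returned promise.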

Before we go any further, be aware of the legal limitations and restrictions of web scraping, as it can raise copyright concerns. Not all websites allow data scraping. Be sure to check the terms and conditions of the website or organization before taking any action on their data!

How can I start working with ScrapeStack API?

Head over to the ScrapeStack website and set up your subscription plan. You can select a Free plan to test it out.

How can I scrape web pages with ScrapeStack API with my Node.js App?

Now the engaging part starts. If you head over to the official documentation page, you will find a sample app written in Node.js. We will add a few functions to it to scrape a real web page.

After signing up, you get your API access key. I will now show every step you need to take to set up your project. In this sample, I will fetch every project name and project summary from my projects web page and then save them to the result.json file.

How to set up Node.js with ScrapeStack API?

Open your command line, make a project folder, and create a starter project file named app.js in that folder.

In my case, I am using VS Code, because it is good with JavaScript!

Now we need to initialize the Node.js app with npm commands and download the required Node packages.

Open your terminal and run npm init.

It prompts you for some project details, but you can accept the defaults by pressing Enter. Now we have a package.json config file.

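Now we need to install the two npm modules the scraper uses, axios and cheerio, by running npm install axios cheerio. The Node package manager adds them to our package.json file, which should now list both packages under dependencies. The fs module is built into Node.js and needs no installation.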

Everything is ready. Just paste this code into your app.js main file.

const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');

const params = {
  access_key: 'YOUR_ACCESS_KEY', // replace with your own API access key
  url: 'https://www.muminjon.com/projects-grid-cards.html'
};

axios.get('http://api.scrapestack.com/scrape', { params })
  .then(response => {
    const data = {};
    const websiteContent = response.data;
    const $ = cheerio.load(websiteContent); // load the fetched HTML into cheerio

    // iterate over the cards and read each project name from its h6 heading
    $('.card-body h6').each(function (i) {
      console.log('__ __________________ __');
      console.log($(this).text());
      data[i + '-Project Name'] = $(this).text();
    });

    // now read each project summary from the paragraph element
    $('.card-body p').each(function (i) {
      console.log('__ __________________ __');
      console.log($(this).text());
      data[i + '-Project Summary'] = $(this).text();
    });

    // write the data to result.json
    fs.writeFile('result.json', JSON.stringify(data), function (error) {
      if (error) throw error;
      console.log('Result File Created Successfully!');
    });
  })
  .catch(error => {
    console.log(error);
  });

Here we are scraping the project title and summary from my web page.

How do we know what to select from the response we are getting?

Open your web browser and navigate to: https://www.muminjon.com/projects-grid-cards.html

Press Ctrl+Shift+I or open Inspect from the context menu.

As you can see, we are iterating over the card-body class which all project cards have. We are selecting text data from the h6 heading element and paragraph element.

Let’s scrape project names & summaries!

Open your terminal and run node app.js to start the Node.js app.

The extracted results are now saved in the result.json file.
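The flat keys that app.js writes (such as 0-Project Name and 0-Project Summary) can be awkward to consume later. As one possible follow-up, the sketch below groups them into one record per project. The function name groupProjects and the sample values are made up for illustration, and the pairing assumes each card contains exactly one h6 and one paragraph so the indexes line up.

```javascript
// Sketch: turn the flat result object into an array with one record per project.
function groupProjects(flat) {
  const projects = [];
  for (const [key, text] of Object.entries(flat)) {
    // Keys are expected to look like "0-Project Name" or "0-Project Summary".
    const match = /^(\d+)-Project (Name|Summary)$/.exec(key);
    if (!match) continue; // ignore keys that do not follow the pattern
    const index = Number(match[1]);
    const field = match[2] === 'Name' ? 'name' : 'summary';
    projects[index] = { ...projects[index], [field]: text.trim() };
  }
  return projects.filter(Boolean); // drop gaps if an index is missing
}

// Made-up sample data in the same shape as result.json.
const sampleResult = {
  '0-Project Name': 'Sample Project A',
  '1-Project Name': 'Sample Project B',
  '0-Project Summary': ' First placeholder summary ',
  '1-Project Summary': 'Second placeholder summary',
};
const grouped = groupProjects(sampleResult);
console.log(grouped);
```

In a real run you would read result.json with fs.readFileSync and JSON.parse, then pass the parsed object to the function.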

Thank you for following along to this point. This is just a simple demonstration, and you can do much more with web scraping. For instance:

Detect product price drops by scraping prices from e-commerce platforms
Scrape valuable data from informational web pages
Build a history of product changes and turn it into valuable data for business owners
and more
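To illustrate the first idea in the list above, here is a small sketch that compares two scraped price snapshots and reports which products got cheaper. The product names, prices, and the detectPriceDrops helper are hypothetical; in practice each snapshot would come from a scraping run like the one in app.js.

```javascript
// Sketch: compare two price snapshots (product -> price) and report the drops.
function detectPriceDrops(previous, current) {
  const drops = [];
  for (const [product, newPrice] of Object.entries(current)) {
    const oldPrice = previous[product];
    // Only report products seen in both snapshots whose price went down.
    if (typeof oldPrice === 'number' && newPrice < oldPrice) {
      const change = Number((newPrice - oldPrice).toFixed(2)); // rounded to cents
      drops.push({ product, oldPrice, newPrice, change });
    }
  }
  return drops;
}

// Hypothetical snapshots from two scraping runs.
const yesterdayPrices = { widget: 19.99, gadget: 5.0 };
const todayPrices = { widget: 17.49, gadget: 5.0, gizmo: 3.25 };
const priceDrops = detectPriceDrops(yesterdayPrices, todayPrices);
console.log(priceDrops);
```

Products that appear only in the newer snapshot are skipped because there is no earlier price to compare against.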

How can I get native performance and parallel processing for web scraping?

If you want faster parallelization and more native performance than NodeJS can provide, you can always go with Delphi, which offers a low-code, swift development process plus a powerful parallel processing library. Head over and download Delphi Community Edition and learn more about the ScrapeStack API!

How can I get started with web scraping?

Now that we have walked you through web scraping with NodeJS and ScrapeStack you’re probably ready to get started building your own solution. The full source code for the web scraper in NodeJS and the signup link are available below.

Head over and check out the full source code here: https://github.com/MuminjonGuru/ScrapeStack-API-Node.js-Web-Scraping

Ready to get started with ScrapeStack? Sign up!
