Codetown ::: a software developer's community

Web Scraping Service vs. Automatic Web Scraper: Which is the best option for web scraping?

 

What is web scraping?

Web scraping, also called web extraction or, loosely, web crawling, is the process of collecting unstructured information from websites and turning it into clean, structured data, either as a file (xls, csv or txt) or written directly into a database. Common uses include lead generation, data collection for academic research, monitoring prices on competitors' websites, and product catalogue scraping. People turn to web scraping for all kinds of good reasons, and it is easy to get confused about which path is best. In this article, I will walk through the pros and cons of both web scraping services and automatic web scrapers.
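
To make "unstructured information into structured data" concrete, here is a minimal sketch using only the Python standard library. The HTML fragment, the class names (`product`, `name`, `price`) and the helper `html_to_csv` are all made up for illustration; a real scraper would first download the page and would have to match that site's actual markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from a site.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Blue Mug</span><span class="price">$8.50</span></li>
  <li class="product"><span class="name">Red Kettle</span><span class="price">$24.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of name/price spans into one dict per product."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # the field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            # A new product starts: open an empty row for it.
            self.rows.append({"name": "", "price": ""})
        elif tag == "span" and cls in ("name", "price") and self.rows:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def html_to_csv(html):
    """Parse the (assumed) product markup and return it as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


print(html_to_csv(SAMPLE_HTML))
```

Both a scraping tool and a scraping service do some version of this for you; the difference is who writes and maintains the extraction rules.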

 

What are some web scraping options?

When it comes to web scraping, there are two major kinds of providers on the market: scraping tool providers and scraping service providers. Tool providers offer the many so-called web scrapers or web extractors, such as import.io, Octoparse and Scrapy. Some of these tools, such as Octoparse and import.io, are easier for non-technical users to handle; others, such as Scrapy and Content Grabber, require more of a programming background. Providers that run on a service model are commonly known as DaaS, short for Data as a Service. These companies do all the scraping work themselves and deliver the data in the format and at the frequency you ask for; they can even provide weekly or monthly data feeds via API if needed. A few well-known ones are Scrapinghub, Datahen and Data Hero. There are also companies that offer both a scraping tool and a scraping service, for example Mozenda and Octoparse. Offering a self-service scraper doesn't make their scraping service any less proficient than that of companies that only do scraping as a service. In fact, a data service provided by a crawler company can be considerably more cost-efficient and much friendlier to one-time scrapes, because the company already owns a customizable scraping tool and only minimal manual intervention is required.

 

So what is the essential difference between using a DIY web scraper and seeking help from a web scraping company? There are many, but the most critical factors are:

  1. Cost
  2. Willingness to learn
  3. Deadline
  4. Complexity of the scraping project

 

If you are a student on a tight budget looking to scrape some public data to support your thesis research, a scraping tool will be the best way to go. If you are an enterprise looking to outsource a brand monitoring project on a tight schedule, a data scraping service will provide what you need. These are only two obvious examples of how different groups of people are better served by one product or service than another, but they should give you a general feel for how to approach the question: go through your specific demands, budget, schedule and project complexity.

 

Comparing web scraping alternatives: 

 

| | Web Scraper SaaS Service | Professional Data Service (DaaS) | Data Service provided by Crawler Company |
|---|---|---|---|
| Pricing | $60 ~ $200 per month | $350 ~ $2500 per project + $60 ~ $500 monthly maintenance fee if applicable | $100 ~ $2500 per project + $60 ~ $300 monthly maintenance fee if applicable |
| Turnaround | Depends on your own efforts | 3 ~ 10 business days | 1 ~ 10 business days |
| Format of data delivery | Most support export to xls, csv, html, txt, JSON, xml | Most support csv, html, JSON, xml | Most support csv, html, JSON, xml |
| Database, API supported | Depends on the specific product | Yes | Yes |
| Dealing with complex websites (JavaScript, AJAX etc.) | Depends on the specific tool | Supported most of the time | Supported most of the time |
| Mass scale scraping | Good volume for low cost if you can get what you need with the scraper | Scalable, but cost increases as volume goes up | Scalable, but cost increases as volume goes up |
| Support for customized requests | Self-help | Highly flexible | Highly flexible most of the time |
| One-time request friendly | Yes, pay as you go | Mostly no | Yes |
| Customer support | Busy support; some are really helpful, depending on the product | Pretty responsive most of the time | High-priority support |
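
To compare the pricing rows over the life of a project, it can help to turn them into a rough total. The sketch below uses the price ranges from the table and rests on two simplifying assumptions of mine: a single project, and a maintenance fee charged every month (the table only says "if applicable"). The option names and the function `estimate_range` are illustrative, not anything a provider publishes.

```python
# Price ranges taken from the comparison table, in USD.
# Each entry: (one-off project fee low/high, monthly fee low/high).
PRICING = {
    "tool": ((0, 0), (60, 200)),
    "daas": ((350, 2500), (60, 500)),
    "crawler_company": ((100, 2500), (60, 300)),
}


def estimate_range(option, months):
    """Return (low, high) total cost for one project over `months` months.

    Assumes one project and that the monthly fee applies every month.
    """
    (project_low, project_high), (monthly_low, monthly_high) = PRICING[option]
    low = project_low + monthly_low * months
    high = project_high + monthly_high * months
    return low, high


for option in PRICING:
    low, high = estimate_range(option, 6)
    print(f"{option}: ${low} ~ ${high} over 6 months")
```

Under these assumptions, six months comes out at $360 ~ $1200 for a tool, $710 ~ $5500 for a DaaS provider, and $460 ~ $4300 for a crawler company's data service. The tool figure leaves out the value of your own time, which is the main cost of the DIY route.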


 

Are you ready to scrape?

Just like everything else, there are pros and cons to both a web scraping service and a data scraping tool. Which is the better option will largely depend on your specific schema, data application and project budget. Go through your requirements thoroughly and research the products and services available on the market; both steps are essential to finding the web scraping solution best tailored to your needs.

 

That’s all I have for now. Feel free to drop a message here if you have specific questions about any web scraper or service. Cheers!

 

Related Reading:

Top 30 Free Web Scraping Software

Top 5 Web Scraping Tools Review

Get started with Octoparse in 2 minutes - YouTube

Top 9 data visualization tools for non-developers

 

