Review Roundup

Top-2 plug-ins for scraping data right in your browser

Web scraping plug-ins for Google Chrome, Internet Explorer and Mozilla Firefox can’t do it all, but they can make the job of extracting information from the Web a little easier. | Photo via Philippe Ramakers/Stock.XCHNG

Scraping information from the Web can be a complicated affair.

Although there are some off-the-shelf options for performing an online data grab, sometimes you just need something cheap and quick for an exceedingly simple job.

That’s why we’ve tested and reviewed four Web scraping browser extensions, whittling them down to our top two. These applications are primarily designed to copy and paste HTML tables. But despite their limitations, their relative ease of use and inexpensive (or free) pricing make them valuable time-savers for small jobs.

Want a more full-featured Web scraper? Check out Reporters’ Lab Reviews for the entire list.

What won’t work (ever)

There’s one thing I should point out before getting started: In each set of tests, these applications were unable to extract data from anything other than an HTML table.

Data organized in any other format (<div> tags, for example) just won’t work. If you’re not sure how to spot an HTML table, take a look at the one below. Viewing the source code of this page would show the information surrounded by the <table> tag, which is further organized by tags indicating headers (<th>), rows (<tr>) and individual cells (<td>), much like a typical spreadsheet.
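To make that tag structure concrete, here is a minimal Python sketch of how a program might pull rows out of a <table>. It is not how any of the reviewed extensions work internally; it simply uses the standard library's html.parser, and the sample markup, the names in it and the helper extract_rows are made up for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (such as <div> content) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Hypothetical helper: return every table row found in the markup."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample: one real table plus <div> data that is not a table.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Client</th></tr>
  <tr><td>Jane Doe</td><td>Acme Corp.</td></tr>
  <tr><td>John Roe</td><td>Widget Co.</td></tr>
</table>
<div class="row"><span>Not</span><span>a table</span></div>
"""

rows = extract_rows(SAMPLE_HTML)
```

Because the parser only listens for table tags, the <div> content at the end of the sample never makes it into rows, which mirrors the limitation described above.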

Additionally, none of the browser extensions was able to take on the more complex tasks traditionally reserved for more robust scrapers. Details that sit on separate pages linked from each record, which users would normally have to click through to view, are out of the question. If your target database requires you to enter a search term before viewing results, these extensions won’t be good options because they can’t automate the process.

If your scraping job meets any of the above criteria, look for another solution.

Feature                            Data Toolbar   Scraper   Table2Clipboard   Table Capture
Preserves tabular structure        YES            YES       YES               YES
Navigates multiple pages           YES            NO        NO                NO
Captures info from detail pages    NO             NO        NO                NO
Automates searches                 NO             NO        NO                NO
Handles limits on search results   NO             NO        NO                NO

What does work: scraping multiple pages

One of the most maddening parts of manually grabbing online data is clicking through multiple pages. But the repetitive nature of that task makes it an ideal candidate for automation, which is exactly what Data Toolbar ($24) does (read the full review).

The Internet Explorer add-on had no problem nabbing the information from a 406-page listing of lobbyist information, even distinguishing and omitting duplicate header information.
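For a rough idea of what omitting duplicate headers involves, here is a minimal Python sketch. It is not Data Toolbar's actual method. It assumes each page has already been reduced to a list of rows and that the first row of the first page is the header that later pages repeat. The function name merge_pages and the sample data are made up for illustration.

```python
def merge_pages(pages):
    """Combine rows from several result pages, keeping one header row.

    Assumption: each page is a list of rows (lists of strings), and the
    first row seen is the header that every later page repeats.
    """
    merged = []
    header = None
    for rows in pages:
        for row in rows:
            if header is None:
                header = row        # first row seen becomes the header
                merged.append(row)
            elif row != header:     # drop repeated header rows
                merged.append(row)
    return merged


# Made-up sample pages, each starting with the same header row.
page_one = [["Name", "Client"], ["Jane Doe", "Acme Corp."]]
page_two = [["Name", "Client"], ["John Roe", "Widget Co."]]

combined = merge_pages([page_one, page_two])
```

One caveat of this approach: a data row that happens to match the header exactly would also be dropped, so it only suits listings where that can't happen.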

We also found that Data Toolbar saved some time even with searchable databases, as long as they allowed wildcard searches (for example, returning all names that start with “a,” “b,” “c,” etc.). You still have to enter each search yourself, but the add-on gobbles up the results. It’s not full automation, but it’s a start.

If simple, multiple-page databases are a common occurrence for you, $24 will be well spent.

Copying tables, simplified

If you’ve got a particularly small number of pages, or just need to preserve the formatting from a large online table of data, give the Table2Clipboard Firefox add-on a try (read the full review).

By adding a “Copy all tables” option to the “Edit” menu, the free app allows you to paste information directly into a spreadsheet, preserving the tabular formatting. It can also capture the HTML links embedded in the data.
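Pasting into a spreadsheet works because most spreadsheet programs split pasted text into cells at tabs and line breaks. The Python sketch below shows that general idea using the standard csv module. It is an illustration only, not Table2Clipboard's implementation, and the function name rows_to_tsv and the sample rows are made up.

```python
import csv
import io


def rows_to_tsv(rows):
    """Turn a list of rows into tab-separated text, one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows standing in for a copied HTML table.
sample_rows = [["Name", "Client"], ["Jane Doe", "Acme Corp."]]
tsv_text = rows_to_tsv(sample_rows)
```

Text in this shape, copied to the clipboard or saved to a file, should land in separate columns and rows when opened in a typical spreadsheet.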

Based on our testing, Table2Clipboard did a better job at these limited functions than Scraper and Table Capture, both for Chrome.

These add-ons may not do it all, but they’re a painless addition to any journalist’s arsenal and might just save time on those one-off projects.

About Tyler Dukes

Tyler Dukes is the managing editor for Reporters' Lab, a project through Duke University's DeWitt Wallace Center for Media and Democracy. Follow him on Twitter as @mtdukes.
