
Top tools for handling government printouts

How we review

The Reporters’ Lab aims to produce reviews that are consistent, independent, fair and, above all, useful for a reporter with little time or patience for technical details. We use full versions of products against a curated set of documents and tests so you can compare apples to apples and figure out what’s worth the money, how hard it will be to learn, and how long it might take. Whenever we can, we contact companies to address specific product critiques and include their responses when they’re useful to users. We don’t let companies read the reviews in full or approve or reject content.

Spreadsheets aren’t very useful when they’re locked inside printouts and PDFs.

Although there are scores of products available that claim to transform your document into something sortable, they vary wildly in price and capability. You’ve got more important things to do than waste time downloading free trials of software, so over at Reporters’ Lab Reviews, we’ve done it for you.

Using a curated set of sample documents from actual reporting projects, we’ve tested five different products billed as PDF tools on their ability to convert files from PDF to Excel format.

Our testing process is rigorous, and if you’d like you can click through to see how we rated each product on each test. That’s especially useful if you’ve got a certain type of document that you’re looking to convert. We tested products on output from a Microsoft Access database, a simple lined spreadsheet, an unlined list of appointments with mixed-in headings and a table with a tricky embedded font that foiled almost everything we threw at it.

But if you’re just interested in finding out what to try yourself, we’ve got two suggestions for you.

First stop, Cometdocs

Let me emphasize that Cometdocs was far from our top performer. It mangled several of our test documents so badly they weren’t worth cleaning up. Two others were mediocre and needed some additional work.

But it’s hard to argue with free and fast.

There’s no fee. No downloading. No installation. No learning curve. It doesn’t even slow your computer down while it’s working. Just plug in your email address and your document, and Cometdocs will process it and send it back in a few minutes.

That makes the application my suggestion for all of your PDF-to-Excel projects. Even if it’s awful, you lose nothing. And if it churns out something usable, you just saved yourself the hassle of learning how to use a program and waiting for it to stop tying up your processor.

Did I mention it’s free?

The nuclear option: Able2Extract

It was a surprise to us that our top pick for PDF tools is pretty affordable (about $100), especially given some of its high-dollar competition.

But price doesn’t always mean quality, especially when you’re looking at specific pieces of functionality.

Able2Extract wasn’t perfect, but it did the job well on some of our toughest tasks, especially the ones less adaptable PDF tools couldn’t begin to handle. Because the software allows users to indicate where rows and columns are, the spreadsheets it spits out need a lot less clean-up. This is the kind of option you need on the complicated documents that stump lesser programs.

There’s even some flexibility with the price. A 30-day subscription costs about $35, useful if you’ve got a short-term project that requires a heavy-hitting solution. You can also upgrade to the professional version for $130, which adds optical character recognition for processing text.

One-to-many relationships apply to tables like these, where the "Library of Congress" heading needs to be applied to each one of the cells below it. This almost always requires a manual step.

Keep in mind that no PDF tool will do it all. Everything we tested failed to conquer the problem of headings on printouts that should apply to certain rows of data. But the problem’s so common you can find solutions readily available, including in the tip sheet collections of Investigative Reporters & Editors (membership required).
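If you’re comfortable with a little scripting, the heading problem can be handled after conversion. What follows is only a rough Python sketch, not something we tested as part of these reviews. It assumes you’ve already exported the converted spreadsheet as rows of text, and that a heading row is one with text in its first cell and nothing else. The function name and the sample appointments are made up for illustration.

```python
def fill_down_headings(rows):
    """Copy each heading onto the data rows beneath it as a new first column."""
    current_heading = ""
    filled = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip completely blank rows
        # Assumption: a heading row has text in the first cell only.
        if cells[0] and not any(cells[1:]):
            current_heading = cells[0]
            continue
        filled.append([current_heading] + cells)
    return filled


# Hypothetical sample data, laid out the way a converted printout might be.
sample = [
    ["Library of Congress", "", ""],
    ["Jan. 5", "Staff meeting", "9 a.m."],
    ["Jan. 6", "Budget briefing", "2 p.m."],
    ["", "", ""],
    ["Another agency", "", ""],
    ["Jan. 7", "Site visit", "10 a.m."],
]

for row in fill_down_headings(sample):
    print(row)
```

The catch is the assumption itself: a genuine data row that happens to have only its first cell filled in would be mistaken for a heading, so spot-check the results against the original printout.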

The bottom line is that between Cometdocs and Able2Extract, a software solution will get you close enough to cope with your data — and free up your time to do more reporting.

About Tyler Dukes

Tyler Dukes is the managing editor for Reporters' Lab, a project through Duke University's DeWitt Wallace Center for Media and Democracy. Follow him on Twitter as @mtdukes.
