Tools, techniques, research for public affairs reporting
A Duke University Project
 

Tag Archives: OCR

Reporters' Lab Managing Editor Tyler Dukes and Ryan Thornburg, an online journalism professor at UNC-Chapel Hill, listen in on a demo of DocHive by developer Edward Duncan in the offices of the Raleigh Public Record in early September 2012.
Tools

Turning scanned docs into structured data

Guest Posts // January 14, 2013

A new open-source program from the Raleigh Public Record aims to pull structured data from scanned public records, and its developers are still looking for testing help from journalists and open-government advocates.

Tagged Charles Duncan // DocHive // ImageMagick // nicar // OCR // open source // Raleigh Public Record
Attendees at Transparency Camp 2012 jot down session pitches for the Wall on the first morning of the unconference. | Photo by Tyler Dukes
Transparency

What to do when text tells you nothing

Tyler Dukes // May 3, 2012

As it turns out, journalists aren’t the only ones stymied by badly scanned records and hard-to-parse legislation. Developers are too, and at Transparency Camp they talked about how to join forces to fix it.

Tagged ALEC // Derek Dohler // OCR // Sal Rizzo // Sunlight Foundation // Text // The Star-Ledger // Transparency Camp
Scanned hard copies are often no more than simple images with no recognized text.
Review Roundup

Top-3 tools for recognizing text inside scans

Tyler Dukes // March 28, 2012

Scans of hard-copy documents often require time-consuming analysis with human eyeballs. But optical character recognition can help, and we’ve identified three great options to get you started.

Tagged ABBYY FineReader // DocumentCloud // OCR // OmniPage // optical character recognition
Latest Reviews

  • Outwit Hub Pro can scrape almost any database, but learning curve steepens with complex datasets
  • Pdftotext excels at extracting data from conventional tables, stumbles with more complex tasks
  • For Monarch, high cost translates to unmatched spreadsheet conversion

Recent Posts

  • Tabula’s creator talks next steps
  • PolitiFact editor Bill Adair named Duke Knight chair
  • For watchdog stories, ‘who pays?’ is the wrong question

REPORTERS' LAB

A project of Duke University's Sanford School of Public Policy
Box 90241 // SB 140 // 201 Science Drive // Durham, NC 27708-0241
info@reporterslab.org

Questions? Comments?

Email the lab at info@reporterslab.org or call 919-613-7346

© 2012 Reporters' Lab. All Rights Reserved. // Web design by Row Design Studios
