How a conference taught me I know nothing

The Chicago Tribune's Brian Boyer removes his costume panda head before demonstrating PANDA, an open-source database system developed by Investigative Reporters and Editors. | Photo by Tyler Dukes

It was the lightning talks that set my head spinning.

I’m a bit new to the world of data journalism, and up until that point, my first time at the annual Computer-Assisted Reporting conference had stopped just short of overwhelming. But after a few mile-a-minute presentations in this particular session, I only had one clear thought: There’s so much to know, and not enough time to learn it.

Luckily, PolitiFact developer and University of Nebraska-Lincoln journalism professor Matt Waite gave the audience a bit of reassuring advice gleaned from Zen Buddhism:

It’s bad to be an expert. If you know something and you think you know something, you ignore a lot.

That seems like the perfect mantra for attending this conference. So many of the journalists and developers who spoke encouraged attendees to treat what they learned as a jumping-off point, to follow up as soon as possible with data exploration, investigative reporting and programming of their own.

In the spirit of that advice, here’s a collection of the top tools I picked up from the conference. Most I know nothing (or at least very little) about. In the coming weeks and months, I’m hoping to learn a little more about each of them, whether it’s in the form of a formal review or an informal exploration on the blog.

If you’re looking for an even longer list, check out Chrys Wu’s collection on her blog. And for conference attendees who have suggestions of their own, feel free to share them in the comments.

Evernote // Project tracking

Until one of the conference sessions, I dismissed Evernote as a simple note-taking service. But one of the speakers pointed out that it’s excellent for tracking progress in a story and sharing it with editors.

Google Spreadsheets // Web scraping

Using Google Spreadsheets' importHtml() and importData() functions is a great way to scrape some Web pages without programming. It doesn't always work, but it's a good starting point. More on the process from session speakers Chris Keller and Michelle Minkoff.
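For those who do want to peek under the hood, here's a rough sketch of what that kind of table scraping does behind the scenes, written in plain Python with only the standard library. The HTML snippet and city figures are made-up stand-ins for a real page you'd fetch from the Web.

```python
# A minimal sketch of importHtml()-style table scraping using only
# Python's standard library.
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collects the text of each <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# A stand-in for a fetched page; in practice the HTML would come from
# something like urllib.request.urlopen(url).read().decode()
html = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Durham</td><td>228330</td></tr>
  <tr><td>Raleigh</td><td>403892</td></tr>
</table>
"""
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)
# → [['City', 'Population'], ['Durham', '228330'], ['Raleigh', '403892']]
```

The spreadsheet functions hide all of this plumbing, which is exactly why they're a good first stop before learning to write a scraper yourself.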

CrocTail // Corporation tracker

CrocTail gathers and indexes information about corporations and their subsidiaries, based on their 10-K forms from the U.S. Securities and Exchange Commission.

Pipes // News feed management

I’ve always wanted to play more with Yahoo Pipes, which provides a way to mash up and customize RSS feeds just the way you want them. The visual interface is easy to manage, so no programming is really required.

Refine // Data cleaning

Google’s downloadable application can quickly clean up messy data by clustering and fixing similar entries like names, places and groups for easier analysis. It can also export data into other formats.
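The clustering idea at the heart of that cleanup can be sketched in a few lines. This is a loose approximation of Refine's "fingerprint" method (the entries and organization names here are invented for illustration): normalize each value to a key, then group entries that share one.

```python
# A rough sketch of fingerprint-style clustering: lowercase each entry,
# strip punctuation, sort its unique words, then group entries whose
# normalized keys collide.
import re
from collections import defaultdict

def fingerprint(value):
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

# Hypothetical messy data entry variants
entries = ["Duke University", "duke university.",
           "University, Duke", "UNC-Chapel Hill"]

clusters = defaultdict(list)
for entry in entries:
    clusters[fingerprint(entry)].append(entry)

for key, members in clusters.items():
    print(key, "→", members)
# The three "Duke University" variants land in one cluster.
```

Refine's real implementation is more sophisticated (it also offers phonetic and nearest-neighbor methods), but the key-collision idea above is the gist.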

Qlikview // Data visualization

Available in a full, free version, Qlikview is built for business intelligence, although journalists can put it to work for data analysis through visualization. There are also sharing and discussion tools to collaborate with others in the newsroom.

Junar // Data extraction

Junar allows users to collect, track and embed data from the Web just by submitting a webpage containing tables (file submission is also possible). A simple interface lets reporters select which table to process; the data is then shared publicly on the site.

NodeXL // Relationship analysis

NodeXL is an Excel template for performing network analysis, or exploring how things are connected. As session speaker Peter Aldhous put it, think about it in terms of the game Six Degrees of Kevin Bacon.

Gephi // Relationship analysis

Another open-source tool for network analysis, Gephi is a standalone piece of software for Windows, Linux and Mac.

Scraper // Web scraping

Scraper is a free Google Chrome extension that can help users quickly collect data from simple Web tables and export the information to a spreadsheet.

iMacros // Web scraping

A Firefox add-on for recording tasks in your browser, iMacros can be used to scrape websites by defining a set of actions once, then automatically repeating it numerous times.

QGIS // Mapping

Quantum GIS is a free, open-source mapping system that can help journalists analyze geographic data. In addition to a user-friendly interface, it’s also supported by a large developer community that can help troubleshoot problems and find solutions.

ReVerb // Information extraction

An open-source project from the University of Washington’s Department of Computer Science and Engineering, ReVerb detects relationships between terms on the Web and uses them to answer questions automatically.

Overview // Document set analysis

Overview is a visualization tool specifically designed to help journalists find stories hidden inside large document sets using information extraction and other techniques.

Columbia Newsblaster // News topic clustering

Newsblaster uses natural language processing to cluster news by topic and summarize what’s happening using multiple documents. It’s been around for a while and has seen some academic study, and it’s something in which we’re particularly interested at the lab.

About Tyler Dukes

Tyler Dukes is the managing editor for Reporters' Lab, a project through Duke University's DeWitt Wallace Center for Media and Democracy. Follow him on Twitter as @mtdukes.
