Since we launched the Reporters’ Lab at the 2012 CAR Conference one year ago, we’ve been focused on finding ways to get better tools into the hands of more working journalists.
We’ve come at that core mission from a few different directions. Some software solutions we develop ourselves. Others we test and review or discover in academia. But the next step — fostering widespread adoption of these tools and techniques among journalists — can be the most difficult part of the process.
We’re certainly tackling this issue now as we look to grow and develop tools like the Video Notebook and the Raleigh Public Record’s DocHive, software capable of turning scanned documents into structured data. But as I learned at the 2013 CAR Conference, adoption is also a challenge for larger organizations like Northwestern’s Knight Lab and experienced development teams like the one behind DocumentCloud.
At a conference session Saturday morning, Northwestern’s Rich Gordon broke down some of the most valuable lessons from a five-year study by Charles Schweik and Robert English of the University of Massachusetts-Amherst, who tried to find out what makes open-source projects succeed or fail. After studying more than 80,000 projects on SourceForge, a publishing platform for open-source software, they found that four out of five were abandoned during the early stages of development.
Their complete findings are detailed in a book innocuously titled Internet Success, released in June. But Gordon said the key takeaways were what the successful projects had in common:
- Clearly defined vision
- Clearly defined audience
- Well-articulated and clear goals
- Modular software design
- Tasks of various sizes are available for developers to work on
Also on that list was effective project communication, including a good website, a bug tracking system and thorough documentation.
It’s telling that these features are all characteristic of DocumentCloud, one of the most successfully deployed journalism tools to date.
While it may be tempting at first to attribute that success wholly to the financial support DocumentCloud received throughout its development cycle (first from the Knight News Challenge in 2009 and then from IRE in 2011), it’s clear there’s something more at play here.
So what makes DocumentCloud so different from the open-source products that fail? And how can we replicate that level of success for other tools?
DocumentCloud Lead Developer Ted Han, who shared the stage with Gordon during the session, told the audience that the application’s wide adoption was driven largely from the ground up. It takes only one reporter to sign up a newsroom for the service, and word spreads easily from there.
That’s part of the reason DocumentCloud is used by news organizations big and small, from the Poughkeepsie Journal to the Los Angeles Times.
But there’s an even more important metric to note here. DocumentCloud’s underlying open-source technologies (Backbone, Underscore and Document Viewer) are among the most watched projects on GitHub. That open ecosystem, along with its flexible application programming interface, allows more tech-savvy newsrooms to improve on the product and what it can offer both journalists and readers.
At a Thursday afternoon CAR Conference session on using the DocumentCloud API, for example, L.A. Times Database Producer Ben Welsh detailed how his newsroom uses a custom application to help reporters and producers upload documents that feed directly into both the Times’ collection and its website. The data desk, in turn, released that code back into the open-source ecosystem.
This underscores an important characteristic of DocumentCloud and one of Gordon’s last takeaways from the open-source study: Developers of a successful project must also be users.
If there’s a caveat to this discussion, it’s that “success” doesn’t mean the same thing in the open-source world as it does to a commercial enterprise. For the purposes of their research, Schweik and English defined success narrowly as an application that showed value for “at least a few users” with more than three releases.
DocumentCloud has about 6,000 users, of whom roughly 1,500 are active. That’s low in the start-up world, where most business models would require far more users to achieve sustainability. For tool development to work commercially, we’ll need an entirely different incentive structure.
But Han said the DocumentCloud team still takes a lot of cues from the start-up model. The big difference between open-source and commercial start-ups, he told the audience during the Saturday morning session, is “less a question of how you operate and more a question of your values.”
For all the lessons we can learn from practical examples like DocumentCloud, it’s clear there’s no one-size-fits-all solution to get these tools into the hands of more reporters. Whether here or at the Knight Lab, there’s a spectrum of “products” in the journalism tool pipeline that range from pure experimentation to actual enterprise solutions, and exposing them all to the world effectively will take a lot of guessing and testing.
We’re not sure yet where DocHive falls on that spectrum, but with luck, discussions and research like this will make those guesses a little more educated.