
Data That Talks

Parke Wilde raised an important issue on his U.S. Food Policy blog last week: As public data becomes more accessible, we should focus on ways to pull together disparate bodies of data.

Parke explains how he used data from the Environmental Working Group and CSPAN to show the overlap between federal farm aid and campaign contributions.

The challenge, as Bill Allison put it on the Sunlight Foundation blog (where I found Parke’s post), is that “these disparate sets of data don't talk to one another.”
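To make "talking to one another" concrete, here is a minimal sketch of what joining two data sets can look like once they share a key. Everything in it is an assumption for illustration: the field names, the district labels and the dollar figures are made up, and this is not how Parke or either organization actually structures its data.

```python
from collections import defaultdict

# Hypothetical records. The fields and figures are invented for this sketch
# and do not come from the EWG or any campaign finance database.
farm_aid = [
    {"district": "District 1", "recipient": "Farm A", "subsidy": 120000},
    {"district": "District 1", "recipient": "Farm B", "subsidy": 30000},
    {"district": "District 2", "recipient": "Farm C", "subsidy": 80000},
]
contributions = [
    {"district": "District 1", "donor": "Farm A", "amount": 2000},
    {"district": "District 1", "donor": "Farm B", "amount": 500},
    {"district": "District 3", "donor": "Farm D", "amount": 1000},
]


def total_by_district(records, value_field):
    """Sum one numeric field per district."""
    totals = defaultdict(int)
    for record in records:
        totals[record["district"]] += record[value_field]
    return dict(totals)


def join_on_district(aid_records, contribution_records):
    """Combine both data sets, keeping only districts that appear in both."""
    aid = total_by_district(aid_records, "subsidy")
    gifts = total_by_district(contribution_records, "amount")
    shared = sorted(aid.keys() & gifts.keys())
    return {d: {"farm_aid": aid[d], "contributions": gifts[d]} for d in shared}


overlap = join_on_district(farm_aid, contributions)
```

The hard part in practice is the shared key. Two publishers rarely label districts, recipients or donors the same way, which is why the sets don't talk to one another in the first place.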

Dan Gillmor and his Berkeley class seem to be focusing on this problem. Their project in California’s 11th Congressional district seems designed to pull together public resources (including data, I assume) to make it easier for citizen journalists to cover the race.

Opening up data the way folks at the EWG, CSPAN and many others are doing makes it possible for masses of curious citizens to poke around and expose government rottenness.

But to really get the masses poking around, the data needs to be highly accessible. One part of that is being able to pull together similar, separate bodies of information.

Publish Data – It’s Good for Business

In an important post earlier this month, Adrian argued that news organizations need to move away from the “story-centric worldview.”

Instead of grinding information they collect into unstructured stories, he said, news sites should build operations that collect structured data and repurpose it in as many useful ways as possible.

He’s exactly right, and I’d add one point: Publishing structured, community data is good for business. 

Data? Good for business?

In cases where it’s the best way to consume information about a community, absolutely.

Think about how newspapers used to make money: Their articles were the best sources of information about their communities. That meant community members had to read the paper and businesses had to advertise in the paper. A paper’s business was built on its status as the best source of information in the community.

Today articles are not always the best sources of information.

Consider standardized testing results. If you’re a parent looking into local schools, you care about test results. But you don’t want to read an article summarizing results from across your county -- you just want to see the raw results in your town. Same goes for many other forms of data -- crime reports, campaign finance data, election results, census data, etc.
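As a rough illustration of the difference, here is a small sketch of test results kept as structured records rather than folded into an article. The `SchoolScore` type, the town and school names and the scores are all hypothetical. The same records can answer the parent's question (my town only) and feed the countywide summary an article would have reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchoolScore:
    """One structured test result. All fields are assumed for illustration."""
    town: str
    school: str
    subject: str
    percent_proficient: float


# Made-up sample data, not real test results.
SCORES = [
    SchoolScore("Town A", "North Elementary", "math", 71.0),
    SchoolScore("Town A", "North Elementary", "reading", 78.0),
    SchoolScore("Town B", "Hill School", "math", 64.0),
]


def scores_for_town(scores, town):
    """The raw results a parent wants: every record for a single town."""
    return [s for s in scores if s.town == town]


def county_average(scores, subject):
    """The summary an article would carry: one number for the whole county."""
    values = [s.percent_proficient for s in scores if s.subject == subject]
    return sum(values) / len(values) if values else None


town_a = scores_for_town(SCORES, "Town A")
math_average = county_average(SCORES, "math")
```

Once the data is stored this way, the town view, the county summary and anything else a reader asks for are just different queries against the same records.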

To remain the best source of information in their communities -- to protect the foundation of their businesses -- news organizations need to publish data as easily as they publish articles.

Collecting and publishing data the way Adrian suggests will also help news organizations defend against challengers.

Structuring and cleaning data is a lot of work. If your operation collects clean, structured data and takes advantage of that data, upstart publishers will have a tough time competing with you. Brad Burnham explained the idea of data as a defensive tool on the Union Square Ventures blog.

If you’re still not convinced, consider these successful online publishing businesses: NYTimes.com/movies, ESPN.com and Yahoo! Finance.

These sites all run articles, but none of them are designed with a singular focus on articles. They’re all built with the understanding that users want information that comes in many forms -- charts, showtimes, prices, video, you-name-it. Data is so important to The Times that it just bought a company that provides data for its movies section.

The community news sites that grow into successful businesses will be the ones that follow this model -- the ones that invest in data and publish information in the most useful formats possible.

Shapes, Maps & Massachusetts Campaign Contributions

At Faneuil Media we often use maps to make data more accessible to readers. Because it’s so simple and people are familiar with it, we frequently use Google’s mapping platform.

Unfortunately, Google’s API doesn’t make it easy to map shapes and areas. It’s simple to map specific points, like the location of crimes, but not areas, like neighborhoods.

To help readers really penetrate a body of geographic data, it’s often important to provide a detail-level view (specific points) AND a summary-level view (areas). But until now, we haven’t been able to do both with the Google API.

Now we can.

Last week we launched a package on Boston.com that maps campaign finance data for Massachusetts gubernatorial candidates. As in the past, we’ve plotted detail-level data on the map -- in this case, campaign contributions.

But this package is special because readers can also view summaries of the data. Readers can see that Chris Gabrieli’s contributions to his own campaign dominate Boston contributions, that Western Massachusetts is Deval Patrick country and that Tom Reilly leads fundraising in many of the state’s suburban and exurban communities.
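For readers curious how a summary-level view relates to the detail-level points, here is a minimal sketch of the rollup involved. It is not the code behind the Boston.com package. The field names, towns, candidates and amounts are all invented, and the actual map layers are a separate problem that this sketch does not touch.

```python
from collections import defaultdict

# Detail-level data: one row per contribution. All values are made up.
contributions = [
    {"town": "Town A", "candidate": "Candidate X", "amount": 500},
    {"town": "Town A", "candidate": "Candidate Y", "amount": 200},
    {"town": "Town A", "candidate": "Candidate X", "amount": 100},
    {"town": "Town B", "candidate": "Candidate Y", "amount": 300},
    {"town": "Town B", "candidate": "Candidate X", "amount": 250},
]


def totals_by_town(rows):
    """Summary-level data: total raised per candidate within each town."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["town"]][row["candidate"]] += row["amount"]
    return {town: dict(by_candidate) for town, by_candidate in totals.items()}


def leader_by_town(totals):
    """Pick the top fundraiser in each town (ties go to the first one seen)."""
    return {
        town: max(by_candidate, key=by_candidate.get)
        for town, by_candidate in totals.items()
    }


town_totals = totals_by_town(contributions)
leaders = leader_by_town(town_totals)
```

Each town's leader is what would decide how its area is shaded, while the individual rows remain available as points for anyone who wants to zoom in.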

This project stretched us further than anything else we’ve done. The data set was bigger and messier than anything we’ve worked with. Even after lots of cleanup, we couldn’t find coordinates for many contributions and therefore couldn’t map them. The layers are also very resource-intensive, and we spent a lot of time figuring out how to host the package.

But thanks to some amazing work on Theo’s part and a huge amount of patience at Boston.com, we addressed these problems and ended up with a package that we’re very happy with. Thanks to its deep, intuitive interface, it’s now easy -- not to mention kind of fun -- to see who’s contributing to candidates. Hopefully we’ve made the election process in Massachusetts a little bit more transparent.

Now that this has been up for a few days, we’re beginning to think about next steps. If you have any thoughts or other feedback, let me know.