Sunday, July 31, 2011

Stories and data in our times

Lately I've been thinking about how massive databases might change how we think about "anecdotal" evidence. You know the old trope, "The plural of anecdote is not data." But maybe it could become so.

With the advent of tools like Google's Ngram Viewer, which you can use to compare the trend lines of any two words over time drawn from the millions of books Google has digitized, what were once isolated incidents may be becoming data. Here's what Ngram shows if you compare "philanthropy" to "charity" over time:



The graph shows the frequency of the two words as they appear in English in all books scanned, covering 1800-2000. There is an amazing drop in the frequency of charity. Philanthropy barely budges, even through the creation of American foundations, the boom of philanthropic product innovation in the 1990s and the last decade in which philanthropy (we thought) was suddenly being talked about everywhere. Maybe we're talking about it, but it doesn't appear that we were writing about it.

Another site, the Corpus of Contemporary American English (COCA), has 425 million words from spoken sources (TV, radio), magazines, fiction and academic texts. This site allows you to run a comparison over time, or see how frequently one of the terms (I chose "philanthropy") appears in each of the different source types. Across all source types there is a ~2.5x increase in frequency from 1990 to 2009 for the term "philanthropy". If we narrow the search to just the frequency of "philanthropy" in spoken form, we see that it doubles - broadcast sources such as NPR, CBS, ABC, PBS, NBC and Fox mention philanthropy twice as often in 2009 as in 1990.
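For anyone who wants to check a comparison like this by hand, here is a minimal sketch in Python of the arithmetic behind a "2.5x increase": normalize each raw count to occurrences per million words, then divide the later rate by the earlier one. The function names and the counts are made up for illustration; they are not COCA's actual figures or its interface.

```python
def per_million(count, corpus_size):
    """Occurrences of a word per million words of text."""
    return count / corpus_size * 1_000_000


def fold_change(count_start, size_start, count_end, size_end):
    """How many times more frequent a word is at the end than at the start."""
    return per_million(count_end, size_end) / per_million(count_start, size_start)


# Hypothetical counts for illustration only, NOT real COCA data:
# 100 hits in 20 million words (early period), 250 hits in 20 million words (late period).
ratio = fold_change(count_start=100, size_start=20_000_000,
                    count_end=250, size_end=20_000_000)
print(f"{ratio:.1f}x")
```

Normalizing first matters because the amount of text collected can differ from year to year; comparing raw counts alone would confuse "more text" with "more mentions."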

Now, that little experiment doesn't prove anything. But the tools that made it possible - the databases of words - are only one example of the new tools that might change how we think about stories as data, not just stories and data.

GlobalGiving is experimenting with collecting stories as data - so far they have more than 15,700 just from Uganda and Kenya. The goal is to collect stories of service from users and intended beneficiaries of development aid - going beyond the organizations that receive the funding to the actual residents of the communities to hear what they think.

Mobile phones and text messaging make this kind of data collection cheaper, faster and more imaginable than ever before. It's possible to use text messaging to ask simple questions of lots of people and aggregate the answers. It's possible for everyone to be a citizen monitor of local aid and built projects.

It also raises new privacy questions, runs the risk of putting yet another layer of funder demand into the equation, and, in a worst-case scenario, could violate the trusted relationships between residents and organizations that are doing good work.
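The "ask simple questions of lots of people and aggregate the answers" idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the replies are invented, the function name is hypothetical, and a real system would first need a way to receive the messages. Note that the tally keeps no phone numbers or names, which is one small way to respect the privacy concerns just mentioned.

```python
from collections import Counter


def tally_replies(replies):
    """Count text-message answers, ignoring case, extra spaces and blank replies."""
    normalized = (reply.strip().lower() for reply in replies)
    return Counter(reply for reply in normalized if reply)


# Invented replies to a yes/no question, for illustration only.
replies = ["Yes", " yes ", "NO", "no", "yes", ""]
counts = tally_replies(replies)

for answer, total in counts.most_common():
    print(answer, total)
```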

Think about how eBook readers as the delivery system for reading material could change what we know about what kids are reading - this is what WorldReader hopes to make happen. The NY Times has been writing a great series on "Humanities 2.0," about how technology is changing how we teach and learn about literature and history. And massive databases let medical researchers search other scientists' data sets for unexpected applications of certain drugs or insights from one study that could be just the breakthrough they need in their field. Sergey Brin has donated $50 million to bring this kind of massive data mining to work on research in Parkinson's disease.

The data-rich environment we now inhabit is not made up solely of quantitative data - numbers. It also includes massive numbers of stories, words, pictures, movies, audio files. Each bit by itself may be little more than an anecdote. Taken together, they become data. The sheer quantity of these "bits" or "anecdotes" changes what we think of as data, how we mine the data, and what we look for. The size of the source is changing the nature of the data we use and the stories we will tell.

1 comment:

Bradford Smith said...

Great post, Lucy. If it's digitized, it's data. The 2.5X rise in the frequency of the word "philanthropy" between 1990 and 2009 correlates nicely with the 2.4X increase in the number of foundations from 32,401 to 76,545. Their giving increased more than 5X during the same period.

brad