From Data to Information

April 03, 2009

A long time ago, I became interested in programming primarily as a means of information visualization. (Well, truthfully, back in 2000/2001 I wanted to have my own blog, but I got distracted and eventually learned how to do other things.) Anyway, once I discovered some of the ideas in the field of information architecture and, later, through the IxDA, I was hooked. One of the principal ideas that stuck with me is that information must have a structure and must be searchable in order to be useful to a consumer. This seems like a really fundamental, simple idea. Unfortunately, at that time (and now, arguably) very few people really understood this idea, or even thought about it.

The more I thought about it, the more I realized that data, that ephemeral substance that we all capture in databases, is fundamentally without meaning and value. Data in its most basic form is merely a point or set of points on a continuum, e.g. 4, 15, 6. But when you attach descriptions to the members of your data set, you suddenly gain information: lemonade sales were 4 times higher when the temperature was 15 degrees higher than average in June. There’s a distinct information lifecycle that most people are aware of, but here it is anyway in case you’ve forgotten it:

Data is given meaning and becomes Information, which is processed to become Knowledge, which leads to Understanding.

Now, the knowledge and understanding part deals with the messy process of thinking and ideation. While thinking and ideation are fascinating topics, they involve individual biases, psychology, neurology, and other things that I don’t find as interesting. As such, I’m not going to deal with them at all; we’ll pretend they don’t exist for the purposes of this discussion.

To get back to the main point, one of the most fascinating things to me is the process used to give meaning to data in order to turn it into information. What’s also interesting is that there is seldom a single right answer, as so many of us have discovered. The right answer depends on the question asked. The important thing that we do, as keepers of data, is enable people to ask those questions and turn facts, figures, and meaningless piles of data into information for the purpose of making business decisions (which hopefully doesn’t turn that 4, 15, 6 into ‘In the 4th quarter, revenue was down 15% so we’re laying off 6 of you’).

Equally important is metadata. The more data you can collect to describe another data point, the more meaning you can give to that data point. Let’s look at a similar example: on August 16th, our little lemonade stand sold twice as much lemonade as it sold on August 23rd. Without any additional data to describe August 16th and August 23rd, we can’t do any more than report raw sales figures. But what happens if we track temperature and other weather conditions? Maybe August 16th was particularly hot. Maybe it rained all day on August 23rd. Add in more information about local events. Suddenly we have an additional descriptor for these two days, and we know that there was a town parade on August 16th that passed right by our lemonade stand. This is the type of metadata that turns meaningless data points into valuable pieces of information.
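
To make the lemonade stand example a little more concrete, here is a minimal sketch in Python of attaching descriptive metadata to raw sales figures. Everything in it is an assumption made for illustration: the cup counts (only the two-to-one ratio comes from the example), the weather descriptions, and the names `SALES`, `METADATA`, `describe`, and `explain` are all hypothetical.

```python
# Raw data points: cups sold per day. The absolute numbers are invented;
# only the two-to-one ratio comes from the lemonade stand example.
SALES = {"Aug 16": 120, "Aug 23": 60}

# Descriptive metadata keyed by the same day. All values are hypothetical.
METADATA = {
    "Aug 16": {"weather": "hot and sunny", "events": ["town parade"]},
    "Aug 23": {"weather": "rain all day", "events": []},
}


def describe(day):
    """Merge a day's raw sales figure with any metadata recorded for it."""
    record = {"day": day, "cups_sold": SALES[day]}
    record.update(METADATA.get(day, {}))
    return record


def explain(day):
    """Render the merged record as one readable line."""
    record = describe(day)
    parts = [f"{record['day']}: {record['cups_sold']} cups sold"]
    if record.get("weather"):
        parts.append(record["weather"])
    if record.get("events"):
        parts.append("events: " + ", ".join(record["events"]))
    return "; ".join(parts)


if __name__ == "__main__":
    for day in SALES:
        print(explain(day))
```

On its own, `SALES` can only tell us that one day was better than the other. Once each day is merged with its descriptors, the same number carries enough context to suggest why.
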

With technology, it’s possible to easily associate these points of data with their descriptors and build a meaningful piece of information, surrounded by descriptive metadata, that enables rapid decision making and makes it easier to search and browse related topics and ideas.

Where does that leave us? Well, if the point of information is to be processed into knowledge that enables understanding, then it’s fairly clear. Information retrieval systems need to provide as much context as possible for the underlying data points. Information storage systems need to be designed in a way that facilitates data collection and storage. Specifically, storage systems need to allow for the storage of diverse types of metadata: documents, images, raw text, audio, and video files all need to be stored to enable the transformation of raw data points into information.

At what point do we stop collecting data and start aggregating data from disparate sources? There comes a point when we simply can’t store data fast enough from all potential collection sources. At this point, we need to rely on others to help us turn our data into information. In turn, some of our data will probably help our data providers turn their data into information. Slowly but surely, our growing need to turn data into information via included metadata will enable us to access an increasingly complex and interconnected world.