The quest for data

Submitted by sverma on Thu, 01/02/2014 - 13:07

Note: This post is about OLPC XO laptops and the Sugar learning platform. It does not apply to the Android-based XO Tablet.

As the OLPC project grows around the world, the quest for data increases. This is really more like the quest for information: data are just a collection of observations. It would be remiss of me as an Information Systems professor not to pass that information on to you (no pun intended). Unless we place data into a frame of reference, they are quite useless. For instance, what does 2, 3, 5 mean? You may say the first three prime numbers, but I say it's the number of pineapples, bananas and oranges I bought at the local market. Frame of reference is key! Processing data into information requires that we know what we have observed (data) and what we are looking for (information). Are we looking for aggregates (such as frequency of use) or correlations (such as whether usage of TuxMath is higher after school hours)?
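To make the difference between an aggregate and a correlation-style question concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the observations are made up, and the 15:00 end of the school day is a placeholder, not a value taken from any deployment.

```python
from collections import Counter
from datetime import datetime

# Hypothetical observations: (activity name, launch time). Not real data.
observations = [
    ("TuxMath", datetime(2014, 1, 6, 10, 15)),
    ("TuxMath", datetime(2014, 1, 6, 16, 40)),
    ("Write", datetime(2014, 1, 6, 11, 5)),
    ("TuxMath", datetime(2014, 1, 7, 17, 20)),
    ("Write", datetime(2014, 1, 7, 9, 30)),
]

SCHOOL_ENDS_HOUR = 15  # assumption: the school day ends at 15:00


def frequency_of_use(obs):
    """Aggregate: how many times each activity was launched."""
    return Counter(name for name, _ in obs)


def after_school_share(obs, activity):
    """Fraction of an activity's launches that fall after school hours."""
    times = [t for name, t in obs if name == activity]
    if not times:
        return None
    after = sum(1 for t in times if t.hour >= SCHOOL_ENDS_HOUR)
    return after / len(times)
```

The same observations answer both questions. What changes is the frame of reference we bring to them.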

While we started looking for such data a while ago, it looks like the level of interest has grown. Here are the different efforts I know of:

1) Paraguay Educa

I had exchanged a few emails with Bernie Innocenti, Raúl Gutiérrez Segalés and Morgan Ames a while back, when we started to look at metadata collection and analysis (note that the two are different). Most of their work lives in a git repository online, but we were unable to make it work on our setup. It appears to have two components: a Python-based collection system, and a Ruby-on-Rails based visualization system.

2) OLPC Jamaica

This was largely built by Leotis Buchanan, with some input from me. The script traverses the /library/users location on the XS school server and gathers the metadata from Sugar Journal backups. The collected metadata is then written out as a comma-separated values (CSV) file. Later, Leotis added a second export method that writes the data as JSON. This allows us to extend the collection into an "Analytics" system. The Analytics system is a selective visualization setup using CouchDB. The JSON is pushed into a CouchDB database on the XS. Selective data aggregates are then produced via views in CouchDB and displayed using JavaScript. This step happens on the local XS (views for teacher and principal). This CouchDB database can also be synchronized with a central CouchDB, where the visualization may be different, perhaps for a Ministry of Education.
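The collection and export steps described above can be sketched roughly as follows. This is not Leotis's script. It is a minimal illustration that assumes each Journal entry's metadata is stored as a JSON file whose name ends in `.metadata` somewhere under the backup root. The function names and the `_source` key are hypothetical, and the field list passed to the CSV export is up to the caller.

```python
import csv
import json
import os


def collect_metadata(root):
    """Walk root and return one dict per readable *.metadata JSON file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".metadata"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    entry = json.load(fh)
            except (OSError, ValueError):
                continue  # skip unreadable or malformed entries
            if isinstance(entry, dict):
                # Remember where the entry came from (hypothetical key).
                entry["_source"] = os.path.relpath(path, root)
                records.append(entry)
    return records


def export_csv(records, path, fields):
    """Write the chosen fields as CSV; missing fields become empty cells."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)


def export_json(records, path):
    """Write all records as a single JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
```

A call such as `export_json(collect_metadata("/library/users"), "journal.json")` would produce a file that could then be loaded into CouchDB. The loading step itself is not shown here.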

data fields


3) OLPC Australia

Newer builds of Sugar from OLPC Australia show a harvest system in the control panel. Martin Abente Lahaye tells me that most of this is based on work at and At a cursory glance, the system appears to gather the metadata, and push it to a location on the Internet. I need to explore this a bit more. This system is different from the others in that the metadata is pushed to a central server, skipping the intermediate school server. It "harvests" the metadata, but it does not look like the system visualizes or analyzes it.

4) OLE Nepal

Martin Dluhos, who attended the OLPC San Francisco Community Summit, has moved on to Nepal. Martin, Leotis and I had some conversations, and it looks like Martin is extending the work that Leotis and Raúl did into something relevant for OLE Nepal. His git repo is at

5) Bhagmalpur

All the methods discussed above pull data from the Sugar datastore. This last approach is different in that it sits a level below Sugar itself and can capture events related to icons clicked, collaboration, local vs networked use, etc. In fact, it probably gathers more data than necessary. We have this running in Bhagmalpur, India, thanks to Anish Mangal and Activity Central. It is based on the Sugar Stats project. The metadata sits on the XO locally, and gets pushed to a server (in our case, the XSCE) every so often. The data is stored in RRD (round-robin database) format. We have a script that does RRD -> MySQL, from which we can pull data as needed. This works fairly well, but we have had some minor operational glitches (we are switching over to PostgreSQL instead of MySQL, so we don't have multiple databases running on the server). The analysis part remains an exercise that I conduct in LibreOffice and R. We also don't have visualization here, but given the architecture of the Analytics system from OLPC Jamaica, we should be able to do an RRD -> JSON conversion and populate the CouchDB system relatively easily.
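As a rough idea of what the RRD -> JSON step might look like, the sketch below parses text in the layout printed by `rrdtool fetch` (a header line of data source names, then `timestamp: value value ...` rows) and turns each row into a JSON document. It is only a sketch under stated assumptions: the data source names `uptime` and `active` and the `serial` field are hypothetical, the sample text is made up, and running `rrdtool` and pushing the documents into CouchDB are not shown.

```python
import json
import math

# Made-up sample in the layout of `rrdtool fetch` output.
SAMPLE = """\
                     uptime            active

 1388664000: 1.2000000000e+02 3.0000000000e+01
 1388664300: -nan -nan
"""


def parse_rrd_fetch(text):
    """Turn fetch-style text into a list of dicts, one per timestamp."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    names = lines[0].split()  # data source names from the header line
    rows = []
    for ln in lines[1:]:
        stamp, _, rest = ln.partition(":")
        values = [float(v) for v in rest.split()]
        row = {"timestamp": int(stamp)}
        for name, value in zip(names, values):
            # JSON has no NaN, so unknown samples become null.
            row[name] = None if math.isnan(value) else value
        rows.append(row)
    return rows


def to_json_docs(rows, serial):
    """Serialize each row as one JSON document tagged with a laptop serial."""
    return [json.dumps(dict(row, serial=serial), sort_keys=True) for row in rows]
```

Mapping unknown samples to `null` keeps the documents valid JSON, so gaps in the RRD data stay visible instead of being silently dropped.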

Given the similar needs across all these projects, and their somewhat different approaches to architecture, I hope that we can get some convergence. I would like to see more work and polish on the Analytics system from Jamaica. This will help give decision makers some feedback. I am sure most audiences would rather see pie charts and bar graphs than p-values! (By the way, EDUCAUSE has a good collection of information on learning analytics.) In terms of analysis, it's difficult to do boilerplate work, because the analysis itself is very specific. While basic stats are easy to do using LibreOffice (or Excel for you lesser mortals) or SOFA or R, it goes back to the initial question: we should know what we have observed and what we are looking for.

Update: I got an email about two other projects in a similar vein. I have not used either, but I'm posting these for information: