Still Open Ended

May 28, 2013

Open data is “in”.[1] Only last week President Obama issued an executive order to make all US government data “open and machine readable”. Several national and city governments across the world have already been doing this for some years. In 2009, the US federal government started the trend when it launched its open data platform, following close on the UK government’s initiative. These open data platforms and initiatives are premised on the supposed benefits of open data: making institutions transparent and cities “smarter”, improving service provision, revolutionising economies and improving businesses, among others. However, my experience at Transparent Chennai suggests that data is often a messy thing, and while opening data may have some benefits, we need to address problems with data in general before pushing for open data.

One of the principal problems with government data is that it is often not situated in the context in which it is created and used. For instance, one issue with existing data in Chennai is that it often excludes or under-represents the disadvantaged. One plausible explanation for this is that government data is created when there is an interaction between the state and the individual/society, and many communities and people – unintentionally or by design – have limited access to the state. Such an interaction creates a situation where data “under-represents those less likely to be part of data producing interactions”.[2]

For instance, in the case of slums in Chennai, it has been nearly three decades since the government of Tamil Nadu recognised new slums. Because many slums are not officially recognised or notified, they do not have formal access to basic municipal services, although communities in the slums have acquired some of these by informal means. Communities in non-notified slums interact with the state in a very limited and mostly informal way. Importantly, by not recognising non-notified slums, government agencies like the Tamil Nadu Slum Clearance Board absolve themselves of the responsibility to develop them. The Census 2001, the country’s largest official data drive, similarly under-counted slum populations and emphasised the absence of large sections of the population in official records.[3] The demand to make such data open, without a corresponding emphasis on the quality of the data itself, can reinforce existing problems, many of which disproportionately affect the vulnerable.

A related problem is that existing data collection and storage processes are not immune to bias. Numbers do not speak for themselves; they often reproduce the social and political prejudices with which they were created. A study in the city of Hyderabad revealed that an assortment of crimes against women was recorded by the Cyberabad police under the category “outraging the modesty of a woman”.[4] This category reproduces the patriarchal stereotype that women are supposed to be “pure” and “modest” and would most likely influence the interaction between police and victim. For instance, if a woman reporting a crime appears to be immodest, how do the police deal with her? Also, how do they record the crime committed against her, and can they remain impartial despite the prejudices they harbour?

The push for open data is very real, but we need to take a step back to acknowledge and analyse the nature of our existing data and how it is created, collected, organised and stored. Clearly, several value judgements, biases and design considerations may skew datasets. While it has been argued that opening this data – even if it is of dubious quality – will allow for comment and analysis, it is important to recognise that such a policy may engender several attendant problems. One concern is that not everyone can use open data in the same way. Large and unstructured data sets can be combined and analysed using software, but this capability may be accessible only to large organisations and enterprises. Only people with very specialised knowhow will be able to use such data, and they may even do so in a self-serving way. In other words, this differential suggests that open data is more open for some.

Another concern with open data is the privacy of the individual. While there are techniques to de-link data from individuals and make data anonymous, many scholars believe that there are significant risks of re-identification. For instance, many activists in India have objected to the UIDAI’s Aadhaar project on the grounds that it could violate people’s right to privacy. The project issues a unique identification number to all residents of India.[5] This number is linked to demographic and biometric information that will be used to target certain sections of the population for government services and schemes. However, biometric information is sometimes unreliable, and there are serious concerns about sharing data with third parties as well as significant risks of hacking and identity theft. While the proponents of Aadhaar claim that it may be the only way to effectively target the poor for schemes and services, there are others who caution that “the demand to trade-off one freedom for another, say the invasive loss of privacy for ‘development’, is an untenable demand”.[6]

While open data initiatives are multiplying, the concerns surrounding both data and open data need to be interrogated and addressed simultaneously. Open data may have the power to radically change governance but its success hinges on everything democracy hinges on as well: “functional institutions, the rule of law, political agency, and press freedom”.[7]

Written by Vinaya Padmanabhan, researcher, Transparent Chennai