Six federal departments and agencies today announced $200 million for Big Data. The initiative involves new commitments aiming to give scientists and policy makers better tools and techniques for turning data into knowledge and power. Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, claims:
"In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security."
What is Big Data?
Big Data describes data sets that, due to quantities of data or complexity of data, cannot be managed well with traditional tools such as databases and bar graphs. For example, if we could sequence the DNA of 10% of all the humans on earth, we might learn a lot about disease and evolution -- but how could scientists look at so much data efficiently? Already we have examples where data outstrips our analytical ability -- astronomical data, or super-collider data, for example.
As computers approach exascale capabilities, and distributed network computing becomes more widespread, the amount of data which might be hiding great discoveries mounts.
Will the Government Know That My Husband is Pregnant?
Naturally, the thought of Big Government controlling Big Data gives pause for thought. Already, people are asking questions like: if I google "suicide" so I can try to help a friend who is down, will I be denied insurance some time in the future by a company mining my online profile?
Recent exposés on how companies use data mining to target customers suggest that our deepest secrets may be an open book to anyone with enough information about our habits. But efforts to manipulate large amounts of data often go astray. The case of the "pregnant husband" -- in which a man shopping for his wife's needs might be identified as pregnant -- may be cited as an example of how data can be misinterpreted, and provides a comical analogy demonstrating the confounding factors for which we must account if data is to become knowledge, much less wisdom.
On a darker note, analysts reviewing the root causes of the incorrect belief that Iraq had weapons of mass destruction, a key underpinning of military intervention strategies, found that bad techniques for analyzing information were magnifying rumors. Worse, they gave credibility to the false conclusions because they had been reached using "scientific" analysis.
But there have always been lies, damned lies, and statistics. With Big Data we will certainly have to watch out for lies, damned lies, and Big Statistics. But these fears should not stop us from celebrating today's major stride towards making data work for us. And to be a bit optimistic, certainly at some point there will be enough data to convince climate denialists that the time for action has come (and perhaps gone, at which point we will need even more Big Data to predict what the heck we can do about it).
Data in the Service of Humanity
Today's announcement approaches the topic of Big Data from many angles. How can we visualize large data sets? Can we work with encrypted data, without decrypting it before the crunching starts, to improve data security? How can we store and access Big Data at speeds sufficient to solve problems and at energy costs we can sustain?
The range of departments involved in the Big Data Across the Federal Government initiative offers a clue to how integral Big Data will be to our future. Obviously, the Department of Defense (DoD) and Department of Homeland Security (DHS) want to be on the cutting edge of data management. The Department of Energy (DoE) and the National Aeronautics and Space Administration (NASA) have often led scientific initiatives. But the Department of Veterans Affairs, Health and Human Services, the Food and Drug Administration, and the National Archives and Records Administration each have a stake in how data will be managed tomorrow.
The themes of energy and climate change run strongly between the lines, as do promises that we can get the jump on fast-evolving superbugs. The prospects for leveraging biomimicry as we learn more about the biochemistry of life lurk in the shadows of Big Data on how proteins fold. A new age of discovery may be on our doorstep. We can't wait to see the Infographics that come out of this.