Wikipedia Knows When Disease Outbreaks Are Coming

The online encyclopedia is a treasure trove of data. (Photo: Gil C/Shutterstock).

Wikipedia (savior of the homework-bound, settler of bets, provider of everything you ever needed to know about anything) may have a new hat to wear: epidemic spotter. Scientists say that by tracking page views in the online encyclopedia, they were able to detect emerging diseases as much as four weeks before health officials declared an outbreak.

Infectious disease is a leading threat to public health, and getting a handle on the risk and progress of an outbreak as soon as possible is key to mitigating its impact. Disease surveillance is traditionally based on visits to health providers and laboratory tests, according to the scientists from the Los Alamos National Laboratory, but these biologically focused monitoring techniques are costly and slow. Because people often search online before seeking medical help, newer techniques that mine social media and search queries have emerged to work around these limitations.

For this paper, published in PLOS Computational Biology, the Los Alamos team pored over access logs from Wikipedia, tracking views of disease-related pages in the online encyclopedia from 2010 to 2013. Using each article's language as a proxy for location, they then compared their data with disease outbreak information provided by national health surveillance teams.

Of the 14 disease-location contexts they looked at, eight cases displayed a "usefully" close match between the team’s estimates and the official data. Their statistical model allowed them to predict emerging influenza outbreaks in the United States, Poland, Japan and Thailand, dengue fever increases in Brazil and Thailand, and an uptick in tuberculosis cases in Thailand. And even more impressive, they were able to forecast tuberculosis and influenza outbreaks a full 28 days in advance.
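
For readers curious what matching page views against official case counts might look like in practice, here is a minimal sketch in Python. It is not the Los Alamos team's model, which this article does not describe in detail. It assumes a plain lagged linear regression, and every name in it (pearson, lagged_pairs, best_lag, fit_forecaster) and every number in the sample data is hypothetical and made up purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)

def lagged_pairs(views, cases, lag):
    """Pair each week's page views with the case count `lag` weeks later."""
    if len(views) != len(cases) or not 0 <= lag < len(views) - 1:
        raise ValueError("series must match in length and lag must be small")
    return views[:len(views) - lag], cases[lag:]

def best_lag(views, cases, max_lag):
    """Return the lag (in weeks) at which views correlate best with cases."""
    scores = {lag: pearson(*lagged_pairs(views, cases, lag))
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get)

def fit_forecaster(views, cases, lag):
    """Least-squares line mapping this week's views to cases `lag` weeks on."""
    x, y = lagged_pairs(views, cases, lag)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return lambda v: slope * v + intercept

# Synthetic weekly data (NOT real figures): cases trail views by four weeks.
views = [10, 12, 15, 30, 60, 90, 70, 40, 20, 12,
         10, 11, 14, 28, 55, 85, 65, 35, 18, 11]
cases = [25] * 4 + [2 * v + 5 for v in views[:-4]]

lag = best_lag(views, cases, max_lag=6)
forecast = fit_forecaster(views, cases, lag)
print("best lag in weeks:", lag)
print("forecast cases for 50 views:", forecast(50))
```

In this toy setup the lag is counted in weekly steps, so a lag of four corresponds to the 28-day lead time mentioned above. Real surveillance data would be far noisier, and a working system would need to be validated against held-out outbreak records.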

Is the forecasting model perfect? Not exactly. In the case of Ebola, for example, the study notes that Internet traces were hampered by poor connectivity in the countries affected by the disease, while global interest in the disease eventually rendered the forecasting model useless. Nonetheless, the results of the study are compelling and could lead to innovative ways of surveying and responding to outbreaks of infectious disease.

"A global disease-forecasting system will change the way we respond to epidemics,” says lead researcher Dr. Sara Del Valle. "In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast.”

The new study, the authors conclude, establishes Wikipedia as a tool for collecting important data for disease surveillance, and puts forth a reliable, scientifically sound and effective disease surveillance system that addresses the gaps inherent in the traditional methods of tracking emerging illnesses.

"The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code, “ says Del Valle. "This paper shows we can achieve that goal."