Data Disruption Hits Healthcare | Computation Institute

By Rob Mitchum // May 13, 2014

The hot buzzword in the tech world right now is “disruption,” the concept that one clever idea can completely shake up a stale industry, leading to new practices and big profits. Companies and services such as Amazon, Skype, and iTunes have dramatically changed how bookstores, phone companies, and music sales work, with sometimes controversial results. But for many reasons, health care has largely resisted major tech-driven revolutions so far, its massive bulk and entrenched interests providing disruption-proof armor few other industries can boast.

But at last week’s Big Data & Health conference, co-organized by the Computation Institute and the UChicago Center for Health and the Social Sciences (CHeSS), many of the speakers signaled that data-based change was on the way for health care and research. Innovation in high-throughput genomics, electronic medical records, cancer data analysis, computer-aided diagnosis, and city data collection could soon make tomorrow’s medical experience completely different from what we encounter today. As Bryan Sivak, Chief Technology Officer of the U.S. Department of Health and Human Services (HHS), put it in his lunch talk, the sweeping reforms of the Affordable Care Act combine with technology to create an opportunity for major change.

“This is, I think, a moment in time where we’re seeing the potential for disruption is probably highest since maybe the creation of Medicaid and Medicare back in the 60’s,” Sivak (right) said. “We’re at one of these inflection moments right now where I think almost anything is possible.”

Since joining HHS in July 2012, Sivak has spearheaded an open data initiative at the huge department, which includes the National Institutes of Health, the Centers for Disease Control and Prevention, and the Centers for Medicare & Medicaid Services. Through healthdata.gov and its annual Health DataPalooza event, HHS has made more than 1,500 datasets available to researchers, app developers, journalists, and others seeking new insights and applications.

“We want to build a real community of people that use data,” Sivak said. “It always amazes me when I see one of the datasets we put out used in a way that I never would have imagined it being used, and it happens literally every day.”

Some of those new uses were discussed in the conference’s first talk, a high-speed drive through the landscape of methods now available for working with data, by Matt Gee and Tom Plagge of Data Science for Social Good (DSSG) and Molly Rossow of Northwestern University. The trio discussed how free tools such as Python, MySQL, and Tableau will be used by students in the DSSG fellowship to tackle projects to optimize ambulance response times, locate uninsured Americans for sign-up efforts, and look for causes of maternal mortality in Mexico.

The Big Data of As, Ts, Cs, and Gs

For many researchers in biology and medicine, “Big Data” currently means, simply, genomics. Since the sequencing of the first human genome in 2001, genetic sequencing has become super-exponentially faster and cheaper, said Nancy Cox, Professor of Human Genetics at UChicago and CI Senior Fellow. Since each complete human genome contains 3 billion base pairs, the increasingly routine collection of patient genetic information leads to huge challenges in data storage and analysis that will need to be resolved soon in order to pave the way for personalized medicine and new understanding about the origin of common diseases, Cox said.
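To give a rough sense of the scale behind that storage challenge, here is a back-of-envelope sketch in Python. Only the 3 billion base pair figure comes from the talk; the 2-bits-per-base encoding, the function names, and the 100,000-patient cohort are illustrative assumptions. Real sequencing output, which includes many overlapping reads and quality scores, is typically far larger than this idealized lower bound.

```python
# Back-of-envelope estimate of genome storage (illustrative assumptions only).
BASE_PAIRS_PER_GENOME = 3_000_000_000  # figure quoted in the article
BITS_PER_BASE = 2  # assumption: A, C, G, T packed into 2 bits, no quality scores or metadata


def genome_bytes(base_pairs=BASE_PAIRS_PER_GENOME, bits_per_base=BITS_PER_BASE):
    """Bytes needed to store one genome under the packing assumption above."""
    return base_pairs * bits_per_base // 8


def cohort_terabytes(n_patients):
    """Decimal terabytes needed for n_patients genomes (hypothetical cohort size)."""
    return n_patients * genome_bytes() / 1e12


print(f"One genome: {genome_bytes() / 1e6:.0f} MB")
print(f"100,000 patients: {cohort_terabytes(100_000):.0f} TB")
```

Even under this most compact encoding, a single hospital system sequencing its patients would quickly accumulate tens of terabytes before any analysis begins.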

Andrey Rzhetsky, Director of the Conte Center for Neuropsychiatric Genomics and CI Faculty and Senior Fellow, raised the stakes even further with an additional element to genomic studies of disease: the environment. His recent massive study used county-level data to find a relationship between autism and genital malformation, an indicator of prenatal toxin exposure, which points to a role for unknown environmental factors in the recent rise of autism rates.

In cancer research, large efforts are already underway to make sure that the potential of this genomic data flood is realized. Robert Grossman, Chief Research Informatics Officer for UChicago Biological Sciences and CI Faculty and Senior Fellow, talked about his new model for biomedical computing, focusing on data “commons” shared by the scientific community. By capitalizing upon cloud computing technology, researchers may form “virtual comprehensive cancer centers” that allow patient data to be stored and analyzed in a shared, secure environment.

“We have to figure out how we do this from a technical, policy, security and organizational collaborative fashion. These are the kinds of things we’re trying to work out over the next couple years,” Grossman said. “If we don’t organize right, we’re going to have a tragedy reflected in how we deal with our common resources, which are dollars, compute, and storage.”

In a more specific use of cancer data, CI Faculty and Fellow and Director of the Center for Research Informatics Samuel Volchenboum (left) talked about the international effort to bring together data on neuroblastoma. With only 800 newly diagnosed cases each year, studying and running clinical trials for neuroblastoma can be very challenging. So researchers formed the International Neuroblastoma Risk Group Database, collecting data on the disease from patients around the world. Volchenboum spearheaded an effort to make it easier for neuroblastoma researchers to use the database, query it with new study questions, and connect it to other pools of information.

Folding Data Into Medicine

If biomedical research on big data pays off, it will result in actionable information for doctors taking care of patients, said Sameer Badlani, Chief Medical Information Officer of UChicago Medicine. Badlani described one project, led by assistant professor of medicine Dana Edelson, that will use machine learning on patient data to predict a patient’s risk of heart attack while hospitalized. Ideally, the research will produce an algorithm that can “listen in on the data conversations” within the electronic medical records system, alerting hospital personnel when certain warning signs appear for a patient. The technology, Badlani said, is not unlike what online retailers use to predict what a customer might want to buy based on their past pageviews.

“Every other vertical business uses this,” Badlani said. “We hope to use it in health care in the next 2 to 3 years at the University of Chicago. These are very exciting times.”

In some research projects, data is already showing results as a clinical assistant. Maryellen Giger, Professor of Radiology and CI Senior Fellow, talked about using a different kind of big data, digital images, in the detection and diagnosis of breast tumors. Her research is now focused on finding imaging biomarkers that can be used alongside pathology and genetic information in research and personalized treatment for breast cancer — a sort of “imagomics.”

The Intertwined Health of Cities and Citizens

An overlapping area of rapid growth in data-driven research and applications is urban studies, where pursuits include public health, epidemiology, and the impact of city infrastructure on resident health. Charlie Catlett, director of the Urban Center for Computation and Data and CI Senior Fellow, discussed several projects that touch upon health, such as The Array of Things, a network of sensor boxes placed around Chicago to measure data on environmental factors and pedestrian flow. Catlett is also collaborating with the City of Chicago on a new data platform that can help officials proactively detect problems, described in a talk by Tom Schenk Jr. (right), the city’s Director of Analytics. Currently, the city is using algorithms that predict rat infestation or food inspection issues, testing whether city crews deployed according to these new systems are more effective than older procedures.

Another UrbanCCD project with a public health dimension is the research around the Chicago Lakeside Development, a new 600-acre neighborhood planned for the city’s South Side. UrbanCCD researchers are building a modeling platform for the architects designing the development, while collaborator Kate Cagney, associate professor of sociology and health studies at UChicago, leads a survey of residents in the surrounding neighborhoods, assessing health, socioeconomics, and access to amenities.

Some of the interactions between city life and health care are not apparent until the data is explored. CHeSS director David Meltzer used data from the University Health Consortium — a collective of Chicago academic medical centers — to look at the relationship between length of hospital stay and readmission. Previous studies suffered from the confound that patients who stay in the hospital longer are also likely to be sicker than short-stay patients, increasing their baseline chance for readmission. But by combining hospital data with Chicago weather data, Meltzer found that patients who stayed in the hospital longer due to snowstorms were less likely to be readmitted than long-stay patients in better weather conditions.
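The logic of that natural experiment can be illustrated with a small simulation. The sketch below is not Meltzer's analysis and uses no real hospital or weather data. Every number in it (the severity scale, the 20 percent snow rate, the two extra snow days, the assumed benefit per day) is a made-up assumption chosen only to show why a naive comparison misleads while a weather-driven comparison does not.

```python
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the synthetic data is reproducible

TRUE_EFFECT_PER_DAY = -0.02  # assumed: each extra day lowers readmission risk by 2 points


def simulate_patient():
    """Return (snow, stay_days, readmitted) for one synthetic patient."""
    severity = random.random()            # 0 = mild, 1 = very sick (hypothetical scale)
    snow = random.random() < 0.2          # snowstorm at discharge, unrelated to severity
    stay = 2 + 4 * severity + (2 if snow else 0)  # sicker patients and snowed-in patients stay longer
    risk = 0.15 + 0.5 * severity + TRUE_EFFECT_PER_DAY * stay
    readmitted = int(random.random() < risk)
    return snow, stay, readmitted


patients = [simulate_patient() for _ in range(100_000)]

# Naive comparison: long stays vs. short stays, confounded by severity.
cutoff = median(stay for _, stay, _ in patients)
long_rate = mean(r for _, stay, r in patients if stay > cutoff)
short_rate = mean(r for _, stay, r in patients if stay <= cutoff)
naive_gap = long_rate - short_rate

# Natural experiment: snow lengthens stays without changing who is sick.
snow_rate = mean(r for snow, _, r in patients if snow)
clear_rate = mean(r for snow, _, r in patients if not snow)
extra_days = (mean(stay for snow, stay, _ in patients if snow)
              - mean(stay for snow, stay, _ in patients if not snow))
effect_per_day = (snow_rate - clear_rate) / extra_days  # simple Wald-style ratio

print(f"Naive gap (long minus short stay): {naive_gap:+.3f}")
print(f"Snow gap (snow minus clear):       {snow_rate - clear_rate:+.3f}")
print(f"Estimated effect per extra day:    {effect_per_day:+.3f}")
```

In this synthetic setup the naive comparison makes longer stays look harmful, because sicker patients both stay longer and return more often. Comparing snowed-in patients with everyone else removes that confound, since the weather does not depend on how sick a patient is, and the ratio of the two gaps lands close to the benefit per day that was built into the simulation.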

The result, which gets at the value of longer hospital stays for patients, is part of the idea of a continuously learning health care system, Meltzer said, where data, research and medical care form a feedback loop. The result would be less a disruption than a data-fueled realignment that shortens the long, slow path from clinic to bench and back to clinic.

“We need not just data but thoughtful ways to get information and knowledge out,” Meltzer said. “As we provide care to patients, that produces data. That data in turn allows science to be done, which in turn then produces evidence that can improve care…it seems so obvious, but it’s by and large not how we have practiced the whole time medical research has existed. And now it’s becoming a reality.”

[Photos by Andrew Nelles]