Data Science for Health Care Applications

Funded by the National Institute on Aging and Texas Medicaid

Data mining and machine learning on large, high-dimensional datasets are critical for understanding underlying relationships and ultimately improving healthcare outcomes. My current research focuses on the following:

  • Real-time online activity and mobility monitoring (ROAMM): Understanding the impact of intervening health events (e.g., episodic falls, injuries, illnesses, and hospitalizations) is an important emerging scientific area in geriatrics and gerontology. The key challenge we are addressing is the collection of pre-event patient data, which consists of both monitored data and patient-reported outcomes. This information is critical to developing effective interventions. With support from the National Institute on Aging, we have developed a sustainable research infrastructure built on a novel smartwatch app called ROAMM (Real-time Online Assessment and Mobility Monitor). The app offers continuous, long-term connectivity, bidirectional interactivity, and programmable measurements, all of which are necessary for capturing data surrounding an intervening health event. We have been applying this framework in a variety of health care studies to capture the activity and mobility patterns of free-living adults along with their reported outcomes, including pain and anxiety levels.
  • Identifying superutilizers: We have developed machine learning methods for understanding and addressing the high-utilizer problem in a large public insurance program. Improving cost efficiency requires identifying the root causes of high utilization, which in turn calls for transparent, data-driven methods whose decision paths can be explained in understandable terms. To date, we have conducted descriptive analyses, applied statistical approaches, and developed machine learning models to demonstrate that high utilizers can be identified while taking into account patient risk factors and geographical variations. We also showed that high utilization persists over time, and we then sought to predict who will become a high utilizer. A monograph, Data Driven Approaches for Healthcare: Machine Learning for Identifying High Utilizers, was recently published.