This article was originally published by DIA in Therapeutic Innovation & Regulatory Science.
Authors: Beth Harper BSOT, MBA, Michael Wilkinson MPH, Raj Indupuri MBA, Sheila Rocchio MBA & Ken Getz MBA.
Background and Introduction
Volume, velocity, and variety, often referred to as the three V's of big data, are well-known concepts in the world of consumer data management. Social media companies handle millions of Facebook posts, Instagram images, and tweets. A wide variety of apps and connected devices generate millions more data points. Yet we rarely give any thought to how these organizations ingest, process, file, and retrieve the enormous amounts of data that are generated at ever faster rates each day.
While the volume of clinical trial data may not rival the scale that Facebook manages, it has nonetheless grown 183% over the last decade. Across nine different types of e-clinical trial data, the majority of companies surveyed in a 2017 study anticipated that their use of these data sources would double or triple within a 3-year period. In fact, each of the nine data sources was predicted to be used by more than 50% of respondents within 3 years. That study also found that average cycle times to database lock were longer and more variable than those observed 10 years earlier.
Additionally, a recent white paper published by the Society for Clinical Data Management expanded the three V's to five V's, adding veracity and value to the mix. The paper argued that major shifts are needed to rethink clinical data management approaches and to develop fit-for-purpose strategies. It posed a thought-provoking question: where and how, both logically and physically, should increasingly variable clinical data sources be orchestrated to shorten the gap from data acquisition to data consumption?

Across industries, defining a formal data strategy has become a strategic imperative that governs how an organization treats its most important asset: its data. Life sciences organizations have long taken conservative approaches to data strategy, focusing on ways to maximize control and minimize risk. As data sources and assets grow in life sciences and research, new opportunities are emerging across the digital health continuum. These opportunities may shift the balance of data strategy away from a focus on control and toward one that is more flexible and better able to create new customer and end-user experiences.
The purpose of the Tufts eClinical Solutions Data Strategies and Transformation study was to further quantify and understand the magnitude and impact of expanded data volume, sources, and diversity on clinical trials. Specifically, the study was designed to show how life sciences organizations are managing the growing volume and variety of clinical research data to support drug development activities and to prepare for new capabilities such as artificial intelligence. The study also sought to identify and share best practices with regard to:
- The ways external/non-CRF data are being incorporated and leveraged in clinical trials
- How external data sources are influencing data management best practices
- The data platforms, analytics tools, capabilities, and competencies that companies are putting into place to manage the growing number of external/non-CRF data sources
Contending with a continuously expanding volume and variety of clinical data poses challenges and opportunities for the industry and clinical data management organizations.
Tufts CSDD conducted an online survey aimed at further quantifying and understanding the magnitude and impact that expanded data volume, sources and diversity are having on clinical trials. The survey was distributed between October and December 2019. Responses from a total of 149 individuals were included in the final analysis.
The survey found that companies use or pilot between one and six different data sources, with the majority of respondents using or piloting 3–4 different sources of data in their clinical trials. The results showed that the average time to database lock has increased by 5 days compared with the 2017 study, possibly as a result of managing an even larger number of data sources. Finally, respondents reported three key mitigation strategies for tackling expanding data volume, sources, and diversity: creating a formalized data strategy; investing in new analytics tools and more sophisticated data technology infrastructure; and developing new data science disciplines.
Without further investment in infrastructure and the development of additional mitigation techniques, database lock cycle times are likely to continue to increase as more and more of the data supporting a clinical trial come from nontraditional, non-CRF sources. Further research is needed into the organizations that are handling these challenges effectively.