menu

Submission

submission voting
voting is closed.
introduction
title
ClinEpiDB: Global Health Data at Your Fingertips
short description
ClinEpiDB allows global health researchers to share data, use data shared by others, and explore and visualize data in the browser.
Submission Details
Please complete these prompts for your round one submission.
Submission Category
Data sharing
Abstract / Overview

Data sharing has transformed discovery research in the genomics arena, but has not yet become the norm for epidemiological datasets. ClinEpiDB is an online resource that attempts to bridge this gap by facilitating appropriate sharing of data from clinical & epidemiological studies/trials. We integrate data using an ontology-driven approach, providing metadata & supporting documentation for context. An intuitive point-and-click website supports data browsing, subsetting & visualization, enabling diverse users to explore potential associations, or download data for advanced analyses. By facilitating access to and interrogation of high-quality, large-scale datasets, ClinEpiDB aims to spur collaboration & discovery, enhancing global health.

 

Team

The ClinEpiDB team leverages infrastructure & staff developed in the context of the Eukaryotic Host, Pathogen & Vector Bioinformatics Resource Center (VEuPathDB.org), & includes 70+ data scientists on 4 continents. All staff working on these resources– regardless of their physical location or source of funding– are managed through collaborative teams, with performance assessed on the basis of the work we make possible by others, rather than traditional academic metrics like publications. We collaborate in real time across multiple time zones, using Slack as our office and regular Zoom calls & shared documentation to keep the team working in harmony.

ClinEpiDB encompasses ~15 FTEs, driven by epidemiologists, but also including data specialists, ontologists & outreach staff, in addition to core infrastructure support.  Outreach specialists have domain expertise in global health & epidemiology and engage with the scientific community to identify & prioritize key datasets for integration, and educate relevant communities on platform use. Data specialists process & integrate the data, collaborating with ontologists on mapping to harmonized ontologies. Principal investigators maintain funding & provide long-term project vision.


 

Potential Impact

Over 20+ years, the VEuPathDB team has become adept at integrating increasingly diverse Omic-scale datasets, and is widely viewed as an honest broker in facilitating data sharing. ClinEpiDB was developed in response to a 2015 request from NIH to explore whether the relational schema that has been so successful at integrating genome informatics datasets, enabling researchers to ask their own questions, might also accommodate epidemiological datasets from the Int’l Centers of Excellence in Malaria Research. A pilot project focusing on a longitudinal cohort study in Uganda with extensive clinical, socioeconomic & entomological data was released in 2018, and has since been supplemented by numerous other studies.  By facilitating the structuring, integration, interrogation and sharing of epidemiological data, the ClinEpiDB platform has proved appealing to a variety of data providers and funders, allowing us to establish a dedicated ontology team, expand expertise in other areas of infectious disease, maternal, newborn & child health, etc.improve tools for data analysis & visualization, and greatly reduce the per-study effort and cost of data loading.

Our data sharing policy is straightforward: while we are happy to facilitate fulfillment of the data release requirements imposed by funders, publishers, etc.study teams always retain ownership of their data in ClinEpiDB. We work as an extension of the study team: helping to define variables, harmonize ontologies, clean data, organize metadata, and provide tools for analysis. Interestingly, this has often proved to be the fastest route to public data release: by providing tools that help the study team to understand their own data, we enhance enthusiasm for sharing, triggering favorable attention from others and expediting scientific discovery, impact & policy implementation. A tiered system enables study teams to appropriately manage data access requests; nothing is ever released until both the data providers and ClinEpiDB agree that it is ready to go live. Providers control the level of data access, deciding whether permission is required to view and/or download data, and how data requests are reviewed. 

While many data repositories support clinical and epidemiological data deposition, we believe that ClinEpiDB is unique in both making data & metadata available and providing transparent, user-friendly tools and interfaces that lower barriers for data exploration & analysis directly within the browser.

 

Replicability

ClinEpiDB promotes FAIR data principles to make data Findable, Accessible, Interoperable and Reusable. 

  • We use unique and stable study identifiers to make data Findable, and include site search functionality that allows users to identify datasets and variables of interest. 
  • We make data Accessible by providing functionality for variable browsing, access to codebooks, and a simple process for downloading public data or submitting data access requests. Online analysis tools enable exploratory data analysis directly within the browser. 
  • ClinEpiDB makes data Interoperable through extensive use of ontology-based approaches for semantic harmonization: all identical variables, even across different datasets, are mapped to the same label. Data downloads are provided in machine-readable format. 
  • Reusability is fostered by study pages for each dataset that include a summary of objectives and methods, and links to publications and study documentation such as protocols and codebooks. The ease of data analysis within the platform, and ability to share online explorations & analyses with colleagues further supports data reuse. 

The ClinEpiDB approach to data sharing can readily be replicated in similar database instances constructed elsewhere, as all code underlying the ClinEpiDB platform (and the broader VEuPathDB project) is publicly available in GitHub.


 

Potential for Community Engagement and Outreach

It is a truism that scientific advances build upon prior research, which means that in this data-rich era, it is imperative that high quality datasets be Findable, Accessible, Interoperable, and Reusable if they are to fulfill their potential.  Data sharing is an integral part of the entire research process, and increasingly required as funders seek to enhance transparency, preserve research outputs, and get the most out of their investment, while publishers strive to ensure reproducibility and impact. But data sharing is more than an obligation: it directly benefits data producers by fostering collaborations, increasing visibility and making their work more fruitful. 

Over the past generation, FAIR access to genomic-scale datasets has revolutionized all aspects of biomedical research, but has been a challenge for epidemiological datasets. Our experience demonstrates that working closely with data providers during the process of data sharing results in data that is cleaner and well-structured, with accompanying metadata that is clear and complete. 

 

Supporting Information (Optional)
Include links to relevant and publicly accessible website page(s), up to three relevant publications, and/or up to five relevant resources.

comments (public)