Submission

introduction
title
Creating Novel Views into PubMed Data
short description
Can we reuse PubMed data to help researchers identify gaps for future study? We want to find ways to intuitively explore published research.
Submission Details
Submission Category
Data reuse
Abstract / Overview

PubMed® contains a wealth of information about published research; however, current interfaces make intuitive exploration and discovery difficult. Health science researchers, practitioners, and academics use PubMed to identify existing published studies and papers. Our approach leverages human-centered design principles, data science, and data visualization to reuse PubMed data in novel ways, enabling exploration and discovery of existing research. We also hope to help researchers identify gaps in fields where future research would advance individual areas of biomedical study. Future engagement with academics and researchers will inform our iterative and incremental approach as we evolve and mature our reuse solutions.

Team

LCG, Inc. sponsors the LCG CAPEX team. As an industry partner to NIH, LCG provides technology support at multiple Institutes. Our work began with the onboarding of our summer intern. We collaborate using Teams and SharePoint sites and publicly share code, plans, and direction in a GitHub repository. As an all-female, multiracial team bringing expertise from a variety of disciplines and career stages, we reflect multiple dimensions of diversity.

Carolyn M. Hennings, MSDA, PMP, ITIL 4 Strategic Leader, leads the team’s vision and direction. Bio: Solutions Architect with over 30 years of industry experience focused on strategic use of data to advance and enable business and scientific outcomes.

Vrinda Bhatu, University of Maryland Information Systems Master’s student, Data Analyst Intern: Responsible for data management, feature extraction, exploratory data analysis, building machine learning models, and data visualization.

Kaira Johnson provides our human-centered design. Bio: User Interface/User Experience Designer bringing 10 years of experience with front-end development, design, and marketing to create modern, innovative, intuitive applications.

Amy Talon assists with communication, collaboration, and logistics.

Potential Impact

The LCG CAPEX team’s journey to create novel views into PubMed data began in June 2022. In just three weeks, we’ve generated tangible and promising results for future iterative, incremental, discovery-driven development.

On the human-centered design side, we analyzed the existing PubMed search interface and imagined possible alternatives for interacting with the data to enable exploration. We’ve developed a few website page mockups, available on our GitHub repository, for future review and collaborative revision with bioinformatics experts, academics, and researchers representative of our target audience.

On the data side, we loaded a small subset of PubMed data into a free version of a MongoDB Atlas cloud Database-as-a-Service cluster and began profiling, understanding, and visualizing the data within the MongoDB ecosystem. We used Python code embedded in Jupyter Notebooks, available in our GitHub repository. 
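
As a rough illustration of the profiling step described above, the sketch below parses citation records from PubMed-style XML into plain dictionaries and counts a few fields. It is not the team's notebook code. The sample records are made up, the function names `parse_citations` and `profile` are hypothetical, and only a handful of elements from the PubMed XML layout (`PMID`, `ArticleTitle`, `Journal/Title`, `PubDate/Year`, `DescriptorName`) are read. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample shaped like a PubMed citation export; these are NOT real records.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <Journal>
          <Title>Example Journal of Medicine</Title>
          <JournalIssue><PubDate><Year>2020</Year></PubDate></JournalIssue>
        </Journal>
        <ArticleTitle>A placeholder title</ArticleTitle>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Neoplasms</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000002</PMID>
      <Article>
        <Journal>
          <Title>Example Journal of Medicine</Title>
          <JournalIssue><PubDate><MedlineDate>2021 Jan-Feb</MedlineDate></PubDate></JournalIssue>
        </Journal>
        <ArticleTitle>Another placeholder title</ArticleTitle>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_citations(xml_text):
    """Turn each PubmedArticle element into a flat dictionary (one document per citation)."""
    root = ET.fromstring(xml_text)
    docs = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        docs.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle"),
            "journal": citation.findtext("Article/Journal/Title"),
            # Year is absent when the record carries a free-text MedlineDate instead.
            "year": citation.findtext("Article/Journal/JournalIssue/PubDate/Year"),
            "mesh_terms": [d.text for d in citation.iter("DescriptorName")],
        })
    return docs


def profile(docs):
    """Summarize record count, publication years, MeSH term frequency, and missing years."""
    return {
        "records": len(docs),
        "by_year": Counter(d["year"] for d in docs if d["year"]),
        "mesh_counts": Counter(t for d in docs for t in d["mesh_terms"]),
        "missing_year": sum(1 for d in docs if d["year"] is None),
    }


if __name__ == "__main__":
    summary = profile(parse_citations(SAMPLE_XML))
    print(summary)
```

Dictionaries of this shape map naturally onto a document database; with PyMongo, for instance, a list like this could be passed to a collection's `insert_many`. The flat field names here are an assumption chosen for readability, not the schema the team used.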

We believe the combination of human-centered design with data visualization techniques will produce interesting and engaging methods for researchers to interact with and explore PubMed data.

As we progress, additional reuse principles will become relevant. Our initial development considered the following FAIR principles:

  • Findable: Our exploration of the PubMed data showed that the National Library of Medicine (NLM) publishes data element (field) descriptions, document type definitions (DTDs), and descriptions of the XML elements and their attributes. Our project must cite these references and surface the metadata definitions through our visualizations so that our audience can explore and understand the data.
  • Accessible: NLM makes PubMed data in XML format available via FTP. We obtained our initial small subset of PubMed data this way, and our presentations of the data must point back to the original sources.
  • Interoperable: As we import the PubMed data into other databases and make it available in alternative ways, we leverage the interoperability characteristics of the PubMed data. We must retain and maintain the source metadata information and supplement it as we find new ways of visualizing and exploring the publication citations.
  • Reusable: The LCG CAPEX team expresses our gratitude for NLM’s open terms and conditions making the PubMed data available for reuse with proper attribution. We envision releasing our work under one of the Creative Commons license options.
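
One way to keep reused records pointing back to their origin, as the Accessible and Interoperable principles above call for, is to attach a small provenance sub-document to each record before it is stored or visualized. The sketch below is an assumption about how that could look, not the team's implementation. The function name `with_provenance` and the `_source` field names are hypothetical, and the record URL follows the pattern used by the public PubMed website.

```python
from datetime import date


def with_provenance(doc, source_file, retrieved):
    """Return a copy of a parsed citation with a '_source' sub-document attached.

    The original fields are left untouched so the source metadata is preserved.
    """
    enriched = dict(doc)
    enriched["_source"] = {
        "provider": "National Library of Medicine (PubMed)",
        "file": source_file,                  # name of the XML file the record came from
        "retrieved": retrieved.isoformat(),   # date the file was downloaded
        "record_url": f"https://pubmed.ncbi.nlm.nih.gov/{doc['pmid']}/",
    }
    return enriched


if __name__ == "__main__":
    # Placeholder record and file name, for illustration only.
    record = {"pmid": "00000001", "title": "A placeholder title"}
    print(with_provenance(record, "sample.xml", date(2022, 6, 30)))
```

Keeping provenance in its own sub-document separates what NLM published from what a reuse project adds, which makes attribution straightforward when the data is shown in a new view.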

Replicability

The LCG CAPEX team used common tools for data extraction, analysis, and manipulation, such as Python, Jupyter Notebooks, pandas, and scikit-learn. We publish and document all code on our GitHub repository.

Potential for Community Engagement and Outreach

LCG launched our CAPability EXcelerator - CAPEX (keɪp-eks), a research and development platform with one mission: LCGers collaborating on transformative Digital Capabilities for our Public Sector Partners. CAPEX includes advisors, partners, and academics who set the pace for and influence transformation via rapid prototyping and experimentation. CAPEX contributors embrace a continuous learning culture, creating new opportunities and solving problems. The LCG CAPEX vision is to improve the human journey by Reimagining Possibilities and Co-creating Value!

The LCG CAPEX team envisions innovative, reuse-based ways to explore PubMed data that could have a catalytic impact on the bioinformatics field and assist researchers across all biomedical research fields. The ability to intuitively explore and discover existing research, connect with researchers who have similar interests, and discern gaps in published research that point to new studies could substantially speed the research planning process.

In alignment with our mission and vision, we hope to inspire and intrigue experts in the bioinformatics field to engage with us to advise and advance this development. 
