PubMed® contains a wealth of information about published research; however, current interfaces make intuitive exploration and discovery difficult. Health science researchers, practitioners, and academics use PubMed to identify existing published studies and papers. Our approach leverages human-centered design principles, data science, and data visualization to reuse PubMed data in novel ways, enabling exploration and discovery of existing research. We also hope to help researchers identify gaps in fields where future research would advance biomedical study. Future engagement with academics and researchers will inform our iterative and incremental approach as we evolve and mature our reuse solutions.
LCG, Inc. sponsors the LCG CAPEX team. As an industry partner to NIH, LCG provides technology support at multiple Institutes. Our work began with the onboarding of our summer intern. We collaborate using Teams and SharePoint sites while also publicly sharing code, plans, and direction on a GitHub repository. As an all-female, multi-racial team bringing expertise from a variety of disciplines and at different career stages, we exhibit multiple diversity dimensions.
Carolyn M. Hennings, MSDA, PMP, ITIL 4 Strategic Leader, leads the team’s vision and direction. Bio: Solutions Architect with over 30 years of industry experience focused on strategic use of data to advance and enable business and scientific outcomes.
Vrinda Bhatu, University of Maryland Information Systems Master’s student, Data Analyst Intern: Responsible for data management, feature extraction, exploratory data analysis, building machine learning models, and data visualization.
Kaira Johnson provides our human-centered design. Bio: User Interface/User Experience Designer bringing 10 years of experience with front-end development, design, and marketing to create modern, innovative, intuitive applications.
Amy Talon assists with communication, collaboration, and logistics.
The LCG CAPEX team’s journey to create novel views into PubMed data began in June 2022. In just three weeks, we’ve generated tangible and promising results for future iterative, incremental, discovery-driven development.
On the human-centered design side, we analyzed the existing PubMed search interface and imagined possible alternatives for interacting with the data to enable exploration. We’ve developed a few website page mockups, available on our GitHub repository, for future review and collaborative revision with bioinformatics experts, academics, and researchers representative of our target audience.
On the data side, we loaded a small subset of PubMed data into a free version of a MongoDB Atlas cloud Database-as-a-Service cluster and began profiling, understanding, and visualizing the data within the MongoDB ecosystem. We used Python code embedded in Jupyter Notebooks, available in our GitHub repository.
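As a rough illustration of the kind of profiling described above, the following minimal sketch summarizes a handful of PubMed-like records using only the Python standard library. It is not the team's notebook code: the field names (`pmid`, `title`, `journal`, `pub_year`, `mesh_terms`), the sample records, and the `profile_documents` helper are all hypothetical, chosen only to show the idea. In practice the records would be read from the MongoDB collection rather than from an in-memory list.

```python
from collections import Counter

# Hypothetical sample records. The field names and values are illustrative
# assumptions, not the team's actual schema or real PubMed data.
SAMPLE_DOCS = [
    {"pmid": "1", "title": "Study A", "journal": "Journal One",
     "pub_year": 2020, "mesh_terms": ["Humans", "Neoplasms"]},
    {"pmid": "2", "title": "Study B", "journal": "Journal One",
     "pub_year": 2021, "mesh_terms": ["Humans"]},
    {"pmid": "3", "title": "Study C", "pub_year": 2021},
]


def profile_documents(docs):
    """Summarize field coverage, publication years, and frequent MeSH terms."""
    docs = list(docs)
    total = len(docs)
    field_counts = Counter()
    year_counts = Counter()
    mesh_counts = Counter()

    for doc in docs:
        # Count a field as present only when it holds a non-empty value.
        for field, value in doc.items():
            if value not in (None, "", []):
                field_counts[field] += 1
        year = doc.get("pub_year")
        if year is not None:
            year_counts[year] += 1
        mesh_counts.update(doc.get("mesh_terms") or [])

    # Coverage is the fraction of records in which each field is populated.
    coverage = {f: c / total for f, c in field_counts.items()} if total else {}
    return {
        "total": total,
        "coverage": coverage,
        "by_year": dict(year_counts),
        "top_mesh": mesh_counts.most_common(5),
    }


if __name__ == "__main__":
    for key, value in profile_documents(SAMPLE_DOCS).items():
        print(key, value)
```

A summary like this shows quickly which fields are sparsely populated and how records are distributed over time, which helps decide what is worth visualizing first.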
We believe the combination of human-centered design with data visualization techniques will produce interesting and engaging methods for researchers to interact with and explore PubMed data.
As we progress, the importance and relevance of various reuse principles will arise. Our initial development considered the following FAIR principles:
The LCG CAPEX team used common tools for data extraction, analysis, and manipulation, such as Python, Jupyter Notebooks, Pandas, and scikit-learn. We publish and document all code on our GitHub repository.
LCG launched our CAPability EXcelerator - CAPEX (keɪp-eks), a research and development platform with one mission: LCGers collaborating on transformative Digital Capabilities for our Public Sector Partners. CAPEX includes advisors, partners, and academics pacesetting and influencing transformation via rapid prototyping and experimentation. CAPEX contributors embrace a continuous learning culture, creating new opportunities and solving problems. The LCG CAPEX vision is to improve the human journey by Reimagining Possibilities and Co-creating Value!
The LCG CAPEX team envisions innovative ways to explore PubMed data through data reuse techniques. This solution could have a catalytic impact on the bioinformatics field and assist researchers across all biomedical research fields. The ability to intuitively explore and discover existing research, connect with researchers who have similar interests, and discern gaps in published research that point to new studies could substantially speed the research planning process.
In alignment with our mission and vision, we hope to inspire and intrigue experts in the bioinformatics field to engage with us to advise and advance this development.