With the exponential growth in image data collection, more advanced analyses are focusing on making full use of high-dimensional images to improve personalized prediction and prevention strategies. However, batch effects arising from different machines often influence quantitative image features and can produce overwhelming variability between analyses. This places a heavy burden on the replicability of results and on data reuse. We provide a layer of normalization for digital imaging data, including high-resolution images such as the mammograms in our project, each composed of 13 million pixels. This technique is being applied to several datasets to ensure uniform interpretation and dissemination.
Dr. Shu Jiang is an Assistant Professor in the Division of Public Health Sciences at Washington University. She has strong training in statistical theory and methodology. In 2021, Dr. Jiang was awarded the prestigious MERIT award by the NIH for the study of breast cancer methodology. In 2022, she received the 40 Under 40 Public Health Catalyst Award from the Boston Congress of Public Health for her work in breast cancer prevention.
Dr. Hufeng Zhou is a Research Scientist in the Department of Biostatistics at the Harvard T. H. Chan School of Public Health. He is a dedicated computational biologist with more than a decade of broad experience in proteomics, transcriptomics, and genomics research, and an experienced developer of bioinformatics databases. He has extensive experience leading a bioinformatics research team as a junior faculty member at Harvard Medical School.
Drs. Jiang and Zhou were officemates during their postdoctoral fellowships at the Harvard School of Public Health. Dr. Jiang will be responsible for statistical methods development, while Dr. Zhou will bring expertise in computational infrastructure for implementation into public databases. They will meet weekly to move the project forward.
"Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process." [Wilkinson MD. et al. Scientific Data (2016)]
Data sharing and reuse are the key to accelerating scientific discovery and maximizing societal benefits. Through data sharing and reuse, the scientific results will be of interest not only to the statistical and bioinformatics communities, but also to the general population, whose routine digital imaging data are held in electronic health records. Disseminating these results can empower individuals by increasing their awareness of and involvement in their own personalized health care, so that each person can contribute their own part.
Therefore, maximizing data sharing and reuse will improve the quality of health care research and the well-being of the general population, with societal benefits extending into the future.