The Medical Information Mart for Intensive Care (MIMIC) is the first publicly available electronic health record database, and it is used by thousands of researchers around the world to study important clinical questions and build algorithms for the intensive care unit. We propose to enhance MIMIC with new data types and sources, including publicly available population health datasets, and to develop new methods that identify biases in MIMIC and prevent them from being encoded in algorithms. MIMIC will continue to serve as a resource for clinical research and increasingly sophisticated model development, advancing our understanding of critical issues of fairness and equity in healthcare, data science, and the broader society.
Over the past decade, MIT Critical Data has exemplified a different approach: the Village of Mentors. Founded around the goal of democratizing the use of electronic health record data to advance observational research and inform care, MIT Critical Data has developed a broader paradigm for collaborative research. This paradigm emphasizes intergroup collaboration in which members serve as mentors and mentees in real time, often both at once, depending on each project's context and requirements. Spanning the globe and many disciplines, its members have worked together on many projects, leading to numerous publications and the proliferation of new open-source health databases.
The MIT Laboratory for Computational Physiology (LCP) developed and maintains the publicly available Medical Information Mart for Intensive Care (MIMIC). MIMIC-II, MIMIC-III, and MIMIC-IV are the most widely used electronic health record (EHR) datasets with open code bases (MIMIC Code Repository, https://github.com/MIT-LCP/mimic-code). More than 25,000 credentialed users in academia and industry from over 110 countries use the resource for clinical research studies and for developing decision support tools. The current version, MIMIC-IV, contains highly detailed data on 76,540 distinct adult intensive care unit (ICU) admissions at the Beth Israel Deaconess Medical Center (BIDMC) in Boston. We have also released MIMIC-CXR, a large dataset of 227,835 imaging studies for 65,379 patients presenting to the BIDMC Emergency Department (ED) between 2011 and 2016, and MIMIC-IV-ED, a dataset covering 448,972 visits to the BIDMC ED.
A large set of open-access educational materials has been developed and shared with the research community. These materials include three textbooks and two open online courses available on the MIT Open Learning Library. Our open-license textbook “Secondary Analysis of Electronic Health Records”, which takes readers step by step through a research project using MIMIC, has been downloaded 1.3 million times since its publication in 2016 and has been translated into Mandarin, Spanish, Korean, and Portuguese. The online course “Collaborative Data Science for Healthcare”, released in November 2020, has enrolled more than 5,000 learners from 110 countries. The LCP has produced two further open-access textbooks: “Global Health Informatics”, published by MIT Press in 2017, and “Leveraging Data Science for Global Health”, published in August 2020 and downloaded more than 300,000 times. Our datasets, textbooks, and online courses serve as valuable resources for learners at secondary schools, universities, and hospitals who are building the data science foundation needed to perform robust analyses of health data. Please see the letters of support from hospitals and universities in the US and around the world.
To meet the FAIR Guiding Principles for scientific data management and stewardship (Findable, Accessible, Interoperable, and Reusable), the MIMIC databases are shared through the PhysioNet platform with detailed, machine-readable metadata on provenance, usage rights, data characteristics, and identifiers, including DataCite DOIs. PhysioNet is integrated with cloud services to make the data easier to access and to reduce the heterogeneity that arises from differences in local computing infrastructure. Cloud access to MIMIC and the eICU Collaborative Research Database (eICU-CRD) on Google Cloud and Amazon Web Services (AWS) allows authorized researchers to carry out analyses with tools such as Google Colaboratory notebooks and R Markdown, creating fully reproducible research pipelines.
The Fast Healthcare Interoperability Resources (FHIR) standard has emerged as a promising mechanism for sharing healthcare data across vendors in both real-time and batch settings. Real-world datasets available in FHIR would accelerate research on, and development of, data-driven algorithms. Existing FHIR datasets are primarily synthetic and cover a limited number of resource types. We have converted the MIMIC-IV Clinical Database Demo into FHIR. Because the MIMIC clinical databases are widely adopted and their contents are well understood by the community, a FHIR translation of MIMIC provides a benchmark dataset on which institutions can experiment with FHIR-based tools, and we hope this resource will support broader adoption and use of FHIR.
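To make the conversion concrete, the following is a minimal Python sketch of one step: mapping a row shaped like the MIMIC-IV `patients` table (columns `subject_id`, `gender`, `dod`) to a FHIR R4 `Patient` resource and wrapping the results in a `collection` Bundle. It is an illustration under stated assumptions, not the pipeline used to build the MIMIC-IV FHIR demo; the function names, the identifier system URI, and the input rows are hypothetical placeholders.

```python
import json
from typing import Optional

# Hypothetical identifier system URI for this sketch only; the published
# MIMIC-IV FHIR resources define their own identifier systems.
PATIENT_ID_SYSTEM = "https://example.org/mimic/patient-id"

# MIMIC-IV records gender as "M"/"F"; FHIR R4 Patient.gender uses the
# administrative-gender codes male | female | other | unknown.
GENDER_MAP = {"M": "male", "F": "female"}


def patient_row_to_fhir(row: dict) -> dict:
    """Map one row shaped like MIMIC-IV patients to a FHIR R4 Patient."""
    subject_id = str(row["subject_id"])
    resource = {
        "resourceType": "Patient",
        "id": subject_id,
        "identifier": [{"system": PATIENT_ID_SYSTEM, "value": subject_id}],
        # Missing or unrecognized codes fall back to FHIR's "unknown".
        "gender": GENDER_MAP.get(row.get("gender"), "unknown"),
    }
    dod: Optional[str] = row.get("dod")
    if dod:
        # FHIR dateTime accepts a date-only value such as "2150-01-01".
        resource["deceasedDateTime"] = dod
    # A missing dod does not prove the patient is alive, so deceased[x]
    # is left out rather than set to False.
    return resource


def to_collection_bundle(resources: list) -> dict:
    """Wrap resources in a FHIR Bundle of type 'collection'."""
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in resources],
    }


# Made-up example rows (not real MIMIC records).
rows = [
    {"subject_id": 1, "gender": "F", "dod": "2150-01-01"},
    {"subject_id": 2, "gender": "M", "dod": None},
]
bundle = to_collection_bundle([patient_row_to_fhir(r) for r in rows])
print(json.dumps(bundle, indent=2))
```

A full conversion would also need to map encounters, observations, medications, and other tables, and to validate the output against FHIR R4 profiles; this sketch covers only the `Patient` step.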
Over the past five years, the LCP has organized 38 international events in 16 countries across five continents, including data hackathons (datathons) and machine learning workshops. Using the Hive Learning model, a form of multidisciplinary and multi-institutional collaborative learning, the LCP has developed several longitudinal partnerships with groups such as Women in Data Science, the MIT Beaver Works Summer Institute, and the East Bay Educational Collaborative (EBEC) of Rhode Island. EBEC is a non-profit educational organization that provides science, technology, engineering, and math (STEM) resources to more than forty urban, suburban, and rural school districts. Inspired by MIMIC, the Amsterdam University Medical Center in the Netherlands and the Inselspital University Hospital Bern in Switzerland worked with the LCP to release their own ICU datasets. The LCP is also collaborating with Khon Kaen University Hospital in Thailand and Hospital Israelita Albert Einstein in Brazil to create their ICU datasets and adopt the MIMIC model of data sharing.