Submission

introduction
title
Leveraging public data for T1D disease etiology
short description
We hypothesized that public data could be used to find similarities between bacterial GAD and human GAD65 with implications for T1D.
Submission Details
Submission Category
Data reuse
Abstract / Overview

We used in silico analyses to show that 25 GAD sequences from human gut bacterial sources share sequence and motif similarities with human β cell GAD65. Our motif analyses determined that most bacterial gut GAD sequences contain a functionally important enzymatic site that overlaps with human GAD65. We also showed overlap with known human GAD65 T cell receptor epitopes, suggesting a possible role in the immune destruction of β cells. We therefore propose a physiological hypothesis in which changes in the T1D gut microbiome result in a release of bacterial GAD, causing miseducation of the host immune system. Because of the notable similarities we found between human and bacterial GAD, these deputized immune cells could target human β cells and play a role in T1D.

Team

The the(sugar)science team hails from broad backgrounds and disciplines but shares a common goal: to further the work toward a cure for Type 1 diabetes (T1D). We value diversity in thought and voice because it fosters innovative, novel strategies for approaching the field of T1D.

Additionally, most of our team has a close connection to T1D, and as such, we are highly motivated to assist the scientific community that studies T1D.

Our team works entirely remotely; to write the paper, we coordinated across time zones in Canada, California, Texas, Tennessee, and New York. Dr. Monica Westley provided oversight. Two early-career scientists served as bioinformatics leads and researched and wrote the paper. Research consisted of surveying the current microbial and human GAD literature using NIH and dkNET resources. Our entire research pipeline is described in our paper.

Although the initial datasets come from publicly available databases, we also publicly released the analysis files and code we generated to ensure reproducibility. We hosted the files on the Open Science Framework (OSF.io) and published the code we generated on GitHub, both under MIT licenses.


 

Potential Impact

This project evolved from a hypothesis proposed by MW after a deep dive into the current microbial literature, undertaken after feedback from clinicians and Type 1 diabetes (T1D) patients describing gut dysbiosis at T1D diagnosis. Over 4 months, our team gathered digitally to mine pre-existing microbial genetic data and compare it to human GAD65, the target of one of the first autoantibodies detected in the majority of human T1D patients.

Our team used pre-existing databases to conduct all the analyses, which makes the data easier to obtain and the workflow easier to reproduce. All GAD protein sequences were curated from the NCBI database, and the T cell receptor epitopes were obtained from a literature review. In addition, our entire research pipeline is described in our manuscript, which ensures research transparency. For the convenience of other students and researchers, we have hosted all the datasets used on an open-source research data repository.

Our team believes that a key component of data sharing is checking for existing open-source datasets. If an existing dataset can be used to answer a research question, it greatly boosts research reproducibility and reusability. Another good practice is to make all the open-source data used in a workflow accessible in a research repository or via API calls. Having the data available in one portal, instead of clicking through various links to extract it, saves considerable time and resources. It is also important to identify the licenses under which data is available and distributed.
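As a minimal sketch of the license and provenance practice described above, a small manifest can record each dataset file alongside its checksum and license. Everything named here is an assumption for illustration: the function name `build_manifest`, the `licenses` mapping, and the directory layout are hypothetical and are not taken from our repository.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir, licenses, default_license="UNKNOWN"):
    """Describe every file under data_dir with its size, SHA-256, and license.

    `licenses` maps a file path (relative to data_dir, POSIX style) to a
    license name. Files missing from the mapping get `default_license`, so
    gaps in license information stay visible instead of being guessed.
    """
    root = Path(data_dir)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        records.append({
            "file": rel,
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            "license": licenses.get(rel, default_license),
        })
    return records
```

The resulting list can be written out with `json.dump` and stored next to the data, so that anyone reusing the files can confirm they have the same bytes and know the terms under which each file is distributed.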

It is compelling that our team used 100% open-source, pre-existing, and reliable data for the hypothesis under consideration. Although this will not be possible for every research question, researchers should be encouraged to use existing data sources wherever they can. We have also documented the tools we used for our analysis, which enhances research transparency and reproducibility.
 

Replicability

Through a big data approach, literature curation, and the use of the publicly available NCBI Gene database, we leveraged existing human GAD65 and microbial GAD sequences, along with T cell epitopes generated for other purposes, to understand the implications of GAD65 in gut-mediated autoimmune type 1 diabetes. We used existing tools and databases for multiple sequence alignment (ClustalW, WebLogo) and visualization (CLC Sequence Viewer), motif discovery (MEME, Pfam), and phylogeny construction and visualization (MAST, Interactive Tree of Life). Our motif analyses determined that most gut GAD sequences contain the pyridoxal-dependent decarboxylase (PDD) domain of human GAD65, which is important for its enzymatic activity. Additionally, we showed overlap with known human GAD65 T cell receptor epitopes, which may have implications for the immune destruction of β cells. Based on these observations, we generated a hypothesis in which changes in the gut microbiome in those with T1D result in a release of bacterial GAD, thus causing miseducation of the host immune system in T1D.
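The overlap check between discovered motifs and known epitopes can be illustrated with a short sketch. This is not the code from our pipeline: it assumes that motif and epitope positions have already been mapped onto one shared residue numbering, and the `Region` type, the function names, and every coordinate in the example are hypothetical placeholders rather than real GAD65 positions.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int  # 1-based, inclusive residue position
    end: int    # 1-based, inclusive residue position


def overlap_length(a, b):
    """Number of residues shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(motifs, epitopes, min_overlap=1):
    """Return (motif, epitope, shared_residues) for each overlapping pair."""
    hits = []
    for motif in motifs:
        for epitope in epitopes:
            shared = overlap_length(motif, epitope)
            if shared >= min_overlap:
                hits.append((motif.name, epitope.name, shared))
    return hits


if __name__ == "__main__":
    # Placeholder coordinates for illustration only, NOT real GAD65 data.
    motifs = [Region("motif_A", 10, 30)]
    epitopes = [Region("epitope_1", 25, 40), Region("epitope_2", 50, 60)]
    for motif_name, epitope_name, shared in find_overlaps(motifs, epitopes):
        print(f"{motif_name} overlaps {epitope_name} by {shared} residues")
```

Raising `min_overlap` restricts the report to pairs that share a longer stretch of residues, which may be useful when short incidental overlaps are not of interest.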

Others could use this same protocol to examine individual and group microbial genetic overlap with human T1D autoantibody sequences. As is becoming increasingly clear, T1D is a disease with multiple endotypes and, as such, may involve a multitude of microbial mimetics that drive autoimmunity in each case.

Potential for Community Engagement and Outreach

The major benefits of data sharing and reuse fall into two main categories: increasing the impact of published datasets and creatively solving complex problems. The idea that a wealth of data already exists in the literature and can be reinterpreted through the current lens of a scientific field is gaining credence. A single publication is limited in the scope of hypotheses it can address by investigator expertise, access to resources, and journal stipulations. Publicly sharing datasets, though, expands the scope of what one set of researchers can tackle in a given amount of time. Returning to these data armed with new hypotheses from recent laboratory experiments can be very informative and can generate novel hypotheses. Finally, shared datasets allow researchers to collaborate on innovative solutions for extremely complex biological processes, particularly multifactorial diseases.

