Organ-delimited gene regulatory networks provide high accuracy in candidate transcription factor selection across diverse processes

Rajeev Ranjan, Sonali Srijan, Somaiah Balekuttira, Tina Agarwal, Melissa Ramey, Madison Dobbins, Rachel Kuhn, Xiaojin Wang, Karen Hudson, Ying Li, and Kranthi Varala

PNAS; April 23, 2024; 121 (18) e2322751121; https://doi.org/10.1073/pnas.2322751121

Significance

Our study develops a machine-learning framework for building unbiased gene expression datasets for each organ and for inferring organ-delimited gene regulatory networks. We show that this approach successfully predicts which transcription factors (TFs) regulate processes at the organ level. We validated the accuracy of the TF predictions using the seed lipid synthesis pathway as a case study, and demonstrated a robust success rate for recalling known and predicting previously unknown TF regulators of seed lipid biosynthesis. The approach is broadly applicable to any organism (plant or animal) that has a large body of public gene expression data.

Abstract

Organ-specific gene expression datasets that include hundreds to thousands of experiments allow the reconstruction of organ-level gene regulatory networks (GRNs). However, creating such datasets is greatly hampered by the need for extensive and tedious manual curation. Here, we trained a supervised classification model that can accurately classify the organ-of-origin for a plant transcriptome. This K-Nearest Neighbor-based multiclass classifier was used to create organ-specific gene expression datasets for the leaf, root, shoot, flower, and seed in Arabidopsis thaliana. A GRN inference approach was used to determine (i) the influential transcription factors (TFs) in each organ and (ii) the most influential TFs for specific biological processes in that organ. These genome-wide, organ-delimited GRNs (OD-GRNs) recalled many known regulators of organ development and of processes operating in those organs. Importantly, many previously unknown TFs were uncovered as potential regulators of these processes. As a proof of concept, we focused on experimentally validating the predicted TF regulators of lipid biosynthesis in seeds, an important food and biofuel trait. Of the top 20 predicted TFs, eight are known regulators of seed oil content, e.g., WRI1, LEC1, and FUS3. Importantly, we validated our prediction of MybS2, TGA4, SPL12, AGL18, and DiV2 as regulators of seed lipid biosynthesis. We elucidated the molecular mechanism of MybS2 and showed that it induces purple acid phosphatase family genes and lipid synthesis genes to enhance seed lipid content. This general approach has the potential to be extended to any species with sufficiently large gene expression datasets to find unique regulators of any trait of interest.
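To make the classification step more concrete, the following is a minimal sketch of how a K-Nearest Neighbor classifier could assign an organ label to an unlabeled transcriptome. It is not the authors' implementation: the four-gene expression vectors, the three organ classes, the Euclidean distance, and the choice of k = 3 are all illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, sample, k=3):
    """Return the majority organ label among the k nearest training samples."""
    # Rank labeled transcriptomes by distance to the query, nearest first.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], sample))
    votes = Counter(label for _, label in ranked[:k])
    # On a tie, Counter keeps insertion order, so the nearest label wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each row is one transcriptome (4 made-up genes).
train_X = [
    [9.0, 1.0, 0.5, 0.2], [8.5, 1.2, 0.4, 0.3], [8.8, 0.9, 0.6, 0.1],  # leaf
    [0.5, 9.2, 1.0, 0.3], [0.7, 8.8, 1.1, 0.2], [0.4, 9.0, 0.9, 0.4],  # root
    [0.3, 0.8, 1.0, 9.5], [0.2, 1.0, 1.2, 9.1], [0.4, 0.9, 0.8, 9.3],  # seed
]
train_y = ["leaf"] * 3 + ["root"] * 3 + ["seed"] * 3

# An unlabeled transcriptome whose profile resembles the seed samples.
print(knn_predict(train_X, train_y, [0.3, 1.1, 0.9, 8.9]))
```

In practice the feature vectors would span thousands of genes and the training labels would come from curated experiments; the sketch only shows the voting logic.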

 

See https://www.pnas.org/doi/10.1073/pnas.2322751121

 

Figure 1: An ML classifier creates organ-specific gene expression datasets. (A) Organs are the main determinants of global gene expression, as demonstrated by the strong organ-driven clustering in this reduced-dimensionality (tSNE) plot. (B) The KNN-based classifier achieved high accuracy (weighted F1 = 0.95). Precision, recall, and F1 covary and are related to the number of samples in the training set (B, numbers in red). (C) The confusion matrix of true vs. predicted labels from the KNN-based classifier shows that the majority of mislabeling is due to biological ambiguity of organ labels. (D) Most TFs are expressed across many organs (normalized expression shown).
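For readers unfamiliar with the metrics named in the caption, here is a small sketch of how a weighted F1 score and a confusion matrix can be computed from true and predicted organ labels. The label lists are invented for illustration and do not reproduce the paper's results; the helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def weighted_f1(true_labels, predicted_labels):
    """Average the per-class F1 scores, weighted by each class's true count."""
    pairs = confusion_matrix(true_labels, predicted_labels)
    support = Counter(true_labels)
    total = len(true_labels)
    score = 0.0
    for label, n in support.items():
        tp = pairs[(label, label)]
        # Samples of other classes that were predicted as this class.
        fp = sum(c for (t, p), c in pairs.items() if p == label and t != label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * n / total
    return score


# Hypothetical labels: one leaf sample is called shoot and one shoot
# sample is called leaf; all root and seed samples are called correctly.
true = ["leaf"] * 4 + ["shoot"] * 2 + ["root"] * 3 + ["seed"]
pred = ["leaf", "leaf", "leaf", "shoot", "shoot", "leaf",
        "root", "root", "root", "seed"]

print(round(weighted_f1(true, pred), 2))  # 0.8 for this toy data
print(confusion_matrix(true, pred)[("leaf", "shoot")])  # 1
```

Because the average is weighted by class size, classes with many samples dominate the score, which is one reason the caption reports per-class sample counts alongside the metrics.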

 
