Sunday, March 5, 2023

Bioinformatics XAI May - August 2021

1.

    The article describes MISTy, a machine learning framework that extracts interactions from spatial omics data in a flexible, scalable, and explainable way. MISTy builds multiple views, each focusing on a different spatial or functional context, to dissect distinct effects, and can be applied to different spatially resolved omics data without requiring cell-type annotation. Its performance is evaluated on an in silico dataset and three breast cancer datasets, demonstrating its applicability and potential for improving patient stratification. The flexibility of MISTy is further demonstrated by integrating different views to analyze intercellular signaling. Overall, MISTy is a powerful tool for analyzing highly multiplexed spatial data that can uncover novel insights into tumor progression and clinical outcomes.



      Fig. 1 - CNN Architecture.
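    As a rough, hypothetical illustration of the multi-view idea (MISTy's actual view definitions and models differ), one can fit a target marker from (a) the other markers measured in the same spot and (b) their spatially weighted neighborhood averages, and then compare how much variance each "view" explains. All names and parameters below are invented for the sketch.

```python
import numpy as np

def view_contributions(expr, coords, target, radius=1.5):
    """Fit a target marker from (i) other markers in the same spot
    ("intra" view) and (ii) their distance-weighted neighborhood
    averages (a paraview-like spatial context), and compare explained
    variance. A toy linear stand-in for per-view models."""
    others = np.delete(expr, target, axis=1)
    y = expr[:, target]
    # Gaussian-weighted neighbor averages, excluding the spot itself.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * radius ** 2))
    np.fill_diagonal(w, 0.0)
    ctx = (w @ others) / w.sum(1, keepdims=True)

    def r2(X):
        X1 = np.c_[X, np.ones(len(X))]          # add intercept column
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return 1 - resid.var() / y.var()

    return {"intra": r2(others), "para": r2(ctx)}

rng = np.random.default_rng(4)
coords = np.array([(i, j) for i in range(10) for j in range(10)], float)
m1 = rng.normal(size=100)
# Marker 0 is driven by the neighborhood average of marker 1
# (a paracrine-like effect), plus a little noise.
d2 = ((coords[:, None] - coords[None, :]) ** 2).sum(-1)
w = np.exp(-d2 / (2 * 1.5 ** 2))
np.fill_diagonal(w, 0.0)
m0 = (w @ m1) / w.sum(1) + 0.1 * rng.normal(size=100)
expr = np.c_[m0, m1]
scores = view_contributions(expr, coords, target=0)
```

    On such data the spatial-context view should explain most of the variance while the same-spot view explains little, which is the kind of decomposition the framework reports per marker.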


2.

    The article discusses the need for improved methods to explain multimodal biomedical models. The authors propose using a gradient-based feature attribution approach called layer-wise relevance propagation (LRP) to explain the relative importance of different modalities in automated sleep stage classification. They trained a 1-D CNN with EEG, EOG, and EMG data and found that EEG was the most important modality for classification, followed by EOG and EMG. LRP provided consistent levels of importance for correctly classified samples and inconsistent levels for incorrectly classified samples, demonstrating its potential for explaining multimodal electrophysiology classifiers.
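    The propagation idea can be sketched independently of the authors' 1-D CNN. Below is a minimal epsilon-rule LRP for a tiny dense ReLU network in NumPy, with per-modality importance obtained by summing absolute relevance over each modality's input features; the network, feature split, and modality labels are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def lrp_epsilon(layers, x, eps=1e-6):
    """Epsilon-rule layer-wise relevance propagation through dense
    ReLU layers (biases omitted for simplicity). Returns one relevance
    score per input feature."""
    # Forward pass, storing the input of every layer.
    activations = [x]
    for i, W in enumerate(layers):
        z = activations[-1] @ W
        activations.append(relu(z) if i < len(layers) - 1 else z)

    # Backward pass: start from the output and redistribute relevance.
    R = activations[-1]
    for W, a in zip(reversed(layers), reversed(activations[:-1])):
        z = a @ W
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer
        s = R / z                                  # relevance per unit of z
        R = a * (s @ W.T)                          # redistribute to inputs
    return R

rng = np.random.default_rng(0)
# Toy network: 6 inputs (2 hypothetical features each for EEG, EOG, EMG).
W1, W2 = rng.normal(size=(6, 4)), rng.normal(size=(4, 1))
x = rng.normal(size=6)
R = lrp_epsilon([W1, W2], x)
# Aggregate relevance per modality, as in modality-importance analyses.
modality_scores = {m: np.abs(R[i * 2:(i + 1) * 2]).sum()
                   for i, m in enumerate(["EEG", "EOG", "EMG"])}
```

    A useful property to check is conservation: up to the epsilon stabilizer, the input relevances sum to the network output.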


3.

    APRILE is an explainable AI framework that uses graph neural networks to identify the molecular mechanisms underlying polypharmacy side effects. It identifies sets of proteins and associated Gene Ontology terms as mechanistic explanations for side effects caused by drug combinations, generating explanations for over 843,000 side-effect–drug-pair events across 20 disease categories. The framework provides new insights into the molecular mechanisms of adverse drug reactions and facilitates the use of AI in healthcare.


4.

    The study proposes a pipeline for analyzing DNA methylation datasets to classify controls and patients with different diseases. The pipeline includes data harmonization, machine learning classification models, dimensionality reduction, missing-value imputation, and explainable artificial intelligence algorithms. Results show that harmonization improves classification accuracy, and that the dimensionality-reduction and imputation methods preserve it. The approach is demonstrated using Parkinson’s disease and schizophrenia as examples, and the explainable AI methods allow model predictions to be explained from both population-level and individual perspectives. The proposed pipeline thus addresses data harmonization, imputes missing values, and builds low-dimensional classification models.
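    Two of the pipeline steps can be sketched in a few lines, with mean imputation and variance-based feature selection as deliberately simple stand-ins (the actual pipeline compares several imputation and reduction methods; the data below are synthetic).

```python
import numpy as np

def impute_and_reduce(X, k):
    """Mean-impute missing methylation values, then keep the k most
    variable features -- a minimal stand-in for the imputation and
    dimensionality-reduction steps of such a pipeline."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]     # fill each hole with its column mean
    top = np.argsort(X.var(axis=0))[::-1][:k]       # indices of the k most variable features
    return X[:, top], top

rng = np.random.default_rng(1)
X = rng.random((8, 20))                  # 8 samples x 20 hypothetical CpG sites
X[rng.random(X.shape) < 0.1] = np.nan    # sprinkle in missing values
X_red, kept = impute_and_reduce(X, k=5)
```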


5.

    The study examined gender differences in risk factors for Cannabis Use Disorder (CUD) using a data-driven approach with machine learning and brain data. The study found that environmental factors (such as low education level and instrumental support) contributed more to CUD in women, while individual factors like personality, mental health, and neurocognitive factors played a more important role in men. The study suggests that these differences should be further investigated to develop more effective treatment approaches for CUD.


6.

    The study aimed to improve the explainability of deep learning models used for sleep staging with multimodal data. The researchers trained a convolutional neural network on EEG, EOG, and EMG data and proposed an ablation approach that replaces each modality with values approximating line-related noise. The relative importance of each modality was consistent with sleep staging guidelines: EEG was important for most stages, EOG for REM and non-REM stages, and EMG had low importance across classes. The study suggests that careful selection of an ablation approach may provide a clearer indicator of modality importance, and it offers guidance for future research using explainability methods with multimodal electrophysiology data.
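    The ablation idea can be sketched independently of the authors' network: replace one modality's channel with synthetic line-noise-like values and record how much the model's output drops. The three-channel toy "model" below is a hypothetical stand-in, not the paper's CNN, and the mains frequency and amplitudes are invented.

```python
import numpy as np

def line_noise(n, fs=100.0, freq=50.0, amp=0.1, rng=None):
    """Synthetic line-related noise: a mains-frequency sinusoid plus jitter."""
    rng = rng or np.random.default_rng(0)
    t = np.arange(n) / fs
    return amp * np.sin(2 * np.pi * freq * t) + 0.01 * rng.normal(size=n)

def modality_importance(model, x, modalities, rng=None):
    """Score each modality by the drop in model output when that
    modality's channel is replaced with line-noise-like values."""
    base = model(x)
    scores = {}
    for name, ch in modalities.items():
        x_abl = x.copy()
        x_abl[ch] = line_noise(x.shape[1], rng=rng)
        scores[name] = base - model(x_abl)
    return scores

# Toy "model": responds mostly to channel 0 (a stand-in for EEG).
model = lambda x: 1.0 * x[0].std() + 0.2 * x[1].std() + 0.05 * x[2].std()

rng = np.random.default_rng(2)
# Channels: EEG (larger amplitude), EOG, EMG -- shape (3, 1000).
x = rng.normal(scale=[2.0, 1.0, 1.0], size=(1000, 3)).T
scores = modality_importance(model, x, {"EEG": 0, "EOG": 1, "EMG": 2})
```

    With this construction the score ordering mirrors the study's finding: ablating EEG hurts the most, EMG the least.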


7.

    The study explores the role of variations in cis-regulatory elements (CREs) in driving the establishment of lineage-specific traits in plants. However, predicting expression behaviors from CRE patterns is challenging due to the complex biological processes involved. The researchers used cistrome datasets and explainable convolutional neural network (CNN) frameworks to predict genome-wide expression patterns in tomato fruits from the DNA sequences in gene regulatory regions. They developed a prediction model for a key expression pattern for the initiation of tomato fruit ripening by fixing the effects of trans-elements using single cell-type spatiotemporal transcriptome data for the response variables. The CNNs identified critical nucleotide residues for the expression pattern in each gene, which were validated experimentally in ripening tomato fruits. This framework will help understand regulatory networks derived from CREs and transcription factor interactions and provide a flexible way of designing alleles with optimized expression.



Fig. 2 - Visualizing the cis-regulatory elements or other nucleotide residues with the help of an explainable CNN.
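    One common technique for locating critical nucleotides with a trained sequence model is in silico mutagenesis: substitute each base and record the change in the model's score. Whether or not it matches the authors' exact attribution method, the idea can be sketched with a toy motif-matching score (the "TGACG" motif and scoring function below are invented for illustration).

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string as an (L, 4) array."""
    return np.eye(4)[[BASES.index(b) for b in seq]]

def mutagenesis_map(score_fn, seq):
    """In silico mutagenesis: the importance of each nucleotide is the
    largest change in the model score caused by substituting it with
    any other base."""
    ref = score_fn(one_hot(seq))
    imp = np.zeros(len(seq))
    for i, b in enumerate(seq):
        for alt in BASES.replace(b, ""):
            mut = seq[:i] + alt + seq[i + 1:]
            imp[i] = max(imp[i], abs(score_fn(one_hot(mut)) - ref))
    return imp

# Toy score: best match count against a hypothetical "TGACG" motif.
def toy_score(x):
    motif = one_hot("TGACG")
    L = len(motif)
    return max((x[i:i + L] * motif).sum() for i in range(len(x) - L + 1))

seq = "AAATGACGAAA"
imp = mutagenesis_map(toy_score, seq)   # peaks over the embedded motif
```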


8. 

    The article discusses the development of connectomics, which involves reconstructing neural connections at a nanometer scale. It focuses on deep learning methods for image processing of connectomic data, specifically the development of U-RISC, an annotated Ultra-high-Resolution Image Segmentation dataset for Cell membranes. The article highlights that current deep learning methods still lag behind human-level performance on this dataset, and attribution analysis is used to identify why. The article then provides a new benchmark for cell membrane segmentation on U-RISC and proposes suggestions for developing deep learning algorithms in connectomics.


9. 

    The article presents the use of deep learning models to estimate brain age (BA) from magnetic resonance imaging (MRI) data as a biomarker of brain health. The models accurately estimated age, revealing both small- and large-scale changes associated with aging throughout the brain, as well as cardiovascular risk factors and accelerated aging in the frontal lobe. This highlights the potential of deep learning models to detect brain aging in healthy and at-risk individuals, serving as a useful tool for assessing brain health.


10.

    The article discusses the importance of using unsupervised artificial intelligence (AI) for analyzing large amounts of genetic data, such as genome sequences. The authors developed a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions, which allowed for the separation of SARS-CoV-2 sequences into known clades and revealed subgrouping within those clades. The BLSOM was also used to analyze sequences from other microorganisms, eukaryotes, and the human genome, providing insights into their genetic characteristics. The authors explain the methodological strategies used in their analysis and highlight the advantages of using explainable AI like BLSOM for mining big data.
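    BLSOM is a batch-learning, PCA-initialized variant of the self-organizing map; as a minimal illustration of the underlying idea, the sketch below trains a plain online SOM on k-mer (oligonucleotide) compositions of synthetic sequences. The two sequence "families", grid size, and all hyperparameters are invented for the example.

```python
import numpy as np
from itertools import product

def kmer_freqs(seq, k=3):
    """Normalized k-mer (oligonucleotide) composition of a DNA sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    v = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        v[index[seq[i:i + k]]] += 1
    return v / max(v.sum(), 1)

def train_som(data, grid=(4, 4), epochs=30, lr=0.5, sigma=1.0, seed=0):
    """Minimal online SOM: each composition vector pulls its best-matching
    node (and that node's grid neighbors) toward it. BLSOM itself is a
    batch, PCA-initialized variant; this is a plain illustrative version."""
    rng = np.random.default_rng(seed)
    H, W = grid
    nodes = rng.random((H, W, data.shape[1]))
    yy, xx = np.mgrid[0:H, 0:W]
    for _ in range(epochs):
        for v in data:
            d = ((nodes - v) ** 2).sum(axis=2)
            by, bx = np.unravel_index(d.argmin(), d.shape)   # best-matching unit
            h = np.exp(-((yy - by) ** 2 + (xx - bx) ** 2) / (2 * sigma ** 2))
            nodes += lr * h[..., None] * (v - nodes)
    return nodes

rng = np.random.default_rng(3)
# Two hypothetical sequence families with different base biases.
fam_a = ["".join(rng.choice(list("ACGT"), p=[.4, .1, .1, .4], size=200)) for _ in range(5)]
fam_b = ["".join(rng.choice(list("ACGT"), p=[.1, .4, .4, .1], size=200)) for _ in range(5)]
data = np.array([kmer_freqs(s) for s in fam_a + fam_b])
nodes = train_som(data)
# Map each sequence to its best-matching grid cell.
bmu = [np.unravel_index(((nodes - v) ** 2).sum(axis=2).argmin(), (4, 4)) for v in data]
```

    With sequences as distinct as these, the compositions of the two families separate cleanly in k-mer space, which is what lets the map group them into different regions, analogous to clade separation in the paper.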


Bibliography

[1]    C. A. Ellis, R. L. Miller, and V. D. Calhoun, “A Novel Local Explainability Approach for Spectral Insight into Raw EEG-Based Deep Learning Classifiers.” bioRxiv, p. 2021.06.10.447983, Jun. 11, 2021. doi: 10.1101/2021.06.10.447983.

[2]    C. A. Ellis, D. A. Carbajal, R. Zhang, R. L. Miller, V. D. Calhoun, and M. D. Wang, “An Explainable Deep Learning Approach for Multimodal Electrophysiology Classification.” bioRxiv, p. 2021.05.12.443594, Jun. 08, 2021. doi: 10.1101/2021.05.12.443594.

[3]    H. Xu et al., “APRILE: Exploring the Molecular Mechanisms of Drug Side Effects with Explainable Graph Neural Networks.” bioRxiv, p. 2021.07.02.450937, Sep. 20, 2021. doi: 10.1101/2021.07.02.450937.

[4]    A. Kalyakulina, I. Yusipov, M. G. Bacalini, C. Franceschi, M. Vedunova, and M. Ivanchenko, “Disease classification for whole blood DNA methylation: meta-analysis, missing values imputation, and XAI.” bioRxiv, p. 2022.05.10.491404, May 23, 2022. doi: 10.1101/2022.05.10.491404.

[5]    G. Niklason et al., “Explainable Machine Learning Analysis Reveals Gender Differences in the Phenotypic and Neurobiological Markers of Cannabis Use Disorder.” bioRxiv, p. 2021.08.30.458245, Sep. 01, 2021. doi: 10.1101/2021.08.30.458245.

[6]    C. A. Ellis, R. Zhang, D. A. Carbajal, R. L. Miller, V. D. Calhoun, and M. D. Wang, “Explainable Sleep Stage Classification with Multimodal Electrophysiology Time-series.” bioRxiv, p. 2021.05.04.442658, Jun. 08, 2021. doi: 10.1101/2021.05.04.442658.

[7]    T. Akagi et al., “Genome-wide cis-decoding for expression designing in tomato using cistrome data and explainable deep learning.” bioRxiv, p. 2021.06.01.446518, Jun. 01, 2021. doi: 10.1101/2021.06.01.446518.

[8]    R. Shi et al., “U-RISC: an ultra-high-resolution electron microscopy dataset challenging existing deep learning algorithms.” bioRxiv, p. 2021.05.30.446334, Jun. 15, 2021. doi: 10.1101/2021.05.30.446334.

[9]    S. M. Hofmann et al., “Towards the Interpretability of Deep Learning Models for Multi-modal Neuroimaging: Finding Structural Changes of the Ageing Brain.” bioRxiv, p. 2021.06.25.449906, Jun. 08, 2022. doi: 10.1101/2021.06.25.449906.

[10]  T. Ikemura, Y. Iwasaki, K. Wada, Y. Wada, and T. Abe, “Unsupervised explainable AI for the collective analysis of a massive number of genome sequences: various examples from the small genome of pandemic SARS-CoV-2 to the human genome.” bioRxiv, p. 2021.05.23.445371, May 24, 2021. doi: 10.1101/2021.05.23.445371.

