Sunday, 26 March 2023

XAI for Medicine, period May-August 2021

XAI in Medicine

XAI (Explainable Artificial Intelligence) applied to medicine refers to the use of machine learning and AI algorithms to aid in medical decision-making while also providing clear and interpretable explanations for their predictions. In the context of medicine, XAI can help clinicians and healthcare professionals make more informed decisions by providing insight into how the algorithm arrived at its decision, thereby improving transparency, trust, and accountability. XAI techniques can be applied to a wide range of medical tasks, such as disease diagnosis, treatment recommendation, drug discovery, and clinical trials, with the ultimate goal of improving patient outcomes and advancing medical research.
In this blog post, we chose 10 research articles published between May and August 2021 and briefly describe them to show practical examples of how XAI is applied in the field of medicine.

1. Diagnosis of Acute Poisoning using explainable artificial intelligence

Introduction
Medical toxicology is the clinical specialty that treats the toxic effects of substances, for example, an overdose, a medication error, or a scorpion sting. The volume of toxicological knowledge and research has, as with other medical specialties, outstripped the ability of the individual clinician to entirely master and stay current with it. The application of machine learning/artificial intelligence (ML/AI) techniques to medical toxicology is challenging because initial treatment decisions are often based on a few pieces of textual data and rely heavily on experience and prior knowledge. ML/AI techniques, moreover, often do not represent knowledge in a way that is transparent for the physician, raising barriers to usability. Logic-based systems are more transparent approaches, but often generalize poorly and require expert curation to implement and maintain.

Methods

We constructed a probabilistic logic network to model how a toxicologist recognizes a toxidrome, using only physical exam findings. Our approach transparently mimics the knowledge representation and decision-making of practicing clinicians. We created a library of 300 synthetic cases of varying clinical complexity. Each case contained 5 physical exam findings drawn from a mixture of 1 or 2 toxidromes. We used this library to evaluate the performance of our probabilistic logic network, dubbed Tak, against 2 medical toxicologists and a decision tree model, as well as its ability to recover the actual diagnosis.
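As a loose illustration of this idea (and not the authors' Tak network), the sketch below scores a patient's exam findings against hypothetical finding probabilities for a few toxidromes; all findings and probabilities are invented for the example.

```python
# Illustration only: score toxidromes by a naive-Bayes-style match of exam findings
# against per-toxidrome finding probabilities. This is NOT the authors' Tak network.

# Hypothetical finding probabilities per toxidrome (illustrative values only).
TOXIDROMES = {
    "anticholinergic": {"mydriasis": 0.9, "dry skin": 0.8, "tachycardia": 0.7, "delirium": 0.6},
    "cholinergic":     {"miosis": 0.9, "diaphoresis": 0.8, "bradycardia": 0.6, "salivation": 0.8},
    "sympathomimetic": {"mydriasis": 0.8, "diaphoresis": 0.7, "tachycardia": 0.9, "agitation": 0.7},
}

def score_case(findings, baseline=0.05):
    """Return a normalized plausibility score per toxidrome for a set of exam findings."""
    scores = {}
    for name, profile in TOXIDROMES.items():
        s = 1.0
        for finding in findings:
            s *= profile.get(finding, baseline)  # findings not explained by the toxidrome are penalized
        scores[name] = s
    total = sum(scores.values()) or 1.0
    return {name: s / total for name, s in scores.items()}

print(score_case({"mydriasis", "tachycardia", "dry skin"}))
```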

Conclusions
The software, dubbed Tak, performs comparably to humans on straightforward cases and intermediate difficulty cases, but is outperformed by humans on challenging clinical cases. Tak outperforms a decision tree classifier at all levels of difficulty. Our results are a proof-of-concept that, in a restricted domain, probabilistic logic networks can perform medical reasoning comparably to humans.


2. Convolutional Neural Networks for the evaluation of cancer in Barrett's esophagus: Explainable AI to lighten up the black-box

Even though artificial intelligence and machine learning have demonstrated remarkable performances in medical image computing, their level of accountability and transparency must be provided in such evaluations. The reliability of machine learning predictions must be explained and interpreted, especially if diagnosis support is addressed. For this task, the black-box nature of deep learning techniques must be lightened up to transfer its promising results into clinical practice. Hence, we aim to investigate the use of explainable artificial intelligence techniques to quantitatively highlight discriminative regions during the classification of early-cancerous tissues in Barrett's esophagus-diagnosed patients. Four Convolutional Neural Network models (AlexNet, SqueezeNet, ResNet50, and VGG16) were analyzed using five different interpretation techniques (saliency, guided backpropagation, integrated gradients, input × gradients, and DeepLIFT) to compare their agreement with experts' previous annotations of cancerous tissue. We could show that saliency attributes match best with the manual experts' delineations. Moreover, there is a moderate to high correlation between a model's sensitivity and the agreement between human and computational segmentation: the higher the model's sensitivity, the stronger this agreement. We observed a relevant relation between computational learning and experts' insights, demonstrating how human knowledge may influence correct computational learning.
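As a sketch of how such attribution maps can be computed in practice, the snippet below applies the five interpretation techniques named above using the Captum library on a small stand-in CNN; the trained AlexNet/SqueezeNet/ResNet50/VGG16 models and the endoscopic images from the study are not reproduced here.

```python
# Sketch: the five attribution methods named above, via Captum, on a small stand-in CNN.
import torch
from captum.attr import (Saliency, GuidedBackprop, IntegratedGradients,
                         InputXGradient, DeepLift)

# Placeholder classifier; in practice this would be a model fine-tuned on endoscopic images.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 2),
).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder image tensor
target_class = 1                                         # e.g. "early cancer" (assumed index)

methods = {
    "saliency": Saliency(model),
    "guided_backprop": GuidedBackprop(model),
    "integrated_gradients": IntegratedGradients(model),
    "input_x_gradient": InputXGradient(model),
    "deeplift": DeepLift(model),
}
for name, method in methods.items():
    # Each call returns per-pixel attributions with the same shape as the input.
    attribution = method.attribute(image, target=target_class)
    print(name, attribution.shape)
```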

3. Explainable AI for COVID-19 CT Classifiers: An Initial Comparison Study

Artificial Intelligence (AI) has made great leaps across all industrial sectors, especially since the introduction of deep learning. Deep learning helps to learn the behavior of an entity through methods of recognising and interpreting patterns. Despite its limitless potential, the mystery is how deep learning algorithms make a decision in the first place. Explainable AI (XAI) is the key to unlocking AI and the black box of deep learning. XAI is an AI model that is programmed to explain its goals, logic, and decision making so that the end users can understand. The end users can be domain experts, regulatory agencies, managers and executive board members, data scientists, users that use AI, with or without awareness, or someone who is affected by the decisions of an AI model. Chest CT has emerged as a valuable tool for the clinical diagnosis and treatment management of lung diseases associated with COVID-19. AI can support rapid evaluation of CT scans to differentiate COVID-19 findings from other lung diseases. However, how these AI tools or deep learning algorithms reach such a decision, and which are the most influential features derived from these typically deep neural networks, is not clear. The aim of this study is to propose and develop XAI strategies for COVID-19 classification models and to compare them. The results demonstrate promising quantification and qualitative visualizations that can further enhance the clinician's understanding and decision making with more granular information from the results given by the learned XAI models.


4. Explainable AI-based clinical decision support system for hearing disorders

In clinical system design, human-computer interaction and explainability are important topics of research. Clinical systems need to provide users with not only results but also an account of their behaviors. In this research, we propose a knowledge-based clinical decision support system (CDSS) for the diagnosis and therapy of hearing disorders, such as tinnitus, hyperacusis, and misophonia. Our prototype eTRT system offers an explainable output that we expect to increase its trustworthiness and acceptance in the clinical setting. Within this paper, we: (1) present the problem area of tinnitus and its treatment; (2) describe our data-driven approach based on machine learning, such as association- and action rule discovery; (3) present the evaluation results from the inference on the extracted rule-based knowledge and chosen test cases of patients; (4) discuss advantages of explainable output incorporated into a graphical user interface; (5) conclude with the results achieved and directions for future work.


5. Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence

Cardiovascular diseases, and heart failure in particular, are among the leading causes of death globally, making it a priority for doctors to detect and predict their onset and medical consequences. Artificial Intelligence (AI) allows doctors to discover clinical indicators and enhance their diagnoses and treatments. Specifically, explainable AI offers tools to improve clinical prediction models that suffer from poor interpretability of their results. This work presents an explainability analysis and evaluation of a prediction model for heart failure survival, using a dataset that comprises 299 patients who suffered heart failure. The model employs a data workflow pipeline able to select the best ensemble tree algorithm as well as the best feature selection technique. Moreover, different post-hoc techniques have been used for the explainability analysis of the model. The paper's main contribution is an explainability-driven approach to select the best prediction model for HF survival based on an accuracy-explainability balance. The most balanced explainable prediction model implements an Extra Trees classifier over 5 selected features (follow-up time, serum creatinine, ejection fraction, age and diabetes) out of 12, achieving a balanced accuracy of 85.1% with cross-validation and 79.5% on new unseen data. Follow-up time is the most influential feature, followed by serum creatinine and ejection fraction. The explainable prediction model for HF survival presented in this paper would improve the adoption of clinical prediction models by providing doctors with intuitions to better understand the reasoning of, usually, black-box AI clinical solutions, and make more reasonable and data-driven decisions.
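The following is a minimal sketch of this kind of pipeline, assuming a feature-selection step followed by an Extra Trees classifier evaluated with balanced accuracy under cross-validation; the data are synthetic placeholders rather than the authors' 299-patient cohort.

```python
# Sketch: feature selection + Extra Trees, scored by balanced accuracy under cross-validation.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(299, 12))          # 299 patients, 12 clinical features (synthetic stand-in)
y = rng.integers(0, 2, size=299)        # survival outcome (synthetic stand-in)

pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),            # keep the 5 most informative features
    ("model", ExtraTreesClassifier(n_estimators=200, random_state=0)),
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring="balanced_accuracy")
print("balanced accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```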

6. Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers

Problem: An application of Explainable Artificial Intelligence methods to COVID CT-scan classifiers is presented. Motivation: It is possible that classifiers are using spurious artifacts in dataset images to achieve high performances, and explainable techniques can help identify this issue. Aim: For this purpose, several approaches were used in tandem in order to create a complete overview of the classifications. Methodology: The techniques used included GradCAM, LIME, RISE, Squaregrid, and direct gradient approaches (Vanilla, Smooth, Integrated). Main results: Among the deep neural network architectures evaluated for this image classification task, VGG16 was shown to be most affected by biases towards spurious artifacts, while DenseNet was notably more robust against them. Further impacts: Results further show that small differences in validation accuracy can cause drastic changes in the explanation heatmaps for DenseNet architectures, meaning they can have large impacts on the biases learned by the networks. Notably, the strong performance metrics achieved by all these networks (accuracy, F1 score, and AUC all in the 80 to 90% range) could give users the erroneous impression that there is no bias; however, the analysis of the explanation heatmaps highlights it.
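Grad-CAM, one of the techniques listed above, can be sketched directly with forward and backward hooks; the snippet below uses a small stand-in CNN rather than the VGG16/DenseNet models analyzed in the paper, and the class index is assumed.

```python
# Sketch of a Grad-CAM heatmap via forward/backward hooks on a small stand-in CNN.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(32, 2),
).eval()
target_layer = model[2]                       # last convolutional layer

activations, gradients = {}, {}
def save_activation(module, inputs, output):
    activations["value"] = output
def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0]
target_layer.register_forward_hook(save_activation)
target_layer.register_full_backward_hook(save_gradient)

x = torch.rand(1, 3, 224, 224)                # placeholder CT slice
model(x)[0, 1].backward()                     # backprop the logit of the assumed "COVID-19" class

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # average-pool the gradients per channel
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalize heatmap to [0, 1]
print(cam.shape)
```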

7. Brain Hemorrhage Classification in CT Scan Images Using Minimalist Machine Learning

Over time, a myriad of applications have been generated for pattern classification algorithms. Several case studies include parametric classifiers such as the Multi-Layer Perceptron (MLP) classifier, which is one of the most widely used today. Others use non-parametric classifiers: Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Naïve Bayes (NB), Adaboost, and Random Forest (RF). However, there is still little work directed toward a new trend in Artificial Intelligence (AI), known as eXplainable Artificial Intelligence (X-AI). This new trend seeks to make Machine Learning (ML) algorithms increasingly simple and easy to understand for users. Therefore, following this new wave of knowledge, in this work the authors develop a new pattern classification methodology based on the implementation of the novel Minimalist Machine Learning (MML) paradigm and a higher-relevance attribute selection algorithm, which we call dMeans. We examine and compare the performance of this methodology with MLP, NB, KNN, SVM, Adaboost, and RF classifiers for the task of classifying Computed Tomography (CT) brain images. These grayscale images have a size of 128 × 128 pixels, and there are two classes available in the dataset: CT without Hemorrhage and CT with Intra-Ventricular Hemorrhage (IVH), which were classified using the Leave-One-Out Cross-Validation method. Most of the models tested with Leave-One-Out Cross-Validation achieved between 50% and 75% accuracy, while sensitivity and specificity ranged between 58% and 86%. The experiments performed using our methodology matched the best classifier observed, with 86.50% accuracy, and outperformed all state-of-the-art algorithms in specificity, with 91.60%. This performance is achieved with simple and practical methods, in line with the trend of generating easily explainable algorithms.
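The Leave-One-Out Cross-Validation protocol mentioned above can be sketched as follows with a few standard classifiers; the features and labels are synthetic stand-ins for the CT image data.

```python
# Sketch of Leave-One-Out Cross-Validation over a few standard classifiers (synthetic data).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))        # e.g. 40 images reduced to 20 features each (synthetic)
y = rng.integers(0, 2, size=40)      # 0 = no hemorrhage, 1 = IVH (synthetic labels)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

loo = LeaveOneOut()   # each sample is held out once and predicted by a model trained on the rest
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=loo).mean()
    print(f"{name}: LOOCV accuracy = {acc:.2f}")
```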

8. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice


Artificial intelligence (AI) is transforming society and health care. Growing numbers of artificial intelligence applications are being developed and applied to pathology and laboratory medicine. These technologies introduce risks and benefits that must be assessed and managed through the lens of ethics.

There is great enthusiasm for the potential of these AI tools to transform and improve health care. This is reflected in efforts by pathology and laboratory medicine professionals both to enhance the practice of pathology and laboratory medicine and to advance medical knowledge based on the data that we generate.

AI application developers use real-world data sets to “train” their applications to generate the desired output. Applications are ideally validated using separate real-world data sets to assess the accuracy and generalizability of the AI output. In pathology AI, for example, a training or validation data set might consist of digitized microscopic images together with the associated diagnoses as assessed by human expert pathologists.

Clinical laboratories performing in vitro diagnostic tests (including histopathologic diagnosis) constitute one of the largest single sources of objective and structured patient-level data within the health care system. 

Ethics in medicine, scientific research, and computer science all have deep academic roots. The foundational principles of medical ethics as articulated by Beauchamp and Childress are autonomy, beneficence, nonmaleficence, and justice.






9. ExplAIn: Explanatory artificial intelligence for diabetic retinopathy diagnosis

Diabetic Retinopathy (DR) is a leading and growing cause of vision impairment and blindness: by 2040, around 600 million people throughout the world will have diabetes, a third of whom will have DR (Yau et al., 2012). Early diagnosis is key to slowing down the progression of DR and therefore preventing the occurrence of blindness.

Annual retinal screening, generally using Color Fundus Photography (CFP), is thus recommended for all diabetic patients.

In order to improve DR screening programs, numerous Artificial Intelligence (AI) systems were thus developed to automate DR diagnosis using CFP (Ting et al., 2019b). However, due to the “black-box” nature of state-of-the-art AI, these systems still need to gain the trust of clinicians and patients.

An eXplanatory Artificial Intelligence (XAI) algorithm now exists that reaches the same level of performance as black-box AI for the task of classifying Diabetic Retinopathy (DR) severity using Color Fundus Photography (CFP). This algorithm, called ExplAIn, learns to segment and categorize lesions in images; the final image-level classification directly derives from these multivariate lesion segmentations. The novelty of this explanatory framework is that it is trained from end to end, with image supervision only, just like black-box AI algorithms: the concepts of lesions and lesion categories emerge by themselves. For improved lesion localization, foreground/background separation is trained through self-supervision, in such a way that occluding foreground pixels transforms the input image into a healthy-looking image. The advantage of such an architecture is that automatic diagnoses can be explained simply by an image and/or a few sentences. ExplAIn is evaluated at the image level and at the pixel level on various CFP image datasets. We expect this new framework, which jointly offers high classification performance and explainability, to facilitate AI deployment.
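ExplAIn's self-supervised foreground/background separation is more elaborate than this, but the basic occlusion intuition can be sketched as follows: mask one image patch at a time and record how much the predicted severity probability drops (the model, severity grades, and image here are placeholders).

```python
# Rough illustration of the occlusion idea only; this is NOT the ExplAIn architecture.
import torch

model = torch.nn.Sequential(              # stand-in classifier; in practice a trained CFP model
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 5),                # 5 DR severity grades (assumed)
).eval()

image = torch.rand(1, 3, 224, 224)        # placeholder color fundus photograph
patch = 32
with torch.no_grad():
    base = model(image).softmax(dim=1)[0, 4]   # probability of the highest severity grade
    heatmap = torch.zeros(224 // patch, 224 // patch)
    for i in range(0, 224, patch):
        for j in range(0, 224, patch):
            occluded = image.clone()
            occluded[:, :, i:i + patch, j:j + patch] = 0.0   # occlude this region
            drop = base - model(occluded).softmax(dim=1)[0, 4]
            heatmap[i // patch, j // patch] = drop           # large drop => lesion-relevant region
print(heatmap)
```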


10.  Machine Learning and XAI approaches for Allergy Diagnosis

This work presents a computer-aided framework for allergy diagnosis which is capable of handling comorbidities. The system was developed using datasets collected from allergy testing centers in South India. Intradermal skin test results of 878 patients were recorded, and it was observed that the data contained very few samples for comorbid conditions. Modified data sampling techniques were applied to handle this data imbalance and improve the efficiency of the learning algorithms. The algorithms were cross-validated to choose the optimal trained model for multi-label classification. The transparency of the machine learning models was ensured using post-hoc explainable artificial intelligence approaches. The system was tested by verifying the performance of a trained random forest model on the test data. The training and validation accuracies of the decision tree, support vector machine and random forest are 81.62%, 81.04% and 83.07%, respectively. During evaluation, the random forest achieved an overall accuracy of 86.39% and a sensitivity of 75% for the comorbid Rhinitis-Urticaria class. The average performance of the clinicians before and after using the decision support system was 77.21% and 81.80%, respectively.
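A minimal sketch of the multi-label classification and post-hoc explanation steps is given below, assuming a random forest over placeholder skin-test features and an impurity-based feature-importance explanation; the actual dataset, label set, and sampling techniques from the study are not reproduced.

```python
# Sketch: multi-label allergy classification with a random forest plus a simple global explanation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(878, 10))              # e.g. 10 skin-test reaction measurements (synthetic)
Y = rng.integers(0, 2, size=(878, 3))       # multi-label target, e.g. [rhinitis, urticaria, asthma] (assumed)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# RandomForestClassifier handles multi-label targets directly when Y has one column per label.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, Y_train)
print("subset accuracy:", forest.score(X_test, Y_test))

# Post-hoc, model-level explanation: which test measurements drive the predictions overall?
for idx in np.argsort(forest.feature_importances_)[::-1][:3]:
    print(f"feature_{idx}: importance = {forest.feature_importances_[idx]:.3f}")
```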




Bibliography

1. Diagnosis of Acute Poisoning using explainable artificial intelligence
    Michael Chary, Ed W. Boyer, Michele M. Burns 
    Computers in Biology and Medicine, Volume 134, July 2021
    
2. Convolutional Neural Networks for the evaluation of cancer in Barrett's esophagus: Explainable AI to lighten up the black-box
     Luis A. de Souza Jr., Robert Mendel, Sophia Strasser, Alanna Ebigbo, Andreas Probst, Helmut Messmann, João P. Papa, Christoph Palm
     Computers in Biology and Medicine, Volume 135, August 2021

3. Explainable AI for COVID-19 CT Classifiers: An Initial Comparison Study
    Q. Ye, J. Xia and G. Yang
    IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), 12 July 2021
    
4. Explainable AI-based clinical decision support system for hearing disorders
    Katarzyna A. Tarnowska, Ph.D., Brett C. Dispoto, B.S., and Jordan Conragan, B.S.
    Published online 2021 May 17.

5. Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence
   Pedro A. Moreno-Sanchez
   20 Aug 2021

6. Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers
    Iam Palatnik de Sousa, Marley M. B. R. Vellasco, Eduardo Costa da Silva
    23 August 2021

7. Brain Hemorrhage Classification in CT Scan Images Using Minimalist Machine Learning
    José-Luis Solorio-Ramírez, Magdalena Saldana-Perez, Miltiadis D. Lytras, Marco-Antonio Moreno-Ibarra, Cornelio Yáñez-Márquez
    2021 Aug 11

8. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice
     Brian R. Jackson MD, Ye Ye, James M. Crawford, Michael J. Becich, Somak Roy, Jeffrey R. Botkin, Monica E. de Baca, Liron Pantanowitz

9. ExplAIn: Explanatory artificial intelligence for diabetic retinopathy diagnosis
    Gwenolé Quellec, Hassan Al Hajj, Mathieu Lamard, Pierre-Henri Conze, Pascale Massin, Béatrice Cochener
    August 2021


10. Machine Learning and XAI approaches for Allergy Diagnosis
      Ramisetty Kavya, Jabez Christopher, Subhrakanta Panda, Y. Bakthasingh Lazarus 
      August 2021



















XAI for Bioinformatics Jan-Apr 2022

Artificial Intelligence (AI) is increasingly being used in bioinformatics to analyze large volumes of biological data and to develop predictive models for various biological phenomena. However, as the complexity of these AI systems grows, it becomes more challenging to understand how they arrive at their conclusions or predictions. This lack of interpretability is a significant challenge for bioinformatics, where understanding the rationale behind a decision or prediction is critical for building trust in the model and identifying potential errors or biases. To address this challenge, the field of explainable AI (XAI) has emerged, which aims to develop AI models that can provide a transparent and interpretable rationale for their predictions. In bioinformatics, XAI can play a crucial role in enabling researchers to gain insights into complex biological systems, improve disease diagnosis and treatment, and identify new drug targets. This article explores the significance of XAI for bioinformatics and how it can help researchers understand and interpret the predictions of AI models in the field.


  1. A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets

The article proposes two machine learning frameworks for supporting automatic medical consultation: doctor-patient dialogue understanding and task-oriented interaction. The authors create a new large medical dialogue dataset with fine-grained annotations and establish five independent tasks, including named entity recognition, dialogue act classification, symptom label inference, medical report generation, and diagnosis-oriented dialogue policy. The authors report benchmark results for each task, which demonstrate the usability of the dataset and establish a baseline for future studies. The article aims to improve the efficiency of automatic medical consultation and enhance patient experience.


  2. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review

The article discusses the use of single-cell RNA sequencing (scRNA-seq) to study cell states and phenotypes, as well as the potential applications in understanding biological processes and disease states. It also explores the use of deep learning, an artificial intelligence technique, in scRNA-seq data analysis. The review surveys recent developments in deep learning techniques for scRNA-seq data analysis, identifies key steps that have been advanced by deep learning, and explains the benefits of deep learning over conventional analytic tools. The article also summarizes the challenges faced by current deep learning approaches in scRNA-seq data analysis and discusses potential directions for improving deep learning algorithms in this field.


  3. Mining On Alzheimer’s Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing

The article discusses the use of knowledge graphs to identify opportunities for preventing or delaying neurodegenerative diseases, specifically Alzheimer's Disease (AD). The authors constructed a knowledge graph using biomedical annotations and extracted relations using SemRep via SemMedDB. They used a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The filtered triples were used to train knowledge graph completion algorithms to predict candidates that might be helpful for AD treatment or prevention. The results showed that TransE outperformed other models, and time-slicing techniques were used to further evaluate the prediction results. The authors found supporting evidence for most highly ranked candidates predicted by the model, indicating that their approach can inform reliable new knowledge. The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses in the field of neurodegenerative diseases.
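The TransE model mentioned above scores a triple (head, relation, tail) by how close head + relation lies to tail in embedding space. The sketch below shows only this scoring step with random, untrained embeddings and invented entity names; in the actual method the embeddings are learned from the knowledge graph with a margin-based loss.

```python
# Minimal sketch of TransE scoring; embeddings here are random, whereas in practice they
# are trained so that true triples get small distances (margin-based ranking loss).
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entities = {"donepezil": rng.normal(size=dim),
            "amyloid_beta": rng.normal(size=dim),
            "alzheimers_disease": rng.normal(size=dim)}      # invented entity names
relations = {"inhibits": rng.normal(size=dim), "treats": rng.normal(size=dim)}

def transe_score(head, relation, tail):
    """Lower score = more plausible triple under TransE (L2 distance of h + r from t)."""
    return np.linalg.norm(entities[head] + relations[relation] - entities[tail])

# Rank candidate tails for the query (donepezil, treats, ?)
candidates = ["alzheimers_disease", "amyloid_beta"]
ranking = sorted(candidates, key=lambda t: transe_score("donepezil", "treats", t))
print(ranking)
```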


  4. Recommendations for extending the GFF3 specification for improved interoperability of genomic data

The article discusses the GFF3 format, which is widely used to represent the structure and function of genes and other mapped features. However, the flexibility of this format has become an obstacle to standardized downstream processing due to the different notations used by common software packages. To address this issue, the AgBioData consortium has developed recommendations for improving the GFF3 format, including providing concrete guidelines for generating GFF3 and creating a standard representation of the most common biological data types. The AgBioData GFF3 working group suggests improvements for each GFF3 field, as well as special cases of modeling functional annotations and standard protein-coding genes, to increase efficiency for AgBioData databases and the genomics research community.
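For reference, a GFF3 record is a line of nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below parses a small, made-up gene/mRNA/exon example.

```python
# Illustrative GFF3 records (coordinates and IDs are made up) and a minimal column parser.
gff3_text = """\
##gff-version 3
chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001;Name=EDEN
chr1\texample\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA0001;Parent=gene0001
chr1\texample\texon\t1050\t1500\t.\t+\t.\tParent=mRNA0001
"""

COLUMNS = ["seqid", "source", "type", "start", "end", "score", "strand", "phase", "attributes"]

for line in gff3_text.splitlines():
    if line.startswith("#"):
        continue                                   # skip directives and comments
    fields = dict(zip(COLUMNS, line.split("\t")))
    # The 9th column is a semicolon-separated list of key=value attribute pairs.
    fields["attributes"] = dict(kv.split("=", 1) for kv in fields["attributes"].split(";"))
    print(fields["type"], fields["start"], fields["end"], fields["attributes"].get("ID"))
```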


  5. An Artificial Intelligence Technique for Covid-19 Detection with eXplainability using Lungs X-Ray Images

The article discusses how limited healthcare resources and unequal distribution of healthcare facilities have made disease detection critical in averting epidemics, particularly in the case of COVID-19. PCR testing is commonly used to detect the virus, but deep learning approaches can also be used to classify chest X-ray images. The study aims to detect COVID-19 by using deep learning approaches to analyze chest X-ray images of COVID-19 patients, viral pneumonia patients, and healthy patients obtained from IEEE and Kaggle. The dataset was subjected to a data augmentation approach before classification, and multi-class deep learning models were used to classify the three groups.
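A typical augmentation step of this kind can be sketched with torchvision transforms, as below; the exact transforms and parameters used by the authors are not specified here, so these choices are assumptions.

```python
# Sketch of a generic data-augmentation pipeline for chest X-ray tensors (assumed transforms).
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(10),                       # small rotations
    transforms.RandomHorizontalFlip(),                   # left/right flips
    transforms.RandomResizedCrop(224, scale=(0.9, 1.0)), # slight crops, resized to model input
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
])

image = torch.rand(3, 256, 256)       # placeholder X-ray tensor (COVID-19 / pneumonia / normal classes)
augmented = augment(image)
print(augmented.shape)                # torch.Size([3, 224, 224])
```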


  6. Deep learning for drug repurposing: methods, databases, and applications

The article discusses the potential of repurposing existing drugs for new therapies, specifically for COVID-19, as a way to accelerate drug development and reduce costs. However, effectively utilizing deep learning models for drug repurposing in complex diseases is still challenging. The article provides guidelines for utilizing deep learning methodologies and tools for drug repurposing, including commonly used bioinformatics and pharmacogenomics databases, sequence-based and graph-based representation approaches, and state-of-the-art deep learning-based methods. The article also presents applications of drug repurposing for COVID-19 and outlines future challenges.


  7. Direct Molecular Conformation Generation

The article presents a new method for generating the three-dimensional coordinates of atoms in a molecule, which is important in bioinformatics and pharmacology. The proposed method directly predicts the coordinates of atoms without predicting intermediate values such as interatomic distances or local structures. The method is invariant to roto-translation of coordinates and permutation of symmetric atoms, and adaptively aggregates bond and atom information to iteratively refine the generated conformation. The method achieves the best results on two datasets and improves molecular docking by providing better initial conformations. The article concludes that the direct approach has great potential and provides a link to the released code.


  8. MPVNN: Mutated Pathway Visible Neural Network Architecture For Interpretable Prediction Of Cancer-Specific Survival Risk

The article presents a novel approach for survival risk prediction using gene expression data in cancer, called Mutated Pathway Visible Neural Network (MPVNN). MPVNN is designed using prior knowledge of biological signaling pathways and gene mutation data-based edge randomization, which simulates signal flow disruption. The study uses the PI3K-Akt pathway as a case study and shows improved cancer-specific survival risk prediction results of MPVNN over standard non-NN and other similar sized NN survival analysis methods. The trained MPVNN architecture interpretation is reliable and points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that are important in risk prediction for particular cancer types. The article highlights the importance of interpretability in survival analysis models for making treatment decisions in cancer.
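The "visible" idea behind such an architecture can be sketched as a linear layer whose weights are masked by a pathway adjacency matrix, so each hidden unit only sees the genes it is connected to. The snippet below is a simplified illustration with an invented 6-gene mask, not the authors' MPVNN code.

```python
# Sketch of a pathway-masked ("visible") layer: connections exist only where the mask is 1.
import torch
import torch.nn.functional as F

class MaskedLinear(torch.nn.Module):
    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask.float())      # (hidden_units, genes), 1 = pathway edge
        self.linear = torch.nn.Linear(mask.shape[1], mask.shape[0])

    def forward(self, x):
        # Weights for gene pairs not connected in the pathway are zeroed out.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

# Hypothetical connectivity between 6 genes and 3 hidden pathway nodes (illustrative only).
pathway_mask = torch.tensor([[1, 1, 0, 0, 0, 0],
                             [0, 1, 1, 1, 0, 0],
                             [0, 0, 0, 1, 1, 1]])

model = torch.nn.Sequential(
    MaskedLinear(pathway_mask), torch.nn.ReLU(),
    torch.nn.Linear(3, 1),                              # output: survival risk score
)
expression = torch.rand(4, 6)                           # 4 patients x 6 gene expression values
print(model(expression).shape)                          # torch.Size([4, 1])
```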


Conclusions

In conclusion, the application of XAI techniques in bioinformatics has the potential to enhance the accuracy, transparency, and interpretability of machine learning models, enabling scientists to make more informed decisions and gain a better understanding of the underlying biological processes. By providing explanations for the predictions made by these models, XAI can facilitate the identification of relevant biomarkers, aid in the diagnosis of diseases, and assist in the development of personalized treatments. However, there are still challenges to be addressed, such as the need for standardized guidelines and approaches to XAI in bioinformatics, as well as the integration of XAI with existing bioinformatics tools and workflows. As research in this area continues, it is expected that XAI will play an increasingly important role in advancing our understanding of complex biological systems and ultimately lead to improved patient outcomes.


Bibliography

[1] Wei Chen, Zhiwei Li, Hongyi Fang, Qianyuan Yao, Cheng Zhong, Jianye Hao, Qi Zhang, Xuanjing Huang, Jiajie Peng and Zhongyu Wei, “A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets” 

(https://arxiv.org/pdf/2204.08997v3.pdf)

[2] Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang, “Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review” 

(https://www.sciencedirect.com/science/article/pii/S1672022922001668)

[3] Yi Nian, Xinyue Hu, Rui Zhang, Jingna Feng, Jingcheng Du, Fang Li, Larry Bu, Yuji Zhang, Yong Chen and Cui Tao, “Mining On Alzheimer’s Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing”

 (https://arxiv.org/pdf/2202.08712.pdf)

[4] Surya Saha, Scott Cain, Ethalinda K. S. Cannon, Nathan Dunn, Andrew Farmer, Zhi-Liang Hu, Gareth Maslen, Sierra Moxon, Christopher J Mungall, Rex Nelson, Monica F. Poelchau, “Recommendations for extending the GFF3 specification for improved interoperability of genomic data”

 (https://arxiv.org/ftp/arxiv/papers/2202/2202.07782.pdf)

[5] Pranshu Saxena, Sanjay Kumar Singh, Gyanendra Tiwary, Yush Mittal, Ishika Jain; “An Artificial Intelligence Technique for Covid-19 Detection with eXplainability using Lungs X-Ray Images”

(https://ieeexplore.ieee.org/document/9793240)

[6] Xiaoqin Pan, Xuan Lin, Dongsheng Cao, Xiangxiang Zeng, Philip S. Yu, Lifang He, Ruth Nussinov, Feixiong Cheng, “Deep learning for drug repurposing: methods, databases, and applications” 

(https://arxiv.org/ftp/arxiv/papers/2202/2202.05145.pdf)

[7] Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu; “Direct Molecular Conformation Generation” 

(https://arxiv.org/pdf/2202.01356.pdf)

[8] Gourab Ghosh Roy, Nicholas Geard, Karin Verspoor, Shan He; “MPVNN: Mutated Pathway Visible Neural Network Architecture For Interpretable Prediction Of Cancer-Specific Survival Risk”

 (https://arxiv.org/pdf/2202.00882.pdf)




XAI for Bioinformatics January - April 2021

DeepGS: Predicting phenotypes from genotypes using Deep Learning

    The article presents DeepGS, a deep learning-based method for predicting phenotypes from genotypes. The authors propose using convolutional neural networks (CNNs) to model complex gene interactions and improve the accuracy of phenotype predictions. Their approach leverages sliding window-based input representation to capture local genomic patterns and learns high-level representations of genotypes for prediction tasks. 

    The authors evaluate DeepGS on four diverse datasets, including wheat, maize, rice, and Arabidopsis thaliana, demonstrating that their approach consistently outperforms traditional genomic prediction methods. The results indicate that DeepGS can effectively model complex genetic architectures and has the potential to advance genomic prediction and genome-wide association studies (GWAS), being a promising tool for plant and animal breeding programs and advancing the field of genomics.
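A simplified sketch of this kind of genotype-to-phenotype CNN is shown below; the layer sizes and SNP encoding are assumptions for illustration, not the published DeepGS architecture.

```python
# Sketch: a 1D CNN sliding over a SNP genotype vector to predict a quantitative phenotype.
import torch

n_snps = 1000
model = torch.nn.Sequential(
    torch.nn.Conv1d(1, 8, kernel_size=18),     # slide a window over the genotype vector
    torch.nn.ReLU(),
    torch.nn.MaxPool1d(4),
    torch.nn.Flatten(),
    torch.nn.LazyLinear(32), torch.nn.ReLU(),
    torch.nn.Linear(32, 1),                    # predicted quantitative phenotype
)

genotypes = torch.randint(0, 3, (16, 1, n_snps)).float()   # 16 individuals, SNPs coded 0/1/2
phenotype_pred = model(genotypes)
print(phenotype_pred.shape)                                 # torch.Size([16, 1])
```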


Deep Learning Enables Fast and Accurate Imputation of Gene Expression

    The article presents a deep learning-based approach for fast and accurate imputation of gene expression. The authors propose a method called DeepImpute, which employs a multi-layered deep neural network to predict gene expression values from single-cell RNA sequencing (scRNA-seq) data. The goal is to fill in missing data points and improve data quality, which can be crucial for downstream analyses. DeepImpute is trained on a large compendium of scRNA-seq datasets, enabling it to learn generalizable features and effectively impute gene expression across various cell types and species.

     The authors demonstrate that DeepImpute outperforms existing imputation methods in terms of both accuracy and computational efficiency. In addition, they show that their approach can improve the performance of downstream analyses, such as cell type identification and differential gene expression analysis.
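The imputation idea can be sketched as follows: predict one gene's expression from the other genes using cells where that gene was measured, then fill it in where it dropped out. This simplifies DeepImpute, which splits genes into subsets handled by dedicated sub-networks; all data below are synthetic.

```python
# Rough sketch of dropout imputation for a single gene; NOT the DeepImpute implementation.
import torch
import torch.nn.functional as F

n_cells, n_genes, target_gene = 200, 50, 0
expression = torch.rand(n_cells, n_genes)               # synthetic "true" expression matrix
dropout = torch.rand(n_cells, n_genes) < 0.3            # 30% of entries lost to dropout
observed = expression * (~dropout)                      # dropout entries appear as zeros

inputs = observed[:, 1:]                                # predict gene 0 from all other genes
measured = ~dropout[:, target_gene]                     # cells where gene 0 was actually measured

model = torch.nn.Sequential(torch.nn.Linear(n_genes - 1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                                    # short training loop for illustration
    optimizer.zero_grad()
    loss = F.mse_loss(model(inputs[measured]),
                      observed[measured, target_gene:target_gene + 1])
    loss.backward()
    optimizer.step()

imputed = model(inputs[~measured]).detach()             # predictions for the dropout entries
print(imputed.shape)
```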


DeepCOMBI: explainable artificial intelligence for the analysis and discovery in genome-wide association studies 

    DeepCOMBI utilizes a deep neural network (DNN) to model intricate SNP interactions and integrates Layer-wise Relevance Propagation (LRP) to generate explanations for the AI-driven findings. This method allows for the detection and interpretation of significant SNPs, SNP-SNP interactions, and potential epistatic effects, which can enrich our comprehension of complicated genetic structures. 

    The authors assess DeepCOMBI on both simulated and real-world datasets, showing that their approach surpasses existing GWAS techniques in accuracy, interpretability, and computational efficiency. The outcomes emphasize DeepCOMBI's potential to further the genomics field and aid in uncovering new genetic factors related to complex traits and diseases.


Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

   The article investigates the potential of Explainable Machine Learning (XAI) in predicting breast cancer survival and providing insights compared to traditional Cox regression models. The authors develop an XAI-based approach that leverages the SHapley Additive exPlanations (SHAP) method to generate interpretable predictions. The study uses a large dataset of breast cancer patients from the Netherlands Cancer Registry, comparing the performance of the XAI-based model with the traditional Cox regression model.

    The results show that the XAI-based approach outperforms the Cox regression model in terms of predictive accuracy, while also providing valuable insights into the factors affecting breast cancer survival. The use of SHAP values allows the researchers to quantify the contribution of each feature in the prediction, helping to identify the most important factors influencing survival outcomes. These insights can facilitate a better understanding of breast cancer prognosis, ultimately contributing to improved patient care and personalized treatment strategies. The proposed approach offers not only improved predictive performance but also valuable insights into the underlying factors that influence survival outcomes, which can be of great importance in clinical decision-making and personalized medicine. 
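The SHAP step can be sketched as below: train a generic tree ensemble on tabular data and rank features by their mean absolute SHAP value. The feature names and data are invented placeholders, and the model is not the one used in the study.

```python
# Sketch: SHAP values for a tree ensemble on synthetic tabular survival data.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["age", "tumor_size", "nodes_positive", "grade", "ER_status"]  # assumed features
X = rng.normal(size=(500, len(feature_names)))
y = rng.integers(0, 2, size=500)        # e.g. 5-year survival yes/no (synthetic)

model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one contribution per sample and feature

# Global view: mean absolute SHAP value per feature = overall contribution to the predictions.
importance = np.abs(shap_values).mean(axis=0)
for name, value in sorted(zip(feature_names, importance), key=lambda p: -p[1]):
    print(f"{name}: {value:.3f}")
```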


Learning the Mental Health Impact of COVID-19 in the United States With Explainable Artificial Intelligence: Observational Study 

    The article investigates the mental health impact of the COVID-19 pandemic in the United States using an Explainable Artificial Intelligence (XAI) approach. The authors analyze a large dataset of tweets collected from Twitter to explore the mental health consequences of the pandemic on the general population. The study employs various XAI techniques, such as LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations), to develop interpretable machine learning models that can detect and predict mental health issues, including anxiety, depression, and stress, based on the content of the tweets. 

    The authors demonstrate that their XAI-driven approach can effectively identify and quantify the mental health impact of the COVID-19 pandemic, providing valuable insights into the factors contributing to the observed changes in mental health during this period. Moreover, the explainability of the models enables a better understanding of the underlying reasons for the detected mental health issues, which can inform targeted interventions and policies. 


    [Figure: LIME explanation for the prediction made by a custom CNN model on (a) COVID-19 positive and (b) COVID-19 negative chest X-ray scans.]


An Explainable Artificial Intelligence based Prospective Framework for COVID-19 Risk Prediction 

    The article presents an Explainable Artificial Intelligence (XAI) based framework for predicting the risk of COVID-19 infection in individuals. The authors develop a machine learning model that can estimate the likelihood of a person contracting the virus based on various factors, such as demographics, pre-existing health conditions, and exposure history. The proposed framework employs several XAI techniques, including Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), to provide interpretable and transparent predictions. This allows users to understand the factors contributing to the estimated risk and facilitates trust in the AI-driven decision-making process. 

    The authors evaluate their framework on a dataset of COVID-19 cases and demonstrate that it can effectively predict the risk of infection with a high degree of accuracy. Additionally, the explainability of the model enables the identification of the most important features affecting the risk, which can help inform targeted interventions and preventive measures. 


Prediction of caregiver quality of life in amyotrophic lateral sclerosis using explainable machine learning

    The article presents a study on predicting caregiver quality of life (QoL) in amyotrophic lateral sclerosis (ALS) using explainable machine learning (ML) techniques. The authors develop a model that can estimate the QoL of caregivers for individuals with ALS based on various factors, such as caregiver demographics, patient characteristics, and clinical data. The study employs Explainable Artificial Intelligence (XAI) techniques, including SHapley Additive exPlanations (SHAP) and feature importance measures, to provide interpretable and transparent predictions. This allows users to understand the factors contributing to the estimated QoL and enables the identification of key variables affecting caregiver well-being. 

    The authors evaluate their model using a dataset of ALS patients and caregivers, demonstrating that the explainable ML approach can accurately predict caregiver QoL. Moreover, the explainability of the model provides valuable insights into the most important factors influencing caregiver well-being, which can help inform targeted interventions and support strategies. 


Establishing Machine Learning Models to Predict Curative Resection in Early Gastric Cancer with Undifferentiated Histology: Development and Usability Study

    The article presents the development and usability study of machine learning (ML) models to predict curative resection in early gastric cancer (EGC) patients with undifferentiated histology. The authors focus on creating ML models that can estimate the likelihood of successful curative resection, which is crucial for optimizing treatment strategies and improving patient outcomes.

    The study involves the analysis of a large dataset of EGC patients with undifferentiated histology, using various machine learning techniques, such as logistic regression, support vector machines, decision trees, and random forests. The goal is to identify the most accurate and reliable ML model for predicting curative resection outcomes. The authors evaluate the performance of the developed ML models and demonstrate that they can effectively predict curative resection with high accuracy. Moreover, they show that ML models can outperform conventional statistical models, providing more accurate and reliable predictions. 
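The model-comparison step can be sketched as a loop over candidate classifiers evaluated with the same cross-validation split, as below; the features and outcomes are synthetic placeholders rather than the gastric-cancer cohort.

```python
# Sketch: comparing candidate classifiers with a shared stratified cross-validation split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 15))          # clinical and endoscopic features (synthetic)
y = rng.integers(0, 2, size=600)        # curative vs non-curative resection (synthetic)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```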


The role of explainability in creating trustworthy artificial intelligence for health care: A comprehensive survey of the terminology, design choices, and evaluation strategies

    The article presents a comprehensive survey on the role of explainability in creating trustworthy artificial intelligence (AI) for health care. The authors focus on the terminology, design choices, and evaluation strategies related to explainable AI (XAI) in the health care domain. The survey aims to provide a clear understanding of XAI's potential and challenges in creating reliable and interpretable AI systems for medical applications.     

    The authors review and discuss several aspects of XAI. Regarding terminology, they provide an overview of the key terms and concepts related to explainability in AI, such as interpretability, transparency, and trustworthiness. Regarding design choices, they explore different methods, techniques, and approaches for developing explainable models. Regarding evaluation strategies, they discuss the metrics and benchmarks used to assess the quality of explanations and the overall performance of AI systems. The survey highlights the growing importance of explainability in the adoption of AI in health care, emphasizing the need for transparent and interpretable models that can be trusted by both medical professionals and patients. The authors also identify challenges and future research directions, including the development of standardized evaluation methods and the integration of domain knowledge into XAI techniques.



Bibliography: 

1. https://www.biorxiv.org/content/10.1101/241414v1.full

2. https://www.frontiersin.org/articles/10.3389/fgene.2021.624128/full

3. https://academic.oup.com/nargab/article/3/3/lqab065/6324603?login=false

4. https://www.nature.com/articles/s41598-021-86327-7

5. https://mental.jmir.org/2021/4/e25097

6. https://www.medrxiv.org/content/10.1101/2021.03.02.21252269v1.full

7. https://link.springer.com/content/pdf/10.1038/s41598-021-91632-2.pdf

8. https://www.jmir.org/2021/4/e25053/

9. https://www.sciencedirect.com/science/article/pii/S1532046420302835



