Team: SummonersRift
Lesll Armand-Alessandro
Gaspar Eduard
Alexandru Raul-Ionut
Presented in Week 2
Period: September-December 2022
XAI for bioinformatics
Explainable AI for Bioinformatics: Methods, Tools, and Applications
Short Introduction
Artificial intelligence and machine learning in general have demonstrated remarkable performance on many tasks, from image processing to natural language processing, especially with the advent of deep learning.
Artificial intelligence (AI) systems that are built on machine learning (ML) and deep neural networks (DNNs) are increasingly deployed in numerous application domains such as military, cybersecurity, healthcare, etc. Further, ML and DNN models are applied to solving complex and emerging biomedical research problems: from text mining, drug discovery, and single-cell RNA sequencing to early disease diagnosis and prognosis. Another common application of AI in precision medicine is predicting which treatment protocols are likely to succeed for a patient based on the patient's phenotype, demographics, and treatment context.
AI has already surpassed human medical experts in specific areas such as detecting tumors and analyzing disease progression. However, complex DNN and ML models are often perceived as opaque black boxes, which makes it difficult to understand the reasoning behind their decisions. This lack of transparency is a challenge for end-users and decision-makers as well as AI developers. Additionally, in sensitive areas like healthcare, explainability and accountability are not only desirable but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable artificial intelligence (XAI) aims to overcome the opaqueness of black-box models and provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and which factors influence their outcomes. However, most state-of-the-art interpretable ML methods are domain-agnostic and evolved from fields like computer vision, automated reasoning, or statistics, making their direct application to bioinformatics problems challenging without customization and domain-specific adaptation.
Importance of XAI in Bioinformatics
Handling large-scale biomedical data involves significant challenges, including heterogeneity, high dimensionality, unstructured data, and high levels of noise and uncertainty. Despite the data-driven nature of bioinformatics, the adoption of data-driven approaches in many scenarios is hindered by the lack of efficient ML models capable of tackling these challenges. For AI systems to provide trustworthy and reliable decisions, interpretable ML models have become increasingly important to ensure transparency, fairness, and accountability in critical situations. Although not all predictions need to be explained, an interpretable model makes it easier for users to understand and trust its decisions. Experiments have shown that, under the right conditions, XAI-based augmentations can provide significant, diverse, and reliable advantages over black-box models.
Helps avoid adverse practical consequences
One of the critical applications of AI is aiding the diagnosis and treatment of various cancerous conditions. Early detection and classification of patients into high- or low-risk groups is crucial for effective management of the illness. Consider, for example, a doctor diagnosing a patient with breast cancer. Given that breast cancer is a leading cause of death in women, the diagnosis must be thoroughly investigated. By utilizing omics data, such as genetic mutations, copy number variations (CNVs), gene expression (GE), DNA methylation, and miRNA expression, an accurate diagnosis and treatment can be identified. Diagnosing a breast cancer patient requires a careful examination of multiple sources of data, including omics information, bioimaging, and clinical records. A multimodal DNN model trained on this data can classify samples with high accuracy.
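To make the fusion idea concrete, here is a minimal sketch of such a multimodal classifier in PyTorch. It is illustrative only: the two modalities (a gene-expression vector plus a few clinical variables), the layer sizes, and the random inputs are assumptions made for the example, not the architecture from the paper.

import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    # Two-branch fusion network: one encoder per modality, then a shared head.
    def __init__(self, n_omics, n_clinical, n_classes=2):
        super().__init__()
        # Encoder for the high-dimensional omics features (e.g., gene expression)
        self.omics_encoder = nn.Sequential(
            nn.Linear(n_omics, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 32), nn.ReLU(),
        )
        # Encoder for the low-dimensional clinical-record features
        self.clinical_encoder = nn.Sequential(nn.Linear(n_clinical, 16), nn.ReLU())
        # Classification head on the concatenated embeddings
        self.head = nn.Linear(32 + 16, n_classes)

    def forward(self, omics, clinical):
        fused = torch.cat([self.omics_encoder(omics), self.clinical_encoder(clinical)], dim=1)
        return self.head(fused)

# Hypothetical sizes: 2,000 expression features and 10 clinical variables
model = MultimodalClassifier(n_omics=2000, n_clinical=10)
logits = model(torch.randn(4, 2000), torch.randn(4, 10))
print(logits.shape)  # torch.Size([4, 2])

In practice each branch would be tailored to its modality (e.g., a CNN branch for bioimaging), but the concatenate-then-classify pattern stays the same.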
Reduces complexity and improves accuracy
The aim of genomics data analysis is to extract biologically relevant information and gain insight into the role of biomarkers, such as genes, in cancer development. However, biological processes are complex systems controlled by the interactions of thousands of genes, not single-gene mechanisms. It is crucial to select biologically significant features with high correlation to the target classes and low correlation among themselves. Accurately identifying cancer-specific biomarkers: i) enhances classification accuracy, ii) enables biologists to study the interactions of relevant genes, and iii) helps in understanding their functional behavior, leading to further gene discovery. After identifying these biomarkers based on feature attributions, they can be ranked by their relative or absolute importance. The identified genes can serve as cancer-specific marker genes that distinguish specific or multiple tumor classes.
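As an illustration of attribution-based ranking, the sketch below applies the shap library's TreeExplainer to a random forest trained on synthetic data and sorts the features by mean absolute SHAP value. The "gene" names, model choice, and data are placeholders, not the actual pipeline.

import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a gene-expression matrix: 200 samples x 50 "genes"
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)
gene_names = [f"gene_{i}" for i in range(X.shape[1])]  # hypothetical labels

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-sample, per-feature attributions from the tree ensemble
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Depending on the shap version, classifier output is a list of per-class
# arrays or a single (samples, features, classes) array
pos = shap_values[1] if isinstance(shap_values, list) else shap_values[:, :, 1]

# Rank "genes" by mean absolute attribution for the positive class
importance = np.abs(pos).mean(axis=0)
for name, score in sorted(zip(gene_names, importance), key=lambda t: -t[1])[:10]:
    print(f"{name}: {score:.4f}")

The resulting ranking is exactly the kind of relative-importance list described above; on real omics data, the top-ranked genes would be the candidate marker genes for follow-up study.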
Improves decision fairness
With the widespread use of AI, it is essential to address fairness concerns, as AI systems can make significant and impactful decisions in sensitive environments. Bias is a major hindrance to fair decision-making and has been a subject of discussion in philosophy and psychology for a long time. Statistically, bias means a false representation of the truth with respect to the population, and it can occur at any stage of the ML pipeline, from data collection, feature selection, model training, and hyperparameter setting to the interpretation of results for affected individuals, such as patients. For instance, in healthcare and biomedicine, representative, selection, and discriminatory biases can easily be present in biophysical data, raising fairness concerns and potentially leading to unfair outcomes in various learning tasks. ML algorithms perform a form of statistical discrimination, which becomes problematic when it gives certain privileged groups a systematic advantage and certain disadvantaged groups a systematic disadvantage. If the data used to train an ML model is biased, the model will also be biased and will produce biased decisions if used in an AI system.
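A simple way to make such bias measurable is a demographic-parity check: compare the rate of positive predictions across the groups defined by a sensitive attribute. The sketch below is a minimal example with made-up predictions, not a full fairness audit.

import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    # Difference in positive-prediction rates between the two groups
    # encoded in the binary sensitive attribute (0/1)
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return abs(rate_a - rate_b)

# Hypothetical predictions for 8 patients and a binary sensitive attribute
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 1])
sensitive = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_difference(y_pred, sensitive))  # 0.5 -> strong disparity

A value near zero means both groups receive positive decisions at similar rates; large values flag a potential systematic advantage for one group.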
Internal governance and legal compliance
As AI is more widely used, there is a growing need for transparency and explanation of AI decisions for ethical, legal, and safety reasons. This is particularly important in sensitive domains such as healthcare, where AI can affect human lives. The EU General Data Protection Regulation (GDPR) recognizes the importance of ethics, accountability, and robustness in AI and requires automated decision-making processes to have appropriate safeguards. These include the right to obtain an explanation of a decision reached through automated processing and the right to challenge that decision. The regulation also grants individuals the right not to be subject to a decision based solely on automated processing whenever such a decision significantly impacts their lives.
Techniques and Methods for Interpretable ML
A variety of interpretable machine learning methods, both model-specific and model-agnostic, have been proposed and can be broadly classified into three categories: probing, perturbing, and model surrogation. Depending on the level of abstraction, they can also be categorized as local or global interpretability methods. The article covers several important application areas, for example bioimaging, cancer genomics, text mining, and reasoning.
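To illustrate the model-surrogation category, the sketch below trains a shallow decision tree to mimic the predictions of a "black-box" gradient-boosting model and reports its fidelity. The models and synthetic data are illustrative assumptions, not the paper's setup.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# "Black-box" model whose behaviour we want to approximate
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
y_bb = black_box.predict(X)

# Global surrogate: a shallow, human-readable tree trained on the
# black-box predictions rather than the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: how closely the surrogate reproduces the black-box decisions
print("fidelity:", accuracy_score(y_bb, surrogate.predict(X)))
print(export_text(surrogate, feature_names=[f"f{i}" for i in range(10)]))

The printed rules give a global, interpretable approximation of the black-box decision boundary; probing and perturbing methods instead inspect a model's internals or observe how its outputs change as inputs are modified.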
Conclusion
Explainable Artificial Intelligence (XAI) aims to overcome the opaqueness of black-box models and provide transparency in how AI systems make decisions. It is important in bioinformatics because handling large-scale biomedical data involves significant challenges, including heterogeneity, high dimensionality, unstructured data, and high levels of noise and uncertainty. The adoption of data-driven approaches in many bioinformatics scenarios is hindered by the lack of efficient ML models capable of tackling these challenges. For AI systems to provide trustworthy and reliable decisions, interpretable ML models have become increasingly important to ensure transparency, fairness, and accountability in critical situations. XAI can reduce complexity and improve accuracy, improve decision fairness, help avoid adverse practical consequences, and support internal governance and legal compliance.