Monday, 16 December 2024

VPDA - Mall Customers Data Analysis

Introduction

Exploring a dataset of mall customers is valuable because it can uncover patterns in spending habits, help identify distinct customer segments, and guide data-driven marketing strategies. The Mall Customers dataset from Kaggle provides demographic details (Age, Gender), Annual Income data, and a Spending Score metric for 200 individuals.

Data Overview
The Mall Customers dataset includes 200 records, each with a CustomerID, Gender, Age, Annual Income (in thousands of dollars), and a Spending Score from 1 to 100. There are no missing values. The mean Age is approximately 38.85, and the average Annual Income is about $60.56k per year. The Spending Score, which averages around 50.2, is an internal metric assigned by the mall. Distributions of the features show that Ages are spread roughly between 20 and 70, Annual Incomes are concentrated between $30k and $80k with a few higher outliers, and Spending Scores cluster around the mid-range without a clear linear relationship to the other features.
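These figures can be reproduced with a few lines of pandas; a minimal sketch, assuming the standard Kaggle file name Mall_Customers.csv and its usual column names:

```python
# Load the dataset and confirm the summary statistics quoted above.
import pandas as pd

df = pd.read_csv("Mall_Customers.csv")  # assumed Kaggle file name

print(df.isna().sum())  # expect all zeros: no missing values
print(df[["Age", "Annual Income (k$)", "Spending Score (1-100)"]].describe())
```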

Pairwise plots of Age, Annual Income, and Spending Score reveal no straightforward correlations. Younger customers do not necessarily spend more, and higher incomes do not guarantee higher Spending Scores. 
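A pairwise view like the one described can be drawn with seaborn, reusing the df from the loading sketch above:

```python
# Scatter matrix of the three numeric features; no strong linear
# pattern should be visible between any pair.
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df[["Age", "Annual Income (k$)", "Spending Score (1-100)"]])
plt.show()
```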



Hierarchical Clustering
To discover natural groupings, hierarchical clustering was applied to the scaled Age, Annual Income, and Spending Score features. Choosing five clusters divided the 200 customers into groups of various sizes (66, 45, 39, 28, and 22 members). When plotting these clusters by Annual Income and Spending Score, visually distinct segments appear. Some groups trend toward moderate incomes and mid-level spending, while others represent higher-income customers with a wide range of spending patterns. These clusters, formed without predefined labels, highlight inherent segmentation in the customer base.
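A minimal sketch of this step, assuming Ward linkage on standardized features (the original analysis may have used a different linkage or distance metric) and continuing from the df loaded earlier:

```python
# Scale the features, cut the hierarchy at five clusters, and plot
# the segments in the income/spending plane.
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

features = df[["Age", "Annual Income (k$)", "Spending Score (1-100)"]]
X = StandardScaler().fit_transform(features)

clusters = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)
df["Cluster"] = clusters
print(df["Cluster"].value_counts())  # cluster sizes, e.g. 66/45/39/28/22

plt.scatter(df["Annual Income (k$)"], df["Spending Score (1-100)"], c=clusters)
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score (1-100)")
plt.show()
```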



Dimensionality Reduction with PCA and t-SNE
Principal Component Analysis (PCA) provides a way to visualize complex, high-dimensional data on a two-dimensional plane. After applying PCA, the previously discovered clusters spread out across the principal components, confirming that the chosen features capture meaningful differences in customer behavior.
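The projection itself takes only a couple of lines with scikit-learn, continuing from the scaled matrix X and cluster labels above:

```python
# Project the scaled features onto the first two principal components
# and color the points by their hierarchical cluster.
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

coords = PCA(n_components=2).fit_transform(X)
plt.scatter(coords[:, 0], coords[:, 1], c=clusters)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```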


A further step—t-SNE (t-Distributed Stochastic Neighbor Embedding)—offers a nonlinear dimensionality reduction that often reveals clearer separations. The t-SNE plots show well-defined, tight groupings of points. Each cluster occupies a distinct region, reinforcing the idea that the hierarchical clustering discovered natural, data-driven segments. For instance, one cluster is tightly grouped far from the others, indicating a unique profile of customers that differ markedly from their peers.
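A minimal t-SNE sketch over the same scaled features; perplexity=30 is scikit-learn's default and an assumption here, since t-SNE layouts are sensitive to it and to the random seed:

```python
# Nonlinear 2-D embedding of the scaled features, colored by cluster.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
plt.scatter(embedded[:, 0], embedded[:, 1], c=clusters)
plt.show()
```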



Classification with a Decision Tree
While the dataset does not provide a direct classification target, the Spending Score can be used to define one. Labeling customers as “High Spenders” if their Spending Score is above 50 creates a binary classification problem. A decision tree was trained using Age, Annual Income, and Gender as inputs. The resulting classification report shows a balanced performance, with macro averages of around 0.72 for precision and recall. The confusion matrix indicates that both high and low spenders are identified reasonably well, though some misclassifications occur.

Examining feature importances reveals that Age is the most critical predictor of high-spending behavior (importance ~0.5378), followed by Annual Income (~0.4116), while Gender contributes minimally (~0.0505). These findings suggest that age and income brackets may offer a more reliable way to anticipate higher spending patterns than demographic factors like gender.
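A minimal sketch of the whole classification step, assuming an 80/20 train/test split and a simple binary encoding of Gender (both assumptions, not stated in the original analysis):

```python
# Derive the High-Spender label, fit a decision tree, and inspect
# both the classification report and the feature importances.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

df["HighSpender"] = (df["Spending Score (1-100)"] > 50).astype(int)
df["GenderNum"] = (df["Gender"] == "Male").astype(int)

X_cls = df[["Age", "Annual Income (k$)", "GenderNum"]]
y_cls = df["HighSpender"]
X_tr, X_te, y_tr, y_te = train_test_split(X_cls, y_cls, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, tree.predict(X_te)))
print(dict(zip(X_cls.columns, tree.feature_importances_)))
```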






Insights and Applications
The combination of clustering, dimensionality reduction, and classification techniques provides a comprehensive overview of the customer landscape. Unsupervised methods expose distinct market segments, while dimensionality reduction confirms these clusters visually, making it easier to convey the findings. The decision tree model adds another layer of value by highlighting which attributes most strongly influence spending behavior.

From a practical standpoint, these insights enable more targeted marketing strategies. For example, if one cluster consists primarily of younger, moderate-income individuals with high Spending Scores, tailored loyalty programs or special promotions could resonate strongly with that segment. Similarly, identifying older customers who consistently appear in high-spending clusters may prompt personalized product suggestions or event invitations that match their interests and influence their future spending decisions.

Conclusion
The Mall Customers dataset, when explored with a combination of hierarchical clustering, PCA, t-SNE, and decision tree modeling, reveals nuanced patterns and valuable segments. These techniques highlight how different features interact to shape spending behavior, making it possible to identify unique customer groups and predict which individuals might respond best to certain marketing initiatives. The result is a data-driven approach to understanding mall clientele, ultimately guiding more informed and effective business decisions.

Sunday, 2 April 2023

 

AI for Medicine



Introduction:

    Explainable artificial intelligence has played a key role in medical development in recent years. In this blog post I'm going to present examples where AI has been used to accomplish major goals in this domain.

1. Using AI to find the best antibiotic treatments for specific patients. 


This is a three-step project:
  • A tabular dataset was generated from the ontology, containing features defined on various domains and n-ary features.
  • A preference model was then learned from patient profiles, antibiotic features, and expert recommendations found in clinical practice guidelines.
  • The preference model and its application to all antibiotics available on the market for a given clinical situation were then visualized using rainbow boxes, a recently developed technique for set visualization.
2. Using AI to improve cancer diagnosis from histopathological images

The way this works is by collecting images of specific tissue over time: as a cancerous tumor develops, successive pictures of the same area highlight the differences caused by its growth. Those images are fed into a convolutional neural network, and a Cumulative Fuzzy Class Membership Criterion classifier is used to decide whether the tissue image contains cancerous tumors.
    Finally, a human double-checks the prediction. The strength of this model is that it can detect tumors that might be hard for humans to spot; at the same time, human verification is essential, because cancer treatment is aggressive and we need to be absolutely sure before starting it.
    The algorithm consists of two phases: the initialization phase and the learning phase. The initialization phase consists of three processes: data splitting, clustering, and parameter initialization. First, the input data are divided into three sets: training, validation, and testing.
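As an illustration of that first process, here is a minimal sketch of a three-way split; the 60/20/20 ratio and the placeholder data are assumptions, not taken from the paper:

```python
# Split placeholder tissue images into training / validation / test
# sets, as in the initialization phase described above.
import numpy as np
from sklearn.model_selection import train_test_split

images = np.random.rand(100, 64, 64)   # stand-in for tissue images
labels = np.random.randint(0, 2, 100)  # 1 = cancerous, 0 = healthy

X_train, X_tmp, y_train, y_tmp = train_test_split(
    images, labels, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)  # 60/20/20 overall
```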

3. Predict or draw blood: An integrated method to reduce lab tests

    Serial lab testing can be harmful to patients, especially those in ICUs. In the paper below, a model that can reduce the amount of testing is proposed; this is done by predicting which tests can be skipped. Those tests would most likely offer inconclusive results that would not contribute to an accurate diagnosis. The model can cut unnecessary testing by up to 15%, which benefits the patient, who does not need to give as much blood for testing.
    The reduction comes mostly from future lab tests: in general, the same test is repeated over a period to track how the patient's condition evolves, and by relying on the trend of those results up to a certain point, we can determine whether further tests are necessary.
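To make the idea concrete, here is a minimal sketch under strong assumptions: a simple autoregressive model over one patient's serial results, with an illustrative 10% stability threshold. The actual paper uses a far richer integrated model; this only shows the skip-or-draw decision pattern.

```python
# Predict the next lab value from the previous one; skip the draw
# only when the prediction suggests the result will barely change.
import numpy as np
from sklearn.linear_model import LinearRegression

history = np.array([1.1, 1.0, 0.95, 0.93, 0.92])  # e.g. serial creatinine values

X = history[:-1].reshape(-1, 1)  # previous value
y = history[1:]                  # next value
model = LinearRegression().fit(X, y)

predicted_next = model.predict([[history[-1]]])[0]
if abs(predicted_next - history[-1]) / history[-1] < 0.10:  # assumed threshold
    print(f"predicted {predicted_next:.2f}: next test could be skipped")
else:
    print("result uncertain: draw blood")
```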








Bibliography

[1]    Jean-Baptiste Lamy, Karima Sedki, Rosy Tsopra, “Explainable decision support through the learning and visualization of preferences from a formal ontology of antibiotic treatments.” Journal of Biomedical Informatics, March 2020

[2]    Patrik Sabol, Peter Sinčák, Pitoyo Hartono, Pavel Kočan, Zuzana Benetinová, Alžbeta Blichárová, Ľudmila Verbóová, Erika Štammová, Antónia Sabolová-Fabianová, Anna Jašková, “Explainable classifier for improving the accountability in decision-making for colorectal cancer diagnosis from histopathological images.” Journal of Biomedical Informatics, August 2020

[3]    Lishan Yu, Qiuchen Zhang, Elmer V. Bernstam, Xiaoqian Jiang, “Predict or draw blood: An integrated method to reduce lab tests.” Journal of Biomedical Informatics, Volume 104, April 2020, 103394























Sunday, 26 March 2023

XAI for Medicine, period May-August 2021

 XAI in Medicine

XAI (Explainable Artificial Intelligence) applied to medicine refers to the use of machine learning and AI algorithms to aid in medical decision-making while also providing clear and interpretable explanations for their predictions. In the context of medicine, XAI can help clinicians and healthcare professionals make more informed decisions by providing insight into how the algorithm arrived at its decision, thereby improving transparency, trust, and accountability. XAI techniques can be applied to a wide range of medical tasks, such as disease diagnosis, treatment recommendation, drug discovery, and clinical trials, with the ultimate goal of improving patient outcomes and advancing medical research.
In this blog post we chose 10 research articles published in the period May-August 2021 and briefly described them, in order to see practical examples of how XAI is applied to the field of medicine.

1. Diagnosis of Acute Poisoning using explainable artificial intelligence

Introduction
Medical toxicology is the clinical specialty that treats the toxic effects of substances, for example, an overdose, a medication error, or a scorpion sting. The volume of toxicological knowledge and research has, as with other medical specialties, outstripped the ability of the individual clinician to entirely master and stay current with it. The application of machine learning/artificial intelligence (ML/AI) techniques to medical toxicology is challenging because initial treatment decisions are often based on a few pieces of textual data and rely heavily on experience and prior knowledge. ML/AI techniques, moreover, often do not represent knowledge in a way that is transparent for the physician, raising barriers to usability. Logic-based systems are more transparent approaches, but often generalize poorly and require expert curation to implement and maintain.

Methods

We constructed a probabilistic logic network to model how a toxicologist recognizes a toxidrome, using only physical exam findings. Our approach transparently mimics the knowledge representation and decision-making of practicing clinicians. We created a library of 300 synthetic cases of varying clinical complexity. Each case contained 5 physical exam findings drawn from a mixture of 1 or 2 toxidromes. We used this library to evaluate the performance of our probabilistic logic network, dubbed Tak, against 2 medical toxicologists, a decision tree model, as well as its ability to recover the actual diagnosis.

Conclusions
The software, dubbed Tak, performs comparably to humans on straightforward cases and intermediate difficulty cases, but is outperformed by humans on challenging clinical cases. Tak outperforms a decision tree classifier at all levels of difficulty. Our results are a proof-of-concept that, in a restricted domain, probabilistic logic networks can perform medical reasoning comparably to humans.


2. Convolutional Neural Networks for the evaluation of cancer in Barrett's esophagus: Explainable AI to lighten up the black-box

Even though artificial intelligence and machine learning have demonstrated remarkable performances in medical image computing, their level of accountability and transparency must be provided in such evaluations. The reliability related to machine learning predictions must be explained and interpreted, especially if diagnosis support is addressed. For this task, the black-box nature of deep learning techniques must be lightened up to transfer its promising results into clinical practice. Hence, we aim to investigate the use of explainable artificial intelligence techniques to quantitatively highlight discriminative regions during the classification of early-cancerous tissues in Barrett's esophagus-diagnosed patients. Four Convolutional Neural Network models (AlexNet, SqueezeNet, ResNet50, and VGG16) were analyzed using five different interpretation techniques (saliency, guided backpropagation, integrated gradients, input × gradients, and DeepLIFT) to compare their agreement with experts' previous annotations of cancerous tissue. We could show that saliency attributes match best with the manual experts' delineations. Moreover, there is moderate to high correlation between the sensitivity of a model and the human-and-computer agreement. The results also showed that the higher the model's sensitivity, the stronger the correlation of human and computational segmentation agreement. We observed a relevant relation between computational learning and experts' insights, demonstrating how human knowledge may influence correct computational learning.
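To make the comparison tangible, here is a minimal sketch of the simplest of these techniques, vanilla gradient saliency, in PyTorch. The ResNet50 stand-in with random weights and the random input are illustrative assumptions, not the study's actual models or data:

```python
# Vanilla saliency: attribute the predicted class score to input
# pixels via the gradient of the score with respect to the image.
import torch
import torchvision.models as models

model = models.resnet50(weights=None)  # stand-in for the paper's ResNet50
model.eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # placeholder input
score = model(image)[0].max()  # logit of the top predicted class
score.backward()               # gradients w.r.t. input pixels

saliency = image.grad.abs().max(dim=1).values  # per-pixel attribution map
```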

3. Explainable AI for COVID-19 CT Classifiers: An Initial Comparison Study

Artificial Intelligence (AI) has made leapfrogs in development across all the industrial sectors especially when deep learning has been introduced. Deep learning helps to learn the behavior of an entity through methods of recognising and interpreting patterns. Despite its limitless potential, the mystery is how deep learning algorithms make a decision in the first place. Explainable AI (XAI) is the key to unlocking AI and the black-box for deep learning. XAI is an AI model that is programmed to explain its goals, logic, and decision making so that the end users can understand. The end users can be domain experts, regulatory agencies, managers and executive board members, data scientists, users that use AI, with or without awareness, or someone who is affected by the decisions of an AI model. Chest CT has emerged as a valuable tool for the clinical diagnostic and treatment management of lung diseases associated with COVID-19. AI can support rapid evaluation of CT scans to differentiate COVID-19 findings from other lung diseases. However, how these AI tools or deep learning algorithms reach such a decision and which are the most influential features derived from these neural networks with typically deep layers are not clear. The aim of this study is to propose and develop XAI strategies for COVID-19 classification models with an investigation of comparison. The results demonstrate promising quantification and qualitative visualizations that can further enhance the clinician's understanding and decision making with more granular information from the results given by the learned XAI models.


4. Explainable AI-based clinical decision support system for hearing disorders

In clinical system design, human-computer interaction and explainability are important topics of research. Clinical systems need to provide users with not only results but also an account of their behaviors. In this research, we propose a knowledge-based clinical decision support system (CDSS) for the diagnosis and therapy of hearing disorders, such as tinnitus, hyperacusis, and misophonia. Our prototype eTRT system offers an explainable output that we expect to increase its trustworthiness and acceptance in the clinical setting. Within this paper, we: (1) present the problem area of tinnitus and its treatment; (2) describe our data-driven approach based on machine learning, such as association- and action rule discovery; (3) present the evaluation results from the inference on the extracted rule-based knowledge and chosen test cases of patients; (4) discuss advantages of explainable output incorporated into a graphical user interface; (5) conclude with the results achieved and directions for future work.


5. Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence

Cardiovascular diseases and their associated disorder of heart failure are one of the major death causes globally, being a priority for doctors to detect and predict its onset and medical consequences. Artificial Intelligence (AI) allows doctors to discover clinical indicators and enhance their diagnosis and treatments. Specifically, explainable AI offers tools to improve clinical prediction models that experience poor interpretability of their results. This work presents an explainability analysis and evaluation of a prediction model for heart failure survival by using a dataset that comprises 299 patients who suffered heart failure. The model employs a data workflow pipeline able to select the best ensemble tree algorithm as well as the best feature selection technique. Moreover, different post-hoc techniques have been used for the explainability analysis of the model. The paper's main contribution is an explainability-driven approach to select the best prediction model for HF survival based on an accuracy-explainability balance. Therefore, the most balanced explainable prediction model implements an Extra Trees classifier over 5 selected features (follow-up time, serum creatinine, ejection fraction, age and diabetes) out of 12, achieving a balanced-accuracy of 85.1% and 79.5% with cross-validation and new unseen data respectively. The follow-up time is the most influencing feature followed by serum-creatinine and ejection-fraction. The explainable prediction model for HF survival presented in this paper would improve a further adoption of clinical prediction models by providing doctors with intuitions to better understand the reasoning of, usually, black-box AI clinical solutions, and make more reasonable and data-driven decisions.

6. Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers

Problem: An application of Explainable Artificial Intelligence Methods for COVID CT-Scan classifiers is presented. Motivation: It is possible that classifiers are using spurious artifacts in dataset images to achieve high performances, and such explainable techniques can help identify this issue. Aim: For this purpose, several approaches were used in tandem, in order to create a complete overview of the classifications. Methodology: The techniques used included GradCAM, LIME, RISE, Squaregrid, and direct Gradient approaches (Vanilla, Smooth, Integrated). Main results: Among the deep neural networks architectures evaluated for this image classification task, VGG16 was shown to be most affected by biases towards spurious artifacts, while DenseNet was notably more robust against them. Further impacts: Results further show that small differences in validation accuracies can cause drastic changes in explanation heatmaps for DenseNet architectures, indicating that small changes in validation accuracy may have large impacts on the biases learned by the networks. Notably, it is important to notice that the strong performance metrics achieved by all these networks (Accuracy, F1 score, AUC all in the 80 to 90% range) could give users the erroneous impression that there is no bias. However, the analysis of the explanation heatmaps highlights the bias.

7. Brain Hemorrhage Classification in CT Scan Images Using Minimalist Machine Learning

Over time, a myriad of applications have been generated for pattern classification algorithms. Several case studies include parametric classifiers such as the Multi-Layer Perceptron (MLP) classifier, which is one of the most widely used today. Others use non-parametric classifiers: Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Naïve Bayes (NB), Adaboost, and Random Forest (RF). However, there is still little work directed toward a new trend in Artificial Intelligence (AI), which is known as eXplainable Artificial Intelligence (X-AI). This new trend seeks to make Machine Learning (ML) algorithms increasingly simple and easy to understand for users. Therefore, following this new wave of knowledge, in this work, the authors develop a new pattern classification methodology, based on the implementation of the novel Minimalist Machine Learning (MML) paradigm and a higher relevance attribute selection algorithm, which we call dMeans. We examine and compare the performance of this methodology with MLP, NB, KNN, SVM, Adaboost, and RF classifiers to perform the task of classification of Computed Tomography (CT) brain images. These grayscale images have an area of 128 × 128 pixels, and there are two classes available in the dataset: CT without Hemorrhage and CT with Intra-Ventricular Hemorrhage (IVH), which were classified using the Leave-One-Out Cross-Validation method. Most of the models tested by Leave-One-Out Cross-Validation performed between 50% and 75% accuracy, while sensitivity and specificity ranged between 58% and 86%. The experiments performed using our methodology matched the best classifier observed with 86.50% accuracy, and they outperformed all state-of-the-art algorithms in specificity with 91.60%. This performance is achieved with simple and practical methods, in line with this trend of generating easily explainable algorithms.

8. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice


Artificial intelligence (AI) is transforming society and health care. Growing numbers of artificial intelligence applications are being developed and applied to pathology and laboratory medicine. These technologies introduce risks and benefits that must be assessed and managed through the lens of ethics.

There is great enthusiasm for the potential of these AI tools to transform and improve health care. This is reflected in efforts by pathology and laboratory medicine professionals both to enhance the practice of pathology and laboratory medicine and to advance medical knowledge based on the data that we generate.

AI application developers use real-world data sets to “train” their applications to generate the desired output. Applications are ideally validated using separate real-world data sets to assess the accuracy and generalizability of the AI output. In pathology AI, for example, a training or validation data set might consist of digitized microscopic images together with the associated diagnoses as assessed by human expert pathologists.

Clinical laboratories performing in vitro diagnostic tests (including histopathologic diagnosis) constitute one of the largest single sources of objective and structured patient-level data within the health care system. 

Ethics in medicine, scientific research, and computer science all have deep academic roots. The foundational principles of medical ethics as articulated by Beauchamp and Childress are autonomy, beneficence, nonmaleficence, and justice.






9.  Explanatory artificial intelligence for diabetic retinopathy diagnosis

Diabetic Retinopathy (DR) is a leading and growing cause of vision impairment and blindness: by 2040, around 600 million people throughout the world will have diabetes, a third of whom will have DR (Yau et al., 2012). Early diagnosis is key to slowing down the progression of DR and therefore preventing the occurrence of blindness.

Annual retinal screening, generally using Color Fundus Photography (CFP), is thus recommended for all diabetic patients.

In order to improve DR screening programs, numerous Artificial Intelligence (AI) systems were thus developed to automate DR diagnosis using CFP (Ting et al., 2019b). However, due to the “black-box” nature of state-of-the-art AI, these systems still need to gain the trust of clinicians and patients.

There now exists an eXplanatory Artificial Intelligence (XAI) framework that reaches the same level of performance as black-box AI for the task of classifying Diabetic Retinopathy (DR) severity using Color Fundus Photography (CFP). This algorithm, called ExplAIn, learns to segment and categorize lesions in images; the final image-level classification directly derives from these multivariate lesion segmentations. The novelty of this explanatory framework is that it is trained end to end, with image supervision only, just like black-box AI algorithms: the concepts of lesions and lesion categories emerge by themselves. For improved lesion localization, foreground/background separation is trained through self-supervision, in such a way that occluding foreground pixels transforms the input image into a healthy-looking image. The advantage of such an architecture is that automatic diagnoses can be explained simply by an image and/or a few sentences. ExplAIn is evaluated at the image level and at the pixel level on various CFP image datasets. We expect this new framework, which jointly offers high classification performance and explainability, to facilitate AI deployment.


10.  Machine Learning and XAI approaches for Allergy Diagnosis

This work presents a computer-aided framework for allergy diagnosis which is capable of handling comorbidities. The system was developed using datasets collected from allergy testing centers in South India. Intradermal skin test results of 878 patients were recorded, and it was observed that the data contained very few samples for comorbid conditions. Modified data sampling techniques were applied to handle this data imbalance and improve the efficiency of the learning algorithms. The algorithms were cross-validated to choose the optimal trained model for multi-label classification. The transparency of the machine learning models was ensured using post-hoc explainable artificial intelligence approaches. The system was tested by verifying the performance of a trained random forest model on the test data. The training and validation accuracy rates of the decision tree, support vector machine and random forest are 81.62%, 81.04% and 83.07%, respectively. During evaluation, the random forest achieved an overall accuracy of 86.39%, and 75% sensitivity for the comorbid Rhinitis-Urticaria class. The average performance of the clinicians before and after using the decision support system was 77.21% and 81.80%, respectively.




Bibliography

1. Diagnosis of Acute Poisoning using explainable artificial intelligence
    Michael Chary, Ed W. Boyer, Michele M. Burns 
    Computers in Biology and Medicine, Volume 134, July 2021
    
2. Convolutional Neural Networks for the evaluation of cancer in Barrett's esophagus: Explainable AI to lighten up the black-box
     Luis A. de Souza Jr., Robert Mendel, Sophia Strasser, Alanna Ebigbo, Andreas Probst, Helmut Messmann, João P. Papa, Christoph Palm
     Computers in Biology and Medicine, Volume 135, August 2021

3. Explainable AI for COVID-19 CT Classifiers: An Initial Comparison Study
    Q. Ye, J. Xia and G. Yang
    IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), 12 July 2021
    
4. Explainable AI-based clinical decision support system for hearing disorders
    Katarzyna A. Tarnowska, Brett C. Dispoto, Jordan Conragan
    Published online 17 May 2021

5. Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence
   Pedro A. Moreno-Sanchez
   20 August 2021

6. Explainable Artificial Intelligence for Bias Detection in COVID CT-Scan Classifiers
    Iam Palatnik de Sousa, Marley M. B. R. Vellasco, Eduardo Costa da Silva
    23 August 2021

7. Brain Hemorrhage Classification in CT Scan Images Using Minimalist Machine Learning
    José-Luis Solorio-Ramírez, Magdalena Saldana-Perez, Miltiadis D. Lytras, Marco-Antonio Moreno-Ibarra, Cornelio Yáñez-Márquez
    11 August 2021

8. The Ethics of Artificial Intelligence in Pathology and Laboratory Medicine: Principles and Practice
     Brian R. Jackson, Ye Ye, James M. Crawford, Michael J. Becich, Somak Roy, Jeffrey R. Botkin, Monica E. de Baca, Liron Pantanowitz

9. ExplAIn: Explanatory artificial intelligence for diabetic retinopathy diagnosis
    Gwenolé Quellec, Hassan Al Hajj, Mathieu Lamard, Pierre-Henri Conze, Pascale Massin, Béatrice Cochener
    August 2021


10. Machine Learning and XAI approaches for Allergy Diagnosis
      Ramisetty Kavya, Jabez Christopher, Subhrakanta Panda, Y. Bakthasingh Lazarus 
      August 2021



















XAI for Bioinformatics Jan-Apr 2022

Artificial Intelligence (AI) is increasingly being used in bioinformatics to analyze large volumes of biological data and to develop predictive models for various biological phenomena. However, as the complexity of these AI systems grows, it becomes more challenging to understand how they arrive at their conclusions or predictions. This lack of interpretability is a significant challenge for bioinformatics, where understanding the rationale behind a decision or prediction is critical for building trust in the model and identifying potential errors or biases. To address this challenge, the field of explainable AI (XAI) has emerged, which aims to develop AI models that can provide a transparent and interpretable rationale for their predictions. In bioinformatics, XAI can play a crucial role in enabling researchers to gain insights into complex biological systems, improve disease diagnosis and treatment, and identify new drug targets. This article explores the significance of XAI for bioinformatics and how it can help researchers understand and interpret the predictions of AI models in the field.


  1. A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets

In the article two frameworks for supporting automatic medical consultation, which are doctor-patient dialogue understanding and task-oriented interaction, using machine learning, are proposed. The authors create a new large medical dialogue dataset with fine-grained annotations and establish five independent tasks including named entity recognition, dialogue act classification, symptom label inference, medical report generation, and diagnosis-oriented dialogue policy. The authors report benchmark results for each task, which demonstrate the usability of the dataset and establish a baseline for future studies. The article aims to improve the efficiency of automatic medical consultation and enhance patient experience.


  2. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review

The article discusses the use of single-cell RNA sequencing (scRNA-seq) to study cell states and phenotypes, as well as the potential applications in understanding biological processes and disease states. It also explores the use of deep learning, an artificial intelligence technique, in scRNA-seq data analysis. The review surveys recent developments in deep learning techniques for scRNA-seq data analysis, identifies key steps that have been advanced by deep learning, and explains the benefits of deep learning over conventional analytic tools. The article also summarizes the challenges faced by current deep learning approaches in scRNA-seq data analysis and discusses potential directions for improving deep learning algorithms in this field.


  3. Mining On Alzheimer’s Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing

The article discusses the use of knowledge graphs to identify opportunities for preventing or delaying neurodegenerative diseases, specifically Alzheimer's Disease (AD). The authors constructed a knowledge graph using biomedical annotations and extracted relations using SemRep via SemMedDB. They used a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The filtered triples were used to train knowledge graph completion algorithms to predict candidates that might be helpful for AD treatment or prevention. The results showed that TransE outperformed other models, and time-slicing techniques were used to further evaluate the prediction results. The authors found supporting evidence for most highly ranked candidates predicted by the model, indicating that their approach can inform reliable new knowledge. The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses in the field of neurodegenerative diseases.
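Since TransE came out on top here, a minimal numpy sketch of its scoring idea may help: a triple (head, relation, tail) is considered plausible when head + relation lands close to tail in the embedding space. The entity and relation names, the embedding size, and the random embeddings below are all hypothetical stand-ins for trained ones.

```python
# Score knowledge-graph triples with the TransE criterion:
# -||h + r - t||; a higher (less negative) score = more plausible triple.
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # embedding size, chosen arbitrarily for the sketch
entities = {n: rng.normal(size=dim) for n in ["drug_X", "Alzheimer's"]}
relations = {"TREATS": rng.normal(size=dim)}

def transe_score(head, relation, tail):
    return -np.linalg.norm(entities[head] + relations[relation] - entities[tail])

print(transe_score("drug_X", "TREATS", "Alzheimer's"))
```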


  4. Recommendations for extending the GFF3 specification for improved interoperability of genomic data

The article discusses the GFF3 format, which is widely used to represent the structure and function of genes and other mapped features. However, the flexibility of this format has become an obstacle to standardized downstream processing due to the different notations used by common software packages. To address this issue, the AgBioData consortium has developed recommendations for improving the GFF3 format, including providing concrete guidelines for generating GFF3 and creating a standard representation of the most common biological data types. The AgBioData GFF3 working group suggests improvements for each GFF3 field, as well as special cases of modeling functional annotations and standard protein-coding genes, to increase efficiency for AgBioData databases and the genomics research community.


  5. An Artificial Intelligence Technique for Covid-19 Detection with eXplainability using Lungs X-Ray Images

The article discusses how limited healthcare resources and the unequal distribution of healthcare facilities have made disease detection critical in averting epidemics, particularly in the case of COVID-19. PCR testing is commonly used to detect the virus, but deep learning approaches can also be used to classify chest X-ray images. The study aims to detect COVID-19 by using deep learning approaches to analyze chest X-ray images of COVID-19 patients, viral pneumonia patients, and healthy patients, obtained from IEEE and Kaggle. The dataset was subjected to a data augmentation approach before classification, and multi-class deep learning models were used to classify the three groups.


  6. Deep learning for drug repurposing: methods, databases, and applications

The article discusses the potential of repurposing existing drugs for new therapies, specifically for COVID-19, as a way to accelerate drug development and reduce costs. However, effectively utilizing deep learning models for drug repurposing in complex diseases is still challenging. The article provides guidelines for utilizing deep learning methodologies and tools for drug repurposing, including commonly used bioinformatics and pharmacogenomics databases, sequence-based and graph-based representation approaches, and state-of-the-art deep learning-based methods. The article also presents applications of drug repurposing for COVID-19 and outlines future challenges.


  7. Direct Molecular Conformation Generation

The article presents a new method for generating the three-dimensional coordinates of atoms in a molecule, which is important in bioinformatics and pharmacology. The proposed method directly predicts the coordinates of atoms without predicting intermediate values such as interatomic distances or local structures. The method is invariant to roto-translation of coordinates and permutation of symmetric atoms, and adaptively aggregates bond and atom information to iteratively refine the generated conformation. The method achieves the best results on two datasets and improves molecular docking by providing better initial conformations. The article concludes that the direct approach has great potential and provides a link to the released code.


  8. MPVNN: Mutated Pathway Visible Neural Network Architecture For Interpretable Prediction Of Cancer-Specific Survival Risk

The article presents a novel approach for survival risk prediction using gene expression data in cancer, called Mutated Pathway Visible Neural Network (MPVNN). MPVNN is designed using prior knowledge of biological signaling pathways and gene mutation data-based edge randomization, which simulates signal flow disruption. The study uses the PI3K-Akt pathway as a case study and shows improved cancer-specific survival risk prediction results of MPVNN over standard non-NN and other similar sized NN survival analysis methods. The trained MPVNN architecture interpretation is reliable and points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that are important in risk prediction for particular cancer types. The article highlights the importance of interpretability in survival analysis models for making treatment decisions in cancer.


Conclusions

In conclusion, the application of XAI techniques in bioinformatics has the potential to enhance the accuracy, transparency, and interpretability of machine learning models, enabling scientists to make more informed decisions and gain a better understanding of the underlying biological processes. By providing explanations for the predictions made by these models, XAI can facilitate the identification of relevant biomarkers, aid in the diagnosis of diseases, and assist in the development of personalized treatments. However, there are still challenges to be addressed, such as the need for standardized guidelines and approaches to XAI in bioinformatics, as well as the integration of XAI with existing bioinformatics tools and workflows. As research in this area continues, it is expected that XAI will play an increasingly important role in advancing our understanding of complex biological systems and ultimately lead to improved patient outcomes.


Bibliography

[1] Wei Chen, Zhiwei Li, Hongyi Fang, Qianyuan Yao, Cheng Zhong, Jianye Hao, Qi Zhang, Xuanjing Huang, Jiajie Peng and Zhongyu Wei, “A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets” 

(https://arxiv.org/pdf/2204.08997v3.pdf)

[2] Matthew Brendel, Chang Su, Zilong Bai, Hao Zhang, Olivier Elemento, Fei Wang, “Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review” 

(https://www.sciencedirect.com/science/article/pii/S1672022922001668)

[3] Yi Nian, Xinyue Hu, Rui Zhang, Jingna Feng, Jingcheng Du, Fang Li, Larry Bu, Yuji Zhang, Yong Chen and Cui Tao, “Mining On Alzheimer’s Diseases Related Knowledge Graph to Identity Potential AD-related Semantic Triples for Drug Repurposing”

 (https://arxiv.org/pdf/2202.08712.pdf)

[4] Surya Saha, Scott Cain, Ethalinda K. S. Cannon, Nathan Dunn, Andrew Farmer, Zhi-Liang Hu, Gareth Maslen, Sierra Moxon, Christopher J Mungall, Rex Nelson, Monica F. Poelchau, “Recommendations for extending the GFF3 specification for improved interoperability of genomic data”

 (https://arxiv.org/ftp/arxiv/papers/2202/2202.07782.pdf)

[5] Pranshu Saxena, Sanjay Kumar Singh, Gyanendra Tiwary, Yush Mittal, Ishika Jain; “An Artificial Intelligence Technique for Covid-19 Detection with eXplainability using Lungs X-Ray Images”

(https://ieeexplore.ieee.org/document/9793240)

[6] Xiaoqin Pan, Xuan Lin, Dongsheng Cao, Xiangxiang Zeng, Philip S. Yu, Lifang He, Ruth Nussinov, Feixiong Cheng, “Deep learning for drug repurposing: methods, databases, and applications” 

(https://arxiv.org/ftp/arxiv/papers/2202/2202.05145.pdf)

[7] Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu; “Direct Molecular Conformation Generation” 

(https://arxiv.org/pdf/2202.01356.pdf)

[8] Gourab Ghosh Roy, Nicholas Geard, Karin Verspoor, Shan He; “MPVNN: Mutated Pathway Visible Neural Network Architecture For Interpretable Prediction Of Cancer-Specific Survival Risk”

 (https://arxiv.org/pdf/2202.00882.pdf)



