1. Introduction
Deep learning is now widely used to detect pneumonia. Unfortunately, standard deep learning approaches for pneumonia detection pay little attention to how the background of a lung X-ray image affects the model's test performance, which limits the accuracy that can be achieved. The chosen paper proposes a deep learning approach that takes image background factors into account and analyses the method with explainable deep learning techniques. The main idea is to remove the image background to increase pneumonia recognition accuracy, and to use the Grad-CAM method to obtain an explainable deep learning model for pneumonia identification. The proposed approach improves detection accuracy, with VGG16 reaching a best accuracy of 95.6%.
2. Materials and methods
2.1. Overview
The deep learning method proposed in our chosen paper considers image background factors, and explainable deep learning is used to analyse the proposed method.
Step 1.
X-ray image datasets of pneumonia patients and normal patients are collected from publicly available websites, then the raw images are organized and cleaned.
The authors obtained a dataset of 5840 chest X-ray images of children aged 1 to 5 years, with or without pneumonia, from the Guangzhou Women's and Children's Medical Center. Of these, 4265 images were of patients with pneumonia and 1575 were of healthy subjects. To ensure accurate training and testing results, they selected an equal number of images from both groups and randomly split them into an 80% training set and a 20% testing set. They then removed the background of all the images, retaining only the heart and lung features, which yielded a chest X-ray image dataset without a background. Finally, before training and testing, the images were scaled to a uniform size of 224 x 224.
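The balanced selection and 80/20 split described above can be sketched as follows. The file names are placeholders, not the paper's actual data; only the balancing and splitting logic follow the description.

```python
import random

def balanced_split(pneumonia, normal, train_frac=0.8, seed=42):
    """Take an equal number of samples from each class, then split 80/20."""
    rng = random.Random(seed)
    n = min(len(pneumonia), len(normal))          # balance the two classes
    combined = rng.sample(pneumonia, n) + rng.sample(normal, n)
    rng.shuffle(combined)
    cut = int(len(combined) * train_frac)
    return combined[:cut], combined[cut:]         # train set, test set

# Toy usage with dummy file names (real paths would point at the X-ray images).
pneu = [f"pneumonia_{i}.jpg" for i in range(4265)]
norm = [f"normal_{i}.jpg" for i in range(1575)]
train, test = balanced_split(pneu, norm)
```

With 1575 images per class the balanced pool has 3150 images, so the split yields 2520 training and 630 test images.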
Data augmentation is a technique used to prevent overfitting and expand small datasets. It involves applying transformations such as changes in size, orientation, and color to the original images to create new variations, resulting in a more diverse and valuable dataset. This method is widely used in deep learning for image classification tasks.
There are various methods of data augmentation. These include flipping the image (horizontally or vertically), rotating it at different angles, scaling the image (enlarging or reducing without changing content), cropping it (centrally or randomly), adjusting brightness, contrast, and hue (with random perturbations). These methods are illustrated in Figure 2.
Data augmentation can improve model generalization, reduce overfitting, and improve model robustness. In medical image analysis, where datasets are typically small, like in this paper, data augmentation is used to expand the available data.
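The flips, rotations, and brightness perturbations listed above can be illustrated with plain array operations. This is only a minimal sketch, not the paper's actual augmentation pipeline (which a PyTorch project would typically build with torchvision transforms).

```python
import numpy as np

def augment(image):
    """Return simple augmented variants of a 2-D grayscale image."""
    return {
        "hflip": np.fliplr(image),               # horizontal flip
        "vflip": np.flipud(image),               # vertical flip
        "rot90": np.rot90(image),                # 90-degree rotation
        "bright": np.clip(image * 1.2, 0, 255),  # brightness perturbation
    }

img = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
variants = augment(img)
```

Each variant preserves the image shape, so one original image yields several training samples.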
Step 2.
Two suitable deep learning models, the pretrained ResNet50 and VGG16, are selected to build deep learning models for pneumonia X-ray image classification using the idea of transfer learning. Training and testing are first conducted with the image background included.
Transfer learning:
Transfer learning is a method used to transfer knowledge gained from solving one problem to another similar or related problem, especially when the amount of available data is limited for training. This approach can reduce the amount of training data required for deep learning, reduce data dependency, and solve problems with a small amount of data. In this study, transfer learning is used to build the deep learning model using pre-trained models, which reduces dependence on the amount of data and improves accuracy due to the small dataset. This principle is illustrated in Figure 3.
Deep learning model construction:
This paper employs a transfer learning approach to build deep learning models for identifying pneumonia, utilizing pre-trained ResNet50 and VGG16 models from the ImageNet dataset. The final models constructed are the ResNet50 model and the VGG16 model, which are explained in detail below.
The ResNet50 model, a deep residual network, was proposed by Kaiming He et al. in 2015. ResNet introduced the residual block to address the vanishing-gradient problem caused by increasing the depth of the neural network; the residual block also speeds up training and improves model accuracy. This paper uses the ResNet50 model with bottleneck residual blocks, in which a 1x1 convolution reduces the number of input channels, thereby decreasing the overall model parameters and increasing computation speed.
VGG16 model:
The VGG16 model is designed with a simple structure and utilizes small 3×3 convolutional kernels instead of larger ones to increase the network's depth and reduce its parameters, thereby improving the network's fitting ability. Compared to the AlexNet model architecture, the VGG16 network structure is deeper and wider, but the increase in computation is controlled.
Step 3.
The background of the images is removed, and training and testing are performed with the pretrained ResNet50 and VGG16 models on the background-free X-ray images, yielding a deep learning model for pneumonia X-ray recognition that takes the image background into account.
Background removal:
The background of each X-ray image is removed, retaining only the heart and lung regions relevant to pneumonia identification; in this paper the removal was performed manually for every image.
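Programmatically, background removal amounts to applying a binary mask that keeps only the heart and lung region. The paper produced these regions manually; the sketch below only illustrates the masking step itself, with a hypothetical rectangular mask standing in for a real hand-drawn one.

```python
import numpy as np

def remove_background(image, mask):
    """Zero out every pixel outside the region of interest."""
    return image * mask

# Toy 6x6 "X-ray" and a hypothetical mask covering the central region.
image = np.full((6, 6), 200.0)
mask = np.zeros((6, 6))
mask[1:5, 1:5] = 1.0          # stands in for a hand-drawn heart/lung mask

clean = remove_background(image, mask)
```

Everything outside the masked region becomes zero, so the model no longer sees the abdomen, mandible, or arms.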
Deep Learning Model Construction:
The proposed study trains and tests the two pretrained deep learning models, ResNet50 and VGG16, on the pneumonia X-ray image dataset with the background removed, in order to develop a model for recognizing pneumonia in background-free X-ray images.
Step 4.
A visual explainability analysis of the deep learning models is performed using the Grad-CAM method, to further obtain a trustworthy deep learning model for practical pneumonia recognition.
The paper uses the Grad-CAM visualization method to explain the deep learning models for pneumonia recognition, which leads to a trustworthy model. Grad-CAM is preferred over CAM because CAM is only applicable when the CNN structure is modified by replacing the fully connected layers with a global average pooling operation, which is difficult for CNN models with multiple fully connected layers; Grad-CAM imposes no such structural requirement.
The Grad-CAM method computes a weight for each channel of the feature map from the gradients back-propagated from the output score for the target class. These weights are used to form a weighted sum of the feature-map channels, and the resulting class activation map is obtained by applying the ReLU activation function. The method visualizes the attention of the neural network as a heat map showing the image regions that are important for target prediction and classification.
The Grad-CAM method can be applied to different CNN models without altering their structure, and it provides better visualization and more fine-grained details compared to the CAM method. It also avoids the tradeoff between model accuracy and explainability, and does not require retraining. A heatmap generated using the Grad-CAM method for pneumonia identification is presented in Figure 4, where red areas indicate regions of high concern. The heatmap highlights the lungs as the most significant area for pneumonia identification.
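The Grad-CAM procedure described above can be sketched in PyTorch using a tiny stand-in CNN rather than the paper's ResNet50/VGG16: record the last convolutional feature map and its gradient, average the gradient per channel to get the weights, form the weighted sum, and apply ReLU.

```python
import torch
import torch.nn as nn

# Tiny stand-in CNN (the paper uses ResNet50 / VGG16).
conv = nn.Conv2d(1, 8, 3, padding=1)
head = nn.Linear(8 * 16 * 16, 2)

x = torch.randn(1, 1, 16, 16)
fmap = conv(x)                        # feature map of the last conv layer
fmap.retain_grad()                    # keep its gradient for Grad-CAM
logits = head(torch.relu(fmap).flatten(1))

# Back-propagate from the score of the predicted class.
score = logits[0, logits.argmax()]
score.backward()

# Channel weights = spatial average of the back-propagated gradients.
weights = fmap.grad.mean(dim=(2, 3), keepdim=True)   # shape (1, 8, 1, 1)

# Weighted sum of channels, then ReLU -> class activation map.
cam = torch.relu((weights * fmap).sum(dim=1)).squeeze(0).detach()
```

Upsampled to the input resolution and colored, `cam` becomes the heat map shown in the paper's figures.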
3. Results
The experimental data first had to be filtered and cleaned, and an equal number of pneumonia and normal images had to be selected, because only binary pneumonia-versus-normal X-ray image recognition was performed.
In total, the dataset contains 2500 images, of which 80% were randomly selected for the training set. Each image was then processed manually to remove the background.
The deep learning models used are the pretrained ResNet50 and VGG16 models. The loss function was torch.nn.CrossEntropyLoss(), with torch.optim.SGD() as the optimizer.
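The training setup can be sketched as below. The model, batch, and hyperparameters (learning rate, batch size) are placeholders, since the paper does not report them here, but the loss function and optimizer are the ones named above.

```python
import torch
import torch.nn as nn

# Placeholder 2-class model standing in for the pretrained ResNet50/VGG16.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))

criterion = nn.CrossEntropyLoss()                         # loss from the paper
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer from the paper

# One dummy training step on a random batch (real code iterates a DataLoader).
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 2, (4,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```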
Results without X-Ray Background Removal
The figure below compares the ResNet50 and VGG16 models over 10 epochs. The highest accuracy of both models after data augmentation exceeds that before augmentation, because the dataset contains few samples; data augmentation therefore has to be used to improve model accuracy. Furthermore, the comparison shows that the VGG16 model is more accurate than the ResNet50 model regardless of data augmentation, because the binary classification problem is simple and the dataset is small.
After data augmentation, the dataset is extended, leading to better training and better accuracy for both models. Even so, in this particular situation the VGG16 model is better suited to pneumonia identification on small datasets, because it achieves a higher accuracy rate.
Results considering X-Ray Image Background
From the images, only the lung and heart components are kept. As in the case before, ResNet50 and VGG16 models are used.
1. Analysis results when removing the X-Ray Image Background
Comparing the two models before and after data augmentation over 10 epochs, as seen in the graphical representation above, the accuracy of ResNet50 after background removal increases with each epoch after data augmentation and then tends to stabilize. The highest accuracy of ResNet50 after augmentation improves over the case without augmentation, but the effect is not pronounced. For the VGG16 model, by contrast, accuracy increases greatly with data augmentation.
After data augmentation, the VGG16 model still outperforms the ResNet50 model in overall results and accuracy. Although the VGG16 model shows some fluctuations, its accuracy remains high overall, and it has strong prospects for the development of pneumonia identification.
2. Before and after background removal
For this part the graphical data was not included, but the results show that the ResNet50 model's accuracy for pneumonia identification after background removal is the same as before removing the background. This is because the model is deeper, so it can already achieve high accuracy on small datasets. The VGG16 model's accuracy for pneumonia identification, on the other hand, is greatly improved after removing the background, regardless of data augmentation. The highest accuracy obtained for the VGG16 model reached 95.6%, mainly due to its simpler structure compared to ResNet50.
The discussed paper compares the proposed model with related work on earlier architectures and achieves the highest accuracy of all, 95.6%, with the caveat that the datasets are not the same. Even so, this suggests that the pneumonia identification problem can achieve better results.
Explainability Analysis of Deep Learning Model
Using the Grad-CAM method, five normal and five pneumonia lung X-ray images were randomly selected, before and after background removal. Heatmaps of the ResNet50 and VGG16 models were generated before and after background removal for pneumonia X-ray identification, as shown in figures 9-12. Figure 9 displays the heatmap of normal lung X-ray images before background removal, while figure 10 shows the heatmap of pneumonia X-ray images before background removal. The heatmap of healthy lung X-ray images after background removal is illustrated in figure 11, and figure 12 shows the heatmap of pneumonia X-ray images after background removal. Red in the heatmap signifies high attention, blue signifies low attention, and yellow indicates attention between the two.
After analyzing Fig. 9∼12, it is evident that both ResNet50 and VGG16 models can detect important lung areas. However, the ResNet50 model has a broader focus range, and some images are recognized inaccurately. On the other hand, the VGG16 model has a finer focus range, accurately identifying key areas of the lung texture, thus achieving more precise classification of pneumonia X-ray images. This could explain why the VGG16 model performs better than the ResNet50 model in terms of accuracy.
The ResNet50 and VGG16 models identified important lung areas before removing the background but also generated unnecessary attention in the abdomen and mandible, among other areas, which could affect the final model's accuracy. To address this issue, the researchers removed irrelevant background factors, such as the mandible, abdomen, and arm, and retained only the heart and lung parts relevant to pneumonia identification. As a result, attention to irrelevant locations was significantly reduced, and the models focused more accurately on key lung areas, resulting in improved accuracy in pneumonia recognition and enhanced model performance.

4. Discussions
Transfer learning and data augmentation are used to reduce the problem of deep learning models requiring massive training data and improve the accuracy of pneumonia X-ray image recognition. However, there are still limitations, such as the small dataset used, and better test results cannot be obtained for some models that apply to large sample datasets. In the future, more models will be compared and analyzed, and a deep learning model that is more suitable for the classification of lung X-ray images will be selected. The authors hope to achieve a rapid diagnosis of X-ray pictures of COVID-19 using deep learning methods in the future and propose a more accurate image segmentation and extraction method for efficient extraction of lung X-ray images with the background removed.
5. Conclusions
The method uses pre-existing deep learning models, removes the image background, and applies the Grad-CAM method to obtain an explainable deep learning model. The results show that considering the image background improves the accuracy of the deep learning model for pneumonia identification. Additionally, after removing the image background factors, the attention to key locations of the lungs increases, as shown by the Grad-CAM heatmap results. Future work is planned to establish a high-precision and trusted pneumonia recognition model.









