This project involves the development of a data preprocessing application for an AI-supported lung disease diagnosis system. The application focuses on extracting the lung region from X-ray images, ensuring that the subsequent analysis is both accurate and efficient. By using advanced image processing techniques, the application identifies and crops the lung area, removing irrelevant parts of the image. This preprocessing step is crucial for enhancing the performance of AI models in diagnosing lung diseases such as pneumonia, tuberculosis, and lung cancer. The clean, focused images enable the AI to make more accurate predictions and diagnoses, ultimately improving patient outcomes.
1. First, the film is converted to black and white with a high threshold value to detect text and symbols.
6. The white pixels at the far left, far right, top, and bottom of the film are detected to determine the body's frame.
8. The frame is cropped from the original image, and the body image is prepared to be sent to the lung detection function.
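The framing and cropping steps above can be sketched in plain Python on a small grayscale grid. This is only an illustration under stated assumptions, not the project's code: the function names `to_binary`, `white_bounding_box`, and `crop` are hypothetical, images are modeled as nested lists of 0-255 values, and the threshold of 127 in the example is chosen for the demo only.

```python
def to_binary(gray, threshold):
    """Return a 0/255 image: pixels brighter than `threshold` become white."""
    return [[255 if px > threshold else 0 for px in row] for row in gray]


def white_bounding_box(binary):
    """Return (top, bottom, left, right) of the outermost white pixels, or None."""
    rows = [r for r, row in enumerate(binary) if any(px == 255 for px in row)]
    if not rows:
        return None  # no white pixel at all, so there is no frame
    cols = [c for row in binary for c, px in enumerate(row) if px == 255]
    return rows[0], rows[-1], min(cols), max(cols)


def crop(image, box):
    """Cut the inclusive (top, bottom, left, right) box out of `image`."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Tiny example: a bright "body" surrounded by a dark border.
gray = [
    [0,   0,   0,   0, 0],
    [0, 200, 180,   0, 0],
    [0, 190, 210, 220, 0],
    [0,   0,   0,   0, 0],
]
box = white_bounding_box(to_binary(gray, 127))
body = crop(gray, box)  # cropped from the original, not from the binary image
```

The box is found on the binary image, but the crop is taken from the original so that no grayscale detail is lost before lung detection.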
Note
The lung detection function operates in three stages. The first stage attempts to find the entire lung by matching it against templates. If no full-lung match is found, the process moves to the second stage, in which the left and right lungs are searched for separately: the left lung is searched for first and, if it is found, the right lung is searched for next. If both are found, a frame is drawn around their furthest coordinates to obtain the result. If either lung (or both) cannot be found, the process moves to the third stage, which searches sequentially for the upper left, lower left, upper right, and lower right parts of the lung. If all four corners are found, a frame is created from their furthest coordinates to obtain the lung image.
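The three-stage fallback described in this note can be sketched as follows. This is a sketch of the control flow only: `match` stands in for a hypothetical template matcher that returns a `(top, bottom, left, right)` box or `None`, the template names are invented for the example, and returning `None` when a corner is missing is an assumption, since the text does not say what happens in that case.

```python
def union_box(boxes):
    """Smallest (top, bottom, left, right) box that contains every box in `boxes`."""
    return (
        min(b[0] for b in boxes),
        max(b[1] for b in boxes),
        min(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def detect_lung(image, match):
    """Three-stage search; `match(image, name)` is a hypothetical matcher."""
    # Stage 1: try to match the whole lung at once.
    full = match(image, "full")
    if full is not None:
        return full

    # Stage 2: left lung first; the right lung is searched only if the left is found.
    left = match(image, "left")
    if left is not None:
        right = match(image, "right")
        if right is not None:
            return union_box([left, right])

    # Stage 3: the four corners, searched in the order given in the text.
    corners = []
    for name in ("upper_left", "lower_left", "upper_right", "lower_right"):
        box = match(image, name)
        if box is None:
            return None  # assumption: the source does not define this case
        corners.append(box)
    return union_box(corners)


def fake_matcher(found):
    """Test double: looks template names up in a dict instead of matching pixels."""
    return lambda image, name: found.get(name)


# Stage 1 fails, stage 2 finds both lungs and frames them together.
result = detect_lung(None, fake_matcher({
    "left": (10, 90, 5, 45),
    "right": (12, 95, 55, 98),
}))
```

Passing the matcher in as a function keeps the fallback logic independent of whichever template-matching routine is actually used.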
12. Second Stage - Creating a Frame from the Furthest Points of Both Images and Cropping to Complete Lung Detection
13. Third Stage - Matching All Corners (Instead of adding a separate image for each corner, only the matching image for the last corner is included.)
As of 16.09.2022, the algorithm has been modified. The working principle and pseudocode of the old algorithm are provided below.
1. First, the film is converted to black and white with a high threshold value to detect text and symbols.
5. The white pixels at the far left, far right, top, and bottom of the film are detected to determine the body's frame. The image processed in step 4 is cropped from these frames and sent to the lung detection function.
9. The black pixels at the far left, far right, top, and bottom of the image are detected to determine the frame of the rib cage.
- Start.
- Convert the image to black and white with a high threshold value (240/255). (a)
- Store the coordinates of the white pixels.
- Paint the pixels at these coordinates black in the original image. (b)
- Apply blur ((x/30), (y/15)) to the image and convert it to black and white with a normal threshold value (127/255). (c)
- Find the furthest white pixels in all four directions (up, down, left, right) and frame the image from these points.
- Crop the framed image to create a new image. (d)
- Paint the black pixels outside the new image white. (e)
- Apply a high amount of blur ((x/5), (y/5)) to the image and convert it to black and white with a normal threshold value (127/255). (f)
- Find the furthest black pixels in all four directions (up, down, left, right) in the new image and frame the image from these points.
- Crop the coordinates of the frame from the original image to obtain the lung image. (g)
- End.
X = Horizontal length of the image (number of pixels)
Y = Vertical length of the image (number of pixels)
X-ray films contain biomedical images as well as information such as the X-ray number, patient name, hospital, and doctor's name. To remove this information and reduce the margin of error during lung detection, the threshold value used in the initial black and white conversion is higher than normal.
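As a rough illustration of this masking step, the sketch below collects the coordinates of pixels above the high threshold and paints them black in a copy of the image. The helper names `bright_pixel_coords` and `paint_black` are hypothetical, the image is modeled as a nested list, and treating "above the threshold" as strictly greater than 240 is an assumption.

```python
def bright_pixel_coords(gray, threshold=240):
    """Coordinates (row, col) of pixels brighter than `threshold`."""
    return [
        (r, c)
        for r, row in enumerate(gray)
        for c, px in enumerate(row)
        if px > threshold  # assumption: strict comparison
    ]


def paint_black(gray, coords):
    """Return a copy of `gray` with the given coordinates set to 0."""
    out = [row[:] for row in gray]  # copy so the input image is left untouched
    for r, c in coords:
        out[r][c] = 0
    return out


# Near-white pixels (burned-in text, markers) sit above 240; tissue does not.
gray = [
    [255, 250,  30],
    [ 40, 180, 200],
    [ 35, 190, 245],
]
coords = bright_pixel_coords(gray)
cleaned = paint_black(gray, coords)
```

With a normal threshold such as 127, the bright tissue values (180 to 200 in this example) would be masked as well, which is why the higher value is used for this step.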
The first blurring operation eliminates outlier pixels that can be considered noise on the X-ray film.
The second blurring operation is kept strong so that the coordinates of the lung's corners can be determined. It also keeps non-lung structures out of the result, such as gas build-up in the abdomen, which appears darker and denser than the lungs and is often seen in children.
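The effect of blurring before thresholding can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than the project's method: the text does not say which blur is used, so a simple mean (box) filter stands in for it; kernel sizes are derived from the image dimensions by integer division and rounded up to an odd number; and the names `kernel_size`, `box_blur`, and `threshold` are hypothetical.

```python
def kernel_size(length, divisor):
    """Kernel size proportional to the image side, at least 1 and always odd."""
    k = max(1, length // divisor)
    return k if k % 2 == 1 else k + 1


def box_blur(gray, kx, ky):
    """Mean filter with a kx-wide, ky-tall window, clipped at the image borders."""
    h, w = len(gray), len(gray[0])
    out = []
    for r in range(h):
        r0, r1 = max(0, r - ky // 2), min(h, r + ky // 2 + 1)
        row_out = []
        for c in range(w):
            c0, c1 = max(0, c - kx // 2), min(w, c + kx // 2 + 1)
            window = [gray[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row_out.append(sum(window) // len(window))
        out.append(row_out)
    return out


def threshold(gray, t=127):
    """Return a 0/255 image: pixels brighter than `t` become white."""
    return [[255 if px > t else 0 for px in row] for row in gray]


# 90 x 45 black image with one noisy white pixel and one solid white block.
X, Y = 90, 45
img = [[0] * X for _ in range(Y)]
img[5][5] = 255                      # isolated outlier pixel
for r in range(20, 35):
    for c in range(40, 70):
        img[r][c] = 255              # large bright structure

kx, ky = kernel_size(X, 30), kernel_size(Y, 15)   # first, light blur
binary = threshold(box_blur(img, kx, ky))

heavy = (kernel_size(X, 5), kernel_size(Y, 5))    # second, strong blur
```

After the light blur, the isolated pixel is averaged with its dark neighbors and falls below the threshold, while the interior of the large block stays white. The much larger second kernel smooths away medium-sized structures in the same way.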