# MultiScaleMicroscopy

## Contents

## Summary

Counting white blood cells in microscopic images is a tool for physicians to diagnose the human body health condition. The aim of this project has been twofold. Firstly, we made an application for synthesizing data with known provided ground truth. These objects are elliptical in appearance of five different classes within in a volume. By considering the depth of focus (DOF),15 photos are taken (captured) from the volume at different depths (see examples below). Secondly, we provided an algorithm that exploits the multiscale nature of the problem in order to improve recognition. The variation of gray values for each pixel in different depths is used as feature extraction source for classifier. The gray value of each Pixel index in different depths makes one input vector. The classifier categorizes the input vectors as background pixels, pixels in individual cells and pixels in overlapping parts. For counting the total number of cells, centroids of overlapping areas are used for bisecting the overlapping ones. Final result describes different synthesized models made by varying noise density and images resolution.The result contains accuracy in finding overlapping positions centroids and also the area of overlapping found correctly in the image. In non-noisy environment the performance for accuracy of finding overlapping positions is 93% and the performance of finding overlapping areas is around 98.4% for density of 150 cells.

## Introduction

Counting white blood cells from microscopic images is a tool for physicians to diagnose human body health condition. Providing a large clinical data set is necessary for calculating performance measurements of different segmentation algorithms. In spite of hard accessing to datasets, experts are needed to use visual analysis to obtain the “Golden Ground Truth’’. In addition, there are negative factors like noise, poor image quality and variant illumination which make it harder to extract the ground truth.

In our project, we have introduced an algorithm for making different models of volumes containing cells with some flexible parameters .The requested numbers of elliptical cells of different classes are generated randomly inside a defined volume. The analyzers could have access to the ground truth for evaluation of the performance. Furthermore, the effect of variation in factors like noise, resolution and illumination variance can be studied independently or in combination in our models. The synthesizing algorithm provides an opportunity to analyzers to experience and consider different conditions and environment properties such as different cell density, noise density and resolution by varying different factors in the algorithm for generating different cell models. Requested numbers of photos by user are captured by a virtual fixed camera by changing its focal length while focusing at different equidistant depths in a volume. Depth of focus effect for cells in different depths is considered.

Identifying overlapping areas (positions) and counting the total number of cells in different environments is our main concerns in this project. For an environment with predefined constant brightness, a method using the gray value average of all depths is applied for finding the overlap places due to its simplicity, less computational time and efficient results. The average of gray values from the sequence of photos besides defining a threshold is used for this approach. Furthermore, different features such as the magnitude of discrete Fourier transform and 1D derivation of the sequence of each pixel gray values in the depths are extracted ´. After preprocessing the data, it is given as input to linear and neural network classifiers to make a robust algorithm for classification of each pixel in three different groups, background pixels, pixels in single cells and pixels in overlapping parts, by considering different noise densities and photos with different resolutions. The results are compared with some pervious works.

## Background

Developing a robust algorithm for Identifying and separating overlapping particles in a microscopic images is one of the main concerns and research work for computer scientists. The current overlap separating approaches are based on prior particle shape and gray value intensity. Like watershed method [10,11], and mathematical morphology method [3,8,9] faces an over segmentation problems and expensive in computational time. [xxx]

According to [Gloria Diaz,Fabio Gonzalez and Eduardo Romer] they proposed a static template matching for a specific cell class knowing their prior knowledge of the cell shape and size. They construct a single cell template model to identify the overlapping particles. However, their approach could not be used for many cell classes which contain different sizes of single cells as well as complex overlapping cells. The approach for separating touching and overlapping objects by [Jose M.Korath, Ali Abas, Jose A.Romagnoli] uses geometrical features and variation of intensity in the places of touching and overlapping places. For noisy and poor resolution images the gray value feature will introduce an error in identifying overlapping objects. Besides, we got a 70% performance for identifying the overlapping places using their methods. Finally, for the total cell counting and segmentation we have used the classical watershed algorithm as a comparison with our approach. Since, we are using 15 layers of multi scale microscopic images as input, where each slice contains some information of the entire particle density in a volume, for a fairly segmentation and cell counting process comparison ,we even gave the ground truth as an input to watershed instead of the slice images. However, it results in an over segmentation outcome.

# Method

### Synthesizing a Multi layered Microscopic Images

In our model we have considered a 3D volume which can contain different cells inside, with their specific x, y and z coordinates. As we know in real world, most of the microscopic cells have a circular or ellipsoid like shapes. Hence, we have used a general ellipse equation in order to model the real cells.

An ellipse is a curve described implicitly by an equation of the second degree
**Ax ^{2}+ Bxy + Cy^{2} + Dx + Ey + F = 0** when the constraint

**B**is less than zero.

^{2}- 4AC#### Sample of Synthesized Microscopic Images :

Current thumbnail generation does not work for the encoding on this page. Press play on the videos below to get a better view of what the data looks like. Encoding is .OGG and distorts the appeareance slightly (lossy compression).

#### Pure Data:

#### With Additive Gaussian noise:

#### With Salt and Pepper noise:

### Methods for classification

#### Data Preprocessing

Standardization is applied on the data set to deal with features with different scales. It transforms each feature in data set to have zero mean and unit variance.

Principal Component Analysis method (PCA) is then applied on the standardized data. It makes an orthogonal transformation to the data set into new linearly uncorrelated features space.

Feature Selection is done by *Backward Feature Elimination* and checking the generalization error to guarantee that all used features are beneficial.

#### Classifiers

Linear classifier and Neural Networks are used to classify all pixels into 3 different classes, background pixels,pixels captured by single cells and pixels captured by areas of overlapping in 2D output image.

Linear classifier is used due to its simplicity, speed of classification and as a measurement for comparison to more complex models, here Neural Networks.

Neural Network with one hidden layer is used for classification as a nonlinear classifier. The complexity of the Neural Network, number of hidden nodes,is decided by the least generalization error while changing the node numbers.

#### Watershed

The output of the classification part is used as the input for counting cells part.For comparing the result of counting cell part of the project with previous works, Watershed algorithm is chosen as one. Watershed uses binary images as its input. As it is presented in the videos, none of the 15 photos contain all information about the volume. Therefore, instead of using a binary image from one of the photos, the binary image of the ground truth is used as the input for watershed.In spite of giving this privilege to Watershed Algorithm here, it will be shown that the counting result of watershed algorithm suffers from over segmentation.

# Analysis Method or Test (Experiment) Setup:

As we mentioned in section xx1, we have tried to compare our approach based on a classifier which uses the features (1D derivation, FFT and the Moments) of the 15 scaled photos as an input with the [Jose M.Korath, Ali Abas, Jose A.Romagnoli] which uses variation of intensity.

### Experiment 1

For Experiment1 we used an averaged gray value image, obtained from 15 scaled photos, as an input to analyse the system. External factors like noise and resolution are considered.

### Experiment 2

Similarly for Experiment2 we combined different features (1D derivative, FT and Moment) of the 15 scaled photos as an input to a classifier (Neural Networks and Linear models) External factors like noise and resolution are considered.

### Experiment 3

Here we have compared our approch with watershed algorithm for counting the total number of particles inside a volume and the performance result is shown using bar graph.

# Results

The graphs below illustrate the performance of finding overlapping particles in a noisy environment and different resolution images. We have used a Venn diagram for illustrating the accuracy in finding overlapping positions, where B is for our model output, G is for ground truth image and B n G is their intersection. For measuring the relative overlapping particles we have used the Dice coefficient[Dice LR. Measures of the amount of ecologic association between species ecology.J Ecology. 1945;26:297–302.] which is mostly used for measuring spatial overlap for binary images. The Dice coefficient D is given by <math> D=\frac{2|B n G|}{|B|+|G|}\times 100</math> and the values for the coefficients varies from 0 (non overlap) and 1 (perfectly overlapped).

### Result 1

**Figure 1:**Performance versus noise density using State of art

**Figure 2:**Performance versus Resolution using State of art

### Result 2

**Figure 3:**Performance versus noise density using Linear network

**Figure 4:**Performance versus noise density using Neural network

**Figure 5:**Performance versus Resolution using Linear network

**Figure 6:**Performance versus Resolution using Neural network

### Result 3

**Figure 8:**Performance versus number of cells

# Conclusion

As we described in sec xxx we have compared the output results between state of art and our method in noisy environment and also independently, in images with different resolutions.

The performance result using the state of art is around 70%. Similarly, by using our approach we have improved to 93% in noise free environment, with highest resolution images (i.e. 1000x1000 pixels) by the neural network classifier.

By taking 70 different models which contains 1106 overlapping places of ground truth image (reference image) and the missing overlaps from classifier output is 18 with respect to ground truth image. Hence, the performance of the missed overlaps is around 98.4%, which is better than the performance measure using disc.

In our case, 8 features are extracted for each observation. According to Hermmetian symmetry in Discrete Fourier Transform coefficients the magnitude of the first eight ones are chosen to describe each data element for classification using linear models and neural network while [Jose M.Korath, Ali Abas, Jose A.Romagnoli] use one of the 8 features which is the dc part , average gray value intesinisy.

Similarly, we have compared the performance of total cell counting between water shed approach and our method in different cell density models (i.e. 100,150, 200,250 and 300).

The cell counting performance, in noise free and high resolution images, using water shed in different cell density models varies from 20% to 25 %, while our method performance varies from 95% to 99%.

The performance of counting cells for water shed is calculated in a situation that water shed algorithm was fed with the ground truth 2D image, to compare the water shed result with ours in a fair way.

Even by accessing to Golden Ground Truth, watershed suffers too much from over segmentation that causes a visible difference between watershed output and our method output in cell counting even when we are using our classifier output (not the ground truth).

# Future work

In the synthesizing part, a complete 3D model can be designed which can be viewed from different angles, and also a pattern can be added to cells structures as a future work.

Meanwhile, for the performance measure of overlapping places it can be used another accuracy measurement like Gaussian function and tent function for expressing the performance accuracy in a more fuzzy way.

For detailed information of the thesis project you can find on thedetailed page.