Genetic Algorithms and Application in Hand Written Character Recognition System

Following is the seminar report on ” Genetic Algorithms and Application in Hand Written Character Recognition System”.

Abstract

World is full of uncertainties and often it challenges us, with intractable problems having diverse solutions, out of which the best is to be selected. Genetic algorithms can be used in such cases. They can handle complex search spaces and can lead to a solution that is close to the best nimbly. These are basically evolutionary search algorithms based on the principles of natural selection and natural genetics. Biological systems use principle of abundance of production, survival of the fittest and variation (crossover and mutation) to create novel and fitter creatures. Genetic algorithms use these principles to yield better solutions to a problem.

Character recognition is a contemporary research topic. Backbone for all OCR system is feature extraction method. In any OCR system for any language there are many features present out of which a few are selected for classification of the normalized characters. Genetic algorithms can be used to select a subset of this feature set which will add to the efficiency of entire OCR system in terms of accuracy as well as complexity.

Introduction

The problem with the artificial systems is that they are not adaptive and robust but they can be made robust if they incorporate the flexibility and robustness of biological systems. Biological systems have the property of self repair self guidance and reproduction. Where as they barely exist in artificial systems.

John Holland and his colleagues while working in the area of incorporating features of natural systems in artificial ones developed genetic algorithm.

Genetic algorithms are evolutionary search algorithms based on mechanics of natural selection and natural genetics. Genetic algorithms are theoretically and empirically proven to provide robust search in complex search spaces.

Types of search methods

Calculus based methods

In this method one calculates the gradient of the function to be optimized and equates it to 0 to find the maximum.

Disadvantages

This method finds the Local maximum.
Involves the computation of the numerical approximation of derivatives.
Real world functions are rarely continuous.

Enumerative Methods

In this method all the points in the search space are traversed and the best solution is found.

Advantages

The best solution can be found.

Disadvantage

This method cannot be used in case of complex search spaces as to traverse all the points will require a lot of time.

Genetic algorithms

These are evolutionary search algorithms that start with a set of possible solutions selected randomly (called initial Population) at each iteration the set of solutions (population) becomes better and better. Thus we reach very close to the best solution after some iteration.

Relationship with genetics

Life originated on the earth in very simple form but as time passed the creatures became more and more complex adapting to the new conditions through the process of mutation and crossover. Mutation and crossover did not necessarily produce better organisms but they were responsible for changes in the existing species. Some creatures became stronger than before others became weaker. Due to the theory of survival of the fittest only strong organisms survived. Genetic algorithm simulates survival of the fittest among the current population and selects the fit matting pool. It changes the current population by applying the operators named cross over and mutation.

Example

Consider a simple problem to calculate the maximum value of x^2 over {0, 1, 2…31}. This is a linear problem just used to show how genetic algorithms are implemented.

GA approach:

We represent x in the form of a binary coded string, e.g. 01101 «
For ease we chose an initial Population size: 4.
We define a function called fitness function f(x) which determines the relative fitness of a string with respect to other strings.
Based on the value of the fitness function we find probability if the string will be selected in the mating pool.
Actual selection is made using the roulettes wheel. Each string is allocated area on the wheel proportional to its probability. The wheel is rotated and when the wheel stops rotating the string corresponding to the area pointed by a pointer is selected in the next mating pool. Thus fit strings are selected with more probability. Here the string 2 is selected twice, string 1,4 are selected once and string 3 is not selected at all.

Now we change the strings of the mating pool by crossover in which 2 strings of the mating pool are selected randomly and with a probability pc that they will crossover. Site of crossover is selected randomly. The segments of each string at the crossover site are separated and interchanged.

A string is changed through mutation by a probability pm where the string is randomly changed at a point.

Mathematical formulae

Similarity templates

(Schema) is a set of strings

e.g.

111**00={1111100,1110000,1111000,1110100}

* can be replaced by 0,1 to obtain the elements of the set.

Fundamental Theorem of Genetic Algorithm

It is formulated using the theory of probability through the analysis of each operator. It is given as follows

E[m(s,t+1)]>=u(s,t)/f(t)*m(s,t)*[i-pc*ds/(l-1)]*[1-pm*o(s)]

m(s, t) º number of instances of schema s in population at time t
f(t) º average fitness of population at time t
u(s,t) º average fitness of instances of schema s at time t
p_c º probability of single point crossover operator
p_m º probability of mutation operator
l º length of individual bit strings
o(s) º number of defined (non “*”) bits in s

d(s) º distance between rightmost, leftmost defined bits in s

Intuitive Meaning

The theorem shows that “The expected number of instances of a schema in the population tends toward its relative fitness”.

Traveling Salesman Problem

It is one of the most classical problems of history. It is stated as follows.

Find a tour of a given set of cities so that

Each city is visited only once.
Salesman returns to the stating city after the entire journey.
The total distance traveled is minimized.

Various approaches have been used to solve this problem .If number of cities are small Enumerative method can be used but when number of cities become large the search space becomes extremely complex. Using enumerative method for 100 cities 100! Routes need to be evaluated, which would take years of C.P.U time.

In such a situation GA can be used.

Let us use this for 10 cities. We number each city from 1-10. We can represent the entire journey as an ordered list of city. Initially we may generate 30 routes randomly

* *

CityList1 (3 5 7 2 1 6 4 8 9 10)

CityList2 (2 5 7 9 6 8 10 3 4 1)

For crossover

Copy the segment all those numbers of list 2 which are not in the segment

* *

Child 1 ( 5 7 2 1 6 10 3 4 )

As 2 of city list 2 is already present we find the number which 2 replaced i.e. 9

Similarly for 1

* *

Child 1 (9 5 7 2 1 6 10 3 4 8)

Similarly * *

Child 2 (3 5 7 9 6 8 4 1 2 10)

For mutation we use inversion of 2 cities

For example child2 can be changed as

* *

(3 10 7 9 6 8 4 1 2 5)

Genetic Algorithms and Application in Hand Written Character Recognition System

Introduction

In a character recognition system there are many phases starting from data collection to segmentation, normalization, feature extraction, classification.

After the normalization phase the input data is in the form of some standard size matrix each cell of the matrix representing a pixel.

Feature extraction

In practical character recognition problems, a classification function learned through an inductive learning algorithm assigns a given input character to one of the existing classes of the system. Usually, the representation of each input character consists of features since they can distinguish one class of characters from another in a more concise and meaningful way than offered by the raw representation. In many applications, it is not unusual to find problems involving hundreds features. However, it has been observed that, beyond a certain point, the inclusion of additional features leads to a worse rather than better performance. Moreover, the choice of features to represent the characters affects several aspects of the character recognition problem such as accuracy, required learning time and necessary number of samples. This apparent paradox presents us with a feature subset selection problem in automated design of character classifiers. Such a problem refers to the task of identifying and selecting a useful subset of features to be used to represent characters from a larger set of often mutually redundant or even irrelevant features. Therefore, the main goal of feature subset selection is to reduce the number of features used in classification while maintaining acceptable classification accuracy.

Feature subset selection algorithms can be classified into two categories based on whether or not feature selection is performed independently of the learning algorithm used to construct the verifier. If feature selection is done independently of the learning algorithm, the technique is said to follow a filter approach. Otherwise, it is said to follow a wrapper approach. The first one is computationally more efficient but its major drawback is that an optimal selection of features may not be independent of the inductive and representational biases of the learning algorithm that is used to build the classifier. On the other hand, the wrapper approach involves the computational overhead of evaluating a candidate feature subset by executing a selected learning algorithm on the database using each feature subset under consideration. Feature subset selection in the context of practical applications such as handwritten recognition presents a multi criterion optimization function, e.g. number of features and accuracy of classification. Genetic algorithms offer a particularly attractive approach for this kind of problems since they are generally quite effective for rapid global search of large, non-linear and poorly understood spaces. Moreover, genetic algorithms are very effective in solving large-scale problems .This paper focuses on the feature subset selection for handwritten digit recognition through a modified wrapper based multi-criterion approach using genetic algorithms in conjunction with a multi-layer perceptron neural network.

Methodology

Representation and Operators

In this subsection we present the choice of a representation for encoding candidate solutions to be manipulated by the genetic algorithm.

Each individual in the population represents a candidate solution to the feature subset selection problem. Let M be the total number of features available to choose from to represent the characters to be classified. The individual (chromosome) is represented by a binary vector of dimension M. If a bit is a 1, it means that the corresponding feature is selected; otherwise, the feature is not selected. This is the simplest and most straight forward representation scheme. As mentioned before, other genetic representations require the use of appropriate genetic operators.

Since we are representing a chromosome through a binary string, the operator mutation and crossover operates in the following way: Mutation operates on a single string and generally changes a bit at random. Thus, a string 11010 as a consequence of random mutation may change to 11110. Crossover operates on two parent strings to produce two off springs. With a randomly chosen crossover position 4, the two strings 01101 and 11000 yield the offspring 01100 and11001 as a result of crossover.

Parameter Settings

Our experiments used the following parameter settings:

Population size: 30
Number of generation: 1000
Probability of crossover: 0.8
Probability of mutation: 0.007

The parameter settings were based on results of several preliminary runs.

Objective Function and Fitness Evaluation

The fitness evaluation is a mechanism used to determine the confidence level of the optimized solutions to the problem. Usually, there is a fitness value associated with each chromosome, e.g., in a minimization problem, a lower fitness value means that the chromosome or solution is more optimized to the problem while a higher value of fitness indicates a less optimized chromosome.

Our problem consists of optimizing two objectives:

minimization of the number of features
minimization of the error rate of the classifier

Therefore, we are dealing with a multi-objective optimization problem. While in single-objective optimization the optimal solution is usually clearly defined, this does not hold for multi-objective optimization problem. Instead of a single optimum, there is rather a set of alternative trade-offs, generally known as Pareto-optimal solutions. In order to generate the Pareto-optimal set, we are using a classical approach called weighting method, which aggregates the objectives into a single and parameterized objective. Such an aggregation is performed through a linear combination of the objectives,

f(x)=w1*f1(X)+w2*f2(x)

Where w1, w2 are called weights and, without loss of generality, normalized such that w1+w2=1 and f1(x) is the error rate produced by the classifier for a given feature subset (represented by the chromosome) and f2(x) is the number of features selected in the chromosome. Therefore, the fitness of a chromosome is represented by a single and parameterized objective function f(x)

Using genetic algorithms for feature subset selection involves the running of a genetic algorithm for several generations.

Regarding a wrapper approach, in each generation, evaluation of a chromosome (a feature subset) requires training the corresponding neural network and computing its accuracy. This evaluation has to be performed for each of the chromosomes in the population. Since such a strategy is not feasible due to the limits imposed by the learning time of the huge training set considered in this work, we have adopted the strategy proposed by Moody and Utans in [3], which uses the sensitivity of the network to estimate the relationship of input features with network performance.

The sensitivity of the network model to variable is defined as:

Replacement of a variable by its average value removes its influence on the network output. So, in order to evaluate a given feature subset we replace the unselected features by their averages. In this way, we avoid training the neural network and hence turn the wrapper approach feasible for our problem. We call this strategy modified-wrapper.

Feature Set and Classifier

In this section we describe both the feature set and classifier used in our experiments. The feature vector is based on a mixture of concavity and contour-based features while the classifier is a neural network trained with the back propagation algorithm.

The basic idea of concavity measurements is the following: for each white pixel in the component, we verify in each possible direction (Figure 2a), if a black pixel

can be reached. The number of times as well as the directions leading to the black pixels are computed and stored in a vector. When black pixels are reached in four directions

(e.g. point x1 in Figure 2a), we branch out in four auxiliary directions (s1 to s4 in Figure 2b) in order to confirm if the current white pixel is really inside a closed contour.

Those pixels that reach just one black pixel are discarded. Therefore, the concavity measurements are represented by 13 components.

The second part of the vector contains contour information, which is extracted from a histogram of contour directions. Taking into account 8-Freeman directions (Figure2d), we have 8 more components in our feature vector. The last component of this vector corresponds to the character surface. Finally, the image is divided into six regions and 132 components normalized between 0 and 1 are considered

In the next section we will compare two different strategies to carry out this task.

Simple genetic algorithm

In this experiment an SGA was used, i.e., an algorithm based on bit representation, one-point crossover, bit-flip mutation, roulette wheel selection (with elitism). The sole modification that we have carried out was the initialization of the population. We have inserted a chromosome with all features selected. Since we know an admissible solution of the system, it is very interesting to use such knowledge in order to speed up the convergence time of the genetic algorithm.

Iterative genetic algorithm

This approach is based on the work presented by Man etal [4]. The main idea behind this approach is to speed up the convergence time of the algorithm by restricting the search space in each iteration. The algorithm is described as follows:

Let N2 be the maximum allowable topology for searching (132 features in our case). The algorithm is applied and terminated when a solution x1 with (f1=0) is obtained.

(Figure 3a).

Assuming that f2(x1) = N3, the searching domain for the complexity of the topology is reduced from N2 to N3-1. The algorithm is then applied again until another solution with (f1=0) is obtained (Figure 3b).

Repeat step 2 by reducing the searching domain of the topology complexity and eventually the optimal point xopt with f2 (xopt) = N1 will be obtained (Figure 3c).

4. Another iteration with the complexity of the topology bounded by is carried out and no solution may be found. This process can be terminated by setting a maximum number of generations for the algorithm. If no solution is found after the generation exceeds this maximum number, the solution obtained in step 3 would be considered as the optimal solution for the problem (Figure 3d).

The same methodology and parameter settings used in the previous approach are also used here. The differences lie basically in two points: search mechanism and initialization of the population. The search mechanism is clearly illustrated in Figure 3, where the best solution found in the previous iteration is used to initialize the current one. The initialization of the population was modified in this approach in order to allow a more focused search in each iteration. In order to perform this, we have used a Hamming distance d between the injected solution and the chromosomes generated at the initialization time. Such a constraint is defined as

d(S₁,S₂)<t …(4)

Where S1 is the chromosome that represents the best solution found in the last iteration of the algorithm (if the algorithms running the first iteration, S1 will be represented by the chromosome with all features selected), S2 is a chromosome generated at the initialization time and τ is the threshold that defines the maximal distance between S1 and S2.

Figure 4 shows an example of the initialization using τ=5 in such a case, the initialization of the population produces a population entirely located in the sub-space A

Conclusion

In this paper we have discussed two different strategies for feature subset selection using genetic algorithms. All experiments reported in this work use a wrapper-based multi criterion approach in conjunction with a multi-layer perceptron neural network. We have shown that such a scheme became feasible by means of the sensitivity analysis.

We have seen that both approaches discussed in this paper achieved interesting results in reducing the complexity of the classifier. However, the SGA seems more suitable for our problem than the IGA. This conclusion is based on comprehensive experiments carried out on NIST SD19 databases, where the SGA provided a reduction of about 28% of the feature vector and maintained the error rates at the same level of the original feature set.

In spite of the fact that both approaches did not succeed in reducing the error rate of the classifier, we consider they achieved their objective, since the classifier used in our experiments already have a good performance on NIST SD19 database.

For future works we plan to study different approaches of multi-objective optimization as well as to apply different operators and schemes of representation for our problem.

Genetic Algorithms and Application in Hand Written Character Recognition System

Abstract

Introduction