

Robots are used successfully in many areas today, particularly in industrial production, military operations, deep-sea drilling, and space exploration. This success drives interest in the feasibility of using robots in human social environments, particularly in the care of the aged and the handicapped. In social environments, humans communicate easily and naturally by both speech (audio) and gesture (vision), without the use of any external devices (like keyboards) requiring special training. Robots have to adapt to human modes of communication to promote a more natural interaction with humans. Given a choice between speech and gesture, some researchers have opined that gesture recognition would be more reliable than speech recognition, because the latter would need a greater number of training datasets to deal with the greater variability in human voice and speech [1].

This project is about implementing the control of a robot (vehicle) through simple hand gestures. The main motivation is the desirability of developing robots that can interact smoothly with humans without the need of any special devices.

The objectives of the project are:

1) Study and apply the needed tools, namely:
   a) A robot
   b) The OpenCV Computer Vision Library (version 2.0)
   c) Algorithms for computer vision and artificial intelligence
2) Develop a computer vision application for simple gesture recognition
3) Test the computer application
4) Document the results of the project

During the project, four gestures were chosen to represent four navigational commands for the robot, namely Move Forward, Move Left, Move Right, and Stop. A simple computer vision application was written to detect and recognise the four gestures and translate them into the corresponding commands for the robot. The appropriate OpenCV functions and image processing algorithms for the detection and interpretation of the gestures were used. Thereafter, the program was tested on a webcam with actual hand gestures in real time, and the results were observed.
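The mapping from recognised gestures to navigational commands can be sketched as a simple lookup table. The gesture labels and command strings below are illustrative assumptions, not the project's actual identifiers:

```python
# Hypothetical sketch of the gesture-to-command mapping described above.
# Gesture labels and command strings are illustrative assumptions.
GESTURE_COMMANDS = {
    "open_palm": "STOP",
    "point_up": "MOVE_FORWARD",
    "point_left": "MOVE_LEFT",
    "point_right": "MOVE_RIGHT",
}

def gesture_to_command(gesture: str) -> str:
    # Unrecognised gestures default to STOP as a safe fallback.
    return GESTURE_COMMANDS.get(gesture, "STOP")

print(gesture_to_command("point_left"))  # -> MOVE_LEFT
```

Defaulting unknown input to Stop is one plausible design choice for a vehicle, since an unrecognised gesture should not keep the robot moving.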

The results of the project demonstrated that a simple computer vision application can be designed to detect and recognise simple hand gestures for robot navigational control based on simple heuristic rules. The program was able to correctly interpret the gestures and translate them into the corresponding commands most of the time.



The sense of sight is arguably the most important of man’s five senses. It provides a huge amount of information about the world that is rich in detail and delivered at the speed of light. However, human vision is not without its limitations, both physical and psychological. Through digital imaging technology and computers, man has transcended many of these visual limitations. He can see into far galaxies, the microscopic world, the sub-atomic world, and even “observe” infra-red, x-ray, ultraviolet and other spectra for medical diagnosis, meteorology, surveillance, and military uses, all with great success.

While computers have been central to this success, for the most part man is the sole interpreter of all the digital data. For a long time, the central question has been whether computers can be designed to analyse and acquire information from images autonomously in the same natural way humans can. According to Gonzalez and Woods [2], this is the province of computer vision, which is that branch of artificial intelligence that ultimately aims to “use computers to emulate human vision, including learning and being able to make inferences and take actions based on visual inputs.”

The main difficulty for computer vision as a relatively young discipline is the current lack of a final scientific paradigm or model for human intelligence and human vision itself on which to build an infrastructure for computer or machine learning [3]. The use of images has an obvious drawback. Humans perceive the world in 3D, but current visual sensors like cameras capture the world in 2D images. The result is the natural loss of a good deal of information in the captured images. Without a proper paradigm to explain the mystery of human vision and perception, the recovery of lost information (reconstruction of the world) from 2D images represents a difficult hurdle for machine vision [4]. However, despite this limitation, computer vision has progressed, riding mainly on the remarkable advancement of decades-old digital image processing techniques, using the science and methods contributed by other disciplines such as optics, neurobiology, psychology, physics, mathematics, electronics, computer science, artificial intelligence and others.

Computer vision techniques and digital image processing methods both draw the proverbial water from the same pool, which is the digital image, and therefore necessarily overlap. Image processing takes a digital image and subjects it to processes, such as noise reduction, detail enhancement, or filtering, for the purpose of producing another desired image as the end result. For example, the blurred image of a car registration plate might be enhanced by imaging techniques to produce a clear image so that the police can identify the owner of the car. On the other hand, computer vision takes a digital image and subjects it to the same digital imaging techniques, but for the purpose of analysing and understanding what the image depicts. For example, the image of a building can be fed to a computer and thereafter be identified by the computer as a residential house, a stadium, a high-rise office tower, a shopping mall, or a farm barn [5].

Russell and Norvig [6] identified three broad approaches used in computer vision to distill useful information from the raw data provided by images. The first is the feature extraction approach, which focuses on simple computations applied directly to digital images to measure some usable characteristic, such as size. It relies on generally known image processing algorithms for noise reduction, filtering, object detection, edge detection, texture analysis, computation of optical flow, and segmentation, techniques commonly used to pre-process images for subsequent image analysis. This is also considered an “uninformed” approach.
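As an illustration of this “uninformed” approach, a single feature such as object size can be computed directly from pixel values. The toy image and threshold below are assumptions made for the example:

```python
# Uninformed feature extraction: measure an object's size (pixel count)
# directly from a thresholded image. The 2-D "image" is a plain list of
# grey-level rows, standing in for real camera data.
def object_size(image, threshold=128):
    """Count pixels brighter than `threshold` -- a crude size feature."""
    return sum(1 for row in image for pixel in row if pixel > threshold)

frame = [
    [  0,   0, 200, 210],
    [  0, 190, 255,   0],
    [  0,   0, 180,   0],
]
print(object_size(frame))  # -> 5, since five pixels exceed the threshold
```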

The second is the recognition approach, where the focus is on distinguishing and labelling objects based on knowledge of the characteristics, such as shape, appearance, or patterns of elements, that sets of similar objects have in common, sufficient to form classes. Here computer vision uses the knowledge-representation techniques of artificial intelligence to enable a “classifier” to match classes to objects based on the pattern of their features or structural descriptions. A classifier has to “learn” the patterns by being fed a training set of objects and their classes, minimising mistakes and maximising successes through a step-by-step process of improvement. There are many techniques in artificial intelligence that can be used for object or pattern recognition, including statistical pattern recognition, neural nets, genetic algorithms and fuzzy systems.
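A minimal sketch of the recognition approach is a nearest-centroid rule learned from a labelled training set; the feature vectors and class names below are illustrative assumptions:

```python
# Minimal statistical classifier in the spirit described above: a
# nearest-centroid rule learned from a labelled training set.
def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(training_set):
    # training_set: {class_label: [feature_vector, ...]}
    return {label: centroid(vecs) for label, vecs in training_set.items()}

def classify(model, x):
    # Assign x to the class whose centroid is nearest (squared distance).
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], x))

model = train({
    "circle": [[0.9, 0.1], [0.8, 0.2]],   # hypothetical shape features
    "bar":    [[0.2, 0.9], [0.3, 0.8]],
})
print(classify(model, [0.85, 0.15]))  # -> circle
```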

The third is the reconstruction approach, where the focus is on building a geometric model of the world suggested by the image or images, which is then used as a basis for action. This corresponds to the stage of image understanding, which represents the highest and most complex level of computer vision processing. Here the emphasis is on enabling the computer vision system to construct internal models based on the data supplied by the images and to discard or update these internal models as they are verified against the real world or some other criteria. If the internal model is consistent with the real world, then image understanding takes place. Thus, image understanding requires the construction, manipulation and control of models, and at the moment relies heavily upon the science and technology of artificial intelligence.

2.2   OPENCV

OpenCV is a widely used tool in computer vision. It is a computer vision library for real-time applications, written in optimised C/C++, which works on the Windows, Linux and Mac platforms. It is freely available as open source software from https://opencv.org/.

OpenCV was started by Gary Bradski at Intel in 1999 to encourage computer vision research and commercial applications and, side-by-side with these, to promote the use of ever faster processors from Intel [7]. OpenCV contains optimised code for a basic computer vision infrastructure, so developers do not have to re-invent the proverbial wheel. The reference documentation for OpenCV is at https://docs.opencv.org/. The basic tutorial documentation is provided by Bradski and Kaehler [6]. According to its website, OpenCV has been downloaded more than 14 million times and has a user group of more than 47 thousand members. This attests to its popularity.

A digital image is generally understood as a discrete number of light intensities captured by a device such as a camera and organised into a 2-D matrix of picture elements or pixels, each of which may be represented by a number and all of which may be stored in a particular file format (such as JPG or GIF) [8]. OpenCV goes beyond representing an image as an array of pixels. It represents an image as a data structure called an IplImage, which makes useful image data or fields immediately accessible, such as:
• width – an integer showing the width of the image in pixels 
• height – an integer showing the height of the image in pixels 
• imageData – a pointer to an array of pixel values 
• nChannels – an integer showing the number of colors per pixel 
• depth – an integer showing the number of bits per pixel 
• widthStep – an integer showing the number of bytes per image row 
• imageSize – an integer showing the image size in bytes
• roi – a pointer to a structure that defines a region of interest within the image [9].
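The relationship between these fields can be illustrated with a small Python analogue of the IplImage header. This is not OpenCV's actual C struct; the field values are assumptions for a 640x480 8-bit RGB image:

```python
from dataclasses import dataclass, field

# Illustrative analogue of OpenCV's IplImage header, not the actual C struct.
@dataclass
class Image:
    width: int          # image width in pixels
    height: int         # image height in pixels
    nChannels: int      # colour channels per pixel (3 for RGB)
    depth: int          # bits per channel (8 for common images)
    imageData: list = field(default_factory=list)  # flat pixel array

    @property
    def widthStep(self):
        # bytes per image row = width * channels * bytes-per-channel
        return self.width * self.nChannels * (self.depth // 8)

    @property
    def imageSize(self):
        # total bytes = bytes per row * number of rows
        return self.widthStep * self.height

img = Image(width=640, height=480, nChannels=3, depth=8)
print(img.widthStep, img.imageSize)  # -> 1920 921600
```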

OpenCV has a module containing basic image processing and computer vision algorithms. These include: 
• smoothing (blurring) functions to reduce noise, 
• dilation and erosion functions for isolation of individual elements, 
• floodfill functions to isolate certain portions of the image for further processing, 
• filter functions, including Sobel, Laplace and Canny for edge detection, 
• Hough transform functions for finding lines and circles, 
• Affine transform functions to stretch, shrink, warp and rotate images, 
• Integral image function for summing subregions (computing Haar wavelets), 
• Histogram equalisation function for uniform distribution of intensity values, 
• Contour functions to connect edges into curves, 
• Bounding boxes, circles and ellipses, 
• Moments functions to compute Hu’s moment invariants, 
• Optical flow functions (Lucas-Kanade method), 
• Motion tracking functions (Kalman filters), and 
• Face detection/ Haar classifier. 
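As a sketch of one operation from this list, binary erosion with a 3x3 structuring element can be written in plain Python; in practice OpenCV's optimised erosion function would be used instead:

```python
# Binary erosion with a 3x3 structuring element, shown in plain Python
# for illustration (OpenCV provides an optimised equivalent).
def erode(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # A pixel survives only if its whole 3x3 neighbourhood is set.
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

img = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(erode(img))  # only the centre pixel survives
```

Erosion shrinks foreground blobs, which is why the text describes dilation and erosion as tools for isolating individual elements and removing small noise speckles.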

OpenCV also has an ML (machine learning) module containing well known statistical classifiers and clustering tools. These include: 
• Normal/ naïve Bayes classifier, 
• Decision trees classifier, 
• Boosting group of classifiers, 
• Neural networks algorithm, and 
• Support vector machine classifier.
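The flavour of the first of these can be sketched with a toy one-feature, two-class normal Bayes classifier. This is not OpenCV's implementation, and the training data and class names are illustrative assumptions:

```python
import math

# Toy normal (Gaussian) Bayes classifier for one feature and two classes,
# sketching the idea behind OpenCV's normal Bayes classifier.
def fit(samples):
    # samples: {label: [feature values]} -> per-class mean and variance
    model = {}
    for label, xs in samples.items():
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        model[label] = (m, v)
    return model

def predict(model, x):
    # Pick the class whose Gaussian gives x the highest likelihood
    # (equal priors assumed for simplicity).
    def likelihood(m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return max(model, key=lambda label: likelihood(*model[label]))

model = fit({"hand": [10.0, 11.0, 12.0], "background": [30.0, 31.0, 29.0]})
print(predict(model, 11.5))  # -> hand
```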


In computer vision a physical object maps to a particular segmented region in the image, from which object descriptors or features may be derived. A feature is any characteristic of an image, or any region within it, that can be measured. Objects with common features may be grouped into classes, where the combination of features may be considered a pattern. Object recognition may be understood to be the assignment of classes to objects based on their respective patterns. The program that does this assignment is called a classifier [10].
The general steps in pattern recognition may be summarised in Figure 1 below:

Figure 1. General pattern recognition steps [3].

The most important step is the design of the formal descriptors, because choices have to be made about which characteristics, quantitative or qualitative, would best suit the target object; these choices in turn determine the success of the classifier.

In statistical pattern recognition, quantitative descriptions called features are used. The set of features constitutes the pattern vector or feature vector, and the set of all possible patterns for the object form the pattern space X (also known as feature space). Quantitatively, similar objects in each class will be located near each other in the feature space forming clusters, which may ideally be separated from dissimilar objects by lines or curves called discrimination functions. Determining the most suitable discrimination function or discriminant to use is part of classifier design. 

A statistical classifier accepts n features as inputs and gives 1 output, which is the classification or decision about the class of the object. The relationship between the inputs and the output is a decision rule, which is a function that puts in one space or subset those feature vectors that are associated with a particular output. The decision rule is based on the particular discrimination function used for separating the subsets from each other. 
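A decision rule built on a linear discrimination function can be sketched as follows; the weights, bias, feature meanings and class names are hypothetical:

```python
# A decision rule based on a linear discrimination function g(x) = w.x + b:
# feature vectors with g(x) >= 0 fall in one class, the rest in the other.
def make_decision_rule(w, b, pos_class, neg_class):
    def decide(x):
        g = sum(wi * xi for wi, xi in zip(w, x)) + b
        return pos_class if g >= 0 else neg_class
    return decide

# Hypothetical discriminant separating two gesture classes by two features.
decide = make_decision_rule(w=[1.0, -1.0], b=0.0,
                            pos_class="open_hand", neg_class="fist")
print(decide([0.8, 0.2]))  # -> open_hand
```

The line g(x) = 0 is the discriminant separating the two subsets of the feature space, as described above; a curve would be used where the clusters are not linearly separable.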

The ability of a classifier to classify objects based on its decision rule may be understood as classifier learning, and the set of feature-vector inputs and their corresponding classification outputs (both positive and negative results) is called the training set. A well-designed classifier is expected to get 100% correct answers on its training set. A large training set is generally desirable to optimise the training of the classifier, so that it may then be tested on objects it has not encountered before, which constitute its test set. If the classifier does not perform well on the test set, modifications to the design of the recognition system may be needed.
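The evaluation just described, i.e. scoring a trained classifier on a held-out test set, can be sketched as follows; the toy classifier and test data are illustrative assumptions:

```python
# Evaluating a classifier on a held-out test set:
# accuracy = correct classifications / test-set size.
def accuracy(classifier, test_set):
    correct = sum(1 for x, label in test_set if classifier(x) == label)
    return correct / len(test_set)

# A trivial threshold rule standing in for a trained model.
classifier = lambda x: "hand" if x > 5 else "background"
test_set = [(9, "hand"), (8, "hand"), (2, "background"), (6, "background")]
print(accuracy(classifier, test_set))  # -> 0.75
```

A low score here is the signal, mentioned above, that the descriptors or the decision rule of the recognition system may need to be redesigned.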