The main objective of this project is to detect and identify house-number signs in street view images. The dataset I use for this project is the Street View House Numbers (SVHN) dataset [5], which has similarities with the MNIST dataset. The SVHN dataset has more than 600,000 labeled characters, and the images are in .png format. After extracting the dataset, I resize all images to 32×32 pixels with three color channels. There are 10 classes, one for each digit: digit ‘1’ is labeled 1, ‘9’ is labeled 9, and ‘0’ is labeled 10 [5]. The dataset is divided into three subsets: the train set, the test set, and the extra set. The extra set is the largest subset, with 531,131 images. The train set has 73,257 images and the test set has 26,032 images.
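Because the digit ‘0’ is stored as label 10, classifiers that expect classes 0 to 9 usually need the labels remapped first. The following is a minimal sketch of that step, assuming NumPy is available. The function name `remap_labels` and the toy label array are illustrative only and are not part of the dataset tools or of this project's code.

```python
import numpy as np


def remap_labels(labels):
    """Map SVHN labels 1..10 to classes 0..9 (label 10, the digit '0', becomes 0)."""
    labels = np.asarray(labels).astype(np.int64).ravel()
    if labels.size and (labels.min() < 1 or labels.max() > 10):
        raise ValueError("SVHN labels are expected to lie in 1..10")
    # 1..9 are unchanged by the modulo; 10 wraps around to 0.
    return labels % 10


# Toy labels standing in for real data (hypothetical values).
raw = np.array([1, 9, 10, 5])
print(remap_labels(raw))  # [1 9 0 5]
```

The modulo keeps the mapping a one-liner. The range check is there so that an unexpected label raises an error instead of being silently folded into a valid class.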
Figure 3: Examples of the original, variable-resolution, color house-number images with character-level bounding boxes.
Characters in the images are labeled with bounding boxes. The bounding-box information is stored in digitStruct.mat instead of being drawn directly on the images in the dataset. The digitStruct.mat file contains a struct called digitStruct with the same length as the number of original images. Each element in digitStruct has the following fields: “name”, a string containing the filename of the corresponding image, and “bbox”, a struct array that contains the position, size, and label of each digit bounding box in the image. For example, digitStruct(300).bbox(2).height gives the height of the 2nd digit bounding box in the 300th image [5].
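To make this layout concrete, the sketch below mirrors it with plain Python dataclasses. It is an illustration of the structure only, not a reader for digitStruct.mat. The class names `DigitBox` and `DigitRecord`, the helper functions, and the sample values are assumptions made for this example. The field names `left`, `top`, and `width` are assumed to be how the position and size are stored alongside `height` and `label`. MATLAB indexing is 1-based, so digitStruct(300).bbox(2) corresponds to `records[299].bbox[1]` in Python.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DigitBox:
    left: float    # x coordinate of the top-left corner, in pixels
    top: float     # y coordinate of the top-left corner, in pixels
    width: float
    height: float
    label: int     # 1..9 for digits '1'..'9', 10 for digit '0'


@dataclass
class DigitRecord:
    name: str             # image filename
    bbox: List[DigitBox]  # one entry per digit in the image


def number_region(record: DigitRecord) -> Tuple[float, float, float, float]:
    """Smallest (left, top, right, bottom) box enclosing all digit boxes."""
    left = min(b.left for b in record.bbox)
    top = min(b.top for b in record.bbox)
    right = max(b.left + b.width for b in record.bbox)
    bottom = max(b.top + b.height for b in record.bbox)
    return left, top, right, bottom


def house_number(record: DigitRecord) -> str:
    """Read the digits from left to right; label 10 stands for '0'."""
    ordered = sorted(record.bbox, key=lambda b: b.left)
    return "".join(str(b.label % 10) for b in ordered)


# Hypothetical record with two digits (values invented for illustration).
record = DigitRecord("300.png", [DigitBox(10, 5, 8, 20, 2), DigitBox(19, 6, 9, 22, 10)])
print(record.bbox[1].height)   # 22
print(number_region(record))   # (10, 5, 28, 28)
print(house_number(record))    # 20
```

The enclosing region from `number_region` is one possible way to crop a whole house number before resizing, while the individual boxes give per-digit crops.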
It is clear from Figure 3 that most house-number signs in the SVHN dataset are printed signs and are easy to read [2]. However, the large variation in font, size, and color makes detection very difficult. The variation in resolution is also large (median: 28 pixels; max: 403 pixels; min: 9 pixels) [2]. The graph below shows the large variation in character height, measured as the height of the bounding box in the original street view dataset. This means that character size, placement, and resolution are not evenly distributed across the dataset. Because the data are not uniformly distributed, correct house-number detection is difficult.
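Summary statistics of this kind can be computed directly from the bounding-box heights. The sketch below assumes NumPy and a flat list of heights in pixels. The function names `height_summary` and `height_histogram`, the bin width, and the toy heights are all illustrative; the toy heights are not the dataset's values.

```python
import numpy as np


def height_summary(heights):
    """Return the min, median, and max of bounding-box heights (pixels)."""
    h = np.asarray(heights, dtype=float)
    if h.size == 0:
        raise ValueError("no bounding-box heights given")
    return {"min": float(h.min()), "median": float(np.median(h)), "max": float(h.max())}


def height_histogram(heights, bin_width=10):
    """Count heights in fixed-width bins starting at 0 pixels."""
    h = np.asarray(heights, dtype=float)
    # Extend the edges far enough that the largest height falls in the last bin.
    edges = np.arange(0, h.max() + bin_width, bin_width)
    counts, edges = np.histogram(h, bins=edges)
    return counts, edges


# Toy heights in pixels (hypothetical, for illustration only).
toy_heights = [12, 18, 25, 30, 44, 60]
print(height_summary(toy_heights))  # {'min': 12.0, 'median': 27.5, 'max': 60.0}
counts, edges = height_histogram(toy_heights)
print(counts.tolist())              # [0, 2, 1, 1, 1, 1]
```

A wide gap between the median and the maximum, or a long tail in the histogram, is the kind of non-uniform height distribution described above.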