Limiting a CNN’s complexity, and hence its memory, forces it to generalise a solution to the problem at hand; a crude, but effective, method to prevent overfitting. In standard multi-layer perceptrons this typically manifests itself as limiting the number of hidden layers within the network, and the number of neurons per layer. In CNNs one can also constrain the filter size and the number of max pooling layers. While limiting the network in this way does restrict its ability to model complex non-linear patterns, it is deemed unlikely to present a limiting factor given the binary classification task of this approach and the relative simplicity of the images with which the networks will be presented.
Figure 3.6 provides a graphical representation of the basic model.

Figure 3.6: A graphical representation of the CNN used to predict edges within a BN. Two layers of two filters/convolutions exist, combined with a (max) pooling layer, and then aggregated with a dense, fully connected layer, culminating in a final softmax predictive layer.
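The architecture described in the figure can be sketched in Keras roughly as follows. The 64 by 64 single-channel input shape, 3 by 3 kernels, dense width of 100, activation choices, and compilation settings are illustrative assumptions, not values taken from this work; only the overall layer sequence follows the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A minimal sketch of the described architecture: two convolutional layers
# of two filters each, a max pooling layer, a dense (fully connected) layer,
# and a final softmax predictive layer. Input shape, kernel size, and dense
# width are illustrative assumptions.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    layers.Conv2D(2, (3, 3), activation="relu"),
    layers.Conv2D(2, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(2, activation="softmax"),  # binary edge / no-edge prediction
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The two-unit softmax output reflects the binary classification framing; an equivalent single-unit sigmoid output would also be a reasonable choice.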
3.6.2 Hyperparameters

Even within architectures as simple as Brownlee’s (Brownlee 2017) and Keras’s samples (Chollet et al. 2015), a significant tuning challenge exists. In the interests of constraining this aspect of the research, further exploration was performed on the available optimisations and their potential interactions.

Dense Layers

Dense layer(s) perform classification on the features derived and down-sampled by the convolutional layers and pooling layers of the neural network.
As in a multi-layer perceptron, every node in a given layer is connected to every node in the preceding layer, creating a (series of) bipartite graph(s). Accordingly, the width and depth of the dense layer directly affect the complexity of the relationships that the neural network can model. Given the binary learning task, a single dense layer was used, with its width drawn from values uniformly distributed around the Brownlee implementation (50, 100, 150, 200, 250).

Filter shape

Filter size must be calibrated to find the right level of granularity so as to create abstractions at the proper scale, given a particular dataset.
Common filter (or kernel, in Keras nomenclature) shapes found in the literature vary greatly and are usually chosen based on the dataset. A brief exploration of image recognition tutorials online suggested a 3 by 3 grid was an appropriate shape. Initial experimentation with larger sizes (including a filter that spanned all dimensions) suggested no benefit was to be gained from them.
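To make the filter-shape trade-off concrete, the following NumPy sketch applies a single 3 by 3 filter in “valid” mode, as a convolutional layer would; the image and kernel values are arbitrary examples, not data from this work.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image and
    take the elementwise-product sum at each position, as a CNN filter does."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # arbitrary 5x5 example
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging filter
result = convolve2d(image, kernel)
print(result.shape)  # -> (3, 3): a 3x3 kernel shrinks a 5x5 image by 2 per axis
```

A kernel that spanned the whole image would produce a single scalar per filter, collapsing all spatial structure in one step, which is consistent with the observation that the largest filters offered no benefit.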