The classical L1 and L2 regularisation techniques are still readily employed within CNNs.
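As a minimal sketch of how these penalties operate (the function names and values here are illustrative, not from the original implementation), each simply adds a term to the training loss that grows with the magnitude of the weights:

```python
import numpy as np

def l2_penalty(weights, lam):
    # L2 (weight decay): penalises the squared magnitude of the weights,
    # shrinking them smoothly towards zero
    return lam * np.sum(weights ** 2)

def l1_penalty(weights, lam):
    # L1: penalises the absolute magnitude, encouraging sparse weights
    return lam * np.sum(np.abs(weights))

w = np.array([0.5, -2.0, 0.0, 1.5])
data_loss = 0.8                         # stand-in for the unregularised loss
total_loss = data_loss + l2_penalty(w, lam=0.01)
```

The regularisation strength `lam` trades off fitting the data against keeping the weights small; larger values penalise large weights more heavily.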
A more recent addition to the neural network regularisation arsenal has been Dropout (Srivastava et al. 2014) and DropConnect (Wan et al. 2013). In these models a more robust model is encouraged via the temporary removal, or freezing, of subsections of the neural network architecture (nodes in the case of Dropout, edges in the case of DropConnect). This encourages the model to rely upon a wider set of nodes, and hence pushes it to construct a more diffuse weight distribution. Just as with the aforementioned L1 and L2 regularisation, this reduces the likelihood of overfitting.

2.5.6 Kernel Initialisation

While not an insurmountable barrier to learning, poorly initialised weights remain a challenge within neural network training, a symptom of the stochastic, non-deterministic nature of the training process. This has been explored thoroughly in the literature (Yam & Chow 2000, Hinton et al. 2012, Krizhevsky et al. 2012), as well as written about extensively online (Stanford 2017, Perunicic 2017).

2.5.7 Hyperparameter Optimisation

Owing to their more nuanced architecture, CNNs employ significantly more hyperparameters than their standard Multi-Layer Perceptron relatives. To optimise the CNN's hyperparameter set, a variety of options was initially considered, including a custom-built Markov Chain style optimisation (iteratively cycling through, and optimising, each variable to reach an approximately optimal solution) and scikit-learn's (Pedregosa et al. 2011) random and grid search parameter tuning.
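As an illustration of the random search idea, a minimal sketch follows; the search space and objective below are toy stand-ins (not the thesis's actual hyperparameters), with the objective playing the role of "train the CNN and return its validation loss":

```python
import random

def random_search(objective, space, n_iter=50, seed=0):
    """Draw n_iter random configurations from `space`, keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        # sample one value for each hyperparameter independently
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)   # stand-in for a full train/validate cycle
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical two-parameter space with a known best setting
space = {"learning_rate": [1e-1, 1e-2, 1e-3], "dropout": [0.2, 0.5]}
objective = lambda p: abs(p["learning_rate"] - 1e-2) + abs(p["dropout"] - 0.5)
best, score = random_search(objective, space)
```

A grid search would instead enumerate every combination (here only 3 × 2 = 6), which quickly becomes infeasible as the number of hyperparameters grows; random search samples the space instead, at the cost of no guarantee of covering it.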
However, research into the problem yielded a promising package, hyperopt (Bergstra et al. 2013), which had demonstrated impressive performance in the optimisation of high-dimensional search space problems via its sequential model-based optimisation functionality.

2.5.8 Pre-Training, Transfer Learning and Data Augmentation

Robust, 'good', machine learning is asymptotically unbiased; that is, its ability to model and generalise is intrinsically linked to the volume of data the algorithm is presented with. Even in circumstances where the data is of suspect quality, algorithms often perform better with more data, assuming a useful signal is present within the additional data (i.e. it is not just noise). Harnessing this premise, a common technique in machine vision problems is to augment the existing training data through a variety of manipulations, such as horizontal or vertical translations, distortions and mirroring.
This creates an expanded (albeit partially artificial) data set,