Paper ID: 2205.15304
Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning
Graham W. Pulford
Expectation maximisation (EM) is usually thought of as an unsupervised learning method for estimating the parameters of a mixture distribution; however, it can also be used for supervised learning when class labels are available. As such, EM has been applied to train neural networks, including the probabilistic radial basis function (PRBF) network, also known as the shared kernel (SK) model. This paper addresses two major shortcomings of previous work in this area: the lack of rigour in the derivation of the EM training algorithm, and the computational complexity of the technique, which has limited it to low-dimensional data sets. We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier, making use of data association theory to obtain the complete data likelihood, Baum's auxiliary function (the E-step) and its subsequent maximisation (the M-step). To reduce the complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables. The resulting product decomposition of the joint data likelihood, which is exact when the feature partitions are independent, allows the SKEM to be implemented in parallel and with complexity reduced by a factor of $R^2$. The operation of the partitioned SKEM algorithm is demonstrated on the MNIST data set and compared with its non-partitioned counterpart. The results indicate that improved performance is achievable at reduced complexity. Comparisons with standard classification algorithms are provided on a number of other benchmark data sets.
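The product decomposition mentioned in the abstract can be illustrated for a single Gaussian kernel. The following is a minimal sketch in Python with NumPy, not the SKEM algorithm itself: it assumes equal-sized partitions and a block-diagonal covariance (that is, independent feature partitions), and checks that the joint log-density equals the sum of the per-partition log-densities. All names (`gaussian_logpdf`, `partitioned_logpdf`, `blocks`) and the values of `d` and `R` are hypothetical, chosen only for illustration.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian N(mean, cov) at the point x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)          # log-determinant of cov
    maha = diff @ np.linalg.solve(cov, diff)    # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def partitioned_logpdf(x, mean, block_covs, blocks):
    """Sum of per-partition log-densities, i.e. the log of the product form."""
    return sum(
        gaussian_logpdf(x[idx], mean[idx], cov)
        for idx, cov in zip(blocks, block_covs)
    )

rng = np.random.default_rng(0)
d, R = 12, 3                                    # assumed sizes, illustration only
blocks = np.array_split(np.arange(d), R)        # R non-overlapping index subsets

# One random symmetric positive-definite covariance per partition.
block_covs = []
for idx in blocks:
    a = rng.standard_normal((len(idx), len(idx)))
    block_covs.append(a @ a.T + len(idx) * np.eye(len(idx)))

# Assemble the equivalent block-diagonal covariance of the joint model.
full_cov = np.zeros((d, d))
for idx, cov in zip(blocks, block_covs):
    full_cov[np.ix_(idx, idx)] = cov

mean = rng.standard_normal(d)
x = rng.standard_normal(d)

joint = gaussian_logpdf(x, mean, full_cov)
product = partitioned_logpdf(x, mean, block_covs, blocks)

# Covariance entries: d^2 for the joint model, (d/R)^2 for each partition.
full_entries = d * d
per_partition_entries = (d // R) ** 2
```

In this sketch each partition's covariance has $(d/R)^2$ entries against $d^2$ for the joint model, a factor of $R^2$ fewer per partition, and the partitions can be evaluated independently of one another. When the true partitions are not independent, the product form is only an approximation to the joint likelihood.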
Submitted: May 31, 2022