Paper ID: 2402.15883

Extraction Propagation

Stephen Pasteris, Chris Hicks, Vasilios Mavroudis

We consider the problem of learning to map large instances, such as sequences and images, to outputs. Since training one large neural network end to end with backpropagation is plagued by vanishing gradients and degradation, we develop a novel neural network architecture called Extraction propagation, which works by training, in parallel, many small neural networks that interact with one another. We note that the performance of Extraction propagation is only conjectured, as we have yet to implement it. We do, however, back the algorithm with some theory. A previous version of this paper was entitled "Fusion encoder networks" and detailed a slightly different architecture.
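The abstract's central idea, replacing end-to-end backpropagation with many small networks trained in parallel, can be loosely illustrated as follows. This is a hypothetical sketch only: the paper's actual Extraction propagation algorithm is not specified in the abstract, and every name and objective below (e.g. `LocalModule`, a local reconstruction loss) is an invented stand-in for the general idea of local, per-module training with no gradient flowing through the whole stack.

```python
import numpy as np

rng = np.random.default_rng(0)

class LocalModule:
    """A tiny one-layer network trained with a LOCAL objective only.

    Hypothetical illustration: each module minimizes its own reconstruction
    loss, so no gradients propagate across module boundaries (avoiding the
    vanishing-gradient problem of one deep end-to-end network).
    """
    def __init__(self, d_in, d_out, lr=0.01):
        self.W = rng.normal(0, 0.1, (d_in, d_out))  # encoder weights
        self.V = rng.normal(0, 0.1, (d_out, d_in))  # local decoder weights
        self.lr = lr

    def forward(self, x):
        return np.tanh(x @ self.W)

    def local_step(self, x):
        """One gradient step on the local loss ||decode(encode(x)) - x||^2."""
        h = self.forward(x)            # (n, d_out) hidden code
        x_hat = h @ self.V             # (n, d_in) local reconstruction
        err = x_hat - x
        # Gradients of the squared error w.r.t. V and W (tanh' = 1 - h^2).
        gV = h.T @ err / len(x)
        gW = x.T @ ((err @ self.V.T) * (1 - h**2)) / len(x)
        self.V -= self.lr * gV
        self.W -= self.lr * gW
        return float(np.mean(err**2))

# A chain of small modules; each one's update uses only local quantities,
# so in principle all modules could be trained in parallel.
mods = [LocalModule(8, 6), LocalModule(6, 4)]
x = rng.normal(size=(32, 8))
losses = []
for _ in range(200):
    inp = x
    step = []
    for m in mods:
        step.append(m.local_step(inp))
        inp = m.forward(inp)           # pass activations forward, no gradient
    losses.append(step)
```

Because each `local_step` touches only one module's parameters, the modules never exchange gradients, only activations; this is the sense in which many small networks "interact" while being trained separately.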

Submitted: Feb 24, 2024