Fast Feedforward Network

Fast feedforward networks (FFFs) are a neural network architecture designed to drastically reduce computational cost at inference time by activating only a small subset of neurons for each input, typically by routing the input down a learned binary tree to a single small "leaf" block. Current research focuses on improving FFF efficiency and accuracy through techniques such as load balancing and "master leaf" nodes, ideas inspired by Mixture of Experts models. Reported speedups over dense feedforward layers reach one to two orders of magnitude in some settings, with applications ranging from language modeling to image processing, enabling larger models and faster processing on resource-constrained devices.
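The conditional-execution idea above can be sketched as follows. This is a minimal illustration, not any paper's reference implementation: a hard binary routing tree where each internal node holds one learned routing vector, and each leaf holds a small two-layer MLP. Only one leaf is evaluated per input, so the work per input grows with tree depth rather than total width. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the source): input/output width,
# per-leaf hidden width, and tree depth (2**DEPTH leaves).
D_IN, D_OUT, LEAF_WIDTH, DEPTH = 16, 16, 8, 3

# One routing vector per internal node; a depth-DEPTH binary tree
# has 2**DEPTH - 1 internal nodes, stored in heap order.
node_w = rng.normal(size=(2**DEPTH - 1, D_IN))

# One small MLP per leaf; only a single leaf runs for a given input.
leaf_w1 = rng.normal(size=(2**DEPTH, D_IN, LEAF_WIDTH))
leaf_w2 = rng.normal(size=(2**DEPTH, LEAF_WIDTH, D_OUT))

def fff_forward(x):
    """Route x down the tree, then apply only the chosen leaf's MLP."""
    node = 0  # root; children of node i sit at 2*i + 1 and 2*i + 2
    for _ in range(DEPTH):
        go_right = float(x @ node_w[node]) > 0.0  # hard routing decision
        node = 2 * node + (2 if go_right else 1)
    leaf = node - (2**DEPTH - 1)            # index among the leaves
    h = np.maximum(x @ leaf_w1[leaf], 0.0)  # ReLU hidden layer
    return h @ leaf_w2[leaf]

y = fff_forward(rng.normal(size=D_IN))
```

During training, the hard threshold is usually replaced by a soft (sigmoid) routing so the tree is differentiable; the hard version shown here corresponds to inference, where only `DEPTH` dot products plus one leaf MLP are computed instead of a full dense layer.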

Papers