Paper ID: 2208.08933

Sequence Prediction Under Missing Data : An RNN Approach Without Imputation

Soumen Pachal, Avinash Achar

Missing data scenarios are very common in ML applications in general and time-series/sequence applications are no exceptions. This paper pertains to a novel Recurrent Neural Network (RNN) based solution for sequence prediction under missing data. Our method is distinct from all existing approaches. It tries to encode the missingness patterns in the data directly without trying to impute data either before or during model building. Our encoding is lossless and achieves compression. It can be employed for both sequence classification and forecasting. We focus on forecasting here in a general context of multi-step prediction in presence of possible exogenous inputs. In particular, we propose novel variants of Encoder-Decoder (Seq2Seq) RNNs for this. The encoder here adopts the above mentioned pattern encoding, while at the decoder which has a different structure, multiple variants are feasible. We demonstrate the utility of our proposed architecture via multiple experiments on both single and multiple sequence (real) data-sets. We consider both scenarios where (i)data is naturally missing and (ii)data is synthetically masked.

Submitted: Aug 18, 2022