Paper ID: 2407.18929
THEA-Code: an Autoencoder-Based IDS-correcting Code for DNA Storage
Alan J. X. Guo, Mengyi Wei, Yufan Dai, Yali Wei, Pengchen Zhang
The insertion, deletion, substitution (IDS) correcting code has garnered increased attention due to significant advancements in DNA storage that emerged recently. Despite this, the pursuit of optimal solutions in IDS-correcting codes remains an open challenge, drawing interest from both theoretical and engineering perspectives. This work introduces a pioneering approach named THEA-code. The proposed method follows a heuristic idea of employing an end-to-end autoencoder for the integrated encoding and decoding processes. To address the challenges associated with deploying an autoencoder as an IDS-correcting code, we propose innovative techniques, including the differentiable IDS channel, the entropy constraint on the codeword, and the auxiliary reconstruction of the source sequence. These strategies contribute to the successful convergence of the autoencoder, resulting in a deep learning-based IDS-correcting code with commendable performance. Notably, THEA-Code represents the first instance of a deep learning-based code that is independent of conventional coding frameworks in the IDS-correcting domain. Comprehensive experiments, including an ablation study, provide a detailed analysis and affirm the effectiveness of THEA-Code.
Submitted: Jul 10, 2024