Authors: Poroni Koiknzi Fousseni, Elvis Thierry Sounna Vofo
Abstract: Long-term dependency modeling remains one of the fundamental challenges in sequence processing tasks across natural language processing, time series analysis, and sequential decision-making. This paper presents a comprehensive analysis of methods for handling long-term dependencies, examining the evolution from traditional recurrent neural networks (RNNs) to modern attention-based architectures. We provide theoretical foundations for the vanishing gradient problem, analyze key architectural innovations including Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), and Transformer models, and discuss emerging approaches such as State Space Models and Linear Attention mechanisms. Our analysis includes mathematical formulations, computational complexity considerations, and empirical performance comparisons across various sequence modeling tasks. We identify current limitations and propose future research directions for improving long-range sequence modeling capabilities in deep learning systems.
DOI: https://doi.org/10.5281/zenodo.16908505