Authors: Poroni Koiknzi Fousseni, Elvis Thierry Sounna Vofo
Abstract: Long-term dependency modeling remains one of the fundamental challenges in sequence processing tasks across natural language processing, time series analysis, and sequential decision-making. This paper presents a comprehensive analysis of methods for handling long-term dependencies, examining the evolution from traditional recurrent neural networks (RNNs) to modern attention-based architectures. We provide theoretical foundations for the vanishing gradient problem, analyze key architectural innovations including Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), and Transformer models, and discuss emerging approaches such as State Space Models and Linear Attention mechanisms. Our analysis includes mathematical formulations, computational complexity considerations, and empirical performance comparisons across various sequence modeling tasks. We identify current limitations and propose future research directions for improving long-range sequence modeling capabilities in deep learning systems.
DOI: https://doi.org/10.5281/zenodo.16908505