Authors: Ansh Jena, Sujit Kakade, Arya Kedar
Abstract: As social media platforms such as Twitter grow in importance, sentiment analysis can help airlines track passengers’ perceptions and improve the service they offer. This paper presents a comparative analysis of three natural language processing approaches, namely lexicon-based, machine-learning, and transformer-based classification, for determining the sentiment of airline tweets. The Twitter US Airline Sentiment dataset was chosen because it comprises labeled tweets about major U.S. airlines. Data quality was improved through text preprocessing: noise removal, tokenization, and stopword elimination. The lexicon-based approach used VADER polarity scores as a baseline, the machine-learning approach extracted TF-IDF features and applied a Random Forest classifier, and the transformer-based approach applied RoBERTa to capture the context of sentiment. The analysis found that the lexicon model was faster and produced more easily interpretable results, whereas the machine-learning model identified sentiment more accurately. The transformer-based RoBERTa model performed best at handling more complex linguistic structures, such as negation and sarcasm.
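The preprocessing steps named in the abstract (noise removal, tokenization, and stopword elimination) can be illustrated with a minimal Python sketch. The regular expressions, the deliberately tiny stopword list, and the function name `preprocess` are assumptions made for this illustration and do not reproduce the authors' actual pipeline. Negation words such as "not" are kept out of the stopword list here, on the assumption that they carry sentiment.

```python
import re

# Assumed, deliberately small stopword list for illustration only.
# Negation words ("not", "no") are intentionally NOT included.
STOPWORDS = {"a", "an", "the", "is", "was", "to", "and", "of",
             "my", "for", "in", "on", "it", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user handles
NON_ALPHA_RE = re.compile(r"[^a-z\s']")         # digits, punctuation, emoji


def preprocess(tweet: str) -> list[str]:
    """Lowercase a tweet, strip noise, tokenize, and drop stopwords."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)        # noise removal: URLs
    text = MENTION_RE.sub(" ", text)    # noise removal: mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop '#'
    text = NON_ALPHA_RE.sub(" ", text)  # noise removal: everything else
    # Whitespace tokenization; strip stray apostrophes left by cleaning.
    tokens = [t.strip("'") for t in text.split()]
    # Stopword elimination (also drops tokens emptied by stripping).
    return [t for t in tokens if t and t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("@united my flight was NOT on time!! http://t.co/xyz #fail"))
```

The resulting token lists could then feed any of the three approaches compared in the paper, although VADER and RoBERTa are often applied to less aggressively cleaned text, since both can make use of punctuation and casing.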