Authors: Anuradha Muttamwar, Esha Dorkhande, Vaibhavi Meshram
Abstract: The rapid proliferation of digital media has created an environment in which misinformation, disinformation, and "fake news" can spread unchecked, threatening public discourse, political trust, and integrity. This paper presents a fake news detection approach based on efficient feature engineering and supervised machine learning. Using a dataset of 5,000 current news articles (2,537 real, 2,463 fake), we study in depth the performance of TF-IDF features with n-grams and train a Multinomial Naive Bayes classifier on them. We also investigate the contribution of text preprocessing steps such as stop word removal, stemming, and lemmatization. The final model achieves an accuracy of 93.6%, with precision, recall, and F1 scores all above 0.92. With enhanced feature engineering, the proposed method compares favorably with baseline models. We additionally developed a Flask-based web system that performs real-time fake news detection and reports a confidence score for each prediction. The result is a reusable, lightweight, and scalable pipeline for automating fake news detection in real-world applications.
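
To make the pipeline summarized above concrete, the following is a minimal sketch of TF-IDF weighting over word n-grams feeding a Multinomial Naive Bayes classifier, written with only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the class name `TfidfNaiveBayes`, the helper `ngrams`, the smoothed IDF formula, the bigram limit, the smoothing constant `alpha`, and the four toy training sentences are all hypothetical and are not taken from the paper or its dataset. Stop word removal, stemming, and lemmatization are omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase the text, keep alphabetic tokens, and emit word n-grams up to n_max."""
    tokens = re.findall(r"[a-z]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


class TfidfNaiveBayes:
    """Hypothetical helper: Multinomial Naive Bayes over TF-IDF-weighted n-gram counts."""

    def fit(self, texts, labels, alpha=1.0):
        docs = [Counter(ngrams(t)) for t in texts]
        n_docs = len(docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for d in docs for term in d)
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        self.classes = sorted(set(labels))
        # Accumulate TF-IDF mass per class and term (fractional "counts").
        weight = {c: defaultdict(float) for c in self.classes}
        for d, y in zip(docs, labels):
            for t, tf in d.items():
                weight[y][t] += tf * self.idf[t]
        vocab_size = len(self.idf)
        self.log_prior = {c: math.log(list(labels).count(c) / n_docs) for c in self.classes}
        self.log_like = {}
        for c in self.classes:
            # Additive (Laplace/Lidstone) smoothing with constant alpha.
            total = sum(weight[c].values()) + alpha * vocab_size
            self.log_like[c] = {t: math.log((weight[c][t] + alpha) / total) for t in self.idf}
        return self

    def predict_proba(self, text):
        d = Counter(ngrams(text))
        scores = {}
        for c in self.classes:
            # Terms never seen during training are ignored.
            scores[c] = self.log_prior[c] + sum(
                tf * self.idf[t] * self.log_like[c][t] for t, tf in d.items() if t in self.idf
            )
        # Normalize log scores into probabilities in a numerically stable way.
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


# Toy, made-up examples for illustration only (not the paper's data).
train_texts = [
    "government publishes annual budget report",
    "city council approves new budget report",
    "shocking miracle cure doctors hate",
    "aliens secretly control shocking miracle cure",
]
train_labels = ["real", "real", "fake", "fake"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("council publishes budget report"))
print(model.predict_proba("shocking miracle cure"))
```

The dictionary returned by `predict_proba` is the kind of per-class confidence a web front end could display next to the predicted label. In practice, a maintained library implementation would likely be preferable to this hand-rolled sketch.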