Authors: Mrs. P.V. Javkar, Mr. Damodhar N Bulbule, Mr. Arya D Tapkir, Mr. Kaivalya R Bhadange, Mr. Devraj A Yadav
Abstract: Social media platforms have become a major part of daily communication, marketing, entertainment, and information sharing, and Instagram is among the most widely used worldwide. Its rapid growth has also led to a large number of fake accounts, which are often used for scams, impersonation, phishing, spam promotion, fake giveaways, misinformation, and fraudulent advertisements. Detecting such accounts has become an important research problem in cybersecurity and social media analysis. Traditional fake account detection systems focus mainly on profile-related information such as follower count, following count, number of posts, account age, and user activity. These features are useful, but they can miss accounts that hide suspicious content inside images: many fake Instagram accounts place scam messages, promotional offers, fake links, or misleading text inside profile images, stories, and post images, where conventional text-based techniques alone cannot analyze it effectively. This paper proposes a method for detecting fake Instagram accounts using Optical Character Recognition (OCR). OCR extracts text from profile pictures, post images, and other visual content associated with an account. The extracted text is then analyzed for suspicious keywords, spam patterns, links, and unusual promotional phrases. These OCR-based features are combined with profile-level features such as follower-to-following ratio, posting behavior, account age, username structure, and bio information, and the account is classified as genuine or fake. By analyzing both textual and visual content, the proposed approach is better able to identify the hidden spam techniques used by fake profiles. The paper also discusses the methodology, algorithm steps, feature extraction, preprocessing, system architecture, results, limitations, and future scope.
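The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the OCR step has already produced plain-text strings (for example from an engine such as Tesseract), and the keyword list, thresholds, scoring weights, and all names (`Profile`, `ocr_text_features`, `profile_features`, `classify`) are hypothetical choices made only to show how OCR-based and profile-level features might be combined.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the abstract does not specify one.
SUSPICIOUS_KEYWORDS = ("giveaway", "free", "winner", "click", "guaranteed")

# Simple link detector for OCR output (assumption: http(s) or www-style links).
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


@dataclass
class Profile:
    followers: int
    following: int
    posts: int
    account_age_days: int
    username: str
    bio: str


def ocr_text_features(ocr_texts):
    """Count suspicious keywords and links in text already extracted by OCR."""
    joined = " ".join(ocr_texts).lower()
    # Substring counting is crude (e.g. "free" also matches "freedom").
    keyword_hits = sum(joined.count(k) for k in SUSPICIOUS_KEYWORDS)
    link_hits = len(URL_PATTERN.findall(joined))
    return {"keyword_hits": keyword_hits, "link_hits": link_hits}


def profile_features(p):
    """Derive the profile-level features named in the abstract."""
    digits = sum(ch.isdigit() for ch in p.username)
    return {
        "follower_following_ratio": p.followers / max(p.following, 1),
        "username_digit_ratio": digits / max(len(p.username), 1),
        "posts": p.posts,
        "account_age_days": p.account_age_days,
        "bio_empty": not p.bio.strip(),
    }


def classify(p, ocr_texts):
    """Rule-based stand-in for the classifier; weights and thresholds are illustrative."""
    f = {**ocr_text_features(ocr_texts), **profile_features(p)}
    score = 0
    score += 2 if f["keyword_hits"] >= 2 else 0
    score += 2 if f["link_hits"] >= 1 else 0
    score += 1 if f["follower_following_ratio"] < 0.1 else 0
    score += 1 if f["username_digit_ratio"] > 0.3 else 0
    score += 1 if f["account_age_days"] < 30 else 0
    score += 1 if f["bio_empty"] else 0
    return "fake" if score >= 4 else "genuine"


if __name__ == "__main__":
    suspect = Profile(12, 900, 3, 10, "win_9923184", "")
    print(classify(suspect, ["FREE GIVEAWAY! Click www.prize-now.xyz to claim"]))
```

In a fuller system the hand-set score would presumably be replaced by a trained classifier over the same feature dictionary, and the `ocr_texts` argument would be filled by running an OCR engine over the account's profile picture and post images.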