Authors: Aarush Kukade, Advait Deogade, Atharva Mane, Dr. Saurabh Saoji
Abstract: In digital banking, effective marketing lead generation depends on leveraging heterogeneous, multimodal customer data. Traditional predictive models rely primarily on static, flat tabular attributes and overlook the relational and temporal information inherent in customer transaction histories and support networks. This paper proposes MTHGT, a Multimodal Temporal Heterogeneous Graph Transformer framework that integrates structured CRM attributes, sequential transactional records, and unstructured call transcripts to predict customer lead conversion in banking. The proposed system models customers, transactions, locations, and events as nodes in a heterogeneous graph whose edges are derived from transactional similarity, campaign logs, and temporal history. Using a multimodal embedding strategy, the model learns customer representations via Graph Transformer layers with type-aware, distance, and temporal bias encodings. Empirical results on the Multimodal Banking Dataset (MBD; 85,620 client-month nodes, 2.26% positive rate) show that graph-based models outperform tabular baselines on ranking: HGT reaches a ROC-AUC of 0.7809 ± 0.0092 and MTHGT 0.7763 ± 0.0160, versus 0.7397 ± 0.0002 for Logistic Regression. Furthermore, MTHGT improves F1-score over HGT (0.0778 ± 0.0225 vs. 0.0651 ± 0.0050) and exposes dynamic modality attributions (CRM features: 25%, dialogue text: 36%, temporal transactions: 39%), enabling explainable CRM lead scoring. The paper details the system design, dataset structure, implementation, graph construction methodology, and performance evaluation, and outlines a roadmap to bridge the tabular baseline gap using Focal Loss and behavioral k-NN edges.
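
To illustrate why Focal Loss is a natural candidate for a task with a 2.26% positive rate, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not the authors' implementation: the function names and the values gamma = 2.0 and alpha = 0.25 are illustrative assumptions, not settings reported in this paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example with predicted positive probability p.

    gamma and alpha are illustrative placeholders (assumed, not taken
    from the paper). gamma = 0 reduces to alpha-weighted cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of (probability, label) pairs."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

The (1 - p_t)^gamma factor shrinks the contribution of confidently classified examples, which in a heavily imbalanced setting are mostly easy negatives, so the gradient is dominated less by the majority class than under plain cross-entropy.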