Authors: Vijayalakshmi. G, Ms. P. Kalaiselvi, B. Lalitha
Abstract: Crack detection in building structures is critical for ensuring safety and preventing costly repairs. Traditional crack detection methods often struggle to identify cracks accurately because of structural complexity and the subtlety of the damage. This work proposes a hybrid deep learning framework that uses CaTNet (ConvNeXt + Transformer block) and a Vision Transformer (ViT) for feature extraction, followed by XGBoost for classification. CaTNet combines ConvNeXt-style convolutional blocks with Transformer encoders to capture both fine-grained spatial details and global contextual relationships in building images, while ViT processes each image as a sequence of patches to further capture global structural patterns. The features extracted by the two models are fused and refined using dense layers with dropout. XGBoost then classifies the fused features; it is trained with multi-class log loss (mlogloss) as the evaluation metric and assessed with classification reports, confusion matrices, and training loss curves. Experimental results show that the proposed model outperforms conventional crack detection methods in accuracy, robustness, and real-time applicability, making it a promising approach for crack detection in building infrastructure.
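The mlogloss metric named in the abstract is the mean negative log-probability assigned to the true class. A minimal sketch in plain Python follows, to make the quantity concrete. The function name `mlogloss`, the clipping constant `eps`, and the three-class toy data are illustrative assumptions, not details taken from the paper; in practice the value would be computed by XGBoost itself.

```python
import math


def mlogloss(y_true, probs, eps=1e-15):
    """Mean multi-class log loss.

    y_true: list of integer class indices, one per sample.
    probs:  list of per-sample class-probability lists.
    eps:    clipping constant (assumed value) that avoids log(0).
    """
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = 0.0
    for label, p in zip(y_true, probs):
        # Probability the model assigned to the correct class, clipped
        # into [eps, 1 - eps] so the logarithm stays finite.
        p_true = min(max(p[label], eps), 1.0 - eps)
        total -= math.log(p_true)
    return total / len(y_true)


# Hypothetical three-class toy data; the abstract does not state the
# number of classes or any predicted probabilities.
labels = [0, 2, 1]
predicted = [
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
]
loss = mlogloss(labels, predicted)
print(f"mlogloss = {loss:.4f}")
```

Lower values mean the classifier places more probability on the correct class; a model that predicts uniformly over K classes scores ln(K).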