Self-Supervised Representation Learning in Sparse Data Regimes

Authors

  • Syed Rashid Anwar, Assistant Professor, Department of Computer Science & IT, Arka Jain University, Jamshedpur, Jharkhand, India.
  • Josephine R, Assistant Professor, Department of Computer Science and Engineering, Presidency University, Bangalore, Karnataka, India.
  • Santosh Kumar Behera, Associate Professor, Centre for Artificial Intelligence and Machine Learning, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.
  • Vinay Kumar Sadolalu Boregowda, Assistant Professor-I, Department of Electronics and Communication Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-be University), Bengaluru, Karnataka, India.
  • Pushpa Nagini Sripada, Professor, Department of English, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, India.
  • Sivasankari V, Assistant Professor, Department of Mathematics, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research, India.

Keywords:

Self-Supervised Learning, Fraud Detection, Imbalanced Data, Anomaly Detection, Representation Learning.

Abstract

Self-supervised representation learning enables scalable modeling by reducing reliance on labeled data. Fraud detection in financial systems nevertheless remains challenging: observations are sparse and imbalanced, transaction patterns evolve, and existing methods fail to generalize or to capture minority fraudulent behaviors effectively. This research aims to design a robust fraud detection framework for sparse and highly skewed financial datasets by leveraging self-supervised representations. Transactional data are aggregated from publicly available financial repositories and from simulated streams reflecting realistic fraud scenarios, ensuring diversity in temporal and categorical attributes. Preprocessing incorporates normalization and missing value imputation. Feature extraction employs Time2Vec temporal encoding and rolling window statistical descriptors to capture evolving behavioral patterns. The proposed model integrates representation learning with Genetic Algorithm-tuned Dynamic Variational Autoencoders (GA-DVA): data are first encoded into latent representations and then refined for anomaly-aware discrimination. The Dynamic Variational Autoencoder models evolving transaction distributions, while the Genetic Algorithm optimizes latent space parameters and reconstruction constraints to enhance detection sensitivity on imbalanced datasets. This combination enables adaptive learning of rare fraud signatures. The framework, implemented in Python, prioritizes minority pattern amplification and distribution-aware learning. Performance evaluation demonstrates improved precision (0.920), recall (0.912), F1-score (0.916), AUC-ROC (0.950), and Early Detection Rate (0.890), with reduced false alarms (0.050) and consistent adaptability to shifting data distributions. The approach delivers interpretable, scalable, and resilient fraud detection suitable for real-world financial environments.
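
The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Time2Vec form (one linear component followed by sine components) and trailing-window mean and standard deviation as the statistical descriptors. The function names, the fixed frequencies and phases (which would normally be learned), and the toy transactions are all hypothetical.

```python
import math
from statistics import mean, pstdev


def time2vec(tau, omegas, phis):
    """Time2Vec encoding: one linear term followed by periodic (sine) terms."""
    linear = omegas[0] * tau + phis[0]
    periodic = [math.sin(w * tau + p) for w, p in zip(omegas[1:], phis[1:])]
    return [linear] + periodic


def rolling_stats(amounts, window):
    """Mean and population std of the trailing `window` amounts at each step."""
    out = []
    for i in range(len(amounts)):
        chunk = amounts[max(0, i - window + 1): i + 1]
        out.append((mean(chunk), pstdev(chunk)))
    return out


# Hypothetical toy data: transaction times (hours) and amounts.
timestamps = [0.0, 1.5, 2.0, 6.0]
amounts = [20.0, 25.0, 22.0, 400.0]

# Assumed fixed frequencies and phases; in a trained model these are learned.
omegas = [0.1, 1.0, 2.0]
phis = [0.0, 0.0, 0.5]

# One feature vector per transaction: temporal encoding + rolling descriptors.
features = [
    time2vec(t, omegas, phis) + list(stats)
    for t, stats in zip(timestamps, rolling_stats(amounts, window=3))
]
```

Each row of `features` concatenates the temporal encoding with the rolling mean and standard deviation, so an unusually large amount (the last toy transaction) shifts the rolling descriptors sharply relative to earlier rows.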

Published

2026-06-01

How to Cite

Anwar, S. R., R, J., Behera, S. K., Boregowda, V. K. S., Sripada, P. N., & V, S. (2026). Self-Supervised Representation Learning in Sparse Data Regimes. International Journal of Artificial Intelligence and Machine Learning, 6(4s), 650–658. Retrieved from https://www.svedbergopen.com/index.php/ijaiml/article/view/498