Synthetic Data Generation for Fraud Detection Using Diffusion Models
Source: Information & Security: An International Journal
Abstract:
Detecting fraudulent credit card transactions in payment and banking systems is a significant challenge, primarily because access to the real-world data needed to train models and to develop algorithms that accurately analyze transaction streams is limited. Real data describing the contractual relationships between financial institutions and their clients is confidential. This affects both what data is recorded in transactions and how transaction flows can be analyzed to identify fraudulent activity.
This paper explores the potential of using diffusion models to generate realistic synthetic transaction data aimed at improving the performance of fraud detection algorithms. Particular emphasis is placed on processing datasets that contain a mix of categorical (textual) and numerical attributes and exhibit a pronounced class imbalance between legitimate and fraudulent transactions.
The paper compares the effectiveness of traditional fraud detection methods applied to real transaction data with that of the proposed approach, which makes active use of synthetic data generated by diffusion models. The results show a significant improvement in how reliably the models detect fraud, highlighting the potential of diffusion models as a powerful tool for building more effective fraud detection systems.
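The abstract does not describe the model internals. As a hedged illustration only, the sketch below shows the standard Gaussian (DDPM-style) forward noising step applied to a single mixed-type transaction row. It assumes that numerical attributes are standardized and categorical attributes are one-hot encoded. The feature names (`amount_z`, `merchant_type`), the category list, the linear noise schedule, and all helper functions are hypothetical, and none of this is presented as the authors' method.

```python
import math
import random

# Hypothetical schema: one standardized numerical feature and one categorical
# feature. The paper's actual attributes and encoding are not specified.
MERCHANT_TYPES = ["grocery", "travel", "online"]


def encode_row(amount_z, merchant_type):
    """Encode a transaction as [standardized amount] + one-hot(merchant_type)."""
    one_hot = [1.0 if m == merchant_type else 0.0 for m in MERCHANT_TYPES]
    return [float(amount_z)] + one_hot


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t rising linearly from beta_start to beta_end (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a) x_0, (1 - a) I), with a = alpha_bar_t.

    Returns the noised vector and the Gaussian noise eps that a denoising
    network would be trained to predict.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Usage: noise one (hypothetical) fraudulent transaction at a few timesteps.
rng = random.Random(0)
betas = linear_beta_schedule(T=1000)
alpha_bars = cumulative_alphas(betas)
x0 = encode_row(amount_z=2.5, merchant_type="online")
for t in (0, 500, 999):
    xt, eps = q_sample(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 5), [round(v, 3) for v in xt])
```

A full generator would train a network to predict `eps` from `xt` and `t` and then run the learned reverse process to produce new rows. Conditioning that network on the class label is one way to sample additional fraudulent transactions for the minority class. Treating one-hot columns as continuous values is a simplification; discrete (for example, multinomial) diffusion is a common alternative for categorical attributes.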