Title: Synthetic Data Generation for Fraud Detection Using Diffusion Models
Authors: Yurii Pushkarenko, Volodymyr Zaslavskyi
Year: 2024
Volume: 55
Pages: 185-198
Keywords: banking transactions; Fraud Detection; diffusion models; synthetic dataset generation
Abstract:

Detecting fraudulent credit card transactions in payment and banking systems is a significant challenge, primarily because access to the real-world data needed to train models and to develop algorithms that accurately analyze transaction streams is limited. Real data on the contractual relationships between financial systems and their clients is confidential, which affects both what is recorded in transaction data and how transaction flows can be analyzed to identify fraudulent activity.

This paper explores the use of diffusion models to generate realistic synthetic transaction data, with the aim of improving the performance of fraud detection algorithms. Particular emphasis is placed on datasets that mix categorical (textual) and numerical attributes and exhibit a pronounced class imbalance between legitimate and fraudulent transactions.
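To make the idea of diffusing mixed-type transaction rows concrete, the following is a minimal sketch of the standard Gaussian forward (noising) step applied to one encoded transaction. It is an illustration under stated assumptions, not the method of the paper: the linear noise schedule, the one-hot encoding of the categorical attribute, and all names (`encode_row`, `forward_noise`, the example categories) are hypothetical. Dedicated tabular diffusion methods often handle categorical attributes with a discrete noising process instead of Gaussian noise on one-hot vectors.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule beta_1..beta_T (a common default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def encode_row(amount_z, category, categories):
    """Hypothetical encoding: standardized amount followed by a one-hot category."""
    one_hot = [1.0 if c == category else 0.0 for c in categories]
    return [amount_z] + one_hot


def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    categories = ["grocery", "travel", "online"]  # illustrative values only
    x0 = encode_row(amount_z=1.3, category="travel", categories=categories)
    alpha_bars = cumulative_alpha_bar(linear_beta_schedule(1000))
    for t in (0, 499, 999):
        print(t, forward_noise(x0, alpha_bars[t], rng))
```

A generative model would be trained to reverse this noising process; sampling from it and decoding the one-hot block back to a category would then yield synthetic rows, for example additional rows of the rare fraudulent class.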

The paper compares traditional fraud detection methods applied to real transaction data with the proposed approach, which uses synthetic data generated by diffusion models. The results show a significant improvement in how reliably the models detect fraud, highlighting the potential of diffusion models as a tool for building more effective fraud detection systems.