Money Laundering Detection Based on Improved Minimum Spanning Tree Clustering and Its Application


Research on Money Laundering Detection Based on Improved Minimum Spanning Tree Clustering and Its Application
Xingqi Wang, Guang Dong
School of Computer Science
Hangzhou Dianzi University (HDU)
Hangzhou, P. R. China
e-mail:  xqwang@
Abstract—To detect suspicious money laundering transactions in real-world financial applications, a new dissimilarity metric is proposed and a novel money laundering detection algorithm based on improved minimum spanning tree clustering is put forward in this paper. A suspicious money laundering transaction detection experiment on a real-world financial data set indicates that our algorithm is effective and succinct.
Keywords-money laundering; minimum spanning tree; outliers; clustering analysis
1. INTRODUCTION
Detection of money laundering transactions is an important branch of financial data mining and an application of anomaly data mining in the financial field. As with other anomaly data mining algorithms [1, 2], the effectiveness of a money laundering transaction detection algorithm cannot be separated from a complete and valid analysis of the data sets and the selection of a similarity metric.
There have been extensive research efforts in the detection of money laundering transactions. Existing methods mostly focus on introducing and applying traditional data mining technology to the financial data field. Liu and Zhang [3] established a system architecture based on agent technology for financial supervision. Wang and Yang [4] introduced decision trees into financial transaction detection and evaluation. Lv et al. [5] utilized neural network technology to fulfill the anti-money laundering detection task. Tang and Yin [6] presented an intelligent data discriminating system for anti-money laundering on the basis of SVM (Support Vector Machine). Liu et al. [7] adopted sequence matching to detect suspicious money laundering transactions. Gao et al. [8] proposed a novel multi-agent architecture for anti-money laundering, which can integrate existing classical data mining technologies.
The existing methods have some drawbacks: (1) Most existing approaches have high complexity, and their efficiency needs to be improved. (2) Most existing methods fail to run without suitably tuned parameters. Because of this situation, clustering and outlier analysis are becoming a new hot research topic in various data mining fields. Zhang and Zhao [9] began to make use of CURE clustering to realize effective detection of suspicious financial transactions.
Through analyzing existing money laundering transaction detection methods [3-9], and in view of existing high dimensional data mining theory and techniques [10], a new dissimilarity metric is proposed and a novel money laundering transaction detection algorithm is put forward based on an improved minimum spanning tree clustering algorithm. Experimental results on a financial data set show that, because it takes account of the characteristics of financial data sets and of money laundering transaction detection, the new algorithm is simple and can detect suspicious money laundering transactions effectively and efficiently.
The remainder of the paper is organized as follows. Section 2 details our new algorithm and describes a new dissimilarity metric based on high dimensional data clustering analysis. Section 3 discusses and evaluates the experimental results of our method on data sets from the real-world financial field. We conclude and summarize the findings in Section 4.
2. THE PROPOSED ALGORITHM
In this part, we present our algorithm to detect suspicious money laundering transactions. Firstly, the new dissimilarity metric is provided. Secondly, the main ideas and basic process of the new algorithm are detailed.
2.1 Dissimilarity Metric
Considering the data sets obtained from real-world applications, based on an analysis of similarity measures and distance metrics between points in data space [1-2, 10], and taking the characteristics of money laundering detection into account, we design the following similarity metric function.
Let X = (x1, x2, ..., xd) and Y = (y1, y2, ..., yd) be two points in d-dimensional space. The following similarity metric function is defined.
$$ sim(X, Y) = \frac{1}{d} \sum_{i=1}^{d} \frac{1}{1 + |x_i - y_i|} \tag{1} $$
Define the dissimilarity metric function as
$$ dsim(X, Y) = 1 - sim(X, Y) \tag{2} $$
The dissimilarity metric function defined above has the following properties:
(1) The greater the value of the function, the greater the difference between the two data points.
(2) The function has the minimum value 0 when the similarity metric equals 1. A dissimilarity of 0 means that X and Y are equal in every dimension, so the difference between the two data points is also smallest.
(3) The function approaches its maximum value 1 when the difference between X and Y in each dimension tends to infinity; in that case the similarity metric of the two data points tends to 0 and the dissimilarity metric tends to 1.
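The two functions and the three properties above can be checked with a few lines of Python. This is a minimal sketch, assuming Eq. (1) is the mean over the d dimensions of 1 / (1 + |x_i - y_i|), which is the reading consistent with the listed properties. The function names `sim` and `dsim` mirror the paper's symbols; the sample points are made up for illustration.

```python
from typing import Sequence


def sim(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed Eq. (1): mean over dimensions of 1 / (1 + |x_i - y_i|)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("points must be non-empty and of equal dimension")
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def dsim(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (2): dissimilarity is one minus similarity."""
    return 1.0 - sim(x, y)


# Illustrative points (not from the paper's data set).
p = [1.0, 2.0, 3.0]
same = dsim(p, p)                    # identical points
near = dsim(p, [2.0, 2.0, 3.0])      # one coordinate differs by 1
far = dsim(p, [1e9, -1e9, 1e9])      # every coordinate differs hugely
print(same, near, far)
```

Identical points give the minimum, and the value grows toward (but never reaches) 1 as the per-dimension differences grow.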
2009 Second International Symposium on Knowledge Acquisition and Modeling
The dissimilarity metric defined above has the following advantages. First of all, it can effectively reflect the degree of difference between two points in data space. Secondly, compared with traditional distance metrics, the new function is dominated by the dimensions of X and Y with smaller differences, so it is robust to noise data (the value of noise data on each dimension tends to deviate significantly from the majority of normal data values). In addition, the closer the values of two points on each dimension, and the more dimensions on which their values are close, the greater the similarity metric and hence the smaller the dissimilarity metric, and vice versa, which is consistent with human intuition.
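The robustness claim can be made concrete with a short bound. The derivation below is a sketch that assumes Eq. (1) is the per-dimension mean of 1 / (1 + |x_i - y_i|); the numeric example is illustrative and not taken from the paper.

```latex
dsim(X, Y)
  = \frac{1}{d} \sum_{i=1}^{d} \left( 1 - \frac{1}{1 + |x_i - y_i|} \right)
  = \frac{1}{d} \sum_{i=1}^{d} \frac{|x_i - y_i|}{1 + |x_i - y_i|},
\qquad
0 \le \frac{|x_i - y_i|}{1 + |x_i - y_i|} < 1 .
```

Each dimension therefore contributes less than 1/d to the dissimilarity, however extreme its value. For example, with d = 4, if X and Y agree on three dimensions and differ by 1000 on the fourth, then dsim(X, Y) = (1/4)(1000/1001) ≈ 0.2498, whereas the Euclidean distance between the same two points is 1000.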
2.2 Improved Minimum Spanning Tree Clustering Algorithm (iMST)
Here we improve the traditional MST algorithm. The improved MST algorithm (iMST) is described as follows. Firstly, an element a is randomly selected from the data set (usually the first), and with a as the root, all elements are partitioned into three sets: the upper data set, the lower data set and the middle data set. The upper data set includes all elements whose value in each dimension is less than or equal to that of the root element. The lower data set includes all elements whose value in each dimension is greater than or equal to that of the root. The middle data set includes all elements which do not belong to either of the above two sets. Secondly, the upper, lower and middle data sets are set as a's left sub-tree, right sub-tree and middle sub-tree respectively; then, for the left and right sub-trees, the element with the shortest distance to a is selected from the corresponding sub-tree and made the root of that sub-tree. The other data sets and corresponding sub-trees are processed recursively in the same way until only the middle data sets and corresponding sub-tree nodes are left. Finally, the nodes' corresponding middle sets are connected and adjusted using the rules described in [11] so that they become nodes of the minimum spanning tree. To meet practical needs, by removing the longest K-1 edges, the MST can be partitioned into K sub-trees, and the elements of each sub-tree form an initial cluster. For each initial cluster we can find a center point. Each point is then assigned to a target cluster according to the distance between the point and the center point of each cluster. In this way the data set is finally clustered.
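The last stage of the description, cutting the K-1 longest edges of a minimum spanning tree to obtain K initial clusters, can be sketched as follows. This is not the iMST construction itself: the tree is built here with ordinary Prim's algorithm as a stand-in, the edge weight is assumed to be the dissimilarity of Section 2.1 (taken as 1 minus the mean of 1 / (1 + |x_i - y_i|)), and the later reassignment of points to cluster centers is omitted. The names `prim_mst` and `mst_clusters` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dsim(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed dissimilarity: 1 - mean_i 1 / (1 + |x_i - y_i|)."""
    return 1.0 - sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def prim_mst(points: List[Sequence[float]]) -> List[Tuple[float, int, int]]:
    """Plain O(n^2) Prim's algorithm; returns (weight, u, v) tree edges."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                w = dsim(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges


def mst_clusters(points: List[Sequence[float]], k: int) -> List[int]:
    """Drop the k-1 longest MST edges and label the connected components."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    edges = sorted(prim_mst(points))            # ascending by weight
    kept = edges[: len(edges) - (k - 1)]        # discard the k-1 longest
    root = list(range(n))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]             # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(n)]


# Illustrative data: two tight groups and one isolated point.
pts = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [50, 50]]
print(mst_clusters(pts, 3))
```

Because the tree has n-1 edges, removing K-1 of them always leaves exactly K connected components, which is why K is the only clustering parameter.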
Since the iMST algorithm reduces distance comparisons between points as far as possible by partitioning all data into upper, lower and middle data sets and making comparisons based on this division, its efficiency is better than that of the traditional algorithm.
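The three-way partition around a root element can be sketched as below. This is a hedged illustration of the partition step only, not of the recursive tree construction or the adjustment rules of [11]. It assumes the comparisons are made dimension by dimension, and that an element equal to the root in every dimension (which satisfies both conditions) goes to the upper set; the function name `partition_around_root` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def partition_around_root(
    root: Point, others: List[Point]
) -> Tuple[List[Point], List[Point], List[Point]]:
    """Split `others` into (upper, lower, middle) relative to `root`.

    upper:  every coordinate <= the root's coordinate
    lower:  every coordinate >= the root's coordinate
    middle: everything else (mixed comparisons)
    """
    upper, lower, middle = [], [], []
    for p in others:
        if all(a <= r for a, r in zip(p, root)):
            upper.append(p)       # ties with the root land here (assumption)
        elif all(a >= r for a, r in zip(p, root)):
            lower.append(p)
        else:
            middle.append(p)
    return upper, lower, middle


# Illustrative 2-D example.
up, low, mid = partition_around_root(
    (5, 5), [(1, 2), (5, 5), (7, 9), (3, 8), (6, 4)]
)
print(up, low, mid)
```

Each element is classified with one pass over its coordinates, with no pairwise distance computed, which is the source of the savings described above.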
For iMST, the time complexity is O(tn), where n is the number of elements in the given data set, t is the number of middle data sets in the constructed minimum spanning tree, and t << n. From this complexity, we can conclude that iMST is better than traditional MST algorithms.
2.3 Money Laundering Transaction Detection Algorithm Based on iMST Clustering
The basic idea of the money laundering transaction detection algorithm based on iMST is to take the dissimilarity metric introduced in Section 2.1 as the distance measure and construct a minimum spanning tree using the iMST algorithm, with parameter K set in advance according to field knowledge and prior experience. Anomaly clusters are then found using the average of all clusters' dissimilarity metrics as the threshold. All elements included in anomaly clusters are outliers, i.e. suspicious money laundering transactions.
The following is the Money Laundering transaction Detection Algorithm Based on improved Minimum Spanning Tree clustering (MLDABiMST):
Input: data set U, clustering parameter K
Output: suspicious money laundering transactions
(1) Preprocess the data set: select relevant fields and standardize the values of all records in the given data set according to the application area.
(2) Call the improved Minimum Spanning Tree clustering algorithm (iMST) to generate the corresponding clusters, iMST (U, K).
(3) Calculate the difference matrix between clusters, and then generate the anomaly clusters.
(4) Sort the dissimilarity values for each cluster in descending order (for all points included in each cluster, output them in descending order by their own dissimilarity metric with other points in the same cluster).
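Steps (3) and (4) can be sketched as follows, given cluster labels from any clustering step. The text does not spell out how a cluster's dissimilarity is computed, so this sketch makes several assumptions that should be read as one possible interpretation: each cluster is represented by the mean of its points, the difference matrix holds the dissimilarity between cluster centers, a cluster's score is its average dissimilarity to the other centers, and a cluster is flagged when its score exceeds the average score. The dissimilarity is assumed to be 1 minus the mean of 1 / (1 + |x_i - y_i|), and the name `flag_and_rank` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def dsim(x, y):
    """Assumed dissimilarity: 1 - mean_i 1 / (1 + |x_i - y_i|)."""
    return 1.0 - sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def flag_and_rank(points, labels):
    """Return {anomaly cluster label: point indices, most dissimilar first}."""
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)

    # Cluster centers as coordinate-wise means (assumption).
    centers = {
        lab: [mean(col) for col in zip(*(points[i] for i in idxs))]
        for lab, idxs in groups.items()
    }

    # Step (3): difference matrix between cluster centers.
    diff = {a: {b: dsim(centers[a], centers[b]) for b in centers} for a in centers}

    def cluster_score(a):
        others = [diff[a][b] for b in centers if b != a]
        return mean(others) if others else 0.0

    scores = {a: cluster_score(a) for a in centers}
    threshold = mean(scores.values())   # average over all clusters

    def point_score(i, members):
        others = [dsim(points[i], points[j]) for j in members if j != i]
        return mean(others) if others else 0.0

    # Step (4): inside each flagged cluster, sort points in descending order
    # of their average dissimilarity to the other members of the cluster.
    return {
        a: sorted(groups[a], key=lambda i, m=groups[a]: point_score(i, m), reverse=True)
        for a in centers
        if scores[a] > threshold
    }


# Illustrative data: two nearby clusters and one distant cluster.
pts = [[0, 0], [0.2, 0.1], [0.1, 0.3], [1, 1], [1.2, 0.9], [40, 40], [41, 45]]
print(flag_and_rank(pts, [0, 0, 0, 1, 1, 2, 2]))
```

Taking labels as input keeps the sketch independent of how the clusters were produced, so it can be paired with iMST or with any other clustering routine.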
3. EXPERIMENT AND ANALYSIS
The experimental data set comes from real-world bank applications. It had 70 fields and 64941 records; after preprocessing, 29 fields and 65001 records were left. According to prior knowledge [12], 60 artificially created money-laundering records were added to the original data set, about one-thousandth of the original data set size. Our program, coded in Microsoft Visual C++ 6.0, was run on the data sets above. Experimental results are listed in Table 1 and further analysis is shown in Figure 1 and Figure 2, in which "Number of covering clusters" is the number of clusters required to include all anomaly points and "Proportion of anomaly points" is the average proportion of outliers over all anomaly clusters.
TABLE I. RESULT OF ANOMALY CLUSTERING FOR NEW ALGORITHM
The value of K | Number of covering clusters | Proportion of anomaly points
5 | 2 | 22%
10 | 4 | 67%
15 | 4 | 72%
20 | 5 | 61%
Figure 1. Number of covering clusters (horizontal axis: parameter K; vertical axis: number of covering clusters)
Figure 2. Proportion of anomaly points
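The two measures reported in Table I can be computed from cluster labels and known anomaly flags as sketched below. The interpretation is an assumption: "covering clusters" are taken to be the clusters that contain at least one known anomaly, and the proportion is the fraction of anomalies in each such cluster, averaged over those clusters. The function name `covering_stats` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def covering_stats(labels: Sequence[int], is_anomaly: Sequence[bool]) -> Tuple[int, float]:
    """Return (number of covering clusters, mean anomaly proportion in them)."""
    size = Counter(labels)                                   # points per cluster
    hits = Counter(lab for lab, flag in zip(labels, is_anomaly) if flag)
    if not hits:
        return 0, 0.0
    proportion = sum(hits[c] / size[c] for c in hits) / len(hits)
    return len(hits), proportion


# Illustrative labels: cluster 1 is all anomalies, cluster 2 has one in four.
labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
flags = [False, False, False, True, True, False, True, False, False]
print(covering_stats(labels, flags))
```

Under this reading, a good value of K keeps the first number small while pushing the second toward 100%, since the anomalies are then concentrated in a few nearly pure clusters.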
According to Table 1, Figure 1 and Figure 2, the new algorithm can effectively detect suspicious money laundering transactions and is able to sort outliers by anomaly degree. The dissimilarity metric is effective and consistent with financial prior knowledge and real-world money laundering. As the only input parameter of the new algorithm, K has a certain impact on the clustering result. It can be seen from the two figures above that the best clustering result on the experimental data is obtained with K = 15, where the number of clusters required to cover all anomaly points tends to be steady and the proportion of anomaly points reaches its highest value. That is to say, under these circumstances the target clusters have been found successfully: they include as many anomaly points as possible and have a better clustering effect.
4. CONCLUSIONS
In this paper, according to the characteristics of financial data sets and of money laundering transaction detection, we defined a novel dissimilarity metric to measure the degree of difference of outliers and then designed a new money laundering detection algorithm based on an improved minimum spanning tree algorithm. Experiments on real-world applications show that the new algorithm is simple and is not affected by noise data. Detection of suspicious money laundering transactions is complete and effective. Additionally, the dependence on parameters is decreased.
In the future, the following three aspects need to be carried out. First, we will do more experiments to test our algorithm and find relations between parameter K and feature dimension to make it suitable for more applications. Secondly, we should also focus on how to set parameter K more effectively, on incremental improvement of the algorithm, and on integration with other classical data mining algorithms such as classification algorithms. Finally, we will integrate our algorithm into a real-world financial data warehouse system to help financial institutions raise alarms about possible money laundering and minimize their losses.
ACKNOWLEDGMENT
We would like to acknowledge the support of the Scientific Research Fund of Zhejiang Provincial Education Department, China, under Grant No. 20040458. We would also like to thank M. Dong from financial bank, China for her valuable comments and evaluations.
REFERENCES
[1] D. Hawkins, Identification of outliers. London: Chapman and Hall, 1980, pp. 56-88.
[2] J. Xi, "Outlier Detection Algorithms in Data Mining," Proc. Second Intelligent Information Technology Application National Symposium (IITA 2008), Vol. 1, 2008, pp. 94-97.
[3] X. Liu, P. Zhang, "An Agent based Anti-money Laundering System Architecture for Financial Supervision," Proc. International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM 2007), 2007, pp. 5467-5470.
[4] S. Wang, J. Yang, "A Money Laundering Risk Evaluation Method Based On Decision Tree," Proc. Sixth International Conference on Machine Learning and Cybernetics (ICMLC 2007), Aug. 2007, pp. 283-286.
[5] L. Lv, N. Ji, J. Zhang, "A RBF Neural Network Model for Anti-money Laundering," Proc. 2008 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2008), 2008, pp. 209-215.
[6] J. Tang, J. Yin, "Developing an Intelligent Data Discriminating System of Anti-money Laundering based on SVM," Proc. International Conference on Machine Learning and Cybernetics (ICMLC 2005), 2005, pp. 3453-3457.
[7] X. Liu, P. Zhang, D. Zeng, "Sequence Matching for Suspicious Activity Detection in Anti-money Laundering," Proc. Intelligence and Security Informatics - IEEE ISI 2008 International Workshops: PAISI, PACCF, and SOCO 2008, 2008, pp. 50-61.
[8] S. Gao, D. Xu, H. Wang, Y. Wang, "Intelligent Anti-money Laundering System," Proc. IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI 2006), 2006, pp. 851-856.
[9] C. Zhang, X. Zhao, "Research on Suspicious Financial Transactions Information Search based on CURE Algorithm," Journal of Information, Jun. 2008, pp. 52-54.
[10] F. Yang, Research on technologies for high dimensional data mining, (in Chinese). Nanjing: Southeast University Press, 2007, pp. 26-33.
[11] Z. Xie, L. Yu, J. Yang, "A Clustering Algorithm Based on Improved Minimum Spanning Tree," Proc. Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), 2007, pp. 396-400.
[12] The People's Bank of China, Administrative Rules for the Reporting of Large-Value and Suspicious Payment Transactions (in Chinese), v/detail.asp?col=1510&ID=75, 2006.
