Density-based Outlier Detection in Multi-dimensional Datasets

Vol. 16, No. 12, December 31, 2022
10.3837/tiis.2022.12.002


Density-based outlier detection is an active research topic in data mining. A point is judged to be an outlier on the basis of the density of the points near it. Existing density-based detection algorithms have high time complexity. To reduce it, a new outlier detection algorithm, DODMD (Density-based Outlier Detection in Multidimensional Datasets), is proposed. First, on the basis of the ZH-tree, the concept of a micro-cluster is introduced. Each leaf node is regarded as a micro-cluster, and computations are carried out at the micro-cluster level so that points can be filtered in batches. To obtain n sets of approximate outliers quickly, a greedy method is used to calculate the boundary of LOF, and the minimum value is marked as $LOF_{min}$. Second, the outliers can be filtered out using $LOF_{min}$, the real outliers are calculated, and the result set is then updated to tighten the boundary. Finally, the accuracy and efficiency of the DODMD algorithm are verified on real and synthetic datasets.
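As a rough illustration of the quantities named above, the following is a minimal brute-force sketch of the standard Local Outlier Factor (LOF) score and of a top-n result set whose smallest score acts as a running threshold. It is not the DODMD algorithm: it uses no ZH-tree, micro-clusters, or greedy bound estimation, and it computes every score exactly instead of pruning. The function names (`knn`, `lof_scores`, `top_n_outliers`), the neighbourhood size `k`, and the reading of `lof_min` as the smallest score in the current top-n set are assumptions made for illustration.

```python
import heapq
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = [(math.dist(points[i], q), j) for j, q in enumerate(points) if j != i]
    return heapq.nsmallest(k, dists)


def lof_scores(points, k):
    """Brute-force LOF for every point (assumes no point has k exact duplicates)."""
    neighbours = [knn(points, i, k) for i in range(len(points))]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for nb in neighbours:
        reach = [max(k_dist[j], d) for d, j in nb]
        lrd.append(len(nb) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
        for i, nb in enumerate(neighbours)
    ]


def top_n_outliers(points, k, n):
    """Keep the n highest LOF scores; the heap minimum is the running threshold."""
    heap = []  # min-heap of (score, index)
    for i, score in enumerate(lof_scores(points, k)):
        if len(heap) < n:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            # Only a score above the current minimum can change the result set,
            # and replacing the minimum raises the threshold.
            heapq.heapreplace(heap, (score, i))
    lof_min = heap[0][0]
    return sorted(heap, reverse=True), lof_min


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
top, lof_min = top_n_outliers(points, k=2, n=1)
print(top, lof_min)
```

In this sketch the threshold only shows why a tighter boundary helps: any point whose score cannot exceed `lof_min` could be skipped. A pruning method would need a cheap upper bound on each point's LOF to skip it without computing the exact score, which this example does not attempt.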


