English title of the paper:
A solution to reconstruct cross-cut shredded text documents based on constrained seed K-means algorithm and ant colony algorithm
Persian translation of the paper title:
یک راه حل برای بازسازی اسناد متنی خرد شده برش خورده بر اساس الگوریتم بذر محدود K-means و الگوریتم کلونی مورچه ها
ScienceDirect - Elsevier - Expert Systems With Applications, 127 (2019) 35-46. doi:10.1016/j.eswa.2019.02.039
Junhua Chen, Miao Tian, Xingming Qi, Wenxing Wang, Youjun Liu
The reconstruction of cross-cut shredded text documents (RCCSTD) is an important problem in forensics and a real, complex and notable issue for information security and judicial investigations. It can be considered a special kind of greedy square jigsaw puzzle and has attracted the attention of many researchers. Clustering fragments into several rows is a crucial and difficult step in RCCSTD. However, existing approaches achieve low clustering accuracy. This paper therefore proposes a new clustering algorithm based on horizontal projection and a constrained seed K-means algorithm to improve the clustering accuracy. The constrained seed K-means algorithm draws upon expert knowledge and has the following characteristics: 1) the first fragment in each row is easy to distinguish, and the unidimensional signals extracted from the first fragment can be used as the initial clustering center; 2) two or more prior fragments cannot be clustered together. To improve the splicing accuracy within the rows, a penalty coefficient is added to a traditional cost function. Experiments were carried out on 10 text documents. According to our measurements, the accuracy of the clustering algorithm was 99.1% and the overall splicing accuracy was 91.0%. The algorithm was compared with two other approaches and was found to offer significantly improved clustering accuracy. In our experiments, our approach obtained the best results on the RCCSTD problem. Moreover, we attempted a more complex and realistic problem, the reconstruction of cross-cut shredded dual text documents (RCCSDTD). Satisfactory results were obtained for RCCSDTD in some cases; to the authors' best knowledge, our method is the first feasible approach for the RCCSDTD problem. In addition, the developed system is fundamentally an expert system applied specifically to solving RCCSTD problems.
Keywords: Reconstruction of cross-cut shredded documents (RCCSTD) | Constrained seed K-means algorithm | Horizontal projection | Penalty coefficient | Ant colony algorithm
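The row-clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the binary-image representation (1 = ink), the squared Euclidean distance, and the function names `horizontal_projection` and `constrained_seed_kmeans` are all assumptions made for illustration. The sketch covers only the two stated characteristics: the first fragment of each row supplies the initial cluster center, and two such prior fragments are never placed in the same cluster. The penalty-coefficient cost function and the ant colony splicing step are not shown.

```python
def horizontal_projection(fragment):
    """Return the 1-D signal of a fragment: ink pixels counted per pixel row.

    Assumption: `fragment` is a list of pixel rows holding 0 (blank) or 1 (ink).
    """
    return [sum(row) for row in fragment]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length signals (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_seed_kmeans(signals, seed_indices, max_iter=50):
    """Cluster projection signals into rows, one cluster per seed fragment.

    signals      -- list of equal-length 1-D signals, one per fragment
    seed_indices -- indices of the known first fragment of each row
    Returns a list with the cluster label of every fragment.
    """
    k = len(seed_indices)
    # Each seed is pinned to its own cluster, so no two seeds can ever share one.
    seed_label = {idx: c for c, idx in enumerate(seed_indices)}
    # The seed signals serve as the initial cluster centers.
    centers = [list(signals[idx]) for idx in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for i, signal in enumerate(signals):
            if i in seed_label:
                new_labels.append(seed_label[i])
            else:
                nearest = min(
                    range(k),
                    key=lambda c: squared_distance(signal, centers[c]),
                )
                new_labels.append(nearest)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the mean of its members. A cluster is never
        # empty because its seed always belongs to it.
        for c in range(k):
            members = [signals[i] for i, lab in enumerate(labels) if lab == c]
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy signals: fragments 0 and 1 are the known first fragments of two rows.
    toy_signals = [
        [0, 3, 2, 0, 0, 0],
        [0, 0, 0, 3, 2, 0],
        [0, 2, 3, 0, 0, 0],
        [0, 0, 0, 2, 3, 0],
        [0, 3, 3, 0, 0, 0],
    ]
    print(constrained_seed_kmeans(toy_signals, seed_indices=[0, 1]))
```

In this toy input, fragments whose ink sits at the same vertical positions as a seed end up in that seed's row. Pinning the seeds is one simple way to enforce the cannot-link constraint; the paper may enforce it differently.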