English title of the article:
Distributed mining of high utility time interval sequential patterns using mapreduce approach
Persian translation of the article title:
کاوش توزیع شده الگوهای پی در پی فاصله زمانی ابزار مطلوب با استفاده از روش Mapreduce
ScienceDirect - Elsevier - Expert Systems With Applications, 141 (2020) 112967. doi:10.1016/j.eswa.2019.112967
Saleti Sumalatha ∗, R.B.V. Subramanyam
High Utility Sequential Pattern (HUSP) mining algorithms aim to find all high utility sequences in a sequence database. Owing to the rapid growth of data, a few distributed algorithms based on the MapReduce framework have recently been designed for mining HUSPs. However, existing HUSP algorithms such as USpan, HUS-Span and BigHUSP predict only the order of items; they do not predict the time between items, that is, they do not include the time intervals between successive items. In real-world scenarios, time interval patterns provide more valuable information than conventional high utility sequential patterns. Therefore, we propose a distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using the MapReduce approach that is suitable for big data. DHUTISP creates a novel time interval utility linked list (TIUL) data structure to efficiently calculate the utility of the resulting patterns. Moreover, two utility upper bounds, namely the remaining utility upper bound (RUUB) and the co-occurrence utility upper bound (CUUB), are proposed to prune unpromising candidates. We conducted various experiments to demonstrate the efficiency of the proposed algorithm relative to both distributed and non-distributed approaches. The experimental results show that DHUTISP is more efficient than the state-of-the-art algorithms BigHUSP, AHUS-P, PUSOM and UTMining_A.
Keywords: Big data | High utility itemset mining | High utility sequential pattern mining | Time interval sequential pattern mining | MapReduce framework
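
To make the notion of a high utility time interval sequential pattern in the abstract more concrete, the following is a minimal single-machine sketch in Python. It is not the DHUTISP algorithm and does not implement TIUL, RUUB, CUUB, or any MapReduce stage. Every name in it (`PROFIT`, `pattern_utility`, `database_utility`, the toy database `DB`) is hypothetical, and it rests on assumptions that the abstract does not state: each event holds a single item, the utility of an event is quantity times unit profit, a pattern's utility in one sequence is the maximum over its occurrences, and a time interval is a closed range of allowed gaps between successive items.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str, int]        # (timestamp, item, quantity), sorted by timestamp
Step = Tuple[str, Tuple[int, int]]  # (item, (min_gap, max_gap)) measured from the previous item

# Hypothetical unit profits (external utilities) per item.
PROFIT = {"a": 5, "b": 2, "c": 1}


def pattern_utility(seq: List[Event], first_item: str, steps: List[Step]) -> int:
    """Maximum utility over all occurrences of the pattern in one sequence (0 if none)."""

    def extend(prev_time: int, step_idx: int, start: int) -> Optional[int]:
        # All steps matched: nothing more to add.
        if step_idx == len(steps):
            return 0
        item, (lo, hi) = steps[step_idx]
        best: Optional[int] = None
        for i in range(start, len(seq)):
            t, it, qty = seq[i]
            # The next item must match and fall inside the allowed time gap.
            if it == item and lo <= t - prev_time <= hi:
                rest = extend(t, step_idx + 1, i + 1)
                if rest is not None:
                    total = qty * PROFIT[it] + rest
                    best = total if best is None else max(best, total)
        return best  # None means the pattern cannot be completed from here

    best = 0
    for i, (t, it, qty) in enumerate(seq):
        if it == first_item:
            rest = extend(t, 0, i + 1)
            if rest is not None:
                best = max(best, qty * PROFIT[it] + rest)
    return best


def database_utility(db: List[List[Event]], first_item: str, steps: List[Step]) -> int:
    """Sum of the per-sequence pattern utilities over the whole database."""
    return sum(pattern_utility(seq, first_item, steps) for seq in db)


# Toy database, invented for illustration only.
DB = [
    [(0, "a", 1), (2, "b", 3), (5, "b", 1), (6, "c", 2)],
    [(0, "a", 2), (1, "c", 1), (9, "b", 4)],
    [(0, "b", 2), (3, "a", 1), (4, "b", 5)],
]

if __name__ == "__main__":
    min_util = 26  # hypothetical threshold
    short_gap = database_utility(DB, "a", [("b", (1, 3))])
    long_gap = database_utility(DB, "a", [("b", (4, 10))])
    print(short_gap, short_gap >= min_util)
    print(long_gap, long_gap >= min_util)
```

In this toy data, the same item order "a then b" yields different utilities depending on the time gap allowed between the two items, which is the extra information that an order-only sequential pattern would not capture.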