Category:
Big data
Publication year:
2020
English title of the paper:
Dual incremental fuzzy schemes for frequent itemsets discovery in streaming numeric data
Persian translation of the paper title:
طرح های فازی افزایشی دوگانه برای کشف مکرر آیتم ها در جریان داده های عددی
Source:
ScienceDirect - Elsevier - Information Sciences, 514 (2020) 15-43, doi:10.1016/j.ins.2019.11.023
Authors:
Hui Zheng, Peng Li, Qing Liu, Jinjun Chen, Guangli Huang, Junfeng Wu, Yun Xue, Jing He
Abstract:
Discovering frequent itemsets is essential for finding association rules, yet it is too computationally expensive with existing algorithms. It is even more challenging to find frequent itemsets in streaming numeric data. The streaming characteristic means that the data cannot be scanned repeatedly. The numeric characteristic requires that the data be pre-processed into itemsets, e.g., fuzzy-set methods can transform numeric data into itemsets with non-integer membership values. This leads to the challenge that the frequencies of itemsets are usually not integers. To overcome such challenges, fast methods and stream processing methods have been applied. However, the existing algorithms usually either still need to re-visit some previous data multiple times, or cannot count non-integer frequencies. Algorithms that re-visit previous data have to sacrifice large amounts of memory to cache that data and avoid repetitive scanning. When dealing with today's big streaming data, such a memory requirement often goes beyond the capacity of many computers. Algorithms that cannot count non-integer frequencies would be very inaccurate in estimating the non-integer frequencies of frequent itemsets if used with an integer approximation of frequency counting.
To solve the aforementioned issues, in this paper we propose two incremental schemes for frequent itemsets discovery that are capable of working efficiently with streaming numeric data. In particular, they are able to count non-integer frequencies without re-visiting any previous data. The key to our schemes' efficiency is to extract essential statistics that occupy much less memory than the raw data do for the ongoing streaming data. This grants our schemes the advantages of 1) allowing non-integer counting and thus natural integration with a fuzzy-set discretization method to boost robustness and anti-noise capability for numeric data, 2) enabling the design of a decay ratio for different data distributions, which can be adapted for three general stream models: landmark, damped and sliding windows, and 3) achieving highly accurate fuzzy-itemsets discovery with efficient stream processing. Experimental studies demonstrate the efficiency and effectiveness of our dual schemes with both synthetic and real-world datasets.
Keywords: Incremental algorithm | Data stream mining | Frequent itemsets | Without re-visiting
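The abstract's core idea (non-integer counts kept as running statistics, aged by a decay ratio, with no second pass over the stream) can be illustrated with a small sketch. This is not the paper's algorithm: the triangular membership functions, the use of `min` to combine memberships, and every name below (`triangular_memberships`, `DecayedFuzzyCounter`, `centers`, `decay`, `max_size`) are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def triangular_memberships(x, centers):
    """Map a numeric value to fuzzy items: {fuzzy-set index: membership}.

    Assumption: `centers` is sorted with at least two entries, and the two
    neighbouring fuzzy sets share the value so their memberships sum to 1.
    """
    if x <= centers[0]:
        return {0: 1.0}
    if x >= centers[-1]:
        return {len(centers) - 1: 1.0}
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= x <= hi:
            right = (x - lo) / (hi - lo)
            pair = {i: 1.0 - right, i + 1: right}
            return {k: v for k, v in pair.items() if v > 0.0}


class DecayedFuzzyCounter:
    """Keeps only per-itemset fuzzy supports; raw records are never stored."""

    def __init__(self, centers, decay=1.0, max_size=2):
        self.centers = centers      # attribute -> sorted fuzzy-set centers
        self.decay = decay          # 1.0: landmark model, < 1.0: damped model
        self.max_size = max_size    # largest itemset size that is tracked
        self.support = defaultdict(float)
        self.weight = 0.0           # decayed number of records seen so far

    def update(self, record):
        """Consume one record (attribute -> numeric value) exactly once."""
        if self.decay != 1.0:
            for key in self.support:
                self.support[key] *= self.decay   # age the old statistics
        self.weight = self.weight * self.decay + 1.0

        items = []
        for attr, value in sorted(record.items()):
            fuzzy = triangular_memberships(value, self.centers[attr])
            for label, mu in fuzzy.items():
                items.append(((attr, label), mu))

        for size in range(1, self.max_size + 1):
            for combo in combinations(items, size):
                attrs = {item[0] for item, _ in combo}
                if len(attrs) < size:
                    continue  # skip itemsets with two fuzzy sets of one attribute
                # Assumption: itemset membership is the minimum over its items.
                mu = min(m for _, m in combo)
                self.support[tuple(item for item, _ in combo)] += mu

    def frequent(self, min_support):
        """Return itemsets whose decayed relative support reaches the threshold."""
        return {
            key: value / self.weight
            for key, value in self.support.items()
            if value / self.weight >= min_support
        }
```

With `decay=1.0` every record keeps full weight (landmark model); with `decay < 1.0` older records fade (damped model). A sliding window would also need to subtract the contributions of expired records, which this sketch leaves out.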
Price: Free