Category:
Big data
Publication year:
2018
English title of the article:
Principal Components Analysis Random Discretization Ensemble for Big Data
Persian translation of the title:
تحلیل مولفه های اصلی تصادفی گروه گسسته برای داده های بزرگ
Source:
ScienceDirect - Elsevier - Knowledge-Based Systems, Corrected proof: doi:10.1016/j.knosys.2018.03.012
Authors:
Diego García-Gil, Sergio Ramírez-Gallego, Salvador García, Francisco Herrera
Abstract:
Humongous amounts of data have created a lot of challenges in terms of data computation and analysis. Classic data mining techniques are not prepared for the new space and time requirements. Discretization and dimensionality reduction are two of the data reduction tasks in knowledge discovery. Random Projection Random Discretization is a novel and recently proposed ensemble method by Ahmad and Brown in 2014 that performs discretization and dimensionality reduction to create more informative data. Despite the good efficiency of random projections in dimensionality reduction, more robust methods like Principal Components Analysis (PCA) can improve the performance.
We propose a new ensemble method to overcome this drawback using the Apache Spark platform and PCA for dimension reduction, named Principal Components Analysis Random Discretization Ensemble. Experimental results on five large-scale datasets show that our solution outperforms both the original algorithm and Random Forest in terms of prediction performance. Results also show that high dimensionality data can affect the runtime of the algorithm.
Keywords: Big Data, Discretization, Spark, Decision tree, PCA, Data reduction
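The abstract names random discretization as one building block of the ensemble but does not describe it. As a rough illustration only, the sketch below assumes that each feature's cut points are chosen by sampling values observed in the training data, so that every ensemble member sees a different discretized view. This is an assumption about the general idea, not the authors' Spark implementation; all function names are hypothetical, and the PCA and decision tree steps are left out.

```python
import bisect
import random


def random_cut_points(column, n_bins, rng):
    """Pick up to n_bins - 1 cut points by sampling distinct observed values."""
    distinct = sorted(set(column))
    k = min(n_bins - 1, len(distinct))
    return sorted(rng.sample(distinct, k))


def discretize(column, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect.bisect_right(cuts, value) for value in column]


def random_discretization(rows, n_bins, seed):
    """Discretize every feature of `rows` using its own random cut points."""
    rng = random.Random(seed)
    columns = list(zip(*rows))  # row-major -> column-major
    binned = [
        discretize(col, random_cut_points(col, n_bins, rng)) for col in columns
    ]
    return [list(row) for row in zip(*binned)]  # back to row-major


# Toy data: four instances, two numeric features (made up for illustration).
rows = [[0.1, 5.0], [0.4, 3.0], [0.9, 8.0], [0.7, 1.0]]

# One discretized view per hypothetical ensemble member, varied by seed.
views = [random_discretization(rows, n_bins=3, seed=s) for s in range(3)]
```

In an ensemble of this kind, each view would presumably be combined with the reduced-dimension features and passed to its own decision tree; the randomness in the cut points is what gives the members their diversity.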