Download and view articles related to high dimensionality :: Page 1
Search results - high dimensionality

Number of articles found: 11
No. Title Type
1 Default prediction in P2P lending from high-dimensional data based on machine learning (2019)
In recent years, a new Internet-based unsecured credit model, peer-to-peer (P2P) lending, is flourishing and has become a successful complement to the traditional credit business. However, credit risk remains inevitable. A key challenge is creating a default prediction model that can effectively and accurately predict the default probability of each loan for a P2P lending platform. Due to the characteristics of P2P lending credit data, such as high dimension and class imbalance, conventional statistical models and machine learning algorithms cannot effectively and accurately predict default probability. To address this issue, a decision tree model-based heterogeneous ensemble default prediction model is proposed in this paper for accurate prediction of customer default in P2P lending. Gradient boosting decision trees (GBDT), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) are employed as individual classifiers to create a heterogeneous ensemble learning-based default prediction model. Learning model-based feature ranking is applied to P2P lending credit data, and individual classifiers undergo hyperparameter optimization. Finally, comparison with benchmark models shows that the prediction model can achieve desirable prediction results and thus effectively solve the challenge of predictions based on high-dimensional and imbalanced data.
Keywords: Default prediction | High-dimensional data | Imbalanced data | Machine learning | P2P lending
English article
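The heterogeneous ensemble described above combines the default probabilities of several base classifiers. A minimal sketch of that soft-voting idea, with placeholder callables standing in for the paper's trained GBDT, XGBoost and LightGBM models:

```python
# Minimal sketch of a heterogeneous soft-voting ensemble: each base model
# returns a default probability and the ensemble averages them. The base
# models below are hypothetical placeholders, NOT the paper's actual
# GBDT/XGBoost/LightGBM classifiers.

def ensemble_default_probability(models, loan_features):
    """Average the default probabilities predicted by each base model."""
    probs = [model(loan_features) for model in models]
    return sum(probs) / len(probs)

# Hypothetical base models standing in for GBDT, XGBoost and LightGBM.
gbdt_like = lambda x: 0.30
xgb_like = lambda x: 0.20
lgbm_like = lambda x: 0.25

p = ensemble_default_probability([gbdt_like, xgb_like, lgbm_like],
                                 loan_features=[1.0, 0.5])
print(round(p, 2))  # prints 0.25, the averaged default probability
```

Averaging probabilities rather than hard votes keeps the ensemble output usable as a ranked default-risk score, which matters for imbalanced credit data.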
2 A deep learning solution approach for high-dimensional random differential equations (2019)
Developing efficient numerical algorithms for the solution of high dimensional random Partial Differential Equations (PDEs) has been a challenging task due to the well-known curse of dimensionality. We present a new solution approach for these problems based on deep learning. This approach is intrusive, entirely unsupervised, and mesh-free. Specifically, the random PDE is approximated by a feed-forward fully-connected deep residual network, with either strong or weak enforcement of initial and boundary constraints. Parameters of the approximating deep neural network are determined iteratively using variants of the Stochastic Gradient Descent (SGD) algorithm. The satisfactory accuracy of the proposed approach is numerically demonstrated on diffusion and heat conduction problems, in comparison with the converged Monte Carlo-based finite element results.
Keywords: Deep learning | Deep neural networks | Residual networks | Random differential equations | Curse of dimensionality | Least squares
English article
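The core idea above — tune parameters by gradient descent so that the PDE residual vanishes, with initial/boundary conditions enforced strongly by the ansatz — can be caricatured with a one-parameter model in place of the paper's deep residual network. A toy sketch for u''(x) = -1 on [0, 1] with u(0) = u(1) = 0, whose exact solution is u(x) = x(1 - x)/2:

```python
# One-parameter caricature of residual-loss training: the ansatz
# u(x) = a * x * (1 - x) satisfies the boundary conditions by construction
# (strong enforcement), and plain gradient descent on the squared PDE
# residual recovers the exact coefficient a = 0.5.

def pde_residual(a):
    # For this ansatz u''(x) = -2a everywhere, so the residual of
    # u''(x) + 1 = 0 is constant in x.
    return -2.0 * a + 1.0

a = 0.0    # initial parameter
lr = 0.05  # learning rate
for _ in range(200):
    r = pde_residual(a)
    grad = 2.0 * r * (-2.0)  # d/da of r(a)**2
    a -= lr * grad

print(a)  # converges to 0.5, matching u(x) = x(1 - x)/2
```

The paper trains a feed-forward residual network with SGD variants over sampled collocation points; the optimization loop here is the same residual-minimization principle reduced to its simplest instance.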
3 Hybrid fast unsupervised feature selection for high-dimensional data (2019)
The emergence of the “curse of dimensionality” issue as a result of high-dimensional datasets deteriorates the capability of learning algorithms, and also requires high memory and computational costs. Selection of features by discarding redundant and irrelevant features functions as a crucial machine learning technique aimed at reducing the dimensionality of these datasets, which improves the performance of the learning algorithm. Feature selection has been extensively applied in many application areas relevant to expert and intelligent systems, such as data mining and machine learning. Although many algorithms have been developed so far, they are still unsatisfying when confronting high-dimensional data. This paper presents a new hybrid filter-based feature selection algorithm based on a combination of clustering and the modified Binary Ant System (BAS), called FSCBAS, to overcome the search space and high-dimensional data processing challenges efficiently. This model provides both global and local search capabilities between and within clusters. In the proposed method, inspired by the genetic algorithm and simulated annealing, a damped mutation strategy is introduced that avoids falling into local optima, and a new redundancy reduction policy adopted to estimate the correlation between the selected features further improves the algorithm. The proposed method can be applied in many expert system applications, such as microarray data processing, text classification and image processing in high-dimensional data, to handle the high dimensionality of the feature space and improve classification performance simultaneously. The performance of the proposed algorithm was compared to that of state-of-the-art feature selection algorithms using different classifiers on real-world datasets. The experimental results confirmed that the proposed method reduces computational complexity significantly and achieves better performance than the other feature selection methods.
Keywords: Feature selection | High-dimensional data | Binary ant system | Clustering | Mutation
English article
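The redundancy reduction policy mentioned above can be illustrated with a simple correlation-based filter step: keep a ranked feature only if it is not too correlated with any already-selected feature. This is a simplified sketch, not the full FSCBAS algorithm (no clustering and no binary ant system), and the 0.9 threshold is an arbitrary choice:

```python
# Correlation-based redundancy filter: walk features in ranked (best-first)
# order and drop any feature whose absolute Pearson correlation with an
# already-selected feature exceeds a threshold.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def select_low_redundancy(features, threshold=0.9):
    """features: dict name -> list of values, in ranked (best-first) order."""
    selected = []
    for name, values in features.items():
        if all(abs(pearson(values, features[s])) <= threshold for s in selected):
            selected.append(name)
    return selected

ranked = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.1, 4.0, 6.2, 8.1],   # nearly a multiple of f1 -> redundant
    "f3": [4.0, 1.0, 3.5, 0.5],
}
print(select_low_redundancy(ranked))  # prints ['f1', 'f3']: f2 is dropped
```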
4 PUMA: Parallel subspace clustering of categorical data using multi-attribute weights (2019)
There are two main reasons why traditional clustering schemes are incompetent for high-dimensional categorical data. First, traditional methods usually represent each cluster by all dimensions without difference; and second, traditional clustering methods only rely on an individual dimension of projection as an attribute’s weight, ignoring relevance among attributes. We solve these two problems by a MapReduce-based subspace clustering algorithm (called PUMA) using multi-attribute weights. The attribute subspaces are constructed in our PUMA by calculating an attribute-value weight based on the co-occurrence probability of attribute values among different dimensions. PUMA obtains sub-clusters corresponding to respective attribute subspaces from each computing node in parallel. Lastly, PUMA measures various scale clusters by applying the hierarchical clustering method to iteratively merge sub-clusters. We implement PUMA on a 24-node Hadoop cluster. Experimental results reveal that using multi-attribute weights with subspace clustering can achieve better clustering accuracy on both synthetic and real-world high dimensional datasets. Experimental results also show that PUMA achieves high performance in terms of extensibility, scalability and the nearly linear speedup with respect to number of nodes. Additionally, experimental results demonstrate that PUMA is reasonable, effective, and practical to expert systems such as knowledge acquisition, word sense disambiguation, automatic abstracting and recommender systems.
Keywords: Parallel subspace clustering | Multi-attribute weights | High dimension | Categorical data | MapReduce
English article
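The attribute-value weighting ingredient above can be sketched as follows; the exact formula used here (mean pairwise co-occurrence probability with values in the other dimensions) is an assumption for illustration, not PUMA's actual definition:

```python
# Sketch of co-occurrence-based attribute-value weighting for categorical
# records. Each record is a tuple with one value per dimension; the weight
# of a (dimension, value) pair is the average empirical probability of its
# co-occurrence with values in other dimensions. This averaging rule is a
# hypothetical simplification, not PUMA's exact formula.
from collections import Counter
from itertools import combinations

def cooccurrence_weights(records):
    """records: list of tuples of categorical values, one per dimension."""
    value_count = Counter()
    pair_count = Counter()
    for rec in records:
        for i, v in enumerate(rec):
            value_count[(i, v)] += 1
        for (i, v), (j, w) in combinations(enumerate(rec), 2):
            pair_count[((i, v), (j, w))] += 1
    n = len(records)
    weights = {}
    for (i, v) in value_count:
        co = [pc / n for pair, pc in pair_count.items() if (i, v) in pair]
        weights[(i, v)] = sum(co) / len(co) if co else 0.0
    return weights

records = [("red", "small"), ("red", "small"), ("blue", "large")]
w = cooccurrence_weights(records)
print(w[(0, "red")])  # "red" co-occurs with "small" in 2 of 3 records
```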
5 QoE-Driven Big Data Management in Pervasive Edge Computing Environment (2018)
In the age of big data, services in the pervasive edge environment are expected to offer end-users better Quality-of-Experience (QoE) than that in a normal edge environment. However, the combined impact of the storage, delivery, and sensors used in various types of edge devices in this environment is producing volumes of high-dimensional big data that are increasingly pervasive and redundant. Therefore, enhancing the QoE has become a major challenge in high-dimensional big data in the pervasive edge computing environment. In this paper, to achieve high QoE, we propose a QoE model for evaluating the qualities of services in the pervasive edge computing environment. The QoE is related to the accuracy of high-dimensional big data and the transmission rate of this accurate data. To realize high accuracy of high-dimensional big data and the transmission of accurate data throughout the pervasive edge computing environment, in this study we focused on the following two aspects. First, we formulate the issue as a high-dimensional big data management problem and test different transmission rates to acquire the best QoE. Then, with respect to accuracy, we propose a Tensor-Fast Convolutional Neural Network (TF-CNN) algorithm based on deep learning, which is suitable for high-dimensional big data analysis in the pervasive edge computing environment. Our simulation results reveal that our proposed algorithm can achieve high QoE performance.
Keywords: Quality-of-Experience (QoE) | High-dimensional big data management | Deep learning | Pervasive edge computing
English article
6 Efficient algorithms for mining colossal patterns in high dimensional databases (2017)
Mining association rules plays an important role in decision support systems. To mine strong association rules, it is necessary to mine frequent patterns. There are many algorithms that have been developed to efficiently mine frequent patterns, such as Apriori, Eclat, FP-Growth, PrePost, and FIN. However, these are only efficient with a small number of items in the database. When a database has a large number of items (from thousands to hundreds of thousands) but the number of transactions is small, these algorithms cannot run when the minimum support threshold is also small (because the search space is huge). This thus causes the problem of mining colossal patterns in high dimensional databases. In 2012, Sohrabi and Barforoush proposed the BVBUC algorithm for mining colossal patterns based on a bottom up scheme. However, this needs more time to check subsets and supersets, because it generates a lot of candidates and consumes more memory to store these. In this paper we propose new, efficient algorithms for mining colossal patterns. Firstly, the CP (Colossal Pattern)-tree is designed. Next, we develop two theorems to rapidly compute patterns of nodes and prune nodes without the loss of information in colossal patterns. Based on the CP-tree and these theorems, an algorithm (named CP-Miner) is proposed to solve the problem of mining colossal patterns. A sorting strategy for efficiently mining colossal patterns is thus developed. This strategy helps to reduce the number of significant candidates and the time needed to check subsets and supersets. The PCP-Miner algorithm, which uses this strategy, is then proposed, and we also conduct experiments to show the efficiency of these algorithms.
Keywords: Bottom up | Colossal patterns | Data mining | High dimensional databases
English article
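The "many items, few transactions" setting above admits a row-enumeration view: intersecting subsets of transactions yields patterns together with their supports. A brute-force sketch of that view, not the CP-tree-based CP-Miner/PCP-Miner algorithms:

```python
# Brute-force row enumeration: each subset of transactions of size >=
# min_support contributes the intersection of its itemsets as a pattern,
# supported by at least that many rows. Feasible only because the number
# of transactions is small, which is exactly the colossal-pattern setting.
from itertools import combinations

def closed_patterns(transactions, min_support):
    """Return {pattern: support} by intersecting transaction subsets."""
    patterns = {}
    n = len(transactions)
    for k in range(min_support, n + 1):
        for rows in combinations(transactions, k):
            pattern = frozenset.intersection(*map(frozenset, rows))
            if pattern:
                patterns[pattern] = max(patterns.get(pattern, 0), k)
    return patterns

db = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f", "g"},
]
for pat, sup in sorted(closed_patterns(db, min_support=2).items(),
                       key=lambda kv: -kv[1]):
    print(sorted(pat), sup)  # {a, b} has support 3; {a, b, c} has support 2
```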
7 Machine-learned cluster identification in high-dimensional data (2017)
Background: High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM).
Methods: Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means.
Results: Ward clustering imposed cluster structures on cluster-less “golf ball”, “cuboid” and “S-shaped” data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data.
Conclusions: The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data.
Keywords: Machine-learning | Clustering
English article
8 Energy Efficient Data Mining Scheme for High Dimensional Data (2015)
In this paper, we propose an energy efficient big data mining scheme for forest cover type and gas drift classification. Efficient machine learning and data mining techniques provide an unprecedented opportunity to monitor and characterize physical environments, such as forest cover type, using low cost wireless sensor networks. The scheme is experimentally validated on two different sensor network datasets: the forest cover type and gas sensor array drift datasets from the publicly available UCI machine learning repository. Coupled with an appropriate feature selection, the complete scheme leads towards an energy efficient protocol for intelligent monitoring of large physical environments instrumented with wireless sensor networks.
Keywords: Wireless sensor networks | Physical environment monitoring | Machine learning | Data mining | Feature selection
English article
9 Semantic classification of heterogeneous urban scenes using inter-scene feature similarity and inter-scene semantic dependency
Publication year: 2015 - English PDF pages: 10 - Persian DOC pages: 32
The goal of semantic classification of urban scenes is to categorize scenes that are composed of various types of objects whose classes have been defined in advance. Learning the relationship between urban scenes and semantic classes requires five tasks: 1) segmenting images into scenes; 2) creating semantic classes of scenes; 3) extracting and transforming scene features; 4) measuring feature similarity between scenes; and 5) labeling each scene with a semantic classification method. Despite the considerable effort devoted to these tasks, most existing work considers only visual features with inconsistent similarity measures, while ignoring the semantic features within scenes and the interactions between scenes, which leads to poor classification results for highly heterogeneous scenes. To address these issues, this study combines inter-scene feature similarity with inter-scene semantic dependency to create a two-stage classification approach. In the first stage, the visual and semantic features are first optimized for transformation into a fixed form and then employed for an initial k-nearest-neighbour classification of the scenes. In the second stage, a multinomial distribution is proposed to model both the spatial and semantic dependencies between scenes, and is then used to refine the initial classification results. Implementations in two study areas indicate that the proposed approach produces better results for heterogeneous scenes than visual interpretation, since it can discover and model the hidden information between scenes that existing methods usually ignore. Moreover, compared with the initial classification, the refinement stage improves the accuracy in the two study areas by 3.6% and 5%, respectively.
Keywords: High-dimensional feature combination | Multinomial distribution | Scene classification
Translated article
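The first stage described above is an initial k-nearest-neighbour labeling of scenes from their feature vectors. A minimal sketch with made-up feature values and labels (the multinomial refinement stage is not shown):

```python
# Plain k-nearest-neighbour labeling: a query scene takes the majority
# label of its k closest training scenes in feature space. The feature
# vectors and scene labels here are invented for illustration.
from collections import Counter
from math import dist

def knn_label(train, query, k=3):
    """train: list of (feature_vector, label); returns the majority label."""
    nearest = sorted(train, key=lambda fv_lbl: dist(fv_lbl[0], query))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

train = [
    ((0.10, 0.20), "residential"),
    ((0.20, 0.10), "residential"),
    ((0.90, 0.80), "industrial"),
    ((0.80, 0.90), "industrial"),
    ((0.15, 0.15), "residential"),
]
print(knn_label(train, (0.2, 0.2)))  # prints residential
```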
10 Generalized feed-forward neural networks with random weights for face recognition
Publication year: 2014 - English PDF pages: 7 - Persian DOC pages: 27
Face recognition has always been a hot topic in pattern recognition and computer vision. In general, during the recognition process, images or features are usually converted into vectors. This usually distorts the correlation information among the elements when vectorizing an image matrix. This paper presents a classifier, named the two-dimensional neural network with random weights (2D-NNRW), that can use matrix data as direct input and preserve the structure of the image matrix. Specifically, the proposed classifier employs left and right projection vectors to replace the high-dimensional input weights of the hidden layer in order to preserve the correlation information among the elements, and adopts the idea of the neural network with random weights (NNRW) to learn all the parameters. Experiments on well-known databases show that the proposed 2D-NNRW classifier can embody the structural property of the face image and has good performance for face recognition.
Keywords: Face recognition | Classifier | Neural network with random weights (NNRW) | Matrix data
Translated article
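The left/right projection idea above replaces a huge input weight vector over the flattened image with one left and one right projection vector per hidden node, so a matrix input X contributes uᵀXv. A sketch with random values standing in for the learned/random projections and made-up dimensions:

```python
# A hidden node responds to an image matrix X via u^T X v, needing only
# rows + cols parameters instead of rows * cols for a vectorized input
# weight, and without flattening away the matrix structure. Random numbers
# stand in for the projections; the dimensions are invented.
import random

def project(X, u, v):
    """Compute u^T X v for matrix X given as a list of rows."""
    Xv = [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]
    return sum(u_i * xv_i for u_i, xv_i in zip(u, Xv))

random.seed(0)
rows, cols = 4, 5
X = [[random.random() for _ in range(cols)] for _ in range(rows)]
u = [random.random() for _ in range(rows)]  # left projection vector
v = [random.random() for _ in range(cols)]  # right projection vector

print(project(X, u, v))  # scalar response of one hidden node
```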