Download and view articles related to class imbalance :: Page 1

Search results: class imbalance

Number of articles found: 5
No. Title Type
1 Automated classification of fauna in seabed photographs: The impact of training and validation dataset size, with considerations for the class imbalance
(2021)
Machine learning is rapidly developing as a tool for gathering data from imagery and may be useful in identifying (classifying) visible specimens in large numbers of seabed photographs. Application of an automated classification workflow requires manually identified specimens to be supplied for training and validating the model. These training and validation datasets are generally generated by partitioning the available manually identified specimens; typical ratios of training to validation dataset sizes are 75:25 or 80:20. However, this approach does not facilitate the desired scalability, which would require models to successfully classify specimens in hundreds of thousands to millions of images after training on a relatively small subset of manually identified specimens. A second problem is related to the ‘class imbalance’, where natural community structure means that fewer specimens of rare morphotypes are available for model training. We investigated the impact of independent variation of the training and validation dataset sizes on the performance of a convolutional neural network classifier on benthic invertebrates visible in a very large set of seabed photographs captured by an autonomous underwater vehicle at the Porcupine Abyssal Plain Sustained Observatory. We tested the impact of increasing training dataset size on specimen classification in a single validation dataset, and then tested the impact of increasing validation set size, evaluating ecological metrics in addition to computer vision metrics. Computer vision metrics (recall, precision, F1-score) indicated that classification improved with increasing training dataset size. In terms of ecological metrics, the number of morphotypes recorded increased, while diversity decreased with increasing training dataset size. Variation and bias in diversity metrics decreased with increasing training dataset size.
Multivariate dispersion in apparent community composition was reduced, and bias from expert-derived data declined with increasing training dataset size. In contrast, classification success and resulting ecological metrics did not differ significantly with varying validation dataset sizes. Thus, the selection of an appropriate training dataset size is key to ensuring robust automated classifications of benthic invertebrates in seabed photographs, in terms of ecological results, and validation may be conducted on a comparatively small dataset with confidence that similar results will be obtained in a larger production dataset. In addition, our results suggest that automated classification of less common morphotypes may be feasible, provided that the overall training dataset size is sufficiently large. Thus, tactics for reducing class imbalance in the training dataset may produce improvements in the resulting ecological metrics.
Keywords: Computer vision | Deep learning | Benthic ecology | Image annotation | Marine photography | Artificial intelligence | Convolutional neural networks | Sample size
English article
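The first abstract evaluates the classifier with recall, precision, and F1-score. Because class imbalance leaves rare morphotypes with few specimens, an overall score can hide poor performance on them. The following is a minimal sketch, not the authors' code, that computes these metrics per class together with a macro-averaged F1. The function names and the toy labels "A" (a common morphotype) and "R" (a rare one) are hypothetical.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label truly occurs
    metrics = {}
    for label in labels:
        tp = true_positives[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / actual[label] if actual[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics

def macro_f1(metrics):
    """Unweighted mean of per-class F1, so a rare class counts as much as a common one."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

# Hypothetical toy labels: 8 common "A" specimens and 2 rare "R" specimens.
y_true = ["A"] * 8 + ["R"] * 2
y_pred = ["A"] * 8 + ["A", "R"]  # one rare specimen is misclassified as "A"
metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f) in metrics.items():
    print(label, round(p, 3), round(r, 3), round(f, 3))
print("macro-F1", round(macro_f1(metrics), 3))
```

In this toy case overall accuracy is 0.9, yet recall on the rare class is only 0.5; the macro-F1 exposes that gap because it weights both classes equally.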
2 Data imbalance in classification: Experimental evaluation
(2020)
The advent of Big Data has ushered in a new era of scientific breakthroughs. One of the common issues that affects raw data is the class imbalance problem, which refers to an imbalanced distribution of values of the response variable. This issue is present in fraud detection, network intrusion detection, medical diagnostics, and a number of other fields where negatively labeled instances significantly outnumber positively labeled instances. Modern machine learning techniques struggle to deal with imbalanced data because they focus on minimizing the error rate for the majority class while ignoring the minority class. The goal of our paper is to demonstrate the effects of class imbalance on classification models. Concretely, we study the impact of varying class imbalance ratios on classifier accuracy. By highlighting the precise nature of the relationship between the degree of class imbalance and the corresponding effects on classifier performance, we hope to help researchers better tackle the problem. To this end, we carry out extensive experiments using 10-fold cross validation on a large number of datasets. In particular, we determine that the relationship between the class imbalance ratio and the accuracy is convex.
Keywords: Classification | Class imbalance | Data analysis | Machine learning | Statistical analysis | Supervised learning
English article
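The second abstract notes that standard learners tend to minimize the majority-class error rate while ignoring the minority class. A minimal sketch, assuming a binary problem with label 1 as the minority (positive) class, shows why plain accuracy can mislead: a hypothetical classifier that always predicts the negative class scores higher accuracy as the imbalance ratio grows, while balanced accuracy stays at 0.5. This illustrates the metric issue only; it does not reproduce the paper's experiments or its convexity finding.

```python
def imbalance_ratio(y, positive=1):
    """Majority-to-minority count ratio for binary labels (both classes must be present)."""
    n_pos = sum(1 for label in y if label == positive)
    n_neg = len(y) - n_pos
    return max(n_pos, n_neg) / min(n_pos, n_neg)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of the recall on the positive class and the recall on the negative class."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t != positive]
    tpr = sum(p == positive for p in pos_preds) / len(pos_preds)
    tnr = sum(p != positive for p in neg_preds) / len(neg_preds)
    return (tpr + tnr) / 2

# A hypothetical "classifier" that always predicts the majority (negative) class.
for n_pos in (50, 10, 1):
    y = [1] * n_pos + [0] * (100 - n_pos)
    always_negative = [0] * len(y)
    print(imbalance_ratio(y), accuracy(y, always_negative), balanced_accuracy(y, always_negative))
```

For 50, 10, and 1 positives out of 100, the loop prints imbalance ratios of 1.0, 9.0, and 99.0 and accuracies of 0.5, 0.9, and 0.99, while balanced accuracy is 0.5 every time.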
3 Champion-challenger analysis for credit card fraud detection: Hybrid ensemble and deep learning
(2019)
Credit card fraud detection is an essential part of screening fraudulent transactions in advance of their authorization by card issuers. Although credit card frauds occur extremely infrequently, they result in huge losses as most fraudulent transactions have large values. An adequate detection of fraud allows investigators to take timely actions that can potentially prevent additional fraud or financial losses. In practice, however, investigators can only check a few alerts per day since the investigation process can be long and tedious. Thus, the primary goal of the fraud detection model is to return accurate alerts with fewer false alarms and missed frauds. Conventional fraud detection is mainly based on the hybrid ensemble of diverse machine learning models. Recently, several studies have compared deep learning and traditional machine learning models, including ensembles. However, these studies used evaluation methods without considering that real-world fraud detection systems operate under two constraints: (i) the number of investigators who check the high-risk transactions from the data-driven scoring models is limited, and (ii) the two types of misclassification, false alarms and missed frauds, have different costs. In this study, we conducted an in-depth comparison between the hybrid ensemble and the deep learning method to determine whether or not to adopt the latter in our partner’s system, which currently operates with the hybrid ensemble model. To compare the two, we introduced the champion-challenger framework and the development process of the two models. After developing the two models, we evaluated them on large transaction data sets taken from our partner, a major card issuing company in South Korea. We used various practical evaluation metrics appropriate for this domain, which has severe class and cost imbalances. Moreover, we deployed these models in a real-world fraud detection system to check the post-launch performance for one month.
The challenger outperformed the champion in both off-line and post-launch tests.
Keywords: Credit card fraud detection | Deep learning | Hybrid ensemble | Model evaluation | Class imbalance
English article
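The third abstract stresses two operational constraints: investigators can review only a limited number of alerts, and false alarms and missed frauds have different costs. A minimal sketch of an evaluation that respects both, assuming a fixed review capacity k and per-error costs supplied by the analyst: it flags the k highest-scoring transactions and reports precision@k, recall@k, and total misclassification cost. The function name, the cost values, and the toy scores are hypothetical; this is not the metric suite used in the paper.

```python
def evaluate_alerts(scores, labels, k, cost_false_alarm, cost_missed_fraud):
    """Flag the k highest-scoring transactions and score the resulting alert list.

    labels: 1 = fraud, 0 = legitimate. k models a fixed review capacity.
    Returns (precision@k, recall@k, total misclassification cost).
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of transactions")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:k]
    hits = sum(labels[i] for i in flagged)  # frauds caught by the alerts
    total_fraud = sum(labels)
    false_alarms = k - hits                  # legitimate transactions flagged
    missed = total_fraud - hits              # frauds left unreviewed
    precision_at_k = hits / k
    recall_at_k = hits / total_fraud if total_fraud else 0.0
    cost = false_alarms * cost_false_alarm + missed * cost_missed_fraud
    return precision_at_k, recall_at_k, cost

# Hypothetical model scores and ground truth for six transactions.
scores = [0.95, 0.10, 0.80, 0.40, 0.70, 0.05]
labels = [1, 0, 0, 0, 1, 1]
for k in (2, 3):
    # Assumed costs: a missed fraud is ten times as expensive as a false alarm.
    print(k, evaluate_alerts(scores, labels, k, cost_false_alarm=1.0, cost_missed_fraud=10.0))
```

With the toy data, raising capacity from k=2 to k=3 lifts recall@k from 1/3 to 2/3 and lowers the cost from 21 to 11, because each missed fraud is assumed to cost ten times a false alarm.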
4 The Gradual Resampling Ensemble for mining imbalanced data streams with concept drift
(2018)
Knowledge extraction from data streams has received increasing interest in recent years. However, most of the existing studies assume that the class distribution of data streams is relatively balanced. Reacting to concept drift is more difficult if a data stream is class imbalanced. Current oversampling methods generally absorb previously received minority examples into the current minority set selectively, by evaluating the similarity between past minority examples and the current minority set. However, the similarity evaluation is easily affected by data difficulty factors. Meanwhile, these oversampling techniques have ignored the majority class distribution, thus risking class overlap. To overcome these issues, we propose an ensemble classifier called Gradual Resampling Ensemble (GRE). GRE can handle data streams that exhibit both concept drift and class imbalance. On the one hand, a selective resampling method, which avoids drifting data, is applied to select a part of the previous minority examples for amplifying the current minority set. Disjuncts are discovered by DBSCAN clustering, so the influence of small disjuncts and outliers on the similarity evaluation can be avoided. Only those minority examples with a low probability of overlapping with the current majority set are selected for resampling the current minority set. On the other hand, previous component classifiers are updated using the latest instances. Thus, the ensemble can quickly adapt to a new condition, regardless of the type of concept drift. Through the gradual oversampling of previous chunks using the current minority events, the class distribution of past chunks can be balanced. Favorable results in comparison to other algorithms suggest that GRE can maintain good performance on the minority class without sacrificing majority class performance.
Keywords: Concept drift | Data stream mining | Ensemble classifier | Class imbalance
English article
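GRE adds to the current chunk only those past minority examples that are unlikely to overlap with the current majority set. The sketch below illustrates that selection idea under a loud simplification: instead of GRE's DBSCAN-based disjunct discovery and similarity evaluation, it uses a hypothetical k-nearest-neighbour check, keeping a past minority example only if at most half of its k nearest neighbours in the current chunk belong to the majority class. The function names and thresholds are assumptions, and the sketch does not implement drift handling or the updating of component classifiers.

```python
import math

def select_past_minority(past_minority, current_X, current_y, k=3, max_majority_share=0.5, minority=1):
    """Keep past minority examples whose k nearest current neighbours are not majority-dominated.

    Hypothetical kNN overlap test standing in for GRE's DBSCAN-based selection.
    Assumes the current chunk is non-empty.
    """
    selected = []
    for x in past_minority:
        # Indices of the k points in the current chunk closest to x.
        neighbours = sorted(range(len(current_X)), key=lambda i: math.dist(x, current_X[i]))[:k]
        majority_share = sum(current_y[i] != minority for i in neighbours) / len(neighbours)
        if majority_share <= max_majority_share:
            selected.append(x)
    return selected

def amplify_minority(current_X, current_y, past_minority, k=3, max_majority_share=0.5, minority=1):
    """Return the current chunk extended with the selected past minority examples."""
    chosen = select_past_minority(past_minority, current_X, current_y, k, max_majority_share, minority)
    return current_X + chosen, current_y + [minority] * len(chosen)

# Current chunk: majority class 0 near the origin, minority class 1 near (5, 5).
current_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6)]
current_y = [0, 0, 0, 0, 1, 1]
past_minority = [(5.5, 5.5), (0.5, 0.5)]  # minority examples retained from earlier chunks
X_new, y_new = amplify_minority(current_X, current_y, past_minority)
print(select_past_minority(past_minority, current_X, current_y), len(X_new), y_new[-1])
```

Here the past example near the current minority cluster is kept, while the one inside the majority cluster is rejected as a likely overlap, so the amplified chunk grows from six to seven examples.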
5 A dissimilarity-based imbalance data classification algorithm
(2015)
Class imbalances have been reported to compromise the performance of most standard classifiers, such as Naive Bayes, Decision Trees and Neural Networks. Aiming to solve this problem, various solutions have been explored, mainly by balancing the skewed class distribution or improving the existing classification algorithms. However, these methods pay more attention to the imbalanced distribution, ignoring the discriminative ability of features in the context of class imbalance data. From this perspective, a dissimilarity-based method is proposed to deal with the classification of imbalanced data. Our proposed method first removes useless and redundant features from the given data set by feature selection; then extracts representative instances from the reduced data as prototypes; and finally projects the reduced data into a dissimilarity space by constructing new features, and builds the classification model with data in the dissimilarity space. Extensive experiments over 24 benchmark class imbalance data sets show that, compared with seven other imbalance data tackling solutions, our proposed method greatly improves the performance of imbalance learning, and outperforms the other solutions with all given classification algorithms.
Keywords: Dissimilarity-based classification | Class imbalance | Software defect prediction | Feature selection | Prototype selection
English article
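The fifth abstract describes three steps: feature selection, prototype selection, and projection into a dissimilarity space where a classifier is trained. A minimal sketch of the last two steps follows, assuming the features have already been selected, Euclidean distance as the dissimilarity, one medoid per class as the prototype (so the minority class always contributes one), and a nearest-centroid classifier in the new space. All of these choices and function names are hypothetical illustrations, not the authors' method.

```python
import math

def select_prototypes(X, y):
    """Pick one medoid per class: the instance with the smallest total distance to its classmates."""
    prototypes = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        medoid = min(members, key=lambda m: sum(math.dist(m, other) for other in members))
        prototypes.append(medoid)
    return prototypes

def to_dissimilarity_space(X, prototypes):
    """Represent each instance by its Euclidean distances to the prototypes."""
    return [[math.dist(x, p) for p in prototypes] for x in X]

def fit_centroids(D, y):
    """Mean dissimilarity vector per class (a simple stand-in for any classifier)."""
    sums, counts = {}, {}
    for row, label in zip(D, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, prototypes, x):
    """Project x into dissimilarity space and return the label of the nearest class centroid."""
    row = to_dissimilarity_space([x], prototypes)[0]
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))

# Toy data: three majority-class points near the origin, two minority points near (5, 5).
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
y = [0, 0, 0, 1, 1]
prototypes = select_prototypes(X, y)
centroids = fit_centroids(to_dissimilarity_space(X, prototypes), y)
print(prototypes, predict(centroids, prototypes, (4, 5)), predict(centroids, prototypes, (0.5, 0.2)))
```

Each instance becomes a vector of distances to the prototypes, so any standard classifier could be trained on those rows in place of the nearest-centroid stand-in used here.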