Dear users: please note that translations of articles published before 2008 are free; you can download the translation at no charge from the article details page.
A new fast search algorithm for exact k-nearest neighbors based on optimal triangle-inequality-based check strategy
A new fast search algorithm for exact k-nearest neighbors based on an optimal triangle-inequality-based check strategy - 2020
The k-nearest neighbor (KNN) algorithm has been widely used in pattern recognition, regression, outlier detection and other data mining areas. However, it suffers from a large distance-computation cost, especially when dealing with big-data applications. In this paper, we propose a new fast search (FS) algorithm for exact k-nearest neighbors based on an optimal triangle-inequality-based (OTI) check strategy. While searching the exact k-nearest neighbors of any query, the OTI check strategy eliminates more redundant distance computations for instances located in the marginal area of neighboring clusters than the original TI check strategy does. Considering the large space complexity and extra time complexity of OTI, we also propose an efficient optimal triangle-inequality-based (EOTI) check strategy. The experimental results demonstrate that our two proposed algorithms (OTI and EOTI) achieve the best performance compared with other related KNN fast search algorithms, especially when dealing with high-dimensional datasets.
Keywords: Exact k-nearest neighbors | Fast search algorithm | Clustering | Triangle inequality | Optimal check strategy
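The OTI/EOTI strategies above build on the classical triangle-inequality bound d(q, x) >= |d(q, c) - d(x, c)| for a query q, instance x, and cluster center c. As a minimal illustration of that pruning idea (not the authors' OTI algorithm; the data layout and names are assumptions), the following Python sketch performs exact k-NN search while skipping distance computations that the bound rules out:

```python
import numpy as np

def knn_ti(query, data, centers, labels_c, k):
    """Exact k-NN with triangle-inequality pruning (illustrative sketch).

    labels_c[i] is the index of the cluster center assigned to data[i].
    Uses d(q, x) >= |d(q, c) - d(x, c)| to skip full distance computations.
    """
    d_qc = np.linalg.norm(centers - query, axis=1)           # query-to-center distances
    d_xc = np.linalg.norm(data - centers[labels_c], axis=1)  # precomputable offline
    best = []  # list of (distance, index), kept at size <= k
    # visit points in order of their cluster center's distance to the query
    order = np.argsort(d_qc[labels_c])
    for i in order:
        lb = abs(d_qc[labels_c[i]] - d_xc[i])                # TI lower bound on d(q, x)
        if len(best) == k and lb >= best[-1][0]:
            continue                                         # pruned: cannot beat current k-th
        d = np.linalg.norm(data[i] - query)
        if len(best) < k or d < best[-1][0]:
            best.append((d, i))
            best.sort()
            best = best[:k]
    return best
```

Because pruning only discards points whose lower bound already exceeds the current k-th best distance, the result is identical to brute-force search, which is the sense in which such methods remain "exact".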
Percolation theory for the recognition of patterns in topographic images of the cortical activity
Percolation theory for the recognition of patterns in topographic images of the cortical activity - 2019
Electroencephalography (EEG) is one of the mechanisms used to collect complex data. Its uses include evaluating neurological disorders, investigating brain function, and studying correlations between EEG signals and real or imagined movements. The Topographic Image of Cortical Activity (TICA) records obtained by EEG make it possible to observe, through color discrimination, the cortical areas with greater or lesser activity. Percolation Theory (PT) describes how a fluid spreads from a central point, with properties related to the medium, its topological characteristics, and the ease with which the fluid penetrates materials. The hypothesis presented so far considers that synaptic activities originate at points and spread from them, causing different areas of the brain to interact in a diffusive associative behavior; the currents that spread through the brain tissue generate electric and magnetic fields that affect the scalp sensors. Spatially separated brain areas create large-scale dynamic networks that are described by functional and effective connectivity. The proposition is that this phenomenon behaves like a fluidic spreading, so we can apply PT: through topological analysis we detect specific signatures related to neural phenomena that manifest as changes in the behavior of synaptic diffusion. This signature is characterized by the Fractal Dimension (FD) values of the scattering clusters; these values are used as features in the k-Nearest Neighbors (kNN) method, and a TICA is categorized according to its degree of similarity to preexisting patterns. In this context, our hypothesis consolidates as one more computational resource in the service of medicine, opening the possibility of detailed analysis and inference about the brain through TICA that goes beyond the simply visual observation practiced today.
Keywords: Electroencephalogram | Cortical topography | Percolation theory
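Box counting is one standard way to obtain the Fractal Dimension values the abstract relies on. As an illustrative sketch only (the paper does not specify its FD estimator, and the mask and box sizes here are assumptions), the following Python code estimates the FD of a binary cluster mask:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension of a binary mask by box counting:
    count occupied boxes N(s) at several box sizes s and fit
    log N(s) ~ -D * log s. Illustrative sketch, not the paper's method."""
    counts = []
    h, w = mask.shape
    for s in sizes:
        # count boxes of side s containing at least one occupied pixel
        n = sum(mask[i:i + s, j:j + s].any()
                for i in range(0, h, s) for j in range(0, w, s))
        counts.append(n)
    # slope of the log-log fit gives -D
    return -np.polyfit(np.log(sizes), np.log(counts), 1)[0]
```

A filled region yields a dimension near 2 and a thin line near 1; clusters produced by percolation-like spreading typically fall in between, which is what makes FD a usable kNN feature.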
Deep Representation Learning for Individualized Treatment Effect Estimation using Electronic Health Records
Deep representation learning for individualized treatment effect estimation using electronic health records - 2019
Utilizing clinical observational data to estimate individualized treatment effects (ITE) is a challenging task, as confounding inevitably exists in clinical data. Most existing models for ITE estimation tackle this problem by creating unbiased estimators of the treatment effects. Although valuable, learning a balanced representation is sometimes directly opposed to the objective of learning an effective and discriminative model for ITE estimation. We propose a novel hybrid model bridging multi-task deep learning and K-nearest neighbors (KNN) for ITE estimation. In detail, the proposed model first adopts multi-task deep learning to extract both outcome-predictive and treatment-specific latent representations from Electronic Health Records (EHR), by jointly performing outcome prediction and treatment category classification. Thereafter, we estimate counterfactual outcomes with KNN based on the learned hidden representations. We validate the proposed model on a widely used semi-simulated dataset, i.e., IHDP, and on a real-world clinical dataset consisting of 736 heart failure (HF) patients. The performance of our model remains robust, reaching 1.7 and 0.23 in terms of Precision in the estimation of heterogeneous effect (PEHE) and average treatment effect (ATE), respectively, on the IHDP dataset, and 0.703 and 0.796 in terms of accuracy and F1 score, respectively, on the HF dataset. The results demonstrate that the proposed model achieves competitive performance over state-of-the-art models. In addition, the results reveal several findings consistent with existing medical domain knowledge, and suggest certain hypotheses that could be validated through further investigation in the clinical domain.
Keywords: Individualized Treatment Effect Estimation | Counterfactual Inference | Deep Representation Learning | Multi-task Learning | K-Nearest Neighbors
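The counterfactual-estimation step described above (KNN over learned latent representations) can be sketched as follows. This is a minimal illustration under assumed names and a plain averaging rule, not the authors' implementation:

```python
import numpy as np

def knn_counterfactual(z, t, y, k=5):
    """Estimate each unit's counterfactual outcome by averaging the observed
    outcomes of its k nearest neighbors (in representation space) that
    received the opposite treatment.
    z: latent representations (n, d); t: binary treatments (n,); y: outcomes (n,)."""
    y_cf = np.empty_like(y, dtype=float)
    for i in range(len(z)):
        opp = np.flatnonzero(t != t[i])               # units with the other treatment
        d = np.linalg.norm(z[opp] - z[i], axis=1)     # distances in latent space
        nn = opp[np.argsort(d)[:k]]                   # k nearest opposite-group units
        y_cf[i] = y[nn].mean()
    return y_cf
```

Given `y_cf`, a per-unit ITE estimate follows as `np.where(t == 1, y - y_cf, y_cf - y)`; the quality of the estimate hinges on the representations making opposite-group neighbors genuinely comparable.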
Consensus-oriented cloud manufacturing based on blockchain technology: An exploratory study
Consensus-oriented cloud manufacturing based on blockchain technology: An exploratory study - 2019
In the era of cloud computing and Industry 4.0, significant research efforts on cloud manufacturing have been witnessed in recent years. Nevertheless, challenges such as trust, safety, and payment issues remain in this emerging area, which lowers industry confidence in adopting cloud manufacturing. In this regard, the recent development of blockchain technology provides a potentially viable solution thanks to its unique advantages in decentralization and security. As such, we propose a new framework for cloud manufacturing that integrates blockchain technology. In essence, consensus-oriented mechanisms are employed to generate the operating standards for the blockchain cloud manufacturing model. Moreover, based on the open-source Ethereum code, we construct a simulation case study for 3D printing services using the proposed framework. A consortium (federated) blockchain is simulated, which uses Proof-of-Authority (PoA) as the consensus algorithm for block generation. The simulation involves 939 job requests from 100 users and 10 service providers. The k-nearest neighbors (KNN) algorithm is employed to recommend a service provider for each request. The results show that each provider's service-evaluation score tends to stabilize, and 934 service requests are successfully fulfilled by appropriate providers, while the remaining 5 requests fail to be serviced.
Keywords: Cloud manufacturing | Blockchain technology | KNN | Ethereum | POA | Consensus-oriented
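One plausible reading of the KNN-based recommendation step is to match a new job request against similar historical requests and favor the provider with the best evaluation scores among them. The sketch below is a hedged illustration only; the paper does not detail its feature set or scoring rule, and every name here is an assumption:

```python
import numpy as np

def recommend_provider(req, hist_reqs, hist_providers, hist_scores, k=5):
    """Hypothetical KNN matching step: find the k historical job requests
    most similar to the new request and recommend the provider with the
    best average evaluation score among them. Names and the scoring rule
    are assumptions, not taken from the paper."""
    d = np.linalg.norm(hist_reqs - req, axis=1)   # similarity by feature distance
    nn = np.argsort(d)[:k]                        # k most similar past requests
    scores = {}
    for i in nn:
        scores.setdefault(hist_providers[i], []).append(hist_scores[i])
    # provider with the highest mean score among the neighbors
    return max(scores, key=lambda p: np.mean(scores[p]))
```

In the simulated setting, such a rule would naturally let provider scores stabilize over time, since each fulfilled request feeds a new evaluation back into the history.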
Locality constrained representation-based K-nearest neighbor classification
Locality constrained representation-based K-nearest neighbor classification - 2019
The k-nearest neighbor rule (KNN) is one of the most widely used methods in pattern recognition. However, KNN-based classification performance is severely affected by sensitivity to the neighborhood size k and by the simple majority voting in the regions of k-neighborhoods, especially in the case of small sample sizes with outliers. To overcome these issues, we propose two locality-constrained representation-based k-nearest neighbor rules with the purpose of further improving KNN-based classification performance. The first is the weighted representation-based k-nearest neighbor rule (WRKNN). In the WRKNN, the test sample is represented as a linear combination of its k nearest neighbors from each class, and the localities of the k nearest neighbors per class serve as weights constraining their corresponding representation coefficients. Using the representation coefficients of the k nearest neighbors per class, the representation-based distance between the test sample and the class-specific k nearest neighbors is calculated as the classification decision rule. The second is the weighted local mean representation-based k-nearest neighbor rule (WLMRKNN). In the WLMRKNN, k local mean vectors of the k nearest neighbors per class are first calculated and then used to represent the test sample. In the linear combination of the class-specific k local mean vectors representing the test sample, the localities of the k local mean vectors per class are used as weights to constrain the representation coefficients. These coefficients are employed to design the classification decision rule, which is the class-specific representation-based distance between the test sample and the k local mean vectors per class. To demonstrate the effectiveness of the proposed methods, we conduct extensive experiments on the UCI and UCR data sets and on face databases, in comparison with seven related competitive KNN-based methods. The experimental results show that the proposed methods perform better and are less sensitive to k, especially in small-sample-size cases.
Keywords: K-nearest neighbor rule | Local mean vector | Representation-based distance | Pattern recognition
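A minimal sketch of a WRKNN-style decision rule as described above: the test sample is represented by its k nearest neighbors per class under a locality-weighted regularisation, and the class with the smallest representation residual wins. The regularisation weight `lam` and the exact weighting form are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def wrknn_predict(x, X_train, y_train, k=5, lam=1.0):
    """WRKNN-style rule (illustrative sketch): represent x as a
    locality-weighted linear combination of its k nearest neighbors per
    class; assign the class with the smallest representation residual.
    lam is a hypothetical regularisation weight."""
    best_cls, best_res = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        nn = np.argsort(d)[:k]
        N, w = Xc[nn].T, d[nn]             # (dim, k) neighbor matrix, locality weights
        W = np.diag(w)
        # coefficients of the locality-constrained representation
        # (ridge-like closed form: farther neighbors are penalised more)
        b = np.linalg.solve(N.T @ N + lam * W.T @ W, N.T @ x)
        res = np.linalg.norm(x - N @ b)    # class-specific representation distance
        if res < best_res:
            best_cls, best_res = c, res
    return best_cls
```

The locality weights are what distinguish this family from plain representation-based classification: distant neighbors receive large penalties, so they contribute little to the representation even when they could algebraically fit the test sample.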
Is mass classification in mammograms a solved problem? - A critical review over the last 20 years
Is mass classification in mammograms a solved problem? - A critical review over the last 20 years - 2019
Breast cancer is one of the most common and deadliest cancers affecting mainly women worldwide, and mammography examination is one of the main tools for early detection. Several papers have been published in recent decades reporting techniques to automatically recognize breast cancer by analyzing mammograms. These techniques were used to create computer systems that help physicians and radiologists obtain a more precise diagnosis. The objective of this paper is to present an overview regarding the use of machine learning and pattern recognition techniques to discriminate masses in digitized mammograms. The main differences between the present paper and other reviews are: 1) we used a systematic review method to create this survey; 2) we focused on mass classification problems; 3) the broad scope and spectrum used to investigate this theme, as 129 papers were analyzed to find out whether mass classification in mammograms is a solved problem. To achieve this objective, we performed a systematic review process to analyze papers found in the most important digital libraries in the area. We noticed that the three most common techniques used to classify mammographic masses are artificial neural networks, support vector machines and k-nearest neighbors. Furthermore, we noticed that mass shape and texture are the features most used in classification, although some papers reported the use of features provided by specialists, such as BI-RADS descriptors. Moreover, several feature selection techniques were used to reduce the complexity of the classifiers or to increase their accuracy. Additionally, the survey points out some still-unexplored research opportunities in this area; for example, we identified that techniques such as random forest and logistic regression are little explored, while others, such as grammars or syntactic approaches, are not being used to perform this task.
Keywords: Mammography | Mammogram | Breast cancer | Classification | Diagnosis | Pattern recognition
A machine learning algorithm for high throughput identification of FTIR spectra: Application on microplastics collected in the Mediterranean Sea
A machine learning algorithm for high throughput identification of FTIR spectra: Application on microplastics collected in the Mediterranean Sea - 2019
The development of methods to automatically determine the chemical nature of microplastics from FTIR-ATR spectra is an important challenge. A machine learning method, k-nearest neighbors classification, has been applied to spectra of microplastics collected during the Tara Expedition in the Mediterranean Sea (2014). To perform these tests, a learning database composed of 969 microplastic spectra was created. Results show that the machine learning process is very efficient at identifying spectra of classical polymers such as poly(ethylene), but also that the learning database must be enhanced with less common microplastic spectra. Finally, this method has been applied to more than 4000 spectra of unidentified microplastics. The verification protocol showed less than 10% difference between the results of the proposed automated method and those of a human expert, 75% of which can be very easily corrected.
Keywords: Microplastic | Tara mediterranean campaign | FTIR spectra | Machine learning | k-nearest neighbor classification
Analysis of operating system identification via fingerprinting and machine learning
Analysis of operating system identification via fingerprinting and machine learning - 2019
In operating system (OS) fingerprinting, the OS is identified using network packets and a rule-based matching method. However, this matching method has problems when the network packet information is insufficient or the OS is relatively new. This study compares the OS identification capabilities of several machine learning methods, specifically, K-nearest neighbors (K-NN), Decision Tree, and Artificial Neural Network (ANN), to that of a conventional commercial rule-based method. It is shown that the ANN correctly identifies operating systems with 94% probability, which is higher than the accuracy of the conventional rule-based method.
Keywords: Operating system fingerprinting | Machine learning | Artificial Neural Network | NetworkMiner | K-nearest Neighbors | Decision Tree
On the application of machine learning techniques to derive seismic fragility curves
On the application of machine learning techniques to derive seismic fragility curves - 2019
Deriving fragility curves is a key step in seismic risk assessment within the performance-based earthquake engineering framework. The objective of this study is to implement machine learning tools (classification-based tools in particular) for predicting structural responses and fragility curves. In this regard, ten different classification-based methods are explored: logistic regression, lasso regression, support vector machine, Naïve Bayes, decision tree, random forest, linear and quadratic discriminant analyses, neural networks, and K-nearest neighbors, using the structural responses resulting from multiple strip analyses. In addition, this study examines the impact of class imbalance in the training dataset, which is typical of structural response data, when developing classification-based models for predicting structural responses. The statistical results using the implemented dataset demonstrate that, among the applied methods, random forest and quadratic discriminant analysis are preferable with the imbalanced and balanced datasets, respectively, since they show the highest efficiency in predicting structural responses. Moreover, a detailed procedure is presented on how to derive fragility curves based on classification-based tools. Finally, the sensitivity of the applied machine learning methods to the size of the employed dataset is investigated. The results show that logistic regression, lasso regression, and Naïve Bayes are not sensitive to the size of the dataset (i.e., the number of performed time history analyses), while the performance of discriminant analysis depends significantly on the size of the applied dataset.
Keywords: Fragility curve | Machine learning tools | Imbalanced dataset | Random forest | Support vector machine | Multiple strip analysis
Automating orthogonal defect classification using machine learning algorithms
Automating orthogonal defect classification using machine learning algorithms - 2019
Software systems are increasingly being used in business- or mission-critical scenarios, where the presence of certain types of software defects, i.e., bugs, may result in catastrophic consequences (e.g., financial losses or even the loss of human lives). To deploy systems on which we can rely, it is vital to understand the types of defects that tend to affect such systems. This allows developers to take proper action, such as adapting the development process or redirecting testing efforts (e.g., using a certain set of testing techniques, or focusing on certain parts of the system). Orthogonal Defect Classification (ODC) has emerged as a popular method for classifying software defects, but it requires one or more experts to categorize each defect in a quite complex and time-consuming process. In this paper, we evaluate the use of machine learning algorithms (k-Nearest Neighbors, Support Vector Machines, Naïve Bayes, Nearest Centroid, Random Forest and Recurrent Neural Networks) for automatic classification of software defects using ODC, based on unstructured textual bug reports. Experimental results reveal the difficulties in automatically classifying certain ODC attributes solely from reports, but also suggest that the overall classification accuracy may be improved in most cases if larger datasets are used.
Index Terms : Software Defects | Bug Reports | Orthogonal Defect Classification | Machine Learning | Text Classification