Bias reduction in the population size estimation of large data sets
Year: 2020
Estimating the population size of large data sets and hard-to-reach populations can be a significant problem. For example, in the military, manpower is limited and the manual processing of large data sets can be time-consuming. In addition, access to the full population of data may be restricted by factors such as cost, time, and safety. Four new population size estimators are proposed as extensions of existing methods, and their performance is compared in terms of bias with two existing methods from the big data literature. The proposed estimators would be particularly beneficial for time-critical decisions or actions. The comparison is based on a simulation study and an application to five real network data sets (Twitter, LiveJournal, Pokec, YouTube, Wikipedia Talk). Whilst no single one of the four proposed estimators generates the most accurate estimates overall, the proposed estimators are shown to produce more accurate population size estimates for small sample sizes, although in some cases they show more variability than existing estimators in the literature.
Keywords: Relative bias | Twitter | Size estimator | Youtube | Random walk sampling
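The abstract above does not spell out its estimators, so the following is only a minimal sketch of the general idea of bias reduction in population size estimation. It swaps in two classical capture-recapture estimators (Lincoln-Petersen and Chapman's bias-corrected variant), which are not the estimators proposed in the paper, and computes relative bias as the mean estimate minus the true size, divided by the true size. All function names are hypothetical.

```python
from statistics import mean


def lincoln_petersen(n1, n2, m):
    """Two-sample capture-recapture estimate: n1 * n2 / m.

    n1, n2 are the two sample sizes and m is the number of items
    seen in both samples. Undefined when there is no overlap.
    """
    if m == 0:
        raise ValueError("no recaptures: estimate is undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def relative_bias(estimates, true_size):
    """(mean of the estimates - true size) / true size."""
    return (mean(estimates) - true_size) / true_size
```

With small samples the overlap `m` is small and noisy, which is where the Lincoln-Petersen form tends to overestimate and a corrected form such as Chapman's helps.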
On testing pseudorandom generators via statistical tests based on the arcsine law
Year: 2020
Testing the quality of pseudorandom number generators is an important issue. Security requirements are becoming more and more demanding, and weaknesses in this matter are simply not acceptable. There is a need for an in-depth analysis of statistical tests: one has to be sure that accepting or rejecting a generator as good is not the result of errors in computations or approximations. In this paper we propose a second-level statistical test based on the arcsine law for random walks. We provide upper bounds for the approximation of the arcsine distribution, which allows us to perform a detailed error analysis of the proposed test.
Keywords: The arcsine law | Random walks | Pseudorandom number generator | Statistical testing | Second level testing | Dyck paths
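To make the arcsine law concrete, here is a small simulation sketch. It is not the second-level test proposed in the paper; it only illustrates the underlying fact that the fraction of time a simple symmetric random walk spends on the positive side approximately follows the arcsine distribution, whose CDF is (2/π)·arcsin(√x). The function names, the number of walks, the walk length, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def arcsine_cdf(x):
    """CDF of the arcsine distribution on [0, 1]."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))


def fraction_positive(steps, rng):
    """Fraction of steps a +/-1 random walk spends on the positive side.

    A step that lands on 0 counts as positive if it came from above.
    """
    position = 0
    positive = 0
    for _ in range(steps):
        previous = position
        position += 1 if rng.random() < 0.5 else -1
        if position > 0 or (position == 0 and previous > 0):
            positive += 1
    return positive / steps


def empirical_cdf(x, walks=2000, steps=200, seed=1):
    """Share of simulated walks whose positive-side fraction is <= x."""
    rng = random.Random(seed)
    hits = sum(fraction_positive(steps, rng) <= x for _ in range(walks))
    return hits / walks
```

Comparing `empirical_cdf(x)` with `arcsine_cdf(x)` for a few values of `x` gives a rough check on the bit source behind `rng`. A real test would also have to account for the approximation error of the finite walk, which is what the error bounds mentioned in the abstract address.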
Dynamic texture analysis with diffusion in networks
Year: 2019
Dynamic texture is a field of research that has gained considerable interest from the computer vision community due to the explosive growth of multimedia databases. In addition, dynamic texture is present in a wide range of videos, which makes it very important in video-based expert systems such as medical systems, traffic monitoring systems, and forest fire detection systems, among others. In this paper, a new method for dynamic texture characterization based on diffusion in directed networks is proposed. The dynamic texture is modeled as a directed network. The method consists of analyzing the dynamics of this network after a series of graph cut transformations based on the edge weights. For each network transformation, the activity of each vertex is estimated. The activity is the relative frequency with which a vertex is visited by random walks at equilibrium. Then, a texture descriptor is constructed by concatenating the activity histograms. The main contributions of this paper are the use of directed network modeling and diffusion in networks for dynamic texture characterization. These tend to provide better performance in dynamic texture classification. Experiments with rotation and interference of the motion pattern were conducted in order to demonstrate the robustness of the method. The proposed approach is compared to other dynamic texture methods on two very well-known dynamic texture databases and on traffic condition classification, and it outperforms them in most cases.
Keywords: Dynamic texture | Complex networks | Diffusion | Random walks
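The "activity" described above, the relative frequency with which a vertex is visited by random walks at equilibrium, can be sketched as the stationary distribution of a random walk on a weighted directed graph. The following is a minimal illustration under stated assumptions, not the paper's implementation: the graph is a plain dictionary, every vertex is assumed to have at least one outgoing edge, the walk is assumed to converge, and the graph cut transformations and histogram construction are not covered. The function name is hypothetical.

```python
def visit_frequencies(weights, iterations=200):
    """Approximate equilibrium visit frequencies by power iteration.

    weights maps each vertex to a dict {target: edge weight}; a walker
    at u moves to v with probability proportional to weights[u][v].
    Assumes every vertex has outgoing edges and appears as a key.
    """
    nodes = sorted(weights)
    freq = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            for v, w in weights[u].items():
                nxt[v] += freq[u] * w / total
        freq = nxt
    return freq


# Tiny example: "a" always moves to "b"; "b" stays or returns with equal weight.
example = {"a": {"b": 1.0}, "b": {"a": 1.0, "b": 1.0}}
activity = visit_frequencies(example)
```

In this toy graph the walker spends twice as much time at `"b"` as at `"a"`, which is the kind of per-vertex value that would then be binned into a histogram.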
CNAVER: A Content and Network-based Academic VEnue Recommender system
Year: 2019
The rapid growth in the number of academic venues poses a significant challenge for researchers: how can they recognize venues that not only match their scholarly interests but are also of high significance? Often, even a high-quality paper is rejected because of a mismatch between the research area of the paper and the scope of the journal. Recommending appropriate scholarly venues to researchers empowers them to recognize and take part in important academic conferences and assists them in getting published in impactful journals. A venue recommendation system is helpful in this scenario, particularly when exploring a new field or when further choices are required. We propose CNAVER: A Content and Network-based Academic VEnue Recommender system. It provides an integrated framework employing a rank-based fusion of a paper-paper peer network (PPPN) model and a venue-venue peer network (VVPN) model. It requires only the title and abstract of a paper to provide venue recommendations, thus assisting researchers even at the earliest stage of paper writing. It also addresses cold-start issues, such as the involvement of an inexperienced researcher or a novel venue, along with the problems of data sparsity, diversity, and stability. Experiments on the DBLP dataset show that our proposed approach outperforms several state-of-the-art methods in terms of precision, nDCG, MRR, accuracy, macro F-measure, average venue quality, diversity, and stability.
Keywords: Venue recommender system | Social network analysis | Meta-path analysis | Random walk with restart (RWR) | Graph clustering | Rank-based fusion
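The keywords above mention random walk with restart (RWR). As a rough illustration of that general technique, and not of how CNAVER applies it, the sketch below scores every node of a small graph by its proximity to a seed node: at each step the walker jumps back to the seed with probability `restart`, otherwise it moves to a uniformly chosen neighbour. The function name, the default restart probability, and the toy graph are assumptions.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, iterations=100):
    """Proximity scores to `seed` via iterated random walk with restart.

    adjacency maps each node to a non-empty list of neighbours.
    The scores form a probability distribution over the nodes.
    """
    nodes = sorted(adjacency)
    scores = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iterations):
        # Restart mass always returns to the seed node.
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            share = (1.0 - restart) * scores[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        scores = nxt
    return scores


# Toy undirected path a - b - c - d, written as adjacency lists.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
rwr_scores = random_walk_with_restart(path, seed="a")
```

On this path the score falls off with distance from the seed, so ranking nodes by score gives a simple proximity-based recommendation order.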
DISL: Deep Isomorphic Substructure Learning for network representations
Year: 2019
The analysis of complex networks based on deep learning has drawn much attention recently. Generally, due to the scale and complexity of modern networks, traditional methods are gradually losing their analytic efficiency and effectiveness. Therefore, it is imperative to design a network analysis model that caters to the massive amount of data and learns more comprehensive information from networks. In this paper, we propose a novel model, the Deep Isomorphic Substructure Learning (DISL) model, which aims to learn network representations from patterns with isomorphic substructures. Specifically, in DISL, deep learning techniques are used to learn a better network representation for each vertex (node). We provide a method that makes the isomorphic units self-embed into vertex-based subgraphs whose explicit topologies are extracted from raw graph-structured data, and we design a Probability-guided Random Walk (PRW) procedure to explore the set of substructures. Sequential samples yielded by PRW provide the information of relational similarity, which integrates the correlation and co-occurrence of vertices with the substructural isomorphism of subgraphs. We maximize the likelihood of the preserved relationships to learn the implicit similarity knowledge. The architecture of Convolutional Neural Networks (CNNs) is redesigned to process the explicit and implicit features simultaneously and learn a more comprehensive representation for networks. The DISL model is applied to several vertex classification tasks for social networks. Our results show that DISL outperforms challenging state-of-the-art Network Representation Learning (NRL) baselines by a significant margin in accuracy and weighted-F1 scores on the experimental datasets.
Keywords: Deep learning | Network representations | Isomorphic substructures | Probability-guided random walk | Convolutional neural networks
Efficient heterogeneous proximity preserving network embedding model
Year: 2019
We study the problem of representation learning in heterogeneous information networks. Its unique challenges come from the existence of multiple types of vertices and edges. Existing proximity-based network embedding techniques ignore the type information when evaluating proximity, which limits their usage in heterogeneous scenarios. In this paper, we propose a heterogeneous proximity preserving network embedding model via meta path guided random walk, which is capable of capturing the high-order proximity between vertices specified by the given path. To improve the learning efficiency, we introduce a sampling-based learning strategy which can incrementally learn representations. We conduct experiments on two real-world heterogeneous information networks. Experimental results on several mining tasks demonstrate the effectiveness of our approach over many competitive baselines. The model is very efficient and is able to learn embeddings for large networks in both offline and online scenarios. In addition, for expert systems, our approach can be applied to improve the representation of knowledge entities by depicting the knowledge base as a heterogeneous information network.
Keywords: Network embedding | Heterogeneous information network | Random walk
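A meta path guided random walk, as mentioned in the abstract above, restricts each step to neighbours of the vertex type that the meta path prescribes next. The sketch below illustrates only that sampling step under stated assumptions; it is not the paper's model and omits the embedding and incremental learning parts. The function name, the cyclic encoding of the pattern, and the toy author/paper/venue graph are all hypothetical.

```python
import random


def metapath_walk(neighbours, node_type, start, pattern, length, rng):
    """Random walk whose i-th node has type pattern[i % len(pattern)].

    neighbours maps a node to its adjacent nodes, node_type maps a node
    to its type. The walk stops early if no neighbour has the needed type.
    """
    if node_type[start] != pattern[0]:
        raise ValueError("start node must have the first type in the pattern")
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [v for v in neighbours[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


# Toy bibliographic network (illustrative data only).
node_type = {"a1": "author", "a2": "author", "p1": "paper",
             "p2": "paper", "v1": "venue"}
neighbours = {"a1": ["p1"], "a2": ["p1", "p2"], "p1": ["a1", "a2", "v1"],
              "p2": ["a2", "v1"], "v1": ["p1", "p2"]}
pattern = ["author", "paper", "venue", "paper"]  # cycles as A-P-V-P-A-...
walk = metapath_walk(neighbours, node_type, "a1", pattern, 9, random.Random(0))
```

Walks sampled this way are typically fed to a skip-gram style model, so that vertices co-occurring along the chosen path type end up with similar embeddings.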
Event recommendation in social networks based on reverse random walk and participant scale control
Year: 2017
With the merging of the cyber world and the physical world, event-based social networks have been playing an important role in promoting the spread of offline social events through online channels. Event recommendation in social networks, which recommends a list of upcoming events to a user according to their preferences, has attracted a lot of research interest recently. In this paper, we study the event recommendation problem based on graph theory. We first construct a heterogeneous graph to represent the interactions among different types of entities in an event-based social network. Based on the constructed graph, we propose a novel event scoring algorithm called reverse random walk with restart to obtain the user–event recommendation matrix. In practice, the participant capacity of an event may be constrained to a limited number of users. Therefore, based on the user–event recommendation matrix, we further propose two participant scale control algorithms to coordinate unbalanced user arrangements among events. After the rearrangement, each user is assigned a list of recommended events that considers both local user preference and global event capacity. Experimental results on the Meetup dataset show that the proposed method outperforms state-of-the-art algorithms, with higher recommendation precision and larger recommendation coverage.
Keywords: Event recommendation | Reverse random walk | Participant scale control | Event-based social networks