Text Mining Based on Tax Comments as Big Data Analysis Using SVM and Feature Selection (2018)
Taxes play an important role in a country's economy and development. The taxation service system is continuously improved in order to increase the State Budget. One way to assess the performance of taxation, particularly in Indonesia, is to learn the opinion of the public, who are the object of the service. Text mining can be used to learn public opinion about the tax system. The rapid growth of data in social media motivates this research to use it as a data source for big data analysis. The dataset is derived from Facebook and Twitter as sources of tax comments. The resulting opinions, in the form of public sentiment about service, the website system, and news, can be used as input for improving the quality of tax services. In this research, text mining is done through the phases of text processing, feature selection, and classification with a Support Vector Machine (SVM). To reduce the number of attributes in the dataset when classifying text, feature selection uses Information Gain to select the terms relevant to the tax topic. Testing measures the performance of SVM with feature selection on the two data sources. Performance is measured using precision, recall, and F-measure.
Keywords: Text Mining; Tax Comments; Support Vector Machine; Feature Selection
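The feature-selection and evaluation steps named in this abstract can be illustrated with a minimal sketch. It assumes each comment is already tokenized into a set of terms and that Information Gain is computed on the binary feature "term occurs in the comment"; the toy comments, labels, and function names are hypothetical and not taken from the paper, and the SVM classifier itself is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information Gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Return the k terms with the highest Information Gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: four tokenized comments with sentiment labels.
docs = [
    {"tax", "service", "slow"},
    {"website", "error", "tax"},
    {"service", "helpful", "tax"},
    {"website", "fast", "tax"},
]
labels = ["neg", "neg", "pos", "pos"]

ig_tax = information_gain(docs, labels, "tax")    # term in every comment
ig_slow = information_gain(docs, labels, "slow")  # term in one comment
scores = precision_recall_f1(["pos", "pos", "neg", "neg"],
                             ["pos", "neg", "pos", "neg"], "pos")
```

A term that occurs in every comment, such as "tax" here, carries no Information Gain and would be dropped first, which is the intended effect of reducing the attribute count before classification.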
Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis (2018)
Given the research interest in Big Data in Marketing, we present a research literature analysis based on a semi-automated text mining approach, with the goal of identifying the main trends in this domain. In particular, the analysis focuses on relevant terms and topics related to five dimensions: Big Data, Marketing, Geographic location of authors’ affiliation (countries and continents), Products, and Sectors. A total of 1560 articles published from 2010 to 2015 were scrutinized. The findings revealed that research is bipartite between technological and research domains, with Big Data publications not clearly aligning cutting-edge techniques toward Marketing benefits. Also, few inter-continental co-authored publications were found. Moreover, findings show that research in Big Data applications to Marketing is still in an embryonic stage, thus making it essential to develop more direct efforts toward business for Big Data to thrive in the Marketing arena.
Keywords: Big data; Marketing; Literature analysis; Research trends; Text mining
Awareness, determinants and value of reputation risk management: Empirical evidence from the banking and insurance industry (2018)
The aim of this paper is to empirically study reputation risk management in the US and European banking and insurance industry, which has become increasingly important in recent years. We first use a text mining approach and find that the awareness of reputation risk (management) as reflected in annual reports has increased during the last ten years and that it has gained in importance relative to other risks. Furthermore, we provide the first empirical study of the determinants and value of reputation risk management. Our results show that larger firms, as well as firms that are located in Europe and have a higher awareness of their reputation, are significantly more likely to implement a reputation risk management program. Finally, we obtain initial indications of the value-relevance of reputation risk management.
Keywords: Reputation risk management; Reputation risk; Corporate reputation; Text mining
When guests trust hosts for their words: Host description and trust in sharing economy (2018)
In order to better understand the dynamics of user behavior on sharing economy platforms, a multi-stage study was conducted on how Airbnb hosts articulate themselves online and how consumers respond to different host self-presentation patterns. First, using text mining techniques on a large dataset consisting of descriptions of Airbnb hosts in 14 major cities in the United States, two patterns of host self-presentation were identified. Hosts generally present themselves online as (1) a well-traveled individual, eager to meet new people, or (2) an individual of a certain profession. This contributes to the conceptualization of the profile-as-promise framework for online self-presentation in mixed-mode interactions involving peer-to-peer accommodation platforms. Second, consumers respond to the two host self-presentation strategies differently, demonstrating higher levels of perceived trustworthiness in, and intention to book from, well-traveled hosts. This has direct strategic implications for effective self-marketing of “amateur” tourism players as well as for the role of residents as resources in tourism destinations.
Keywords: Airbnb; Sharing economy; Peer-to-peer accommodation; Host self-presentation; Self-marketing; Trustworthiness
Big Data and forensics: An innovative approach for a predictable jurisprudence (2018)
Nowadays, it is easy to trace a large amount of information on the web, to access documents, and to produce digital storage. The current work is submitted as an introduction to an innovative system for investigating the notoriety of web data; the system is based on the evaluation of judicial sentences and is implemented to reduce the duration of all processes. This research also aims to open some new joint debates about the study and application of statistical and computational methods to web data on new forensics topics: text mining techniques enable us to obtain information that may be helpful in establishing a statistical index to describe quality and efficiency in terms of law. It is also possible to develop an intelligent system about facts and judgments.
Keywords: Quality in law; Efficiency in law; Big data; Literal text similarity; Semantic text similarity
Utilitarianism and knowledge growth during status seeking: Evidence from text mining of online reviews (2018)
Websites with user-generated content (UGC) usually adopt incentive hierarchies to encourage users to contribute content continuously and to realize increasingly higher status in the online community through achieving increasingly more difficult goals. Yet the literature remains largely unclear on how these incentive hierarchies affect user behavior during status seeking. Empirical findings drawn from the data of 19,674 TripAdvisor members suggest that 1) at a lower status or earlier stage, members are more eager for quick promotion and utilitarianism results in fewer words per review; and 2) members’ knowledge grows as their status rises. This study concludes by offering theoretical and managerial implications for both research and practice.
Keywords: UGC; Goals; Utilitarianism; Status; Knowledge
Topic analysis of online reviews for two competitive products using latent Dirichlet allocation (2018)
The voice of the customer plays an important role in product competition. Traditional methods in the area have largely focused on market research and questionnaire surveys to obtain customer preferences. However, online product reviews have provided a good and reliable channel not only for understanding customers' needs for one product or service but also for analyzing products’ competition in the market. In this paper, we propose a new framework for applying online product reviews to analyze customer preferences for two competitive products. We extract the key topics of online reviews for two specific competitive products via a text mining approach, latent Dirichlet allocation (LDA). Topic difference analysis demonstrates the unique topics of the two products. The relative importance and topic heterogeneity analyses identify the competitive superiorities and weaknesses of both products. Two case studies demonstrate the efficacy of the proposed framework. The method also provides valuable managerial implications for product designers and e-commerce companies.
Keywords: Competitive analysis; Latent Dirichlet allocation; Online product reviews; Product competition; Text mining; Topic analysis
Hierarchical topic modeling with automatic knowledge mining (2018)
Traditional topic modeling has been widely studied and popularly employed in expert systems and information systems. However, traditional topic models cannot discover structural relations among topics, thus losing the chance to explore the data more deeply. Hierarchical topic modeling has the capability of learning topics, as well as discovering the hierarchical topic structure from text data. But purely unsupervised models tend to generate weak topic hierarchies. To solve this problem, we propose a novel knowledge-based hierarchical topic model (KHTM), which can incorporate prior knowledge into topic hierarchy building. A key novelty of this model is that it can mine prior knowledge automatically from the topic hierarchies of multi-domain corpora. In this paper, the knowledge is represented as word pairs that satisfy the requirement of frequent co-occurrence, and the knowledge is organized in the form of a hierarchical structure. We also propose an iterative learning algorithm. For evaluation, we crawled two new multi-domain datasets and conducted comprehensive experiments. The experimental results show that our algorithm and model can generate more coherent topics and a more reasonable hierarchical structure.
Keywords: Hierarchical topic modeling; Text mining; Knowledge mining; Non-parametric Bayesian learning; Gibbs sampling
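The idea of representing knowledge as frequently co-occurring word pairs can be sketched as follows. This is only an illustration under stated assumptions, not the KHTM procedure: the abstract does not define "frequent co-occurrence", so the sketch assumes a pair is frequent when it appears together in at least `min_support` documents, and the function name and toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(documents, min_support):
    """Count unordered word pairs per document and keep the frequent ones.

    A pair is counted once per document in which both words occur.
    The min_support threshold is an assumed definition of 'frequent'.
    """
    pair_counts = Counter()
    for tokens in documents:
        # Sort the unique tokens so each pair has one canonical ordering.
        unique = sorted(set(tokens))
        pair_counts.update(combinations(unique, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical toy corpus of tokenized documents.
docs = [
    ["battery", "life", "screen"],
    ["battery", "life", "price"],
    ["screen", "price"],
]
pairs = frequent_word_pairs(docs, min_support=2)
```

With this toy corpus only the pair ("battery", "life") reaches the threshold, since it is the only pair shared by two documents.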
Web Media and Stock Markets: A Survey and Future Directions from a Big Data Perspective (2018)
Stock market volatility is influenced by information release, dissemination, and public acceptance. With the increasing volume and speed of social media, the effects of Web information on stock markets are becoming increasingly salient. However, studies of the effects of Web media on stock markets lack both depth and breadth due to the challenges in automatically acquiring and analyzing massive amounts of relevant information. In this study, we systematically reviewed 229 research articles on quantifying the interplay between Web media and stock markets from the fields of Finance, Management Information Systems, and Computer Science. In particular, we first categorized the representative works in terms of media type and then summarized the core techniques for converting textual information into machine-friendly forms. Finally, we compared the analysis models used to capture the hidden relationships between Web media and stock movements. Our goal is to clarify current cutting-edge research and its possible future directions to fully understand the mechanisms of Web information percolation and its impact on stock markets from the perspectives of investors' cognitive behaviors, corporate governance, and stock market regulation.
Index Terms: Computing methodologies, text mining, financial market, stocks, big data, social media, news
Capturing distinctions while mining text data: Toward low-tech formalization for text analysis (2018)
In this article we consider some low-tech approaches to text mining. Our goal is to articulate a RiCH (Reader in Control of Hermeneutics) style of text analysis that takes advantage of the digital affordances of modern reading practices and easily deployable computational tools while also preserving the primacy of the interpretive lens of the human reader. In the article we offer three analytical interventions that are suitable to the low-tech formalizations we propose: the first and most developed intervention tracks the (normally computationally ignored) “stop” words; the second identifies the use of strategic anxiety terms in the texts; and the third (less developed in this article) introduces the grammatical features of modality (including modalization statements of probability and usuality, and modulation statements regarding degrees of obligation and inclination). All three analytical interventions provide a productive tracking of various modes and degrees of strategic decisiveness, contradiction, uncertainty and indeterminacy in a corpus of recent U.S. National Security Strategy reports.
Keywords: Text mining; Hermeneutics; National security; Computational sociology; Big data; Close reading
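The first intervention, tracking the usually discarded "stop" words, lends itself to a very small script. The following is a minimal sketch in that low-tech spirit, not the authors' tooling: the stop-word list, the tokenization rule, and the sample sentence are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the article's own list is not given here.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "we", "will", "must", "may"}


def stop_word_profile(text, stop_words=STOP_WORDS):
    """Return (counts of each stop word, total number of tokens) for a text."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in stop_words)
    return counts, len(tokens)


# Hypothetical sample sentence, not quoted from any report.
counts, total = stop_word_profile("We will lead, and we must act.")
```

The counts stay attached to the token total so that a reader can compare relative frequencies across documents of different lengths and then return to the passages themselves for interpretation.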