Bias reduction in the population size estimation of large data sets
Year: 2020
Estimation of the population size of large data sets and hard-to-reach populations can be a significant problem. For example, in the military, manpower is limited and the manual processing of large data sets can be time-consuming. In addition, accessing the full population of data may be restricted by factors such as cost, time, and safety. Four new population size estimators are proposed, as extensions of existing methods, and their performance is compared in terms of bias with two existing methods in the big data literature. These would be particularly beneficial in the context of time-critical decisions or actions. The comparison is based on a simulation study and an application to five real network data sets (Twitter, LiveJournal, Pokec, Youtube, Wikipedia Talk). Whilst no single estimator (out of the four proposed) generates the most accurate estimates overall, the proposed estimators are shown to produce more accurate population size estimates for small sample sizes, but in some cases show more variability than existing estimators in the literature.
Keywords: Relative bias | Twitter | Size estimator | Youtube | Random walk sampling
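The abstract above compares estimators in terms of bias, and the keywords mention relative bias. The sketch below shows one common way to compute it, assuming the usual definition (mean estimate minus true size, divided by true size); the paper's exact definition is not given here, and the function name and the numbers are purely illustrative.

```python
from statistics import mean


def relative_bias(estimates, true_size):
    """Relative bias of population size estimates: (mean estimate - N) / N.

    Assumed textbook definition, not taken from the paper itself.
    """
    if true_size <= 0:
        raise ValueError("true_size must be positive")
    return (mean(estimates) - true_size) / true_size


# Hypothetical estimates from four repeated samples of a population
# whose true size is known to be 1000 (illustrative values only).
example_estimates = [900, 950, 1000, 1050]
example_rb = relative_bias(example_estimates, 1000)
print(example_rb)
```

A negative value indicates that the estimator underestimates the population size on average; a value near zero indicates low bias.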
TUTORIAL: AI research without coding: The art of fighting without fighting: Data science for qualitative researchers
Year: 2020
In this tutorial, we show how to scrape and collect online data, perform sentiment analysis, social network analysis, tribe finding, and Wikidata cross-checks, all without using a single line of programming code. In a step-by-step example, we use self-collected data to perform several analyses of the glass ceiling. Our tutorial can serve as a standalone introduction to data science for qualitative researchers and business researchers who have avoided learning to program. It should also be useful for experienced data scientists who want to learn about the tools that will allow them to collect and analyze data more easily and effectively.
Keywords: Twitter | Data scraping | Sentiment analysis | Tribe finding | Wikidata
Geo-semantic-parsing: AI-powered geoparsing by traversing semantic knowledge graphs
Year: 2020
Online social networks convey rich information about geospatial facets of reality. However, in most cases, geographic information is not explicit and structured, thus preventing its exploitation in real-time applications. We address this limitation by introducing a novel geoparsing and geotagging technique called Geo-Semantic-Parsing (GSP). GSP identifies location references in free text and extracts the corresponding geographic coordinates. To reach this goal, we employ a semantic annotator to identify relevant portions of the input text and to link them to the corresponding entity in a knowledge graph. Then, we devise and experiment with several efficient strategies for traversing the knowledge graph, thus expanding the available set of information for the geoparsing task. Finally, we exploit all available information for learning a regression model that selects the best entity with which to geotag the input text. We evaluate GSP on a well-known reference dataset including almost 10k event-related tweets, achieving F1=0.66. We extensively compare our results with those of 2 baselines and 3 state-of-the-art geoparsing techniques, achieving the best performance. On the same dataset, competitors obtain F1 ≤ 0.55. We conclude by providing in-depth analyses of our results, showing that the overall superior performance of GSP is mainly due to a large improvement in recall with respect to existing techniques.
Keywords: Geoparsing | Geotagging | Artificial intelligence | Knowledge graphs | Twitter
Can twitter analytics predict election outcome? An insight from 2017 Punjab assembly elections
Year: 2020
Since the beginning of this decade, there has been exponential growth in the number of internet users who use social media, especially Twitter, to share their views on topics of common interest such as sports, products, and politics. Because so many people participate actively on Twitter, a huge amount of data (i.e., big data) is being generated, which can be used, after refining, to analyze real-world problems. This paper considers Twitter data related to the 2017 Punjab (a state of India) assembly elections and applies different social media analytic techniques to the collected tweets to extract hidden but useful information. In addition, we employ a machine learning algorithm to perform polarity analysis and propose a new seat forecasting method to accurately predict the number of seats that a political party is likely to win in the elections. Our results indicated that the Indian National Congress was likely to emerge as the winner, and that was in fact the outcome when the results were declared.
Keywords: Analytics | Election prediction | Social media | Natural language processing | Machine learning | Sentiment analysis | Twitter
Eco-friendliness and fashion perceptual attributes of fashion brands: An analysis of consumers’ perceptions based on twitter data mining
Year: 2020
This study explores whether there is a convergence between the concepts of fashion and eco-friendliness in consumer perception of a fashion brand. We assume that increased eco-friendly perception will influence the brand image positively, with this impact being much higher for luxury than for high and fast fashion brands. The hypotheses are tested using data collected from Twitter. We analyzed the fashion clothing brands with the highest number of followers on the Socialbakers list and applied a novel social network mining methodology that allows measuring the relationship between each brand and two perceptual attributes (fashion and eco-friendliness). The method is based on attribute exemplars, that is, Twitter accounts that represent a perceptual attribute. Our exemplars catalyze social media conversations on fashion (identified in our research by the keywords “fashion,” “glamour,” and “style”) and eco-friendliness (keywords “environment” and “ethical business”). Based on social network analysis theory, we computed a similarity function between the followers of the exemplars and those of the brand. The results suggest that there is a correlation between the fashion and the eco-friendliness perceptual attributes of a brand; however, this correlation is far stronger for luxury brands than for high and fast fashion brands. The difference in the correlations confirms the recent tendency of luxury fashion brands to increasingly treat environmental issues as part of their core business and not just as added value to the brand’s offer.
Keywords: Fashion brands | Twitter | Consumer perception | Environment | Ethical business | Brand image | Big data
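The abstract above mentions a similarity function between the followers of attribute exemplars and those of a brand, without specifying it. As a rough illustration only, the sketch below uses Jaccard similarity on follower sets; this is an assumed stand-in, not the authors' function, and all account names and follower IDs are hypothetical.

```python
def jaccard_similarity(followers_a, followers_b):
    """Share of accounts following both, out of accounts following either."""
    a, b = set(followers_a), set(followers_b)
    union = a | b
    if not union:
        return 0.0  # two empty follower sets: define similarity as 0
    return len(a & b) / len(union)


# Hypothetical follower ID sets (illustrative only).
brand_followers = {"u1", "u2", "u3", "u4"}
fashion_exemplar_followers = {"u2", "u3", "u5"}
eco_exemplar_followers = {"u4", "u6", "u7", "u8"}

# One score per perceptual attribute for the brand.
fashion_score = jaccard_similarity(brand_followers, fashion_exemplar_followers)
eco_score = jaccard_similarity(brand_followers, eco_exemplar_followers)
print(fashion_score, eco_score)
```

Computing both scores for many brands would give paired values whose correlation could then be compared across brand categories, which is the kind of comparison the abstract reports.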
User engagement for mobile payment service providers: introducing the social media engagement model
Year: 2020
Twitter is being used by mobile wallet firms for customer acquisition, relationship management, marketing, and promotional purposes. This study examines service advertisement and promotional tweets by mobile wallet firms on Twitter. For this study, timeline data of the top four mobile wallet firms of India (Paytm, MobiKwik, Freecharge, and Oxigen Wallet) were extracted from their Twitter screen (firm-generated tweets). The user-generated tweets were also extracted, using the firms' names as search terms. This study proposes a Social Media Engagement model for understanding user dynamics. The study provides three interesting inputs for promotional marketing tweets: first, firms should post a mix of tweets with respect to content type (i.e., informational, entertainment, remuneration, and social); second, firms need periodic campaigning; and lastly, firms should focus on increasing their network size. These findings can help firms' managers and marketers plan effective social media marketing campaigns.
Keywords: Social media marketing | Digital payments | Twitter analytics | Mobile wallets | Customer engagement
Discourses of exclusion on Twitter in the Turkish Context: #ülkemdesuriyeliistemiyorum (#idontwantsyriansinmycountry)
Year: 2020
The new communicative affordances of online spaces have transformed the ways and domains we build and negotiate meaning. At the same time, they have introduced diverse channels to produce and disseminate animosity. This article explores online discourses as new communicative environments characterized by their unique textual and semiotic features to unfold the discursive constructions of hate and hostility towards Syrian refugees in Turkey. Building on the principles of the Social Media Critical Discourse Studies (SM-CDS) framework proposed by KhosraviNik (2018) and the Discourse-Historical Approach (DHA) by Reisigl and Wodak (2001), the study analyzes a subset of tweets that includes the hashtag #ülkemdesuriyeliistemiyorum (#idontwantsyriansinmycountry) to understand its functions in constructing and proliferating an exclusionary discourse against refugees. The study focuses on referential, argumentation and intensification strategies used in tweets as well as their wider socio-political implications. The results reveal that refugees in Turkey are delineated as threats, invaders, criminals and potential dangers by the users of online media. It is further observed that a sharper rhetoric and a more intense negative-other representation emerge on Twitter as an online public space compared to print media discourses. While scrutinizing the (re)construction and representation of refugees, our analysis has also uncovered that hate and hostility discourses towards refugees constantly operate to build a collective nationalist identity. This interlocking relationship between constructing refugees through stereotypical attributes as a homogeneously dangerous group and forming a collective Turkish identity is manifested at each level of our analysis.
Keywords: Online hostility | Hashtag | Refugees | Critical discourse analysis | Turkey | Twitter
Uncovering cyberincivility among nurses and nursing students on Twitter: A data mining study
Year: 2019
Background: Although misuse of social networking sites, particularly Twitter, has occurred, little is known about the prevalence, content, and characteristics of uncivil tweets posted by nurses and nursing students. Objective: The aim of this study was to describe the characteristics of tweets posted by nurses and nursing students on Twitter with a focus on cyberincivility. Method: A cross-sectional, data-mining study was held from February through April 2017. Using a data-mining tool, we extracted quantitative and qualitative data from a sample of 163 self-identified nurses and nursing students on Twitter. The analysis of 8934 tweets was performed by a combination of SAS 9.4 for descriptive and inferential statistics including logistic regression and NVivo 11 to derive descriptive patterns of unstructured textual data. Findings: We categorized 413 tweets (4.62%, n=8934) as uncivil. Of these, 240 (58%) were related to nursing and the other 173 (42%) to personal life. Of the 163 unique users, 60 (36.8%) generated those 413 uncivil posts, tweeting inappropriately at least once over a period of six weeks. Most uncivil tweets contained profanity (n=135, 32.7%), sexually explicit or suggestive material (n=37, 9.0%), name-calling (n=14, 3.4%), and discriminatory remarks against minorities (n=9, 2.2%). Other uncivil content included product promotion, demeaning comments toward patients, aggression toward health professionals, and HIPAA violations. Conclusion: Nurses and nursing students share uncivil tweets that could tarnish the image of the profession and violate codes of ethics. Individual, interpersonal, and institutional efforts should be made to foster a culture of cybercivility.
Keywords: Civility | Cyberincivility | Education | Incivility | Nurses | Nursing | Nursing students | Social media | Social networking sites | Twitter
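The percentages reported in the Findings above can be re-derived from the stated counts. The short check below does only that arithmetic; the helper name `pct` is illustrative, and every count is taken directly from the abstract.

```python
def pct(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


total_tweets = 8934    # tweets analyzed
uncivil_tweets = 413   # tweets categorized as uncivil
total_users = 163      # unique users in the sample

# The nursing-related and personal-life counts should sum to the uncivil total.
assert 240 + 173 == uncivil_tweets

print(pct(uncivil_tweets, total_tweets, 2))  # share of all tweets that were uncivil
print(pct(240, uncivil_tweets))              # uncivil tweets related to nursing
print(pct(173, uncivil_tweets))              # uncivil tweets related to personal life
print(pct(60, total_users))                  # users who posted at least one uncivil tweet
print(pct(135, uncivil_tweets))              # uncivil tweets containing profanity
```

Note that the category percentages (profanity, sexually explicit material, and so on) use the 413 uncivil tweets as the denominator, not the full 8934.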
Identifying and monitoring the development trends of emerging technologies using patent analysis and Twitter data mining: The case of perovskite solar cell technology
Year: 2019
Monitoring the emergence of emerging technologies helps managers and decision makers identify development trends in these technologies, which is crucial for government research and development (R&D), strategic planning, social investment, and enterprise practices. Researchers usually use academic papers and patent data to identify and monitor the trends of emerging technologies from a technological perspective, but they rarely make use of social media data (e.g., Twitter data) related to emerging technologies. Analysis of this social media data is of great significance for understanding the emergence of emerging technologies and gaining insight into development trends. Therefore, this paper proposes a framework that uses patent analysis and Twitter data mining to monitor the emergence of emerging technologies and identify changing trends in these technologies. Perovskite solar cell technology is selected as a case study. In this case, we used patent analysis to monitor the evolutionary path of perovskite solar cell technology. We applied Twitter data mining to analyze Twitter users' sense of, response to, and expectations for this technology. We also identified the professional types of Twitter users and examined changes in their topics of interest over time to track the emergence of perovskite solar cell technology. We compared the results of patent analysis and Twitter data mining to identify development trends of perovskite solar cell technology. This paper contributes to our understanding of how technologies emerge and develop, as well as to technology forecasting and foresight methodology, and will be of interest to solar photovoltaic technology R&D experts.
Keywords: Emerging technologies | Technology trends | Patent analysis | Twitter data mining | Technologies emerge | Perovskite solar cell technology
Mining Twitter data for causal links between tweets and real-world outcomes
Year: 2019
The authors present an expert and intelligent system that (1) identifies influential term groups having causal relationships with real-world enterprise outcomes from Twitter data and (2) quantifies the appropriate time lags between identified influential term groups and enterprise outcomes. Existing expert and intelligent systems, which are defined as computer systems that imitate the ability of human decision making, could enable computers to identify the spread of Twitter users’ enterprise-related feedback automatically. However, existing expert and intelligent systems have limitations on automatically identifying the causal effects on enterprise outcomes. Identifying the causal effects on enterprise outcomes is important, because Twitter users’ feedback toward enterprise decisions may have real-world implications. The proposed expert and intelligent system can support decision makers’ decisions considering the real-world effects of identified Twitter users’ feedback on enterprise outcomes. In particular, (1) a co-occurrence network analysis model is exploited to discover term candidates for generating influential term groups that are combinations of enterprise-related terms, which potentially influence enterprise outcomes. (2) Time series models and (3) a Granger causality analysis model are then employed to identify influential term groups having causal relationships with enterprise outcomes with the appropriate time lags. Case studies involving a real-world internet video streaming and disc rental provider as well as an airline company are used to test the validity of the proposed expert and intelligent system for both predicting enterprise outcomes in a long period and predicting the effects of specific events on enterprise outcomes in a short period.
Keywords: Expert and intelligent system | Social media | Enterprise outcome | Co-occurrence network | Time series analysis | Granger causality analysis
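The first stage described in the abstract above relies on term co-occurrence in tweets. The sketch below illustrates only that basic idea, counting how often pairs of terms appear in the same tweet; it is a simplified assumption, not the authors' co-occurrence network model, and it does not cover the time series or Granger causality stages. The function name and the sample tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets):
    """Count, for each unordered term pair, the number of tweets containing both."""
    counts = Counter()
    for text in tweets:
        # Naive whitespace tokenization; each term counted once per tweet.
        terms = sorted(set(text.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical toy tweets (illustrative only, not real data).
sample_tweets = [
    "price increase announced",
    "price increase is unfair",
    "new shows announced",
]
pair_counts = cooccurrence_counts(sample_tweets)

# Pairs are stored in alphabetical order, e.g. ("increase", "price").
print(pair_counts.most_common(3))
```

Frequently co-occurring pairs would be the edges of a co-occurrence network; in a pipeline like the one described, groups of such terms could then be tracked over time as candidate series for causality testing.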