No. | Title | Type |
---|---|---|
1 |
Development of a national-scale real-time Twitter data mining pipeline for social geodata on the potential impacts of flooding on communities
2019. Social media, particularly Twitter, is increasingly used to improve resilience during extreme weather events/
emergency management situations, including floods: by communicating potential risks and their impacts, and
informing agencies and responders. In this paper, we developed a prototype national-scale Twitter data mining
pipeline for improved stakeholder situational awareness during flooding events across Great Britain, by retrieving
relevant social geodata, grounded in environmental data sources (flood warnings and river levels). With
potential users we identified and addressed three research questions to develop this application, whose components
constitute a modular architecture for real-time dashboards. First, polling national flood warning and
river level Web data sources to obtain at-risk locations. Secondly, real-time retrieval of geotagged tweets,
proximate to at-risk areas. Thirdly, filtering flood-relevant tweets with natural language processing and machine
learning libraries, using word embeddings of tweets. We demonstrated the national-scale social geodata pipeline
using over 420,000 georeferenced tweets obtained between 20 and 29 June 2016.
Keywords: Flood management | Twitter | Volunteered geographic information | Natural language processing | Word embeddings | Social geodata |
English article |
2 |
A Privacy Weaving Pipeline for Open Big Data
2016. The power of big data gives us an unprecedented
chance to understand, analyze, and recreate the world, while open
data ensures that power be shared and widely exploited. Open and
big data have become emerging topics for researchers and
governments. Thus, the related privacy issues have also become an
urgent emerging problem. In this work, we propose a conceptual
framework for a privacy weaving pipeline dedicated to producing
open and big data while preserving privacy. Within the processing
pipeline, each step of the process flow considers privacy
assurance when manipulating datasets. However, the complexity of
the process flow is the same as that of a normal data pipeline. The experimental
prototype confirms the feasibility of the framework design. We hope
this work will facilitate the development of the open and big data
industry.
Keywords: open data | big data | data pipeline | privacy breach |
English article |
3 |
Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks
2016. Big Data analytics has recently gained increasing
popularity as a tool to process large amounts of data on-demand.
Spark and Flink are two Apache-hosted data analytics frameworks that facilitate the development of multi-step data pipelines
using directed acyclic graph patterns. Making the most out
of these frameworks is challenging because efficient executions
strongly rely on complex parameter configurations and on an
in-depth understanding of the underlying architectural choices.
Although extensive research has been devoted to improving and
evaluating the performance of such analytics frameworks, most of
them benchmark the platforms against Hadoop, as a baseline, a
rather unfair comparison considering the fundamentally different
design principles. This paper aims to bring some justice in this
respect, by directly evaluating the performance of Spark and
Flink. Our goal is to identify and explain the impact of the
different architectural choices and the parameter configurations
on the perceived end-to-end performance. To this end, we develop
a methodology for correlating the parameter settings and the
operators execution plan with the resource usage. We use this
methodology to dissect the performance of Spark and Flink
with several representative batch and iterative workloads on up
to 100 nodes. Our key finding is that neither of the two
frameworks outperforms the other for all data types, sizes and
job patterns. This paper performs a fine characterization of the
cases when each framework is superior, and we highlight how
this performance correlates to operators, to resource usage and
to the specifics of the internal framework design.
Index Terms: Big Data | performance evaluation | Spark | Flink |
English article |