No. | Title | Type |
---|---|---|
1 |
Applying big data paradigms to a large scale scientific workflow: Lessons learned and future directions
(2018) The increasing amount of data related to the execution of scientific workflows has raised awareness
of their shift towards parallel data-intensive problems. In this paper, we share our experience combining
the traditional high-performance computing and grid-based approaches with Big Data analytics
paradigms, in the context of scientific ensemble workflows. Our goal was to assess and discuss the
suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of
scalability. We focused on two key elements in the Big Data ecosystem: the data-centric programming
model, and the underlying infrastructure that integrates storage and computation in each node. We
experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKF-HGS,
which we re-implemented using the Spark data analysis framework. We conducted experiments on
a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2).
The results we obtained were analysed to synthesize the lessons we learned from this experience, while
discussing promising directions for further research.
Keywords: Scientific workflows | Big data | Cloud computing | Apache Spark | Hydrology |
English article |
2 |
ClowdFlows: Online workflows for distributed big data mining
(2017) The paper presents a platform for distributed computing, developed using the latest software technologies
and computing paradigms to enable big data mining. The platform, called ClowdFlows, is implemented
as a cloud-based web application with a graphical user interface which supports the construction
and execution of data mining workflows, including web services used as workflow components. As
a web application, the ClowdFlows platform poses no software requirements and can be used from
any modern browser, including mobile devices. The constructed workflows can be declared either as
private or public, which enables sharing the developed solutions, data and results on the web and in
scientific publications. The server-side software of ClowdFlows can be multiplied and distributed to
any number of computing nodes. From a developer’s perspective the platform is easy to extend and
supports distributed development with packages. The paper focuses on big data processing in the batch
and real-time processing mode. Big data analytics is provided through several algorithms, including
novel ensemble techniques, implemented using the map-reduce paradigm and a special stream mining
module for continuous parallel workflow execution. The batch mode and real-time processing mode are
demonstrated with practical use cases. Performance analysis shows the benefit of using all available data
for learning in distributed mode compared to using only subsets of data in non-distributed mode. The
ability of ClowdFlows to handle big data sets and its nearly perfect linear speedup is demonstrated.
Keywords: Data mining platform | Cloud computing | Scientific workflows | Batch processing | Map-reduce | Big data |
English article |
3 |
Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications
(2017) Scientific workflows are designed to solve complex scientific problems and accelerate scientific
progress. Ideally, scientific workflows should improve the reproducibility of scientific applications
by making it easier to share and reuse workflows between scientists. However, scientists
often find it difficult to reuse others’ workflows, which is known as workflow decay. In this
paper, we explore the challenges in reproducing scientific workflows, and propose a framework
for facilitating the reproducibility of scientific workflows at the task level by giving scientists
complete control over the execution environments of the tasks in their workflows and integrating
execution environment specifications into scientific workflow systems. Our framework allows
dependencies to be archived in basic units of OS image, software and data instead of gigantic
all-in-one images. We implement a prototype of our framework by integrating Umbrella, an
execution environment creator, into Makeflow, a scientific workflow system.
To evaluate our framework, we use it to run two bioinformatics scientific workflows, BLAST
and BWA. The execution environment of the tasks in each workflow is specified as an Umbrella
specification file, and sent to execution nodes where Umbrella is used to create the specified
environment for running the tasks. For each workflow we evaluate the size of the Umbrella specification
file, the time and space overheads of creating execution environments using Umbrella,
and the heterogeneity of execution nodes contributing to each workflow. The evaluation results
show that our framework improves the utilization of heterogeneous computing resources.
Keywords: reproducible research | scientific workflows | execution environment specifications |
English article |