Data Mining Strategies for Real-Time Control in New York City
Data Mining Strategies for Real-Time Control in New York City (2014)
The Data Mining System (DMS) at the New York City Department of Transportation (NYCDOT) mainly consists of four database systems for traffic and pedestrian/bicycle volumes, crash data, and signal timing plans, as well as the Midtown in Motion (MIM) systems, which are used as part of the NYCDOT Intelligent Transportation System (ITS) infrastructure. These database and control systems are operated by different units at NYCDOT as independent database or operation systems. New York City experiences heavy volumes of traffic, pedestrians and cyclists in each Central Business District (CBD) area and along key arterial systems. There are consistent and urgent needs in New York City for real-time control to improve mobility and safety for all users of the street networks, and to provide timely response to and management of random incidents. Therefore, it is necessary to develop an integrated DMS for effective real-time control and active transportation management (ATM) in New York City. This paper presents new strategies for New York City, suggesting the development of an efficient and cost-effective DMS involving: 1) use of new technology applications such as tablets and smartphones with Global Positioning System (GPS) and wireless communication features for data collection and reduction; 2) interface development among existing database and control systems; and 3) integrated DMS deployment with macroscopic and mesoscopic simulation models in Manhattan. This paper also suggests a complete data mining process for real-time control using traditional static data, current real-time data from loop detectors, microwave sensors and video cameras, and new real-time data from GPS sources. GPS data, including taxi and bus GPS information and smartphone applications, can be obtained in all weather conditions and at any time of day. The use of GPS data and smartphone applications in the NYCDOT DMS is discussed herein as a new concept. © 2014 The Authors. Published by Elsevier B.V.
Selection and peer-review under responsibility of Elhadi M. Shakshu
Keywords: Data Mining System (DMS); New York City; real-time control; active transportation management (ATM); GPS data
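As a rough illustration of the kind of real-time processing that the taxi and bus GPS sources above enable, the following sketch estimates average travel speed from a stream of GPS pings. The `Ping` record layout and the ping interval are assumptions for illustration, not NYCDOT's actual feed format:

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Ping:
    vehicle_id: str
    t: float      # unix seconds
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def mean_speed_kmh(pings):
    """Average speed over consecutive pings of each vehicle, pooled across vehicles."""
    by_vehicle = {}
    for p in sorted(pings, key=lambda p: (p.vehicle_id, p.t)):
        by_vehicle.setdefault(p.vehicle_id, []).append(p)
    speeds = []
    for track in by_vehicle.values():
        for a, b in zip(track, track[1:]):
            dt_h = (b.t - a.t) / 3600.0
            if dt_h > 0:
                speeds.append(haversine_km(a.lat, a.lon, b.lat, b.lon) / dt_h)
    return sum(speeds) / len(speeds) if speeds else 0.0
```

In a production DMS these per-vehicle speeds would be map-matched to street segments before aggregation; the sketch skips map-matching for brevity.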
Using Social Media to Identify Tourist Attractiveness in Six Italian Cities
Publication year: 2019 - English PDF: 7 pages - Persian translation (doc): 18 pages
The evolution of technology and the expansion of social networks have allowed people to produce large amounts of data every day. Social networks provide users with access to information. The aim of this paper is to determine the attractiveness of different tourist cities by examining user behaviour on social networks. The database consists of geolocated photographs taken in six cities that act as cultural and artistic centres in Italy. The photographs were downloaded from Flickr, a data-sharing platform. The data were analysed using mathematical and machine-learning models. The results of our study include maps identifying user behaviour and the annual trend of photographic activity in the cities, and they highlight the usefulness of the proposed method, which is able to provide spatial and usage information. The study emphasises how the analysis of social data can create a predictive model for formulating tourism plans. Finally, general tourism-marketing strategies are discussed.
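A minimal sketch of the kind of spatial analysis the abstract describes: counting geotagged photos per grid cell to surface candidate attractions. Grid-cell binning is an assumed simplification of the paper's actual learning models, and the coordinates below are merely illustrative:

```python
from collections import Counter

def density_grid(photos, cell_deg=0.01):
    """Count geotagged photos per (lat, lon) grid cell of size cell_deg degrees."""
    counts = Counter()
    for lat, lon in photos:
        counts[(int(lat // cell_deg), int(lon // cell_deg))] += 1
    return counts

def top_attractions(photos, k=3, cell_deg=0.01):
    """Return the k densest cells as (cell, photo_count) -- a crude attractiveness map."""
    return density_grid(photos, cell_deg).most_common(k)
```

At city scale, cells with the highest photo counts tend to coincide with the main monuments, which is the intuition behind using photo-sharing data as an attractiveness proxy.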
Interactive visualization and analysis of antihypertensive prescriptions using National Health Insurance claims data
Interactive Visualization and Analysis of Antihypertensive Prescriptions Using National Health Insurance Claims Data (2018)
Interactive visualization is an important approach to help understand and explain large amounts of data, particularly in light of decision support. Although data visualization has been introduced in healthcare and clinical fields, analyses have often been performed by data experts, focused on specific subjects, or based on insufficient statistical evidence. Therefore, this study suggests a procedure for effective and efficient visualization of big data for general healthcare researchers. Specifically, the procedure consists of conventional regression analyses followed by interactive data visualization of prescription patterns of antihypertensive drugs. Methods: As a large-scale, nationally representative source of prescription data, the Korean National Health Insurance claims data were collected. Conventional descriptive and regression analyses were conducted for therapy decisions and prescription patterns using the software R. Then, based on the statistically significant findings, dashboards were developed to visualize the patterns of prescriptions interactively using the software Tableau. Results: Major characteristics (gender, age group, healthcare institution, and comorbidities) explained the differences in therapy and in the average number of drugs prescribed, as well as differences among the most commonly prescribed drug classes. Two interactive dashboards were created to visualize prescription patterns, incorporating horizontal bar charts, packed bubble charts, treemaps, filled maps, radar charts, box-and-whisker plots, and filters. Conclusion: In the current big data era, interactive data visualization offers substantial opportunities to gain a comprehensive view and to extract insights and evidence from vast amounts of data. This study’s interactive visualizations can give healthcare professionals insight into prescription patterns and demonstrate the value of creating interactive dashboards to support informed and timely decision-making.
Exploring big data using interactive visualization is expected to deliver many future benefits in healthcare fields.
Keywords: Prescriptions; National Health Insurance Claims database; Hypertension; Interactive Visualization
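The descriptive part of such an analysis can be sketched in a few lines. The record layout below is hypothetical (the actual claims schema is far richer), and the grouping mirrors the characteristics named in the abstract:

```python
from collections import defaultdict

# Hypothetical claim records: (age_group, sex, number_of_drugs_prescribed)
claims = [
    ("40-59", "F", 1), ("40-59", "F", 2), ("40-59", "M", 1),
    ("60-79", "F", 2), ("60-79", "M", 3), ("60-79", "M", 2),
]

def avg_drugs_by_group(records, key_index=0):
    """Average number of antihypertensive drugs per claim, grouped by one characteristic."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        sums[rec[key_index]] += rec[-1]
        counts[rec[key_index]] += 1
    return {g: sums[g] / counts[g] for g in sums}
```

Statistically significant group differences found this way are what the study's Tableau dashboards then present interactively.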
“What is the problem represented to be?” Two decades of research on Roma and education in Europe
“What Is the Problem Represented to Be?” Two Decades of Research on Roma and Education in Europe (2018)
This review article offers an analysis of research on Roma and education. A total of 151 peer-reviewed research articles were sampled through systematic searches in four databases, covering the period 1997–2016. Inspired by critical approaches in policy analysis, we draw on the concept of problem representations to identify dominant discourses in the research material. The analysis identifies nine problem representations: absence from school, academic achievement, socioeconomic issues, cultural differences, invisibility, teachers’ competencies, hostility, segregation, and misguided policy and action. The content of these problem representations suggests that Roma are often framed as either victims or problems in educational research, and that cultural differences are much more dominant as a problem representation in the field than structural aspects such as socioeconomic issues. This critical review can contribute to raising awareness regarding how we frame research questions in the field of Roma and education.
Keywords: Roma | Gypsy | Traveller | Intercultural education | Problem representations
Big data requirements in current and next fusion research experiments
Big Data Requirements in Current and Next Fusion Research Experiments (2018)
The present and future data management requirements for fusion experiments are presented along with the currently adopted solutions. Even though the presented solutions fulfil the requirements of current experiments, the next generation of fusion devices is likely to produce and require an unprecedented amount of data. For this reason, the solutions adopted nowadays, and also foreseen for the experiments under construction, might not prove scalable enough. Information technology already provides efficient solutions for big data management, successfully employed for large cloud applications and social media. In particular, MongoDB, Cassandra and Hadoop represent promising candidates for the next generation of experiments, because their combined usage covers the specific data requirements of fusion research.
Keywords: Big Data; Nuclear Fusion Experiment; Data Acquisition; Databases
A unique feature extraction using MRDWT for automatic classification of abnormal heartbeat from ECG big data with Multilayered Probabilistic Neural Network classifier
Unique Feature Extraction Using MRDWT for Automatic Classification of Abnormal Heartbeats from ECG Big Data with a Multilayered Probabilistic Neural Network Classifier (2018)
This paper employs a novel adaptive feature extraction technique for the electrocardiogram (ECG) signal for the detection of cardiac arrhythmias, using the multiresolution discrete wavelet transform on ECG big data. Five types of ECG arrhythmias, including normal beats, have been classified. The MIT-BIH database of 48 patient records is utilized for the detection and analysis of cardiac arrhythmias. The proposed feature extraction uses Daubechies as the wavelet function and extracts 21 feature points that include the QRS complex of the ECG signal. The Multilayered Probabilistic Neural Network (MPNN) classifier is proposed as the best-suited classifier for the proposed feature. In total, 1700 ECG beats were tested using the MPNN classifier and compared with three other classifiers: Back Propagation Neural Network (BPNN), Multilayered Perceptron (MLP) and Support Vector Machine (SVM). The system efficiency and performance have been evaluated using the following criteria: precision (PR), F-score, positive predictivity (PP), sensitivity (SE), classification error rate (CER) and specificity (SP). The overall system accuracy obtained using the MPNN technique with the proposed feature is 99.53%, whereas BPNN, MLP and SVM provide 97.94%, 98.53% and 99%, respectively. The processing time using the MPNN classifier is only 3 s, which shows that the proposed technique is not only accurate and efficient but also very fast.
Keywords: Signal processing; Artificial intelligence; Pattern recognition; Soft computing; Wavelet transform
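The multiresolution decomposition step can be sketched as follows. For brevity this uses the Haar wavelet rather than the paper's Daubechies filters, and the rule for selecting the 21 feature points is an assumption; the sketch only illustrates how a fixed-length feature vector can be drawn from the wavelet coefficients of one beat:

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform:
    approximation (low-pass) and detail (high-pass) coefficients."""
    s = np.asarray(signal, dtype=float)
    if len(s) % 2:                 # pad to even length
        s = np.append(s, s[-1])
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

def multiresolution_features(beat, levels=3, n_points=21):
    """Decompose one ECG beat over several levels and keep the n_points
    largest-magnitude approximation samples as a fixed-length feature vector."""
    approx = np.asarray(beat, dtype=float)
    for _ in range(levels):
        approx, _ = haar_dwt(approx)
    idx = np.argsort(np.abs(approx))[::-1][:n_points]
    feat = approx[np.sort(idx)]
    return np.pad(feat, (0, max(0, n_points - len(feat))))
```

The resulting 21-point vectors would then be fed to the MPNN (or any other) classifier.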
Small values in big data: The continuing need for appropriate metadata
Small Values in Big Data: The Continuing Need for Appropriate Metadata (2018)
Compiling data from disparate sources to address pressing ecological issues is increasingly common. Many ecological datasets contain left-censored data – observations below an analytical detection limit. Studies from single and typically small datasets show that common approaches for handling censored data — e.g., deletion or substituting fixed values — result in systematic biases. However, no studies have explored the degree to which the documentation and presence of censored data influence outcomes from large, multi-sourced datasets. We describe left-censored data in a lake water quality database assembled from 74 sources and illustrate the challenges of dealing with small values in big data, including detection limits that are absent, range widely, and show trends over time. We show that substitutions of censored data can also bias analyses using ‘big data’ datasets, that censored data can be effectively handled with modern quantitative approaches, but that such approaches rely on accurate metadata that describe treatment of censored data from each source.
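The substitution and deletion biases described above are easy to demonstrate in simulation. The lognormal model and the detection limit below are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated "true" concentrations; lognormal is a common model for water chemistry
true_vals = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

dl = 2.0                      # hypothetical analytical detection limit
censored = true_vals < dl     # these would be reported only as "< DL"

# Common substitution: replace censored observations with DL / 2
substituted = np.where(censored, dl / 2, true_vals)

# Deletion: drop censored observations entirely
deleted = true_vals[~censored]

bias_substitution = substituted.mean() - true_vals.mean()
bias_deletion = deleted.mean() - true_vals.mean()
```

In this simulation most observations fall below the detection limit; DL/2 substitution shifts the estimated mean upward and deletion biases it far more, which is why metadata recording each source's detection limits and its treatment of censored values matters.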
Disadvantage in English seaside resorts: A typology of deprived neighbourhoods
Disadvantage in English Seaside Resorts: A Typology of Deprived Neighbourhoods (2018)
Socio-economic disadvantage experienced by residents of English seaside resorts has been growing over the last decade, and academic and practice-based research is providing better insights into the causes, internal dynamics and appropriate policy responses to these issues in coastal communities. This paper examines the nature and extent of disadvantage in English seaside resorts through analysis of a specially devised spatial and temporal database, which draws together various publicly available sources beyond the population census and Index of Multiple Deprivation. Using univariate, bivariate and multivariate analyses of this database, a new typology of highly deprived resort neighbourhoods has been devised, with clear implications for the formulation of more targeted policy responses. The results also indicate the persistence, complexity and distinct spatial clustering of deprivation, which establishes a case for a much stronger geographical emphasis in future research and policy agendas, including third sector partnerships.
Keywords: Disadvantage | Deprivation | Seaside resorts | Neighbourhoods | Typology | UK
An Ensemble Signature-based Approach for Performance Diagnosis in Big Data Platform
An Ensemble Signature-based Approach for Performance Diagnosis in Big Data Platforms (2018)
Big data platforms often suffer from performance problems due to internal impairments (e.g. software bugs) and external impairments (e.g. resource hogs), and the situation is exacerbated by the velocity, variety and volume (the 3Vs) of big data. To recover the system from a performance anomaly, the first step is to find the root causes. In this paper, we propose a novel signature-based performance diagnosis approach to rapidly pinpoint the root causes of performance problems in big data platforms. Performance diagnosis is formalized as a pattern recognition problem. We leverage the Maximal Information Coefficient (MIC) to express the invariant relationships among performance metrics in the normal state. Each performance problem that occurs in the big data platform is signified by a unique binary vector, named a signature, which consists of a set of violations of MIC invariants. The signatures of multiple performance problems form a signature database. If the Key Performance Indicator (KPI) of the big data application exhibits model drift, our approach can identify the real culprits by retrieving the root causes that have signatures similar to the current performance problem. Moreover, considering the diversity of big data applications, we establish an ensemble approach that treats each application separately. Experimental evaluations in a controlled big data platform show that our approach can pinpoint the real culprits of performance problems with an average of 84% precision and 87% recall when a single fault occurs, which is better than several state-of-the-art approaches.
Keywords: performance analysis; data analysis; distributed computing; software performance
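A toy version of the signature pipeline can be sketched as below. Pearson correlation stands in for the paper's MIC, and the metric traces and fault database are synthetic; the point is only the shape of the method: invariant mining on normal data, binary violation signatures, and nearest-signature retrieval:

```python
import numpy as np

def invariants(normal, threshold=0.9):
    """Metric pairs whose |Pearson correlation| exceeds threshold in the normal state.
    (Pearson is a simplified stand-in for the MIC used in the paper.)"""
    corr = np.corrcoef(normal.T)
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold]

def signature(window, invs, threshold=0.9):
    """Binary vector: 1 where a normal-state invariant is violated in this window."""
    corr = np.corrcoef(window.T)
    return np.array([1 if abs(corr[i, j]) < threshold else 0 for i, j in invs])

def diagnose(sig, db):
    """Return the known fault whose stored signature is closest in Hamming distance."""
    return min(db, key=lambda name: np.sum(sig != db[name]))
```

In the real system the database holds one signature per previously diagnosed fault, and retrieval is triggered only when the application KPI drifts.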
An improved distributed storage and query for remote sensing data
An Improved Distributed Storage and Query Method for Remote Sensing Data (2018)
With the rapid development of information technology, the amount of remote sensing data is increasing at an unprecedented scale. Faced with massive remote sensing data, traditional processing methods suffer from low efficiency and a lack of scalability, so this paper uses open-source big data technology to improve them. First, a storage model for remote sensing image data is designed using the distributed storage database HBase. Then, a grid index is combined with the Hilbert curve to build an index for the image data. Finally, MapReduce parallel processing is used to write and query remote sensing images. The experimental results show that the method can effectively improve data writing and query speed, and has good scalability.
Keywords: remote sensing data; distributed storage; data query; HBase; MapReduce
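The Hilbert-curve indexing step can be sketched as follows. The row-key format here is hypothetical (the paper's actual key design is not reproduced); the sketch shows how a Hilbert index keeps spatially adjacent tiles close together in HBase's lexicographic row-key order:

```python
def hilbert_index(order, x, y):
    """Map grid cell (x, y) on a 2^order x 2^order grid to its position along
    the Hilbert curve, so nearby tiles receive nearby index values."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:            # rotate/reflect the quadrant so recursion lines up
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def tile_row_key(order, x, y):
    """Hypothetical HBase row key: zero-padded Hilbert index plus tile coordinates."""
    width = len(str((1 << order) ** 2 - 1))
    return f"{hilbert_index(order, x, y):0{width}d}_{x}_{y}"
```

Because HBase sorts rows lexicographically, zero-padding the Hilbert index ensures that a range scan over row keys roughly corresponds to a spatially compact window of tiles.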