Data mining for energy systems: Review and prospect

This paper reviews machine learning techniques for mining big data in power systems, including deep learning, transfer learning, randomized learning, granular computing, and multi‐source data fusion. Typical applications, such as load forecasting and modelling, integrated power and transportation systems, and electricity market forecasting, are also discussed.


Abstract

An in‐depth study of big data mining is urgently needed for next‐generation energy systems, which are characterized by a deep integration of cyber, physical, and social components. This paper presents an initial discussion of big data mining and its applications in intelligent energy systems. Recent progress in big data mining, such as deep learning, transfer learning, randomized learning, granular computing, and multisource data fusion, is introduced first. Applications of data mining in energy systems, such as load forecasting and modeling, integrated power and transportation systems, and electricity market forecasting and simulation, are then discussed. Finally, research problems in energy system data mining that require further attention, such as cyber–physical–social system modeling and super‐resolution perception for smart meter data, are also discussed.

This article is categorized under: Application Areas > Business and Industry

Privacy preserving classification over differentially private data


Abstract

Privacy preserving data classification is an important research area in the data mining field. The goal of a privacy preserving classification algorithm is to protect sensitive information as much as possible while providing satisfactory classification accuracy. Differential privacy is a strong privacy guarantee that protects sensitive data stored in a database by bounding the leakage of sensitive information with respect to an ɛ parameter. In this study, our aim is to investigate the performance of state‐of‐the‐art classification algorithms, namely C4.5, Naïve Bayes, One Rule, Bayesian Networks, PART, Ripper, K*, IBk, and Random tree, for privacy preserving classification. To preserve the privacy of the data to be classified, we applied the input perturbation technique from differential privacy and observed the relationship between the ɛ parameter values and the accuracy of the classifiers. To the best of our knowledge, this article is the first study that analyzes the performance of well‐known classification algorithms over differentially private data and identifies which datasets are more suitable for privacy preserving classification when input perturbation is applied to provide data privacy. The classification algorithms are compared using differentially private versions of well‐known datasets from the UCI repository. According to the experimental results, as the ɛ parameter value increases, classification accuracy improves while the privacy level decreases. Among the classifiers compared, the Naïve Bayes classifier is the most successful method. The ɛ parameter should be greater than or equal to 2 (i.e., ɛ ≥ 2) to achieve satisfactory classification accuracies.
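The input perturbation idea described above can be illustrated with a minimal sketch. The abstract does not state which noise mechanism was used, so the Laplace mechanism here is an assumption, as are the function names `laplace_noise` and `perturb_column` and the choice of the attribute range (`upper - lower`) as the sensitivity. It is meant only to show how ɛ controls the amount of noise added to a numeric attribute before classification, not to reproduce the authors' procedure.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_column(values, lower, upper, epsilon, rng=None):
    """Return a noisy copy of one numeric attribute (hypothetical helper).

    Assumption: each value is clipped to [lower, upper] and the sensitivity
    is taken to be the attribute range, so the noise scale is
    (upper - lower) / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    rng = rng or random.Random()
    scale = (upper - lower) / epsilon
    return [
        min(max(v, lower), upper) + laplace_noise(scale, rng)
        for v in values
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23.0, 35.0, 47.0, 61.0]
    for eps in (0.5, 2.0, 8.0):
        print(eps, perturb_column(ages, 0.0, 100.0, eps, rng))
```

In this sketch a larger ɛ yields a smaller noise scale, so the perturbed values stay closer to the originals. That is consistent with the reported trend that accuracy improves as ɛ increases, at the cost of a weaker privacy level.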

This article is categorized under: Commercial, Legal, and Ethical Issues > Security and Privacy; Technologies > Classification

Predicting the ratings of Amazon products using Big Data

Big Data frameworks are inexpensive platforms that can store a large variety of datasets and process them on parallel and distributed systems. The paper applies several machine learning models to a massive e‐commerce dataset from Amazon to analyze and predict “ratings” and to recommend products.


Abstract

This paper applies several machine learning (ML) models to a massive e‐commerce dataset from Amazon to analyze and predict ratings and to recommend products. For this purpose, we use both traditional and Big Data algorithms. Because the Amazon product review dataset is large, we present a Big Data architecture suitable for storing and processing massive datasets, which is not possible with a traditional architecture. The dataset contains 15 attributes and about 7 million records. Using this dataset, we develop several models in Oracle Big Data and Azure Cloud Computing services to predict review ratings and to recommend items on Amazon. We compare the accuracy and efficiency of Spark ML (the Big Data architecture) and Azure ML (the traditional architecture).

This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Technologies > Prediction