First Workshop on Data Science for Internet of Things

The First Workshop on Data Science for Internet of Things [] took place on October 10th, 2016, in conjunction with the IEEE International Conference on Mobile Ad Hoc and Sensor Systems (IEEE MASS 2016) []. The workshop was organized by Gabriel Martins Dias (DTIC, Pompeu Fabra University), Pedro Luiz Pizzigatti Corrêa (University of São Paulo), and Boris Bellalta (DTIC, Pompeu Fabra University). Supporting the general chairs, the program committee was composed of 10 researchers from different research centers in Italy, Great Britain, Brazil, China, Spain, and France.
The workshop was initially motivated by the growth in the number of Internet of Things (IoT) devices. According to Gartner, there will be nearly 6.4 billion connected IoT devices by the end of 2016, and 21 billion in 2020 []. The increasing number of connected devices gives rise to new use cases that integrate heterogeneous hardware and apply different criteria to handle and tolerate failures. Thanks to such heterogeneity, IoT applications range from monitoring and automating personal environments (such as homes, cars, and hospitals) to industrial buildings and environmental monitoring.
Despite these differences, IoT has a typical data collection scenario, in which data collected by "things" is transmitted to a local storage device with Internet access. Intuitively, this flow matches the key steps of Data Science:
  • Data collection;
  • Data management; and
  • Data analysis.
Therefore, respecting the computational constraints of the IoT devices (a.k.a. "things"), it may be possible to explore some of the most powerful Data Science techniques in IoT environments.
Data collection consists of extracting the data and transmitting it to a central point. During this step, sensors and devices may fail, report wrong values, or overuse the communication resources available for the task. Data science techniques are therefore often used to detect inconsistencies in the set of collected values, such as missing values, wrong numbers, and other information that is simply incoherent with the current environmental conditions.
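As an illustration of this kind of consistency check, the following minimal sketch labels each reading of a temperature sensor as valid, missing, out of range, or an implausible jump. The bounds, the jump limit, and the function name `flag_readings` are assumptions made for this example; they are not taken from any of the workshop papers.

```python
from typing import List, Optional

# Hypothetical plausibility bounds for an outdoor temperature sensor (degrees Celsius).
MIN_TEMP_C = -40.0
MAX_TEMP_C = 60.0
# Hypothetical limit on the change between a reading and the last valid one.
MAX_JUMP_C = 10.0


def flag_readings(readings: List[Optional[float]]) -> List[str]:
    """Label each reading as 'ok', 'missing', 'out_of_range', or 'jump'."""
    labels = []
    last_valid = None
    for value in readings:
        if value is None:
            # The sensor or the link failed to deliver a value.
            labels.append("missing")
        elif not (MIN_TEMP_C <= value <= MAX_TEMP_C):
            # The value is physically implausible for this sensor.
            labels.append("out_of_range")
        elif last_valid is not None and abs(value - last_valid) > MAX_JUMP_C:
            # The value is in range but incoherent with the recent conditions.
            labels.append("jump")
        else:
            labels.append("ok")
            last_valid = value
    return labels


readings = [21.5, 21.7, None, 95.0, 21.9, 35.2, 22.1]
labels = flag_readings(readings)
print(list(zip(readings, labels)))
```

Only readings labeled 'ok' update the reference value, so a single faulty reading does not cause the following valid ones to be flagged as jumps.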
The second key step, data management, is important both to make the collected data "discoverable" by other users and systems and to protect sensitive information from attackers and ill-intentioned users. Privacy issues may also be handled at this step. For instance, all the data collected in an automated house could be analyzed to infer and predict personal information about its inhabitants. Therefore, algorithms and strategies may be adopted to anonymize the collected data before making it accessible to others.
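One simple strategy along these lines can be sketched as follows. Note that this is pseudonymization combined with coarsening, which is weaker than full anonymization: the device identifier is replaced by a keyed hash and the timestamp is truncated to the hour. The record layout, the key, and the function name `pseudonymize` are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical secret kept by the data owner; without it, pseudonyms
# cannot be recomputed from known device identifiers.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(record: dict) -> dict:
    """Return a copy of the record that is safer to share."""
    digest = hmac.new(
        SECRET_KEY, record["device_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Truncating the timestamp to the hour hides fine-grained activity patterns.
    coarse_time = record["timestamp"].replace(minute=0, second=0, microsecond=0)
    return {
        "device": digest[:16],  # shortened pseudonym, stable for a given key
        "timestamp": coarse_time,
        "value": record["value"],
    }


record = {
    "device_id": "living-room-thermostat",
    "timestamp": datetime(2016, 10, 10, 14, 37, 12),
    "value": 21.5,
}
shared = pseudonymize(record)
print(shared)
```

Because the same device always maps to the same pseudonym, readings can still be grouped per device after sharing, while the original identifier stays hidden from anyone who does not hold the key.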
Finally, the third key step is data analysis. Here, predictive methods may be applied to extract knowledge from raw data, infer missing values, and learn the best system operation using machine learning algorithms. This step is often intended to be repeated many times. For example, a set of data collected one day ago may be analyzed using several predictive methods that provide different perspectives on the information retrieved. In other words, good data management is essential to provide proper conditions for the reproducibility of the observed environments.
Data Science for IoT can go further and be performed at different levels. For example:
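As a very simple stand-in for the predictive methods mentioned above, the sketch below infers missing values by linear interpolation between the nearest known neighbours. This technique is chosen here only for illustration; the function name `interpolate_missing` and the sample series are assumptions.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation.

    Gaps at the very start or end have only one known neighbour
    and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of consecutive known positions and fill between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / gap
            filled[i] = values[left] + fraction * (values[right] - values[left])
    return filled


series = [20.0, None, None, 23.0, None, 25.0, None]
filled = interpolate_missing(series)
print(filled)
```

Keeping the raw series untouched and producing a separate filled copy allows the same data to be re-analyzed later with a different method, in line with the reproducibility concern raised above.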
  1. data management and analysis may be performed in the things; or
  2. in the local storage device; or
  3. in a distributed way, i.e., in the things and the local storage device concurrently.
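The distributed option can be sketched under simple assumptions: each thing reduces its own samples to a compact summary (count and sum), and the local storage device merges the summaries into an overall mean without ever receiving the raw samples. The function names and the sample values below are hypothetical.

```python
from typing import List, Tuple

Summary = Tuple[int, float]  # (number of samples, sum of samples)


def summarize_on_thing(samples: List[float]) -> Summary:
    """Runs on a constrained device: reduce raw samples to a small summary."""
    return (len(samples), sum(samples))


def merge_on_gateway(summaries: List[Summary]) -> float:
    """Runs on the local storage device: combine summaries into a global mean."""
    total_count = sum(count for count, _ in summaries)
    total_sum = sum(partial for _, partial in summaries)
    if total_count == 0:
        raise ValueError("no samples were reported")
    return total_sum / total_count


thing_a = summarize_on_thing([20.0, 22.0])
thing_b = summarize_on_thing([24.0, 26.0, 28.0])
overall_mean = merge_on_gateway([thing_a, thing_b])
print(overall_mean)
```

Transmitting two numbers per thing instead of every sample reduces communication, at the cost of losing the raw data needed for other analyses.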
Of the five papers presented at the First Workshop on Data Science for IoT, the work of Tekeoglu and Tosun addressed security and privacy issues observed in personal environments; three other submissions (Silva et al., Batista et al., and Cardozo et al.) addressed management and visualization in environmental monitoring; finally, Umar Ahsan presented a review of Big Data Analysis in the scope of IoT.

Daniel Silva received the Best Paper Award for the paper "Data Provenance in Environmental Monitoring". The paper describes an architecture to retrieve and store metadata from sensor data, for example, environmental information collected by sensors in an agricultural field. The authors claim that their approach is effective in collecting and storing provenance metadata and that it also allows querying the provenance of data products. The proposed features are fundamental for the reproducibility of experiments because they enable the discovery and visualization of the raw data, processes, and scientists involved in the whole process of data collection.
References from the workshop:
[1] Ahsan U, Bais A. A Review on Big Data Analysis and Internet of Things. 
[2] Silva D, Batista A, Correa PLP. Data Provenance in Environmental Monitoring.
[3] Tekeoglu A, Tosun AS. A Testbed for Security and Privacy Analysis of IoT Devices.
[4] Batista A, Correa PLP, Palanisamy G. Visual Analytics improving Data Understandability in IoT Projects: An Overview of the U.S. DOE ARM Program Data Science Tools.
[5] Cardozo A, Yamin A, Davet P, Souza R, Lopes JL, Geyer C. Sensing And Actuation in IoT: an Autonomous Rule Based Approach.