ADS: Rich Data

The session ADS: Rich Data will be held on Thursday, 2019-09-19, from 16:20 to 18:00, in room 0.001. The session chair is Luis Galarraga.

Talks

16:20 - 16:40
LSTM encoder-predictor for short-term train load forecasting (352)
Kevin Pasini (Université Paris-Est; IRT SystemX), Mostepha Khouadjia (Université Paris-Est), Allou Samé (IRT SystemX), Fabrice Ganansia (SNCF- Innovation & Recherche), Latifa Oukhellou (IRT SystemX)

The increase in the amount of data collected in the transport domain can greatly benefit mobility studies and help to create high value-added mobility services for passengers as well as regulation tools for operators. The research detailed in this paper concerns the development of an advanced machine learning approach for forecasting the passenger load of trains in public transport. Predicting the crowding level on public transport can enrich the information available to passengers and enable them to better plan their daily trips. Moreover, operators will increasingly need to assess and predict network passenger load to improve train regulation processes and service quality levels. The main issues to address in this forecasting task are the variability in the train load series induced by the train schedule and the influence of several contextual factors, such as calendar information. We propose a neural network LSTM encoder-predictor combined with contextual representation learning to address this problem. Experiments are conducted on a real dataset provided by the French railway company SNCF and collected over a period of one and a half years. The prediction performance of the proposed model is compared to that of historical models and traditional machine learning models. The results demonstrate the potential of the proposed LSTM encoder-predictor to address both one-step-ahead and multi-step forecasting and to outperform other models by maintaining robustness in the quality of the forecasts throughout the time horizon.

17:40 - 18:00
Characterization and Early Detection of Evergreen News Articles (419)
Yiming Liao (Pennsylvania State University), Shugang Wang (The Washington Post), Eui-Hong (Sam) Han (Marriott International), Jongwuk Lee (Sungkyunkwan University), Dongwon Lee (Pennsylvania State University)

Although the majority of news articles are only viewed for days or weeks, a small fraction of news articles are read across years and are thus called evergreen news articles. Because evergreen articles maintain a timeless quality and are consistently of interest to the public, understanding their characteristics better has huge implications for news outlets and platforms, yet few studies have explicitly investigated evergreen articles. Addressing this gap, in this paper, we first propose a flexible parameterized definition of evergreen articles to capture their long-term high traffic patterns. Then, using a real dataset from the Washington Post, we unearth several distinctive characteristics of evergreen articles and build an early prediction model with encouraging results. Although less than 1

16:40 - 17:00
Player Vectors: Characterizing Soccer Players' Playing Style from Match Event Streams (701)
Tom Decroos (KU Leuven), Jesse Davis (KU Leuven)

Transfer fees for soccer players are at an all-time high. To make the most of their budget, soccer clubs need to understand the type of players they have and the type of players that are on the market. Current insights into the playing style of players are mostly based on the opinions of human soccer experts such as trainers and scouts. Unfortunately, their opinions are inherently subjective and thus prone to faults. In this paper, we characterize the playing style of a player in a more rigorous, objective and data-driven manner. We characterize the playing style of a player using a so-called 'player vector' that can be interpreted both by human experts and machine learning systems. We demonstrate the validity of our approach by retrieving player identities from anonymized event stream data and present a number of use cases related to scouting and monitoring player development in top European competitions.

17:20 - 17:40
A Semi-Supervised and Online Learning Approach for Non-Intrusive Load Monitoring (844)
Hajer Salem (Institut Mines-Télécom Lille Douai; Manouba University), Moamar Sayed-Mouchaweh (Institut Mines-Télécom Lille Douai)

Non-Intrusive Load Monitoring (NILM) approaches aim at identifying the consumption of a single appliance from the total load provided by smart meters. Several research works based on Hidden Markov Models (HMM) were developed for NILM where training is performed offline. However, these approaches suffer from several issues. First, they fail to generalize to unseen appliances with different configurations or brands than the ones used for training. Second, obtaining data about all active states of each appliance takes a long time, which is impractical for residents. Third, offline training requires storing huge amounts of data, which entails sharing residents' consumption data with external servers and raises privacy issues. Therefore, in this paper, a new approach is proposed to tackle these issues. This approach is based on an HMM conditioned on discriminant contextual features (e.g., time of usage, duration of usage). The conditional HMM (CHMM) is trained online using data related to a single appliance's consumption, extracted from the aggregated load, in order to adapt its parameters to the appliance's specificities (e.g., brand, configuration). Experiments are performed using real data from publicly available data sets, and a comparative evaluation is performed on a publicly available NILM framework.

17:00 - 17:20
Compact Representation of a Multi-dimensional Combustion Manifold Using Deep Neural Networks (863)
Sushrut Bhalla (University of Waterloo), Matthew Yao (University of Waterloo), Jean-Pierre Hickey (University of Waterloo), Mark Crowley (University of Waterloo)

The computational challenges in turbulent combustion simulations stem from the physical complexities and multi-scale nature of the problem, which make it intractable to compute scale-resolving simulations. For most engineering applications, the large scale separation between the flame (typically sub-millimeter scale) and the characteristic turbulent flow (typically centimeter or meter scale) allows us to invoke simplifying assumptions, as is done for the flamelet model, to pre-compute all the chemical reactions and map them to a low-order manifold. The resulting manifold is then tabulated and looked up at run time. As the physical complexity of combustion simulations increases (including radiation, soot formation, pressure variations, etc.), the dimensionality of the resulting manifold grows, which impedes efficient tabulation and look-up. In this paper we present a novel approach to model the multi-dimensional combustion manifold. We approximate the combustion manifold using a neural network function approximator and use it to predict the temperature and composition of the reaction. We present a novel training procedure which is developed to generate a smooth output curve for temperature over the course of a reaction. We then evaluate our work against the current approach of tabulation with linear interpolation in combustion simulations. We also provide an ablation study of our training procedure in the context of over-fitting in our model. The combustion dataset used for the modeling of combustion of H2 and O2 in this work is released alongside this paper.

Parallel Sessions