POSTERS

Title | Poster ID | ID | Day | Last Name | First Name | Abstract | PDF | Comment
Title: Disease as collider: a new method to validate environmental factors using genetic risk estimation in cases of complex diseases
Poster ID: 001 | ID: DS3-010 | Day: Tuesday | Last Name: BALAZARD | First Name: Félix
Abstract: Background: Genetic risk estimation can quantify part of an individual's predisposition to a disease. Identifying environmental factors is more challenging. Collider bias arises between two causes (e.g., genes and environment) when one conditions on a shared consequence (the collider, here the disease).

Methods: We introduce Disease As Collider (DAC), a new methodology to validate environmental factors using genetic risk in cases. Here we consider disease as a collider between genetic and environmental factors. Under reasonable assumptions, studying the association between genetic risk and environment in cases only provides a signature of an environmental risk factor. Simulating disease occurrence in a source population allows us to estimate the statistical power of DAC as a function of disease prevalence, predictive accuracy of genetic risk and sample size. We illustrate DAC in 831 type 1 diabetes (T1D) patients.

Results: The power of DAC increases with sample size, prevalence and accuracy of genetic risk estimation. For a prevalence of 1% and realistic genetic risk estimation, power of 80% is reached for a sample size under 3000. Power was low in our case study as the prevalence of T1D in children is low (0.2%).

Conclusions: DAC could provide a new line of evidence for environmental factors of complex diseases. We discuss the circumstances needed for DAC to participate in the triangulation of environmental causes of disease. We highlight the link with the case-only design for gene-environment interaction.
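A toy simulation can make the collider signature concrete. The sketch below is not the authors' DAC implementation; it assumes a hypothetical liability-threshold model in which a standardized genetic risk score `g` and a binary exposure `e` (30% exposed, an arbitrary choice) are independent in the population and both raise disease liability. Keeping only cases (conditioning on the collider) should then induce a negative association between `g` and `e`, while an exposure with no effect should leave them roughly uncorrelated among cases.

```python
import numpy as np

def simulate_dac(n=500_000, prevalence=0.01, effect_e=1.0, seed=0):
    """Hypothetical liability-threshold model; g and e are independent in the population."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)                  # standardized genetic risk score
    e = rng.binomial(1, 0.3, size=n)            # binary exposure (assumed 30% exposed)
    liability = g + effect_e * e + rng.standard_normal(n)
    threshold = np.quantile(liability, 1 - prevalence)
    cases = liability > threshold               # conditioning on the collider (disease)
    r_population = np.corrcoef(g, e)[0, 1]
    r_cases = np.corrcoef(g[cases], e[cases])[0, 1]
    return r_population, r_cases, int(cases.sum())

r_pop, r_cases, n_cases = simulate_dac()
print(f"population r = {r_pop:+.3f}, cases-only r = {r_cases:+.3f} ({n_cases} cases)")
```

Repeating such a simulation over many seeds and case-sample sizes, and counting how often a test on the cases-only correlation rejects the null, would give a rough power estimate in the spirit of the power analysis described above.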
Title: Deep learning for emotion recognition
Poster ID: 003 | ID: DS3-016 | Day: Tuesday | Last Name: ETIENNE | First Name: Caroline
Abstract: Recent progress in cognitive science allows new types of human-machine interaction. We can now ask our smartphones or computers questions, and soon we will be able to ask our cars to drive us to a destination of our choice. Nevertheless, machines' understanding of human emotions can still be improved. The task is all the more difficult when the nature of the interaction does not give access to all the channels through which humans express their emotions. The aim of this work is to improve on state-of-the-art emotion classification in speech using deep learning algorithms.
Title: Particle Swarm Optimization for algorithmic trading
Poster ID: 005 | ID: DS3-041 | Day: Tuesday | Last Name: BENHAMOU | First Name: Eric
Abstract: Automated trading systems make decisions on how to invest in financial markets. More precisely, these algorithms decide when to trade (timing), in which direction (long or short), on which market (underlying), sometimes with a predetermined level of risk (stop-loss level) and reward (profit target), and in what quantity. These decisions depend on a variety of parameters that must be optimized to maximize returns and overall profits while minimizing risk. In this research, we investigate the use of various optimization algorithms, ranging from simple gradient descent to more heuristic techniques such as particle swarm optimization, and offer some hints on which method works best in our experience.
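For readers unfamiliar with particle swarm optimization, the following is a minimal global-best PSO sketch in NumPy. It is not the authors' trading system: `toy_backtest_loss` is a hypothetical stand-in for a backtest loss over two parameters (a stop-loss level and a profit target), and the inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook defaults, not values from the poster.

```python
import numpy as np

def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization (minimization) within box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                                # particle velocities
    pbest = x.copy()                                    # best position seen by each particle
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()            # best position seen by the swarm
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # inertia + pull toward personal best + pull toward global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

def toy_backtest_loss(params):
    """Hypothetical loss; in practice this would be, e.g., minus the backtest profit."""
    stop_loss, profit_target = params
    return (stop_loss - 0.02) ** 2 + (profit_target - 0.05) ** 2

best, best_val = pso(toy_backtest_loss, [(0.0, 0.1), (0.0, 0.2)])
print(best, best_val)
```

PSO only needs objective evaluations, not gradients, which is why it is attractive when the objective is a noisy, non-differentiable backtest.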
Title: Multimodal Popularity Prediction of Brand-related Social Media Posts
Poster ID: 007 | ID: DS3-078 | Day: Tuesday | Last Name: MAZLOOM | First Name: Masoud
Abstract: Brand-related user posts on social networks are growing at a staggering rate, with users expressing their opinions about brands by sharing multimodal posts. However, while some posts become popular, others are ignored. In this work, we present an approach for identifying which aspects of posts determine their popularity. We hypothesize that brand-related posts may be popular due to several cues related to factual information, sentiment, vividness and entertainment parameters about the brand. We call this ensemble of cues engagement parameters, and we propose to use them to predict the popularity of brand-related user posts. Experiments on a collection of fast-food brand-related user posts crawled from Instagram show that: visual and textual features are complementary in predicting the popularity of a post; predicting popularity using our proposed engagement parameters is more accurate than predicting popularity directly from visual and textual features; and our proposed approach makes it possible to understand what drives post popularity in general as well as to isolate brand-specific drivers.
Title: Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing
Poster ID: 009 | ID: DS3-085 | Day: Tuesday | Last Name: LE | First Name: Minh
Abstract: Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this work, we apply reinforcement learning to greedy dependency parsing, which is known to suffer from error propagation. Reinforcement learning improves the accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high-performance greedy parser, while maintaining its efficiency. We investigate the proportion of errors that result from error propagation and confirm that reinforcement learning reduces the occurrence of error propagation.
Title: Randomized Numerical Linear Algebra for Big GLMs
Poster ID: 011 | ID: DS3-093 | Day: Tuesday | Last Name: LANGE | First Name: Robert
Abstract: This research project investigates the potential of randomized algorithms for tall data analysis. In particular, I focus on the implementation of fast approximations to the statistical leverage scores and on the asymptotic properties of the resulting estimators. These so-called algorithmic leveraging algorithms can reduce computation time by effectively reducing the dimensionality of the underlying normal equations problem. I illustrate my results in a generalized linear model (GLM) setting where the number of observations (n) is much larger than the number of features (d).

Important questions to answer include the following: How do static fast estimator approximations generalize to iterative estimation procedures such as iterative weighted least squares (IWLS)? What are the statistical properties of the resulting estimator (analysis of variance and robustness)? How does the correlation structure affect the design of an optimal sampling scheme? I provide answers and further questions by means of Monte Carlo experiments and by establishing concentration bounds for the resulting estimator.
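The basic idea of algorithmic leveraging for least squares can be sketched as follows, as a hedged illustration only: rows are sampled with probabilities proportional to their statistical leverage scores and rescaled before solving a much smaller problem. For simplicity, this sketch computes exact leverage scores with a thin QR decomposition, which costs as much as solving the full problem; the fast approximations studied in the poster and the extension to GLMs via IWLS are not shown. Function names, the data and the subsample size `r` are illustrative choices.

```python
import numpy as np

def leverage_scores(X):
    """Exact leverage scores: squared row norms of an orthonormal basis of col(X)."""
    q, _ = np.linalg.qr(X, mode="reduced")
    return np.sum(q ** 2, axis=1)

def leveraged_least_squares(X, y, r, seed=0):
    """Least squares on r rows sampled (with replacement) proportionally to leverage."""
    rng = np.random.default_rng(seed)
    h = leverage_scores(X)
    p = h / h.sum()                                   # sampling probabilities (h sums to d)
    idx = rng.choice(X.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])                 # importance-sampling rescaling
    beta, *_ = np.linalg.lstsq(X[idx] * scale[:, None], y[idx] * scale, rcond=None)
    return beta

rng = np.random.default_rng(1)
n, d = 20_000, 5
X = rng.standard_normal((n, d))
beta_true = np.arange(1.0, d + 1.0)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)     # full-data solution for comparison
beta_sub = leveraged_least_squares(X, y, r=1_000)     # solution from 1,000 sampled rows
print(np.round(beta_full, 3), np.round(beta_sub, 3))
```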
Title: Structured dropout: a generalization of the dropout technique
Poster ID: 013 | ID: DS3-100 | Day: Tuesday | Last Name: KHALFAOUI | First Name: Beyrem
Abstract: Dropout has been proposed as a technique for preventing overfitting when training neural networks. We propose a generalization of dropout that takes into account prior knowledge about the data (available, for example, in computational biology). We show that this can enhance dropout's performance on some benchmarks and on real data.
Title: A Markov Random Field Model for Entity-Relationship Retrieval
Poster ID: 015 | ID: DS3-120 | Day: Tuesday | Last Name: SALEIRO | First Name: Pedro
Abstract: This work is concerned with the effective retrieval of entity relationships from large corpora of unstructured text. We consider entities of any type, i.e., characterized by context terms instead of a predefined category, and retrieve entity tuples based on specified relationships. Recent approaches to ad-hoc entity retrieval have demonstrated that using Markov Random Field (MRF) models to incorporate term dependencies can improve search performance. This suggests that MRFs could be used to model dependencies among entities and facilitate relationship retrieval over unstructured text. We therefore create an Entity-Relationship Dependency Model (ERDM) and an index of entity and relationship context vectors that allow us to implement several retrieval methods. Experiments with a large Web collection (ClueWeb-09-B) and 267 relationship queries show that ERDM consistently outperforms other relevant baseline methods, including language models.
Title: Acquiring Human-Robot Interaction Skills with Transfer Learning Techniques
Poster ID: 017 | ID: DS3-129 | Day: Tuesday | Last Name: MOHAMMED | First Name: Omar
Abstract: Human-robot interaction (HRI) is the study of the relationship between humans and robots and of how to enable robots to communicate more effectively with humans. One of the challenges of HRI is to build multimodal behavioral models, involving coordination between input and output modalities such as speech, facial expression, gaze, head movement and hand gesture. Several machine learning models for HRI have been developed over the years, but one key limitation of these models is that they are task-specific and perform poorly once the task changes slightly. In order to transfer knowledge learnt in one task to a new one, we introduce the idea of ‘skills’: the elementary building units of interactions, which represent a wide range of HRI situations, similar to what strokes are for letters or phonemes are for words. In this poster, we show our first results on extracting these skills for non-interactive multimodal tasks using deep neural networks.
Title: Long-term forecasting despite Data Shortages
Poster ID: 019 | ID: DS3-134 | Day: Tuesday | Last Name: ZULUAGA | First Name: Maria Alejandra
Abstract: Forecasting plays a critical role in the travel industry. Revenue management, flight price tracking and compensation estimation systems are among the many applications that require accurate forecasts of future behavior inferred from observations of the past. A common requirement of these systems is to make accurate long-term forecasts based on a stochastic model of the available data, which are often limited. Such a scenario is challenging for any prediction algorithm. In this study, we benchmark methods ranging from classic statistical approaches to state-of-the-art Long Short-Term Memory (LSTM) networks on the task of long-term price prediction with limited training data. The ultimate goal of this study is to establish the scenario that best suits each method and to determine empirical limits on the training data requirements and the forecast horizon of each method.
Title: Object Video Segmentation using Adversarial Networks and Mathematical Morphology
Poster ID: 021 | ID: DS3-147 | Day: Tuesday | Last Name: FEHRI | First Name: Amin
Abstract: Adversarial training has been shown to produce state-of-the-art results in generative image modeling. In this work, we propose an adversarial training approach for video object segmentation models. We train a fully convolutional segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network, in order to detect and correct higher-order inconsistencies between ground-truth segmentation maps and those produced by the segmentation network. Mathematical morphology filters are also applied as a post-processing step to further enhance the results. Preliminary results on several examples are presented to illustrate the relevance of this approach.
Title: Computational Deconvolution of Complex Mixtures in Biological Samples
Poster ID: 023 | ID: DS3-153 | Day: Tuesday | Last Name: CZERWINSKA | First Name: Urszula
Abstract: Some biological systems are highly complex. This is the case of the tumor microenvironment, which includes distinct cell types that critically impact tumor development and response to treatment. Gene expression information from the microenvironment, represented by the number of transcripts, is a complex mixture that can be described by the linear model AX = B, where B is the data matrix of one biological sample, X contains the mixing proportions and A is the matrix of gene expression in each cell type. Several methods have been proposed to estimate X, such as least squares regression (Abbas et al., 2009) and, more recently, non-negative least squares regression (Qiao et al., 2012), quadratic programming (Gong et al., 2011; Zhong, Wan, Pang, Chow, & Liu, 2013) and support vector regression (Newman et al., 2015). However, all these methods are quite prone to overfitting and are potentially sensitive to molecular noise. They also depend on established 'ground truth' signatures of cell types, while highly specific signatures may not exist in reality; instead, cell types could be characterized and distinguished by a weighted vector of expression. In our work, we propose to apply an unsupervised method that decomposes the mixture into independent sources based solely on the structure of the data, without any prior knowledge. We apply Independent Component Analysis (ICA) (Hyvärinen, Karhunen, & Oja, 2001) to solve this blind source separation problem. In ICA, the data matrix is approximated as X ≈ AS, where X (note the change of notation) now denotes the data matrix of size m × n, A is an m × k matrix and k ≪ m. The columns of A can be called components (m-dimensional vectors), and the columns of S are the projections of the data vectors onto the components (a k-dimensional vector for each of the n data points) (Zinovyev et al., 2013).
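As a point of comparison for the unsupervised ICA approach, the supervised model AX = B can be solved for a single sample with non-negative least squares, one of the baselines cited above (Qiao et al., 2012). The sketch below uses SciPy's `scipy.optimize.nnls` on synthetic data; the signature matrix, the proportions and the noise level are made up for illustration, and renormalizing the estimate to sum to one is a common but optional convention. This is not the authors' ICA pipeline.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(A, b):
    """Non-negative least squares for one sample: min ||A x - b|| subject to x >= 0."""
    x_hat, _residual_norm = nnls(A, b)
    return x_hat / x_hat.sum()                     # rescale so proportions sum to one

rng = np.random.default_rng(0)
n_genes, n_cell_types = 200, 4
A = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_cell_types))  # synthetic signatures
x_true = np.array([0.5, 0.3, 0.15, 0.05])                           # synthetic proportions
b = A @ x_true + rng.normal(scale=0.05, size=n_genes)               # noisy bulk sample
x_est = estimate_proportions(A, b)
print(np.round(x_est, 3))
```

The supervised estimate depends entirely on A being known and correct, which is precisely the 'ground truth' assumption the ICA approach tries to avoid.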

Results
In our strategy, we apply ICA iteratively to separate signals at higher and higher resolutions and thereby obtain signals for immune cells or, ideally, immune cell subtypes. By applying ICA to bulk tumor data from breast carcinoma, we isolated meaningful groups of cell types. However, the validation framework is still under development. To address the problem of 'ground truth', we are developing an upsampling method based on Generative Adversarial Networks (unsupervised deep learning) to extend existing datasets while preserving their correlation structure, through the simulation of two-dimensional non-Gaussian distributions. Our upsampling design is based on single-cell data, each dataset representing one source; we will then mix these sources to approximate real bulk tumor data, which will serve as a testing and validation framework for ICA-based deconvolution.

Perspectives
If successful, the project will provide important insights into the complex organization of the immune component of the tumor microenvironment (TME), which can be used directly in the diagnosis and treatment of cancer, especially in cancer immunotherapy. At the methodological level, novel methods for signal deconvolution will be developed and implemented that can be applied in other domains with similar problems. In addition, the resulting interaction network could lead to a more detailed deterministic mathematical model of cell-cell communication between immune-related cells in the TME, thereby identifying novel drug targets.
Title: Multi-target learning: toward a general learning framework
Poster ID: 025 | ID: DS3-160 | Day: Tuesday | Last Name: MOURA | First Name: Simon
Abstract: We propose a general formal framework for machine learning problems involving multiple interdependent and heterogeneous tasks and explain how and why it is relevant in numerous applications. We also provide a new public dataset that fits the multi-target learning framework, together with baselines for this dataset. We believe that this real-world dataset will contribute to research in multitask learning.
Title: Automatic Dynamic Correlation Template Tracking of Inner Lips based on CLNF
Poster ID: 027 | ID: DS3-168 | Day: Tuesday | Last Name: LIU | First Name: Li
Abstract: In this work, we propose a novel automatic approach to extract the inner lip contour of speakers without using artifices. The method is based on a recent facial contour extraction model from computer vision, the Constrained Local Neural Field (CLNF), which provides 8 characteristic points (landmarks) defining the inner lip contour. However, when applied directly to our visual data, which include Cued Speech (CS) data, CLNF failed in about 50% of cases. We propose a Modified CLNF that estimates the inner lip contour from the original CLNF landmarks. This new model uses a dynamic template based on the first derivative of the smoothed luminance variation. The method gives a precise estimate of the inner lip aperture. It is evaluated on 4800 images of three French speakers. The proposed method corrects 95% of CLNF errors and reaches a total RMSE of one pixel (i.e., 0.05 cm on average), compared with four pixels for the original CLNF.
Title: Exploration-Exploitation in MDPs with Options
Poster ID: 029 | ID: DS3-177 | Day: Tuesday | Last Name: FRUIT | First Name: Ronan
Abstract: The option framework [Sutton et al., 1999] is a simple yet powerful model for introducing temporally extended actions and hierarchies in reinforcement learning [Sutton and Barto, 1998]. An important feature of this framework is that Markov decision process (MDP) planning and learning algorithms can easily be extended to accommodate options, yielding algorithms such as option value iteration and Q-learning [Sutton et al., 1999], LSTD [Sorg and Singh, 2010], and actor-critic [Bacon and Precup, 2015]. While options may significantly improve performance w.r.t. learning with primitive actions, the theoretical understanding of their actual impact on learning performance is still fairly limited. Notable exceptions are the sample complexity analysis of approximate value iteration with options [Mann and Mannor, 2014] and the PAC-MDP analysis by Brunskill and Li [2014]. In this work, we derive the first regret analysis of learning with options. Relying on the fact that using options in an MDP induces a semi-Markov decision process (SMDP), we first introduce a variant of the UCRL algorithm [Jaksch et al., 2010] for SMDPs and upper bound its regret.

While this result is of independent interest for learning in SMDPs, its most interesting aspect is that it can be translated into a regret bound for learning with options in MDPs, and it provides a first understanding of the conditions sufficient for a set of options to reduce the regret w.r.t. learning with primitive actions.
Title: Do Convolutional Networks Need to Be Deep for Text Understanding?
Poster ID: 031 | ID: DS3-191 | Day: Tuesday | Last Name: LE | First Name: Thien Hoa
Abstract: Convolutional networks have become ubiquitous in image classification because they achieve state-of-the-art performance when made very deep. The same effect has been observed in speech recognition, but is it always the case for text classification? Many results cast doubt on this assumption. In this presentation, we provide the first empirical demonstration supporting this observation. This has direct consequences for subsequent studies of deep network structures for text and their application to many NLP tasks.
Title: Unsupervised Outlier Detection in High-Dimensional Data Streams
Poster ID: 033 | ID: DS3-203 | Day: Tuesday | Last Name: FOUCHÉ | First Name: Edouard
Abstract: Outlier detection aims to reveal unusual patterns in data. Typical scenarios in predictive maintenance are the identification of failures, sensor malfunctions or intrusions. This is a challenging task, especially when the data is high-dimensional, because outliers become “hidden” and are visible only in particular subspaces. In addition, predictive maintenance data is often available as a stream. By nature, data streams are infinite; they evolve over time and can be aggregated at multiple time scales. Furthermore, in real-time applications, assumptions about the form of future, unknown anomalies are unrealistic, so the problem should be considered unsupervised. Most existing methods for outlier detection are supervised and apply only to static or to low-dimensional data, so this problem remains largely unaddressed. In this poster, we introduce a novel anytime algorithm for unsupervised outlier detection in high-dimensional data streams. (This is a work in progress.)
Title: Learner profiling and behavioral prediction beyond the MOOC platform
Poster ID: 035 | ID: DS3-213 | Day: Tuesday | Last Name: CHEN | First Name: Guanliang
Abstract: Large-scale learning analytics is commonly based on data traces learners generate within a Massive Open Online Course (MOOC) platform such as edX during the running of a MOOC. As MOOCs typically last between five and ten weeks and many learners are rather passive consumers of the offered learning activities, this exclusive use of MOOC platform data traces severely limits the insights we can gain about our learners. This lack of data leads to coarse-grained learner profiles which in turn limit our ability to provide adaptive and personalized online learning experiences. The social Web (where platforms such as Twitter and LinkedIn have hundreds of millions of users) potentially offers a rich source of data to supplement the MOOC platform data traces, as many learners are likely to be active on one or more social Web platforms. This poster aims to demonstrate the benefits of profiling learners by looking beyond the MOOC platform, including 1) gathering more user attributes (e.g., demographics) that are relevant to learning to construct a more accurate and complete learner model and 2) predicting both in-course and after-course behavior with high accuracy.
Title: Evolutionary Algorithms Used to Estimate Subject-Specific Change Points on a Longitudinal Data Set
Poster ID: 037 | ID: DS3-222 | Day: Tuesday | Last Name: GARCIA CRUZ | First Name: Ehidy Karime
Abstract: The change point problem arises in many applied situations and has been studied by several authors, with approaches ranging from classical techniques for piecewise regression to change point estimation in linear mixed models (LMM) using a dynamic programming algorithm. The objective of this proposal is to estimate each subject-specific change point using evolutionary algorithms (EA) when the data come from a longitudinal setting, with linear mixed models as the modeling framework. Results will be shown in a simulation study that varies specific conditions on the parameters of the LMM and on the number of subjects included in the study. Additionally, we illustrate the approach with a real problem about drying cypress wood slats, in which this methodology is useful for predicting the drying time associated with a specific slat thickness. This is done as a generalization of the calibration problem: once the change points have been obtained through EA, a calibration curve can be fitted to these change points according to their thickness, which allows us to predict the change point for a given slat.

Keywords. Change Point; Evolutionary Algorithms; Linear Mixed Models; Calibration Function; Parallel Programming
Title: Temporal Decision Trees
Poster ID: 039 | ID: DS3-440 | Day: Tuesday | Last Name: SHALAEVA | First Name: Vera
Abstract: My work falls within the domain of machine learning and aims at designing Decision Tree algorithms adapted to handle large temporal datasets. On the one hand, time series are observed in a growing number of domains. On the other hand, Decision Trees are an interesting approach that provides a decision model with a high level of interpretability for users. My goal is to improve the Temporal Decision Tree in terms of computational complexity, performance and interpretability.
Title: Finding key biological features for cancer diagnosis from histopathology slides
Poster ID: 041 | ID: DS3-249 | Day: Tuesday | Last Name: NAYLOR | First Name: Peter
Abstract: Cancer diagnosis involves the complex interpretation of a multitude of heterogeneous data, such as genomic, transcriptomic and image data. The image data used in this context corresponds to thin slices of the tumor and of the surrounding tissue, stained with agents in order to highlight specific structures, such as cell nuclei or collagen. A medical practitioner will routinely check the patient's histopathology image data in order to decide the next step in the patient's treatment. Histopathology slides can thus be very informative about the cancer subtype and/or about how the patient's immune system is reacting to the cancer. We wish to discover appropriate tools to quantify the huge amount of data found in histopathology slides. In the long run, such a quantification scheme would fit into a pipeline that investigates the most informative physiological features and their link to genomic and transcriptomic features. Our strategy to identify the important features is, first, to segment the important elements in histopathology slides (such as cells, tumor and stromal tissue, necrotic regions, etc.), second, to define physiologically interpretable features for each of these elements and, third, to build a prediction model in order to assess the importance of each of these features. Uncovering information from histopathology slides is a difficult task, as one uncompressed slide can easily be over 65 GB (200000 x 100000 pixels). Identifying the important features from such a huge amount of data is a difficult endeavour, which requires the use of prior knowledge brought in by pathologists. Supervised learning is certainly the most powerful strategy for image segmentation for this type of data. In order to segment the important structures in these images, we propose a method based on fully convolutional network architectures for image segmentation.
Ultimately, this image segmentation will allow us to define biology driven features for predicting clinical variables, such as outcome, subtype or response to treatment. It will also allow us to investigate the link between genomic and transcriptomic features of the tumors and this set of spatially resolved features from image data, that we hope will be complementary.
Title: The payments network of Italian firms
Poster ID: 043 | ID: DS3-273 | Day: Tuesday | Last Name: LETIZIA | First Name: Elisa
Abstract: We empirically study a large proprietary dataset of payments between Italian firms from a network perspective in order to understand how firms interact with each other. Standard network metrics, such as degree and strength distributions and component decomposition, highlight non-trivial interactions between firms. Finally, community detection techniques are employed to investigate correlations between network-based clustering and an idiosyncratic measure of firm riskiness.
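The standard metrics mentioned above can be computed with NetworkX once payments are aggregated into a weighted directed graph (payer to payee). This is a minimal sketch on a handful of made-up transactions, not the proprietary dataset; summing repeated payments between the same pair of firms into one edge weight is an assumption of the sketch, and the community detection step is not shown.

```python
import networkx as nx

def payment_network_stats(payments):
    """Build a weighted payer -> payee graph and compute basic network metrics."""
    g = nx.DiGraph()
    for payer, payee, amount in payments:
        if g.has_edge(payer, payee):
            g[payer][payee]["weight"] += amount      # aggregate repeated payments
        else:
            g.add_edge(payer, payee, weight=amount)
    out_degree = dict(g.out_degree())                  # number of distinct payees
    in_strength = dict(g.in_degree(weight="weight"))   # total amount received
    out_strength = dict(g.out_degree(weight="weight")) # total amount paid
    components = sorted((len(c) for c in nx.weakly_connected_components(g)), reverse=True)
    return g, out_degree, in_strength, out_strength, components

# Hypothetical transactions: (payer, payee, amount)
payments = [("A", "B", 100.0), ("A", "C", 50.0), ("B", "C", 20.0),
            ("A", "B", 30.0), ("D", "E", 10.0)]
g, out_degree, in_strength, out_strength, components = payment_network_stats(payments)
print(out_degree, in_strength, out_strength, components)
```

On real data, the degree and strength dictionaries would be turned into empirical distributions (for example, histograms on log scales) rather than inspected firm by firm.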
Title: Density estimation and nonlinear equalization for optical communications using neural networks
Poster ID: 045 | ID: DS3-280 | Day: Tuesday | Last Name: RIOS MÜLLER | First Name: Rafael
Abstract: There is increasing interest in compensating nonlinear distortions in optical communication systems. Volterra nonlinear equalizers are typically used in optical communications; however, these equalizers are limited in the responses they can compensate. We investigate nonlinear equalization using neural networks as an alternative to Volterra equalizers. Finally, we investigate maximum a posteriori decoding over nonlinear channels with memory, where the channel transition probability function is learned using a neural network.
Title: Learning fuzzy spatial relationships for image semantic analysis with justification
Poster ID: 047 | ID: DS3-291 | Day: Tuesday | Last Name: PIERRARD | First Name: Régis
Abstract: The goal is to develop a machine learning algorithm that is able to justify the results it provides using fuzzy spatial relationships.
Title: Model-based multivariate discretization for logistic regression
Poster ID: 049 | ID: DS3-294 | Day: Tuesday | Last Name: EHRHARDT | First Name: Adrien
Abstract: Credit institutions are interested in the repayment probability of a loan given the applicant’s characteristics in order to assess creditworthiness. For regulatory and interpretability reasons, logistic regression is still widely used to learn this probability from data. Although logistic regression naturally handles both quantitative and qualitative data, two pre-processing steps are usually performed: first, continuous features are discretized by assigning factor levels to pre-determined intervals; second, qualitative features that take numerous values are regrouped into variables with fewer factor levels. In this communication, we focus on the discretization of continuous variables, which is performed for two main reasons: first, it produces a “scorecard” with a direct correspondence from intervals to score “points”; second, it makes it possible to deal with the non-linearity of the score with respect to the continuous variables. Many discretization algorithms already exist (see the review by Ramírez‐Gallego et al. (2016)). To the best of our knowledge, the few multivariate supervised algorithms are unsatisfactory in our setup, mainly because they are not fully automated, their optimized criterion does not produce discretized features suitable for logistic regression, and their approaches are empirical. By reinterpreting discretized features as latent variables, we are able, through the use of a Stochastic Expectation-Maximization (SEM) algorithm and a Gibbs sampler, to overcome these shortcomings and to find the best discretization scheme w.r.t. the logistic regression loss. The good performance of this approach is illustrated on simulated and real data from Crédit Agricole Consumer Finance.
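To make the scorecard idea concrete, the sketch below shows the conventional pipeline the abstract starts from, not the proposed SEM/Gibbs method: one continuous feature is cut at fixed, hand-picked thresholds, the intervals are one-hot encoded, and a scikit-learn logistic regression assigns one coefficient (score "points") per interval. The feature, the cut points and the simulated U-shaped risk are all hypothetical; the point is that discretization lets a model that is linear in its inputs capture a non-linear effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
age = rng.uniform(18, 80, size=n)                        # hypothetical continuous feature
true_logit = -1.0 + 1.5 * ((age < 25) | (age > 65))      # U-shaped (non-linear) risk
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

cut_points = [25, 40, 65]                 # intervals: <25, [25,40), [40,65), >=65
levels = np.digitize(age, cut_points)     # interval index 0..3 for each applicant
X = np.eye(len(cut_points) + 1)[levels][:, 1:]   # one-hot, level 0 (<25) as reference

model = LogisticRegression().fit(X, y)
for label, coef in zip(["[25,40)", "[40,65)", ">=65"], model.coef_[0]):
    print(f"{label:>8}: {coef:+.2f} (relative to <25)")
```

The method described in the abstract would instead treat the interval assignment itself as latent and search for the cut points that best serve the logistic regression loss.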
Title: Integrating structural constraints in multi-locus genome-wide association studies for improved biomarker discovery in breast cancer
Poster ID: 051 | ID: DS3-309 | Day: Tuesday | Last Name: CLIMENTE | First Name: Héctor
Abstract: Genome-wide association studies (GWAS) are widely used to detect genetic variants correlated with an observed trait. GWAS compare two sets of patients (usually diseased and healthy) in a two-step experiment: first, the genetic variants of each participant are obtained by sequencing; then, a statistical association analysis of the variants is performed. GWAS target settings where the common variant, common disease paradigm applies (the presence of a variant has a probabilistic and mild impact on the trait). While these studies have provided insights into the pathways underpinning many common diseases, including cancer, the analysis of such very high-dimensional, weakly associated data poses both computational and statistical difficulties. One way of increasing statistical power is to use a priori biological knowledge: if two variants are associated with a disease, they are likely to share a biological context. In particular, we are developing a methodology to efficiently integrate biological networks (gene annotations and physical interactions between proteins) into GWAS. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. We are currently applying our models to simulated data in different settings, and we plan to apply the methods to a high-quality breast cancer dataset and, potentially, uncover genes that increase the likelihood of developing the disease.
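The minimum cut reformulation can be illustrated on a toy problem. The sketch below assumes a generic objective in the spirit of network-guided feature selection, not necessarily the authors' exact formulation: choose a set S of variants maximizing the sum over S of (c_i − eta), where c_i is an association score and eta a sparsity penalty, minus lam times the total weight of network edges with exactly one endpoint in S. Linking positive (c_i − eta) to a source, negative ones to a sink, and network edges in both directions makes the minimum s-t cut equal to a constant minus this objective, so the source side of the cut is an exact optimum. Scores, graph and penalties are made up; NetworkX's `minimum_cut` solves the cut.

```python
import networkx as nx

def select_connected_features(scores, edges, eta, lam):
    """Exact maximizer of sum_{i in S}(c_i - eta) - lam * cut_weight(S) via an s-t min cut."""
    g = nx.DiGraph()
    g.add_nodes_from(["s", "t"])
    g.add_nodes_from(scores)
    for i, c in scores.items():
        a = c - eta
        if a > 0:
            g.add_edge("s", i, capacity=a)      # paid if feature i is NOT selected
        elif a < 0:
            g.add_edge(i, "t", capacity=-a)     # paid if feature i IS selected
    for i, j, w in edges:                       # network edges, paid if i and j are split
        g.add_edge(i, j, capacity=lam * w)
        g.add_edge(j, i, capacity=lam * w)
    cut_value, (source_side, _sink_side) = nx.minimum_cut(g, "s", "t")
    return sorted(source_side - {"s"}), cut_value

# Hypothetical toy data: four variants on a path-shaped interaction network
scores = {0: 0.1, 1: 2.0, 2: 2.0, 3: 0.1}
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
selected, cut_value = select_connected_features(scores, edges, eta=0.5, lam=0.3)
print(selected, cut_value)
```

Here variants 1 and 2 are kept together because their scores outweigh both the sparsity penalty and the cost of cutting the two edges to their weakly scored neighbours.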
Title: Causal Consistency of Structural Equation Models
Poster ID: 053 | ID: DS3-323 | Day: Tuesday | Last Name: RUBENSTEIN | First Name: Paul
Abstract: Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs.
Title: Wasserstein Dictionary Learning
Poster ID: 055 | ID: DS3-339 | Day: Tuesday | Last Name: SCHMITZ | First Name: Morgan
Abstract: Optimal Transport theory enables the definition of a distance between measures on any given space. This Wasserstein distance naturally accounts for geometric warping between measures (including, but not limited to, images). We introduce a new Optimal Transport-based representation learning method in close analogy with the usual Dictionary Learning problem. Dictionary Learning typically relies on a matrix product between the learned dictionary and the codes making up the new representation, so the relationship between atoms and data is ultimately linear. We instead use automatic differentiation to derive gradients of the Wasserstein barycenter operator, and we learn a set of atoms and barycentric weights from the data in an unsupervised fashion. Since our data is reconstructed as Wasserstein barycenters of our learned atoms, we can make full use of the attractive properties of the Optimal Transport geometry. In particular, our representation allows for non-linear relationships between atoms and data.
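The building block the method differentiates through, a Wasserstein barycenter, can be approximated with entropic regularization. The NumPy sketch below uses iterative Bregman projections (Sinkhorn-style scalings) to compute the entropic barycenter of two 1-D histograms. It is an illustration under stated assumptions (squared-distance cost, regularization `eps`, fixed iteration count), not the authors' code, and it omits both the automatic differentiation and the dictionary learning loop. It shows the geometric effect described above: the barycenter of two bumps is a single bump in between, whereas a linear average keeps two bumps.

```python
import numpy as np

def gaussian_hist(x, mean, sd):
    """Discretized Gaussian bump normalized to a probability histogram."""
    h = np.exp(-0.5 * ((x - mean) / sd) ** 2)
    return h / h.sum()

def entropic_barycenter(P, weights, C, eps=5e-3, n_iter=2000):
    """P: (n_bins, n_measures), histograms as columns; weights sum to one."""
    K = np.exp(-C / eps)                      # Gibbs kernel of the cost matrix
    v = np.ones_like(P)
    for _ in range(n_iter):
        u = P / (K @ v)                       # enforce each input marginal
        Ktu = K.T @ u
        b = np.exp(np.log(Ktu) @ weights)     # weighted geometric mean
        v = b[:, None] / Ktu                  # enforce the shared barycenter marginal
    return b

x = np.linspace(0.0, 1.0, 101)
C = (x[:, None] - x[None, :]) ** 2            # squared Euclidean ground cost
P = np.stack([gaussian_hist(x, 0.25, 0.05), gaussian_hist(x, 0.75, 0.05)], axis=1)
weights = np.array([0.5, 0.5])
b = entropic_barycenter(P, weights, C)
linear = P @ weights                          # Euclidean average, for comparison
print(x[b.argmax()], b[50], linear[50])
```

In a dictionary learning setting, `P` would hold the learned atoms, `weights` a data point's barycentric code, and gradients of the output with respect to both would be obtained by differentiating through these iterations.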
Title: Fast Inference-free Algorithms for CRF Learning
Poster ID: 057 | ID: DS3-344 | Day: Tuesday | Last Name: HU | First Name: Shell
Abstract: We consider the problem of learning loss-augmented conditional random fields (CRF), which subsume both the max-margin and maximum likelihood regimes of parameter estimation for CRF models. Because the Shannon entropy is intractable, we propose a smooth alternative based on the Gini entropy and the oriented tree-reweighted Bethe approximation [Globerson et al. 2007]. The relaxed learning problem is then formulated in a dual augmented Lagrangian framework and optimized by a proximal block-coordinate method of multipliers. In the special case where the Lagrange multipliers are fixed to zero, the algorithm is exactly the same as stochastic dual coordinate ascent [Shalev-Shwartz and Zhang, 2016]. We further show theoretical advantages of our algorithm by studying its convergence rate. Our experiments show that the proposed algorithm empirically outperforms its counterpart, the block-coordinate Frank-Wolfe algorithm, in the maximum likelihood regime.
Distributed machine learning infrastructure for medical image segmentation | 059 | DS3-352 | Tuesday | HOTOIU | Lucian | Lucian Hotoiu (1,*), Elliot Brion (2), Rudi Labarbe (1). (1) Ion Beam Applications SA, Belgium; (2) Université catholique de Louvain, Belgium. *Contact: Lucian.Hotoiu@iba-group.com

Introduction: During radiation therapy of cancer patients, the initial prescription often has to be adapted as the treatment advances because of anatomical changes in the patient's body. In such situations, machine learning techniques (deep learning) can be employed to accelerate the clinical workflow and enable fast, in-room adaptation of the treatment plan. Deep learning algorithms are believed to be capable of reliable, robust, automatic CT image segmentation that can serve as valuable input to adapt and re-optimize the treatment plan.

Description: A major factor in the success of machine learning algorithms is the quantity and quality of the data available to train them. The data must be sufficiently representative of the problem and must contain adequate annotations to guide the training [1]. Medical image segmentation is no different in this regard, and it is further complicated by legal patient-confidentiality requirements and hospital-specific policies. To achieve acceptable performance in segmenting CT images with supervised deep learning techniques, about 5000 labeled examples (patients) are needed for each pathology [1]. Given the sensitivity of medical records, assembling a dataset of the necessary size is no small task. A system that allows access to multiple data sources is therefore of paramount importance. Such an infrastructure would make it possible to deploy and train a deep neural network on the medical CT images available inside multiple treatment centres, in a completely anonymous fashion. To maximize the amount of available data, the owning institutions could be accessed through a distributed data network, following the hybrid model of Skripcak et al. [2]. The CT images and annotations remain stored in the host institution, which greatly simplifies the legal problem of sharing patient data across sites. The deep neural network algorithms will be deployed in a distributed way from a central repository shared by the participants. The central repository delegates the training of the algorithm to computation hardware located at each site. The network is trained locally, and only the learned lessons (optimized algorithm parameters) are gathered and sent back to the central repository. In this manner the machine learning model can “blindly” and reciprocally evolve, training on data from all the hospitals involved in the network.
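The train-locally, aggregate-centrally loop described above can be sketched as follows. Three simplifications are assumptions made for illustration only. First, a linear least-squares model stands in for the deep segmentation network. Second, the central repository combines site updates by a sample-size-weighted average, a FedAvg-style rule that the poster does not specify. Third, the function names and hyperparameters are hypothetical. Only parameter vectors cross site boundaries; the arrays `X`, `y` never leave a site.

```python
import numpy as np


def local_training(w, X, y, lr=0.1, epochs=50):
    """Hypothetical site-side step: gradient descent on a least-squares model,
    using only this site's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def federated_round(w_global, sites):
    """Hypothetical central step: send parameters out, average what comes back,
    weighting each site by its number of examples."""
    updates, sizes = [], []
    for X, y in sites:                      # in practice this runs on each site's hardware
        updates.append(local_training(w_global, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))
```

Repeating `federated_round` a few times lets the shared model learn from every site's data while each site only ever exports its parameter vector.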

Conclusion: Although the distributed infrastructure model is ambitious, the proposed system provides a number of important advantages:
• It keeps the data local and retrieves only the learned intelligence, which alleviates the legal problem of patient data protection.
• It is a win-win situation for participating institutions, which gain access to a larger data pool in exchange for reciprocity.
• It distributes the computation load among the participating institutions, which avoids the need for a massive central cluster.

Bibliography
[1] Goodfellow et al., Deep Learning, MIT Press, 2016.
[2] Skripcak et al., "Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymized public datasets," Radiotherapy and Oncology, vol. 113, pp. 303-309, 2014.
joint poster with Rudi LABARBE (DS3-361)
Characterising industrial sites' flexibility with reservoir models | 061 | DS3-376 | Tuesday | CUVELIER | Thibaut | Electro-intensive industrial sites depend heavily on electricity prices to remain competitive. Nevertheless, they can often tune their processes to decrease their electricity consumption during the most critical periods, for example by using decision support systems based on mathematical models of their processes. Our goal is to estimate the flexibility potential of a complete site, not to tune each process very precisely.

To this end, we propose a generic paradigm to help design such models: reservoirs are the basic building block, which allows for great expressiveness while remaining close to the physics. Indeed, for our purposes we do not need very precise models, but models that can be efficiently included in optimisation problems.

Our first results show that the resulting reservoir models can give sufficiently good approximations of metallurgical processes (more precisely, electric-arc and ladle furnaces).
A study of Statistical and Predictive Analysis of US Pain Medications | 063 | DS3-386 | Tuesday | KAUSHIK | Shruti | A huge amount of data is available, and researchers can exploit it to discover patterns in patients' health records. Large volumes of data help to reveal patterns and associations that can lead to high-quality healthcare at reduced cost for all. Used together, data mining and predictive analytics can help us predict trends from historical data. Predictive solutions combined with expert knowledge can have an enormous impact on the diagnosis of numerous diseases. In our research, we perform frequent pattern mining to find patterns in the journeys of patients consuming a particular pain medication manufactured by a U.S.-based pharmaceutical company. We looked at the impact of demographic variables on patients' consumption of the pain medication and analyzed their expenditure on it. This analysis is followed by predictive analyses in which we predict patient behavior using machine learning algorithms (such as Naïve Bayes, decision trees, logistic regression and support vector machines). The main implication of this research is to help healthcare providers and pharmaceutical companies create targeted treatment measures.
A representer theorem for deep kernel learning | 065 | DS3-402 | Tuesday | BOHN | Bastian | We provide a representer theorem for a chain of linear combinations of kernel functions of reproducing kernel Hilbert spaces and use this result to establish a theoretical foundation for machine learning algorithms based on the concatenation of functions from these spaces. Furthermore, we sketch how our findings apply to existing deep kernel learning approaches.
Delay prediction for a Machine Learning/Operations research cooperative model | 067 | DS3-418 | Tuesday | MILLIET DE FAVERGES | Marie | The purpose of this work is to create a cooperative model of data science and operations research that increases the robustness of railway timetabling through the analysis of real data. Predictive models of punctuality could be used to create schedules that take into account what may happen at the operational level.

The study concerns the Montparnasse station in Paris. It is a large station and a bottleneck where many different lines cross (high-speed lines and regional lines), so the schedule is difficult to respect: the network is very dense and the slightest delay can spread to the following trains. Identifying delays in advance would prevent their propagation in the station area. However, delays are rare events that are hard to predict with usual machine learning algorithms. Moreover, their causes are often multiple and complex: a delay may be primary (external cause) or secondary (caused by another delayed train).
Online learning for large structured Gaussian CRF regression models | 069 | DS3-430 | Tuesday | JOVANOVIC | Milos | Structured learning is important when we want to model data whose samples are not independent but have an internal correlation structure. This is typical of spatial/temporal applications and of other structures that can be represented as graphs. Gaussian CRF (GCRF) models are the regression version of the well-known CRF models, but for very large structures, learning them requires expensive iterations over the whole graph. We are exploring an online learning algorithm based on a pseudo-likelihood approximation, which learns the parameters of the model on parts of the graph and might converge without passing through the whole graph. This online structured learning is analogous to stochastic gradient methods in unstructured learning. It has previously been explored for classification CRFs and is new for regression with GCRF models. The approach is empirically evaluated on synthetic data and a real-world dataset.
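The abstract does not give the model equations, so the sketch below makes several assumptions. It uses the standard GCRF energy `E(y) = alpha * sum_i (y_i - R_i)^2 + (beta/2) * sum_ij S_ij (y_i - y_j)^2`, with one unstructured predictor `R` and one symmetric similarity matrix `S`. Under that energy, each node's conditional given its neighbours is Gaussian with mean `(alpha R_i + beta sum_j S_ij y_j) / (alpha + beta d_i)`, where `d_i` is the weighted degree of node i, and variance `1 / (2 (alpha + beta d_i))`. The pseudo-likelihood is the product of these conditionals, so it can be optimized by stochastic gradient ascent on random subsets of nodes, i.e. parts of the graph. Function names, batch size and learning rate are hypothetical.

```python
import numpy as np


def node_conditionals(alpha, beta, S, R, y, nodes):
    """Mean and variance of p(y_i | neighbours) for the assumed GCRF energy."""
    d = S[nodes].sum(axis=1)                 # weighted degree of each node
    s = S[nodes] @ y                         # weighted sum of neighbour labels
    A = alpha + beta * d
    return (alpha * R[nodes] + beta * s) / A, 1.0 / (2.0 * A)


def pseudo_loglik_and_grad(log_params, S, R, y, nodes):
    """Pseudo-log-likelihood of a node subset and its gradient w.r.t.
    (log alpha, log beta); the log scale keeps both parameters positive."""
    alpha, beta = np.exp(log_params)
    d = S[nodes].sum(axis=1)
    s = S[nodes] @ y
    A = alpha + beta * d
    yi, Ri = y[nodes], R[nodes]
    r = alpha * (yi - Ri) + beta * (d * yi - s)      # equals A * (y_i - mean_i)
    ll = np.sum(0.5 * np.log(2.0 * A) - 0.5 * np.log(2.0 * np.pi) - r**2 / A)
    g_alpha = np.sum(0.5 / A - (2.0 * r * (yi - Ri) * A - r**2) / A**2)
    g_beta = np.sum(0.5 * d / A - (2.0 * r * (d * yi - s) * A - r**2 * d) / A**2)
    return ll, np.array([alpha * g_alpha, beta * g_beta])


def online_fit(S, R, y, batch_size=20, lr=1e-3, n_steps=2000, seed=0):
    """Stochastic gradient ascent, one random subset of nodes per step."""
    rng = np.random.default_rng(seed)
    log_params = np.zeros(2)
    for _ in range(n_steps):
        nodes = rng.choice(len(y), size=batch_size, replace=False)
        _, grad = pseudo_loglik_and_grad(log_params, S, R, y, nodes)
        log_params += lr * grad
    return np.exp(log_params)
```

Each step touches only the sampled nodes and their neighbours, which is what allows learning without a full pass over a very large graph.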
Online unsupervised Deep Learning of visual features with STDP | 071 | DS3-436 | Tuesday | THIELE | Johannes | We present a deep spiking convolutional neural network of integrate-and-fire (IF) neurons that performs unsupervised online learning on a stream of images of handwritten digits using spike-timing-dependent plasticity (STDP). Recent work showed how STDP can be used in a deep convolutional network architecture of IF neurons to extract hierarchical features from natural images. In that work every layer was trained successively. In contrast, we show how all layers of the network can be trained simultaneously, which allows approximate online classification very early in the learning process. Due to the spike-based nature of learning and inference, our architecture uses only a comparatively small number of local computations. We show that it is possible to train the network without providing any information about the structure of the input data, such as the number of classes or the duration of image presentation. These properties could make our implementation suitable for energy-efficient, unsupervised learning on a continuously growing, unlabeled database or on continuous video streams.
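For readers unfamiliar with STDP, here is the classical pair-based exponential rule. The abstract does not say which STDP variant the network uses, so this is an illustrative stand-in. The amplitudes, time constants (in ms), the all-to-all pairing and the clipping bounds are assumptions. The key property is locality: each weight change depends only on the spike times of the two neurons it connects.

```python
import math


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms, assumed units).

    Pre before post (causal pairing) -> potentiation; post before pre -> depression."""
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0, **kwargs):
    """Sum the contributions of all pre/post spike pairs, then clip the weight."""
    dw = sum(stdp_dw(tp, tq, **kwargs) for tp in pre_spikes for tq in post_spikes)
    return min(w_max, max(w_min, w + dw))
```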
Resilience of Transmission and Distribution Smart Grids in the South East Region of England: “Preliminary mining in real world distribution network historical data” | 073 | DS3-441 | Tuesday | HUYGHUES-BEAUFOND | Nathalie | Transmission and distribution networks in the South East region of England face growing power-flow volatility. Short-term and real-time network operation and planning are becoming ever more challenging as wind and PV solar generation continue to develop in the area. UK Power Networks (UKPN), the regional distribution network operator (DNO), has initiated the Kent Active System Management (KASM) project to support network operators and planners in managing the growing volatility of power flows. KASM integrates a new contingency analysis tool as well as load, wind and PV solar forecasters to ease short-term and real-time network operation procedures. Artificial neural networks (ANN) and support vector machines (SVM) are used to generate short-term load, wind and PV solar generation predictions. Since artificial intelligence is being introduced into business-as-usual (BAU) distribution network operations, the risks and benefits associated with making operational decisions based on these technologies need to be captured. To begin assessing and modelling the potential risks, we will use historical data to analyse load, wind and solar generation profiles and to investigate dependencies between these random variables in the South East region of England.
Bistability, non-ergodicity, and inhibition in pairwise maximum-entropy models | 075 | DS3-449 | Tuesday | ROSTAMI | Vahid | Encoding of information in the brain is carried out by a large network of neurons. The number of possible interactions among these neurons increases exponentially with the size of the network, which makes statistical modeling of the full joint activity of the neurons computationally intractable. Pairwise maximum-entropy models [1-3], which take into account only the single and pairwise time-averaged statistics of neurons as constraints, have therefore become a popular, simple statistical model of such a highly complex system.

Here we ask whether the statistical features of the pairwise maximum-entropy model are biologically realistic for large populations of neurons in the cortex. Recent progress in electrophysiology enables us to address this question by recording the parallel spiking activity of large populations of neurons.

We apply the pairwise maximum-entropy model to the spiking activity of a population of 159 neurons recorded from the motor cortex of a macaque monkey using a Utah array of 96 electrodes (see Riehle et al. [4] for the experimental setup). We show that the statistical model predicts a bimodal distribution for the population-averaged activity. For some population sizes, the second mode peaks at high activities, with 90% of the population active within time windows of a few milliseconds. This bimodality has several undesirable consequences:
1. The presence of two modes is unrealistic in view of observed neuronal activity and on neurobiological grounds.
2. Boltzmann learning [5-6], commonly used to solve the inverse problem of finding the parameters of the pairwise model, becomes non-ergodic; hence the pairwise model found by this method is not the maximum-entropy distribution. Solving the inverse problem with common variants of mean-field approximations suffers from the same problem.
3. The Glauber dynamics [7] associated with the model is either unrealistically bistable or does not reflect the distribution of the pairwise model.
To eliminate the bimodality and its ensuing problems, we present a modified pairwise model which, most importantly, has an associated Glauber dynamics. This model avoids bimodality thanks to a minimal asymmetric inhibition. It can be interpreted as a maximum-entropy model with an additional constraint, or as a minimum-relative-entropy model with a particular prior representing our prior information about asymmetric inhibition in real neural networks [8].
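To make the link between the pairwise model and its Glauber dynamics concrete, here is a minimal sketch for the standard symmetric pairwise model over binary neurons `s_i in {0, 1}`, with `p(s)` proportional to `exp(sum_i h_i s_i + 1/2 sum_{i != j} J_ij s_i s_j)`. Each single-site update sets `s_i = 1` with probability `sigmoid(h_i + sum_j J_ij s_j)`. The sketch includes a brute-force enumeration so that the sampler can be checked on a tiny network. Function names and parameter values are illustrative. The modified model with asymmetric inhibition from [8] is not reproduced here.

```python
import itertools
import numpy as np


def glauber_sample(h, J, n_sweeps, rng):
    """Single-site Glauber (heat-bath) dynamics for the symmetric pairwise model.

    J must be symmetric with a zero diagonal. Returns one state per sweep."""
    n = len(h)
    s = rng.integers(0, 2, size=n)
    states = np.empty((n_sweeps, n), dtype=int)
    for t in range(n_sweeps):
        for i in rng.permutation(n):
            field = h[i] + J[i] @ s          # J_ii = 0, so s_i itself does not enter
            s[i] = rng.random() < 1.0 / (1.0 + np.exp(-field))
        states[t] = s
    return states


def exact_distribution(h, J):
    """Brute-force probabilities of all 2^n states (only feasible for small n)."""
    states = np.array(list(itertools.product([0, 1], repeat=len(h))))
    log_weight = states @ h + 0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    p = np.exp(log_weight - log_weight.max())
    return states, p / p.sum()
```

Histogramming the population activity `samples.sum(axis=1)` of such a chain is one simple way to look for the two modes discussed above.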

References
[1] E. T. Jaynes (1957): Information theory and statistical mechanics. Phys. Rev. 106, 620.
[2] D. S. Sivia (2006): Data Analysis: A Bayesian Tutorial. Oxford University Press.
[3] E. Schneidman, M. J. Berry II, R. Segev, W. Bialek (2006): Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007.
[4] A. Riehle, S. Wirtssohn, S. Grün, T. Brochier (2013): Front. Neural Circuits 7, 48.
[5] T. Broderick, M. Dudik, G. Tkačik, R. E. Schapire, W. Bialek (2007): Faster solutions of the inverse pairwise Ising problem. arXiv:0712.2437.
[6] Y. Roudi, J. Tyrcha, J. Hertz (2009b): Ising model for neural data: Model quality and approximate methods for extracting functional connectivity. Phys. Rev. E 79, 051915.
[7] R. J. Glauber (1963): Time-dependent statistics of the Ising model. J. Math. Phys. 4(2), 294–307.
[8] V. Rostami, P. G. L. Porta Mana, M. Helias (2016): Pairwise maximum-entropy models and their Glauber dynamics: bimodality, bistability, non-ergodicity problems, and their elimination via inhibition. arXiv:1605.04740 [q-bio.NC] (under review).
Temporal sleep stage classification from multivariate time series: an end-to-end deep learning approach | 077 | DS3-451 | Tuesday | CHAMBON | Stanislas | Sleep stage classification, or sleep scoring, is of considerable importance in the diagnosis of sleep disorders since it is the preliminary step of any further medical examination. A polysomnography (PSG) is the overnight recording of, principally, electroencephalograms (EEG), electrooculograms (EOG), electrocardiograms (ECG) and electromyograms (EMG). Based on it, a medical expert assigns a sleep stage to each 30 s segment of signal. Automatic approaches have attracted much attention as a way to provide at least auxiliary help to human scorers. In this work, we introduce the first end-to-end learning approach that performs temporal sleep stage classification from PSG signals. We build a general architecture that can extract information from EEG and EOG channels and EMG modalities and pools this information into a learnt softmax classifier. Furthermore, the architecture is light enough to be distributed through time and to grasp the temporal context of the problem. Experiments on about 60 PSG records, with up to 20 EEG channels, reveal that classification performance, measured with balanced accuracy, improves as a function of the spatial dimension. Our model, which is unique in its ability to make the best of multiple modalities, is compared to alternative automatic approaches and delivers state-of-the-art classification performance. On top of that, it reveals the spatio-temporal distribution of discriminant neural signatures and offers insights into sleep stage mechanisms.
Predicting the Home Care Need for Citizens in Copenhagen | 079 | DS3-478 | Tuesday | HANSEN | Casper | The city of Copenhagen in Denmark collects a rich and continuously growing repository of data relating to its citizens, which currently remains largely untapped. Analyzing this data can help the city streamline its services and improve overall social welfare. However, despite its considerable potential, this data is not trivial to process because of its very large scale, its non-stationarity, and its general lack of structure.

This poster will present ongoing work on predicting the level of home care service a citizen needs. We have access to unique data identifying each individual service a citizen receives, a digital free-text journal on each citizen kept by the visiting personnel, and hospital records. The data cover more than 40,000 citizens and have been collected from April 2013 to the present. The ongoing work centres on evolving techniques that handle drift in the data using ensemble methods. There is a large body of related work investigating ensemble architectures with different updating strategies; we focus on structural updating, i.e. how to dynamically add new learners and remove existing ones. Early results are promising, and we are working actively with the city to use the results in practice.
Feature Selection for Learning Performance Models of Electrical Stimulation for Spinal Cord Injury | 081 | DS3-480 | Tuesday | FELDMAN | Ellen | Epidural spinal cord stimulation (SCS), in which implanted electrode arrays deliver electrical signals to spinal cord neurons, is a promising therapy for spinal cord injury (SCI). This approach enables human paraplegic patients to stand and regain partial control of leg movements, while making gains in lost autonomic function. Several stimulation parameters can be modified: the choice of active electrodes, their polarities (positive, negative, or neutral), and the amplitude, frequency, and pulse width of the pulse trains applied to the active electrodes. These parameters must not only be optimized for every patient individually but may also vary over time. This work links computational models of epidural SCS to experimental data obtained by testing paraplegic patients' standing performance under a range of stimulation parameters. Each set of parameters is simulated via finite element analysis to estimate the electrical activity in the spinal cord and surrounding tissues near the implant. Several types of features are then extracted from the simulation results over a range of voxel sizes. Using regression and feature selection techniques such as random forests and elastic nets, we identify the most informative electric field features (i.e. those correlated with good patient motor responses) and the most important spinal cord regions to stimulate. In addition, we find that the most informative stimulation features agree with results from nerve fiber theory. Finally, we employ Gaussian process regression together with the simulation results to predict the performance of stimuli that were not tested on the patients. This procedure is applied to suggest additional stimulation patterns that have a sizeable probability of yielding high performance in the patients.

Further applications of our work include developing algorithms to optimize stimulation configurations for SCI patients, determining optimal electrode placement, and considering novel electrode array designs. Addressing these problems may require estimating the optimal electric field for a patient; we are therefore investigating generative models that capture the joint probability distribution of the features and patient responses. Stimuli could then be optimized to achieve the electric field closest to the estimated optimum.
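The Gaussian process regression step described in this abstract can be sketched in NumPy. The sketch rests on several assumptions. The inputs `X` stand for electric-field features of a stimulation configuration and `y` for the measured performance; neither name comes from the poster. A zero-mean GP with a squared-exponential kernel is used. The "probability of yielding high performance" is read as the posterior probability that the latent performance exceeds a chosen `threshold`, which is one plausible reading of the abstract. Kernel and noise settings are illustrative.

```python
from math import erf, sqrt

import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_test, noise=1e-2, **kern):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(X_train, X_train, **kern) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test, **kern)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_test, X_test, **kern)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


def prob_above(mean, var, threshold):
    """P(f(x) > threshold) under the Gaussian posterior at each test point."""
    return np.array([0.5 * (1.0 - erf((threshold - m) / sqrt(2.0 * v))) if v > 0
                     else float(m > threshold) for m, v in zip(mean, var)])
```

Ranking untested configurations by `prob_above` favours those that are either predicted to perform well or still uncertain enough to be worth trying.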
Data analytics for multi-carrier energy systems | 083 | DS3-483 | Tuesday | KAFFASH | Mahtab
Sensor models for an electronic nose | 085 | DS3-487 | Tuesday | MAHO | Pierre | A non-selective chemical sensor can interact with a large number of molecules. An electronic nose is a bio-inspired device composed of several non-selective chemical sensors. Alone, a sensor is not informative, but together the sensors can produce a unique signature, a bar code, of an odorant volatile molecule. In practice, some measurements are taken beforehand to create a training set, after which machine learning algorithms are used to recognize odours efficiently.

Aryballe Technologies is a French start-up that develops a new generation of electronic noses. Its device is based on a grid of several dozen chemical sensors whose interactions with odorant molecules are measured using surface plasmon resonance imaging. This technology visualises odours as images, providing a promising new way to process this kind of data.

In real-life conditions, detecting volatile molecules is quite a hard task because of mixtures of odorant molecules (several molecules present at the same time and in various concentrations) and environmental turbulence. The success of an electronic nose therefore depends greatly on the development of efficient machine learning algorithms that increase robustness and selectivity. This presentation will introduce these new olfactory data and show our initial work on the development of several sensor models, which seek to increase reproducibility and enable feature selection.
Pathological artificial neurons: learning simple ridge functions can suffer from the curse of dimensionality | 087 | DS3-490 | Tuesday | MAYER | Sebastian | The building blocks of artificial neural networks and projection pursuit algorithms are ridge functions, i.e. functions that vary only along one direction in space (given by the weight vector). Although it is apparently possible to learn huge networks of such ridge functions in many practical situations, this poster presents circumstances in which it is intractable to learn even a single ridge function.
Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations | 089 | DS3-503 | Tuesday | ISCEN | Ahmet | Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. So far, it has been limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. Diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches. An efficient off-line stage allows an optional reduction of the number of stored regions. In the on-line stage, the proposed handling of queries unseen during indexing removes the need for additional computation to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in image retrieval performance with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval.
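Diffusion through a linear solver can be illustrated with a small example. The formulation below is a common one and is assumed here: build a symmetric k-NN affinity matrix `W` over region descriptors, normalize it as `S = D^{-1/2} W D^{-1/2}`, and solve `(I - alpha S) f = y` for a query vector `y`. The solution `f` ranks the database regions. Because the system matrix is symmetric positive definite for `alpha < 1`, conjugate gradient applies. All names, `k`, `gamma` and `alpha` are illustrative. The sketch uses dense NumPy arrays for brevity, whereas a real index would store `W` sparsely. Aggregating region scores into image scores is not shown.

```python
import numpy as np


def affinity_graph(X, k=5, gamma=3.0):
    """Symmetric k-NN affinity matrix over L2-normalised region descriptors X (n, d)."""
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)                 # no self-loops
    W = np.zeros_like(sims)
    nn = np.argsort(-sims, axis=1)[:, :k]           # k most similar regions per row
    rows = np.arange(len(X))[:, None]
    W[rows, nn] = np.maximum(sims[rows, nn], 0.0) ** gamma
    return np.maximum(W, W.T)                       # symmetrise


def diffusion_system(W, alpha=0.99):
    """Return I - alpha * D^{-1/2} W D^{-1/2}, symmetric positive definite for alpha < 1."""
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0                             # guard against isolated nodes
    d = 1.0 / np.sqrt(deg)
    return np.eye(len(W)) - alpha * (d[:, None] * W * d[None, :])


def query_vector(X, q, k=5, gamma=3.0):
    """Sparse query: non-zero only on the query's k nearest regions."""
    sims = X @ q
    y = np.zeros(len(X))
    nn = np.argsort(-sims)[:k]
    y[nn] = np.maximum(sims[nn], 0.0) ** gamma
    return y


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradient for a symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b.copy()                                    # residual b - A x for x = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        step = rs_old / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

Ranking by `np.argsort(-f)` with `f = conjugate_gradient(diffusion_system(affinity_graph(X)), query_vector(X, q))` orders the regions by diffused relevance to the query.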
Polite Agent and Impolite Opponents: Natural Language Generation for Chatterbots through Sentiment-based Training by using Twitter data | 091 | DS3-506 | Tuesday | KHATUA | Aparup | Tech giants such as Microsoft and Facebook, as well as data scientists, are exploring various semi-supervised learning methods to build conversational agents, commonly known as chatterbots. However, in reality the outcomes of these efforts fail to match expectations, especially when the opponent/human uses an impolite tone. This work attempts to address this shortcoming. Its main contribution will be a model of natural language generation through sentiment-based training. In our model, if the opponent/human says something in an impolite manner (say, in an angry or complaining tone) on a particular topic, our chatterbot agent will respond with a different emotion (say, an optimistic tone) to counter the amplification of impoliteness in the conversation. To train our model, we are considering a microblogging platform such as Twitter, which generates an enormous amount of user-generated content, because a conversation between two social media users, through tweets and retweets, can be an effective training dataset for our research question. More importantly, conversations on Twitter display the full range of emotions, such as anger, sadness, or happiness. Initially, we are developing our model in the political domain, training it on our Twitter datasets covering the 2014 Indian election, the 2015 Singapore election, the 2015 UK election, the 2016 Brexit referendum and the ongoing 2017 French election. Cumulatively, we have around 15 million tweets from these events. We will develop our model using a generative architecture for closed-domain short conversations.
Moving Least Squares Support Vector Machines for weather temperature prediction | 093 | DS3-526 | Tuesday | KAREVAN | Zahra | Local learning methods have been investigated by many researchers. While global learning methods give the same weight to all training points in model fitting, local learning methods assume that the training samples in the region of the test point are more influential. In this study, we propose Moving Least Squares Support Vector Machines (M-LSSVM), in which each training sample is involved in the model fitting depending on the similarity between its feature vector and that of the test point. Experimental results on a weather forecasting application indicate that the proposed method can improve prediction performance.
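One way to realise "each training sample is involved depending on its similarity to the test point" is a weighted LS-SVM that is refitted for every test point. In this formulation, sample i gets weight `v_i` in the squared-error term, which turns the usual dual system into `[[0, 1^T], [1, K + diag(1/(gamma v))]] [b; alpha] = [0; y]`. The Gaussian form of `v_i`, the bandwidth `h`, the floor `v_min` and all other hyperparameters are assumptions for illustration; the abstract does not give the authors' exact weighting. Far-away samples receive a tiny `v_i`, hence a very large effective ridge term, so their coefficients `alpha_i` shrink towards zero.

```python
import numpy as np


def rbf(A, B, sigma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))


def moving_lssvm_predict(X, y, x_star, gamma=100.0, sigma=0.5, h=1.0, v_min=1e-8):
    """Fit a weighted LS-SVM whose sample weights depend on x_star, then predict at x_star."""
    n = len(y)
    dist2 = np.sum((X - x_star) ** 2, axis=1)
    v = np.maximum(np.exp(-dist2 / h**2), v_min)     # assumed Gaussian locality weights
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                                   # constraint sum(alpha) = 0
    M[1:, 0] = 1.0                                   # bias term b
    M[1:, 1:] = rbf(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    k_star = rbf(x_star[None, :], X, sigma)[0]
    return float(k_star @ alpha + b)
```

Refitting per query costs one linear solve per prediction; this is the price of letting the model adapt to the neighbourhood of each test point.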
Sequence Modelling For Analysing Student Interaction with Educational Systems | 095 | DS3-532 | Tuesday | HANSEN | Christian | The analysis of log data generated by online educational systems is an important task for improving the systems and furthering our knowledge of how students learn. This poster will present initial work accepted at the International Conference on Educational Data Mining 2017 (EDM), together with results on more complex datasets than those presented in that paper.
The poster presents an unsupervised clustering method for log data from online systems, which can be used for an initial investigation of user behaviors. User behaviors are modelled as a distribution over Markov chains, which yields models that are easily interpretable by humans. The method is applied to extensive log data from the company Edulab, the largest provider of online math education in Denmark.
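A distribution over Markov chains can be fitted by expectation-maximization as a mixture of first-order Markov chains. The sketch below assumes that formulation, with a small smoothing pseudocount; the abstract does not give the exact model or estimator. Each user's log is a sequence of integer-coded actions. Each cluster has its own initial-state and transition probabilities, and the per-user responsibilities give a soft clustering. The transition matrices are the human-readable part of the model. Names and defaults are hypothetical.

```python
import numpy as np


def fit_markov_mixture(seqs, n_states, n_clusters, n_iter=50, seed=0, smooth=1e-3):
    """EM for a mixture of first-order Markov chains over discrete actions.

    Returns mixing weights, per-cluster initial and transition probabilities,
    per-sequence responsibilities and the log-likelihood after each iteration."""
    rng = np.random.default_rng(seed)
    # sufficient statistics per sequence: first state and transition counts
    first = np.zeros((len(seqs), n_states))
    counts = np.zeros((len(seqs), n_states, n_states))
    for n, s in enumerate(seqs):
        first[n, s[0]] = 1.0
        np.add.at(counts[n], (s[:-1], s[1:]), 1.0)
    resp = rng.dirichlet(np.ones(n_clusters), size=len(seqs))  # random soft start
    trace = []
    for _ in range(n_iter):
        # M-step: responsibility-weighted counts (lightly smoothed) -> probabilities
        pi = resp.mean(axis=0)
        init = resp.T @ first + smooth
        init /= init.sum(axis=1, keepdims=True)
        trans = np.einsum("nk,nij->kij", resp, counts) + smooth
        trans /= trans.sum(axis=2, keepdims=True)
        # E-step: log p(sequence, cluster), normalised with log-sum-exp
        log_joint = (first @ np.log(init).T
                     + np.einsum("nij,kij->nk", counts, np.log(trans))
                     + np.log(pi))
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
        resp = np.exp(log_joint - log_norm[:, None])
        trace.append(log_norm.sum())
    return pi, init, trans, resp, trace
```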
An Approach for Machine Learning-Based Contouring of Daily CBCT with Planning CT as Prior | 097 | DS3-534 | Tuesday | BRION | Eliott | To illustrate the use of deep learning in medical imaging, this poster shows how it can automatically contour healthy organs and tumors in CT scans. As 3 in 10 Europeans will develop cancer before their 75th birthday, we must improve treatment. Proton therapy is a promising treatment since it kills cancerous cells with high accuracy, leaving the neighboring healthy organs undamaged. However, its wider adoption is still hampered by two challenges. The first is the uncertainty of the protons' energy deposition along their path (density changes). The second is the uncertainty in the target's position (geometrical changes). The poster focuses on this second challenge, for which fast, robust and autonomous (i.e. with minimal external user intervention) contouring is critical. The good news is that a new family of algorithms called deep learning now makes this possible. Roughly speaking, deep learning works by learning representations of already contoured images with multiple levels of abstraction. We will show how it has been successfully applied in recent research and why access to labeled data is so crucial.
Information transfer for learning in non-stationary environments | 099 | DS3-547 | Tuesday | MURENA | Pierre-Alexandre | The traditional machine learning setting consists of learning a concept from a training data set and applying the learned concept to a test data set assumed to be independent of the training data set but identically distributed. In practice, this hypothesis does not always hold, and some non-stationary environments introduce changes in the distributions (concept drift). Two classes of problems belong to this non-stationary category: transfer learning and incremental learning. In both, the acquired knowledge has to be transferred and slightly modified to fit new environments. We present a framework for learning in non-stationary environments, based on the notion of algorithmic complexity, that introduces the idea of minimal transfer of information.
Certificate Achievement Unlocked: How does MOOC learners' behaviour change? | 101 | DS3-554 | Tuesday | ZHAO | Yue | Massive Open Online Courses (MOOCs) play an ever more central role in open education. However, in contrast to traditional classroom settings, many aspects of learner/user behaviour in MOOCs are not well researched. In this work, we focus on modelling learner behaviour in the context of continuous assessment with completion certificates, the most common assessment setup in MOOCs today. Here, learners can obtain a completion certificate once they reach a required minimal score (typically between 50% and 70%) in continuous tests distributed throughout the duration of a MOOC. In this setting, the course material or tests provided after "passing" do not contribute to earning the certificate, which potentially affects learners' behaviour. We therefore explore how "passing" impacts MOOC learners: do learners alter their behaviour after this point? And if so, how? In traditional classroom-based learning, the role of assessment and its influence on learning behaviour have been well established. We are the first to provide answers to these questions in the context of MOOCs, and these answers offer valuable insights for designing better courses in the future. As a result, we present a set of core behaviour patterns based on our extensive exploratory analysis of the log traces of more than 4,000 certificate-earning learners across four edX MOOCs.
Sentiment Maps of Art Cities using Weakly Labeled Social Media Egocentric Streams | 103 | DS3-560 | Tuesday | VARINI | Patrizia | We introduce an approach to draw dominant-sentiment maps of the main sites of interest in Italian cultural or art cities. It analyzes egocentric streams extracted from social media repositories, jointly with texts extracted from their audio by automatic speech recognition systems. For a specific art city, we first extract from the YouTube repository videos captured in that location, using expanded queries on their metadata, and keep only egocentric or hand-held camera streams. To classify sentiment patterns from the streams and their audio, spatio-temporal features are extracted from the video and semantic features from the YouTube ASR subtitles, and both are combined in a joint embedding feature space. To extract video features, we exploited the activations of the last dense layer of a 3D CNN trained on motion and frame visual assessment. Semantic features were obtained with the well-known word2vec approach. We used a collected dataset of 42 videos with supervised annotations.
Examining Substitution Models on Phylogenies in mtDNA | 105 | DS3-574 | Tuesday | LEVINSTEIN HALLAK | Keren | Mitochondrial DNA (mtDNA) is a small fraction of the DNA in eukaryotic cells, located in the mitochondria. It is widely used in many fields, such as genetic genealogy, medical genetics and even forensic science. One of its key features is that it is inherited solely from the mother and therefore does not undergo recombination. Consequently, mtDNA sequence variation results from the accumulation of mutations along maternal lineages. This variation can be used to reconstruct a phylogenetic tree with parsimony and maximum likelihood methods. A recent study constructed an updated comprehensive phylogeny of global human mtDNA variation, based on coding and control region mutations. Even though this highly reliable phylogenetic tree is available, the substitution mechanism in mtDNA is not yet fully understood. We propose to use this comprehensive phylogeny to study different substitution models of mtDNA and to answer some open questions about common assumptions. The growth in the amount of available data increases the power of statistical tests, making it possible to test more complicated models than those previously suggested. It also requires novel approaches (both statistical and computational) for testing statistical hypotheses on large-scale data.
Distributed Probabilistic Forecasting for New Energy System Operation107DS3-644TuesdayLE CADREHélèneThis presentation focuses on the role of information, which is essential in new energy systems where a balance must constantly be found between preserving privacy and increasing global system efficiency. We start by introducing the methodological framework, which relies on Prediction-Interval-based Extreme Learning Machines coupled with automatic feature selection based on minimum Redundancy Maximum Relevance (mRMR), a criterion derived from mutual information. We provide a data fusion approach that combines probabilistic forecasters while meeting the confidence level of the prediction intervals. The performance of the method is evaluated analytically and illustrated on three case studies: a) distributed solar PV power production forecasting at the regional scale, b) day-ahead market price forecasting, and c) a peer-to-peer model for solar PV energy trading between microgrids.
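The mRMR criterion can be sketched in a few lines. The Python example below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that features are discrete, that mutual information is the plug-in estimate from empirical frequencies, and that candidates are scored with the difference form (relevance to the target minus mean redundancy with the features already selected). The feature names and toy data in the usage lines are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr_select(features, target, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values.
    target: list of discrete target values.
    """
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage: "b" duplicates "a", while "c" carries complementary information.
u = [0, 0, 0, 0, 1, 1, 1, 1]
v = [0, 0, 1, 1, 0, 0, 1, 1]
y = [2 * a + b for a, b in zip(u, v)]
print(mrmr_select({"a": u, "b": list(u), "c": v}, y, 2))  # → ['a', 'c']
```

A pure relevance ranking would pick "b" as readily as "c", since all three features are equally informative about y. The redundancy term is what discards the duplicate.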
The application of Machine Learning for prediction of missile aerodynamic coefficients109DS3-054TuesdayBUDIDETIJyotsnaCurrently, complex mathematical models based on the principles of Finite Element Methods are the most common Computational Fluid Dynamics (CFD) approach for estimating the aerodynamic properties of a given model. It is well known that these estimates have a margin of error, largely because of the idealized assumptions made and the neglect of a few practical features that affect the actual performance of a missile. Hence, we propose an alternative approach that calculates the aerodynamic coefficients using machine learning methods on data generated from wind tunnel tests, geometrical data and historical data. Specifically, we propose a neural network approach. The aim is to generate reasonably accurate results in much less time than existing CFD techniques.
Representations, Regularization and Visualization in text data111DS3-204TuesdaySKIANISKonstantinosHarnessing the full potential of text data has always been a key task for the Data Science community. The properties hidden in the inherently high-dimensional nature of text are of major importance in numerous tasks such as text categorization, question answering and conversational agents. In this poster we present how to extract rich text representations that can a) be visualized and show interesting properties, b) be used as better features for machine learning tasks, and c) be used as good structures for group regularization.
Aircraft dynamics identification115DS3-407TuesdayROMMELCédricIt is well known that one of the main goals of civil aviation operators is to reduce aircraft fuel consumption as much as possible. One way to do so is to optimize flight trajectories with respect to aircraft performance. Our work focuses on minimizing fuel consumption during the climb trajectories of civil aircraft, which can be modeled mathematically as an optimal control problem. Such a problem involves the dynamical behavior of the aircraft, which motivates the search for accurate dynamical system identification techniques, the main topic of this work. According to the literature, the most widely used approaches for aircraft parameter estimation are the Output-Error Method and the Filter-Error Method, based on the main ideas of measurement error minimization and state dynamics re-estimation (see for example R. V. Jategaonkar 2006 and R. E. Maine and K. W. Iliff 1986). Recent advances include using neural networks for the state estimation part (see N. K. Peyada and A. K. Ghosh 2009). On the other hand, renewed interest in the older Equation-Error Method has also been observed (see E. A. Morelli 2006). In this work we propose a variation of the latter. Adopting a statistical point of view, we formulate our problem as a regression and solve it using a Maximum Likelihood based technique. We illustrate our method with numerical results based on real flight data.
ConvSCCS: a convolutional self-controlled case series model for lagged adverse event detection in large databases117DS3-190TuesdayMORELMaryanWith the increased availability of large electronic health record databases comes the chance to enhance health risk screening. Machine learning could lead to major improvements in postmarketing adverse drug effect detection, as the current process relies on physicians' spontaneous reports. However, the complexity of this task requires new statistical models. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features on a rare longitudinal outcome. Our model is based on a conditional Poisson model known as the self-controlled case series (SCCS). SCCS models are computationally efficient in rare event settings and robust to non-longitudinal confounders. While the original SCCS model requires risk periods to be specified a priori, we propose to learn them with flexible, regularized step functions. This simple formulation allows us to use fast stochastic proximal algorithms to learn the parameters efficiently. Simulations show that we outperform competing models in terms of mean squared error. We applied the new method to a large dataset of diabetic patients from the SNIIRAM database of the French national health insurance system, and show that we are able to detect a well-known adverse drug effect.
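For readers unfamiliar with SCCS, the conditional Poisson likelihood it builds on can be written down directly. Conditioning on each case's total number of events turns the Poisson model into a multinomial over that case's own time intervals, so non-longitudinal (time-invariant) confounders cancel out. The Python sketch below implements this standard SCCS log-likelihood with fixed, pre-specified exposure intervals. It does not implement ConvSCCS's learned step functions or its proximal solver, and the interval data in the usage lines are hypothetical.

```python
from math import exp, log

def sccs_loglik(beta, cases):
    """Conditional log-likelihood of the standard SCCS model.

    beta: log relative incidences, one per exposure feature.
    cases: list of cases; each case is a list of (length, features, n_events)
        tuples, one per interval of that case's observation period.
    """
    total = 0.0
    for intervals in cases:
        # Linear predictor x_j . beta for each interval of this case.
        etas = [sum(b * x for b, x in zip(beta, feats)) for _, feats, _ in intervals]
        # Normalizer over the case's own observation period: sum_k e_k exp(eta_k).
        log_norm = log(sum(length * exp(eta) for (length, _, _), eta in zip(intervals, etas)))
        for (length, _, n_events), eta in zip(intervals, etas):
            if n_events:
                total += n_events * (log(length) + eta - log_norm)
    return total

# Usage: one hypothetical case observed for 365 days, with a 30-day risk
# window after exposure and a single event inside that window.
case = [(30.0, [1.0], 1), (335.0, [0.0], 0)]
print(sccs_loglik([0.0], [case]))        # equals log(30 / 365) when beta = 0
print(sccs_loglik([log(2.0)], [case]))   # higher: the event fell in the risk window
```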
Feature extraction with regularized siamese networks for outlier detection: application to lesion screening in medical imaging119DS3-051TuesdayALAVERDYANZaruhiComputer aided diagnosis (CAD) systems are designed to assist clinicians in various tasks, including highlighting abnormal regions in a medical image. A common approach consists of training a voxel-level binary classifier on a set of feature vectors extracted from normal and pathological areas in patients' scans. However, many pathologies (such as epilepsy) are characterized by lesions that may be located anywhere in the brain and vary in shape, size and texture. Adequately representing such heterogeneity requires a significant amount of annotated data, which is a major issue in the medical domain. Therefore, we build on a previously proposed approach that treats epilepsy lesion detection as a voxel-level outlier detection problem. It consists of building a one-class SVM (oc-SVM) classifier for each voxel in the brain volume using a small number of clinically guided features.

Our goal now is to take a step forward by replacing the handcrafted features with representations learnt automatically by neural networks. We propose a novel version of siamese networks trained on patches extracted only from the scans of healthy subjects. This network, whose subnetworks are stacked autoencoders, is regularized by the reconstruction error of the patches. It is designed to learn representations that bring patches centered at the same voxel location 'closer' with respect to the chosen metric (here, cosine). Finally, the middle-layer representations of the subnetworks are fed to voxel-level oc-SVM classifiers. The method is validated on the MRI scans of 3 patients with confirmed epilepsy lesions and shows promising performance.
On the Troll-Trust Model for Edge Sign Prediction in Social Networks002DS3-014WednesdayLE FALHERGéraudIn the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive with state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms that have theoretical guarantees even when edges are adversarially labeled.
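The troll-trust features are simple per-node statistics. For each node, the Python sketch below computes the fraction of negative edges among its outgoing edges (called "trollness" here) and among its incoming edges (called "untrustworthiness" here). It then applies an illustrative threshold rule. The rule, its default threshold of 1.0 (the midpoint of the score's range) and the default score for unseen nodes are assumptions for the example. They are not the Bayes-optimal or Label Propagation predictors analyzed in the poster.

```python
from collections import defaultdict

def troll_trust_features(edges):
    """Per-node fractions of negative edges.

    edges: iterable of (source, target, sign) with sign in {+1, -1}.
    Returns (trollness, untrustworthiness): for each node, the fraction of
    negative edges among its outgoing and among its incoming edges.
    """
    out_neg, out_tot = defaultdict(int), defaultdict(int)
    in_neg, in_tot = defaultdict(int), defaultdict(int)
    for u, v, sign in edges:
        out_tot[u] += 1
        in_tot[v] += 1
        if sign < 0:
            out_neg[u] += 1
            in_neg[v] += 1
    trollness = {u: out_neg[u] / out_tot[u] for u in out_tot}
    untrust = {v: in_neg[v] / in_tot[v] for v in in_tot}
    return trollness, untrust

def predict_sign(u, v, trollness, untrust, threshold=1.0, unseen=0.0):
    """Illustrative rule: predict a negative edge u -> v when the combined
    negativity tr(u) + un(v) exceeds the threshold."""
    score = trollness.get(u, unseen) + untrust.get(v, unseen)
    return -1 if score > threshold else 1

# Usage on a hypothetical 3-node network.
tr, un = troll_trust_features([("a", "b", 1), ("a", "c", -1), ("b", "c", -1), ("c", "a", 1)])
print(predict_sign("b", "c", tr, un), predict_sign("c", "b", tr, un))  # → -1 1
```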
The development of mirror neurons representing facial expression: a computer modelling study004DS3-035WednesdaySALARISIleniaPrevious research has shown that humans automatically and spontaneously show facial response patterns that are congruent to viewed emotional facial expressions. Moreover, similar neural substrates seem to be recruited and co-active in the production as well as the observation of emotional facial expressions. It has been suggested that the computational mechanisms that may underlie these vicarious emotional activations could be based on the development of mirror-like neurons that emerge through associative learning mechanisms. In this work we model the development of mirror neurons that encode facial expressions in the infant brain during interaction with either of its parents. We show how temporally correlated imitation of facial expressions in early social interactions could drive the development of mirror neurons in the infant using Hebbian learning. Here we present an overarching, self-organised neural network model that incorporates a visual module composed of a hierarchical model of successive neuronal layers and a motor module that represents the current facial expression of the infant. The simulations show that after training, the output neurons in our network learn to respond selectively to a preferred facial expression (e.g. Happy or Sad) regardless of whether the infant generates the expression or the infant sees the parent displaying that expression. More importantly, we explore the development of such neuronal responses across varying degrees of correlation and temporal lags between the seen and produced facial expressions.
Measuring Sustainability Reporting using Web Scraping and Natural Language Processing006DS3-074WednesdaySOZZIAlessandraNowadays the Web represents a medium through which corporations can effectively disseminate and demonstrate their efforts to incorporate sustainability practices into their business processes. This led to the idea of using the Web as a source of data to measure how UK companies are progressing towards meeting the new sustainability requirements recently stipulated by the United Nations. The project involves the development of a web scraping program able to collect sustainability-related web pages from websites of a sample of the 100 largest UK private companies (ranked by their latest sales) and the use of Latent Dirichlet Allocation (LDA) to identify common topics from the data collected.
A deep learning forecast for spot electricity markets008DS3-083WednesdayLAGO GARCIAJesusIn recent years, renewable energy sources have gained a large share of the world’s energy production. While they contribute greatly to building a more sustainable world, they also pose a great challenge to grid stability. In particular, as massive storage of electric energy is economically infeasible, the electricity price is adjusted according to real-time demand and supply; since the production from renewable sources depends on weather conditions and is generally quite uncertain, electricity prices become unpredictable, the energy market more volatile, and the grid more unstable.

A possible way to prevent this and to safeguard the profitability of renewable sources is to implement smart bids in the spot energy market. In particular, by forecasting energy prices in advance, market players can trade energy to maximize profit, i.e. buyers purchase when prices are low (low demand) and sellers sell when prices are high (high demand), which in turn results in a self-balanced market.
Bayesian Computation for Semi-continuous Longitudinal Outcomes with Non-ignorable Missing Data010DS3-092WednesdayJIANGDepengMuch of the missing data in behavioral, medical, social, and psychological research is nonignorable, in the sense that the missingness depends on both the observed data and the missing values themselves. This study proposes Bayesian computational methods for handling nonignorable missing data (m-part) with semi-continuous outcomes in a longitudinal study. In the Bayesian approach, we employ the useful strategy of combining data augmentation with MCMC methods. The proposed Bayesian SEM approach was applied to a longitudinal study of workers with work-related musculoskeletal disorders, to show how the new approaches can overcome the problems of currently available statistical methods and help to identify the distinct trajectories of worker productivity loss and the associated prognostic factors. Nonignorable missing data models have been developed on the basis of both likelihood-based and Bayesian approaches. The computational advantages of the Bayesian approach over likelihood methods will be discussed.
Convolutional Neural Networks for Galaxy parameters decomposition012DS3-095WednesdayTUCCILLODiegoThe characterization of the structure of galaxies as inferred from their photometric brightness profiles is a powerful tool in astronomy. The parametric decomposition of large samples of galaxies at different cosmic ages allows a plethora of studies on galaxy evolution and on the relationships between different components. The era of big data in astronomy is marked by numerous current and future large-area surveys such as EUCLID, the Large Synoptic Survey Telescope (LSST) and the Wide Field Infrared Survey Telescope (WFIRST). Within a few years, these surveys will increase tenfold the volume of data that can be exploited for galaxy morphology studies, offering a unique opportunity to constrain models and infer properties of galaxies. The full potential of these surveys can be unlocked only with the development of automated, fast and reliable software to describe galaxy structures.

We present a Convolutional Neural Network that we developed for profile fitting of one- and two-component galaxies. Our code is able to retrieve a complete set of galaxy parameters, such as the radius, magnitude, Sérsic index, position angle and ellipticity of the bulge and disk, as well as the bulge-to-total ratio (B/T). Comparisons with other profile-fitting codes demonstrate that our network is faster and reliable, making it ideal for studies of large datasets.
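The parameters listed above are those of the standard Sérsic profile, I(R) = I_e exp(-b_n[(R/R_e)^(1/n) - 1]). The Python sketch below shows how these parameters determine a component's light profile, its total flux and hence the B/T of a bulge-plus-disk model. It is illustrative background, not the network itself. Three choices are assumptions: the approximation b_n ≈ 2n - 1/3, the ellipticity convention (axis ratio q = 1 - ellipticity) and the usage values.

```python
from math import exp, gamma, pi

def b_n(n):
    """Approximate Sersic constant; b_n ~ 2n - 1/3 is a common
    approximation, adequate for n above roughly 0.5."""
    return 2.0 * n - 1.0 / 3.0

def sersic_intensity(r, i_e, r_e, n):
    """Surface brightness at radius r for effective intensity i_e,
    effective radius r_e and Sersic index n."""
    return i_e * exp(-b_n(n) * ((r / r_e) ** (1.0 / n) - 1.0))

def sersic_total_flux(i_e, r_e, n, ellipticity=0.0):
    """Analytic integral of the profile over the plane,
    2 pi n Gamma(2n) e^b / b^(2n) * i_e * r_e^2, scaled by q = 1 - ellipticity."""
    b = b_n(n)
    q = 1.0 - ellipticity
    return 2.0 * pi * n * gamma(2.0 * n) * exp(b) / b ** (2.0 * n) * i_e * r_e ** 2 * q

def bulge_to_total(bulge, disk):
    """B/T for a bulge and a disk, each given as (i_e, r_e, n, ellipticity)."""
    f_bulge = sersic_total_flux(*bulge)
    f_disk = sersic_total_flux(*disk)
    return f_bulge / (f_bulge + f_disk)

# Usage: hypothetical de Vaucouleurs bulge (n = 4) plus exponential disk (n = 1).
print(bulge_to_total((5.0, 1.0, 4.0, 0.1), (1.0, 3.0, 1.0, 0.4)))
```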
Inference in Multi-Layer Graphical Models014DS3-109WednesdayMANOELAndreWe propose a message-passing algorithm for inference in multi-layer graphical models, a recurrent task in so-called Bayesian deep learning. We analyze this algorithm in the Bayes-optimal setting and show it is able to attain optimal performance for some choices of parameters, while for others it remains stuck in metastable minima of the free energy it seeks to minimize. A theoretical analysis relying on tools from statistical physics allows us to compare the algorithm performance to the optimal achievable one after a given number of samples has been provided.
Community Detection On Large Graphs016DS3-124WednesdayMIASNIKOFPierreA community of individuals can be studied through their interactions, which we can model through shared features. We built a model in which individuals are represented by a unique identifying categorical variable and their shared features represent predictor variables. For example, a community among a bank's customers can be established through their first and last names, addresses, telephone numbers, email addresses, etc. For clarity and compactness, these interactions are modeled with a bipartite graph, where the first set of vertices represents individual customers and the second represents their features. An edge joins an individual to a feature if he/she possesses that feature.

Our ultimate goal is to identify the most efficient and scalable way to cluster individuals into communities, using their shared features as predictors. In our evaluation, numerical implementation is critical, as we are motivated by applications where individuals are numbered in the millions.

We begin by transforming our bipartite graph into a non-bipartite weighted graph, to exploit graph connectivity, allow the use of graph-based techniques and reduce dimensionality. We posit that the frequency of connections through a given feature is a distinguishing factor in establishing the overall strength of a link between two individuals. In our banking example, a connection through a name shared by only a few customers is much stronger than a connection through a place of work. We therefore assign a higher weight to features that connect just a few individuals than to features that connect many. The weight of a feature is defined as the complement of the empirical probability that two individuals are connected through that feature. To compute an individual-to-individual weight, we sum the weights of all features connecting the two individuals.
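The poster does not spell out the estimator for this empirical probability. One reading, which is an assumption here, takes it as the share of all individual pairs that both hold feature f, C(n_f, 2) / C(N, 2), so that w_f = 1 - C(n_f, 2) / C(N, 2). On that basis, the Python sketch below builds the weighted individual-to-individual graph from a bipartite feature map. The feature names in the usage lines are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def pair_weights(holders):
    """Project a bipartite individual-feature graph onto individuals.

    holders: dict mapping feature -> set of individuals possessing it.
    Returns dict mapping frozenset({i, j}) -> summed feature weight.
    """
    individuals = set().union(*holders.values())
    total_pairs = len(individuals) * (len(individuals) - 1) / 2
    if total_pairs == 0:
        return {}
    weights = defaultdict(float)
    for feature, people in holders.items():
        n_f = len(people)
        # Assumed reading of "empirical probability that two individuals are
        # connected through the feature": share of all pairs that both hold it.
        p_f = (n_f * (n_f - 1) / 2) / total_pairs
        w_f = 1.0 - p_f  # rare features get weights close to 1
        for i, j in combinations(sorted(people), 2):
            weights[frozenset((i, j))] += w_f
    return dict(weights)

# Usage: a rare surname and phone number versus a common employer.
holders = {
    "name:Rossi": {"c1", "c2"},
    "employer:BigCo": {"c1", "c2", "c3", "c4"},
    "phone:555": {"c4", "c5"},
}
print(pair_weights(holders)[frozenset(("c1", "c2"))])  # name (0.9) + employer (0.4)
```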

We then attempt to find communities via one deterministic and one stochastic technique and compare their performance. While many community detection approaches are presented in the literature, we compare these two on the basis of clustering performance and scalability. Because we are dealing with unsupervised learning, defining the quality of the community clustering provided by each technique is pivotal. The quality of community clustering is measured via a simple t-test on the (weighted) mean inter- and intra-community connections. We also compare the number of resulting communities to the number suggested by the eigengap heuristic. Finally, we assess scalability to graph size by repeating our comparisons on subsets of our data of varying size. In the process, we also developed an approximation to the eigengap that makes its computation tractable and faster.
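The quality check on mean inter- and intra-community connections can be sketched as a two-sample comparison of edge weights. The Python example below assumes Welch's unequal-variance form of the t statistic (the poster says only "a simple t-test"), and its edges and labels are hypothetical. SciPy's scipy.stats.ttest_ind with equal_var=False computes the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, variance

def intra_inter_t(edges, community):
    """Welch's t statistic comparing intra- and inter-community edge weights.

    edges: iterable of (i, j, weight) for the weighted individual graph.
    community: dict mapping each node to its community label.
    Requires at least two intra- and two inter-community edges.
    A large positive value means intra-community links are heavier on average.
    """
    edges = list(edges)  # allow generators, which are iterated twice below
    intra = [w for i, j, w in edges if community[i] == community[j]]
    inter = [w for i, j, w in edges if community[i] != community[j]]
    se = sqrt(variance(intra) / len(intra) + variance(inter) / len(inter))
    return (mean(intra) - mean(inter)) / se

# Usage on a hypothetical 5-node graph split into two communities.
edges = [("a", "b", 3.0), ("b", "c", 4.0), ("d", "e", 5.0), ("c", "d", 1.0), ("a", "e", 2.0)]
labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(intra_inter_t(edges, labels))
```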

While our study has just recently begun, we are collaborating with a large bank and anticipate applicable results shortly.
Facial Information Decomposition through Deep Learning. Application: Joint Spontaneous and Dynamic Facial Expression and Identity Recognition.018DS3-130WednesdayAL CHANTIDawoodFaces convey a wealth of social signals. Although a face is a single object, it conveys many socially important characteristics such as identity, age, sex, expression and lip-speech. Such sets of mutually related information are called multi-modal signals. A multi-modal model that jointly reveals information otherwise hidden when the different modalities are considered independently can be exploited and used intuitively. To efficiently represent and decompose multi-modal data, we advocate a deep learning approach based on convolutional neural networks, autoencoders, sparse representation and long short-term memory to jointly decompose and extract salient information for the following applications: spontaneous and dynamic facial expression recognition, identity recognition by extracting the neutral part from expressive images, and age and sex recognition. Moreover, the model has to exploit spatiotemporal information. Our approach will be compared with classical computer vision techniques based on extracting hand-crafted features through spatiotemporal descriptors, for instance 3DSIFT, 3DHOG and GIST, and on the bag-of-visual-words approach for video representation.
Embedded Bandits for Large-Scale Black-Box Optimization020DS3-145WednesdayAL-DUJAILIAbdullahRandom embedding has been applied with empirical success to large-scale black-box optimization problems with low effective dimensions. This work proposes the EmbeddedHunter algorithm, which incorporates the technique in a hierarchical stochastic bandit setting, following the optimism in the face of uncertainty principle and breaking away from the multiple-run framework in which random embedding has conventionally been applied, similarly to stochastic black-box optimization solvers. Our proposition is motivated by the bounded mean variation in the objective value for a low-dimensional point projected randomly into the decision space of Lipschitz-continuous problems. In essence, the EmbeddedHunter algorithm optimistically expands a partitioning tree over a low-dimensional search space, whose dimension equals the effective dimension of the problem, based on a bounded number of random embeddings of points sampled from the low-dimensional space. In contrast to the probabilistic theoretical guarantees of multiple-run random-embedding algorithms, the finite-time analysis of the proposed algorithm provides a theoretical upper bound on the regret as a function of the algorithm's number of iterations. Furthermore, numerical experiments were conducted to validate its performance. The results show a clear performance gain over recently proposed random embedding methods for large-scale problems, provided the intrinsic dimensionality is low.
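The random-embedding idea that EmbeddedHunter builds on can be sketched as follows. This Python example is an illustration under assumptions, not the EmbeddedHunter algorithm. It draws one Gaussian matrix A, maps a low-dimensional point y into the decision space as x = A y and clips x to a box domain, so that an optimizer can search over y instead of x. The box [-1, 1]^D, the seed and the toy objective are hypothetical. EmbeddedHunter additionally combines a bounded number of such embeddings inside an optimistic partitioning tree, which is not shown.

```python
import random

def make_random_embedding(high_dim, low_dim, seed=0):
    """Return y -> clip(A y), with A a high_dim x low_dim matrix of
    standard Gaussian entries and clipping to the box [-1, 1]^high_dim."""
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(low_dim)] for _ in range(high_dim)]

    def embed(y):
        projected = (sum(a * yi for a, yi in zip(row, y)) for row in matrix)
        return [min(1.0, max(-1.0, x)) for x in projected]

    return embed

# Usage: a hypothetical 1000-dimensional objective that depends on only two
# coordinates (effective dimension 2), evaluated through a 2-D point y.
def objective(x):
    return (x[0] - 0.5) ** 2 + (x[1] + 0.3) ** 2

embed = make_random_embedding(high_dim=1000, low_dim=2)
print(objective(embed([0.2, -0.1])))
```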
Better Boosting with Bandits022DS3-151WednesdayNIKOLAOUNikolaosProbability estimates generated by boosting ensembles are poorly calibrated. The very reason that makes AdaBoost a successful classifier, namely its margin maximization property, is also responsible for its poor performance as a probability estimator, as it forces the ensemble to produce probability estimates that tend towards 0 or 1. Therefore, the outputs of the ensemble need to be properly calibrated before they can be used as probability estimates. In batch learning, calibration is achieved by reserving part of the training data for training the calibrator function. In an online setting, a decision needs to be made on each round: should the new example be used to update the parameters of the ensemble or those of the calibrator? In this work we resolve this decision with the aid of bandit optimization algorithms. We demonstrate superior performance to uncalibrated, naively-calibrated and cost-sensitive online boosting ensembles in probability estimation and cost-sensitive classification tasks.
Random Subspace with Trees for Feature Selection Under Memory Constraints024DS3-062WednesdaySUTERAAntonioDealing with datasets of very high dimension is a major challenge in machine learning. In our work, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing variables already identified as relevant by previous models with variables randomly selected among the others. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
Irreversible Markov chain Monte Carlo methods for Bayesian inference026DS3-167WednesdayMICHELManonBayesian inference of complex statistical models offers clear advantages, such as the possibility to separate the modelling assumptions from the inference process and to take uncertainty into account. Markov Chain Monte Carlo (MCMC) methods are powerful techniques for computing Bayesian estimates. However, these simulation schemes are often challenged by the multimodal and high-dimensional nature of the target distribution, resulting in poor exploration of the latter. Alternatives such as Hamiltonian Monte Carlo provide a more efficient framework but are hindered by their still-reversible scheme and by the tuning of several parameters.

Building on insightful irreversible Monte Carlo schemes developed in physics, we propose an original irreversible MCMC method: the Forward Event-Chain Monte Carlo. This method is rejection-free and free of parameter tuning, and it provides a continuum of valid samples. Moreover, numerical experiments demonstrate the efficiency of the proposed approach, with speed-ups of up to several orders of magnitude compared to state-of-the-art methods.
jointly presented with Stephane SENECAL (DS3-169)
Learning causal networks with latent variables from multivariate information in genomic data030DS3-189WednesdaySELLANadirLearning causal networks from large-scale genomic data remains difficult in the absence of time series or systematic perturbation data. We have developed and implemented an information theoretic method that circumvents the robustness and complexity issues of existing methods, in particular in the presence of latent variables. Starting from a complete graph, it iteratively removes dispensable edges by partitioning their mutual information into significant contributions from indirect paths, and orients the remaining edges based on the signature of causality in observational data. This information theoretic approach outperforms earlier methods on a broad range of benchmark networks with or without latent variables, and it is applied to reconstruct different causal networks from gene expression data in single cells, genomic alterations in tumors and co-evolving residues in protein structures. The methodology has been implemented in a web server and an R package.
Causal structure learning in stochastic chemical reaction networks from single-cell data032DS3-196WednesdayKLIMOVSKAIAAnnaApoptosis is a form of programmed cell death that plays an important role in the development of multicellular organisms. The inability of cancer cells to undergo apoptosis is one of the hallmarks of cancer. Improving our understanding of the molecular underpinnings of apoptosis and its pathological aberrations therefore plays an important role in the design of targeted treatment strategies. For example, fractional killing of cancer cells exposed to TNF-related apoptosis-inducing ligand (TRAIL) is such an aberration; it has been observed in several studies, but the mechanism regulating this effect is not yet understood (Spencer et al. 2009). Various modeling approaches have been proposed to investigate this phenomenon (Albeck et al. 2008; Hasenauer et al. 2011). However, apoptosis is a process with many components and non-trivial dynamics, which makes its modeling a particularly challenging task; in particular, to date no model has provided a satisfactory explanation of fractional killing (Bertaux et al. 2014).

Studying responses that affect only part of an apparently homogeneous population, as in the case of fractional killing, requires monitoring cellular properties and characteristics at the single-cell level. Recently developed deep, high-throughput technologies for single-cell measurements, such as mass cytometry, allow the monitoring of 30+ proteins at the single-cell level for up to millions of individual cells in a single experiment, and thereby enable us to learn detailed descriptions, i.e., models, of this process. However, mass cytometry is a destructive technique and therefore does not provide time series information, but rather disjoint snapshots of the temporal dynamics.

A variety of formal approaches have been proposed to understand the dynamic molecular interactions shaping biochemical processes. While probabilistic graphical models are frequently used to represent the statistical dependencies between molecular components (Friedman et al. 2000), these models do not contain information about causal dependencies in the data. Chemical reaction models do not suffer from this limitation. The chemical events of signaling cascades such as TRAIL-induced apoptosis (TIA) can be modeled by ordinary or stochastic differential equations, or by the Chemical Master Equation. They constitute a highly detailed and mechanistic description of biological processes and are qualitatively different from other popular network models in biology such as probabilistic graphical models. However, most approaches for mechanistic modeling assume a known topology of the reaction network, a situation that does not apply to TIA. State-of-the-art methods for learning the topology of mechanistic chemical reaction networks are limited to small systems (Sunnåker et al. 2014) with fewer than a dozen components, rendering TIA, with its 50+ relevant molecular components, inaccessible. We recently reported the reactionet lasso, a regression-based gradient matching approach that is capable of partial structure learning for systems of this size (Klimovskaia et al. 2016). We have assessed the structure learning capabilities of the reactionet lasso on synthetic data for systems of different sizes and complexity. For this assessment we assumed that all or most of the relevant molecular components can be measured. However, this approach cannot be applied to situations in which a large proportion of the system (latent species) cannot be observed in practice. Therefore it is not applicable for elucidating the mechanisms of fractional killing from the experimental data available to us.

Causal models and their inference techniques offer an attractive tradeoff between insights into causal relationships and the flexibility to address complications introduced by latent species. Ordinary or stochastic differential equations which formalize chemical dynamics can encode the parametric and structural form of causal interactions between the components. In many applications, e.g. fractional killing, knowing the mechanistic structure is not essential and the causal structure would be enough (Mooij et al. 2013; Rubenstein et al. 2016). Therefore, we want to investigate how various causal inference techniques (Peters et al. 2015; Rothenhäusler et al. 2015; Peters et al. 2012) might be used for learning causal interactions from snapshot data generated by dynamical systems.
Machine Learning in Astronomy034DS3-210WednesdayASCASOBegoñaMany of the problems faced in astronomy are not very different from those found in the world of Data Science. Astronomers often need to classify galaxies into different morphological types, predict the distance of a galaxy or its composition from indirect data, detect emergent structures, etc. I will present a summary of some of the Machine Learning and Bayesian Statistics techniques developed to exploit astronomical data and solve these problems.
The use of machine learning for the processing of myocardial perfusion MRI data036DS3-214WednesdaySCANNELLCianMyocardial perfusion MRI has been shown to possess huge potential for the diagnosis of coronary artery disease. Quantitative analysis of the data is desirable to reduce the time needed for a diagnosis and to make the process more accurate, reproducible and user-independent. The pre-processing of the data for quantification has long proven to be a bottleneck in the clinical adoption of the method. This work introduces an algorithm for motion correction using robust PCA and manifold learning. Automated anatomy detection is then explored using classification techniques such as an augmented k-means clustering and support vector machines.
Predictive models for chronic care management038DS3-229WednesdayAMADOU BOUBACARHabiboulayeChronic diseases are leading causes of diminished quality of life, rising hospital costs and mortality. Recent advances in Machine Learning have enabled substantial progress with attractive results in many domains. We describe a general framework for implementing predictive models using various healthcare data, including socio-demographics, tele-monitoring records (vital signs, symptom self-assessment) and hospital data (medical events, lab tests, ...). Despite the scientific challenges posed by missing data and rare events, our predictive approach shows promising results for identifying high-risk patients upstream, detecting deteriorations early and preventing costly hospitalizations. As future work, we plan to address the lack of clarity about the causal factors affecting chronic diseases.
Random Forest for Regression of a Censored Variable | 040 | DS3-248 | Wednesday | LE FAOU | Yohann | In the insurance broker market, the commissions received by brokers are closely related to the surrender of insurance contracts. To optimize a commercial process, the scoring of prospects should therefore take this surrender component into account. We propose a weighted Random Forest model to predict the surrender factor, which is part of the scoring. Our model handles censoring of the observations, a classical issue when working on surrender mechanisms. Through careful studies of real and simulated data, we compare our approach with other standard methods that apply in our setting, and show that it is very competitive in terms of quadratic error on this problem.
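The abstract does not say how the forest is weighted. A common way to make a regression forest aware of censoring is inverse-probability-of-censoring weighting (IPCW), which is assumed here: each uncensored observation i receives the weight δ_i / Ĝ(T_i−), where Ĝ is the Kaplan-Meier estimate of the censoring survival function, and censored observations receive weight 0. The function name `ipcw_weights` and the toy data are hypothetical; this is a sketch, not the authors' implementation.

```python
def ipcw_weights(times, events):
    """Inverse-probability-of-censoring weights.

    times  -- observed durations T_i = min(surrender time, censoring time)
    events -- 1 if the surrender was observed, 0 if the contract is censored
    Returns delta_i / G(T_i-) for each observation, where G is the Kaplan-Meier
    estimate of the censoring survival function P(C > t); censored rows get 0.
    """
    censor_times = sorted({t for t, e in zip(times, events) if e == 0})

    def g_before(t):
        # Product-limit estimate using only censoring times strictly before t.
        g = 1.0
        for s in censor_times:
            if s >= t:
                break
            at_risk = sum(1 for u in times if u >= s)
            censored = sum(1 for u, e in zip(times, events) if u == s and e == 0)
            g *= 1.0 - censored / at_risk
        return g

    return [e / g_before(t) if e else 0.0 for t, e in zip(times, events)]


# Four toy contracts; the second one is censored at time 2.
print(ipcw_weights([1, 2, 3, 4], [1, 0, 1, 1]))  # approximately [1.0, 0.0, 1.5, 1.5]
```

Censored contracts drop out, and uncensored contracts observed after a censoring event are up-weighted, so in this example the weights still sum to the sample size (4). Such weights could then be passed to a forest, for instance through the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`.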
Simulating reading with dyslexia for personalized intervention | 042 | DS3-255 | Wednesday | WOLF | Henry | Computational models of human language processing have been important tools for developing hypotheses about the cognitive processes that underlie language use. Models of reading have been used to hypothesize about the nature of language processing in the brain. Simulations based on these models have shown some success in mimicking the cognitive effects displayed by human participants in behavioral studies (e.g. Plaut, McClelland, Seidenberg, & Patterson, 1996; Harm & Seidenberg, 2004). Because they relied on hand-coded representations, these models were largely limited to modeling one language, generally English. However, learning to read and the related effects differ between languages (Hino, Kusunose, Lupker, & Jared, 2013). In this project, we use convolutional neural networks to generate representations from images of text in multiple writing systems. Hyperparameters are varied to mimic both typical reading and reading with dyslexia. Targeted language interventions can then be tested on models of readers with dyslexia before they are used with human children. Additionally, these simulations provide insights into how the brain is influenced by the writing systems themselves.
Learning Macromanagement in StarCraft from Replays and Self-Play using Deep Learning | 044 | DS3-279 | Wednesday | JUSTESEN | Niels | The real-time strategy game StarCraft has proven to be a challenging environment for artificial intelligence techniques, and as a result, current state-of-the-art solutions consist of numerous hand-crafted modules. This poster shows how macromanagement decisions in StarCraft can be learned directly from game replays using deep learning. Neural networks were trained on 789,571 state-action pairs extracted from 2,005 replays of highly skilled players, achieving top-1 and top-3 error rates of 54.6% and 22.9% in predicting the next build action. When the trained network is integrated into UAlbertaBot, an open-source StarCraft bot, the system can significantly outperform the game's built-in Terran bot and play competitively against UAlbertaBot with a fixed rush strategy. To our knowledge, this is the first time macromanagement tasks have been learned directly from replays in StarCraft. While the best hand-crafted strategies are still the state of the art, the deep network approach is able to express a wide range of different strategies, so improving the network's performance further with deep reinforcement learning is an immediately promising avenue for future research.
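The top-k error rates quoted above count a prediction as wrong when the build action actually taken is not among the network's k highest-scoring actions. A minimal sketch of the metric follows; the function name `top_k_error` and the scores are made up for illustration and are not taken from the poster.

```python
def top_k_error(scores, targets, k):
    """Fraction of samples whose true action is not among the k highest-scoring actions.

    scores  -- one list of scores per sample (one score per possible build action)
    targets -- index of the build action actually taken in each sample
    """
    misses = 0
    for row, target in zip(scores, targets):
        # Indices of the k actions with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda a: row[a], reverse=True)[:k]
        if target not in top_k:
            misses += 1
    return misses / len(targets)


# Toy scores over three possible actions for three game states.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
targets = [2, 0, 1]
print(top_k_error(scores, targets, k=1))  # 0.6666666666666666 (2 of 3 missed)
print(top_k_error(scores, targets, k=2))  # 0.0
```

Top-3 error can never exceed top-1 error, which is consistent with the 22.9% versus 54.6% reported above.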
Development of an Electric Grids' Generator using Machine Learning | 046 | DS3-282 | Wednesday | DELORO | Yonatan | The goal of my internship at EDF R&D is to develop a prototype able to generate a wide range of realistic electricity networks (topology, characteristics of cables and loads) that either satisfy desired input parameters (size, type of loads, ...) or match the structural properties of given network samples.

The poster will:
- discuss the challenges involved in generating information-rich, highly constrained low-voltage networks;
- feature some state-of-the-art methods for generating synthetic graph topologies from sample data and/or predicting their node attributes.
The Mutual Autoencoder: Controlling Information in Latent Code Representations | 048 | DS3-293 | Wednesday | BUI THI MAI | Phuong | Variational Autoencoders (VAEs) learn probabilistic latent variable models by optimizing a bound on the marginal likelihood of the observed data. Beyond providing a good density model, a VAE assigns a latent code to each data instance. In many applications, this latent code provides a useful high-level summary of the observation. However, the usefulness of the code is not enforced by the VAE objective. Instead, it emerges as a side effect and depends on modelling choices such as decoder expressivity, latent dimension, etc. In particular, the VAE may fail to learn a useful representation when the decoder family is very expressive: such decoders make the latent structure unnecessary for achieving high log-likelihood values, so the VAE learns to ignore it. We propose a method for explicitly controlling the amount of information stored in the latent code. We show that our method can learn models with latent codes ranging from independent to nearly deterministic, and that it is robust to the choice of decoder and latent dimension.
Shape Prior Generation using GAN for 3D-US Kidney Segmentation | 050 | DS3-295 | Wednesday | BERTRAND | Hadrien | Knowledge of an object's shape is a common tool in image segmentation for constraining and guiding the segmentation process. It is particularly common in medical imaging, as organ shapes are simple and well understood. We propose to construct a shape prior for kidneys using a Generative Adversarial Network. The segmentation task is split into two parts: first, a network transforms the image into the latent representation expected by the generator network; then, the generator takes this representation and constructs the segmentation. The two steps are carried out separately. We show preliminary results for the generator.
Semi-supervised learning method for inertial-centric indoor localisation based on Activities of Daily Living and Relative Signal Strength | 052 | DS3-312 | Wednesday | KOZLOWSKI | Michal | This poster outlines a method by which Activities of Daily Living (ADL) and Relative Signal Strength (RSS) signatures from sparsely labelled data help infer the position of a person inside a residential house. The context of the activity is taken into account when estimating gait and pose, whilst the RSS information helps to localise the individual. Fusing the two gives a picture of the person's location and activity at any given time. All of this is achieved with optimisation methods on a poorly labelled data set. The poster first concentrates on the data collection campaign, then outlines the methods used and the results obtained.
Tackling Partial Observability in Demand Response Using Reinforcement Learning With Long Short-Term Memory | 054 | DS3-328 | Wednesday | RUELENS | Frederik | A demand response agent must find a near-optimal sequence of decisions based on limited and imperfect sensor measurements of its environment. Extracting a relevant set of features from these raw sensor measurements is a challenging task and may require substantial domain knowledge. One way to tackle this problem is to store sequences of sensor measurements in the state vector, making it high-dimensional, and to apply techniques from deep learning. This work investigates how a Long Short-Term Memory (LSTM) network, a type of recurrent neural network, can be used to mitigate the curse of partial observability by capturing the long-term temporal dependencies in the state vector. Our simulations demonstrate that an LSTM network can be successfully used as a function approximator within a batch reinforcement learning algorithm to find a near-optimal control policy.
Dynamic Analysis of Investor’s Community Sentiment: A Hawkes-Process Framework | 056 | DS3-341 | Wednesday | LE NY | Yoann | We represent the membership of agents in a financial community as a self-exciting Hawkes process. Contrary to other studies that consider the financial community on social media as a static entity, our dynamic approach reduces the size of the community at each time period. We then extract the sentiment from this community. We show that this approach helps reduce the noise of the extracted sentiment signal and enhances its predictive power over financial market movements. JEL Classifications: G55; G14. Keywords: Sentiment, Hawkes-Process, Temporal-Graph, Twitter
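A Hawkes process is self-exciting: every event temporarily raises the rate of future events. The abstract does not give the kernel, so the sketch below assumes the common exponential one, λ(t) = μ + Σ_{t_i < t} α e^{−β(t − t_i)}, with baseline rate μ, jump size α and decay rate β, and simply evaluates this conditional intensity. The function name, parameter values and event times are illustrative, not the authors' code.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))
    mu    -- baseline rate of events
    alpha -- jump in intensity caused by each past event
    beta  -- rate at which that excitation decays
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return mu + excitation


# Hypothetical times at which agents joined the community.
joins = [1.0, 1.5, 4.0]
for t in (0.5, 2.0, 6.0):
    print(t, round(hawkes_intensity(t, joins, mu=0.2, alpha=0.8, beta=1.0), 4))
```

Before the first event the intensity equals the baseline μ; right after a burst of joins it is elevated, and it relaxes back towards μ. With this parametrization the process is stationary only when the branching ratio α/β is below 1 (0.8 here).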
Social-Network Analysis for Pain Medications: Influential physicians may not be high-volume prescribers | 058 | DS3-347 | Wednesday | CHOUDHURY | Abhinav | According to the Institute of Medicine of the National Academies, more than 100 million Americans suffer from chronic pain, more than the number affected by diabetes, heart disease, and cancer combined. Adoption of pain medications and safe healthcare practices is a major global policy concern. This adoption process is strongly influenced by the interpersonal network of physicians prescribing medications to treat pain. However, existing research into physician networks has been hospital-specific, applied to small numbers of physicians, and dependent on physicians’ self-reports. In this work, we use big data and data mining to overcome these limitations: using data from 30+ hospitals spanning 2,000+ physicians, we create a social network containing physicians’ prescription data and their adoption behavior for pain medications. The network assumes that connected physicians work in the same hospital and belong to the same specialty or specialty group. Then, using two centrality measures, degree and eigenvector centrality, we analyze prescription volumes and the proportion of adopters of pain medications. We also analyze gender effects. The results revealed that the most influential physicians were not the high-volume prescribers. Male physicians were more influential than female physicians; however, female physicians prescribed higher volumes than male physicians. Our results help identify critical physicians from certain core specialties and specialty groups who may be approached by patients seeking pain relief.
Neural networks for computing power flow in high voltage transmission lines | 060 | DS3-353 | Wednesday | DONNOT | Benjamin | Power flow computations (also called “load-flow”) are widely used by TSOs (Transmission System Operators) in charge of managing high voltage and very high voltage power grids. One of the critical constraints for TSOs is to maintain the security of power grid equipment; for instance, lines must not overheat. Current tools for simulating the effect of incidents (such as a tree falling on a line) on the power grid include load-flow simulators, which evaluate the steady state of the grid for given production and consumption values. The grid state estimated by load-flow calculations includes voltage magnitudes, reactive power values, currents flowing on lines, etc. Currently, for a grid the size of France's, 10,000 load flows are computed every 5 minutes, which is already very computationally demanding. This number will increase by several orders of magnitude in the years to come, to accommodate new power planning tools that increase network capacity without building new lines and to integrate renewable energies. We propose a new method, based on machine learning, to compute load flows efficiently, substituting conventional simulators based on differential equation solvers. Our system comprises deep feed-forward neural networks trained with load flows precomputed by simulation. Our architecture allows us to solve the so-called (n-1) problem (in which load flows are evaluated for every possible line disconnection) using a technique bearing similarity to “dropout”, which we named “guided dropout”. We achieve a 300x speedup (using state-of-the-art GPUs) over the proprietary load-flow simulator of RTE (Réseau de Transport d’Électricité, the sole French TSO) in preliminary simulations on power grids with up to 120 substations, with a relative average absolute error of less than 0.01.
The speedup increases with the size of the grid, and we are currently working on scaling up our computational tools to handle the full French power grid.
Non-parametric multi-task visual attribute ranking | 062 | DS3-378 | Wednesday | ALAMI MEJJATI | Youssef | “I want to see jackets which I think are stylish, but not too fancy”
Two very common ways to explore large collections of image items, for instance in online shopping, are to browse a hierarchy of items and to search with textual keywords. The returned results are typically ordered by popularity. However, popularity is defined across all users as one homogeneous attribute. Users cannot sort by their own subjective criteria, e.g. by their own personal style for clothes. Furthermore, there is no way to place items on a continuous scale where the degree to which each item meets the criterion is known, e.g. how ‘stylish’ a particular piece of clothing is to a user. Simply put, there is no easy way for users to explore imagery along their own subjective scales.

Our project aims to develop new techniques that enable users to organize and explore imagery data based on their own subjective criteria at a high semantic level. The crux of the problem is to understand how users could communicate their own criteria without having to know how those criteria might be formed or described at the data level. We aim to turn this knowledge into a new machine learning algorithm and criteria-definition interface, which will help users personally organize data in order to explore it easily.

The success of this application depends, among other things, on the user experience it offers. In other words, the application will not be ‘attractive’ if users have to spend hours providing preference labels for the algorithm to adapt its parameters. The key is hence for the algorithm to adapt from a fairly small amount of labelled data from each user. This is one of the reasons why we feel that multi-task learning (MTL) is well suited to this problem. Indeed, MTL takes advantage of task relatedness to perform well even with small amounts of labelled data per task. The other reason is that MTL works well when the tasks share the same feature space or live in a shared subspace. This is the case in our setting, where each user preference is treated as a task. Here it is easy to see how MTL could benefit from task relatedness: e.g., users may share the same notion of ‘stylishness’ across tasks.

This MTL formulation is, however, problematic, since existing methods exploit the dependency between tasks by enforcing ‘similarities’ among the ‘parameters’ of the corresponding predictors. This is not applicable in our situation, since we do not have access to the parametric form of those predictors (the users, in our case); we only know their predictions. One of our main motivations is hence to exploit the dependencies between the tasks through a non-parametric MTL formulation instantiated through (unlabelled) data. Our formulation can be seen as an instance of active semi-supervised learning: we use labelled preferences given by users, but on the other hand we only use predictors with unknown parameters to tune a new personalized predictor.

The impact of this project is potentially very large: it could change the way people organize imagery. Knowledge gained through this project will not only improve fundamental understanding in machine learning and human-computer interaction (HCI), but will also foster future research at their intersection: human-data interaction.
Compositional and multi-relational embeddings | 064 | DS3-395 | Wednesday | LACROIX | Timothée | A relational database is a set of facts (subject, relation, object) about the world. Representing every entity of such a dataset in a low-dimensional vector space would yield entity embeddings that serve as a convenient store of "common knowledge" about the world. To evaluate such an embedding model, a common task is to predict links missing from the original training set. For example, given the triples (Dave, brother, Simon) and (Simon, father, Sarah), one could infer the triple (Dave, uncle, Sarah). Recently, simple models have yielded good results for link prediction in subsets of knowledge bases. However, a theoretical understanding of the representational power of these models, their limits and their inner workings is still lacking. Much work has focused on the structural properties of the models used to represent these knowledge bases. We show that a properly regularized canonical low-rank decomposition achieves state-of-the-art results.
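In a canonical (CP) low-rank decomposition of the (subject, relation, object) tensor, each entity has a subject embedding and an object embedding, each relation has its own embedding, and a triple is scored with the trilinear product Σ_k s_k r_k o_k; link prediction then ranks every candidate object by this score. The sketch below uses tiny hand-picked vectors purely to show the scoring and ranking step. They are not learned, the regularizer the abstract emphasizes is not shown, and all names are hypothetical.

```python
def cp_score(subj_vec, rel_vec, obj_vec):
    """Trilinear CP score: sum over k of subj[k] * rel[k] * obj[k]."""
    return sum(a * b * c for a, b, c in zip(subj_vec, rel_vec, obj_vec))


def rank_objects(subject, relation, subj_emb, rel_emb, obj_emb):
    """Candidate objects for (subject, relation, ?), from most to least plausible."""
    scores = {
        obj: cp_score(subj_emb[subject], rel_emb[relation], vec)
        for obj, vec in obj_emb.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy rank-2 embeddings, hand-picked for illustration (not learned).
subj_emb = {"Dave": [1.0, 0.2], "Simon": [0.1, 1.0]}
rel_emb = {"uncle": [1.0, 0.0]}
obj_emb = {"Sarah": [0.9, 0.1], "Simon": [0.1, 0.8], "Dave": [0.0, 0.3]}

print(rank_objects("Dave", "uncle", subj_emb, rel_emb, obj_emb))  # ['Sarah', 'Simon', 'Dave']
```

Keeping separate subject and object embeddings is what distinguishes CP from symmetric variants that reuse one vector per entity; in practice the embeddings would be fitted by minimizing a ranking or likelihood loss plus a regularizer.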
Scalable Model-based Cascaded Imputation of Missing Data | 066 | DS3-413 | Wednesday | MONTIEL | Jacob | Missing data is a common trait of real-world data and can negatively impact interpretability. We present CIM, an effective and scalable technique for automatic imputation of missing data. CIM places few restrictions on the characteristics of the input data, supporting missing at random (MAR) and missing completely at random (MCAR) mechanisms, numerical and nominal data, and large data sets, including high-dimensional ones. We compare CIM against well-established imputation techniques over a variety of data sets under multiple test configurations to measure the impact of imputation on classification.
Semantics-based Localized Regularization for Interpretive Deep Learning | 068 | DS3-424 | Wednesday | XIE | Ning | Deep neural networks (DNNs) are well known for their high performance when learning from high-dimensional, large-scale data. However, DNNs are opaque learning algorithms, with a multitude of interdependent data transformations that make it difficult to interpret how a particular prediction or classification is made. To enable this interpretation, our research proposes a novel regularization method that encourages localized network activations from semantically related input features. After training with the proposed regularization, an ontology inference system attaches semantics to hidden-layer activations based on the semantics of the inputs they depend on. Current results in scene recognition tasks demonstrate that the localized regularization yields DNN models with an easier-to-explain structure, at a very modest cost in classification performance.
Using Reinforcement Learning for Demand Response of Domestic Hot Water Buffers: a Real-Life Demonstration | 070 | DS3-434 | Wednesday | DE SOMER | Oscar | This poster demonstrates a data-driven control approach for demand response in real-life residential buildings. The objective was to optimally schedule the heating cycles of the Domestic Hot Water (DHW) buffer to maximize self-consumption of the local photovoltaic (PV) production. A model-based reinforcement learning technique was used to tackle the underlying sequential decision-making problem. The proposed algorithm learns the stochastic occupant behavior, predicts the PV production and takes into account the dynamics of the system. A real-life experiment was performed with this algorithm in six residential buildings. The results show that self-consumption of the PV production increases significantly compared to the default thermostat control.
Learning to Generate Sub-problems in Mixed Integer Programming | 072 | DS3-439 | Wednesday | MOSSINA | Luca | This research addresses the resolution of recurrent combinatorial optimization problems by coupling machine learning techniques with branch-and-bound algorithms under a limited time budget. Assuming such recurrent problems are realizations of an unknown generative process, the results of previous resolutions are collected and used to train a classification model. When a new instance is to be solved, this model first selects a subset of decision variables to be set heuristically to reference values, turning them into fixed parameters. The remaining variables are left free and form a smaller sub-problem whose solution, while only an approximation of the optimal solution, can be obtained significantly faster. Subsequently, if some of the allocated time remains, an iterative process of blocking and unblocking variables takes place, allowing other areas of the solution space to be explored. This approach is of particular interest for problems where random perturbations of the instance parameters can occur unexpectedly, requiring rapid re-optimization of a complex model.
Graph sketching-based Massive Data Clustering | 074 | DS3-447 | Wednesday | MORVAN | Anne | In this work, we address the problem of recovering arbitrarily shaped data clusters from massive datasets. We present DBMSTClu, a new density-based non-parametric method working on a limited number of linear measurements, i.e. a sketched version of the similarity graph G between the N objects to cluster. Unlike k-means, k-medians or k-medoids algorithms, it does not fail to distinguish clusters with particular structures. Unlike DBSCAN or spectral clustering, it needs no input parameter. As a graph-based technique, DBMSTClu relies on the similarity graph G, which theoretically costs O(N^2) in memory. However, our algorithm follows the dynamic semi-streaming model: it handles G as a stream of edge weight updates and sketches it in one pass over the data into a compact structure requiring O(N polylog(N)) space. Thanks to the ability of the Minimum Spanning Tree (MST) to express the underlying structure of a graph, our algorithm detects the right number of non-convex clusters by recovering an approximate MST from the graph sketch of G. We provide theoretical guarantees on the quality of the clustering partition and demonstrate its advantage over the existing state of the art on several datasets.
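The intuition behind MST-based clustering can be shown with the classic, much simpler MST-cutting approach: build a minimum spanning tree over pairwise distances and delete its heaviest edges so that the remaining connected components become clusters. This is explicitly not DBMSTClu: the sketch below stores the exact dense graph (the O(N^2) cost that graph sketching avoids) and takes the number of clusters as an input, whereas DBMSTClu decides which edges to cut without such a parameter. The function name and data are hypothetical.

```python
import math


def mst_clusters(points, n_clusters):
    """Cluster points by cutting the heaviest edges of a Euclidean minimum spanning tree."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Kruskal's algorithm on the complete graph: all O(N^2) edges are materialized.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    mst = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            mst.append((weight, i, j))

    # Kruskal adds edges in increasing weight order, so keeping only the first
    # n - n_clusters MST edges removes the (n_clusters - 1) heaviest ones.
    parent[:] = range(n)
    for _, i, j in mst[: n - n_clusters]:
        parent[find(i)] = find(j)

    # Relabel connected components as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, n_clusters=2))  # [0, 0, 0, 1, 1, 1]
```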
Change-point detection in human behaviour with application to psychiatry | 076 | DS3-450 | Wednesday | MORENO MUNOZ | Pablo | Psychiatric patients with disorders such as schizophrenia or depression may suffer abrupt transitions in their behaviour. The appearance of these changes shows the need for ambulatory assessment in the presence of mental crises. We treat this as a change-point detection problem. Our data consist of location traces (latitude-longitude data points), physical activity metrics (number of steps, distance walked) and communication records (messages sent, number of calls). All the information is structured as multidimensional time series spanning one year. We explore Bayesian online models for change-point detection, which allow us to characterise each patient's behaviour precisely and to monitor it in real time. Due to the complexity of the data and the need to accumulate sufficient evidence for reliable detections, we include latent variable models that reduce dimensionality and promote the emergence of change-points. The results provide new insights into the detection of anomalous behaviour in mental health patients as well as the accurate prediction of their states.
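Bayesian online change-point detection in the style of Adams and MacKay keeps a posterior over the "run length" (the number of steps since the last change) and updates it as each observation arrives. The abstract does not specify the model, so this sketch assumes a one-dimensional Gaussian signal with known noise variance, a conjugate Gaussian prior on its mean and a constant hazard rate; the multivariate behavioural data and latent variable models described above would need a richer observation model. All names and parameter values are illustrative.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_map_run_lengths(data, hazard=0.01, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Bayesian online change-point detection, Gaussian model with known noise variance.

    Returns, for each time step, the most probable run length (steps since the last change).
    """
    run_probs = [1.0]        # P(run length = r | data so far)
    means = [prior_mean]     # posterior mean of the signal level for each run length
    variances = [prior_var]  # posterior variance of that level
    map_run_lengths = []
    for x in data:
        # Predictive density of x under each current run length.
        pred = [normal_pdf(x, m, v + noise_var) for m, v in zip(means, variances)]
        joint = [p * q for p, q in zip(run_probs, pred)]
        change = hazard * sum(joint)                  # a change resets the run length to 0
        growth = [(1.0 - hazard) * j for j in joint]  # otherwise every run grows by one
        new_probs = [change] + growth
        total = sum(new_probs)
        run_probs = [p / total for p in new_probs]
        # Conjugate update of each run's level with x; a fresh run restarts from the prior.
        new_means, new_vars = [prior_mean], [prior_var]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / noise_var)
            new_means.append(post_var * (m / v + x / noise_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars
        map_run_lengths.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_run_lengths


# Synthetic signal whose level jumps from 0 to 10 after 40 steps.
signal = [0.0] * 40 + [10.0] * 20
runs = bocpd_map_run_lengths(signal)
print(runs[38:43])  # [39, 40, 1, 2, 3]
```

The most probable run length grows steadily, then drops back to 1 as soon as the jump is observed, which is how a change is flagged online without waiting for the end of the series.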
Knowledge Transfer From Text Data for Improved Unsupervised Word Segmentation | 078 | DS3-462 | Wednesday | BÖNNINGHOFF | Benedikt | Natural language, spoken or written, makes it possible to share knowledge and exchange information between communication partners. For humans, it is a simple task to break down a spoken sentence into semantic units in order to follow the thoughts of a communication partner. But it is a challenging task to build a technical system that can automatically transform a continuous acoustic signal into a discrete sequence of words. This work deals with the problem of discovering linguistic structures from raw audio signals without using any linguistic expertise a priori. We propose a system consisting of three successive stages. First, an acoustic unit discovery (AUD) module based on a Dirichlet process mixture model learns phoneme-like categories by clustering raw audio signals. Second, an acoustic unit-to-letter (A2L) converter maps acoustic units onto letters, providing a stochastic evaluation. In the third stage, word discovery (WD) based on a nested hierarchical Pitman-Yor process is performed as an iterative procedure alternating between word segmentation and language model training. While the AUD system and the WD module are fully unsupervised, training the A2L converter requires labeled data. To keep the a-priori knowledge small, we train the model utterance-wise without any information about word boundaries. In addition, we optionally use unrelated word-based text data to initialize the language model of the WD component.

The evaluation is performed on the Wall Street Journal corpus and on a Xitsonga dataset; Xitsonga is spoken largely in the Limpopo province of the Republic of South Africa. Simulation results without language model initialization show that incorporating the A2L conversion significantly improves word segmentation compared with applying the acoustic units directly to the WD module. Experiments with language model initialization further show that a small amount of unrelated text data considerably improves WD performance.
Learning with feature side-information | 080 | DS3-479 | Wednesday | MAOLAAISHA | Aminanmu | Very often, features come with their own vectorial descriptions that provide detailed information about their properties. We refer to these vectorial descriptions as feature side-information. In the standard learning scenario, the input is represented as a vector of features, and the feature side-information is most often ignored or used only for feature selection prior to model fitting. We believe that feature side-information, which carries information about the intrinsic properties of the features, can improve model prediction if used properly during the learning process. In this work, we propose a framework for incorporating feature side-information into the learning of very general model families to improve prediction performance. We constrain the structure of the learned models so that it reflects feature similarities as defined by the side-information. Experiments on a number of benchmark datasets show significant gains in predictive performance over several baselines as a result of exploiting the side-information.
Block GMCA | 082 | DS3-482 | Wednesday | KERVAZO | Christophe | Blind Source Separation (BSS) is a powerful method for analyzing multichannel data in fields that involve processing large-scale data (e.g. astrophysical data, spectroscopic data in medicine and nuclear physics, etc.). Standard methods, however, fail to tackle BSS problems correctly when the number of sources becomes large, especially when the number of available samples is low. Moreover, they become computationally expensive. Building upon two standard BSS algorithms, namely GMCA (Generalized Morphological Component Analysis) and PALM (Proximal Alternating Linearized Minimization), we investigate the performance of block-coordinate optimization strategies for tackling sparse BSS problems in the large-scale regime. The results reveal that the proposed approach, the block-GMCA algorithm, significantly improves performance in terms of both computation time and separation quality, thanks to the use of blocks.
Personalization and tagging of the flow of IT system and applicative logs | 084 | DS3-484 | Wednesday | LOGETTE | Philippe | The application gathers individual feedback on log relevance, as well as one or several tags that characterize each log item. From this feedback, each log line is then scored for relevance to the user and may be assigned several of the user's tags. The user can then filter new log lines by estimated interest and category. The system builds a bag-of-words representation, scores new logs with supervised classification algorithms, and performs multi-label classification for tag prediction. In addition, the initial feedback phase is accelerated by an unsupervised clustering algorithm that presents the user with logs that are as diverse as possible. Future work may include log vectorization that goes beyond bag of words, the use of recommender systems in a reduced space of the logs, the modeling of sequences of log lines, and the detection of rare or abnormal logs.

This application will enable relevance filtering and classification of new logs, alerting users as early as possible to actions that need to be taken, so that incidents can be avoided and the overall quality of the IT estate improved. Indeed, information and weak signals are present in the logs but are not always exploited, owing to the limitations of traditional approaches that require monitoring through pattern definition. Such patterns are laborious to set up initially and to maintain over time as systems and applications evolve, especially given the diversity of logs, whose formats are more or less standardized.
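The abstract does not name the classifier used for relevance scoring. As a hedged illustration of scoring bag-of-words log lines from user feedback, the sketch below uses multinomial naive Bayes with add-one smoothing, a standard baseline for short texts; the class name, the tokenization and the toy log lines are all hypothetical, and multi-label tagging is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(line):
    """Lower-cased token counts for one log line (tokens are runs of word characters)."""
    return Counter(re.findall(r"\w+", line.lower()))


class NaiveBayesRelevance:
    """Multinomial naive Bayes with Laplace (add-one) smoothing over bag-of-words log lines."""

    def fit(self, lines, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for line, label in zip(lines, labels):
            self.word_counts[label].update(bag_of_words(line))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_score(self, line, c):
        # log P(c) + sum over tokens of count * log P(token | c)
        total_words = sum(self.word_counts[c].values())
        n_lines = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_lines)
        for word, count in bag_of_words(line).items():
            p = (self.word_counts[c][word] + 1) / (total_words + len(self.vocab))
            score += count * math.log(p)
        return score

    def predict(self, line):
        return max(self.classes, key=lambda c: self.log_score(line, c))


# Toy feedback: log lines labelled by a user as relevant or not (made up for illustration).
lines = [
    "ERROR disk full on /var",
    "ERROR connection refused by db",
    "INFO user login ok",
    "INFO heartbeat ok",
]
labels = ["relevant", "relevant", "ignore", "ignore"]
model = NaiveBayesRelevance().fit(lines, labels)
print(model.predict("ERROR disk quota exceeded"))  # relevant
```

Because each user's feedback trains a separate small model, a lightweight, fast-to-retrain classifier like this suits the cold-start phase; richer log vectorizations could replace `bag_of_words` without changing the rest.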
Learning Representations for Multidimensional Intransitivity | 086 | DS3-488 | Wednesday | DUAN | Jiuding | Intransitivity is a critical issue in pairwise preference modeling. It refers to intransitive pairwise preferences among a group of players or objects that potentially form a cyclic preference chain, and it has long been discussed in social choice theory in the context of dominance relationships. However, such multifaceted intransitivity between players, and the corresponding high-dimensional player representations, are difficult to capture. We propose a probabilistic model that jointly learns a d-dimensional representation (d > 1) for each player and a dataset-specific metric space that systematically captures the distance metric in the embedding space. Interestingly, by imposing additional constraints on the metric space, our model reduces to earlier models used in intransitive representation learning. Moreover, we present an extensive quantitative investigation of how widespread intransitive relationships between objects are in various real-world benchmark datasets. To the best of our knowledge, this is the first investigation of its kind. On various real-world datasets, including social choice, election, and online game datasets, our method outperforms several competing methods in prediction accuracy.
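A single scalar strength per player always induces a transitive order, so it cannot represent a cycle such as rock-paper-scissors; a representation with d ≥ 2 dimensions can. As one illustration, which is an assumption and not the authors' model, give each player a vector u_i and set P(i beats j) = σ(u_iᵀ A u_j) with an antisymmetric matrix A, which guarantees P(i beats j) + P(j beats i) = 1. The 2-D embeddings below are hand-picked rather than learned, and all names are hypothetical.

```python
import math

# Antisymmetric 2x2 matrix: u^T A v = u[0]*v[1] - u[1]*v[0] = -(v^T A u).
A = [[0.0, 1.0], [-1.0, 0.0]]


def bilinear(u, M, v):
    return sum(u[i] * M[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def win_prob(u, v, M=A):
    """P(player with embedding u beats player with embedding v) = sigmoid(u^T M v)."""
    return 1.0 / (1.0 + math.exp(-bilinear(u, M, v)))


def on_circle(degrees):
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))


# Hand-picked 2-D embeddings placed 120 degrees apart on the unit circle.
players = {"rock": on_circle(0), "scissors": on_circle(120), "paper": on_circle(240)}

for a, b in [("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")]:
    print(f"P({a} beats {b}) = {win_prob(players[a], players[b]):.3f}")  # each 0.704
```

Each player beats the next one with the same probability, producing the cycle that no one-dimensional ranking can express. Learning the vectors (and, in a more general model, the matrix) from match outcomes would replace the hand-picked values.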
Spatial smoothing of semantic labelings of 3D LiDAR points with structured optimization | 088 | DS3-493 | Wednesday | LANDRIEU | Loic | We propose a structured optimization framework for obtaining spatially smooth semantic labelings of 3D LiDAR point clouds. In particular, we show how a fitting choice of fidelity function, regularizer and solving algorithm allows us to efficiently obtain smooth, high-precision labelings. Furthermore, the probabilistic nature of the labeling can be retained, making it possible to measure the certainty of each assignment, which is lost when using traditional MAP inference in CRFs.
Applied algorithms for detecting ghost writing in high school assignments | 090 | DS3-504 | Wednesday | LORENZEN | Stephan | Plagiarism and fraud in written assignments have long been a problem at every level of education. Lately, however, we have seen an increase in cheating on the large final written project (known as Studieretningsprojekt or SRP) in Danish high schools. Students hire, e.g., university students or teachers to write their paper for them. This kind of fraud, called ghost writing, is not simple copy-paste plagiarism and thus requires smarter detection methods. We investigate and apply machine learning techniques to verify the authorship of written assignments. Experiments are run on data from Danish high schools.
Structure Learning of Sparse Undirected Graphical Models for Count Data | 092 | DS3-524 | Wednesday | NGUYEN | Thi Kim Hue | In this work, we introduce a new algorithm for structure learning of Poisson undirected graphical models, called PC-LPGM. Specifically, we assume that the conditional distribution of each node given its neighbours follows a Poisson distribution. Some authors have proposed neighbourhood selection to recover the underlying structure. In this approach, the neighbourhood of each node is estimated in turn by solving a lasso-penalized regression problem, and the resulting local structures are stitched together to form the global graph. Nevertheless, models of increasing dimension require more delicate analysis; in particular, simply regressing one fixed variable on all other variables might not lead to accurate inference. We propose to employ the approach used in the PC-algorithm, coupled with a limit on the number of variables in the conditioning sets. PC-LPGM is appealing because it inherits the ability of the PC-algorithm to estimate a sparse graph even when the number of variables is in the hundreds or thousands. We provide theoretical guarantees and simulation results for both low- and high-dimensional scenarios.
Features that distinguish languages: Insights from deep neural networks094DS3-527WednesdayMONTONicholasIt has recently been shown that convolutional neural networks (CNNs) can determine which language a presented input comes from with high accuracy (>90%). This high performance suggests that successful CNNs capitalize on invariances that are cross-linguistically unique. The goal of this study is to take a closer look at the activation maps of successful CNNs and examine whether the activation patterns reflect behaviorally distinct speech sound tokens that people may use to identify different languages.
Unsupervised deep object discovery for instance recognition096DS3-533WednesdaySIMÉONIOrianeSevere background clutter is challenging to handle in many computer vision tasks, including image retrieval. Local or regional descriptors combined with partial matching are an attractive solution. Yet focusing only on the relevant regions is essential to control memory, search complexity and, most importantly, performance in the presence of distractors. We perform salient region detection in an unsupervised way that captures common and discriminative structures in the dataset. We thereby improve particular object retrieval with or without query bounding box annotations, especially on a large-scale dataset containing small objects.
Mining Business Process Activities from Email Logs098DS3-542WednesdayAL JLAILATYDianaDue to its wide use in personal and, most importantly, professional contexts, email is a valuable source of information that can be harvested to understand, reengineer and repurpose the undocumented business processes of companies and institutions. Few researchers have investigated the problem of extracting and analyzing the process-oriented information contained in emails. In this work, we advance in this direction by proposing a new method to discover business process activities from email logs. To this end, emails are first grouped according to the process model they belong to. The emails of each process model are then sub-grouped and labeled by business activity type. Both tasks are performed with an unsupervised mining technique combined with semantic similarity measures. Two representative similarity measures are examined: Latent Semantic Analysis (LSA) and Word2vec. Our comparison shows that Word2vec performs better than LSA both in grouping emails by the process model they relate to and in discovering emails that belong to the same activity type. Detailed experimental results illustrate the contributions of our approach.
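For readers unfamiliar with LSA, here is a minimal sketch of how it turns emails into vectors whose cosine similarity can feed a grouping step: TF-IDF weighting followed by a truncated SVD. The toy emails, the whitespace tokenizer and the function name are assumptions for illustration; the clustering technique used in the poster and the Word2vec alternative are not shown.

```python
import numpy as np

def lsa_embed(docs, k=2):
    """TF-IDF weighting, truncated SVD, then L2-normalised k-dim document vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for r, d in enumerate(docs):
        for w in d.lower().split():
            counts[r, index[w]] += 1
    idf = np.log(len(docs) / (counts > 0).sum(axis=0))
    u, s, _ = np.linalg.svd(counts * idf, full_matrices=False)
    emb = u[:, :k] * s[:k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)

emails = [
    "please approve attached invoice payment",
    "invoice payment approval needed",
    "schedule project meeting for monday",
    "project meeting moved to monday",
]
emb = lsa_embed(emails, k=2)
similarity = emb @ emb.T  # cosine similarities between emails in the latent space
```

The two invoice emails end up close to each other and nearly orthogonal to the two meeting emails.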
Variational Autoencoders Endowed with Richer but Still Computationally Efficient Statistical Models100DS3-550WednesdayPEŞTEAlexandraVariational Autoencoders have become one of the most powerful tools for approximate inference in Deep Learning. In this poster we present some promising modifications to the standard algorithms proposed by Kingma et al. (2014) and Rezende et al. (2014), based on the use of richer statistical models for the distributions of both the latent and the observed variables. In particular, for the latent variables, we discuss the advantages and disadvantages of using a low-rank update of a diagonal covariance matrix, and the Cholesky factorization of a full matrix, compared to using a diagonal one. For the conditional distribution of the observations given the latent variables, we propose statistical models able to capture pairwise correlations between adjacent pixels in an image, while keeping the computational complexity sub-quadratic in the number of variables. We evaluate our approach on several standard data sets and compare the results with the state of the art in the literature.
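The low-rank-plus-diagonal option for the latent covariance, Σ = diag(d) + U U^T with U of rank r, can be sketched with standard matrix identities: sampling by reparameterization with two independent noise vectors, and evaluating the log-density through the matrix determinant lemma and the Woodbury identity, at a cost of O(n r^2) instead of O(n^3). This is a generic sketch, not necessarily the parametrization used in the poster, and the function names are hypothetical.

```python
import numpy as np

def sample_lowrank_gaussian(mu, d, U, rng):
    """Reparameterized sample from N(mu, diag(d) + U U^T)."""
    eps_diag = rng.standard_normal(mu.shape)
    eps_low = rng.standard_normal(U.shape[1])
    return mu + np.sqrt(d) * eps_diag + U @ eps_low

def logpdf_lowrank_gaussian(z, mu, d, U):
    """log N(z; mu, diag(d) + U U^T) in O(n r^2) via the matrix determinant
    lemma and the Woodbury identity."""
    r = z - mu
    Dinv_U = U / d[:, None]
    cap = np.eye(U.shape[1]) + U.T @ Dinv_U  # r x r "capacitance" matrix
    logdet = np.sum(np.log(d)) + np.linalg.slogdet(cap)[1]
    Dinv_r = r / d
    proj = U.T @ Dinv_r
    quad = r @ Dinv_r - proj @ np.linalg.solve(cap, proj)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
n, k = 6, 2
mu = rng.standard_normal(n)
d = rng.uniform(0.5, 1.5, n)
U = rng.standard_normal((n, k))
z = sample_lowrank_gaussian(mu, d, U, rng)
log_q = logpdf_lowrank_gaussian(z, mu, d, U)
```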
Riemannian Methods for the Training of Neural Networks: An Overview and Experimental Comparison102DS3-558WednesdayNICOLAETitusThe use of non-Euclidean gradients for training neural networks goes back to the seminal work of Amari (1998) on the natural gradient, and it has recently been revisited by several authors in the context of Deep Learning. Similarly, the possibility of adopting Riemannian optimization methods on the space of the weights of a neural network has attracted the attention of researchers working on manifold optimization. In the first part of the poster we review, from a unifying perspective, several approaches based on non-Euclidean geometries for neural networks, including both probabilistic models and the modeling of the parameter space with manifold structures. We compare different algorithms through a detailed experimental analysis over multiple datasets and network topologies, and discuss the advantages and trade-offs of using Riemannian gradients for training deep neural networks, compared to standard Euclidean methods. In particular, for different optimization algorithms, we evaluate how approximating the metric tensor in the computation of non-Euclidean gradients, which is often required in high dimensions, affects convergence and the quality of the optimum. In the second part of the poster we present some novel approaches that could lead to efficient implementations of Riemannian methods in deep learning.
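A natural-gradient step preconditions the Euclidean gradient with the inverse Fisher information matrix. For logistic regression the Fisher matrix has a closed form, so the following sketch can compute it exactly; the small damping term is an assumption added for numerical stability, and the function names are hypothetical. For this model the Fisher matrix coincides with the Hessian of the average log-loss, so with `lr=1` the step is a damped Newton step. In deep networks the Fisher matrix must instead be approximated, which is the trade-off the poster studies.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_and_grad(w, X, y):
    """Average logistic loss, its gradient, and the predicted probabilities."""
    p = sigmoid(X @ w)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return loss, X.T @ (p - y) / len(y), p

def natural_gradient_step(w, X, y, lr=1.0, damping=1e-4):
    """w <- w - lr * (F + damping * I)^{-1} grad, with F the exact Fisher matrix."""
    _, grad, p = loss_and_grad(w, X, y)
    fisher = (X * (p * (1 - p))[:, None]).T @ X / len(y)
    return w - lr * np.linalg.solve(fisher + damping * np.eye(len(w)), grad)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
y = (rng.uniform(size=500) < sigmoid(X @ np.array([1.5, -2.0]))).astype(float)
w = np.zeros(2)
for _ in range(25):
    w = natural_gradient_step(w, X, y)
final_loss, final_grad, _ = loss_and_grad(w, X, y)
```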
On Real Time Hyperparameter Optimization104DS3-563WednesdayFRANCESCHILucaThe gradient of a validation error with respect to real-valued hyperparameters can be computed with two different procedures (reverse-mode and forward-mode), which have different trade-offs in terms of running time and space requirements. The forward-mode procedure is suitable for real-time hyperparameter updates (RTHO), which significantly speed up hyperparameter optimization (HO) on large models such as deep neural networks. The algorithm, however, requires choosing a descent procedure for the hyperparameters. This constitutes a hyper-hyperparameter whose optimal value may depend on the data and/or the model. We present possible strategies to increase the adaptiveness of RTHO and show applications of this novel HO procedure in different scenarios.
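Forward-mode hypergradients can be illustrated on a toy problem: ridge regression trained by gradient descent, with the L2 penalty `lam` as the single hyperparameter. The tangent Z_t = dw_t/dlam is propagated alongside the weights, so the hypergradient of the validation error is available at every step, which is what makes real-time updates possible. The model, step sizes and function names are assumptions for illustration; `beta` in `rtho` is the hyper-learning rate, i.e. the kind of hyper-hyperparameter the abstract refers to.

```python
import numpy as np

def forward_hypergradient(Xtr, ytr, Xval, yval, lam, eta=0.1, T=100):
    """Validation error after T gradient steps on the ridge training loss,
    and its derivative w.r.t. lam computed in forward mode."""
    n, p = Xtr.shape
    A, b = Xtr.T @ Xtr / n, Xtr.T @ ytr / n
    w = np.zeros(p)
    Z = np.zeros(p)  # tangent Z_t = d w_t / d lam
    for _ in range(T):
        Z = Z - eta * (A @ Z + lam * Z + w)  # differentiate the update (uses w_t)
        w = w - eta * (A @ w - b + lam * w)
    r = Xval @ w - yval
    val_err = r @ r / (2 * len(yval))
    return val_err, (Xval.T @ r / len(yval)) @ Z

def rtho(Xtr, ytr, Xval, yval, lam=1.0, eta=0.1, beta=0.5, T=200):
    """Real-time variant: lam is updated at every inner step from the current
    tangent, so the tangent is only an approximation once lam starts moving."""
    n, p = Xtr.shape
    A, b = Xtr.T @ Xtr / n, Xtr.T @ ytr / n
    w, Z = np.zeros(p), np.zeros(p)
    for _ in range(T):
        Z = Z - eta * (A @ Z + lam * Z + w)
        w = w - eta * (A @ w - b + lam * w)
        r = Xval @ w - yval
        hypergrad = (Xval.T @ r / len(yval)) @ Z
        lam = max(lam - beta * hypergrad, 0.0)  # beta: the hyper-learning rate
    return w, lam

rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
Xtr = rng.standard_normal((40, 5))
ytr = Xtr @ w_true + 0.5 * rng.standard_normal(40)
Xval = rng.standard_normal((40, 5))
yval = Xval @ w_true + 0.5 * rng.standard_normal(40)
err, hg = forward_hypergradient(Xtr, ytr, Xval, yval, lam=0.5)
w_fit, lam_fit = rtho(Xtr, ytr, Xval, yval)
```

Because `lam` changes inside `rtho`, its tangent is no longer an exact derivative of any fixed training run; `forward_hypergradient` is the exact version for a fixed `lam`.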
Random Recursive Tree Ensembles: A high energy physics application106DS3-638WednesdayLALCHANDVidhiThe aim of this work is to propose a meta-algorithm for automatic classification with two discrete classes. At a fundamental level, supervised classification can be defined as the ability to extract rules that discriminate one class from the other. This is done on the basis of training data whose class membership is known, with the ultimate objective of classifying new data whose class membership is unknown. Classifier learning in the presence of overlapping class distributions is a challenging problem in machine learning. Overlapping classes are characterized by ambiguous regions of the feature space with a high density of points from both classes. This often occurs in real-world datasets; one example is numeric data describing properties of particle decays, derived from high-energy physics experiments such as LHCb at CERN. A significant body of research targeting the class overlap problem uses ensemble classifiers to boost the performance of standard algorithms, either by applying them iteratively in multiple stages or by training multiple copies of the same model on different subsets of the input training data. The former is called boosting and the latter bagging. The algorithm proposed in this work targets a popular and challenging classification problem in high energy physics: improving the statistical significance of the Higgs discovery. The underlying dataset used to train the algorithm is built from the official ATLAS full-detector simulation, with Higgs events (signal) mixed with different background events (background) that closely mimic the statistical properties of the signal, generating class overlap. The proposed algorithm is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics.
The algorithm uses a unified framework that combines two meta-learning techniques, bagging and boosting. The results show that this combination only works when a randomization trick is applied in the base learners. The performance of the algorithm is mainly assessed with a physics-inspired significance metric called the Approximate Median Significance (AMS), expressed in units of σ. We also show how the algorithm fares compared to the leading machine learning solutions proposed for this dataset.
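Assuming the dataset is the one released for the HiggsML challenge, the AMS used there is AMS = sqrt(2((s + b + b_reg) ln(1 + s/(b + b_reg)) - s)), where s and b are the weighted sums of signal and background events selected by the classifier and b_reg = 10 is a regularization term. The sketch below computes it; the helper names are hypothetical. For s much smaller than b it reduces to the familiar s/sqrt(b + b_reg).

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate Median Significance with regularisation term b_reg.
    s, b: weighted sums of signal and background events in the selection region."""
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    bb = b + b_reg
    # max(0, .) guards against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, 2.0 * ((s + bb) * math.log1p(s / bb) - s)))

def ams_of_selection(labels, weights, selected, b_reg=10.0):
    """labels: 1 = signal, 0 = background; selected: the classifier says 'signal'."""
    s = sum(w for y, w, sel in zip(labels, weights, selected) if sel and y == 1)
    b = sum(w for y, w, sel in zip(labels, weights, selected) if sel and y == 0)
    return ams(s, b, b_reg)
```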
Clustering financial time series: algorithms, distances, stability and convergence rates108DS3-027WednesdayMARTIGautierFor clustering methods to be useful in online risk and trading systems, they have to be both robust to noise (which is partly achieved by leveraging copulas) and fast converging. Fast convergence of the clustering structures (flat or hierarchical) to the true underlying clusters is required to mitigate the non-stationarity effects of financial multivariate time series. If the methods require long time series to converge, then the underlying economic regime and its associated clustering structure may have changed several times in the meantime. In such a case, the cluster dynamics are smoothed out and less useful for online risk and trading systems. At the heart of clustering algorithms is the fundamental notion of distance, which can be defined based upon a proper representation of the data. Copula-based dependence coefficients allow a better modelling of (non-linear) financial time series dependence than simplistic correlation measures such as the Pearson or Spearman ones. However, we may also ask what impact these novel correlation coefficients have on the convergence rate of the whole clustering methodology: do they speed it up or slow it down? We benchmark the empirical convergence rates of several state-of-the-art dependence-based clustering methods. Baseline results are obtained using a straightforward approach based on Pearson’s ρ, Kendall’s τ and Spearman’s ρ_S correlation coefficients.
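The baseline pipeline can be sketched as: compute a correlation coefficient between every pair of series, map it to a distance, and run hierarchical clustering. Below, the distance d = sqrt((1 - ρ)/2) and single linkage (implemented with a union-find over sorted pairs) are assumptions for illustration, Kendall's τ is the tau-a variant without tie correction, and the copula-based coefficients from the poster are not shown. Measuring a convergence rate would mean repeating this on windows of increasing length and comparing against known clusters. All names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(x):
    """1-based ranks, ties receive their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau-a (no tie correction)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))

def single_linkage(series, corr, k):
    """Cut a single-linkage hierarchy into k clusters, distance sqrt((1 - rho) / 2)."""
    n = len(series)
    dist = {(i, j): math.sqrt(max(0.0, (1 - corr(series[i], series[j])) / 2))
            for i, j in combinations(range(n), 2)}
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    groups = n
    for (i, j), _ in sorted(dist.items(), key=lambda kv: kv[1]):
        if groups == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            groups -= 1
    return [find(i) for i in range(n)]

series = [
    [1, 2, 3, 4, 5, 6],
    [1, 8, 27, 64, 125, 216],  # monotone transform of series 0
    [6, 5, 4, 3, 2, 1],
    [216, 125, 64, 27, 8, 1],  # monotone transform of series 2
]
labels = single_linkage(series, spearman, k=2)
```

Spearman's and Kendall's coefficients equal 1 for any strictly increasing transform of a series, while Pearson's does not, which is why rank-based measures are insensitive to monotone distortions of the data.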
Low-rank Interaction Contingency Tables110DS3-161WednesdayROBINGenevièveLog-linear models are popular tools for analyzing contingency tables, particularly for modeling row and column effects as well as row-column interactions in two-way tables. We introduce a regularized log-linear model designed for denoising and visualizing count data, which can incorporate side information such as row and column features. The estimation is performed through a convex optimization problem where we minimize a negative Poisson log-likelihood penalized by the nuclear norm of the interaction matrix. We derive an upper bound on the Frobenius estimation error, which improves previous rates for Poisson matrix recovery, and an algorithm based on the alternating direction method of multipliers to compute our estimator. To provide users with a complete methodology, we also address automatic selection of the regularization parameter. A Monte Carlo simulation reveals that our estimator is particularly well suited to estimating the rank of the interaction in low signal-to-noise ratio regimes.
We illustrate with two data analyses that the results can be easily interpreted through biplot visualization. The method is available as R code.
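The core building block of nuclear-norm penalized estimators is singular value thresholding (SVT), the proximal operator of the nuclear norm; ADMM solvers like the one mentioned above typically use it as a subproblem. The sketch below swaps in a simpler proximal-gradient loop and makes further simplifications that are assumptions, not the poster's method: row and column effects are frozen at their independence-model values, side information is ignored, the step size is a heuristic without line search, the table is assumed to have no empty rows or columns, and `lam` is set by hand rather than selected automatically. Function names are hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def fit_lowrank_interaction(Y, lam, n_iter=300):
    """Proximal gradient on sum(exp(X) - Y * X) + lam * ||Theta||_*, with
    X = offset + Theta and row/column effects frozen at the independence model."""
    Y = np.asarray(Y, dtype=float)
    offset = np.log(np.outer(Y.sum(axis=1), Y.sum(axis=0)) / Y.sum())
    theta = np.zeros_like(Y)
    for _ in range(n_iter):
        mu = np.exp(offset + theta)
        step = 1.0 / max(mu.max(), Y.max())  # heuristic step size, no line search
        theta = svt(theta - step * (mu - Y), step * lam)
    return offset, theta

Y = np.array([[20, 5, 3], [4, 18, 6], [2, 7, 15], [10, 10, 10]])
offset, theta = fit_lowrank_interaction(Y, lam=5.0)
```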
Decision trees optimization for ultrasound detection of fetal abnormalities112DS3-241WednesdayBESSONRémiIn this work we investigate the problem of learning good diagnostic policies for the ultrasound search for fetal abnormalities. We start by learning our environment via Bayesian methods such as a maximum entropy approach, and then formulate our problem as a Markov Decision Process. Reinforcement learning methods and ideas from shortest-path algorithms in graphs, such as A* and AO*, are adapted to find good diagnostic policies.
Fast Incremental Stochastic Version of the EM algorithm114DS3-256WednesdayKARIMIBelhalA wide class of statistical problems involves observed and unobserved data. Examples include inverse problems such as deconvolution, source separation and change-point detection. Linear and nonlinear mixed effects models can also be considered as incomplete-data models. Estimating the parameters of these models is challenging; in particular, the likelihood of the observations usually cannot be maximized in closed form. The EM algorithm proposed by Dempster, Laird and Rubin has given rise to many variants for cases in which the conditional expectation of the complete-data log-likelihood is intractable. The MCEM (Meng, 1993) and the SAEM (Delyon, 1999) are two of them.

Following the efforts of Neal, Hinton and Gunawardana to justify an incremental variant of the EM algorithm, we focus on the Incremental EM, MCEM and SAEM algorithms for continuous random variables.
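The incremental scheme of Neal and Hinton can be sketched on a one-dimensional Gaussian mixture: each observation keeps its own contribution to the sufficient statistics, one data point at a time is refreshed, the global statistics are updated by the difference, and an M-step follows immediately. The mixture model is chosen only because its E-step is exact; its latent labels are discrete, whereas the poster targets continuous latent variables, where the E-step would itself require simulation as in MCEM or SAEM. Function names are hypothetical.

```python
import math
import random

def responsibilities(x, means, variances, weights):
    """Posterior component probabilities for one point (log-sum-exp for stability)."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for m, v, w in zip(means, variances, weights)]
    top = max(logs)
    ex = [math.exp(lp - top) for lp in logs]
    total = sum(ex)
    return [e / total for e in ex]

def m_step(S, n):
    """Closed-form M-step from global sufficient statistics (sum r, sum r*x, sum r*x^2)."""
    means, variances, weights = [], [], []
    for r, rx, rxx in S:
        m = rx / r
        means.append(m)
        variances.append(max(rxx / r - m * m, 1e-6))
        weights.append(r / n)
    return means, variances, weights

def incremental_em(xs, means, variances, weights, n_epochs=10, seed=0):
    n = len(xs)
    # Initial full E-step: store each point's responsibilities and the global sums.
    resp = [responsibilities(x, means, variances, weights) for x in xs]
    S = [[sum(r[k] for r in resp),
          sum(r[k] * x for r, x in zip(resp, xs)),
          sum(r[k] * x * x for r, x in zip(resp, xs))] for k in range(len(means))]
    means, variances, weights = m_step(S, n)
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(n_epochs):
        rng.shuffle(order)
        for i in order:
            x = xs[i]
            new = responsibilities(x, means, variances, weights)
            for k in range(len(new)):
                d = new[k] - resp[i][k]  # replace this point's old contribution
                S[k][0] += d
                S[k][1] += d * x
                S[k][2] += d * x * x
            resp[i] = new
            means, variances, weights = m_step(S, n)  # M-step after every point
    return means, variances, weights

data_rng = random.Random(1)
xs = [data_rng.gauss(-3.0, 1.0) for _ in range(300)] + [data_rng.gauss(3.0, 1.0) for _ in range(300)]
means, variances, weights = incremental_em(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```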
Online learning and Blackwell approachability with partial monitoring: Optimal convergence rates116DS3-579WednesdayKWONJoonBlackwell approachability is an online learning setup that generalizes the classical problem of regret minimization, allowing, for instance, multi-criteria optimization, global (online) optimization of a convex loss, or online linear optimization under a cumulative constraint. We consider partial monitoring, where the decision maker does not necessarily observe the outcomes of his decisions (unlike in the traditional regret/bandit literature). Instead, he receives a random signal correlated with the decision–outcome pair, or only with the outcome. We construct, for the first time, approachability algorithms with a convergence rate of order O(T^(-1/2)) when the signal is independent of the decision, and of order O(T^(-1/3)) in the case of general signals. These rates are optimal in the sense that they cannot be improved without further assumptions on the structure of the objectives and/or the signals. (joint work with Vianney Perchet)
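Blackwell approachability can be illustrated in its classical full-monitoring form: regret matching (Hart and Mas-Colell) plays each action with probability proportional to the positive part of its cumulative regret, which can be derived from Blackwell's approachability theorem applied to the negative orthant, and achieves an average regret of order O(T^(-1/2)). This sketch assumes the outcome is observed after every round; the partial-monitoring algorithms of the poster, which only see a signal, are different and are not shown. Function names are hypothetical.

```python
import random

def regret_matching(payoff, outcomes, seed=0):
    """Full-monitoring regret matching. payoff[a][b] is the reward of action a
    against outcome b. Returns average external regret and action frequencies."""
    n = len(payoff)
    rng = random.Random(seed)
    cum_regret = [0.0] * n
    counts = [0] * n
    for b in outcomes:
        positive = [max(r, 0.0) for r in cum_regret]
        z = sum(positive)
        probs = [p / z for p in positive] if z > 0 else [1.0 / n] * n
        a = rng.choices(range(n), weights=probs)[0]
        counts[a] += 1
        for k in range(n):
            # regret of not having played k instead of a on this round
            cum_regret[k] += payoff[k][b] - payoff[a][b]
    T = len(outcomes)
    return max(cum_regret) / T, [c / T for c in counts]

# Rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors) against an opponent who always plays rock.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
avg_regret, freqs = regret_matching(RPS, [0] * 1000)
```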
Learning to Solve Hydro Unit Commitment Problems in France118DS3-637WednesdayIOMMAZZOGabrieleIt is well known that electricity cannot be stored easily and that one of the most efficient methods for storing it is to turn it into potential energy by pumping water up mountain valleys into water basins. However, a system of interconnected valleys raises several issues when storing and re-using this energy. Specifically, scheduling the turbine/pump units in valleys is known as the Hydro Unit Commitment (HUC) problem. In France, hydro plants are the largest source of renewable energy, and HUC involves solving hundreds of difficult Mixed-Integer Nonlinear Programs every day. Even when they are approximated by Mixed-Integer Linear Programs (MILPs), they pose formidable challenges. Currently, MILP solution technology cannot even find a feasible solution to these MILPs. One feature of current solvers, namely their extensive configurability, is usually not exploited to the fullest. We believe that supervised machine learning techniques can be used as part of an approach that, upon receiving a new instance, recommends a good solver configuration for the numerical and structural properties of that instance. Such an approach will need to be structured into two main phases. During the first phase, a learning model (predictor) that can evaluate the performance of an (instance, configuration) pair will be trained. During the second phase, the structure of the predictor will be exploited to build an optimization model that can find an optimal configuration. Given the structure and characteristics of the HUC problem, we believe that applying this idea specifically to it could have a strong impact, both methodologically and practically, on finding locally optimal solutions to difficult HUC instances.
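The two-phase approach can be sketched on synthetic data. Phase 1 fits a linear predictor of a performance score (lower is better, for example a log solve time) on features of (instance, configuration) pairs; phase 2 picks a configuration for a new instance. Here phase 2 is a brute-force enumeration over a small discrete configuration space, standing in for the optimization model built from the predictor's structure, and the feature map, score and names are all hypothetical.

```python
import itertools
import numpy as np

def pair_features(inst, conf):
    """Hypothetical feature map for an (instance, configuration) pair:
    bias, instance features, configuration parameters and their pairwise products."""
    inst, conf = np.asarray(inst, float), np.asarray(conf, float)
    return np.concatenate([[1.0], inst, conf, np.outer(inst, conf).ravel()])

def train_predictor(instances, configs, perf):
    """Phase 1: least-squares fit of a performance score (lower is better)."""
    X = np.array([pair_features(i, c) for i, c in zip(instances, configs)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(perf, float), rcond=None)
    return coef

def recommend(coef, inst, candidates):
    """Phase 2 (stand-in): enumerate candidate configurations, keep the best prediction."""
    scores = [pair_features(inst, c) @ coef for c in candidates]
    return candidates[int(np.argmin(scores))]

# Synthetic training data with two instance features and two binary solver switches.
rng = np.random.default_rng(0)
instances = rng.uniform(0, 1, size=(60, 2))
configs = rng.integers(0, 2, size=(60, 2))
perf = 5 + instances[:, 0] + 2 * configs[:, 0] * (instances[:, 0] - 0.5) - configs[:, 1]
coef = train_predictor(instances, configs, perf)
candidates = list(itertools.product([0, 1], repeat=2))
```

Enumeration is only viable for a handful of discrete parameters; with many parameters, exploiting the predictor's structure inside an optimization model, as the abstract proposes, becomes necessary.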