**POSTERS**

We are happy to announce a best poster award of 500 EUR; the winner will be selected based on the votes of the participants.

Title | PosterID | ID | Day | Last Name | First Name | Abstract | Comment | |
---|---|---|---|---|---|---|---|---|

Interactions and collider bias in gene-environment case-only data | 001 | DS3-010 | Tuesday | BALAZARD | Félix | Background: Genetic risk estimation can quantify part of an individual's predisposition to a disease. The identification of environmental factors is more challenging. Collider bias appears between two causes (e.g., genes and environment) when conditioning on a shared consequence (the collider, here the disease). Methods: We introduce Disease As Collider (DAC), a new methodology to validate environmental factors using genetic risk in cases. Here we consider disease as a collider between genetic and environmental factors. Under reasonable assumptions, the association between genetic risk and environment, studied in cases only, provides a signature of an environmental risk factor. Simulating disease occurrence in a source population allows us to estimate the statistical power of DAC as a function of disease prevalence, the predictive accuracy of genetic risk and sample size. We illustrate DAC in 831 type 1 diabetes (T1D) patients. Results: The power of DAC increases with sample size, prevalence and the accuracy of genetic risk estimation. For a prevalence of 1% and realistic genetic risk estimation, a power of 80% is reached for a sample size under 3000. Power was low in our case study because the prevalence of T1D in children is low (0.2%). Conclusions: DAC could provide a new line of evidence for environmental factors of complex diseases. We discuss the circumstances needed for DAC to contribute to the triangulation of environmental causes of disease. We highlight the link with the case-only design for gene-environment interaction. | Download | |

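The collider mechanism behind DAC (poster 001) can be illustrated with the following minimal Python sketch. It rests on our own simplifying assumptions and is not the authors' simulation. Genetic risk `G` and exposure `E` are independent standard normals in a hypothetical source population. Disease follows a liability-threshold model with hypothetical effect sizes `beta_g` and `beta_e`, and `prevalence` sets the threshold. Restricting the analysis to cases should induce a negative G-E correlation only when the exposure truly raises risk.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def case_only_correlation(n_pop, prevalence, beta_g, beta_e, seed=0):
    """Simulate a source population and return corr(G, E) among cases only.

    Assumed (hypothetical) model: liability = beta_g*G + beta_e*E + noise, with
    G and E independent N(0, 1); the top `prevalence` fraction are the cases.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n_pop)]
    e = [rng.gauss(0, 1) for _ in range(n_pop)]
    liability = [beta_g * gi + beta_e * ei + rng.gauss(0, 1) for gi, ei in zip(g, e)]
    threshold = sorted(liability)[int((1 - prevalence) * n_pop)]
    cases = [(gi, ei) for gi, ei, li in zip(g, e, liability) if li >= threshold]
    g_cases = [gi for gi, _ in cases]
    e_cases = [ei for _, ei in cases]
    return pearson(g_cases, e_cases)


if __name__ == "__main__":
    # Exposure is a risk factor: conditioning on disease (the collider)
    # makes G and E negatively correlated among cases.
    print(case_only_correlation(20000, 0.05, beta_g=1.0, beta_e=1.0))
    # Exposure has no effect: no G-E association among cases.
    print(case_only_correlation(20000, 0.05, beta_g=1.0, beta_e=0.0))
```
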
Deep learning for emotion recognition | 003 | DS3-016 | Tuesday | ETIENNE | Caroline | Recent progress in cognitive science enables new types of human-machine interaction. We can now ask questions to our smartphones or computers, and soon we will be able to ask our cars to drive us to a destination of our choice. Nevertheless, machines' understanding of human emotions can still be improved. The task is all the more difficult when the nature of the interaction does not give access to all the channels through which humans express their emotions. The aim of this work is to improve state-of-the-art emotion classification in speech using deep learning algorithms. | Download | |

Particle Swarm Optimization for algorithmic trading | 005 | DS3-041 | Tuesday | BENHAMOU | Eric | Automated trading systems make decisions on how to invest in financial markets. More precisely, these algorithms decide when to trade (timing), in which direction (long or short), on which market (underlying), sometimes with a predetermined level of risk (stop-loss level) and reward (profit target), and in which quantity. These decisions depend on a variety of parameters that must be optimized to maximize returns and overall profits while minimizing risk. In this research, we investigate the use of various optimization algorithms, from simple gradient descent to more heuristic techniques such as particle swarm optimization, and, based on our experience, provide some hints on which method works best. | Download | |

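Poster 005 compares optimizers for strategy parameters. As a hedged illustration of particle swarm optimization itself, here is a minimal global-best PSO in Python. The inertia weight `w` and the acceleration constants `c1` and `c2` are common textbook-style values, not values from the poster. The quadratic objective is a hypothetical stand-in for a backtested trading loss over two parameters, such as stop-loss and profit-target levels.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box using global-best particle swarm optimisation.

    bounds: list of (low, high) pairs, one per parameter.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    i_best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[i_best][:], pbest_val[i_best]   # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]                          # inertia
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # pull towards own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # pull towards swarm best
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical stand-in for a backtested loss over two strategy parameters.
    loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    print(pso(loss, [(-5.0, 5.0), (-5.0, 5.0)]))   # best position near (1, -2)
```
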
Multimodal Popularity Prediction of Brand-related Social Media Posts | 007 | DS3-078 | Tuesday | MAZLOOM | Masoud | Brand-related user posts on social networks are growing at a staggering rate, with users expressing their opinions about brands by sharing multimodal posts. However, while some posts become popular, others are ignored. In this work, we present an approach for identifying which aspects of posts determine their popularity. We hypothesize that brand-related posts may be popular due to several cues related to factual information, sentiment, vividness and entertainment parameters about the brand. We call this set of cues engagement parameters. In our approach, we propose to use these parameters to predict brand-related user post popularity. Experiments on a collection of fast-food brand-related user posts crawled from Instagram show that: visual and textual features are complementary in predicting the popularity of a post; predicting popularity using our proposed engagement parameters is more accurate than predicting it directly from visual and textual features; and our proposed approach makes it possible to understand what drives post popularity in general as well as to isolate brand-specific drivers. | Download | |

Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing | 009 | DS3-085 | Tuesday | LE | Minh | Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this work, we apply reinforcement learning to greedy dependency parsing, which is known to suffer from error propagation. Reinforcement learning improves the accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high-performance greedy parser, while maintaining its efficiency. We investigate the proportion of errors that result from error propagation and confirm that reinforcement learning reduces the occurrence of error propagation. | Download | |

RandNLA for GLMs with Big Datasets | 011 | DS3-093 | Tuesday | LANGE | Robert | This research project explores the potential of randomized algorithms for tall data analysis. In particular, I focus on the implementation of fast approximations to the statistical leverage scores and on the asymptotic properties of the resulting estimators. These so-called algorithmic leveraging algorithms can reduce computation time by effectively reducing the dimensionality of the underlying normal equations problem. I illustrate my results in a generalized linear model (GLM) setting where the number of observations (n) is much larger than the number of features (d). Important questions to answer include the following: How do static fast estimator approximations generalize to iterative estimation procedures such as iteratively weighted least squares (IWLS)? What are the statistical properties of the resulting estimator (e.g., variance and robustness)? How does the correlation structure affect the design of an optimal sampling scheme? I provide answers and further questions by means of Monte Carlo experiments and by establishing concentration bounds for the resulting estimator. | Download | |

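Algorithmic leveraging can be sketched in a few lines of NumPy for the least-squares problem that each IWLS step in poster 011 solves. This is a hedged illustration, not the poster's method. It computes exact leverage scores from a thin QR decomposition, which costs as much as solving the full problem; the poster's point is to approximate these scores quickly. It then samples `r` rows with probabilities proportional to leverage and solves the rescaled subproblem. The function names and the sample size `r` are our own choices.

```python
import numpy as np


def leverage_scores(X):
    """Exact statistical leverage scores of an n x d design matrix X (n >> d).

    With a thin QR decomposition X = QR, the leverage of row i is the squared
    norm of row i of Q, i.e. the i-th diagonal entry of the hat matrix.
    """
    Q, _ = np.linalg.qr(X)          # default 'reduced' mode: Q is n x d
    return np.sum(Q ** 2, axis=1)   # scores sum to rank(X) = d


def leveraged_ols(X, y, r, seed=0):
    """Approximate the OLS solution from r rows sampled by leverage."""
    rng = np.random.default_rng(seed)
    h = leverage_scores(X)
    p = h / h.sum()                                   # sampling probabilities
    idx = rng.choice(X.shape[0], size=r, replace=True, p=p)
    w = 1.0 / np.sqrt(r * p[idx])                     # rescale each sampled row
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 5000, 5
    X = rng.normal(size=(n, d))
    beta_true = np.arange(1.0, d + 1)
    y = X @ beta_true + 0.1 * rng.normal(size=n)
    print(leveraged_ols(X, y, r=500))   # close to [1, 2, 3, 4, 5]
```
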
Structured dropout: a generalization of the dropout technique | 013 | DS3-100 | Tuesday | KHALFAOUI | Beyrem | Dropout has been proposed as a technique to prevent overfitting when training neural networks. We propose a generalization of dropout that takes into account prior knowledge about the data (available, for example, in computational biology). We show that this can enhance dropout's performance on some benchmarks and on real data. | ||

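Poster 013 does not spell out its generalization, so the following is only one hypothetical way prior knowledge could enter dropout. Each feature is assigned a group label, for example genes belonging to the same pathway (an assumption on our part). A single Bernoulli draw then drops or keeps each whole group. With one group per feature, this reduces to standard inverted dropout.

```python
import random


def group_dropout(x, groups, p, rng):
    """Inverted dropout applied jointly to groups of features (hypothetical variant).

    x: feature values; groups: one group label per feature; p: drop probability.
    Kept features are scaled by 1 / (1 - p) so the expected value is unchanged.
    """
    # One coin flip per group; dict.fromkeys keeps first-appearance order, so the
    # draws are reproducible for a seeded rng.
    keep = {g: rng.random() >= p for g in dict.fromkeys(groups)}
    scale = 1.0 / (1.0 - p)
    return [xi * scale if keep[g] else 0.0 for xi, g in zip(x, groups)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(group_dropout(x, ["a", "a", "b", "b", "c", "c"], 0.5, rng))  # whole groups zeroed
    print(group_dropout(x, [0, 1, 2, 3, 4, 5], 0.5, rng))              # standard dropout
```
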
A Markov Random Field Model for Entity-Relationship Retrieval | 015 | DS3-120 | Tuesday | SALEIRO | Pedro | This work is concerned with effective retrieval of entity relationships from large corpora of unstructured texts. We consider entities of any type, i.e., characterized by context terms instead of a predefined category, and retrieve entity tuples based on specified relationships. Recent approaches to ad-hoc entity retrieval have demonstrated that using Markov Random Field (MRF) models to incorporate term dependencies can improve search performance. This suggests that MRFs could be used to model dependencies among entities and facilitate relationship retrieval over unstructured texts. We therefore create an Entity-Relationship Dependency Model (ERDM) and an index of entity and relationship context vectors that allow us to implement several retrieval methods. Experiments with a large Web collection (ClueWeb-09-B) and 267 relationship queries show that ERDM consistently outperforms other relevant baseline methods, including language models. | Download | |

Acquiring Human-Robot Interaction Skills with Transfer Learning Techniques | 017 | DS3-129 | Tuesday | MOHAMMED | Omar | Human-robot interaction (HRI) is the study of the relation between humans and robots, and of how to enable robots to communicate more effectively with humans. One of the challenges of HRI is to build multimodal behavioral models, involving coordination between input and output modalities such as speech, facial expression, gaze, head movement, hand gesture, etc. Several machine learning models for HRI have been developed over the years, but one key limitation of these models is that they are task-specific and perform poorly once the task changes slightly. In order to transfer knowledge learnt in one task to a new one, we introduce the idea of ‘skills’: the elementary building units of interactions that represent a wide range of HRI situations, similar to what strokes are for letters, or phonemes for words. In this poster, we show our preliminary results in extracting those skills for non-interactive multimodal tasks using deep neural networks. | Download | |

Long-term forecasting despite Data Shortages | 019 | DS3-134 | Tuesday | ZULUAGA | Maria Alejandra | Forecasting plays a critical role in the travel industry. Revenue management, flight price tracking and compensation estimation systems are among the many applications that require accurate forecasts of future behavior inferred from past observations. A common requirement of these systems is to make accurate long-term forecasts based on a stochastic model of the available data, which are often limited. Such a scenario is challenging for any prediction algorithm. In this study, we benchmark methods ranging from classic statistical approaches to state-of-the-art Long Short-Term Memory (LSTM) networks on the task of long-term price prediction with limited training data. The ultimate goal of this study is to establish the scenario that best suits each method and to determine empirical limits on the training data requirements and the forecast horizon of each method. | Download | |

Video Object Segmentation Using Adversarial Networks and Mathematical Morphology | 021 | DS3-147 | Tuesday | FEHRI | Amin | Adversarial training has been shown to produce state-of-the-art results for generative image modeling. In this work, we propose an adversarial training approach to train video object segmentation models. We train a fully convolutional segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network, in order to detect and correct higher-order inconsistencies between ground-truth segmentation maps and those produced by the segmentation network. Mathematical morphology filters are also applied as a post-processing step to further enhance the results. Preliminary results on several examples are presented to illustrate the relevance of this approach. | Download | |

Computational Deconvolution of Mixed Signals in Tumor Microenvironment Using Independent Component Analysis | 023 | DS3-153 | Tuesday | CZERWINSKA | Urszula | Some biological systems are characterized by high complexity. This is the case of the tumor microenvironment (TME), which includes distinct cell types that critically impact tumor development and response to treatment. Genetic information from the microenvironment, represented by the number of transcripts, is a complex mixture that can be described by a linear model: $AX = B$, where $B$ is the data matrix of one biological sample, $X$ contains the mixing proportions and $A$ is the matrix of gene expression in each cell type. Several methods have been proposed to estimate $X$, such as least squares regression (Abbas et al., 2009) and, more recently, non-negative least squares regression (Qiao et al., 2012), quadratic programming (Gong et al., 2011; Zhong, Wan, Pang, Chow, & Liu, 2013) and support vector regression (Newman et al., 2015). However, all these methods are quite prone to overfitting and are potentially sensitive to molecular noise. They also depend on pre-established 'ground truth' signatures of cell types, whereas highly specific signatures may not exist in reality; cell types could instead be characterized and differentiated by a weighted vector of expression. In our work, we propose to apply an unsupervised method that decomposes the mixture into independent sources based solely on the data structure, without any prior knowledge. We apply Independent Component Analysis (ICA) (Hyvärinen, Karhunen, & Oja, 2001) to solve this blind source separation problem. In the ICA formulation, the data are approximated as $X \approx AS$, where $X$ now denotes the $m \times n$ data matrix, $A$ is an $m \times k$ matrix with $k \ll m$, and $S$ is a $k \times n$ matrix. The columns of $A$ can be called components ($m$-dimensional vectors), and the columns of $S$ are the projections of the data vectors onto the components (a $k$-dimensional vector for each of the $n$ data points) (Zinovyev et al., 2013). Results: In our strategy, we apply ICA iteratively to separate signals at increasingly higher resolution and thereby obtain signals for immune cells or, ideally, immune cell subtypes. By applying the ICA algorithm to bulk tumor data of breast carcinoma, we isolated meaningful groups of cell types. However, the validation framework is still under development. To address the 'ground truth' problem, we are developing an upsampling method based on Generative Adversarial Networks (unsupervised deep learning) to extend existing datasets while preserving their correlation structure, through simulation of two-dimensional non-Gaussian distributions. Our upsampling design starts from single-cell data, each representing one source, which we then mix to approximate real bulk tumor data; these mixtures will serve as a testing and validation framework for ICA-based deconvolution. Perspectives: If successful, the project will provide important insights into the complex organization of the immune component of the TME, which can be used directly in the diagnosis and treatment of cancer, especially in cancer immunotherapy. At the methodological level, novel methods for signal deconvolution will be developed and implemented that can be applied transversally in other domains with similar problems. Also, the obtained interaction network would lead to a more detailed deterministic mathematical model of cell-cell communication between immune-related cells in the TME, thus identifying novel drug targets. | Download | |

Multitask Learning for Twitter Sentiment Analysis | 025 | DS3-160 | Tuesday | MOURA | Simon | We propose a general formal framework for machine learning problems involving multiple interdependent and heterogeneous tasks and explain how and why it is relevant in numerous applications. We also provide a new public dataset that fits the multi-target learning framework, together with baselines for this dataset. We believe that this real-world dataset will contribute to research in the domain of multitask learning. | Download | |

Automatic Dynamic Correlation Template Tracking of Inner Lips Based on CLNFs | 027 | DS3-168 | Tuesday | LIU | Li | In this work, we propose a novel automatic approach to extract the inner lip contour of speakers without using any artifice. This method is based on a recent facial contour extraction model developed in computer vision, called Constrained Local Neural Field (CLNF), which provides 8 characteristic points (landmarks) defining the inner lip contour. However, when applied directly to our visual data, which include Cued Speech (CS) data, CLNF failed in about 50% of cases. We propose a Modified CLNF that estimates the inner lip contour based on the original CLNF landmarks. This new model explores a dynamic template using the first derivative of the smoothed luminance variation. The method gives a precise estimate of inner lip aperture. It is evaluated on 4800 images of three French speakers. The proposed method corrects 95% of CLNF errors and reaches a total RMSE of one pixel (i.e., 0.05 cm on average), compared with four pixels for the original CLNF. | Download | |

Exploration-Exploitation in MDPs with Options | 029 | DS3-177 | Tuesday | FRUIT | Ronan | The option framework [Sutton et al., 1999] is a simple yet powerful model to introduce temporally-extended actions and hierarchies in reinforcement learning [Sutton and Barto, 1998]. An important feature of this framework is that Markov decision process (MDP) planning and learning algorithms can be easily extended to accommodate options, thus obtaining algorithms such as option value iteration and Q-learning [Sutton et al., 1999], LSTD [Sorg and Singh, 2010], and actor-critic [Bacon and Precup, 2015]. While options may significantly improve the performance w.r.t. learning with primitive actions, a theoretical understanding of their actual impact on the learning performance is still fairly limited. Notable exceptions are the sample complexity analysis of approximate value iteration with options [Mann and Mannor, 2014] and the PAC-MDP analysis by Brunskill and Li [2014]. In this work, we derive the first regret analysis of learning with options. Relying on the fact that using options in an MDP induces a semi-Markov decision process (SMDP), we first introduce a variant of the UCRL algorithm [Jaksch et al., 2010] for SMDPs and upper-bound its regret. While this result is of independent interest for learning in SMDPs, its most interesting aspect is that it can be translated into a regret bound for learning with options in MDPs, and it provides a first understanding of the conditions sufficient for a set of options to reduce the regret w.r.t. learning with primitive actions. | Download | |

Do Convolutional Networks need to be Deep for Text Understanding? | 031 | DS3-191 | Tuesday | LE | Thien Hoa | Convolutional networks have become ubiquitous in many image classification tasks because they reach state-of-the-art performance when they are very deep. The same effect has been observed in speech recognition, but is it always the case for text classification? Many results argue against this assumption. In this presentation, we provide the first empirical demonstration in support of these results. As a direct consequence, this motivates further study of deep network structures for text and their application to many NLP tasks. | Download | |

Unsupervised Outlier Detection in High-Dimensional Data Streams | 033 | DS3-203 | Tuesday | FOUCHÉ | Edouard | Outlier detection aims to reveal unusual patterns in data. Typical scenarios in predictive maintenance are the identification of failures, sensor malfunctions or intrusions. This is a challenging task, especially when the data is high-dimensional, because outliers become “hidden” and are visible only in particular subspaces. Also, predictive maintenance data is often available as a stream. By nature, data streams are infinite; they evolve over time and can be aggregated at multiple time scales. Furthermore, in real-time applications, assumptions about the nature of future, unknown anomalies are unrealistic, so the problem should be considered unsupervised. Most existing methods for outlier detection are supervised and apply only to static or to low-dimensional data, so this problem remains largely unaddressed. In this poster, we introduce a novel anytime algorithm for unsupervised outlier detection in high-dimensional data streams. (This is a work in progress.) | not made available (presenter's request) | |

Beyond the MOOC Environment: Enriching Learner Models through the Social Web | 035 | DS3-213 | Tuesday | CHEN | Guanliang | Large-scale learning analytics is commonly based on data traces learners generate within a Massive Open Online Course (MOOC) platform such as edX during the running of a MOOC. As MOOCs typically last between five and ten weeks and many learners are rather passive consumers of the offered learning activities, this exclusive use of MOOC platform data traces severely limits the insights we can gain about our learners. This lack of data leads to coarse-grained learner profiles which in turn limit our ability to provide adaptive and personalized online learning experiences. The social Web (where platforms such as Twitter and LinkedIn have hundreds of millions of users) potentially offers a rich source of data to supplement the MOOC platform data traces, as many learners are likely to be active on one or more social Web platforms. This poster aims to demonstrate the benefits of profiling learners by looking beyond the MOOC platform, including 1) gathering more user attributes (e.g., demographics) that are relevant to learning to construct a more accurate and complete learner model and 2) predicting both in-course and after-course behavior with high accuracy. | Download | |

Identifying change points for linear mixed models: A solution through Evolutionary Algorithms | 037 | DS3-222 | Tuesday | GARCIA CRUZ | Ehidy Karime | The change point problem arises in many applied situations and has been studied by several authors, with approaches ranging from change points in piecewise regression using classical techniques to change point estimation in linear mixed models (LMMs) using a dynamic programming algorithm. The objective of this proposal is to estimate each subject-specific change point using evolutionary algorithms (EAs) when the data come from a longitudinal setting, with linear mixed models as the modeling framework. The results will be shown through a simulation study, varying specific conditions on the parameters of the LMM and the number of subjects included in the study. Additionally, we illustrate the approach with a real problem on the drying of Cypress wood slats, in which this methodology is useful to predict the drying time associated with a specific slat thickness. This is done as a generalization of the calibration problem: once the change points have been obtained through the EA, a calibration curve can be fitted to these change points according to their thickness, which allows us to predict the specific change point. Keywords: Change Point; Evolutionary Algorithms; Linear Mixed Models; Calibration Function; Parallel Programming | Download | |

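The evolutionary search in poster 037 can be illustrated, under strong simplifications, for a single subject. For a given change point tau, a broken-stick regression y = b0 + b1*t + b2*max(t - tau, 0) is fitted by least squares. A simple elitist (mu + lambda) evolutionary algorithm with Gaussian mutation then searches for the tau that minimizes the residual sum of squares. This is a hedged sketch, not the poster's method: the random subject effects of the linear mixed model are left out, and the population size, number of generations and mutation schedule are our own illustrative choices.

```python
import numpy as np


def rss_at(t, y, tau):
    """Residual sum of squares of the broken-stick fit with change point tau."""
    X = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def ea_change_point(t, y, pop_size=20, n_gen=40, seed=0):
    """Estimate tau with an elitist (mu + lambda) evolutionary algorithm."""
    rng = np.random.default_rng(seed)
    lo, hi = float(t.min()), float(t.max())
    sigma = (hi - lo) / 10.0                          # initial mutation step
    pop = rng.uniform(lo, hi, size=pop_size)          # candidate change points
    for _ in range(n_gen):
        children = np.clip(pop + rng.normal(0.0, sigma, size=pop_size), lo, hi)
        both = np.concatenate([pop, children])
        fitness = np.array([rss_at(t, y, tau) for tau in both])
        pop = both[np.argsort(fitness)[:pop_size]]    # keep the best pop_size
        sigma *= 0.95                                 # shrink the mutation step
    return float(pop[0])                              # best candidate found


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 10.0, 101)
    y = 1.0 + 0.5 * t + 2.0 * np.maximum(t - 6.0, 0.0) + 0.1 * rng.normal(size=t.size)
    print(ea_change_point(t, y))   # close to the simulated change point 6.0
```
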
Temporal Decision Trees | 039 | DS3-440 | Tuesday | SHALAEVA | Vera | My work falls within the domain of machine learning and aims at designing Decision Tree algorithms adapted to handle large temporal datasets. On the one hand, time series are observed in a growing number of domains. On the other hand, Decision Trees are an interesting approach that provides a decision model with a high level of interpretability for users. My goal is to improve the Temporal Decision Tree in terms of computational complexity, performance and interpretability. | Download | |

Finding key biological features for cancer diagnosis from histopathology slides | 041 | DS3-249 | Tuesday | NAYLOR | Peter | Cancer diagnosis involves the complex interpretation of a multitude of heterogeneous data, such as genomic, transcriptomic and image data. The image data used in this context are thin slices of the tumor and of the surrounding tissue, stained with agents that highlight specific structures, such as cell nuclei or collagen. A medical practitioner routinely checks the patient's histopathology image data to decide the next step in the patient's treatment. Histopathology slides can thus be very informative about the cancer subtype and/or about how the patient's immune system is reacting to the cancer. We aim to find appropriate tools to quantify the huge amount of data contained in histopathology slides. In the long run, such a quantification scheme would fit into a pipeline that investigates the most informative physiological features and their link to genomic and transcriptomic features. Our strategy to identify the important features is, first, to segment the important elements in histopathology slides (such as cells, tumor and stromal tissue, necrotic regions, etc.); second, to define physiologically interpretable features for each of these elements; and third, to build a prediction model to assess the importance of each of these features. Uncovering information from histopathology slides is difficult, as one uncompressed slide can easily exceed 65 GB (200000 x 100000 pixels). Identifying the important features in such a huge amount of data is a difficult endeavour that requires prior knowledge brought in by pathologists. Supervised learning is certainly the most powerful strategy for segmenting this type of data. To segment the important structures in these images, we propose a method based on fully convolutional network architectures. Ultimately, this image segmentation will allow us to define biology-driven features for predicting clinical variables, such as outcome, subtype or response to treatment. It will also allow us to investigate the link between the genomic and transcriptomic features of the tumors and this set of spatially resolved image features, which we hope will be complementary. | Download | |

Payments networks and risk of firms | 043 | DS3-273 | Tuesday | LETIZIA | Elisa | We empirically study a large proprietary dataset of payments between Italian firms from a network perspective in order to understand how firms interact with each other. Standard network metrics, such as degree and strength distributions and component decomposition, highlight non-trivial interactions between firms. Finally, community detection techniques are employed to investigate correlations between network-based clustering and an idiosyncratic measure of firm riskiness. | Download | |

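The degree and strength metrics mentioned for poster 043 can be computed directly from a list of payments. This minimal Python sketch assumes that each payment is a hypothetical `(payer, payee, amount)` tuple. Degree counts distinct counterparties and strength sums the amounts, separately for outgoing and incoming payments; the degree and strength distributions are then histograms of these values over all firms.

```python
from collections import defaultdict


def degree_and_strength(payments):
    """Per-firm in/out degree (distinct counterparties) and strength (total amount).

    payments: iterable of (payer, payee, amount) tuples (assumed format).
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    out_str, in_str = defaultdict(float), defaultdict(float)
    for payer, payee, amount in payments:
        out_nbrs[payer].add(payee)     # distinct firms paid
        in_nbrs[payee].add(payer)      # distinct firms paying in
        out_str[payer] += amount       # total amount paid out
        in_str[payee] += amount        # total amount received
    firms = set(out_str) | set(in_str)
    return {
        f: {
            "out_degree": len(out_nbrs[f]),
            "in_degree": len(in_nbrs[f]),
            "out_strength": out_str[f],
            "in_strength": in_str[f],
        }
        for f in firms
    }


if __name__ == "__main__":
    payments = [("A", "B", 100.0), ("A", "B", 50.0), ("A", "C", 10.0), ("C", "A", 5.0)]
    for firm, stats in sorted(degree_and_strength(payments).items()):
        print(firm, stats)
```
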
Density estimation and nonlinear equalization for optical communications using neural networks | 045 | DS3-280 | Tuesday | RIOS MÜLLER | Rafael | There is increasing interest in compensating nonlinear distortions in optical communication systems. Volterra nonlinear equalizers are typically used in optical communications; however, these equalizers are limited in the responses they can compensate. We investigate nonlinear equalization using neural networks as an alternative to Volterra equalizers. Finally, we investigate maximum a posteriori decoding over nonlinear channels with memory, where the channel transition probability function is learned using a neural network. | ||

Learning fuzzy spatial relationships for image semantic analysis with justification | 047 | DS3-291 | Tuesday | PIERRARD | Régis | The goal is to develop a machine learning algorithm that is able to justify the results it provides using fuzzy spatial relationships. | Download | |

Model-based multivariate discretization for logistic regression | 049 | DS3-294 | Tuesday | EHRHARDT | Adrien | Credit institutions are interested in the repayment probability of a loan given the applicant’s characteristics in order to assess creditworthiness. For regulatory and interpretability reasons, logistic regression is still widely used to learn this probability from the data. Although logistic regression naturally handles both quantitative and qualitative data, two pre-processing steps are usually performed: first, continuous features are discretized by assigning factor levels to pre-determined intervals; second, qualitative features, if they take numerous values, are regrouped into variables taking fewer factor levels. This communication focuses on the discretization of continuous variables, which is performed for two main reasons: first, it produces a “scorecard” with a direct correspondence from intervals to score “points”; second, it allows us to deal with the nonlinearity of the score with respect to the continuous variables. Many discretization algorithms already exist (see the review by Ramírez‐Gallego et al. (2016)). To the best of our knowledge, the few multivariate supervised algorithms are unsatisfactory in our setup, mainly because they are not fully automated, their optimized criterion does not produce suitable discretized features for logistic regression, and their approaches are empirical. By reinterpreting discretized features as latent variables, we are able, through a Stochastic Expectation-Maximization (SEM) algorithm and a Gibbs sampler, to overcome those shortcomings and to find the best discretization scheme w.r.t. the logistic regression loss. The good performance of this approach is illustrated on simulated and real data from Crédit Agricole Consumer Finance. | Download | |

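The following hedged Python sketch makes the scorecard correspondence in poster 049 concrete. It shows how discretized features turn into score points once cut points have been chosen. The cut points and point values below are invented for illustration. In practice, the points would typically be derived from the fitted logistic regression coefficients of each interval, and choosing the cut points is precisely what the poster's SEM approach addresses.

```python
import bisect


def interval_index(x, cuts):
    """Index of the interval containing x.

    cuts = [c1, ..., ck] defines the intervals (-inf, c1), [c1, c2), ..., [ck, +inf).
    """
    return bisect.bisect_right(cuts, x)


def score(applicant, scorecard):
    """Sum the points of the interval into which each feature falls."""
    return sum(
        points[interval_index(applicant[name], cuts)]
        for name, (cuts, points) in scorecard.items()
    )


# Hypothetical scorecard: feature -> (cut points, points per interval).
SCORECARD = {
    "age": ([25, 40, 60], [5, 15, 25, 20]),
    "income": ([1500, 3000], [0, 10, 20]),
}

if __name__ == "__main__":
    print(score({"age": 33, "income": 3000}, SCORECARD))   # 15 + 20 = 35
```
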
MARTINI: finding the needle in a genomic haystack using networks | 051 | DS3-309 | Tuesday | CLIMENTE | Héctor | Genome-wide association studies (GWAS) are widely used to detect genetic variants correlated with an observed trait. GWAS compare two sets of individuals (usually diseased and healthy) in a two-step experiment: first, the genetic variants of each participant are obtained by sequencing; second, a statistical association analysis of the variants is performed. GWAS target settings where the 'common variant, common disease' paradigm applies (the presence of a variant has a probabilistic and mild impact on the trait). While these studies have provided insights into the pathways underpinning many common diseases, including cancer, the analysis of such very high-dimensional, weakly associated data poses both computational and statistical difficulties. One way of increasing statistical power is to use a priori biological knowledge: if two variants are associated with a disease, they are likely to share a biological context. In particular, we are developing a methodology to efficiently integrate biological networks (gene annotation and physical interactions between proteins) into GWAS. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly. We are currently applying our models to different simulated data settings, and we plan to apply the methods to a high-quality breast cancer dataset and, potentially, uncover genes that increase the likelihood of developing the disease. | Download | |

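Poster 051 reformulates network-constrained feature selection as a minimum cut. Here is a minimal Python sketch of that idea, assuming an objective of the SConES type (our assumption about the exact form). The goal is to choose the set S of variants that maximizes the sum over S of (association score minus a sparsity penalty `eta`), minus `lam` times the total network weight of edges with exactly one endpoint in S. Adding a source and a sink, and turning network edges into arcs, makes this an s-t minimum cut. The code solves that cut exactly with a plain Edmonds-Karp max-flow. The scores, edges and penalties in the usage example are made up.

```python
from collections import deque


def select_features(scores, edges, eta, lam):
    """Return the sorted indices S maximising
        sum_{i in S} (scores[i] - eta) - lam * sum_{(i, j) cut by S} w_ij
    via an s-t minimum cut (assumed SConES-style objective)."""
    n = len(scores)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]   # residual capacities cap[u][v]

    def add_arc(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)          # reverse arc for the residual graph

    for i, sc in enumerate(scores):
        a = sc - eta
        if a > 0:
            add_arc(s, i, a)     # cut if i is left out: lose its positive gain
        elif a < 0:
            add_arc(i, t, -a)    # cut if i is selected: pay its net penalty
    for i, j, w in edges:        # undirected network edge -> two arcs
        add_arc(i, j, lam * w)
        add_arc(j, i, lam * w)

    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                # no augmenting path: the flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        f = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= f
            cap[v][u] += f

    # Nodes still reachable from s lie on the source side of a minimum cut.
    return sorted(v for v in parent if v < n)


if __name__ == "__main__":
    scores = [3.0, 0.5, 3.0, 0.2]                    # hypothetical association scores
    edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]  # hypothetical gene network (a chain)
    for lam in (0.0, 0.3, 1.0):
        print(lam, select_features(scores, edges, eta=1.0, lam=lam))
    # 0.0 [0, 2]   0.3 [0, 1, 2]   1.0 [0, 1, 2, 3]
```
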
Causal Consistency of Structural Equation Models | 053 | DS3-323 | Tuesday | RUBENSTEIN | Paul | Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. | Download | |

Wasserstein Dictionary Learning | 055 | DS3-339 | Tuesday | SCHMITZ | Morgan | Optimal Transport theory enables the definition of a distance across the set of measures on any given space. This Wasserstein distance naturally accounts for geometric warping between measures (including, but not limited to, images). We introduce a new, Optimal Transport-based representation learning method in close analogy with the usual Dictionary Learning problem. Dictionary Learning typically relies on a matrix product between the learned dictionary and the codes making up the new representation, so the relationship between atoms and data is ultimately linear. We instead use automatic differentiation to derive gradients of the Wasserstein barycenter operator, and we learn a set of atoms and barycentric weights from the data in an unsupervised fashion. Since our data are reconstructed as Wasserstein barycenters of our learned atoms, we can make full use of the attractive properties of the Optimal Transport geometry. In particular, our representation allows for non-linear relationships between atoms and data. | Download | |

An Inexact Dual Augmented Lagrangian Method for Fast CRF Learning | 057 | DS3-344 | Tuesday | HU | Shell | We consider the problem of learning loss-augmented conditional random fields (CRFs), which subsume both the max-margin and the maximum-likelihood regimes of CRF parameter estimation. Based on a variational relaxation of the intractable Shannon entropy and of the marginal polytope, we propose a dual augmented Lagrangian formulation for CRF learning that does not require global inference. We propose an inexact dual augmented Lagrangian (IDAL) method, which requires only clique-wise updates to estimate the gradient with respect to the Lagrange multipliers. We show theoretically that only a fixed number of clique updates is needed to obtain good-quality gradients, which ensures linear convergence of the entire algorithm. Our experiments show that the proposed algorithm outperforms state-of-the-art baselines in terms of both speed and test accuracy. | Download | |

Distributed machine learning infrastructure for medical image segmentation | 059 | DS3-352 | Tuesday | HOTOIU | Lucian | Introduction: During radiation therapy of cancer patients, the initial prescription often has to be adapted as the treatment advances, as a consequence of anatomical changes occurring in the body. In such situations, machine learning techniques (deep learning) can be employed to accelerate the clinical workflow by enabling a fast, in-room adaptation of the treatment plan. It is believed that deep learning algorithms can perform robust, automatic CT image segmentation that can serve as valuable input to adapt and re-optimize the treatment plan. Description: A major factor in the success of machine learning algorithms is the quantity and quality of the data available to train them. The data must be sufficiently representative of the problem and must contain satisfactory annotations to guide the training [1]. Medical image segmentation is no different in this regard; furthermore, it is made considerably more complicated by patient-confidentiality law and hospital-specific policies. To achieve acceptable performance in segmenting CT images with supervised deep learning techniques, about 5000 labeled examples (patients) are needed for each pathology [1]. Given the sensitivity of medical records, assembling a dataset of the necessary size is far from trivial; a system that allows access to multiple data sources is therefore of paramount importance. The infrastructure would facilitate deploying and training a deep-learning neural network on available medical CT images, inside multiple treatment centres, in a completely anonymous fashion. To maximize the amount of available data, the owning institutions could be accessed through a distributed data network, following the hybrid model of Skripcak et al. [2].
The CT images and the annotations remain stored at the host institution, which greatly simplifies the legal problem of sharing patient data across sites. The deep-learning neural network algorithms will be deployed in a distributed manner from a central repository shared by the participants. The central repository delegates the training of the algorithm to computation hardware located at each site. The network is trained locally, and only the learned lessons (optimized algorithm parameters) are gathered and sent back to the central repository. In this manner, the machine learning model can “blindly” and reciprocally evolve, learning from the data of all the hospitals involved. Conclusion: Although the distributed infrastructure model is ambitious, the proposed system provides a number of important advantages: • It keeps the data local and retrieves only intelligence, thereby alleviating the legal problem of patient data protection. • It is a win-win situation for participating institutions, which gain access to a larger data pool in exchange for reciprocity. • It distributes the computation load among the participating institutions, which avoids the need for a massive central cluster. Bibliography [1] Goodfellow et al., Deep Learning, MIT Press, 2016. [2] Skripcak et al., "Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymized public datasets," Radiotherapy and Oncology, vol. 113, pp. 303-309, 2014. | Download | Lucian Hotoiu^{*1}, Elliot Brion^{2}, Rudi Labarbe^{1}. ^{1}Ion Beam Applications SA, Belgium. ^{2}Université Catholique de Louvain, Belgium. ^{*}Contact: Lucian.Hotoiu@iba-group.com. Joint poster with Rudi LABARBE (DS3-361) |

Characterising industrial sites' flexibility with reservoir models | 061 | DS3-376 | Tuesday | CUVELIER | Thibaut | Electro-intensive industrial sites are highly dependent on electricity prices to remain competitive. Nevertheless, they can often tune their processes to decrease their electricity consumption during the most critical periods, for example by using decision support systems based on mathematical models of their processes. Our goal is to estimate the flexibility potential of a complete site, not to tune each process very precisely. To this end, we propose a generic paradigm to help design such models: reservoirs are the basic building block, which allows for great expressiveness while staying close to the physics. More specifically, we do not need very precise models for our purposes, but rather models that can be efficiently included in optimisation models. Our first results show that the resulting reservoir models can give sufficiently good approximations of metallurgical processes (more precisely, electric-arc and ladle furnaces). | Download | |

A study of Statistical and Predictive Analysis of US Pain Medications | 063 | DS3-386 | Tuesday | KAUSHIK | Shruti | A huge amount of data is available, and researchers can exploit it to discover patterns in patients’ health records. Large volumes of data help reveal patterns and associations that can lead to high-quality healthcare at reduced cost for all. Used together, data mining and predictive analytics can help us predict trends from historical data. Combined with expert knowledge, predictive solutions can have an enormous impact on the diagnosis of numerous diseases. In our research, we perform frequent pattern mining to find patterns in the journeys of patients taking a particular pain medication manufactured by a U.S.-based pharmaceutical company. We examine the impact of demographic variables on patients’ consumption of the pain medication and analyze their expenditure on it. This analysis is followed by a predictive analysis in which we predict patient behavior using machine learning algorithms (such as Naïve Bayes, decision trees, logistic regression and support vector machines). The main implication of this research is to help healthcare providers and pharmaceutical companies create targeted treatment measures. | Download | |

A representer theorem for deep kernel learning | 065 | DS3-402 | Tuesday | BOHN | Bastian | We provide a representer theorem for a chain of linear combinations of kernel functions of reproducing kernel Hilbert spaces and use this result to establish a theoretical foundation for machine learning algorithms based on the concatenation of functions from these spaces. Furthermore, we sketch out how our findings apply to existing deep kernel learning approaches. | Download | |

A Cooperative Model of Machine Learning and Operations Research for Railway Operations | 067 | DS3-418 | Tuesday | MILLIET DE FAVERGES | Marie | The purpose of this work is to create a cooperative model of data science and operations research that increases the robustness of railway timetabling through the analysis of real data. Predictive models of punctuality could be used to create schedules that take into account what may happen at the operational level. The study concerns Montparnasse station in Paris. It is a large station and a bottleneck where many different lines cross (high-speed and regional lines), so the schedule is difficult to adhere to: the network is very dense, and the slightest delay can spread to the following trains. Identifying delays in advance would help prevent their propagation in the station area. However, delays are rare events that are hard to predict with standard machine learning algorithms. Moreover, their causes are often multiple and complex: a delay may be primary (external cause) or secondary (caused by another delayed train). | Download | |

Learning from Uncertain Data: A Decision Tree Approach | 069 | DS3-112 | Tuesday | NUNES | Cecília | Noise in medical data arises from differing diagnostic practices, manufacturer-dependent technology, or the different modalities available to obtain an insight. Decision trees (DTs) are an interpretable learning algorithm with acknowledged benefits in various domains. Interpretability is key in safety-critical contexts such as medicine, where DTs are already used in official guidelines. However, DTs are insensitive to input data variability: learning from uncertain data yields poor generalization, and evaluating noisy data leads to inaccurate predictions. Several methods have been proposed to make DTs robust to uncertainty by weighting the contribution of all child nodes to the predictions, with benefits in tree size and accuracy. Two of these approaches entail high computational costs, and one of them does not consider the uncertainty model during training. Another approach employs a sigmoid model in a multivariate-split tree, thereby limiting interpretability. In this work, we present a probabilistic DT in which the weights of the branches in the class prediction are determined by the noise distribution. The method separates the uncertainty representation in the split search from that in the evaluation phase. The merits of the method are evaluated with regard to prediction accuracy and tree size. The main result concerns the reduction in tree size when using the filtered search or soft training propagation. | ||

Fully Event-Driven, Timescale Invariant Online Deep Learning Using Unsupervised STDP | 071 | DS3-436 | Tuesday | THIELE | Johannes | We present a deep spiking convolutional neural network of integrate-and-fire (IF) neurons that performs unsupervised online learning on a stream of images of handwritten digits using spike-timing-dependent plasticity (STDP). Recent work showed how STDP can be used in a deep convolutional network architecture of IF neurons to extract hierarchical features from natural images. In contrast to previous work, where each layer was trained successively, we show how all layers of the network can be trained simultaneously, which allows approximate online classification very early in the learning process. Due to the spike-based nature of learning and inference, our architecture uses only a comparatively small number of local computations. We show that it is possible to train the network without providing any information about the structure of the input data, such as the number of classes or the duration of image presentation. These properties could make our implementation suitable for energy-efficient, unsupervised learning on a continuously growing, unlabeled database or on continuous video streams. | Download | |

Validation Methods for Neural Network Simulation | 075 | DS3-449 | Tuesday | GUTZEN | Robin | As an evolving field, neuroscience is in the rather rare situation that the large number of models and theories about the various functions of the brain is confronted with a constantly growing body of experimental evidence. In this state of research, neural network simulations gain importance as a link between theory and data. There is a large variety of simulators and simulation frameworks (e.g., NEST, BRIAN, NEURON, SpikeNET), which may differ strongly in the internal models of computation they use and in the implications that come with them. Hence, there is a high demand for a thorough understanding of the simulation engines used to generate simulated network activity data, in particular with respect to their accuracy. For a proper evaluation of simulations, new tools have to be developed to perform such validations in an accessible and readily reproducible fashion. However, the comparison cannot simply be done in a spike-to-spike manner, for several reasons: neuronal spiking is stochastic, and competing implementations of algorithms or differences in numerical processing may cause deviations in the precise output of the simulations. Instead, the simulations have to be evaluated in a statistical sense, yielding quantitative measures that characterize significant identity or difference between model and experiment, or between different models. Thus, we address the question of how to properly validate neural network simulations. As a test case, we chose the validation of a neuronal network simulated on the neuromorphic hardware SpiNNaker against the same simulation carried out with the NEST simulator software as a reference [1]. The NEST simulator is an open-source software project developed by the NEST initiative (http://www.nest-initiative.org) and features exact numerical integration of the dynamics.
The SpiNNaker system, located in Manchester, UK, is a neuromorphic architecture consisting of millions of cores that can perform efficient network simulations at the hardware level. Since this operation mode is inherently different from conventional software simulations and has some constraints regarding, e.g., the fine temporal resolution of spikes, the validity of such simulations with respect to NEST cannot be taken for granted. The starting point for validating SpiNNaker against NEST is the results of a simulation of the canonical microcircuit model [2], which was performed on both platforms. The results are given in the form of recorded spiking activity. We concentrate on validating the results by comparing measures describing single-neuron statistics (firing rate and the coefficient of variation (CV) of the interspike intervals) as well as the correlation structure of the simulated network, as measured by the pairwise correlation coefficients between all spike trains. In a first approach, we compared the outcomes in the form of the distributions of these measures and tested the suitability of a variety of statistical two-sample tests and distances (Kolmogorov-Smirnov distance, Mann-Whitney U test, and Kullback-Leibler divergence) on the network simulations, complemented by stochastic spike train simulations. However, an analysis of correlation coefficients alone cannot reveal which neurons are involved in the correlations or whether higher-order correlations are present. Therefore, we assess the correlation structure using an eigenvalue decomposition of the correlation matrix. We present an approach that uses the eigenvalue decomposition to reorder the neurons according to their correlation strength and thus identify groups of highly correlated neurons.
The goal of developing and pooling these validation methods is to provide a flexible toolbox that is not tailored to one specific application but may be used in a broad range of validation cases. In future work, we aim to describe how the validation approaches depend on the type of network simulation, the number of recorded neurons, the simulation duration, the features of the model, the reference mode, and the scientific question behind the analysis. References: [1] Senk, Johanna, et al. "A Collaborative Simulation-Analysis Workflow for Computational Neuroscience Using HPC." Jülich Aachen Research Alliance (JARA) High-Performance Computing Symposium. Springer, Cham, 2016. [2] Potjans, Tobias C., and Markus Diesmann. "The cell-type specific cortical microcircuit: relating structure and activity in a full-scale spiking network model." Cerebral Cortex 24.3 (2014): 785-806. | Download | |

A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series | 077 | DS3-451 | Tuesday | CHAMBON | Stanislas | Sleep stage classification, or sleep scoring, is of considerable importance in the diagnosis of sleep disorders, since it constitutes the preliminary step to any further medical exam. Based on a polysomnography (PSG), i.e. an overnight recording of mainly electroencephalograms (EEG), electrooculograms (EOG), electrocardiograms (ECG) and electromyograms (EMG), a medical expert assigns a sleep stage to each 30 s segment of signal. Automatic approaches have attracted much attention as a way to provide at least auxiliary help to human scorers. In this work, we introduce the first end-to-end learning approach that performs temporal sleep stage classification from PSG signals. We build a general architecture that can extract information from EEG and EOG channels as well as from the EMG modality, and pools this information into a learnt softmax classifier. Furthermore, the architecture is light enough to be distributed through time and to capture the temporal context of the problem. Experiments on about 60 PSG records, with up to 20 EEG channels, reveal that classification performance, measured with balanced accuracy, improves as a function of the spatial dimension. Our model, which is unique in its ability to make the most of multiple modalities, is compared to alternative automatic approaches and delivers state-of-the-art classification performance. On top of that, it reveals the spatio-temporal distribution of discriminant neural signatures and offers insights into sleep stage mechanisms. | Download | |

Smart City Analytics: Prediction of Citizen Home Care | 079 | DS3-478 | Tuesday | HANSEN | Casper | The city of Copenhagen in Denmark collects a rich and continuously growing repository of data relating to its citizens, which currently remains largely untapped. Analyzing this data can help the city streamline its services and improve overall social welfare. However, despite its considerable potential, this data is not trivial to process because of its very large scale, non-stationarity, and general lack of structure. This poster presents ongoing work on predicting the level of home care service a citizen needs. We have access to unique data identifying each individual service a citizen receives, a digital free-text journal on each citizen kept by the visiting personnel, and hospital records. The data covers more than 40,000 citizens and spans from April 2013 to the present. The ongoing work focuses on evolving techniques that use ensemble methods to handle drift in the data. Much existing work investigates ensemble architectures with different updating strategies; we focus on structural updating, i.e. how to dynamically add and remove learners. Early results are promising, and we are working actively with the city to put the results into practice. | Download | |

Feature Selection for Learning Performance Models of Electrical Stimulation for Spinal Cord Injury | 081 | DS3-480 | Tuesday | FELDMAN | Ellen | Epidural spinal cord stimulation (SCS), in which implanted arrays of electrodes deliver electrical signals to spinal cord neurons, is a promising therapy for spinal cord injury (SCI). This approach enables human paraplegic patients to stand and regain partial control of leg movements, while also recovering some lost autonomic function. Several stimulation parameters may be modified, including the choice of active electrodes, their polarities (positive, negative, or neutral), and the amplitude, frequency, and pulse width of the pulse trains applied to the active electrodes; these must not only be optimized for every patient individually, but may also vary with time. This work links computational models of epidural SCS to experimental data obtained by testing paraplegic patients’ standing performance under a range of stimulation parameters. Each set of parameters is simulated via finite element analysis to estimate the electrical activity in the spinal cord and surrounding tissues near the implant. Several types of features are then extracted from the simulation results over a range of voxel sizes. Using regression and feature selection techniques such as random forests and elastic nets, we identify the most informative electric field features (i.e. those correlated with good patient motor responses) and the most important spinal cord regions to stimulate. In addition, we find that the most informative stimulation features agree with results from nerve fiber theory. Finally, we employ Gaussian process regression together with the simulation results to predict the performance of stimuli that were not tested on the patients. We use this procedure to suggest additional stimulation patterns that have a sizeable probability of yielding high performance in the patients.
Further applications of our work include developing algorithms to optimize stimulation configurations for SCI patients, determining optimal electrode placement, and considering novel electrode array designs. Addressing these problems may require estimating the optimal electric field for a patient; we are therefore investigating generative models to capture the joint probability distribution of the features and patient responses. Stimuli could then be optimized to achieve the electric field closest to the estimated optimum. | Download | |

Feature Extraction From Sensor Dynamics in an Electronic Nose | 085 | DS3-487 | Tuesday | MAHO | Pierre | A non-selective chemical sensor can interact with a large number of molecules. An electronic nose is a bio-inspired device composed of several non-selective chemical sensors. A single sensor is not informative on its own, but together the sensors can produce a unique signature, a bar code, of an odorant volatile molecule. In practice, measurements are taken beforehand to create a training set, after which machine learning algorithms are used to recognize odours efficiently. Aryballe Technologies is a French start-up that develops a new generation of electronic noses. Its device is based on a grid of several dozen chemical sensors whose interactions with odorant molecules are measured using the principle of surface plasmon resonance imaging. This technology visualises odours as images, providing a promising new way to process this kind of data. In real-life conditions, detecting volatile molecules is quite a hard task, due to mixtures of odorant molecules (several molecules present at the same time and in various concentrations) and environmental turbulence. Thus, the success of an electronic nose depends greatly on the development of efficient machine learning algorithms that increase robustness and selectivity. This presentation introduces these new olfactory data and shows our initial work on developing several sensor models, which aim to increase reproducibility and enable feature selection. | Download | |

Pathological artificial neurons | 087 | DS3-490 | Tuesday | MAYER | Sebastian | The building blocks of artificial neural networks and projection pursuit algorithms are ridge functions, i.e. functions that vary only along one direction in space (given by the weight vector). Although it is apparently possible to learn huge networks of such ridge functions in many practical situations, this poster presents circumstances in which learning even a single ridge function is intractable. | Download | |

Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations | 089 | DS3-503 | Tuesday | ISCEN | Ahmet | Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. So far, it has been limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. Diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor as in previous approaches. An efficient offline stage allows an optional reduction in the number of stored regions. In the online stage, the proposed handling of queries unseen at indexing time avoids additional computation to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in the performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval. | Download | |

Polite Agent and Impolite Opponents: Natural Language Generation for Chatterbots through Sentiment-based Training by using Twitter data | 091 | DS3-506 | Tuesday | KHATUA | Aparup | Tech giants such as Microsoft and Facebook, as well as data scientists, are exploring various semi-supervised learning methods to build conversational agents, commonly known as chatterbots. However, in practice the outcomes of these efforts fail to match expectations, especially when the opponent/human uses an impolite tone. This work attempts to address this shortcoming. Its main contribution will be a model of natural language generation through sentiment-based training. In our model, if the opponent/human says something in an impolite manner (e.g., an angry or complaining tone) on a particular topic, our chatterbot agent will respond with a different emotion (e.g., an optimistic tone) to counteract the escalation of impoliteness in the conversation. To train our model, we use microblogging platforms such as Twitter, which generate an enormous amount of user-generated content, because conversations between social media users, through tweets and retweets, can form an effective training dataset for our research question. More importantly, conversations on the Twitter platform display the full range of emotions, such as anger, sadness, or happiness. Initially, we are developing our model in the political domain, training on our Twitter datasets, which range from the 2014 Indian election, the 2015 Singapore election, the 2015 UK election and the 2016 Brexit referendum to the ongoing 2017 French election. In total, we have around 15 million tweets from these events. We will develop our model using a generative architecture for closed-domain short conversations. | Download | |

Moving Least Squares Support Vector Machines for weather temperature prediction | 093 | DS3-526 | Tuesday | KAREVAN | Zahra | Local learning methods have been investigated by many researchers. While global learning methods assign the same weight to all training points during model fitting, local learning methods assume that training samples in the region around the test point are more influential. In this study, we propose Moving Least Squares Support Vector Machines (M-LSSVM), in which each training sample contributes to the model fit according to the similarity between its feature vector and that of the test point. Experimental results on a weather forecasting application indicate that the proposed method can improve prediction performance. | Download | |

Sequence Modelling For Analysing Student Interaction | 095 | DS3-532 | Tuesday | HANSEN | Christian | The analysis of log data generated by online educational systems is an important task for improving these systems and furthering our knowledge of how students learn. This poster presents initial work accepted at the International Conference on Educational Data Mining 2017 (EDM), together with results on more complex datasets than those presented in that paper. The poster presents an unsupervised clustering method for log data from online systems, which is usable for an initial investigation of user behaviors. User behaviors are modelled as a distribution over Markov chains, which yields models that are easily interpretable by humans. The method is applied to extensive log data from the company Edulab, the largest provider of online math education in Denmark. | Download | |

An Approach for Machine Learning-Based Contouring of Daily CBCT with Planning CT as Prior | 097 | DS3-534 | Tuesday | BRION | Eliott | This poster illustrates the use of deep learning in medical imaging by showing how it can automatically contour healthy organs and tumors in CT scans. As 3 in 10 Europeans will develop cancer before their 75th birthday, we must improve treatment. Proton therapy is a promising treatment since it kills cancerous cells with high accuracy while leaving the neighboring healthy organs undamaged. However, its wider adoption is still hampered by two challenges: the uncertainty in the protons’ energy deposition along their path (density changes) and the uncertainty in the target’s position (geometrical changes). The poster focuses on this second challenge, for which fast, robust and autonomous (i.e. requiring minimal user intervention) contouring is critical. Fortunately, a family of algorithms called deep learning now makes this possible. Roughly speaking, deep learning works by learning representations of already contoured images with multiple levels of abstraction. We will show how it has been successfully applied in recent research and why access to labeled data is so crucial. | Download | |

Information transfer for learning in non-stationary environments | 099 | DS3-547 | Tuesday | MURENA | Pierre-Alexandre | The traditional machine learning setting consists of learning a concept from a training set and applying the learned concept to a test set assumed to be independent of the training set but identically distributed. In practice, this hypothesis does not always hold, and some non-stationary environments introduce changes in the distributions (concept drift). Two classes of problems belong to this non-stationary category: transfer learning and incremental learning. In both, the acquired knowledge has to be transferred and slightly modified to fit new environments. We present a framework for learning in non-stationary environments based on the notion of algorithmic complexity, introducing the idea of minimal transfer of information. | Download | |

Certificate Achievement Unlocked: How does MOOC learners' behaviour change? | 101 | DS3-554 | Tuesday | ZHAO | Yue | Massive Open Online Courses (MOOCs) play an ever more central role in open education. However, in contrast to traditional classroom settings, many aspects of learner/user behaviour in MOOCs are not well researched. In this work, we focus on modelling learner behaviour in the context of continuous assessment with completion certificates, the most common assessment setup in MOOCs today. Here, learners can obtain a completion certificate once they reach a required minimum score (typically between 50% and 70%) in continuous tests distributed over the duration of a MOOC. In this setting, the course material or tests provided after "passing" do not contribute to earning the certificate, which may affect learners' behaviour. We therefore explore how "passing" impacts MOOC learners: do learners alter their behaviour after this point, and if so, how? While the role of assessment and its influence on learning behaviour are well established in traditional classroom-based learning, we are the first to answer these questions in the context of MOOCs, providing valuable insights that can be used to design better courses in the future. Based on an extensive exploratory analysis of the log traces of more than 4,000 certificate-earning learners across four edX MOOCs, we present a set of core behaviour patterns. | Download | |

Sentiment Analysis of Weakly Labeled Social Media -- First Person Vision Streams | 103 | DS3-560 | Tuesday | VARINI | Patrizia | We introduce an approach to draw dominant sentiment maps of the main sites of interest in Italian cultural or art cities, analyzing egocentric streams extracted from social media repositories jointly with texts extracted from their audio by automatic speech recognition systems. For a given art city, we first retrieve from the YouTube repository the videos captured in that location, using expanded queries on their metadata, and keep only streams captured by egocentric or hand-held cameras. To classify sentiment patterns from the streams and their audio, spatio-temporal features and semantic features are extracted from the video and from the YouTube ASR subtitles, respectively, and combined in a joint embedding feature space. Video features are the activations of the last dense layer of a 3D CNN trained on motion and frame visual assessment, while semantic features are obtained with the well-known word2vec approach, on a collected dataset of 42 videos with supervised annotations. | Download | |

Factors Identification for Mitochondrial DNA Substitution Models | 105 | DS3-574 | Tuesday | LEVINSTEIN HALLAK | Keren | Mitochondrial DNA (mtDNA) is a small fragment of the DNA in eukaryotic cells, located in the mitochondria. It is widely used in many fields such as genetic genealogy, medical genetics and even forensic science. One of its key features is that it is inherited solely from the mother and therefore does not undergo recombination. Consequently, mtDNA sequence variation is caused by the accumulation of mutations along maternal lineages. This variation can be used to reconstruct a phylogenetic tree based on parsimony and maximum likelihood methods. A recent study constructed an updated comprehensive phylogeny of global human mtDNA variations, based on coding and control region mutations. Even though this highly reliable phylogenetic tree is available, the substitution mechanism in mtDNA is not yet fully understood. We propose to use this comprehensive phylogeny to study different substitution models of mtDNA and answer some open questions on common assumptions. The growth in the amount of available data increases the power of statistical tests, allowing more complicated models than those previously suggested to be tested, but it also requires novel approaches (both statistical and computational) for testing statistical hypotheses on large-scale data. | Download | |

Distributed Probabilistic Forecasting for New Energy Systems Operation | 107 | DS3-644 | Tuesday | LE CADRE | Hélène | This presentation focuses on the role of information, which is essential in new energy systems, where a balance has to be constantly found between maintaining privacy and increasing global system efficiency. We start by introducing the methodological framework, which relies on Prediction-Interval-based Extreme Learning Machines coupled with automatic feature selection based on minimal Redundancy Maximal Relevance, a criterion derived from Mutual Information. We provide a data fusion approach which combines probabilistic forecasters while meeting the Prediction Interval confidence level. Performance of the method is evaluated analytically and illustrated on three case studies: a) distributed solar PV power production forecasting at the regional scale, b) day-ahead market price forecasting, and c) a peer-to-peer model for solar PV energy trading between microgrids. | Download | |
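
The minimal Redundancy Maximal Relevance criterion mentioned above can be sketched as a greedy selection that scores each candidate feature by its mutual information with the target minus its mean mutual information with the features already selected. This is a minimal Python illustration assuming discrete features and a plug-in (count-based) mutual information estimate; the function names are hypothetical and this is not the presenter's implementation, which feeds Extreme Learning Machines.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features, target, k):
    """Greedy mRMR: relevance I(f; y) minus mean redundancy with selected features.

    features: list of discrete feature columns (each a sequence)
    target:   discrete target sequence
    Returns the indices of the k selected features, in selection order.
    """
    relevance = [mutual_information(f, target) for f in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            if selected:
                redundancy = sum(
                    mutual_information(features[i], features[j]) for j in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[i] - redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

Continuous inputs such as PV power or prices would first need to be discretized (or handled with a continuous mutual information estimator) before being passed to this sketch.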

The application of Machine Learning for prediction of missile aerodynamic coefficients | 109 | DS3-054 | Tuesday | BUDIDETI | Jyotsna | Currently, complex mathematical models based on the principles of Finite Element Methods are the most commonly used in Computational Fluid Dynamics (CFD) to estimate the aerodynamic properties of a given model. These estimates are known to have a margin of error, largely due to the idealized assumptions made and the neglect of some practical features that have an impact on the actual performance of a missile. Hence, we propose an alternative approach to calculating the aerodynamic coefficients using machine learning methods, specifically neural networks, on data generated from wind tunnel tests, geometrical data and historical data. The aim is to generate reasonably accurate results in much less time than existing CFD techniques. | ||

Regularizing Text Categorization with Clusters of Words | 111 | DS3-204 | Tuesday | SKIANIS | Konstantinos | Harnessing the full potential of text data has always been a key task for the Data Science community. The properties hidden under the inherent high dimensionality of text are of major importance in numerous tasks such as text categorization, question answering or conversational agents. In this poster we present how to extract rich text representations that can a) be visualized and show interesting properties, b) be used as better features for machine learning tasks, and c) be used as good structures for group regularization. | Download | |
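
One common way to use word clusters as structures for group regularization is the group lasso penalty, whose proximal operator shrinks the weights of each cluster as a block. The sketch below assumes disjoint clusters and a linear classifier with one weight per word; it illustrates the general technique and is not the presenter's code.

```python
import numpy as np


def group_soft_threshold(weights, groups, tau):
    """Proximal operator of tau * sum_g ||w_g||_2 (group lasso).

    weights: 1-D array of coefficients (one per word feature)
    groups:  list of index lists, one per word cluster (assumed disjoint)
    tau:     regularization strength times step size
    """
    out = np.array(weights, dtype=float)  # copy, so the input is not modified
    for idx in groups:
        block = out[idx]
        norm = np.linalg.norm(block)
        # Shrink the whole block towards zero; zero it out if its norm <= tau.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out[idx] = scale * block
    return out
```

Inside a proximal gradient loop, one would call `group_soft_threshold(w - step * grad, clusters, step * lam)` after each gradient step on the classification loss; `clusters`, `step` and `lam` are hypothetical names. Whole clusters of uninformative words are then driven exactly to zero together.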

Named entity linking with graphical models and deep learning: survey and exploration | 113 | DS3-244 | Tuesday | KHALIFE | Sammy | In machine learning and natural language processing, the task of named entity discovery (NED) refers to the ability of a program to extract predefined sets of words from a vocabulary (called named entities: names, places, locations, ...) and then to identify them by linking them to a pre-existing database. The first subtask, named entity recognition (NER), is not trivial since no exhaustive list of these named entities exists; moreover, their textual representation can vary (for example, B. Obama instead of Barack Obama). The second subtask is named entity linking (NEL). In this work we focus on NEL and review existing algorithms in two categories: graphical models (including graphs and PGMs) and deep learning approaches. To do so, we assess how relevant these algorithms can be for this task, compare their efficiency and propose new algorithms for named entity linking. | Download | |

Aircraft Dynamics Identification | 115 | DS3-407 | Tuesday | ROMMEL | Cédric | It is well known that one of the main goals of civil aviation operators is to reduce aircraft fuel consumption as much as possible. One option for doing so is to optimize flight trajectories with respect to aircraft performance. Our work focuses on the problem of minimizing fuel consumption during climb trajectories of civil aircraft, which can be mathematically modeled as an optimal control problem. Such a problem involves the dynamical behavior of the aircraft, which motivates the search for accurate dynamical system identification techniques, the main topic of this work. According to the literature, the most widely used approaches for aircraft parameter estimation are the Output-Error Method and the Filter-Error Method, based on the main ideas of measurement error minimization and state dynamics re-estimation (see for example R. V. Jategaonkar 2006 and R. E. Maine and K. W. Iliff 1986). Recent advances include using neural networks for the state estimation part (see N. K. Peyada and A. K. Ghosh 2009). On the other hand, renewed interest in the older Equation-Error Method has also been observed (see E. A. Morelli 2006). In this work we propose a variation of the latter. Adopting a statistical point of view, we state a regression formulation of our problem and solve it using a Maximum Likelihood based technique. We illustrate our method with numerical results based on real flight data. | Download | |
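
In its simplest form, the Equation-Error Method regresses measured (or numerically differentiated) state derivatives on regressors built from states and controls. With independent Gaussian noise, the least-squares solution coincides with the maximum likelihood estimate. The following is a minimal sketch under that linear-in-parameters assumption. It does not reproduce the presenter's variation, and building the regressor matrix is left to the user.

```python
import numpy as np


def equation_error_fit(regressors, derivatives):
    """Equation-Error estimate of theta in xdot = Phi @ theta + noise.

    regressors:  (n_samples, n_params) matrix Phi built from states/controls
    derivatives: (n_samples,) measured or numerically differentiated xdot
    With i.i.d. Gaussian noise, least squares is also the maximum likelihood
    estimate, and the ML noise variance is the mean squared residual.
    """
    theta, *_ = np.linalg.lstsq(regressors, derivatives, rcond=None)
    residuals = derivatives - regressors @ theta
    noise_var = float(np.mean(residuals ** 2))
    return theta, noise_var
```

In a flight-data setting, the columns of `regressors` would be quantities such as angle of attack or control deflections. These column choices are hypothetical here.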

ConvSCCS: a convolutional self-controlled case series model for lagged adverse event detection in large databases | 117 | DS3-190 | Tuesday | MOREL | Maryan | With the increased availability of large electronic health record databases comes the chance of enhancing health risk screening. Machine learning could lead to major improvements in post-marketing adverse drug effect detection, as the current process relies on physicians' spontaneous reports. However, the complexity of this task requires new statistical models. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features on a rare longitudinal outcome. Our model is based on a conditional Poisson model known as self-controlled case series (SCCS). SCCS models are computationally efficient in rare-event settings and robust to non-longitudinal confounders. While the original SCCS model requires risk periods to be specified a priori, we propose to learn them with flexible regularized step functions. Its simple formulation allows us to use fast stochastic proximal algorithms to learn the parameters efficiently. Simulations show that we outperform competing models in terms of mean squared error. We applied the new method to a large dataset of diabetic patients from the French national health insurance database SNIIRAM, and show that we are able to detect a well-known adverse drug effect. | Download | |
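
The SCCS likelihood conditions each patient's Poisson event counts on their total. This yields a multinomial over time and cancels time-invariant confounders. The sketch below assumes equal-length time steps and a single exposure. The lagged design matrix mimics the convolutional idea of a lag-dependent effect, but the regularized step-function penalty and the stochastic proximal solver of ConvSCCS are not shown. Function names are hypothetical.

```python
import numpy as np


def lagged_exposure(exposure, n_lags):
    """Column j holds the exposure indicator shifted forward by j time steps,
    so features @ beta applies a lag-dependent effect (a discrete convolution)."""
    exposure = np.asarray(exposure, dtype=float)
    n_steps = len(exposure)
    features = np.zeros((n_steps, n_lags))
    for lag in range(min(n_lags, n_steps)):
        features[lag:, lag] = exposure[:n_steps - lag]
    return features


def sccs_log_likelihood(beta, features, events):
    """Conditional Poisson (SCCS) log-likelihood for one individual.

    Conditioning on the individual's total number of events turns the Poisson
    model into a multinomial over time steps, so any multiplicative
    individual-level baseline cancels out.
    """
    eta = features @ beta
    shift = eta.max()  # log-sum-exp stabilization
    log_normalizer = shift + np.log(np.exp(eta - shift).sum())
    return float(np.dot(events, eta - log_normalizer))
```

Summing this quantity over patients and adding a penalty on differences between consecutive lag coefficients would give a step-function-like lag profile. That extension is an assumption about the approach, not a reproduction of it.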

Feature extraction with regularized siamese networks for outlier detection: application to lesion screening in medical imaging | 119 | DS3-051 | Tuesday | ALAVERDYAN | Zaruhi | Computer-aided diagnosis (CAD) systems are designed to assist clinicians in various tasks, including highlighting abnormal regions in a medical image. A common approach consists of training a voxel-level binary classifier on a set of feature vectors extracted from normal and pathological areas in patients' scans. However, many pathologies (such as epilepsy) are characterized by lesions that may be located anywhere in the brain and vary in shape, size and texture. An adequate representation of such heterogeneity requires a significant amount of annotated data, which is a major issue in the medical domain. Therefore, we built on a previously proposed approach that treats the epilepsy lesion detection task as a voxel-level outlier detection problem. It consists of building a one-class SVM (oc-SVM) classifier for each voxel in the brain volume using a small number of clinically guided features. Our goal now is to take a step forward by replacing the handcrafted features with representations learnt automatically by neural networks. We propose a novel version of siamese networks trained on patches extracted from healthy patients' scans only. This network, composed of stacked autoencoders as subnetworks, is regularized by the reconstruction error of the patches. It is designed to learn representations that bring patches centered at the same voxel location 'closer' with respect to the chosen metric (i.e. cosine). Finally, the middle-layer representations of the subnetworks are fed to oc-SVM classifiers at voxel level. The method is validated on the MRI scans of 3 patients with confirmed epilepsy lesions and shows promising performance. | Download | |

On the Troll-Trust Model for Edge Sign Prediction in Social Networks | 002 | DS3-014 | Wednesday | LE FALHER | Géraud | In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive with state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled. | Download | |
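
The troll-trust features can be computed in one pass over the signed edges. In the sketch below, "trollness" is taken as the fraction of a node's outgoing edges that are negative, and "untrustworthiness" as the fraction of its incoming edges that are negative. The averaging rule and the 0.5 threshold in `predict_sign` are illustrative assumptions. They are not the Bayes-optimal classifier or the Label Propagation algorithm described in the abstract.

```python
from collections import defaultdict


def troll_trust_features(edges):
    """edges: iterable of (source, target, sign) with sign in {+1, -1}.

    Returns two dicts: the fraction of negative outgoing edges per node
    ("trollness") and the fraction of negative incoming edges per node
    ("untrustworthiness").
    """
    out_total, out_neg = defaultdict(int), defaultdict(int)
    in_total, in_neg = defaultdict(int), defaultdict(int)
    for u, v, sign in edges:
        out_total[u] += 1
        in_total[v] += 1
        if sign < 0:
            out_neg[u] += 1
            in_neg[v] += 1
    trollness = {u: out_neg[u] / n for u, n in out_total.items()}
    untrust = {v: in_neg[v] / n for v, n in in_total.items()}
    return trollness, untrust


def predict_sign(u, v, trollness, untrust, threshold=0.5):
    # Hypothetical rule: predict a negative edge u -> v when the average of the
    # source's trollness and the target's untrustworthiness exceeds threshold.
    score = 0.5 * (trollness.get(u, 0.0) + untrust.get(v, 0.0))
    return -1 if score > threshold else +1
```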

The development of mirror neurons representing facial expression: a computer modelling study | 004 | DS3-035 | Wednesday | SALARIS | Ilenia | Previous research has shown that humans automatically and spontaneously show facial response patterns that are congruent with viewed emotional facial expressions. Moreover, similar neural substrates seem to be recruited and co-active in both the production and the observation of emotional facial expressions. It has been suggested that the computational mechanisms underlying these vicarious emotional activations could be based on the development of mirror-like neurons that emerge through associative learning mechanisms. In this work we model the development of mirror neurons that encode facial expressions in the infant brain during interaction with either of its parents. We show how temporally correlated imitation of facial expressions in early social interactions could drive the development of mirror neurons in the infant through Hebbian learning. Here we present an overarching, self-organised neural network model that incorporates a visual module composed of a hierarchical model of successive neuronal layers and a motor module that represents the current facial expression of the infant. The simulations show that after training, the output neurons in our network learn to respond selectively to a preferred facial expression (e.g. happy or sad) regardless of whether the infant produces the expression or sees the parent displaying it. More importantly, we explore the development of such neuronal responses across varying degrees of correlation and temporal lags between the seen and produced facial expressions. | Download | |

Measuring Sustainability Reporting using Web Scraping and Natural Language Processing | 006 | DS3-074 | Wednesday | SOZZI | Alessandra | Nowadays the Web is a medium through which corporations can effectively disseminate and demonstrate their efforts to incorporate sustainability practices into their business processes. This led to the idea of using the Web as a data source to measure how UK companies are progressing towards meeting the new sustainability requirements recently stipulated by the United Nations. The project involves developing a web scraping program able to collect sustainability-related web pages from the websites of a sample of the 100 largest UK private companies (ranked by their latest sales) and using Latent Dirichlet Allocation (LDA) to identify common topics in the collected data. | Download | |

Deep Learning for Electricity Price Forecasting | 008 | DS3-083 | Wednesday | LAGO GARCIA | Jesus | In recent years, renewable energy sources have gained a large share of the world’s energy production. While they contribute greatly to building a more sustainable world, they also pose a great challenge to grid stability. In particular, as massive storage of electric energy is economically infeasible, electricity prices are adjusted according to real-time demand and supply. Since the production from renewable sources depends on weather conditions and is generally quite uncertain, electricity prices become unpredictable, the energy market more volatile, and the grid more unstable. One way to prevent this and to safeguard the profitability of renewable sources is to implement smart bids in the spot energy market. In particular, by forecasting energy prices in advance, market players can trade energy to maximize profit, i.e. buyers purchase when prices are low (low demand) and sellers sell when prices are high (high demand), which in turn results in a self-balanced market. | not made available (presenter's request) | |

Bayesian Computation for Semi-continuous Longitudinal Outcomes with Non-ignorable Missing Data | 010 | DS3-092 | Wednesday | JIANG | Depeng | Missing data in behavioral, medical, social and psychological research are often nonignorable, in the sense that the missingness depends on both the observed data and the missing data themselves. This study proposes a Bayesian computational method for handling nonignorable missing data (m-part) with semi-continuous outcomes in a longitudinal study. In the Bayesian approach, we employ the useful strategy of combining data augmentation with MCMC methods. The proposed Bayesian SEM approach was applied to a longitudinal study of workers with work-related musculoskeletal disorders, to show how the new approach can overcome the problems of currently available statistical methods and help identify the distinct trajectories of worker productivity loss and the associated prognostic factors. Nonignorable missing data models have been developed on the basis of both likelihood and Bayesian approaches. The computational advantages of the Bayesian approach over the likelihood method will be discussed. | Download | |

Convolutional Neural Networks for Galaxy Parameter Estimation | 012 | DS3-095 | Wednesday | TUCCILLO | Diego | The characterization of the structure of galaxies as inferred from their photometric brightness profiles is a powerful tool in astronomy. Parametric decompositions of large samples of galaxies at different cosmic ages enable a plethora of studies on galaxy evolution and on the relationship between different components. The era of big data in astronomy is marked by numerous current and future large-area surveys such as EUCLID, the Large Synoptic Survey Telescope (LSST) and the Wide Field Infrared Survey Telescope (WFIRST). Within a few years, these surveys will increase tenfold the volume of data that can be exploited for galaxy morphology studies, offering a unique opportunity to constrain models and infer properties of galaxies. The full potential of these surveys can be unlocked only with the development of automated, fast and reliable software to describe galaxy structures. We present a Convolutional Neural Network that we developed for profile fitting of one- and two-component galaxies. Our code is able to retrieve a complete set of galaxy parameters, such as radius, magnitude, Sérsic index, position angle, ellipticity, and the bulge-to-total ratio (B/T) for the bulge and disk components of the galaxy. Comparison with other profile fitting codes demonstrates that our network is faster while remaining reliable, making it ideal for studies of large data sets. | Download | |
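
For context, the Sérsic index n is the shape parameter of the standard Sérsic surface-brightness profile, I(r) = I_e exp(-b_n [(r / r_e)^(1/n) - 1]). Here r_e is the effective (half-light) radius and I_e is the intensity at r_e. The sketch below evaluates that profile using a common asymptotic series for b_n, which is less accurate at small n. It illustrates what the network's outputs parameterize, not the CNN itself, and the function names are hypothetical.

```python
import math


def sersic_b(n):
    """Asymptotic approximation of b_n, chosen so that r_e encloses half the light."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n ** 2)


def sersic_intensity(r, intensity_e, r_e, n):
    """Sérsic profile I(r) = I_e * exp(-b_n * ((r / r_e) ** (1 / n) - 1))."""
    return intensity_e * math.exp(-sersic_b(n) * ((r / r_e) ** (1.0 / n) - 1.0))
```

With n = 4 this gives a de Vaucouleurs-like bulge profile, and with n = 1 an exponential disk profile. A two-component bulge-plus-disk model would sum two such profiles.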

Online Adaptive Clustering Algorithm for Load Profiling | 014 | DS3-172 | Wednesday | LE RAY | Guillaume | On the one hand, the increasing share of renewable energy sources increases the intermittency of production. On the other hand, consumers' behaviors are becoming more complex and dynamic (e.g. electric vehicles, PVs, heat pumps). At the same time, production must meet demand at all times. Typical load profiles have traditionally been used by the DSO and energy providers to gain insights into demand behaviors. However, owing to a lack of data, they have been used as a static tool. The deployment of advanced metering infrastructure, generating data at a high(er) frequency, changes the paradigm: we move from a scarcity of data to a profusion. In the context of smart grids, cluster-based dynamic load profiles are a suitable tool to provide feedback about load behavior. The method introduced here combines consensus clustering and online adaptive clustering. | Download | |
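
A generic building block for online clustering of load profiles is sequential k-means, where each new profile moves only its nearest centroid. The sketch below is a stand-in under that assumption. The presenter's method additionally uses consensus clustering, which is not shown, and the class name is hypothetical.

```python
import numpy as np


class OnlineKMeans:
    """Sequential (online) k-means: each incoming load profile updates only
    its nearest centroid, using a running-mean step of size 1 / count."""

    def __init__(self, initial_centroids):
        self.centroids = np.array(initial_centroids, dtype=float)
        self.counts = np.zeros(len(self.centroids), dtype=int)

    def partial_fit(self, profile):
        profile = np.asarray(profile, dtype=float)
        distances = np.linalg.norm(self.centroids - profile, axis=1)
        k = int(np.argmin(distances))
        self.counts[k] += 1
        # Running mean: with count 1 the seed is replaced by the first profile.
        self.centroids[k] += (profile - self.centroids[k]) / self.counts[k]
        return k
```

With a running-mean step of 1/count, old profiles never lose influence. Replacing it with a small constant step would let the centroids track drifting consumption patterns, which is closer to the adaptive behavior described above.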

Graph Clustering Performance | 016 | DS3-124 | Wednesday | MIASNIKOF | Pierre | Graph clustering, or network community detection, is a topic that has recently gained much attention. Indeed, many graph clustering/community detection algorithms have appeared in the recent literature. However, performance evaluation of these clustering algorithms remains an open problem. Clustering on graphs can be broadly categorized as an unsupervised learning task, for which we do not benefit from the benchmarks provided by pre-labeled or pre-clustered data sets. To address this lack of performance measurements, many authors test their algorithms on "ground truth" data sets. These data sets, typically drawn from social networks, are instances where individuals, modeled as graph vertices, have identified their community affiliations (clusters). While this reliance on "ground truth" data sets does provide objective, reproducible performance measurements, it does not guarantee that the algorithm will perform similarly well on an unlabeled data set. Arguably, the quality of a clustering returned by a specific algorithm on an unlabeled data set is only assumed to be accurate because the algorithm performed well on another data set. We introduce statistical measurements of clustering performance that can be applied to any unlabeled graph/network data set, with overlapping or non-overlapping clusters. Our suggested measurements allow for the objective comparison of algorithm performance on single data sets and across different data sets. In both cases, they help determine whether the clusterings returned by an algorithm are significantly different from a random partitioning of vertices. Estimating the number of clusters (communities) on a graph is another open problem. In fact, many clustering algorithms require the number of clusters as an input parameter. In such cases, the current practice is to begin with an "educated guess" and iteratively re-apply the clustering algorithm with different inputs until reasonable results are obtained. This iterative process is very time-consuming and may be infeasible for very large data sets. Here, a suitably estimated starting point may be useful. Also, for algorithms that do not require the number of clusters as an input parameter, an estimate of the number of clusters provides an additional benchmark for clustering accuracy. The eigengap heuristic has been suggested as a possible estimate of the number of clusters. Unfortunately, this heuristic relies on the spectral decomposition of Laplacian matrices, a very costly operation. We review one spectral approximation technique from the literature and also propose our own. The former makes use of the Gershgorin theorem and only estimates bounds on eigenvalues, without explicitly computing them. Our own approximation technique is based on random sampling of the adjacency matrix. Initial results suggest that our sampling approach provides a better approximation of the spectra. While our study has only recently begun, we are collaborating with a bank, which has provided us with a very large real-world network data set. We anticipate applicable results shortly. | Download | Pierre Miasnikof^{1*}, Alexander Y. Shestopaloff^{2}, Yuri Lawryshyn^{1}, Anthony J. Bonner^{3}. ^{1}University of Toronto, Dept. of Chemical Engineering and Applied Chemistry, Toronto, ON, Canada. ^{2}The Alan Turing Institute, London, United Kingdom. ^{3}University of Toronto, Dept. of Computer Science, Toronto, ON, Canada. ^{*}Offers thanks to Prof Derek Corneil of the University of Toronto Dept. of Computer Science and Amit Bermanis of the University of Toronto Dept. of Mathematics. Research funded by a MITACS-CIBC Accelerate grant. |
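
Both the exact eigengap heuristic and the Gershgorin bounds can be written in a few lines of NumPy. For the unnormalized Laplacian L = D - A of a graph without self-loops, each Gershgorin disc has center d_i and radius d_i. All eigenvalues therefore lie in [0, 2 d_max], and no decomposition is needed to establish this. The function names are hypothetical, and the presenters' own sampling-based approximation is not shown.

```python
import numpy as np


def laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency


def gershgorin_interval(matrix):
    """Interval containing every eigenvalue, from Gershgorin discs
    (center a_ii, radius sum_{j != i} |a_ij|); no eigendecomposition needed."""
    centers = np.diag(matrix)
    radii = np.abs(matrix).sum(axis=1) - np.abs(centers)
    return float((centers - radii).min()), float((centers + radii).max())


def eigengap_num_clusters(adjacency):
    """Eigengap heuristic: k = argmax_i (lambda_{i+1} - lambda_i) over the
    ascending Laplacian spectrum. This is the costly exact computation."""
    eigenvalues = np.linalg.eigvalsh(laplacian(adjacency))  # ascending order
    gaps = np.diff(eigenvalues)
    return int(np.argmax(gaps)) + 1
```

For two disjoint triangles, the Laplacian spectrum is {0, 0, 3, 3, 3, 3}, so the largest gap follows the second eigenvalue and the heuristic returns 2 clusters. The Gershgorin interval for this graph is [0, 4].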

Toward Emotional Intelligence Machines | 018 | DS3-130 | Wednesday | AL CHANTI | Dawood | Faces convey a wealth of social signals. Although the face is a single object, it conveys many socially important characteristics such as identity, age, sex, expression and lip speech. Such sets of mutually related information are called multi-modal signals. A multi-modal model that jointly reveals information which is otherwise hidden when the different modalities are considered independently can be exploited and used intuitively. To efficiently represent and decompose multi-modal data, we advocate a deep learning approach based on convolutional neural networks, autoencoders, sparse representations and long short-term memory networks, which jointly decomposes the data and extracts salient information for the following applications: spontaneous and dynamic facial expression recognition, identity recognition by extracting the neutral part from expressive images, and age and sex recognition. Moreover, the model has to exploit spatiotemporal information. Our approach will be compared with classical computer vision techniques based on hand-crafted features extracted with spatiotemporal descriptors, for instance 3DSIFT, 3DHOG and GIST, combined with the bag-of-visual-words approach for video representation. | Download | |

Embedded Bandits for Large-Scale Black-Box Optimization | 020 | DS3-145 | Wednesday | AL-DUJAILI | Abdullah | Random embedding has been applied with empirical success to large-scale black-box optimization problems with low effective dimensions. This work proposes the EmbeddedHunter algorithm, which incorporates the technique in a hierarchical stochastic bandit setting, following the optimism in the face of uncertainty principle and breaking away from the multiple-run framework in which random embedding has conventionally been applied, similarly to stochastic black-box optimization solvers. Our proposition is motivated by the bounded mean variation in the objective value for a low-dimensional point projected randomly into the decision space of Lipschitz-continuous problems. In essence, the EmbeddedHunter algorithm optimistically expands a partitioning tree over a low-dimensional search space (whose dimension equals the effective dimension of the problem), based on a bounded number of random embeddings of points sampled from the low-dimensional space. In contrast to the probabilistic theoretical guarantees of multiple-run random-embedding algorithms, the finite-time analysis of the proposed algorithm yields a theoretical upper bound on the regret as a function of the algorithm's number of iterations. Furthermore, numerical experiments were conducted to validate its performance. The results show a clear performance gain over recently proposed random embedding methods for large-scale problems, provided the intrinsic dimensionality is low. | Download | |
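
The random-embedding idea maps a low-dimensional point y to the full decision space through a random Gaussian matrix A, so that x = A y, clipped to the search box. The sketch below pairs that mapping with plain random search as a stand-in optimizer. EmbeddedHunter instead expands an optimistic partitioning tree over the low-dimensional space, which is not reproduced here. The box [-1, 1]^D, the sampling radius and all function names are assumptions.

```python
import numpy as np


def random_embedding(high_dim, low_dim, seed=0):
    """Random Gaussian matrix A mapping R^low_dim into R^high_dim."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((high_dim, low_dim))


def embed(objective, A, lower=-1.0, upper=1.0):
    """Objective seen from the low-dimensional space: y -> f(clip(A @ y))."""
    return lambda y: objective(np.clip(A @ y, lower, upper))


def random_search(g, low_dim, n_samples, radius, seed=0):
    """Stand-in optimizer: uniform sampling in [-radius, radius]^low_dim,
    starting from the origin of the low-dimensional space."""
    rng = np.random.default_rng(seed)
    best_y = np.zeros(low_dim)
    best_val = g(best_y)
    for _ in range(n_samples):
        y = rng.uniform(-radius, radius, size=low_dim)
        val = g(y)
        if val < best_val:
            best_y, best_val = y, val
    return best_y, best_val
```

For a 50-dimensional objective that depends on only two coordinates, searching over y in R^2 explores a two-dimensional slice of the full space. This is the low-effective-dimension setting the abstract targets.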

Better Boosting with Bandits | 022 | DS3-151 | Wednesday | NIKOLAOU | Nikolaos | Probability estimates generated by boosting ensembles are poorly calibrated. The very property that makes AdaBoost a successful classifier, namely margin maximization, is also responsible for its poor performance as a probability estimator, as it forces the ensemble to produce probability estimates that tend towards 0 or 1. Therefore, the outputs of the ensemble need to be properly calibrated before they can be used as probability estimates. In batch learning, calibration is achieved by reserving part of the training data for training the calibrator function. In an online setting, a decision needs to be made on each round: should the new example be used to update the parameters of the ensemble or those of the calibrator? In this work we make this decision with the aid of bandit optimization algorithms. We demonstrate superior performance over uncalibrated, naively calibrated and cost-sensitive online boosting ensembles in probability estimation and cost-sensitive classification tasks. | Download | |
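
The per-round choice between updating the ensemble and updating the calibrator can be framed as a two-armed bandit. The abstract does not specify the bandit algorithm or the reward, so the sketch below assumes UCB1 and leaves the reward to the caller (for example, an improvement in log-loss on the incoming example). Arm 0 stands for "update ensemble" and arm 1 for "update calibrator".

```python
import math


class UCB1:
    """UCB1 over a fixed set of arms; here arm 0 = update ensemble,
    arm 1 = update calibrator (an illustrative assignment)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.total = 0

    def _upper_bound(self, arm):
        # Empirical mean plus exploration bonus sqrt(2 ln t / n_arm).
        bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[arm])
        return self.means[arm] + bonus

    def select(self):
        # Play every arm once before relying on confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        return max(range(len(self.counts)), key=self._upper_bound)

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.total += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

On each round, one would call `select()`, route the new example to the chosen model, measure the reward and pass it to `update()`. Because UCB1 assumes stationary rewards, a sliding-window or discounted variant may suit a setting where the value of calibrator updates changes over time.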

Random Subspace with Trees for Feature Selection Under Memory Constraints | 024 | DS3-062 | Wednesday | SUTERA | Antonio | Dealing with datasets of very high dimension is a major challenge in machine learning. In our work, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing variables already identified as relevant by previous models with variables randomly selected among the others. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach. | Download | |

Forward Event-Chain Monte Carlo -- Fast sampling by randomness control in irreversible Markov chains | 026 | DS3-167 | Wednesday | MICHEL | Manon | Bayesian inference of complex statistical models offers clear advantages, such as the possibility to separate the modelling assumptions from the inference process and to take uncertainty into account. Markov Chain Monte Carlo (MCMC) methods are powerful techniques for computing Bayesian estimates. However, these simulation schemes are often challenged by the multimodal and high-dimensional nature of the target distribution, resulting in poor exploration of the latter. Alternatives such as Hamiltonian Monte Carlo provide a more efficient framework but are hampered by their still-reversible scheme and by the tuning of several parameters. Building on insightful irreversible Monte Carlo schemes developed in physics, we propose an original irreversible MCMC method: the Forward Event-Chain Monte Carlo. This method is rejection-free and parameter-tuning-free, and it provides a continuum of valid samples. Moreover, numerical experiments demonstrate the efficiency of the proposed approach, with speed-ups of up to several orders of magnitude compared to state-of-the-art methods. | Download | jointly presented with Stephane SENECAL (DS3-169) |

Zap Q-learning | 028 | DS3-642 | Wednesday | DEVRAJ | Adithya | The Zap Q-learning algorithm improves on Watkins' original algorithm and on recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that its transient behavior closely matches a deterministic Newton-Raphson implementation. This is made possible by a two-time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even in non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases. | Download | |

MIIC online: a web server to reconstruct causal or non-causal networks from non-perturbative data | 030 | DS3-189 | Wednesday | SELLA | Nadir | We present a web server running the MIIC algorithm, a network learning method combining constraint-based and information-theoretic frameworks to reconstruct causal, non-causal or mixed networks from non-perturbative data, without the need for an a priori choice of the class of reconstructed network. Starting from a fully connected network, the algorithm first removes dispensable edges by iteratively subtracting the most significant information contributions from indirect paths between each pair of variables. The remaining edges are then filtered based on their confidence assessment or oriented based on the signature of causality in observational data. The MIIC online server can be used for a broad range of biological data, from single-cell gene expression data to protein sequence evolution, including data with possible unobserved (latent) variables, and it outperforms or matches state-of-the-art methods for either causal or non-causal network reconstruction. | Download | |

Inferring causal relationships in stochastic chemical reaction networks from single-cell snapshot time series | 032 | DS3-196 | Wednesday | KLIMOVSKAIA | Anna | Virtually all biological processes are driven by biochemical reactions. However, their mechanistic description in terms of stochastic chemical reaction networks is often precluded by the computational difficulty of structure learning, i.e. the identification of biologically active reaction networks among the combinatorially many possible topologies. We recently reported the reactionet lasso, a regression-based gradient matching approach that is capable of partial structure learning in biochemical networks. We have assessed the structure learning capabilities of the reactionet lasso on synthetic data for systems of different sizes and complexity. For our study we assumed that all or most of the relevant molecular components can be measured. However, this approach cannot be readily applied in situations where a large proportion of the system (latent species) cannot be observed in practice. Ordinary or stochastic differential equations, which formalize chemical dynamics, can encode the parametric and structural form of causal interactions between the components. In many applications, knowing the mechanistic structure is not essential and the causal structure would suffice. Therefore, we want to investigate how various causal inference techniques might be used for learning causal interactions from snapshot data generated by the dynamical systems. | Download | Anna Klimovskaia^{1,2,3}, Stefan Ganscha^{1,2,3}, Manfred Claassen^{1,2}. ^{1}Institute for Molecular Systems Biology, ETH Zurich, Zurich, Switzerland. ^{2}Swiss Institute of Bioinformatics, Zurich, Switzerland. ^{3}Life Science Zurich Graduate School, Zurich, Switzerland. |

Machine Learning in Astronomy | 034 | DS3-210 | Wednesday | ASCASO | Begoña | Many of the problems faced in astronomy are not too different from those found in data science. Astronomers often need to classify galaxies into morphological types, predict the distance or composition of a galaxy from indirect data, detect emergent structures, etc. I will present a summary of some of the machine learning and Bayesian statistics techniques developed to exploit astronomical data and solve these problems. | Download | |

Motion Compensation of Free-Breathing Myocardial Perfusion Data using RPCA | 036 | DS3-214 | Wednesday | SCANNELL | Cian | Myocardial perfusion MRI has shown great potential for the diagnosis of coronary artery disease. Quantitative analysis of the data is desirable to reduce the time needed for a diagnosis and to make the process more accurate, reproducible and user-independent. Pre-processing the data for quantification has long been a bottleneck in the clinical adoption of the method. This work introduces an algorithm for motion correction using robust PCA and manifold learning. Automated anatomy detection is then explored using techniques such as augmented k-means clustering and support vector machines. | Download | |

Predictive models for chronic care management | 038 | DS3-229 | Wednesday | AMADOU BOUBACAR | Habiboulaye | Chronic diseases are leading causes of diminished quality of life, rising hospital costs, and mortality. Recent advances in machine learning have enabled substantial progress, with attractive results in many domains. We describe a general framework for implementing predictive models using various healthcare data, including socio-demographics, tele-monitoring records (vital signs, symptom self-assessment) and hospital data (medical events, lab tests, ...). Despite scientific challenges such as missing data and rare events, our predictive approach shows promising results for identifying high-risk patients upstream, detecting deteriorations early, and preventing costly hospitalizations. As future work, we plan to address the lack of clarity about the causal factors affecting chronic diseases. | ||

Random Forest for Regression of a Censored Variable | 040 | DS3-248 | Wednesday | LE FAOU | Yohann | In the insurance broker market, commissions received by brokers are closely related to the surrender of insurance contracts. To optimize a commercial process, a scoring of prospects should therefore take this surrender component into account. We propose a weighted Random Forest model to predict the surrender factor, which is part of the scoring. Our model handles censoring of the observations, a classical issue when working on surrender mechanisms. Through careful studies of real and simulated data, we compare our approach with other standard methods that apply in our setting. We show that our approach is very competitive in terms of quadratic error for this problem. | Download | |

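The abstract does not say how the forest is weighted. One common way to let a regression model handle right-censored targets is inverse probability of censoring weighting (IPCW): each fully observed target gets weight 1/G(T_i-), where G is the Kaplan-Meier estimate of the censoring survival function, and censored targets get weight 0. The sketch below computes such weights in pure Python under that assumption; it is not the authors' model. The resulting weights could then be passed to a forest implementation, for instance through the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`.

```python
from bisect import bisect_left
from typing import Callable, List


def censoring_survival_left(times: List[float], observed: List[bool]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring as the event.

    observed[i] is True when times[i] is a fully observed target value and
    False when the target is right-censored at times[i].
    """
    censor_times = sorted({t for t, obs in zip(times, observed) if not obs})
    values = []
    surv = 1.0
    for u in censor_times:
        at_risk = sum(1 for t in times if t >= u)
        n_censored = sum(1 for t, obs in zip(times, observed) if t == u and not obs)
        surv *= 1.0 - n_censored / at_risk
        values.append(surv)

    def g_left(t: float) -> float:
        # k = number of censoring times strictly before t
        k = bisect_left(censor_times, t)
        return 1.0 if k == 0 else values[k - 1]

    return g_left


def ipcw_weights(times: List[float], observed: List[bool]) -> List[float]:
    """Weight 1 / G(T_i-) for observed targets and 0 for censored ones."""
    g_left = censoring_survival_left(times, observed)
    return [1.0 / g_left(t) if obs else 0.0 for t, obs in zip(times, observed)]
```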
Simulating Individual Differences in Reading Acquisition using Convolutional Neural Networks | 042 | DS3-255 | Wednesday | WOLF | Henry | Simulating reading with dyslexia for personalized intervention. Computational models of human language processing have been important tools in developing hypotheses about the cognitive processes that underlie language use. Models of reading have been used to hypothesize about the nature of language processing in the brain. Simulations based upon these models have shown some success in mimicking the cognitive effects displayed by human participants in behavioral studies (e.g. Plaut, McClelland, Seidenberg, & Patterson, 1996; Harm & Seidenberg, 2004). Due to the use of hand-coded representations, these models were largely limited to modeling one language, generally English. However, learning to read and the related effects differ between languages (Hino, Kusunose, Lupker, & Jared, 2013). In this project, we use convolutional neural networks to generate representations from images of text in multiple writing systems. Hyperparameters are varied to mimic both typical reading and reading with dyslexia. Targeted language interventions can be tested on models of readers with dyslexia before they are used with human children. Additionally, these simulations provide insights into how the brain is influenced by the writing systems themselves. | Download | |

Learning Macromanagement in StarCraft from Replays using Deep Learning | 044 | DS3-279 | Wednesday | JUSTESEN | Niels | The real-time strategy game StarCraft has proven to be a challenging environment for artificial intelligence techniques, and as a result, current state-of-the-art solutions consist of numerous hand-crafted modules. This poster shows how macromanagement decisions in StarCraft can be learned directly from game replays using deep learning. Neural networks were trained on 789,571 state-action pairs extracted from 2,005 replays of highly skilled players, achieving top-1 and top-3 error rates of 54.6% and 22.9%, respectively, in predicting the next build action. When the trained network is integrated into UAlbertaBot, an open-source StarCraft bot, the system significantly outperforms the game's built-in Terran bot and plays competitively against UAlbertaBot with a fixed rush strategy. To our knowledge, this is the first time macromanagement tasks have been learned directly from replays in StarCraft. While the best hand-crafted strategies are still the state of the art, the deep network approach is able to express a wide range of different strategies, so further improving the network's performance with deep reinforcement learning is a promising avenue for future research. | Download | |

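Top-k error counts a prediction as wrong only when the true next build action is outside the k highest-ranked actions, which is why the top-3 error reported above is much lower than the top-1 error. A minimal sketch of the metric follows; the function name and input layout (one score list per sample) are illustrative assumptions, not the poster's evaluation code.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class is not among the k highest-scoring classes."""
    misses = 0
    for row, y in zip(scores, targets):
        # indices of the k largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if y not in top_k:
            misses += 1
    return misses / len(targets)
```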
Generating Realistic Electricity Networks using Machine Learning | 046 | DS3-282 | Wednesday | DELORO | Yonatan | The goal of my internship at EDF R&D is to develop a prototype able to generate a wide range of realistic electricity networks (topology, characteristics of cables and loads), either satisfying desired input parameters (size, type of loads, ...) or matching structural properties of given network samples. The poster will: - discuss the challenges involved in generating information-rich and highly constrained low-voltage networks; - present some state-of-the-art methods to generate synthetic graph topologies given sample data and/or to predict their node attributes. | not made available (presenter's request) | |

The Mutual Autoencoder: Controlling Information in Latent Code Representations | 048 | DS3-293 | Wednesday | BUI THI MAI | Phuong | Variational Autoencoders (VAEs) learn probabilistic latent variable models by optimizing a bound on the marginal likelihood of the observed data. Beyond providing a good density model, a VAE assigns a latent code to each data instance. In many applications, this latent code provides a useful high-level summary of the observation. However, the usefulness of the code is not enforced by the VAE objective. Instead, it emerges as a side effect and depends on modelling choices such as decoder expressivity, latent dimension, etc. In particular, the VAE may fail to learn a useful representation when the decoder family is very expressive: such decoders effectively make the latent structure unnecessary for achieving high log-likelihood values, so the VAE learns to ignore it. We propose a method for explicitly controlling the amount of information stored in the latent code. We show that our method can learn models with latent codes ranging from independent to nearly deterministic, and that it is robust to the choice of decoder and latent dimension. | Download | |

Shape Prior Generation using GAN for 3D-US Kidney Segmentation | 050 | DS3-295 | Wednesday | BERTRAND | Hadrien | Using knowledge of an object's shape is a common way to constrain and guide the segmentation process. It is particularly widespread in medical imaging, as the shape of organs is simple and well understood. We propose the construction of a shape prior for kidneys using a Generative Adversarial Network (GAN). The segmentation task is split into two parts: first, a network transforms the image into the latent representation expected by the generator network; then, the generator takes this representation and constructs the segmentation. The two steps are performed separately. We present preliminary results for the generator. | Download | |

Data Fusion for Inertial-Centric Indoor Localisation | 052 | DS3-312 | Wednesday | KOZLOWSKI | Michal | This poster outlines how ADL and RSS signatures from sparsely labelled data help infer the position of a person inside a residential house. The context of the activity is taken into account when estimating gait and pose, whilst the RSS information helps to localise the individual. Fusing the two gives a picture of the person's current location and activity at any given time. All of this is achieved with optimisation methods applied to a poorly labelled data set. The poster first concentrates on the data collection campaign, then outlines the methods used and the results obtained. | Download | |

Direct Load Control of Thermostatically Controlled Loads Based on Sparse Observations Using Deep Reinforcement Learning | 054 | DS3-328 | Wednesday | RUELENS | Frederik | A demand response agent must find a near-optimal sequence of decisions based on limited and imperfect sensor measurements of its environment. Extracting a relevant set of features from these raw sensor measurements is a challenging task and may require substantial domain knowledge. One way to tackle this problem is to store sequences of sensor measurements in the state vector, making it high-dimensional, and to apply techniques from deep learning. This work investigates how a Long Short-Term Memory (LSTM) network, a type of recurrent neural network, can be used to mitigate the curse of partial observability and thus capture the long-term temporal dependencies in the state vector. Our simulations demonstrate that an LSTM network can be successfully used as a function approximator within a batch reinforcement learning algorithm to find a near-optimal control policy. | Download | |

Dynamic Analysis of Investor’s Community Sentiment: A Hawkes-Process Framework | 056 | DS3-341 | Wednesday | LE NY | Yoann | We represent the membership of agents in a financial community as a self-exciting Hawkes process. Contrary to other studies that consider the financial community on social media as a static entity, our dynamic approach reduces the size of the community at each time period. We then extract the sentiment from this community. We show that this approach helps reduce the noise of the extracted sentiment signal and enhances its predictive power over financial market movements. JEL Classifications: G55; G14 Keywords: Sentiment, Hawkes-Process, Temporal-Graph, Twitter | Download | |

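In a self-exciting Hawkes process, every past event (here, for example, an agent's activity in the community) temporarily raises the rate of future events, and that boost decays over time. The sketch below assumes an exponential kernel, which the abstract does not specify: baseline rate mu, jump size alpha and decay rate beta, with alpha / beta < 1 needed for a stationary process. It also gives the closed-form log-likelihood that could be maximized to fit the parameters; names are illustrative.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, events: Sequence[float], mu: float, alpha: float, beta: float) -> float:
    """lambda(t) = mu + alpha * sum over t_i < t of exp(-beta * (t - t_i))."""
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events if ti < t)


def hawkes_log_likelihood(events: Sequence[float], horizon: float, mu: float, alpha: float, beta: float) -> float:
    """Log-likelihood of event times in [0, horizon]:
    sum_i log lambda(t_i) minus the integral of lambda over [0, horizon]."""
    log_terms = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events)
    # The exponential kernel integrates in closed form for each past event.
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - ti)) for ti in events
    )
    return log_terms - compensator
```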
Social-Network Analysis for Pain Medications: Influential physicians may not be high-volume prescribers | 058 | DS3-347 | Wednesday | CHOUDHURY | Abhinav | According to the Institute of Medicine of the National Academies, more than 100 million Americans suffer from chronic pain, more than the number affected by diabetes, heart disease, and cancer combined. Adoption of pain medications and safe healthcare practices is a major global policy concern. This adoption process is highly influenced by the interpersonal network of physicians prescribing medications to treat pain. However, existing research into physician networks has been hospital-specific, applied to small numbers of physicians, and dependent upon physicians’ self-reports. In this work, using big data and data mining, we overcome these limitations: using data from 30+ hospitals spanning 2000+ physicians, we create a social network containing physicians’ prescription data and adoption behavior for pain medications. The social network assumes that connected physicians work in the same hospital and belong to the same specialty or specialty group. Then, using two centrality measures, degree and eigenvector centrality, we analyze prescription volumes and the proportion of adopters of pain medications. We also analyze gender effects. Results revealed that the most influential physicians were not the high-volume prescribers. Male physicians were more influential than female physicians; however, female physicians prescribed higher volumes than male physicians. Our results help identify critical physicians from certain core specialties and specialty groups who may be approached by patients seeking pain relief. | Download | |

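The two centrality measures named above can be sketched as follows, under the stated rule that physicians are linked when they share a hospital and a specialty. The record format `(physician_id, hospital, specialty)` is a hypothetical assumption. Degree centrality counts direct colleagues; eigenvector centrality rewards being connected to well-connected physicians. Neither uses prescription volume, so influence and volume can be compared independently.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_physician_graph(records: Iterable[Tuple[Hashable, str, str]]) -> Graph:
    """Link physicians who share both a hospital and a specialty (hypothetical input format)."""
    adj: Graph = {}
    groups = defaultdict(list)
    for pid, hospital, specialty in records:
        adj.setdefault(pid, set())
        groups[(hospital, specialty)].append(pid)
    for members in groups.values():
        for a, b in combinations(members, 2):
            if a != b:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def degree_centrality(adj: Graph) -> Dict[Hashable, int]:
    return {v: len(nbrs) for v, nbrs in adj.items()}


def eigenvector_centrality(adj: Graph, iters: int = 1000, tol: float = 1e-12) -> Dict[Hashable, float]:
    """Power iteration on A + I, normalised to unit Euclidean length.

    Adding the identity keeps the leading eigenvector of A but prevents the
    oscillation plain power iteration shows on bipartite graphs (e.g. stars).
    Assumes a connected, non-empty graph.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x
```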
Neural networks for computing power flow in high voltage transmission lines | 060 | DS3-353 | Wednesday | DONNOT | Benjamin | Power flow computations (also called “load-flow”) are widely used by TSOs (Transmission System Operators) in charge of managing high-voltage and very-high-voltage power grids. One of the critical constraints of TSOs is to maintain the security of power grid equipment; for instance, lines must not overheat. Current tools to simulate the effect of incidents (such as a tree falling on a line) on the power grid include load-flow simulators, which evaluate the steady state of the grid for given production and consumption levels. The state of the grid estimated by load-flow calculations includes voltage magnitudes, reactive power values, currents flowing on lines, etc. Currently, for a grid the size of France's, 10,000 load-flows are computed every 5 minutes, which is already very computationally demanding. This number will increase by several orders of magnitude in the years to come, to accommodate new power planning tools that increase network capacity without building new lines and to integrate renewable energies. We propose a new method, based on machine learning, to compute load-flows efficiently, substituting conventional simulators based on differential equation solvers. Our system comprises deep feed-forward neural networks trained with load-flows precomputed by simulation. Our architecture allows us to solve the so-called (n-1) problem (in which load-flows are evaluated for every possible line disconnection) using a technique bearing similarity with “dropout”, which we named “guided dropout”. We achieve a 300x speedup (using state-of-the-art GPUs) over the proprietary load-flow simulator of RTE (Réseau de Transport d’Électricité, the sole French TSO) in preliminary experiments on simulated power grids with up to 120 substations, with a relative average absolute error of less than 0.01.
The speedup increases with the size of the grid, and we are currently working on scaling up our computational tools to handle the full French power grid. | ||

Non parametric multi-task relative attribute learning | 062 | DS3-378 | Wednesday | ALAMI MEJJATI | Youssef | “I want to see jackets which I think are stylish, but not too fancy.” Two very common ways to explore large collections of images, for instance in online shopping, are to browse a hierarchy of items and to search with textual keywords. The returned results are typically ordered by popularity. However, popularity is defined across all users as one homogeneous attribute. Users cannot sort by their own subjective criteria, e.g. by their own personal style for clothes. Furthermore, there is no way to place items on a continuous scale on which the degree to which each item meets a criterion is known, e.g. how ‘stylish’ a particular piece of clothing is to a user. Simply put, there is no easy way for users to explore imagery by their own subjective scales. Our project aims to develop new techniques that enable users to organize and explore imagery data based on their own subjective criteria at a high semantic level. The crux of the problem is to understand how users could communicate their own criteria without having to know how those criteria might be formed or described at the data level. We aim to turn this knowledge into a new machine learning algorithm and criteria-definition interface, which will help users personally organize data in order to explore it easily. The success of this application depends, among other things, on the user experience it offers. In other words, the application won’t be ‘attractive’ if a user has to spend hours providing preference labels for the algorithm to adapt its parameters. The key is hence for the algorithm to adapt with a fairly small amount of labelled data from the user. This is one of the reasons why we feel that multi-task learning (MTL) is well suited to this problem. Indeed, MTL takes advantage of task relatedness to perform well even with small amounts of labelled data for each task.
The other reason is that MTL works well when the tasks share the same feature space or live in a shared subspace. This is the case in our setting, where each user's preference is treated as a task. Here it is easy to see how MTL could benefit from task relatedness, e.g. users may share the same notion of ‘stylishness’ across tasks. This MTL formulation is, however, problematic, since existing methods exploit the dependency between different tasks by enforcing ‘similarities’ among the ‘parameters’ of the corresponding predictors. This is not applicable in our situation, since we do not have access to the parametric form of those predictors (the users, in our case); we only know their predictions. One of our main motivations is hence to exploit the dependencies between tasks by formulating a non-parametric MTL instantiated through (unlabelled) data. Our formulation could be seen as an instance of active semi-supervised learning, since we use labelled preferences given by users but, on the other hand, only use predictors with unknown parameters to tune a new personalized predictor. The impact of this project is potentially very large: it could change the way in which people organize imagery. Knowledge gained through this project will not only improve fundamental understanding in machine learning and human-computer interaction (HCI), but will also cultivate future research in their combination: human-data interaction. | Download | |

Knowledge Base Representation Learning -- Baselines and Challenges | 064 | DS3-395 | Wednesday | LACROIX | Timothée | A knowledge base is a set of facts (subject, relation, object) about the world. Representing every entity of such a dataset in a low-dimensional vector space would yield entity embeddings that would be a convenient store of "common knowledge" about the world. To evaluate such an embedding model, a common task is to predict links missing from the original training set. For example, given the triples (Dave, brother, Simon) and (Simon, father, Sarah), one could infer the triple (Dave, uncle, Sarah). Recently, simple models have yielded good results for link prediction in subsets of knowledge bases. A theoretical understanding of the representational power of these models, their limits and inner workings is still lacking. Much work has focused on the structural properties of the models used to represent these knowledge bases. We show that a properly regularized canonical low-rank decomposition achieves state-of-the-art results. | Download | |

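In a canonical (CP) low-rank decomposition, each entity has one vector for its subject role and another for its object role, each relation has its own vector, and a triple is scored by the trilinear product of the three. Link prediction then ranks candidate objects for a query such as (Dave, uncle, ?). The sketch below uses toy, hypothetical embeddings and includes a cubic penalty as one example of a regularizer; the abstract does not state which regularizer the authors use.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cp_score(subject: Vector, relation: Vector, obj: Vector) -> float:
    """Trilinear CP score: sum over k of subject[k] * relation[k] * obj[k]."""
    return sum(s * r * o for s, r, o in zip(subject, relation, obj))


def rank_objects(subject: Vector, relation: Vector, object_embeddings: Dict[str, Vector]) -> List[str]:
    """Answer a query (s, r, ?) by sorting candidate objects by descending score."""
    return sorted(
        object_embeddings,
        key=lambda name: cp_score(subject, relation, object_embeddings[name]),
        reverse=True,
    )


def n3_penalty(factors: Sequence[Vector], weight: float) -> float:
    """Weighted sum of cubed absolute entries of the factors used in a batch."""
    return weight * sum(abs(x) ** 3 for vec in factors for x in vec)
```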
Scalable Model-based Cascaded Imputation of Missing Data | 066 | DS3-413 | Wednesday | MONTIEL | Jacob | Missing data is a common trait of real-world data that can negatively impact interpretability. We present CIM, an effective and scalable technique for automatic imputation of missing data. CIM places few restrictions on the input data, supporting MAR and MCAR mechanisms, numerical and nominal data, and large data sets, including high-dimensional ones. We compare CIM against well-established imputation techniques over a variety of data sets under multiple test configurations to measure the impact of imputation on the classification problem. | Download | |

Monte-Carlo like security analysis of power systems | 068 | DS3-597 | Wednesday | CREMER | Jochen | The increased penetration of renewable generation in the electricity grid raises new challenges, while at the same time the digitalization of the grid (e.g., advanced measurement devices such as Phasor Measurement Units (PMU) and smart meters) unlocks an enormous potential for better grid operation and planning. For the safe and reliable operation of the transmission grid, the uncertain generation from renewables is critical and raises one of the major challenges faced by the Transmission System Operator (TSO), who is in charge of ensuring reliability and safety. More specifically, when the future state is unknown and highly uncertain, the security of many possible scenarios must be assessed, resulting in high computational requirements. To assess a substantial number of scenarios much faster, classifiers are learned from historical data. In addition, Monte-Carlo sampling is used to learn from simulation data. This is necessary since the most critical situations might not have occurred yet and must therefore be generated synthetically. The poster reviews common approaches and ongoing ideas addressing questions such as how to efficiently generate a database for classifiers that evaluate the power system state, and how to develop an effective classifier. | Download | |

Using Reinforcement Learning for Demand Response of Domestic Hot Water Buffers: a Real-Life Demonstration | 070 | DS3-434 | Wednesday | DE SOMER | Oscar | This poster demonstrates a data-driven control approach for demand response in real-life residential buildings. The objective was to optimally schedule the heating cycles of the Domestic Hot Water (DHW) buffer to maximize self-consumption of the local photovoltaic (PV) production. A model-based reinforcement learning technique was used to tackle the underlying sequential decision-making problem. The proposed algorithm learns the stochastic occupant behavior, predicts the PV production and takes into account the dynamics of the system. A real-life experiment with six residential buildings was performed using this algorithm. The results show that self-consumption of the PV production is significantly increased compared to the default thermostat control. | ||

Learning to Generate Sub-problems in Mixed Integer Programming | 072 | DS3-439 | Wednesday | MOSSINA | Luca | This research addresses the resolution of recurrent combinatorial optimization problems, coupling machine learning techniques with branch & bound algorithms and operating under a limited time budget. Assuming such recurrent problems are realizations of an unknown generative process, the results of previous resolutions are collected and used to train a classification model. When a new instance is solved, this model first selects a subset of decision variables to be set heuristically to some reference values, which then become fixed parameters. The remaining variables are left free and form a smaller sub-problem whose solution, while only an approximation of the optimal solution, can be obtained significantly faster. Subsequently, if some of the allocated time remains, an iterative process of blocking/unblocking variables takes place, allowing other areas of the solution space to be explored. This approach is of particular interest for problems where random perturbations of the instance parameters can occur unexpectedly, requiring rapid re-optimization of a complex model. | Download | |

Graph sketching-based Massive Data Clustering | 074 | DS3-447 | Wednesday | MORVAN | Anne | In this work, we address the problem of recovering arbitrarily shaped data clusters from massive datasets. We present DBMSTClu, a new density-based non-parametric method working on a limited number of linear measurements, i.e., a sketched version of the similarity graph G between the N objects to cluster. Unlike k-means, k-medians or k-medoids algorithms, it does not fail at distinguishing clusters with particular structures. Unlike DBSCAN or spectral clustering, it requires no input parameter. As a graph-based technique, DBMSTClu relies on the similarity graph G, which theoretically costs O(N^2) memory. However, our algorithm follows the dynamic semi-streaming model, handling G as a stream of edge weight updates and sketching it in one pass over the data into a compact structure requiring O(N polylog(N)) space. Thanks to the ability of the Minimum Spanning Tree (MST) to express the underlying structure of a graph, our algorithm successfully detects the right number of non-convex clusters by recovering an approximate MST from the graph sketch of G. We provide theoretical guarantees on the quality of the clustering partition and demonstrate its advantage over the existing state of the art on several datasets. | Download | |

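The role the MST plays in DBMSTClu can be illustrated with classic MST-based clustering: build the minimum spanning tree of a distance-weighted graph (here with Kruskal's algorithm) and cut its heaviest edges. This is a simplified, swapped-in technique, not DBMSTClu: it works on the exact graph rather than a sketch, uses distances rather than similarities, and needs the number of clusters k as input, whereas DBMSTClu chooses its cuts without such a parameter.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (distance, u, v)


class DisjointSet:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def minimum_spanning_tree(n: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm: scan edges by increasing distance, keep those joining two components."""
    ds = DisjointSet(n)
    return [e for e in sorted(edges) if ds.union(e[1], e[2])]


def mst_clusters(n: int, edges: List[Edge], k: int) -> List[int]:
    """Remove the k-1 heaviest MST edges and label each node by its remaining component."""
    tree = sorted(minimum_spanning_tree(n, edges))
    if not 1 <= k <= len(tree) + 1:
        raise ValueError("k must lie between 1 and the number of MST edges + 1")
    ds = DisjointSet(n)
    for _, u, v in tree[: len(tree) - (k - 1)]:
        ds.union(u, v)
    labels = {}
    return [labels.setdefault(ds.find(i), len(labels)) for i in range(n)]
```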
Change-point detection in human behaviour with application to psychiatry | 076 | DS3-450 | Wednesday | MORENO MUNOZ | Pablo | Psychiatric patients with disorders such as schizophrenia or depression may experience abrupt transitions in their behaviour. The appearance of such changes shows the need for ambulatory assessment in the presence of mental crises. We treat this as a change-point detection problem. Our data consist of location traces (latitude-longitude data points), physical activity metrics (number of steps, distance walked) and communication records (messages sent, number of calls). All the information is structured as multidimensional time series spanning one year. We explore Bayesian online models for change-point detection, which allow a personalised characterisation of patient behaviour and real-time monitoring. Due to the complexity of the data and the need to accumulate sufficient evidence for reliable detections, we include latent variable models to reduce dimensionality and promote the appearance of change-points. Results provide new insights into the detection of anomalous behaviour in mental health patients, as well as into the accurate prediction of their states. | Download | |

Knowledge Transfer From Text Data for Improved Unsupervised Word Segmentation | 078 | DS3-462 | Wednesday | BÖNNINGHOFF | Benedikt | Natural language, spoken or written, makes it possible to share knowledge and exchange information between communication partners. For humans, breaking a spoken sentence down into semantic units in order to follow a partner's thoughts is a simple task. But building a technical system that automatically transforms a continuous acoustic signal into a discrete sequence of words is challenging. This work deals with the problem of finding linguistic structures in raw audio signals without using any linguistic expertise a priori. We propose a system consisting of three successive stages. First, the acoustic unit discovery (AUD) module, based on a Dirichlet process mixture model, clusters raw audio signals into phoneme-like categories. Second, an acoustic unit-to-letter (A2L) converter maps acoustic units onto letters, providing a stochastic evaluation. In the third stage, the word discovery (WD) module, based on a nested hierarchical Pitman-Yor process, performs an iterative procedure alternating between word segmentation and language model training. While the AUD system and the WD module are fully unsupervised, training the A2L converter requires labeled data. To keep the a priori knowledge small, we train the model utterance-wise without any information about word boundaries. In addition, we optionally use unrelated word-based text data to initialize the language model of the WD component. The evaluation is performed on the Wall Street Journal corpus and on a dataset of Xitsonga, a language spoken largely in the Limpopo province of the Republic of South Africa.
Simulation results without initialization of the language model show that incorporating the A2L conversion significantly improves word segmentation compared with applying the acoustic units directly to the WD module. Experiments with language model initialization further show that a small amount of unrelated text data considerably improves WD performance. | Download | |

Regularizing Non-Linear Models Using Feature Side-Information | 080 | DS3-479 | Wednesday | MAOLAAISHA | Aminanmu | Features very often come with their own vectorial descriptions, which provide detailed information about their properties. We refer to these vectorial descriptions as feature side-information. In the standard learning scenario, the input is represented as a vector of features, and the feature side-information is most often ignored or used only for feature selection prior to model fitting. We believe that feature side-information, which carries information about the features' intrinsic properties, will help improve model prediction if used properly during the learning process. In this work, we propose a framework that incorporates feature side-information during the learning of very general model families to improve prediction performance. We control the structure of the learned models so that they reflect feature similarities as defined by the side-information. Experiments on a number of benchmark datasets show significant gains in predictive performance over a number of baselines as a result of exploiting the side-information. | Download | |

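A common way to make a model reflect feature similarities is a graph-Laplacian penalty: derive a similarity S_ij between features i and j from their side-information vectors, then penalize sum_ij S_ij (w_i - w_j)^2, which equals 2 w^T L w with L = D - S for symmetric S. The sketch below applies this to a linear weight vector with a Gaussian-kernel similarity; both choices are assumptions for illustration, since the abstract targets non-linear model families and does not specify the similarity or the penalty.

```python
import math
from typing import List, Sequence


def similarity_matrix(side_info: Sequence[Sequence[float]], gamma: float = 1.0) -> List[List[float]]:
    """Gaussian-kernel similarity between features, computed from their side-information vectors."""
    d = len(side_info)
    S = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(side_info[i], side_info[j]))
                S[i][j] = math.exp(-gamma * dist2)
    return S


def pairwise_penalty(w: Sequence[float], S: Sequence[Sequence[float]]) -> float:
    """sum_ij S_ij (w_i - w_j)^2: similar features are pushed toward similar weights."""
    d = len(w)
    return sum(S[i][j] * (w[i] - w[j]) ** 2 for i in range(d) for j in range(d))


def laplacian_penalty(w: Sequence[float], S: Sequence[Sequence[float]]) -> float:
    """2 * w^T (D - S) w, algebraically equal to pairwise_penalty for symmetric S."""
    d = len(w)
    deg = [sum(row) for row in S]
    quad = sum(w[i] * ((deg[i] if i == j else 0.0) - S[i][j]) * w[j] for i in range(d) for j in range(d))
    return 2.0 * quad
```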
Sparse BSS in the large-scale regime | 082 | DS3-482 | Wednesday | KERVAZO | Christophe | Blind Source Separation (BSS) is a powerful method to analyze multichannel data in fields that involve processing large-scale data (e.g. astrophysical data, spectroscopic data in medicine and nuclear physics, etc.). However, standard methods fail to tackle BSS problems correctly when the number of sources becomes large, especially when few samples are available. Moreover, they become computationally expensive. Building upon two standard BSS algorithms, namely GMCA (Generalized Morphological Component Analysis) and PALM (Proximal Alternating Linearized Minimization), we investigate the performance of block-coordinate optimization strategies for sparse BSS problems in the large-scale regime. The results reveal that the proposed approach, the block-GMCA algorithm, significantly improves performance in terms of both computation time and separation quality thanks to the use of blocks. | Download | |

Scoring of system logs to prevent IT incidents | 084 | DS3-484 | Wednesday | LOGETTE | Philippe | The information systems of many companies still face incidents that could be avoided if the relevant information were analyzed at the right time, which is not possible with traditional alerting systems. Hence, our approach is to build a machine learning system that captures experts' knowledge and scores each log line so that the log flow can be filtered by relevance. Semi-supervised learning limits how often experts must be solicited, and relevance scoring relies on natural language processing techniques, as we want the system to be independent of any specific log structure and able to generalize. | Download | |

A Generalized Model for Multidimensional Intransitivity | 086 | DS3-488 | Wednesday | DUAN | Jiuding | Intransitivity is a critical issue in pairwise preference modeling. It refers to intransitive pairwise preferences among a group of players or objects that potentially form a cyclic preference chain, and it has long been discussed in social choice theory in the context of the dominance relationship. However, such multifaceted intransitivity between players, and the corresponding high-dimensional player representations, are difficult to capture. We propose a probabilistic model that jointly learns a d-dimensional representation (d > 1) for each player and a dataset-specific metric space that systematically captures the distance metric in the embedding space. Interestingly, by imposing additional constraints on the metric space, our model reduces to earlier models used in intransitive representation learning. Moreover, we present an extensive quantitative investigation of the widespread existence of intransitive relationships between objects in various real-world benchmark datasets. To the best of our knowledge, this investigation is the first of its kind. Experiments on various real-world datasets, including social choice, election, and online game datasets, show that our method outperforms several competing methods in terms of prediction accuracy. | Download | |

Structured optimization for point cloud analysis | 088 | DS3-493 | Wednesday | LANDRIEU | Loic | We propose a structured optimization framework for obtaining spatially smooth semantic labelings of 3D LiDAR point clouds. In particular, we show how a suitable choice of fidelity function, regularizer and solving algorithm allows us to efficiently obtain smooth, high-precision labelings. Furthermore, the probabilistic nature of the labeling can be retained, making it possible to measure the certainty of each label assignment, which is lost when using traditional MAP inference in CRFs. | Download | |

Applied algorithms for detecting ghost writing in high school assignments | 090 | DS3-504 | Wednesday | LORENZEN | Stephan | Plagiarism and fraud in written assignments have long been a problem at every level of education. Lately, however, we have seen an increase in cheating on the large final written project (known as Studieretningsprojekt, or SRP) in Danish high schools. Students hire, e.g., university students or teachers to write their paper for them. This kind of fraud, called ghost writing, is not simple copy-paste plagiarism and therefore requires smarter detection methods. We investigate and apply machine learning techniques to verify the authorship of written assignments. Experiments are run on data from Danish high schools. | Download | |

Structure learning of undirected graphical models for count data | 092 | DS3-524 | Wednesday | NGUYEN | Thi Kim Hue | In this work, we introduce a new algorithm, called PC-LPGM, for structure learning of Poisson undirected graphical models. Specifically, we assume that the conditional distribution of each node given its neighbours follows a Poisson distribution. Some authors have proposed neighbourhood selection to recover the underlying structure. In that approach, the neighbourhood of each node is estimated in turn by solving a lasso-penalized regression problem, and the resulting local structures are stitched together to form the global graph. However, models of increasing dimension require more delicate analysis; in particular, simply regressing one fixed variable on all the other variables might not lead to accurate inference. We propose to employ the approach of the PC-algorithm, coupled with a limit on the number of variables in the conditioning sets. PC-LPGM seems very appealing, since it inherits the ability of the PC-algorithm to estimate a sparse graph even when the number of variables is in the hundreds or thousands. We provide theoretical guarantees as well as simulation results for both low- and high-dimensional scenarios. | Download | |

Features that distinguish languages: Insights from deep neural nets | 094 | DS3-527 | Wednesday | MONTO | Nicholas | It has recently been shown that convolutional neural nets (CNNs) are able to determine which language a presented input comes from with high accuracy (>90%). The high performance suggests that successful CNNs are able to capitalize on invariances that are cross-linguistically unique. The goal of this study is to take a closer look at the activation maps of successful CNNs and see if the activation patterns reflect behaviorally distinct speech sound tokens that people may use to identify different languages. | Download | |

Unsupervised deep object discovery for instance recognition | 096 | DS3-533 | Wednesday | SIMÉONI | Oriane | Severe background clutter is challenging to handle in many computer vision tasks, including image retrieval. Local or regional descriptors combined with partial matching are an attractive solution. Yet focusing only on the relevant regions is essential to control memory, search complexity and, most importantly, performance in the presence of distractors. We perform salient region detection in an unsupervised way that captures common and discriminative structures in the dataset. We thereby improve particular object retrieval with or without query bounding box annotations, especially on a large-scale dataset containing small objects. | not made available (presenter's request) | |

Mining Business Process Activities from Email Logs | 098 | DS3-542 | Wednesday | AL JLAILATY | Diana | Because email is widely used in personal and, most importantly, professional contexts, it is a valuable source of information that can be harvested for understanding, reengineering and repurposing the undocumented business processes of companies and institutions. Few researchers have investigated the problem of extracting and analyzing the process-oriented information contained in emails. In this work, we go further in this direction by proposing a new method to discover business process activities from email logs. To this end, emails are first grouped according to the process model they belong to. The emails of each process model are then sub-grouped and labeled by business activity type. These tasks are performed by an unsupervised mining technique combined with semantic similarity measurement methods. Two representative similarity measurement methods are examined: Latent Semantic Analysis (LSA) and Word2vec. Our comparison shows that Word2vec performs better than LSA both in grouping emails by the process model they relate to and in discovering emails that belong to the same activity type. Detailed experimental results illustrate the contributions of our approach. | Download | |

Variational Autoencoders Endowed with Richer but Still Computationally Efficient Statistical Models | 100 | DS3-550 | Wednesday | PEŞTE | Alexandra | Variational Autoencoders have become one of the most powerful tools for approximate inference in Deep Learning. In the poster we present some promising modifications to the standard algorithms proposed by Kingma et al. (2014) and Rezende et al. (2014), based on richer statistical models for the distributions of both the latent and the observed variables. In particular, for the latent variables, we discuss the advantages and disadvantages of using a low-rank update of a diagonal covariance matrix, and the Cholesky factorization of a full matrix, compared to using a diagonal one. For the conditional distribution of the observations given the latent variables, we propose statistical models able to capture pairwise correlations between adjacent pixels in an image while keeping the computational complexity sub-quadratic in the number of variables. We evaluate our approach on different standard datasets and compare the results with the state of the art in the literature. | Download | |
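
The sketch below illustrates why a low-rank update of a diagonal covariance stays cheap. It assumes the covariance form diag(d) + U U^T with k much smaller than n; the poster's exact parameterization may differ. Reparameterized sampling costs O(nk) per sample, and the log-determinant needed in the Gaussian log-density follows from the matrix determinant lemma in O(nk^2). The function names and the toy encoder outputs are hypothetical.

```python
import numpy as np

def sample_lowrank_gaussian(mu, d, U, n_samples, rng):
    """Reparameterized samples from N(mu, diag(d) + U U^T).

    mu: (n,) mean, d: (n,) positive diagonal, U: (n, k) low-rank factor with k << n.
    Each sample costs O(n k), versus O(n^2) with a dense Cholesky factor.
    """
    n, k = U.shape
    eps_diag = rng.standard_normal((n_samples, n))
    eps_low = rng.standard_normal((n_samples, k))
    # sqrt(d) * eps_diag has covariance diag(d); eps_low @ U.T has covariance U U^T
    return mu + np.sqrt(d) * eps_diag + eps_low @ U.T

def logdet_lowrank(d, U):
    """log det(diag(d) + U U^T) via the matrix determinant lemma, in O(n k^2)."""
    k = U.shape[1]
    capacitance = np.eye(k) + U.T @ (U / d[:, None])  # I_k + U^T D^{-1} U
    _, logdet_cap = np.linalg.slogdet(capacitance)
    return np.sum(np.log(d)) + logdet_cap

# Hypothetical encoder outputs for a single data point
rng = np.random.default_rng(0)
n, k = 6, 2
mu = np.zeros(n)
d = rng.uniform(0.5, 1.5, size=n)
U = rng.standard_normal((n, k))
Sigma = np.diag(d) + U @ U.T
z = sample_lowrank_gaussian(mu, d, U, 200_000, rng)
```

With k = 0 this reduces to the usual diagonal case. A full Cholesky factor would instead cost O(n^2) per sample.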

Riemannian Methods for the Training of Neural Networks: An Overview and Experimental Comparison | 102 | DS3-558 | Wednesday | NICOLAE | Titus | The use of non-Euclidean gradients for training neural networks goes back to the seminal work of Amari (1998) on the natural gradient, and it has recently been revisited by several authors in the context of Deep Learning. Similarly, the possibility of adopting Riemannian optimization methods on the space of the weights of a neural network has attracted the attention of researchers working on manifold optimization. In the first part of the poster we review, from a unifying perspective, several approaches based on non-Euclidean geometries for neural networks, including both probabilistic models and the modeling of the parameter space with manifold structures. We compare different algorithms through a detailed experimental analysis over multiple datasets and network topologies, and discuss the advantages and trade-offs of using Riemannian gradients for training deep neural networks, compared to standard Euclidean methods. In particular, we evaluate how approximating the metric tensor in the computation of non-Euclidean gradients, which is often required in high dimensions, affects the convergence and the quality of the optimum for different optimization algorithms. In the second part of the poster we present some novel approaches that could lead to efficient implementations of Riemannian methods in deep learning. | ||
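
As a minimal illustration of the natural gradient (Amari, 1998), the sketch preconditions the gradient with the Fisher information matrix F, so each step is θ ← θ − lr·(F + δI)⁻¹∇L. It uses logistic regression rather than a neural network, where F is cheap to form exactly; the damping δ, the data, and all names are assumptions. The sketch does not show the metric-tensor approximations the poster compares.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def nll(theta, X, y):
    """Mean negative log-likelihood of logistic regression."""
    logits = X @ theta
    return np.mean(np.logaddexp(0.0, logits) - y * logits)

def gradient(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def natural_gradient_step(theta, X, y, lr=1.0, damping=1e-4):
    """One step theta <- theta - lr * (F + damping I)^{-1} grad."""
    p = sigmoid(X @ theta)
    # Fisher information of the Bernoulli model, averaged over the data
    fisher = (X * (p * (1.0 - p))[:, None]).T @ X / len(y)
    direction = np.linalg.solve(fisher + damping * np.eye(len(theta)), gradient(theta, X, y))
    return theta - lr * direction

# Synthetic, non-separable data from a logistic model (hypothetical)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((500, 1)), rng.standard_normal((500, 2))])
y = (rng.uniform(size=500) < sigmoid(X @ np.array([0.5, 1.0, -2.0]))).astype(float)

theta_ng = np.zeros(3)
theta_gd = np.zeros(3)
for _ in range(20):
    theta_ng = natural_gradient_step(theta_ng, X, y)
    theta_gd = theta_gd - 1.0 * gradient(theta_gd, X, y)  # Euclidean step, same lr
```

For this canonical-link model the Fisher matrix equals the Hessian, so the natural-gradient step with lr = 1 is essentially a damped Newton (IRLS) step. In deep networks F is far too large to invert, which is why the approximations mentioned above matter.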

Real-time Hyperparameter Optimization | 104 | DS3-563 | Wednesday | FRANCESCHI | Luca | The gradient of a validation error with respect to real-valued hyperparameters can be computed with two different procedures (reverse-mode and forward-mode), which have different trade-offs in terms of running time and space requirements. The forward-mode procedure is suitable for real-time hyperparameter updates (RTHO), which significantly speed up hyperparameter optimization (HO) on large models such as deep neural networks. The algorithm does, however, require choosing a descent procedure for the hyperparameters. This constitutes a hyper-hyperparameter whose optimal value might be data- and/or model-dependent. We present possible strategies to increase the adaptiveness of RTHO and show applications of this novel HO procedure in different scenarios. | Download | |
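
The forward-mode procedure can be sketched on a toy problem: gradient descent on a ridge-regression loss, with the L2 coefficient λ as the single hyperparameter. The tangent Z_t = dw_t/dλ is propagated alongside the weights, so memory does not grow with the number of steps. The model, data, step size and names are assumptions for illustration. In RTHO, λ would also be updated inside the training loop using the current hypergradient; this sketch computes the hypergradient only once, at the end of training.

```python
import numpy as np

def train_with_tangent(lam, X, y, lr=0.1, steps=200):
    """Gradient descent on 0.5*mean((Xw - y)^2) + 0.5*lam*||w||^2.

    Alongside the weights w_t we propagate the tangent Z_t = dw_t/dlam
    (forward mode), so memory does not grow with the number of steps.
    """
    n, p = X.shape
    H = X.T @ X / n  # Hessian of the data term
    b = X.T @ y / n
    w = np.zeros(p)
    Z = np.zeros(p)
    for _ in range(steps):
        grad = H @ w - b + lam * w
        # Differentiate w_{t+1} = w_t - lr * grad(w_t, lam) with respect to lam
        Z = Z - lr * (H @ Z + lam * Z + w)
        w = w - lr * grad
    return w, Z

# Hypothetical regression data split into training and validation sets
rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
X_tr = rng.standard_normal((50, 5))
y_tr = X_tr @ w_true + 0.5 * rng.standard_normal(50)
X_val = rng.standard_normal((30, 5))
y_val = X_val @ w_true + 0.5 * rng.standard_normal(30)

def val_error(w):
    return 0.5 * np.mean((X_val @ w - y_val) ** 2)

def hypergradient(lam):
    w, Z = train_with_tangent(lam, X_tr, y_tr)
    grad_w = X_val.T @ (X_val @ w - y_val) / len(y_val)  # dE/dw at w_T
    return grad_w @ Z  # chain rule: dE/dlam = dE/dw . dw_T/dlam
```

The forward-mode result can be checked against a central finite difference of `val_error`. Reverse mode would instead store the whole weight trajectory, but it scales better when there are many hyperparameters.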

Random Recursive Tree Ensembles: A high energy physics application | 106 | DS3-638 | Wednesday | LALCHAND | Vidhi | The aim of this work is to propose a meta-algorithm for automatic binary classification. At a fundamental level, supervised classification can be defined as the ability to extract rules that discriminate one class from the other. This is done on the basis of training data whose class membership is known, with the ultimate objective of classifying new data whose class membership is unknown. Classifier learning in the presence of overlapping class distributions is a challenging problem in machine learning. Overlapping classes are characterized by ambiguous areas in the feature space with a high density of points belonging to both classes. This often occurs in real-world datasets; one such example is numeric data describing properties of particle decays recorded at high-energy accelerators such as LHCb at CERN. A significant body of research targeting the class overlap problem uses ensemble classifiers to boost the performance of standard algorithms, either by applying them iteratively in multiple stages or by training multiple copies of the same model on different subsets of the training data. The former is called *boosting* and the latter *bagging*. The algorithm proposed in this work targets a popular and challenging classification problem in high energy physics: improving the statistical significance of the Higgs discovery. The algorithm is trained on experimental data built from the official ATLAS full-detector simulation, with Higgs events (signal) mixed with different background events that closely mimic the statistical properties of the signal, generating class overlap. The proposed algorithm is a variant of the classical boosted decision tree, which is known to be one of the most successful analysis techniques in experimental physics. The algorithm uses a unified framework that combines two meta-learning techniques, bagging and boosting. The results show that this combination only works in the presence of a randomization trick in the base learners. The performance of the algorithm is mainly assessed with a physics-inspired significance metric called the *Approximate Median Significance* (AMS, expressed in σ). We also show how the algorithm fares compared to the leading machine learning solutions proposed for this dataset. | Download | |
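
For reference, the sketch below computes the Approximate Median Significance in the regularized form used in the HiggsML challenge: AMS = sqrt(2((s + b + b_r) ln(1 + s/(b + b_r)) − s)) with b_r = 10. Whether the poster uses exactly this variant, and this value of b_r, is an assumption. Here s and b are the weighted counts of signal and background events that pass the classifier's selection.

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate Median Significance, regularized as in the HiggsML challenge.

    s, b: (weighted) numbers of signal and background events passing the selection.
    b_reg: regularization term that guards against selections with tiny background.
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    # log1p keeps precision when s is small compared with b + b_reg
    radicand = 2.0 * ((s + b + b_reg) * math.log1p(s / (b + b_reg)) - s)
    return math.sqrt(max(radicand, 0.0))  # clip tiny negative round-off
```

When s is much smaller than b, a Taylor expansion of the logarithm reduces the expression to the familiar s/sqrt(b + b_r). The classifier's decision threshold is typically tuned to maximize this quantity.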

On Clustering Financial Time Series | 108 | DS3-027 | Wednesday | MARTI | Gautier | For clustering methods to be useful in online risk and trading systems, they have to be both robust to noise (which is partly achieved by leveraging copulas) and fast converging. Fast convergence of the clustering structures (flat or hierarchical) to the true underlying clusters is required to mitigate the non-stationarity of financial multivariate time series. If the methods require long time series to converge, the underlying economic regime and its associated clustering structure may have changed several times in the meantime. In such a case, the cluster dynamics are smoothed out and become less useful for online risk and trading systems. At the heart of clustering algorithms is the fundamental notion of distance, which can be defined based upon a proper representation of the data. Copula-based dependence coefficients model the (non-linear) dependence of financial time series better than simplistic correlation measures such as Pearson's or Spearman's. However, we may also ask what impact these novel correlation coefficients have on the convergence rate of the whole clustering methodology: do they speed it up or slow it down? We benchmark the empirical convergence rates of several state-of-the-art dependence-based clustering methods. Baseline results are obtained with a straightforward approach using Pearson's ρ, Kendall's τ and Spearman's ρ_S correlation coefficients. | Download | |
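
A baseline of the kind described above can be sketched in a few steps. Compute a Pearson, Spearman or Kendall correlation matrix, map it to distances with d = sqrt(2(1 − ρ)), and cluster hierarchically. The distance map and the average linkage are common choices assumed here, not necessarily the poster's. The copula-based coefficients are not shown, and the two-factor return data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import kendalltau, spearmanr

def correlation_matrix(returns, method="pearson"):
    """returns: (T, N) array holding T observations of N assets."""
    if method == "pearson":
        return np.corrcoef(returns, rowvar=False)
    if method == "spearman":
        return spearmanr(returns)[0]
    if method == "kendall":
        n = returns.shape[1]
        rho = np.eye(n)
        for i in range(n):
            for j in range(i + 1, n):
                rho[i, j] = rho[j, i] = kendalltau(returns[:, i], returns[:, j])[0]
        return rho
    raise ValueError(f"unknown method: {method}")

def cluster_assets(returns, n_clusters, method="pearson"):
    rho = correlation_matrix(returns, method)
    # Map correlations to distances: rho = 1 -> 0, rho = -1 -> 2
    dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Hypothetical returns: assets 0-2 load on factor 0, assets 3-5 on factor 1
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
returns = np.repeat(factors, 3, axis=1) + 0.5 * rng.standard_normal((500, 6))
labels = {m: cluster_assets(returns, 2, m) for m in ("pearson", "spearman", "kendall")}
```

To study convergence as in the abstract, one would rerun `cluster_assets` on growing prefixes `returns[:T]` and record when the partition stabilizes on the true clusters.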

Low-rank Interaction Contingency Tables | 110 | DS3-161 | Wednesday | ROBIN | Geneviève | Log-linear models are popular tools to analyze contingency tables, in particular to model row and column effects as well as row-column interactions in two-way tables. We introduce a regularized log-linear model designed for denoising and visualizing count data, which can incorporate side information such as row and column features. Estimation is performed by solving a convex optimization problem in which we minimize a negative Poisson log-likelihood penalized by the nuclear norm of the interaction matrix. We derive an upper bound on the Frobenius estimation error, which improves previous rates for Poisson matrix recovery, and an algorithm based on the alternating direction method of multipliers (ADMM) to compute our estimator. To offer users a complete methodology, we also address the automatic selection of the regularization parameter. A Monte Carlo simulation reveals that our estimator is particularly well suited to estimating the rank of the interaction in low signal-to-noise ratio regimes. We illustrate with two data analyses that the results can be easily interpreted through biplot visualization. The method is available as R code. | Download | |
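
Here is a hedged reconstruction of the penalized problem described above. It assumes a parameterization with only an intercept μ, row effects α_i, column effects β_j and the interaction matrix L; side-information terms and identifiability constraints are omitted. Dropping constants, the negative Poisson log-likelihood of counts Y_{ij} with log-mean X_{ij} gives:

$$
\min_{\mu,\alpha,\beta,L}\;\sum_{i,j}\left(e^{X_{ij}} - Y_{ij}X_{ij}\right) + \lambda\,\|L\|_*,
\qquad X_{ij} = \mu + \alpha_i + \beta_j + L_{ij}.
$$

In ADMM-type solvers for nuclear-norm penalties, the update of the low-rank block typically reduces to singular value thresholding. This is standard background, not necessarily the exact splitting the authors use:

$$
\operatorname{prox}_{\tau\|\cdot\|_*}(M) = U\,\operatorname{diag}\big(\max(\sigma_k - \tau,\, 0)\big)\,V^\top,
\qquad M = U\,\operatorname{diag}(\sigma_k)\,V^\top.
$$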

Decision trees optimization for ultrasound detection of fetal abnormalities | 112 | DS3-241 | Wednesday | BESSON | Rémi | In this work, we investigate the problem of learning good diagnostic policies for the ultrasound search of fetal abnormalities. We first learn our environment with Bayesian methods such as the maximum entropy approach, and then formulate our problem as a Markov Decision Process. Reinforcement learning methods and ideas from shortest-path algorithms on graphs, such as A* and AO*, are adapted in order to find good diagnostic policies. | Download | |

Fast Incremental Stochastic Version of the EM algorithm | 114 | DS3-256 | Wednesday | KARIMI | Belhal | A wide class of statistical problems involves both observed and unobserved data, for example inverse problems such as deconvolution, source separation or change-point detection. Linear and nonlinear mixed-effects models can also be viewed as incomplete-data models. Estimating the parameters of these models is difficult; in particular, the likelihood of the observations usually cannot be maximized in closed form. The EM algorithm proposed by Dempster, Laird and Rubin has led to many variants for the case where the conditional expectation of the complete-data log-likelihood is intractable; the MCEM (Meng, 1993) and the SAEM (Delyon, 1999) are two of them. Following the work of Neal, Hinton and Gunawardana justifying an incremental variant of the EM algorithm, we focus on the Incremental EM, MCEM and SAEM algorithms for continuous random variables. | Download | |
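
The incremental scheme of Neal and Hinton can be sketched on a model whose E-step is tractable: a 1-D Gaussian mixture. Each step revisits one data point and swaps its old contribution to the sufficient statistics for the new one. A closed-form M-step then runs on the updated statistics. This illustrates plain incremental EM only, not the MCEM or SAEM variants for mixed-effects models the poster targets. The data, initialization and names are hypothetical.

```python
import numpy as np

def _log_weighted_densities(x, weights, means, variances):
    """log(pi_k N(x | mu_k, var_k)) for every point (rows) and component (columns)."""
    x = np.atleast_1d(x)[:, None]
    return (np.log(weights) - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x - means) ** 2 / variances)

def _normalize(log_p):
    p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def incremental_em_gmm(x, n_components=2, n_epochs=10, seed=0):
    """Incremental EM for a 1-D Gaussian mixture.

    One data point is revisited at a time: its old contribution to the sufficient
    statistics is replaced by the new one, then the M-step is run in closed form.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    weights = np.full(n_components, 1.0 / n_components)
    means = np.quantile(x, np.linspace(0.1, 0.9, n_components))
    variances = np.full(n_components, np.var(x))
    # One full E-step to initialize per-point responsibilities and global statistics
    resp = _normalize(_log_weighted_densities(x, weights, means, variances))
    s0, s1, s2 = resp.sum(axis=0), resp.T @ x, resp.T @ x ** 2
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            r_new = _normalize(_log_weighted_densities(x[i], weights, means, variances))[0]
            delta = r_new - resp[i]
            s0, s1, s2 = s0 + delta, s1 + delta * x[i], s2 + delta * x[i] ** 2
            resp[i] = r_new
            # Closed-form M-step from the updated sufficient statistics
            weights = s0 / n
            means = s1 / s0
            variances = np.maximum(s2 / s0 - means ** 2, 1e-6)
    return weights, means, variances

# Hypothetical data: two well-separated Gaussian clusters
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
weights, means, variances = incremental_em_gmm(x)
```

Each step costs O(K) instead of a full O(nK) pass over the data. This is the property that incremental variants aim to carry over to settings where the E-step must be approximated by simulation.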

Online learning and Blackwell approachability with partial monitoring: Optimal convergence rates | 116 | DS3-579 | Wednesday | KWON | Joon | Blackwell approachability is an online learning setup that generalizes the classical problem of regret minimization, allowing, for instance, multi-criteria optimization, global (online) optimization of a convex loss, or online linear optimization under a cumulative constraint. We consider partial monitoring, where the decision maker does not necessarily observe the outcomes of his decisions (unlike in the traditional regret/bandit literature). Instead, he receives a random signal correlated with the decision–outcome pair, or only with the outcome. We construct, for the first time, approachability algorithms with a convergence rate of order O(T^(-1/2)) when the signal is independent of the decision, and of order O(T^(-1/3)) in the case of general signals. These rates are optimal in the sense that they cannot be improved without further assumptions on the structure of the objectives and/or the signals. (joint work with Vianney Perchet) | Download | |

Combining ML and Mathematical Optimization to tackle automatic parameter tuning on HUC problems | 118 | DS3-637 | Wednesday | IOMMAZZO | Gabriele | One of the most efficient long-term methods for storing electricity is to turn it into potential energy by pumping water up mountain valleys into natural or artificial basins. Scheduling the pumps, basins and valleys is known as the Hydro Unit Commitment (HUC) problem. In Mathematical Programming (MP) taxonomy, HUCs are natively Mixed-Integer Nonlinear Programs (MINLPs): they involve both continuous and integer decision variables, and both linear and nonlinear terms in their objective functions and constraints. The HUCs we handle yield correspondingly difficult MINLPs. Even when the nonlinearities are linearized and the MINLP becomes a Mixed-Integer Linear Program (MILP), current MILP solution technology is not advanced enough even to find a feasible solution to these MILPs, let alone an optimal one. One feature of current MP solvers, however, is not usually exploited to its fullest: the parameter configuration of the solver. This is usually left to the experience of the user and ends up being performed by empirical tweaking. We believe instead that supervised Machine Learning (ML) techniques can be used to learn a good configuration of the solver's parameters as a function of the numerical and structural properties of the instance being solved. Having generated a number of HUC instances, we solved them (within a fixed time limit) with the commercial solver IBM CPLEX. For each instance and each parameter configuration considered, we could thus gather evaluation measures of the solutions produced by the solver. Next, instance features and parameter settings were translated into simpler data, to which preprocessing techniques could conveniently be applied in order to build a training set for the chosen ML algorithm, Support Vector Regression (SVR). Finally, by training SVR we learned a function that predicts the performance of a pair (instance features, parameter setting). Upon receiving a new instance, we use this function to recommend the solver configurations best suited to that instance. This is accomplished by minimizing the function over the (combinatorially large) space of all configurations; in other words, by solving an optimization problem. Bibliography [1] A. Borghetti, C. D'Ambrosio, A. Lodi, and S. Martello. An MILP approach for short-term hydro scheduling and unit commitment with head-dependent reservoir. IEEE Transactions on Power Systems, 23(3):1115-1124, (2008). [2] F. Hutter and Y. Hamadi. Parameter adjustment based on Performance Prediction: Towards an instance-aware problem solver. Technical report, (2005). [3] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, (2011). [4] A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199-222, (2004). [5] M. Tahanan, W. van Ackooij, A. Frangioni, and F. Lacalandra. Large-scale unit commitment under uncertainty. 4OR, 13:115-171, (2015). | Download | |
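
The learn-then-optimize pipeline can be sketched with scikit-learn's `SVR`. First fit a regressor from (instance features, configuration) to a performance score, then pick the configuration with the lowest predicted score for a new instance. The three binary solver switches, the instance features and the synthetic scores are all hypothetical stand-ins for the CPLEX measurements. The sketch also enumerates all configurations exhaustively. That is only feasible for this toy space; for the combinatorially large space in the abstract, this step becomes an optimization problem in its own right.

```python
import itertools

import numpy as np
from sklearn.svm import SVR

# Hypothetical configuration space: three binary solver switches (8 configurations)
CONFIGS = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

def fit_performance_model(features, configs, scores, C=10.0):
    """Learn (instance features, configuration) -> performance score (lower is better)."""
    return SVR(kernel="rbf", C=C).fit(np.hstack([features, configs]), scores)

def recommend(model, instance_features):
    """Return the configuration with the lowest predicted score, by exhaustive search."""
    candidates = np.hstack([np.tile(instance_features, (len(CONFIGS), 1)), CONFIGS])
    predicted = model.predict(candidates)
    best = int(np.argmin(predicted))
    return CONFIGS[best], predicted[best]

# Synthetic training data standing in for measured solver performance
rng = np.random.default_rng(0)
n_instances = 40
instances = rng.uniform(size=(n_instances, 2))
feats = np.repeat(instances, len(CONFIGS), axis=0)  # each instance paired with every config
cfgs = np.tile(CONFIGS, (n_instances, 1))
scores = (feats[:, 0] * cfgs[:, 0] + (1.0 - feats[:, 1]) * cfgs[:, 1]
          + 0.3 * cfgs[:, 2] + 0.05 * rng.standard_normal(len(feats)))
model = fit_performance_model(feats, cfgs, scores)
config, predicted_score = recommend(model, np.array([0.8, 0.2]))
```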

Optimal Control Variates for MCMC | 120 | DS3-660 | Wednesday | RADHAKRISHNAN | Anand | The Langevin diffusion is the grandmother of all MCMC algorithms. Like the majority of such algorithms, including the celebrated technique of Metropolis and Hastings, this Markov process is reversible, with a unique invariant probability measure. A new approach to adaptive control variates for asymptotic variance reduction is presented. It is based on a new representation of the asymptotic variance that lends itself to a simple and efficient technique for determining the control variate with minimal asymptotic variance within a parameterized class. The approach may be regarded as a new form of TD-learning, and hence is likely to have applications to other problems in statistics, computer science, and control. Numerical results show that, in addition to the Langevin diffusion, the same algorithm works well for Metropolis-Hastings sampling. We also show that minimizing the variance may not minimize the asymptotic variance, which is the quantity of interest in MCMC algorithms. | Download |