2nd mini-workshop on Feb 19, 2014 | Data science in particle physics, astrophysics and cosmology

when:

February 19 at 14h30

where:

Laboratoire de l'Accélérateur Linéaire

UPSud campus map

LAL Auditorium

who:

Vava Gligorov
CERN

Kilian Weinberger
Washington University (St. Louis)

what:

14h30-15h30: Vava Gligorov (CERN)
Title: Real time event selection at the LHC
Abstract: The Large Hadron Collider produces 40 million proton collisions per second, but only a few thousand can be saved for future analysis. This means that the experiments have to be built around a real-time event selection system, or "trigger", which is able to rapidly and efficiently select the most interesting events and discard the others. The design of such triggers involves establishing the cost of obtaining the discriminating features needed by a selection. For this reason, triggers are generally designed as a cascade of selections that use progressively more complex information as the event rate is reduced and more time becomes available per event. The principles will be illustrated with the example of the trigger system of the LHCb experiment, and potential future developments will be discussed in the context of multivariate selections.
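As a rough illustration of the cascade idea in the abstract above, the following minimal Python sketch applies a cheap selection to every event and an expensive one only to the survivors. The feature names (`pt`, `vertex_score`), thresholds, and cost units are hypothetical and chosen only for the example; this is not the LHCb trigger.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]


@dataclass
class Stage:
    name: str
    cost: float                      # hypothetical cost units per event evaluated
    accept: Callable[[Event], bool]  # returns True to keep the event


def run_cascade(events: List[Event], stages: List[Stage]) -> Tuple[List[Event], float]:
    """Apply stages in order; each stage only sees events that passed the earlier ones."""
    total_cost = 0.0
    survivors = events
    for stage in stages:
        total_cost += stage.cost * len(survivors)  # pay only for events still alive
        survivors = [e for e in survivors if stage.accept(e)]
    return survivors, total_cost


# Toy events with made-up feature values (assumption, for illustration only).
events = [
    {"pt": 0.5, "vertex_score": 0.9},
    {"pt": 2.5, "vertex_score": 0.2},
    {"pt": 3.0, "vertex_score": 0.8},
    {"pt": 0.1, "vertex_score": 0.1},
]

stages = [
    Stage("cheap_pt_cut", cost=1.0, accept=lambda e: e["pt"] > 1.0),
    Stage("expensive_vertex_cut", cost=10.0, accept=lambda e: e["vertex_score"] > 0.5),
]

selected, cascade_cost = run_cascade(events, stages)

# Cost of evaluating every stage on every event, for comparison.
flat_cost = sum(s.cost for s in stages) * len(events)
```

In this toy setup the expensive stage runs on two events instead of four, so the cascade costs less than evaluating both selections on every event while keeping the same final selection.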

and

16h00-17h00: Kilian Weinberger (Washington University)
Title: Learning under Test-time Budgets
Abstract: As machine learning becomes increasingly mature, it is starting to be used in many industrial and medical applications. In real life, machine learning algorithms incur real costs at running time (test time). These costs can be measured in CPU time or in money (e.g. the cost of feature extraction in medical applications), and they are magnified when algorithms are executed many times per day (possibly millions or billions of times). So far, little attention has been given to this new setting.
In this talk I introduce three novel algorithms that my co-authors and I developed to incorporate test-time budgets into the training of machine learning algorithms. The Greedy Miser is a variant of gradient boosted trees (the most common algorithm for web-search engines) that integrates the feature extraction cost into the greedy CART tree construction. Because the feature cost only appears in the CART impurity function, this algorithm inherits all the great properties of gradient boosted trees, such as scalability, robustness and superb generalization. Cost Sensitive Tree of Classifiers (CSTC) is an algorithm that learns trees of sparse classifiers. At test time, different inputs traverse different paths through the tree, and each path extracts different features. This dynamic feature extraction leads to drastic reductions in feature cost. Finally, I will include the most recent research from my lab, which speeds up the CSTC construction by orders of magnitude using recent results in combinatorial optimization of approximately submodular functions. I benchmark all three algorithms on real-world data sets and show that each one successfully manages the accuracy/cost trade-off in its own way.
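To make the idea of putting feature cost into the tree impurity concrete, here is a minimal Python sketch of a cost-penalized split search. It is an illustrative assumption, not the Greedy Miser as published: the squared-error impurity, the additive penalty `lam * feature_cost[f]`, the rule that already-extracted features are free, and all names (`best_split`, `already_paid`, `lam`) are choices made for this example.

```python
from typing import List, Optional, Sequence, Set, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared errors around the mean (regression-tree impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(
    X: List[List[float]],
    y: List[float],
    feature_cost: List[float],
    already_paid: Set[int],
    lam: float,
) -> Optional[Tuple[float, int, float]]:
    """Return (score, feature index, threshold) minimizing impurity + cost penalty."""
    best = None
    for f in range(len(X[0])):
        # Assumption: a feature extracted earlier costs nothing to reuse.
        penalty = 0.0 if f in already_paid else lam * feature_cost[f]
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            score = sse(left) + sse(right) + penalty
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


# Toy data: feature 0 separates the targets perfectly but is expensive,
# feature 1 is cheaper and less informative (made-up values).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = [0.0, 0.0, 1.0, 1.0]
feature_cost = [10.0, 1.0]

no_penalty = best_split(X, y, feature_cost, set(), lam=0.0)
with_penalty = best_split(X, y, feature_cost, set(), lam=0.1)
reuse_paid = best_split(X, y, feature_cost, {0}, lam=0.1)
```

With no penalty the search picks the expensive, perfectly separating feature; with `lam=0.1` it switches to the cheap feature; and once the expensive feature has already been paid for, it is chosen again. This shows how a single trade-off parameter can steer a greedy tree builder between accuracy and extraction cost.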