On March 4, I attended the 10th annual Machine Learning Symposium organized by the New York Academy of Sciences. This was the third time I attended. It is a nice day-long event with a 3-4 keynote talks and several posters and 5-minute talks by selected poster presenters. It was at this venue that I first heard David Blei talk about topic modeling and LDA in 2010. Below are the talks I enjoyed most this time.
Alex Graves, Google DeepMind
This was the best talk of the day in my opinion. Alex presented a set of results on the role of specifying which parts of the input to pay attention to in deep neural networks. Deep networks naturally learn where to focus their attention for a particular task (e.g. which parts of an image are responsible for detecting the desired output), but being able to provide some side information to the network pertaining to where to focus can improve learning tremendously. The mathematical development of this idea makes use of smooth, differentiable operators for specifying where to focus attention, so that these operators can be included in gradient descent learning. Alex is famous for his work on handwriting recognition using Recurrent Neural Networks, and has a cool widget on his website where you can translate typed text into cursive handwriting in a chosen style.
Sasha Rakhlin, UPenn
This keynote was more theoretical than Alex’s talk. The setting for Sasha’s talk was predicting attributes of a node in a graph (e.g. a social network), when the input predictors include the graph structure as well as non-graph features of the node, and data about nodes arrives in an online fashion. Training a machine in the offline setting (e.g. a classifier for whether a user in a social network will click on a given ad) using conventional methods runs into a combinatorial explosion. Sasha’s work shows that the online setting allows for a relaxation that allows for polynomial time learning.
Multitask Matrix Completion for Learning Protein Interactions across Diseases
Meghana Kshirsagar, IBM Research
The matrix completion problem, exemplified by the Netflix Challenge is that a large matrix with a low rank has very few entries available and the task is to estimate the missing entries. The problem also arises in the context of protein-protein interactions between the proteins of disease causing germs and proteins of a host organism. While there are several known methods for matrix completion (e.g. nuclear norm minimization, Candes et al. ), this setting can also benefit from multi-task learning because often several diseases have similar responsible proteins. Multi-task learning, which is a general idea (not specific to matrix completion) that basically proposes that when multiple problems have portion of the signal in common, then using a combination of a shared representation and a specific representation can lead to better generalization.
Another Look at DWD: Thrifty Algorithm and Bayes Risk Consistency in RKHS
Boxiang Wang, University of Minnesota
Distance Weighted Discrimination was proposed as a superior alternative to the SVM, because unlike the SVM, the DWD cares about the distances of *all* data points from the separating hyperplane, rather than just the points closest to it. However, the DWD hasn’t yet enjoyed the popularity of the SVM because it require solving a SOCP rather than a quadratic program (SVM). This paper extends theoretical results on DWD (established Bayes risk consistency) and proposes an algorithm faster than second order cone programming.