Archive for May, 2013

Attending ICLR 2013 (Day 3)

Monday, May 6th, 2013

It is Saturday and, yes, the conference is still going on. The schedule is light today, though, and there are no posters.

Herded Gibbs Sampling by Luke Bornn from Harvard is very promising. The sampler they propose achieves a convergence rate of O(1/T). He compared herded Gibbs sampling against regular Gibbs sampling: both reach a similar level of accuracy, but herded Gibbs gets there much faster. Worth a try if applicable.
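The core trick, as I understand it, is to replace random draws with a deterministic "herding" weight update (herded Gibbs keeps one such weight per conditional). Below is a toy sketch of the herding update for a single Bernoulli probability, just to show where the O(1/T) rate comes from; the function name and setup are mine, not from the paper.

```python
import numpy as np

def herded_bernoulli(p, T):
    """Deterministic 'herded' draws for a Bernoulli(p) variable.

    A running weight accumulates p each step and emits a 1 whenever it
    crosses 1. The empirical mean then tracks p with O(1/T) error,
    versus the O(1/sqrt(T)) error of i.i.d. random draws.
    """
    w, draws = 0.0, []
    for _ in range(T):
        w += p
        if w >= 1.0:
            draws.append(1)
            w -= 1.0
        else:
            draws.append(0)
    return np.array(draws)

p, T = 0.3, 1000
print(abs(herded_bernoulli(p, T).mean() - p))        # ~1/T error
print(abs((np.random.rand(T) < p).mean() - p))       # ~1/sqrt(T) error
```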

The talk Feature Learning in Deep Neural Networks – A Study on Speech Recognition Tasks by Dong Yu from Microsoft yesterday was also very impressive. He showed that deep networks help quite dramatically in speech recognition.

The Manifold of Human Emotions by Seungyeon Kim from Georgia Tech is very interesting. He used review data to define 32 emotions and then, with a few intuitive assumptions, was able to recover a manifold over those 32 emotions. I appreciate his passionate and detailed explanation of his work.

Overall, my feeling is that, on one hand, deep learning works exceedingly well if judged by performance alone. On the other hand, we still don't really know why it works, or what knowledge can be gained beyond the numbers. Anyway, better is better: nobody wants their product to impress users with more false results, and the product is a black box to users anyway.

Attending ICLR 2013 (Day 2)

Friday, May 3rd, 2013

The talks today are very close to my own interests overall. For example, A Nested HDP for Hierarchical Topic Models by John Paisley is a very interesting approach. It is a hierarchical LDA for topic discovery, combining the Chinese restaurant process with LDA. John demonstrated his nested HDP on more than 1 million NY Times articles. He said it took about 350 iterations, roughly 10 hours, to converge, which is very appealing given that, I assume, he has not used any parallel computing yet.
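To make the nonparametric flavor concrete, here is a toy Chinese restaurant process sampler, the building block behind such models; this is my own sketch, not the nested HDP itself. A new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to alpha.

```python
import random

def chinese_restaurant_process(n_customers, alpha=1.0, seed=0):
    """Sample a table assignment for each customer under a CRP(alpha)."""
    rng = random.Random(seed)
    tables = []    # tables[k] = number of customers seated at table k
    seating = []
    for n in range(n_customers):
        # Existing table k with prob tables[k]/(n+alpha),
        # new table with prob alpha/(n+alpha).
        r = rng.random() * (n + alpha)
        cum = 0.0
        for k, size in enumerate(tables):
            cum += size
            if r < cum:
                tables[k] += 1
                seating.append(k)
                break
        else:
            tables.append(1)
            seating.append(len(tables) - 1)
    return seating, tables

seating, tables = chinese_restaurant_process(100, alpha=2.0)
print(len(tables), "tables for 100 customers")
```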

Zero-Shot Learning Through Cross-Modal Transfer by Richard Socher from Stanford is another interesting one. The concept is very appealing: Richard showed that he could classify photos from unseen classes without labeled examples for them. Although the work uses both text and image data, one could obviously try it on text-only data. Definitely worth reading more.
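The gist, as I understood it: map image features into the word-vector space and classify an image from an unseen class by the nearest class word vector. A minimal sketch of that idea, assuming a learned linear map W and pre-trained class word vectors (all names here are mine, not the paper's):

```python
import numpy as np

def zero_shot_classify(image_feature, W, class_word_vectors, class_names):
    """Project an image feature into the semantic (word-vector) space via W,
    then pick the class whose word vector is closest by cosine similarity."""
    z = W @ image_feature                       # image -> semantic space
    z = z / np.linalg.norm(z)
    sims = {}
    for name, v in zip(class_names, class_word_vectors):
        sims[name] = float(z @ (v / np.linalg.norm(v)))
    best = max(sims, key=sims.get)
    return best, sims
```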

Efficient Estimation of Word Representations in Vector Space by Tomas Mikolov from Google is very exciting. Unlike traditional tf or tf-idf vectors, their word representation lives in a continuous space in which semantically similar words stay close to each other. He mentioned that he will make the code and the trained representations available once approved. I personally am very interested in the one trained on the 100B-word data.
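Once such vectors are released, using them is straightforward: nearest neighbors under cosine similarity should be semantically related words. A small illustrative helper (my own sketch, not Mikolov's code), assuming the embeddings are loaded into a dict of word-to-vector:

```python
import numpy as np

def nearest_words(query, embeddings, topn=5):
    """Return the words whose vectors are closest to `query` by cosine similarity.

    `embeddings` maps word -> 1-D numpy vector. With good continuous-space
    vectors, semantically similar words should come out on top.
    """
    q = embeddings[query]
    q = q / np.linalg.norm(q)
    scores = {w: float(q @ (v / np.linalg.norm(v)))
              for w, v in embeddings.items() if w != query}
    return sorted(scores, key=scores.get, reverse=True)[:topn]
```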

Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors by Danqi Chen from Stanford is a very ambitious project, especially since this is only her first year of graduate school. WordNet and similar resources encode relationships between entities, i.e., an ontology, but we know those ontologies are far from complete. Her goal is to learn a model that can predict a missing entity for a relation, as well as a missing relation between entities. The results are fairly good given the daunting difficulty of the goal.
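The scoring idea, sketched from memory and simplified (the paper's exact parameterization may differ): each relation R gets a small tensor of bilinear slices, and a candidate fact (e1, R, e2) is scored by how strongly the two entity vectors interact through those slices.

```python
import numpy as np

def ntn_score(e1, e2, W_R, u_R):
    """Bilinear tensor score for a candidate fact (e1, R, e2).

    e1, e2 : d-dimensional entity vectors.
    W_R    : (k, d, d) array of bilinear slices for relation R.
    u_R    : k-dimensional weights combining the slices.
    A higher score means the triple is judged more plausible.
    """
    slices = np.einsum('d,kde,e->k', e1, W_R, e2)   # e1^T W_R^[1:k] e2
    return float(u_R @ np.tanh(slices))
```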

Attending ICLR 2013 (Day 1)

Thursday, May 2nd, 2013

This is my first time attending the International Conference on Learning Representations (ICLR). It is revolutionarily cool that this conference invented (or adopted) an open-review system, so all of the accepted papers are already online. I for one very much support such a review system, at least for conference publications.

So far, the majority of talks focus on deep learning for image processing. But as some of them showed, the very same techniques can be applied to documents as well. For example, the invited talk by Dr. Ruslan Salakhutdinov: he presented Deep Boltzmann Machines and their applications. One application is joint image and text analysis; he built a model that learns from photos and their tags, and the trained model can then be used for either image or text retrieval, which definitely has lots of business interest. The other application is indeed topic modeling. He briefly said the results are not yet very good, but one interesting point he made is that it does have some advantages over LDA.
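DBMs are built from RBM-like layers of stochastic binary units. Just to make the mechanics concrete, here is one block-Gibbs sweep for a single binary RBM layer; this is my own toy sketch, not Salakhutdinov's multimodal model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_gibbs_step(v, W, b_v, b_h, rng):
    """One block-Gibbs sweep of a binary RBM: sample the hidden units given
    the visible units, then resample the visible units given the hiddens."""
    p_h = sigmoid(v @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_v)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 256))     # visible x hidden weights
v = (rng.random(784) < 0.5).astype(float)      # a random binary "image"
v, h = rbm_gibbs_step(v, W, np.zeros(784), np.zeros(256), rng)
```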

Another interesting work is from Judy Hoffman. The title of her talk is "Efficient Learning of Domain-invariant Image Representations". One of her points is that there is variation between the training data and new data. Once a model is trained, one way to accommodate the new, shifted data is to learn a transformation of it so that it is still captured by the trained model. I found it cool, and it could be applied to documents as well. But does such a global transformation exist, given how varied the data can be?
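A toy illustration of that idea, and only my own simplification rather than Hoffman's actual optimization: fit a linear map that sends target-domain features onto source-domain features using a few corresponding examples, then reuse the source-trained classifier unchanged.

```python
import numpy as np

def fit_domain_transform(X_target, X_source):
    """Least-squares linear map W such that X_target @ W ~ X_source,
    fitted on corresponding examples from the two domains."""
    W, *_ = np.linalg.lstsq(X_target, X_source, rcond=None)
    return W

# A new target-domain feature x is transformed before being fed to the
# (fixed) classifier trained on the source domain:
#   y_hat = source_classifier.predict(x @ W)
```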

Complexity of Representation and Inference in Compositional Models with Part Sharing by Alan Yuille is quite interesting. Alan is famous not only for his PhD advisor, Stephen Hawking, but also for his own brilliant work. I'm very much impressed with his quote (maybe he was quoting someone else): "The world is compositional or God exists". His idea is very useful for image processing, and for document analysis as well.