Archive for January, 2011

Moving average

Wednesday, January 26th, 2011

In statistics and biophysics, a commonly used quantity is the moving (running) average. There is an efficient way to update it incrementally, and the derivation starts from:

x_{i+1}=(x_{1}+x_{2}+\ldots+x_{i+1})-(x_{1}+x_{2}+\ldots+x_{i})

This can then be written in terms of the averages:

x_{i+1}= (i+1)*Average_{i+1}-i*Average_{i}

Therefore we have:

Average_{i+1}=\frac{i*Average_{i}+x_{i+1}}{i+1}
or =Average_{i}+\frac{x_{i+1}-Average_{i}}{i+1}.
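
As a quick illustration, here is a minimal Python sketch of this incremental update (the function name update_average and the sample numbers are mine, just for illustration):

def update_average(avg_i, x_next, i):
    # Average_{i+1} = Average_i + (x_{i+1} - Average_i) / (i + 1)
    return avg_i + (x_next - avg_i) / (i + 1)

# usage: accumulate a running average of a stream without storing all samples
avg, n = 0.0, 0
for x in [2.0, 4.0, 6.0, 8.0]:
    avg = update_average(avg, x, n)
    n += 1
print(avg)  # 5.0

The second form, Average_{i}+\frac{x_{i+1}-Average_{i}}{i+1}, is usually the nicer one numerically, since it adds a small correction to the current average instead of multiplying by i.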

Support vector machines soft margin primal and dual problems derivation

Saturday, January 22nd, 2011

Support vector machines (SVMs) have been used extensively in machine learning. In particular, the soft-margin formulation, which allows misclassification at a penalty, has been applied to many problems spanning a variety of fields. Here let's study how the primal problem is reformulated as a dual problem, which makes its solution easier to understand.

Suppose we have inputs x_{i} with corresponding labels y_{i} \in \{-1,+1\}, and we want a solution w that separates the inputs according to their labels. The standard soft-margin SVM primal formulation is (the \omega_{i} are slack variables):

min_{(w,b,\omega)} \frac{1}{2}w^{T}w +C \sum_{i}\omega_{i}

s.t. y_{i}(w^{T}x_{i}+b) \geq 1-\omega_{i}

\omega_{i} \geq 0

We introduce Lagrange multipliers \alpha_{i} and \gamma_{i}, one set for each family of constraints, so:

min_{(w,b,\omega)} max_{(\alpha, \gamma)} \{ \frac{1}{2}w^{T}w +C \sum_{i}\omega_{i}-\sum_{i}\alpha_{i}(y_{i}(w^{T}x_{i}+b)-1+\omega_{i})-\sum_{i}\gamma_{i}\omega_{i}\}

Why maximize over \alpha and \gamma? Because whenever a constraint is violated, the inner maximum is +\infty, so the outer minimization is forced to satisfy the constraints. The multipliers are required to satisfy \alpha_{i}\geq 0 and \gamma_{i} \geq 0.

The saddle point conditions are:

\frac{\partial L}{\partial w}=0, \frac{\partial L}{\partial b}=0, \frac{\partial L}{\partial \omega}=0.

From \frac{\partial L}{\partial \omega_{i}}=0 we get C-\alpha_{i}-\gamma_{i}=0; combined with \alpha_{i} \geq 0 and \gamma_{i} \geq 0, this gives 0 \leq \alpha_{i} \leq C. Together with the other two conditions, the solution satisfies:

w=\sum_{i}\alpha_{i}y_{i}x_{i}, \sum_{i}\alpha_{i}y_{i}=0, and 0 \leq \alpha_{i} \leq C.

Substituting these back into the Lagrangian eliminates w, b, and \omega, leaving the dual problem: max_{\alpha} \sum_{i}\alpha_{i}-\frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}y_{i}y_{j}x_{i}^{T}x_{j}

subject to \sum_{i}\alpha_{i}y_{i}=0 and 0 \leq \alpha_{i} \leq C. There is no C or \omega left in the objective anymore (C only survives as the upper bound on \alpha_{i}); is that magic?!
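
Just to make the dual concrete, here is a minimal Python sketch that solves this dual QP numerically with scipy on a made-up toy data set (the data, the value of C, and the choice of the SLSQP solver are my assumptions, not from the references):

import numpy as np
from scipy.optimize import minimize

# toy 2D data with labels y_i in {-1, +1}
X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0
Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j x_i^T x_j

def neg_dual(alpha):
    # negate the dual objective because scipy minimizes
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]  # sum_i alpha_i y_i = 0
bounds = [(0.0, C)] * len(y)                            # 0 <= alpha_i <= C
res = minimize(neg_dual, np.zeros(len(y)), method="SLSQP",
               bounds=bounds, constraints=constraints)

alpha = res.x
w = (alpha * y) @ X                                     # w = sum_i alpha_i y_i x_i
sv = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]   # margin support vectors
b = np.mean(y[sv] - X[sv] @ w) if len(sv) else 0.0
print(w, b)

In practice one would use a dedicated SVM solver (e.g. SMO, as in libsvm) rather than a generic QP routine, but the toy problem above shows how C only enters through the box constraint on \alpha.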

References:

1. Wikipedia

2. The Top Ten Algorithms in Data Mining, by Xindong Wu

Thermodynamic integration

Sunday, January 16th, 2011

In free energy calculations, thermodynamic integration, which is an alternative to free energy perturbation, has gained some steam recently. This post is a brief introduction to the method. Free energy is a state function; if there are two states, state 0 and state 1, the free energy of going from state 0 to state 1 is written as:

 \Delta G(0 \rightarrow 1) = -k_{B}T\ln \frac{Q_{1}}{Q_{0}}=-k_{B}T\ln\frac{\int{e^{-\beta H_{1}(x,p)}dxdp}}{\int{e^{-\beta H_{0}(x,p)}dxdp}}

Since free energy is independent of the actual path, we introduce an alchemical order parameter \lambda. The Hamiltonian of the system is then H_{\lambda}(x,p), and we need to make sure that

H_{\lambda}(x,p)=H_{0}(x,p) when \lambda=0, and H_{\lambda}(x,p)=H_{1}(x,p) when \lambda=1.

So we have a \lambda-dependent free energy,

\Delta G(0 \rightarrow \lambda)=-k_{B}T\ln \frac{Q_{\lambda}}{Q_{0}}. Its derivative with respect to \lambda is

\frac{d\Delta G}{d\lambda} = \frac{\int \frac{\partial H_{\lambda}}{\partial \lambda}e^{-\beta H_{\lambda}(x,p)}dxdp}{\int e^{-\beta H_{\lambda}(x,p)}dxdp} = \langle \frac{\partial H_{\lambda}}{\partial \lambda} \rangle_{\lambda}

The free energy difference between 0 and 1 is then

\Delta G(0 \rightarrow 1)=\int_{0}^{1}\frac{d\Delta G(\lambda)}{d\lambda}d\lambda = \int_{0}^{1}\langle \frac{\partial H_{\lambda}}{\partial \lambda} \rangle_{\lambda}d\lambda \approx \sum_{k} \langle \frac{\partial H_{\lambda}}{\partial \lambda} \rangle_{\lambda_{k}}\Delta \lambda_{k}.
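
As a rough numerical illustration of that last approximation, here is a minimal Python sketch that integrates \langle \partial H_{\lambda}/\partial \lambda \rangle over \lambda with the trapezoidal rule (the \lambda grid and the dH/d\lambda values are made up for illustration, not real simulation output):

import numpy as np

# lambda windows at which <dH/dlambda> would be measured
lambdas = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
dH_dlambda = np.array([12.3, 8.1, 5.0, 2.2, 0.4])  # hypothetical ensemble averages

# Delta G(0 -> 1) ~ integral of <dH/dlambda> d lambda, via the trapezoidal rule
delta_G = np.sum(0.5 * (dH_dlambda[1:] + dH_dlambda[:-1]) * np.diff(lambdas))
print(delta_G)

In a real calculation, each \langle \partial H_{\lambda}/\partial \lambda \rangle value would come from an equilibrium simulation run at that fixed \lambda.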

In practice, a thermodynamic cycle is used to obtain the free energy difference of interest. For example, in a solvation free energy calculation, the cycle is:

[Solute]_{vacuum} \longrightarrow [Solute]_{solvent} : \Delta G_{solvation}

\Downarrow \Delta G^{0}_{annihilation} \qquad\qquad \Downarrow \Delta G^{1}_{annihilation}

[Nothing]_{vacuum} \longrightarrow [Nothing]_{solvent} : \Delta G=0

From the above cycle, we have \Delta G_{solvation}= \Delta G^{0}_{annihilation}-\Delta G^{1}_{annihilation}.

vim settings

Tuesday, January 11th, 2011

Before I forget, some trivial tips for the vi/vim editor; add these to the ~/.vimrc file:

set ruler " show where the cursor is (line and column number)

set incsearch " while typing a search pattern, jump to the current match in the text as you type

A good book: Super Crunchers

Friday, January 7th, 2011

I finally finished one book on my list, “Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart”. It is indeed a good book, in the sense that it shows that once enough data has accumulated, one's intuition, though still very important, needs to be checked and corrected by data mining. In other words, when there are many past examples, future predictions should rely more on statistical learning from past experience than on intuition alone. This is not about novel discovery per se; it is about revealing the rationale behind past experience. In a new field or problem where not enough data has accumulated, one should rely on both intuition and data mining.

In retrospect, reaching that conclusion seems like a no-brainer. So the real value to me is the many real-life examples of how number crunching made a difference, such as the first example: predicting wine prices from data crunching versus the predictions of wine experts.

Afterwards, I have to say that data crunching is actually a double-edged sword: on one hand it removes our emotions when rationality is needed, but on the other hand it can get too rational, which sometimes means too boring. A good example is that Hollywood uses neural network models to predict which script will produce a hit movie, or even how to modify the script, casting, and so on to make it a hit. After learning that, I realized why so many Hollywood movies are predictable. So I hopelessly enjoy every new movie. Those suckers knew what I like before filming!