Archive for September, 2010

Noise reduction on funnel-shaped energy landscapes

Monday, September 27th, 2010

This post is prompted by an excellent paper by Andrew Stumpff-Kane and Michael Feig, published in Proteins: Structure, Function, and Bioinformatics 63:155-164 (2006). In the protein structure prediction field, almost everyone has their own scoring function to score and rank models for a target sequence; presumably, the model closest to the native structure should get the best (highest or lowest) score. Since the closeness of a model to the native structure is usually measured by RMSD (or GDT_TS as in CASP, or the TM-score introduced by Zhang and Skolnick), a perfect scoring function would show a strong correlation between RMSD and score across all models. More often than not, however, plots of RMSD versus score for CASP models are badly scattered; in other words, there is little or none of the hoped-for correlation. So the authors proposed a statistical solution: a correlation-based scoring function that reduces the noise in the original scoring function. The score is written as W + Z, where W is the original score function and Z is its noise, which is assumed to be uncorrelated with the distance between a model and the native structure. The correlation coefficient is

\rho_{r}(d(PP_{r}), W+Z)=\frac{Cov(d(PP_{r}), W+Z)}{\sqrt{Var(d(PP_{r}))(Var(W)+Var(Z))}}

They found that the correlation of \rho_{r} with d(PP_{0}) no longer depends on Z, that is

\rho(d(PP_{0}), \rho_{r}(d(PP_{r}), W+Z))=\frac{Cov(d(PP_{0}), \rho_{r}(d(PP_{r}), W+Z))}{\sqrt{Var(d(PP_{0}))Var(\rho_{r}(d(PP_{r}), W+Z))}}

So the proposed score of each model is calculated as:

r_{i}=\frac{N\sum_{j \neq i}^{N}s_{j}d_{ij}-\sum_{j\neq i}^{N}s_{j}\sum_{j\neq i}^{N}d_{ij}}{\sqrt{N^{2} Var(s)Var(d)}} where

d_{ij} is the distance between model i and model j, s_{i} is the original score of model i, N is the total number of models.

It works well on the 5 data sets they chose. One reason it works is that it assumes all the models lie at or near the native structure and that their distribution is funnel-like, that is, there is a global minimum. The correlation score therefore gives the largest score to the model closest to the global minimum. In practice, they found that it is better to use a hybrid of the original score and this correlation-based score: use the correlation-based score to select a limited number of models (for example 10), and then use the original scoring function to rank the preselected models. This hybrid turns out to be better than either score alone.

Again, the assumption is that all models form a funnel-like distribution on the energy landscape.
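To make the scoring and the hybrid selection concrete, here is a minimal sketch in R. It is not the authors' code: the function names `correlation_score` and `hybrid_rank`, the toy data, and the convention that a lower original score is better (energy-like) are all assumptions made for illustration. It also uses R's built-in Pearson `cor()` over the other models instead of the exact normalization in the formula above.

```r
# Correlation-based score (sketch): for each model i, the Pearson correlation
# between the original scores s_j of all other models and their distances
# d_ij to model i. Assumes lower original score = better (energy-like).
correlation_score <- function(s, d) {
  n <- length(s)
  stopifnot(is.matrix(d), nrow(d) == n, ncol(d) == n)
  vapply(seq_len(n), function(i) cor(s[-i], d[i, -i]), numeric(1))
}

# Hybrid selection (sketch): preselect the top_k models by correlation score,
# then rank the preselected models by the original score.
hybrid_rank <- function(s, d, top_k = 10) {
  r <- correlation_score(s, d)
  pre <- order(r, decreasing = TRUE)[seq_len(min(top_k, length(s)))]
  pre[order(s[pre])]
}

# Hypothetical toy data: six models on a line, with the "native" at x = 0.
x <- c(0, 1, 2, 3, 4, 5)
d <- abs(outer(x, x, "-"))           # pairwise distances d_ij
s <- c(0.5, 0.0, 2.2, 2.9, 4.1, 5.0) # noisy scores; model 2 looks best

r <- correlation_score(s, d)
best_by_original <- which.min(s)     # misled by the noise
best_by_correlation <- which.max(r)  # model closest to the funnel bottom
ranked <- hybrid_rank(s, d, top_k = 3)
```

In this toy set the original score prefers model 2, while the correlation score prefers model 1, the one sitting at the bottom of the funnel. The hybrid step then reorders only the preselected models by the original score.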

Plot error bars in R

Friday, September 10th, 2010

Suppose we have a series of measurements that we would like to divide into consecutive subgroups, and then plot the mean of each subgroup together with its uncertainty. Say there are two sets of measurements, d1 and d2, each with 30 values, and we want 3 points per set, that is, one average for every 10 measurements. Here is one way to do it:

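library(psych)

# grouping variable: three consecutive blocks of 10 measurements
g <- c(rep(1, 10), rep(2, 10), rep(3, 10))

# first set; the third argument is named here because its position
# differs between versions of psych
error.bars.by(d1[1:length(g)], g, by.var=TRUE, xlab="Time", ylab="Pressure", main="W->L: 10ps", col=2, colors=2, pch=1)

# second set, drawn on top of the first plot
error.bars.by(d2[1:length(g)], g, by.var=TRUE, xlab="Time", ylab="Pressure", col=3, colors=3, pch=2, add=TRUE)
legend(x=1, y=max(c(d1, d2)), legend=c("Native", "Mutant"), col=2:3, pch=1:2)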
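For comparison, the same kind of plot can be sketched in base R without the psych package. This is only an illustration: `block_summary` is a hypothetical helper, the d1 and d2 values below are simulated stand-ins for the real measurements, and the bars show plus or minus one standard error of the mean, which is not necessarily the interval that error.bars.by draws.

```r
# Mean and standard error for consecutive blocks of `size` measurements.
block_summary <- function(x, size = 10) {
  n_blocks <- length(x) %/% size
  x <- x[seq_len(n_blocks * size)]            # drop any incomplete last block
  g <- rep(seq_len(n_blocks), each = size)
  m <- tapply(x, g, mean)
  se <- tapply(x, g, function(v) sd(v) / sqrt(length(v)))
  data.frame(block = seq_len(n_blocks), mean = as.numeric(m), se = as.numeric(se))
}

# Simulated stand-in data (assumption); replace with the real d1 and d2.
set.seed(1)
d1 <- rnorm(30, mean = 1.0, sd = 0.2)
d2 <- rnorm(30, mean = 1.5, sd = 0.2)

b1 <- block_summary(d1)
b2 <- block_summary(d2)

# Common y range so both sets and their bars fit in the plot.
ylim <- range(c(b1$mean - b1$se, b1$mean + b1$se,
                b2$mean - b2$se, b2$mean + b2$se))

plot(b1$block, b1$mean, type = "b", col = 2, pch = 1, ylim = ylim,
     xlab = "Time", ylab = "Pressure", main = "W->L: 10ps")
arrows(b1$block, b1$mean - b1$se, b1$block, b1$mean + b1$se,
       angle = 90, code = 3, length = 0.05, col = 2)
lines(b2$block, b2$mean, type = "b", col = 3, pch = 2)
arrows(b2$block, b2$mean - b2$se, b2$block, b2$mean + b2$se,
       angle = 90, code = 3, length = 0.05, col = 3)
legend("topleft", legend = c("Native", "Mutant"), col = 2:3, pch = 1:2)
```

Computing the block means and standard errors explicitly makes it easy to swap in another uncertainty measure, such as a confidence interval, before plotting.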

The book about Warren Buffett

Thursday, September 9th, 2010

I heard a while ago about Alice Schroeder's biography of Warren Buffett and finally got a chance to read (or rather, listen to) it. This is the first book I have read about the much-admired Buffett. To me the whole book is more gossip than anything beyond what I already knew from watching the Charlie Rose show. I mean, after reading this book, Buffett is still just a flat figure to me. The title mentions his business life, but the author didn't dig deep enough to reveal his business activities from a business perspective. The Salomon episode is good but still lacks something. I guess I'll never be satisfied unless I read some documentation about that collapse; I could put it on my wish list.

Anyway, it is worth reading if you are interested in the greatest investor's personal life.

The “Did you know?” video

Sunday, September 5th, 2010

It is quite interesting to watch this video from YouTube.