Archive for the ‘scripts & tips’ Category

Simple parallel code in Perl

Friday, July 23rd, 2010

There are many ways to run jobs in parallel in Perl, and which one to choose depends largely on the problem. The problem I worked on consisted of essentially serial jobs that needed to be executed on multiple cores, with no communication between the jobs. Other than launching the serial jobs repeatedly by hand, the straightforward solution is the following:

For example, suppose I need to run 8 jobs:

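use threads;
# Start each job in its own thread.
my ($t1) = threads->create(\&sub1);
my ($t2) = threads->create(\&sub2);
.. ..
my ($t8) = threads->create(\&sub8);
# Wait for every thread to finish; otherwise the main program
# can exit while the jobs are still running.
$_->join() for threads->list();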

sub1(), …, sub8() can be up to 8 different functions.

Install Perl modules

Wednesday, June 30th, 2010

There are a variety of ways to install a Perl module. One of them is to use cpan, and this is a very good how-to page.

Get GO terms for proteins with only pdbids

Wednesday, July 22nd, 2009

I found a way to get the GO terms for proteins when only their PDB IDs are known, though I still hope there is a simple way to get the full mapping between GO terms and PDB IDs. Here is the procedure.

Go to PDB advanced search
Select PDB IDs
Paste all query IDs
Click on “Evaluate query”
Select “GO hits” on the result page

Then you will see all the GO term hits. This still does not tell you which PDB ID is mapped to which GO term, but it serves my needs for the moment.

If anyone knows the magic mapping, I’ll be more than happy to hear it.

zscore calculation in R

Monday, June 1st, 2009

z <- (x1 - mean(x)) / sd(x)

Cumulative probability of the z-score:

pnorm(z)

test the hypothesis of normality in R

Wednesday, March 18th, 2009

library(nortest)
lillie.test(x)
# the null hypothesis is that x is drawn from a normal distribution.

finding common elements in R

Thursday, November 6th, 2008

I used to write a lengthy function to locate the common elements of two arrays. It turns out there is a much simpler solution:

id <- which(array1 %in% array2)
if (length(id) > 0) {
  comm <- array1[id]
} else {
  cat("No common elements!\n")
}

R could be paralleled

Saturday, November 1st, 2008

I just learned that R can do parallel calculation. Bingo! Here is the link.

Output references from pubmed

Thursday, October 16th, 2008

There is a quick way to go through all newly published papers from a selected journal and also download the citations.
Go to
http://www.ncbi.nlm.nih.gov/sites/entrez

Then type in, for example,

“BMC Bioinformatics” [journal]
“PLOS Computational Biology” [journal]
“Nat Struct Mol Biol” [journal]
“Cell” [journal]

and select sort by pub date. You will then get all the papers published in that journal, sorted by publication date (latest first). Choose display “Abstract” so that you can read the abstracts and select whichever papers interest you, then use display “Medline” and send to “file”, save the file, and done!

view a certain line of all selected files

Tuesday, October 14th, 2008

Peter taught me a very useful tip to display a certain line of all selected files.

Suppose I want to see the 10th line of all cpp files:

bash
for f in *.cpp; do head -n 10 "$f" | tail -n 1; done
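
One caveat with the head/tail loop: for a file shorter than 10 lines it prints that file's last line instead of nothing. As a rough sketch of an alternative (not what Peter showed me), awk can pick the line by number and also say which file it came from. The function name nth_line is made up for this example.

```shell
# nth_line N FILE...: print line N of each FILE, prefixed with the file name.
# nth_line is a hypothetical helper name, not part of the original tip.
nth_line() {
    n=$1
    shift
    # FNR is the line number within the current file, so the test matches
    # line N of every file; files with fewer than N lines print nothing.
    awk -v n="$n" 'FNR == n { print FILENAME ": " $0 }' "$@"
}

# Usage: nth_line 10 *.cpp
```

Called as nth_line 10 *.cpp, this should give one "file: line" pair per file that actually has a 10th line.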

Live to learn. 🙁

Label x and y axis using Hershey symbols in R

Monday, September 22nd, 2008

Here is another very good trick that I just found in R: symbols in the x and y axis labels can be plotted this way:


x <- seq(0, 2, by=0.01)
y <- sin(x*2*pi)
plot(x, y, xlab="", ylab="")
par(xpd=TRUE)
text(floor(mean(x)), -1.5, "\\*w", vfont=c("serif","plain"))
par(srt=90)
text(-0.4, mean(y), "sin(\\*w)", vfont=c("serif","plain"))