Global vs Local alignment

In bioinformatics, we always talk about alignment, either it is protein/RNA/DNA sequence alignment or sequence-structure alignment. There is always one topic that we need define at first. Which alignment, global or local alignment?

Global alignment by definition is to align two sequential items end to end. Within the context of the two are “similar”, then we look for specific similar/dissimilar parts if not the whole thing. If may be too many arbitrary matches if don’t consider such a requirement(end to end aligning). But it also has down-side, “artificially” forces some match/mismatch to achieve the “global-ness”. Parameters used for measuring the alignment then needs to be fine tuned very well to diminish influence of such force. Naturally, there is another thinking. Why not abandon the idea of global just look for best matched pieces? Then here is the local alignment.

There is subtle but profound difference between their implementation. In the end, they are nothing but Dynamics Programming. So we have a score matrix to measure substitution impact, and as well as gap penalty for gaps in the alignment. The global alignment could be based Needleman-Wunsch algorithm. And the local alignment could be implemented using Smith-Waterman algorithm. The core difference is that in N-W,

F(i,j) = max{F(i-1, j-1)+s(xi, xj), F(i-1,j)-d, F(i, j-1)-d},

but in S-W,

F(i,j) = max{F(i-1, j-1)+s(xi, xj), F(i-1,j)-d, F(i, j-1)-d, 0}.

Notice there is 0 in S-W.

Reference for this article is “Biological sequence analysis” by R.Durbin, S.Eddy, A.Krogh, and G.Mitchison.

So following the above, I’d like to mention that TMalign and CE, two of the programs used in structural biology are global and local alignment, respectively.

Comments are closed.