Home  |  About Us  |  Link To Us  |  FAQ  |  Contact

# Algorithm::NeedlemanWunsch 0.02

Date Added: February 01, 2010  |  Visits: 979

Algorithm::NeedlemanWunsch is a sequence alignment with configurable scoring. SYNOPSIS use Algorithm::NeedlemanWunsch; sub score_sub { if (!@_) { return -2; # gap penalty } return (\$_[0] eq \$_[1]) ? 1 : -1; } my \$matcher = Algorithm::NeedlemanWunsch->new(&score_sub); my \$score = \$matcher->align( @a, @b, { align => &on_align, shift_a => &on_shift_a, shift_b => &on_shift_b, select_align => &on_select_align }); Sequence alignment is a way to find commonalities in two (or more) similar sequences or strings of some items or characters. Standard motivating example is the comparison of DNA sequences and their functional and evolutionary similarities and differences, but the problem has much wider applicability - for example finding the longest common subsequence (that is, diff) is a special case of sequence alignment. Conceptually, sequence alignment works by scoring all possible alignments and choosing the alignment with maximal score. For example, sequences a t c t and t g a t may be aligned sequence A: a t c - t | | | sequence B: - t g a t or sequence A: - - a t c t | | sequence B: t g a t - - (and exponentially many other ways, of course). Note that Needleman-Wunsch considers global alignments, over the entire length of both sequences; each item is either aligned with an item of the other sequence, or corresponds to a gap (which is always aligned with an item - aligning two gaps wouldnt help anything). This approach is especially suitable for comparing sequences of comparable length and somewhat similar along their whole lengths - that is, without long stretches that have nothing to do with each other. If your sequences dont satisfy these requirements, consider using local alignment, which, strictly speaking, isnt Needleman-Wunsch, but is similar enough to be implemented in this module as well - see below for details. In the example above, the second alignment has more gaps than the first, but perhaps your as are structurally important and you like them lined up so much that youd still prefer the second alignment. Conversely, if c is "almost the same" as g, it might be the first alignment that matches better. Needleman-Wunsch formalizes such considerations into a similarity matrix, assigning payoffs to each (ordered, but the matrix is normally symmetrical so that the order doesnt matter) pair of possible sequence items, plus a gap penalty, quantifying the desirability of a gap in a sequence. A preference of pairings over gaps is expressed by a low (relative to the similarity matrix values, normally negative) gap penalty. The alignment score is then defined as the sum, over the positions where at least one sequence has an item, of the similarity matrix values indexed by the first and second item (when both are defined) and gap penalties (for items aligned with a gap). For example, if S is the similarity matrix and g denotes the gap penalty, the alignment sequence A: a a t t c c sequence B: a - - - t c has score S[a, a] + 3 * g + S[c, t] + S[c, c]. When the gap penalty is 0 and the similarity an identity matrix, i.e. assigning 1 to every match and 0 to every mismatch, Needleman-Wunsch reduces to finding the longest common subsequence. The algorithm for maximizing the score is a standard application of dynamic programming, computing the optimal alignment score of empty and 1-item sequences and building it up until the whole input sequences are taken into consideration. Once the optimal score is known, the algorithm traces back to find the gap positions. Note that while the maximal score is obviously unique, the alignment having it in general isnt; this modules interface allows the calling application to choose between different optimal alignments..

 Requirements: No special requirements Platforms: Linux Keyword: Algorithmneedlemanwunsch,  Alignment,  Gap,  Gap Penalty,  Libraries,  Needlemanwunsch,  Programming,  Score,  Sequence,  Sequence Alignment,  Similarity Matrix,  T C Users rating: 0/10