ELE 4120 Bioinformatics
Tutorial 3

Content
• Dot (Matrix) Plots
• Simple Alignments
• Gaps
  – Simple gap penalties
  – Orientation and length penalties

Dot Plots (1)
• Evaluating similarity between 2 sequences
• Window size: the number of nucleotides compared at each step (usually an odd number)
• Stringency: the minimum number of nucleotides in the window that must match for a dot to be placed
• Mismatch limit: the maximum number of nucleotides in the window that may mismatch while a dot is still placed
• Mismatch limit = Window size - Stringency
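
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `dot_plot` and `render` are hypothetical, and the sketch assumes that the dot is placed at the centre position of the two windows being compared (one reason an odd window size is convenient). The sequences in the usage lines are TGACCATGG (horizontal) and GGTACCAGC (vertical).

```python
def dot_plot(seq_x, seq_y, window=5, stringency=3):
    """Return the set of (row, col) dot positions, 0-indexed.

    seq_x runs along the horizontal axis, seq_y along the vertical axis.
    Assumption: a dot is placed at the centre of each pair of windows
    whose number of matching positions is at least `stringency`.
    """
    half = window // 2
    dots = set()
    for j in range(len(seq_y) - window + 1):        # window start in seq_y
        for i in range(len(seq_x) - window + 1):    # window start in seq_x
            win_x = seq_x[i:i + window]
            win_y = seq_y[j:j + window]
            matches = sum(a == b for a, b in zip(win_x, win_y))
            if matches >= stringency:               # mismatches <= window - stringency
                dots.add((j + half, i + half))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the dot plot as plain text, one row per base of seq_y."""
    lines = ["  " + " ".join(seq_x)]
    for r, base in enumerate(seq_y):
        cells = ["●" if (r, c) in dots else " " for c in range(len(seq_x))]
        lines.append((base + " " + " ".join(cells)).rstrip())
    return "\n".join(lines)


dots = dot_plot("TGACCATGG", "GGTACCAGC", window=5, stringency=3)
print(render("TGACCATGG", "GGTACCAGC", dots))
```

A diagonal run of dots in the printed grid indicates a region where the two sequences are similar. Raising the stringency or shrinking the mismatch limit removes isolated dots at the cost of missing weaker similarities.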
Dot Plots (2)
• Example 1: Compare the following sequences, with window size = 5, stringency = 3
  AGAGACTC
  AGAGTGTG
[Figure: step-by-step dot plot with AGAGACTC on the horizontal axis and AGAGTGTG on the vertical axis. Each panel shows one window comparison and its match count (4 matches, 0 match, 4 matches, 0 match); a dot is placed wherever the count reaches the stringency.]

Dot Plots (3)
[Figure: continuation of Example 1. Further window comparisons (0 match, 0 match, 3 matches), followed by the final answer: the completed dot plot.]
Dot Plots (4)
• Example 2: Compare the following sequences and find the regions of similarity between the two sequences (window size = 5, stringency = 3)
  TGACCATGG
  GGTACCAGC
• Dot plot:

    T G A C C A T G G
  G
  G
  T
  A     ●
  C       ●
  C         ●
  A           ●
  G
  C

• The diagonal run of dots marks the region of similarity between the two sequences
Simple Alignments
• Homologs: sequences that share a common ancestor
• 3 possible changes can occur in a sequence
  – Mutation
  – Addition of one or more positions (changes the sequence length)
  – Deletion of one or more positions (changes the sequence length)
Simple Alignments: No gap (1)
• For 2 sequences with different lengths, where no gap is inserted
• Obtain the optimal alignment by sliding the shorter sequence along the longer one and scoring each alignment
• Scoring function: e.g. match score = 1, mismatch score = 0
• Example: two sequences with different lengths
  CGTTAGA
  CGTAC
• How many possible alignments are there? What is the optimal alignment if match score = 1 and mismatch score = 0?
Simple Alignments: No gap (2)
• Answer: There are 3 different alignments

  CGTTAGA
  CGTAC        Score = 3 (optimal alignment)

  CGTTAGA
   CGTAC       Score = 2

  CGTTAGA
    CGTAC      Score = 0
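
The sliding procedure can be sketched as follows. This is an illustrative sketch only: the function name `ungapped_alignments` is hypothetical, and it assumes the shorter sequence always stays fully inside the longer one (no overhang), which is what yields 3 alignments for CGTTAGA and CGTAC.

```python
def ungapped_alignments(long_seq, short_seq, match=1, mismatch=0):
    """Score every ungapped placement of short_seq along long_seq.

    Returns a list of (offset, score) pairs, where offset is the number of
    positions the shorter sequence has been shifted to the right.
    Assumption: the shorter sequence never overhangs the longer one.
    """
    results = []
    for offset in range(len(long_seq) - len(short_seq) + 1):
        segment = long_seq[offset:offset + len(short_seq)]
        score = sum(match if a == b else mismatch
                    for a, b in zip(segment, short_seq))
        results.append((offset, score))
    return results


results = ungapped_alignments("CGTTAGA", "CGTAC")
best_offset, best_score = max(results, key=lambda pair: pair[1])

for offset, score in results:
    print("CGTTAGA")
    print(" " * offset + "CGTAC", "  score =", score)
print("optimal offset:", best_offset, "with score", best_score)
```

With sequences of lengths m and n (m ≥ n) there are m - n + 1 such placements, so the exhaustive search is cheap when no gaps are allowed.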
Gaps
• Gaps are added so that the 2 sequences have equal length
• Allowing gaps gives more possible alignments than simple (ungapped) alignment
• A gap penalty is added to the scoring function, which then consists of the gap penalty, the match score and the mismatch score
• The gap penalty is usually lower than both the match score and the mismatch score
Simple Gap Penalty
• Example
  A T G C C A T
  A T - T C - -
• Calculate the score of the above alignment with gaps if gap penalty = -1, match score = 1, mismatch score = 0
• Answer:
  Number of matches = 3 → score 3
  Number of mismatches = 1 → score 0
  Number of gaps = 3 → score -3
  Score of alignment = 3 + 0 - 3 = 0
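
The same calculation can be checked with a short script. This is a sketch under stated assumptions: the function name `score_gapped_alignment` is hypothetical, gaps are written as "-", and every column containing a gap is charged the same fixed penalty (the simple gap penalty scheme, with no extra cost for gap length or position).

```python
def score_gapped_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two aligned, equal-length strings under a simple gap penalty.

    Assumption: '-' denotes a gap, and each gapped column costs `gap`
    regardless of how many gaps are adjacent to it.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    score = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["gap"] * gap)
    return score, counts


score, counts = score_gapped_alignment("ATGCCAT", "AT-TC--")
print(counts)             # per-column tallies
print("score =", score)
```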