Bioinformatic Approaches forGenome FinishingPh. D. Thesissubmitted to theFaculty of Technology,Bielefeld University, Germanyfor the degree of Dr. rer. nat.byPeter HusemannJuly, 2011Referees:Prof. Dr. Jens StoyePD Dr. Andreas TauchGedruckt auf alterungsbeständigem Papier nach DIN-ISO 9706.Printed on non-aging paper according to DIN-ISO 9706.To my brother.Contents1 Introduction 12 Sequencing Technologies – Biological Background 52.1 DNA – The Backbone of Life on Earth . . . . . . . . . . . . . . . . . . . 52.2 Technologies to Assess the Sequence of DNA . . . . . . . . . . . . . . . 82.2.1 Traditional: Sanger Sequencing . . . . . . . . . . . . . . . . . . . 82.2.2 The ‘Next’ Generation: Massively Parallel Sequencing . . . . . 92.2.3 Third Single Molecule Sequencing . . . . . . . . . 132.3 Genome Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.3.1 Shotgun Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . 152.3.2 Assembly Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.3.3 Genome Finishing . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Efficient Matching of Contigs 233.1 Finding Local Similarities . . . . . . . . . . . . . . . . . . . . . . . . . . 243.1.1 Smith-Waterman . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.1.2 Seed and Extend Heuristics . . . . . . . . . . . . . . . . . . . . . 273.1.3 Search Space Filtering . . . . . . . . . . . . . . . . . . . . . . . . 283.