Architectures Enabling Scalable Internet Search

Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of the Rheinisch-Westfälische Technische Hochschule Aachen for the award of the academic degree of Doctor of Natural Sciences

submitted by
Diplom-Informatiker Axel Uhl
from Hüttental, now Siegen

Reviewers:
Universitätsprofessor Dr. Horst Lichter
Universitätsprofessor Dr. Boudewijn Haverkort

Date of the oral examination: 4 December 2003

This dissertation is available online on the web pages of the university library (Hochschulbibliothek).

Abstract

The vast amount of Internet content becomes manageable mainly by means of search engines, which let users enter queries into a web form and receive a list of matches that refer to Internet content elements, such as the URLs identifying matching HTML pages. However, the quality of these search engines suffers from two conceptual problems. First, the content volume grows faster than the bandwidth available to index it. Second, a large and growing share of the content is “hidden” in the deep web, e.g. behind HTML forms, which makes it hard for search engines to reach and index.

The work presented here shows that these problems can be overcome if the paradigm of Internet search is reversed: content providers have to assist in making their content searchable. This leads to a distributed architecture that scales better than the central approach that current search engines implement, and that makes the deep web searchable.