WikiBench: A distributed, Wikipedia based web application benchmark

Master thesis by Erik-Jan van Baaren
Student number 1278967
erikjan@gmail.com

Under the supervision of: Guillaume Pierre, Guido Urdaneta

Vrije Universiteit Amsterdam
Department of Computer Science
May 13, 2009

Abstract

Many different, novel approaches have been taken to improve the throughput and scalability of distributed web application hosting systems and relational databases. Yet only a limited number of web application benchmarks are available. We present the design and implementation of WikiBench, a distributed web application benchmarking tool based on Wikipedia. WikiBench is a trace-based benchmark, able to create realistic workloads of thousands of requests per second against any system hosting the freely available Wikipedia data and software. We obtained completely anonymized, sampled access traces from the Wikimedia Foundation. We also created software that processes these traces to reduce the intensity of their traffic while preserving their most important properties, such as inter-arrival times and the distribution of page popularity. This makes WikiBench usable for both small-scale and large-scale benchmarks. Initial benchmarks reproduce a regular day of traffic with its ups and downs. Using median response times, we show the effects of increasing traffic intensity on our system under test.

Contents
1 Introduction
2 Related Work
2.1 TPC-W
...