Institute of Computer Science
Machine Learning Group

Learning under Differing Training and Test Distributions

Dissertation submitted for the academic degree "doctor rerum naturalium" (Dr. rer. nat.) in the scientific discipline of Computer Science

Submitted to the Faculty of Mathematics and Natural Sciences of the University of Potsdam

by Steffen Bickel

Potsdam, 22 July 2009

Published online at the Institutional Repository of the University of Potsdam:
URL http://opus.kobv.de/ubp/volltexte/2009/3333/
URN urn:nbn:de:kobv:517-opus-33331 [http://nbn-resolving.org/urn:nbn:de:kobv:517-opus-33331]

Abstract

One of the main problems in machine learning is to train a predictive model from training data and to make predictions on test data. Most predictive models are constructed under the assumption that the training data are governed by exactly the same distribution to which the model will later be exposed. In practice, control over the data collection process is often imperfect. A typical scenario is when labels are collected by questionnaires and one does not have access to the test population. For example, parts of the test population are underrepresented in the survey, out of reach, or do not return the questionnaire. In many applications, training data from the test distribution are scarce because they are difficult to obtain or very expensive.