In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function can be expected to make accurate predictions for new samples drawn from that same distribution. However, if the training data is not an unbiased sample, then the distribution of the training data will differ from the distribution of the test data. Standard classifiers do not account for such changes in data distribution between the training and test phases, and can then perform poorly. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, I present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? I will start with a brief introduction to risk minimization, and show how transfer learning and domain adaptation expand upon this framework. Following that, I discuss three common, simple types of dataset shift, namely prior, covariate and concept shift. For more complex domain shifts, a wide variety of approaches exists. These are categorized into importance-weighting, subspace mapping, domain-invariant projections, feature augmentation, minimax estimators and robust algorithms. Several issues arise along the way, which I discuss in the last section. I conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
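To make the link between risk minimization and importance-weighting concrete: under covariate shift, where p(y | x) is shared between domains and the target marginal p_T(x) is zero wherever the source marginal p_S(x) is zero, the target risk of a classifier h can be written as a reweighted source expectation, R_T(h) = E_{x ~ p_S}[w(x) l(h(x), y)] with w(x) = p_T(x) / p_S(x). The sketch below is illustrative only and rests on stated assumptions. Both marginals are assumed to be known one-dimensional Gaussians (in practice w(x) must be estimated), and the labeling rule and the classifier are hypothetical. It checks that averaging the 0-1 loss over source samples estimates the source risk, while the weighted average recovers the target risk.

```python
import math
import random

# Assumed domains: source x ~ N(0, 1), target x ~ N(2, 1) (hypothetical choice).
SOURCE_MEAN, TARGET_MEAN = 0.0, 2.0


def gaussian_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_label(x):
    # Hypothetical labeling rule shared by both domains (covariate shift).
    return int(x > 1.0)


def classifier(x):
    # Hypothetical fixed classifier whose threshold is misplaced.
    return int(x > 0.0)


def zero_one_loss(x):
    return float(classifier(x) != true_label(x))


def importance_weight(x):
    # w(x) = p_T(x) / p_S(x); assumed known here, estimated in practice.
    return gaussian_pdf(x, TARGET_MEAN) / gaussian_pdf(x, SOURCE_MEAN)


def estimate_risks(n=200_000, seed=0):
    # Draw labeled data from the source domain only.
    rng = random.Random(seed)
    xs = [rng.gauss(SOURCE_MEAN, 1.0) for _ in range(n)]
    naive = sum(zero_one_loss(x) for x in xs) / n
    weighted = sum(importance_weight(x) * zero_one_loss(x) for x in xs) / n
    return naive, weighted


def exact_risk(mean):
    # The classifier errs exactly on (0, 1], so the risk is P(0 < X <= 1).
    return normal_cdf(1.0 - mean) - normal_cdf(0.0 - mean)


if __name__ == "__main__":
    naive, weighted = estimate_risks()
    print(f"unweighted source estimate:   {naive:.3f}")
    print(f"importance-weighted estimate: {weighted:.3f}")
    print(f"exact source risk:            {exact_risk(SOURCE_MEAN):.3f}")
    print(f"exact target risk:            {exact_risk(TARGET_MEAN):.3f}")
```

The unweighted average stays close to the source risk, which is the quantity a standard classifier implicitly minimizes. The weighted average tracks the target risk instead. Reweighting only corrects for a change in p(x). It would not help under concept shift, where p(y | x) itself changes.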