Machine Learning Overview (page 9 of 13)

What happens if an observation is contained in both the training and testing data sets? This results in data leakage, which means our training and testing data sets are contaminated (analogous to contaminated drinking water). The model has already seen the shared observation during training, so its score on the testing data no longer tells us how it will perform on data it has never seen. We cannot train and test a model on the same data; even a single observation contained in both data sets invalidates the process.
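
One way to check for this kind of contamination is to look for observations that appear in both data sets. The sketch below is only an illustration, not a method prescribed by this course: the function name `find_leaked_rows` and the toy data are hypothetical, and it assumes each observation is fully identified by its values. In a real data set, comparing on a unique record ID would usually be more reliable.

```python
def find_leaked_rows(train_rows, test_rows):
    """Return the observations that appear in both data sets."""
    train_seen = set(train_rows)
    leaked = []
    for row in test_rows:
        # Report each shared observation once, in test-set order
        if row in train_seen and row not in leaked:
            leaked.append(row)
    return leaked


# Hypothetical toy data: (age, income, label)
train = [(25, 40000, 0), (31, 52000, 1), (47, 88000, 1)]
test = [(31, 52000, 1), (22, 31000, 0)]

leaked = find_leaked_rows(train, test)
print(leaked)  # [(31, 52000, 1)]
```

An empty list means no observation is shared under this assumption. Any non-empty result means the split should be redone before training.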

Important Note: If you search the internet for "data leakage", you will find several websites that use the term interchangeably with target leakage (explained in a few pages). In this course, I will use the terms data leakage and target leakage to refer to two distinct issues.
