goal is to take a snapshot of the data and find the approximation that models it best while condensing it down to fewer dimensions
find the best fit of the data points by measuring their projected distances against a candidate line of best fit
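a rough sketch of that search, assuming "projected distance" means the perpendicular distance from each point to the line, the data is already mean-centered so every candidate line passes through the origin, and a brute-force scan over angles is good enough. The points and the helper names (`sum_sq_perp_dist`, `best_line_angle`) are made up for illustration, not taken from these notes:

```python
import math

def sum_sq_perp_dist(points, angle):
    """Sum of squared perpendicular distances from the points to a line
    through the origin with direction (cos(angle), sin(angle))."""
    ux, uy = math.cos(angle), math.sin(angle)
    total = 0.0
    for x, y in points:
        # Component of (x, y) perpendicular to the line direction.
        perp = -x * uy + y * ux
        total += perp * perp
    return total

def best_line_angle(points, steps=1800):
    """Scan candidate angles in [0, pi) and keep the one with the
    smallest total squared distance (assumed search strategy)."""
    candidates = [math.pi * k / steps for k in range(steps)]
    return min(candidates, key=lambda a: sum_sq_perp_dist(points, a))

# Made-up, mean-centered points lying on the line y = x.
points = [(-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
angle = best_line_angle(points)
print(math.degrees(angle))  # close to 45 degrees for this data
```

a real implementation would solve for the direction directly instead of scanning angles; the scan is only meant to show what "try a line, measure the distances, keep the best" looks like.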
now the data has been reduced from 2d to 1d, so you can apply statistics on top of it: mean, variance
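a small sketch of the 2d to 1d step, assuming the line passes through the origin at a known angle and that variance means the population variance (divide by n). The points and function names are hypothetical:

```python
import math

def project_onto_line(points, angle):
    """Return each point's 1d coordinate along a line through the
    origin with direction (cos(angle), sin(angle))."""
    ux, uy = math.cos(angle), math.sin(angle)
    return [x * ux + y * uy for x, y in points]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Made-up 2d points on the line y = x, projected onto that same line.
points = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
coords = project_onto_line(points, math.pi / 4)

print(coords)            # three 1d numbers instead of three 2d points
print(mean(coords))      # close to 0
print(variance(coords))  # close to 4/3
```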
variance for 1d is easy, what about 2d?
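Covariance is the sum of the products of the coordinates (x * y for each point, measured from the mean), divided by the number of points
for variance it was the sum of the squared distances; for covariance it is the sum of the products, averaged the same way, e.g. (2 + 0 + 2)/3 = 4/3 as your mean of sorts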
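a minimal sketch of that calculation, assuming the population form (divide by n). The notes do not say which points produce the products 2, 0 and 2, so the three points below are hypothetical ones picked to reproduce that arithmetic:

```python
def covariance(points):
    """Average of the products of the mean-centered x and y values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in points) / n

# Hypothetical points: already centered, products are 2, 0 and 2.
points = [(-2.0, -1.0), (0.0, 0.0), (2.0, 1.0)]
products = [x * y for x, y in points]
cov = covariance(points)

print(products)  # [2.0, 0.0, 2.0]
print(cov)       # (2 + 0 + 2) / 3
```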
the sign of the covariance tells the types apart and ultimately decides the type of correlation: positive means y grows as x grows, negative means y decreases as x grows
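a quick sketch of reading the sign, assuming population covariance; both data sets and the `correlation_type` helper are made up for illustration:

```python
def covariance(points):
    """Average of the products of the mean-centered x and y values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in points) / n

def correlation_type(points):
    """Label the data by the sign of its covariance."""
    cov = covariance(points)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    # Exactly zero is rare with real floating-point data.
    return "zero covariance"

rising = [(1.0, 1.0), (2.0, 2.0), (3.0, 4.0)]   # y grows with x
falling = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)]  # y shrinks as x grows

print(correlation_type(rising))   # positive
print(correlation_type(falling))  # negative
```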