• goal is to take a snapshot of the data and find the approximation that models it best while condensing it down to fewer dimensions

  • find the best fit for the data points by measuring each point's perpendicular (projected) distance to a candidate line of best fit

    • the projection of a point is the spot where the perpendicular dropped from the point meets the line
  • now, the data has been reduced from 2d to 1d; you can now apply statistics on top of this data: mean, variance

    • Variance: measures the distance of points from the mean/middle line, with that distance squared → if mean m = (a+b+c)/3, then variance = ((a-m)^2 + (b-m)^2 + (c-m)^2)/3
    • VARIANCE IS A MEASURE OF HOW SPREAD OUT A SET IS
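
The projection and variance steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full method: the function names, the sample points, and the choice of a line through the origin are all assumptions made for the example. It divides by the number of points, as the notes do.

```python
import math

def project_onto_line(points, direction):
    """Return each 2D point's 1D coordinate along a line through the origin.

    `direction` is any non-zero vector along the line (hypothetical helper).
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm          # unit vector along the line
    # The dot product with the unit vector is the signed position of the
    # perpendicular projection of the point on the line.
    return [x * ux + y * uy for x, y in points]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Average squared distance from the mean (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical sample points that happen to lie on the line y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
coords = project_onto_line(points, direction=(1.0, 1.0))

print(coords)            # the data, now 1D
print(mean(coords))      # middle of the projected points
print(variance(coords))  # how spread out the projected points are
```

With these sample points the squared distances from the mean are 2, 0 and 2, so the variance comes out to (2 + 0 + 2)/3.
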
  • variance for 1d is easy, what about 2d?

    • you can measure the x variance and the y variance independently, but this is flawed: two separate scalars can't tell apart datasets that have the same spread along each axis but a different orientation (they fail to take direction into account)
    • better idea is to use the product of coordinates → known as COVARIANCE
      • Covariance is the average of the products of each point's coordinates (x times y, measured from the mean)

        • before it was the average of the squared distances; now it is the average of the coordinate products, e.g. (2 + 0 + 2)/3

        • the sign of the covariance decides the type of correlation: positive means y tends to grow as x grows, negative means y decreases as x grows
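
The covariance idea above can be sketched as follows. This is a rough illustration: the function name and both sample datasets are assumptions, chosen so that the coordinate products are 2, 0 and 2 as in the notes. It divides by n to match the notes; the sample covariance would divide by n - 1 instead.

```python
def covariance(points):
    """Average product of the centered x and y coordinates (divides by n)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in points) / n

# Hypothetical datasets, already centered on the origin.
rising = [(-2.0, -1.0), (0.0, 0.0), (2.0, 1.0)]   # y grows as x grows
falling = [(-2.0, 1.0), (0.0, 0.0), (2.0, -1.0)]  # y shrinks as x grows

print(covariance(rising))   # positive: (2 + 0 + 2)/3
print(covariance(falling))  # negative: (-2 + 0 - 2)/3
```

Both datasets have the same x variance and the same y variance, so the two scalars alone cannot tell them apart. Only the sign of the covariance separates the rising set from the falling one.
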