Linear Regression and Correlation

When analyzing data on a scatter plot, we often want to know whether there is a relationship between the two variables. Linear regression models this relationship with a straight line, while correlation tells us how strong that relationship is and in which direction it runs.

The Line of Best Fit

A line of best fit (or trend line) is a straight line drawn through the center of the data points on a scatter plot. It best represents the general trend of the data.

  • The line should follow the general direction of the data cluster.
  • It should have roughly an equal number of points above and below it.
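
The guidelines above describe a line drawn by eye. One common way to compute such a line from the data is ordinary least squares, which the text does not name, so treat the following as a minimal sketch under that assumption. The function name `fit_line` and the sample data are made up for illustration; the points were chosen to lie exactly on $y = 2.5x + 4$ so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Assumption: "best fit" means ordinary least squares, i.e. the line that
    minimizes the sum of squared vertical distances to the points.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations in x, and sum of cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2.5x + 4.
xs = [1, 2, 3, 4, 5]
ys = [6.5, 9.0, 11.5, 14.0, 16.5]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.5 4.0
```

With real, scattered data the fitted line will not pass through every point, but it should still follow the direction of the cluster as described above.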

Once you have a line of best fit, you can use its equation to make predictions.

Example: Suppose you draw a line of best fit for a scatter plot and determine that its equation is $y = 2.5x + 4$. To predict the value of $y$ when $x = 10$, substitute $10$ for $x$: $y = 2.5(10) + 4 = 25 + 4 = 29$
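
The same substitution can be written as a small function. This is only a sketch: the name `predict` and its default arguments are illustrative choices, with the slope $2.5$ and intercept $4$ taken from the example equation.

```python
def predict(x, slope=2.5, intercept=4.0):
    """Evaluate the line y = slope * x + intercept at the given x."""
    return slope * x + intercept


print(predict(10))  # 29.0
```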

The Correlation Coefficient ($r$)

The correlation coefficient, denoted by the letter $r$, measures both the strength and the direction of a linear relationship between two variables.

The value of $r$ always falls in the range of $-1$ to $1$: $-1 \le r \le 1$
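
The text does not give a formula for $r$. A minimal sketch, assuming $r$ means the standard Pearson correlation coefficient, is shown here; the function name `correlation` and the sample data are hypothetical.

```python
import math


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r of paired data.

    Assumption: r is the Pearson coefficient,
    r = S_xy / sqrt(S_xx * S_yy), built from deviations about the means.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    return sxy / math.sqrt(sxx * syy)


# Hypothetical data in which y falls as x rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]
print(round(correlation(xs, ys), 2))  # -0.98
```

Whatever data is supplied, the result stays within $-1 \le r \le 1$, and it reaches $1$ or $-1$ only when every point lies exactly on a straight line.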

Direction

  • Positive Correlation ($r > 0$): As $x$ increases, $y$ increases. The line of best fit slopes upward.
  • Negative Correlation ($r < 0$): As $x$ increases, $y$ decreases. The line of best fit slopes downward.

Strength

  • Strong Relationship: The closer $r$ is to $1$ or $-1$, the tighter the data points cluster around the line of best fit.
  • Weak Relationship: The closer $r$ is to $0$, the more scattered the points are.
  • No Linear Relationship: If $r = 0$, there is no linear correlation at all, although the variables may still be related in a nonlinear way.

Example: Interpreting $r$

Problem: Interpret a correlation coefficient of $r = -0.87$.

Solution:

  1. Look at the sign: The negative sign means there is a negative correlation (as one variable goes up, the other goes down).
  2. Look at the number: The absolute value $0.87$ is very close to $1$.

Therefore, $r = -0.87$ indicates a strong, negative linear relationship between the two variables.
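
The two steps of this solution (check the sign, then check the size) can be sketched in code. The text gives no numeric boundary between "strong" and "weak", so the cutoff of $0.7$ used here is purely an assumption, exposed as a parameter so it can be changed; the function name `describe_r` is likewise hypothetical.

```python
def describe_r(r, strong_cutoff=0.7):
    """Describe a correlation coefficient in words.

    Assumption: |r| >= strong_cutoff counts as "strong" and anything
    smaller as "weak". The 0.7 default is an illustrative convention,
    not a value taken from the lesson.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"

    # Step 1: the sign gives the direction.
    direction = "positive" if r > 0 else "negative"
    # Step 2: the absolute value gives the strength.
    strength = "strong" if abs(r) >= strong_cutoff else "weak"

    return f"{strength}, {direction} linear relationship"


print(describe_r(-0.87))  # strong, negative linear relationship
```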