Linear Regression and Correlation
When analyzing data on a scatter plot, we often want to know whether the two variables are related. Linear regression models this relationship with a straight line, while correlation measures how strong that linear relationship is and in which direction it runs.
The Line of Best Fit
A line of best fit (or trend line) is a straight line drawn through the middle of the data points on a scatter plot so that it represents the general trend of the data. When drawing one by eye, two guidelines help:
- The line should follow the general direction of the data cluster.
- It should have roughly an equal number of points above and below it.
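The guidelines above describe fitting a line by eye. A common way to compute such a line is the least-squares method, which this lesson does not cover in detail; the sketch below is included only as an illustration. The function name fit_line and the sample data are hypothetical, chosen so the points lie exactly on y=2.5x+4.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2.5x + 4
xs = [0, 2, 4, 6]
ys = [4, 9, 14, 19]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real, scattered data the computed line will not pass through every point, but it should still follow the direction of the data cluster.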
Once you have a line of best fit, you can use its equation to make predictions.
Example: Suppose the line of best fit for a scatter plot has the equation y=2.5x+4. To predict the value of y when x=10, substitute 10 for x: y=2.5(10)+4=25+4=29.
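The same substitution can be sketched in a few lines of Python. The function name predict is an assumption made for illustration; the default slope and intercept come from the example equation y=2.5x+4.

```python
def predict(x, slope=2.5, intercept=4.0):
    """Predict y from x using the line y = slope * x + intercept."""
    return slope * x + intercept


print(predict(10))  # 29.0
```

Passing a different slope or intercept lets the same function work for any line of best fit.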
The Correlation Coefficient (r)
The correlation coefficient, denoted by the letter r, measures both the strength and the direction of a linear relationship between two variables.
The value of r always falls in the range of −1 to 1: −1≤r≤1
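The lesson does not state how r is calculated. One standard definition is the Pearson correlation coefficient: the sum of the products of the deviations of x and y from their means, divided by the square root of the product of their summed squared deviations. The sketch below assumes that definition; the function name pearson_r and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant
    return sxy / math.sqrt(sxx * syy)


# Points exactly on a rising line and on a falling line hit the two extremes
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
# Scattered points give a value strictly between the extremes
scattered = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(rising, falling, scattered)
```

Only data that fall exactly on a sloped straight line reach the endpoints 1 or −1; any scatter pulls r toward 0.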
Direction
- Positive Correlation (r>0): As x increases, y increases. The line of best fit slopes upward.
- Negative Correlation (r<0): As x increases, y decreases. The line of best fit slopes downward.
Strength
- Strong Relationship: The closer r is to 1 or −1, the tighter the data points cluster around the line of best fit.
- Weak Relationship: The closer r is to 0, the more scattered the points are.
- No Linear Relationship: If r=0, there is no linear correlation at all. The variables may still be related in a nonlinear way, for example along a curve.
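Because r measures only linear association, a value of 0 does not by itself rule out every kind of pattern. The sketch below illustrates this with hypothetical data lying exactly on the curve y=x², using the Pearson definition of r (an assumption, since the lesson does not give a formula). The helper name linear_r is also hypothetical.

```python
import math


def linear_r(xs, ys):
    """Pearson correlation coefficient, written compactly."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


# y is completely determined by x (y = x squared), yet the trend is not a line
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r = linear_r(xs, ys)
print(r)
```

Here the falling left half of the curve cancels the rising right half, so r comes out as 0 even though the points follow a clear pattern. Checking the scatter plot alongside r avoids this kind of misreading.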
Example: Interpreting r
Problem: Interpret a correlation coefficient of r=−0.87.
Solution:
- Look at the sign: The negative sign means there is a negative correlation (as one variable goes up, the other goes down).
- Look at the number: The absolute value, 0.87, is close to 1, so the points cluster tightly around the line of best fit.
Therefore, r=−0.87 indicates a strong, negative linear relationship between the two variables.
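The two-step reading above (sign first, then size) can be sketched as a small function. The cutoffs 0.7 and 0.3 are assumptions chosen for illustration; the lesson does not define exact boundaries between strong, moderate, and weak, and different textbooks use different values. The function name interpret_r is likewise hypothetical.

```python
def interpret_r(r, strong=0.7, moderate=0.3):
    """Describe r in words. The cutoffs are illustrative, not standard."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear relationship"
    # Step 1: the sign gives the direction
    direction = "positive" if r > 0 else "negative"
    # Step 2: the absolute value gives the strength
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength}, {direction} linear relationship"


print(interpret_r(-0.87))  # strong, negative linear relationship
```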