# Australian Bureau of Statistics


Module 3: Interpreting Data

5.3 Scatter plots

5.3.4 The correlation coefficient, r

The strength of a linear association can be measured with a single number: the correlation coefficient. Correlation measures the strength and direction of a linear relationship. Rather than relying on subjective word descriptors such as "strong positive correlation", r gives a numerical measure.

The correlation coefficient, r, has a specific range of values: -1 ≤ r ≤ 1.

Note that:
• r never, ever lies outside this range, therefore r = 2 is a nonsense answer whose only explanation can be "I made an arithmetic error".
• r = 1 is perfect positive correlation and all the data points lie exactly on a straight line with positive gradient.
• r = -1 likewise is perfect negative correlation: all the data points lie exactly on a straight line with negative gradient.
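
The page itself estimates r by eye, but the notes above can be checked numerically. The sketch below assumes the standard Pearson formula, r = Sxy / sqrt(Sxx × Syy), which this page does not state; the function name `pearson_r` and the small data sets are made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up data, not taken from the page
x = [1, 2, 3, 4, 5]
up = [2 * v + 1 for v in x]      # points exactly on a line with positive gradient
down = [10 - 3 * v for v in x]   # points exactly on a line with negative gradient
noisy = [2, 1, 4, 3, 5]          # upward trend with some scatter

print(pearson_r(x, up))     # perfect positive correlation
print(pearson_r(x, down))   # perfect negative correlation
print(pearson_r(x, noisy))  # somewhere between 0 and 1
```

Whatever data are supplied, the result stays between -1 and 1, so a value such as r = 2 can only come from an arithmetic error.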

How do I find the r value for a data set?

Steps you need to follow:
1. draw the scatterplot;
2. draw the trend line which describes the direction of the data;
3. evaluate how closely the cloud of data points clusters around the line;
4. determine what r value and what word descriptor best suits the data cloud.

The following diagram has a number line of r values to help you assign the numbers and the word descriptors.

Consider the following examples of scatterplots.

These have the cloud of data points and a trend line fitted to show the direction of the data.

It would be helpful to memorise these so that you can describe your own data sets:

• zero correlation
• weak negative correlation, r = -0.3
• moderate positive correlation, r = 0.5
• moderate negative correlation, r = -0.6
• strong positive correlation, r = 0.8
• strong positive correlation, r = 0.95

And now consider some negative gradients:

• weak negative correlation, r = -0.40
• moderate negative correlation, r = -0.65
• moderate negative correlation, r = -0.75
• strong negative correlation, r = -0.85
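
The pairing of r values with word descriptors in the examples above can be sketched as a small lookup. The cut-offs used here (0.1, 0.5 and 0.8) are assumptions chosen only so that the function agrees with the listed examples; the page's number-line diagram may draw the boundaries differently, and `describe_r` is a hypothetical helper name.

```python
def describe_r(r):
    """Return a word descriptor for a correlation coefficient r.

    The boundaries 0.1, 0.5 and 0.8 are assumed, not taken from the page.
    """
    if not -1.0 <= r <= 1.0:
        # r can never lie outside this range
        raise ValueError("r must lie between -1 and 1; check the arithmetic")
    size = abs(r)
    if size < 0.1:
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

# Values taken from the example plots listed above
for r in (-0.3, 0.5, -0.6, 0.8, 0.95, -0.40, -0.65, -0.75, -0.85):
    print(r, describe_r(r))
```

A lookup like this is only a convenience; the descriptor should still be checked against how tightly the data cloud clusters around the trend line.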

Scenario (Moore, 1995)

Archaeopteryx is an extinct animal that possessed both scales and feathers and at one stage was thought to be the 'missing link' between lizards and birds. Only six fossil specimens exist and they vary greatly in size. As a result, there has been a lot of discussion about whether the fossils all belong to one species or to different species.

To help answer this question, the length (cm) of the femur (a leg bone) was plotted against the length of the humerus (a bone in the arm) on a scatter plot. Data were available for five of the specimens.

Comment: If the specimens belong to the same species and the differences in size are due to age, then the points should show a positive (but not necessarily linear) relationship. If any of the plotted points were an outlier from the bivariate pattern shown by the other points, this might suggest (but not prove) that the point represented a specimen from a different species.

Test your knowledge

Question: What does the scatterplot indicate?

Answer:
• No association, indicating that five separate species are present.
• An outlier can be observed, indicating the presence of two separate species.
• There is a strong positive association, indicating the specimens belong to one species.
• It is not possible to make a statement about the number of species present from the scatterplot.

Assets and Incomes for 20 US Banks (1973)

 Scenario 1969-1979 Assets and Liabilities of all Commercial Banks in the United States (H.8)

Test your knowledge

Question: How strong is the relationship?

Answer:
• No relationship.
• A bank's income can be predicted accurately from its assets.
• There is some relationship between assets and income for a bank.

TEST EXAMPLES
Estimate the strength of association (correlation coefficient) for the following scatterplots:

• If you said r = 0, that is a good estimate: the exact value is r = -0.08.
• If you said somewhere from r = 0.1 to r = 0.3, that is a good estimate: the exact value is r = 0.22.
• If you said somewhere from r = -0.3 to r = -0.5, that is a good estimate: the exact value is r = -0.45.
• If you said somewhere from r = 0.3 to r = 0.5, that is a good estimate: the exact value is r = 0.38.
• If you said somewhere from r = 0.8 to r = 0.9, that is a good estimate: the exact value is r = 0.87.
• If you said somewhere around r = -0.95, that is a good estimate: the exact value is r = -1.00 exactly.
• If you said somewhere from r = 0.5 to r = 0.7, that is a good estimate: the exact value is r = 0.63.
• If you said somewhere from r = -0.65 to r = -0.8, that is a good estimate: the exact value is r = -0.75.
