The chi squared test

The chi squared test, based on the $\chi^2$ statistic, is a test carried out on a contingency table - a table which records how many items fall into each combination of categories, for example:

School Pass Failed
School A 60 62
School B 40 56
School C 70 98

By looking at these statistics you might jump to the conclusion that one school is better than another based on the number of pupils that passed. The chi squared test aims to provide a way for you to test whether the differences in the results are significant enough for that conclusion to be justified.

Calculating the test statistic

The first step is to add a total column and a total row and fill in the cells:

School Pass Failed Total
School A 60 62 122
School B 40 56 96
School C 70 98 168
Total 170 216 386
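
As a rough sketch, the totals can be checked with a few lines of Python. The variable names (`observed`, `row_totals` and so on) are illustrative choices, not part of the original notes.

```python
# Observed counts from the contingency table: (passed, failed) per school.
observed = {
    "School A": (60, 62),
    "School B": (40, 56),
    "School C": (70, 98),
}

# Row totals: one total per school.
row_totals = {school: sum(counts) for school, counts in observed.items()}

# Column totals: add up each column across all schools.
column_totals = [sum(column) for column in zip(*observed.values())]

# Grand total: every pupil in the table.
grand_total = sum(row_totals.values())

print(row_totals)     # {'School A': 122, 'School B': 96, 'School C': 168}
print(column_totals)  # [170, 216]
print(grand_total)    # 386
```

A useful check when working by hand is that the row totals and the column totals both add up to the same grand total.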

The next step is to calculate the expected frequency of each cell in the original contingency table. This is the number of pupils you would expect to pass or fail if there were no link between the school and the pass rate.

The expected frequency of a table cell is given by:

$$\text{expected frequency} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

So, for example the expected frequency of passes for school A is:

$$\frac{122 \times 170}{386} \approx 53.7$$

You can work out the rest of the cells in the table in the same way. TIP: The row totals will not change, so you don't have to calculate every cell from scratch. Once you have one cell in a row, you can find the other by subtracting it from the row total.

The table of expected frequencies can now be drawn up:

School Pass Failed Total
School A 53.7 68.3 122
School B 42.3 53.7 96
School C 74 94 168
Total 170 216 386
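
The expected frequencies can be reproduced with a short Python sketch, assuming the rule (row total × column total) / grand total for each cell. The names used here are illustrative only.

```python
# Observed counts: one row per school, columns are (passed, failed).
observed = [
    [60, 62],  # School A
    [40, 56],  # School B
    [70, 98],  # School C
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of a cell = row total * column total / grand total.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(value, 1) for value in row])
# [53.7, 68.3]
# [42.3, 53.7]
# [74.0, 94.0]
```

Each row of expected frequencies still adds up to the original row total, which is why the tip above works.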

It's time for another table - but it's the last one! Next you want to ditch your totals and return to the layout of the original contingency table. This time, however, the value in each cell is given by the equation:

$$\frac{(f_o - f_e)^2}{f_e}$$

Where $f_o$ is the observed frequency for that cell, i.e. the value from the original contingency table, and $f_e$ is the expected frequency, i.e. the value from the corresponding cell in the table of expected frequencies you previously drew.

So, for example, the value for the first cell - the passes for School A - is calculated using the observed frequency, 60, from the original table and the expected frequency, 53.7, from the table you just calculated. Hence, the value is:

$$\frac{(60 - 53.7)^2}{53.7} \approx 0.739$$

Calculating the other values:

School Pass Failed
School A 0.739 0.581
School B 0.125 0.0985
School C 0.216 0.17

The chi squared test statistic is now the sum of all these values:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 0.739 + 0.581 + 0.125 + 0.0985 + 0.216 + 0.17 = 1.9295$$
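
The cell values and their sum can be checked with a minimal Python sketch. It assumes the expected frequencies rounded to one decimal place, as in the table of expected frequencies, and the variable names are illustrative.

```python
# Observed counts: one row per school, columns are (passed, failed).
observed = [
    [60, 62],  # School A
    [40, 56],  # School B
    [70, 98],  # School C
]

# Expected frequencies, rounded to 1 d.p. as in the hand calculation.
expected = [
    [53.7, 68.3],
    [42.3, 53.7],
    [74.0, 94.0],
]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [
    [(f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed_row, expected_row)]
    for observed_row, expected_row in zip(observed, expected)
]

# The test statistic is the sum of all the cell contributions.
chi_squared = sum(sum(row) for row in contributions)

print(round(chi_squared, 2))  # 1.93
```

The result differs from 1.9295 only in the later decimal places, because the hand calculation rounds each cell before adding. Keeping the expected frequencies unrounded gives a slightly smaller value, roughly 1.91, which leads to the same conclusion.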

Whilst the maths involved in calculating the chi squared test statistic isn't particularly hard, it is time consuming. As a result you can expect any question involving the chi squared test in the exam to be worth 10-12 marks.

Hypothesis testing

As with any hypothesis test, you need to compare your value for the chi squared statistic with the critical value found in the tables booklet. However, before you can find the critical value you need to find the 'degrees of freedom', denoted by $\nu$. This is found using the equation:

$$\nu = (m - 1)(n - 1)$$

Where m is the number of rows in the original contingency table and n is the number of columns. Using the example above, the value for $\nu$ is:

$$\nu = (3 - 1)(2 - 1) = 2$$

From the tables of critical values, the critical value for the chi squared test statistic at the 5% significance level for 2 degrees of freedom is 5.991. As the chi squared test statistic, 1.9295, is much less than the critical value, we do not reject the null hypothesis: there is no significant evidence of an association between the school and the number of students that pass.
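
The final comparison can be sketched in Python as follows. The closed-form check is an addition that is not in the notes above: for exactly 2 degrees of freedom the upper tail probability of the chi squared distribution is exp(-x / 2), so the 5% critical value is -2 ln(0.05). For any other number of degrees of freedom you would still need the tables. Variable names are illustrative.

```python
import math

# Size of the original contingency table, totals excluded.
rows, columns = 3, 2
degrees_of_freedom = (rows - 1) * (columns - 1)

chi_squared = 1.9295    # test statistic from the worked example
critical_value = 5.991  # 5% level, 2 degrees of freedom, from the tables

# Special case, valid only for 2 degrees of freedom:
# P(X > x) = exp(-x / 2), so the 5% critical value is -2 * ln(0.05).
closed_form_critical = -2 * math.log(0.05)

# Reject the null hypothesis only if the statistic exceeds the critical value.
reject_null = chi_squared > critical_value

print(degrees_of_freedom)              # 2
print(round(closed_form_critical, 3))  # 5.991
print(reject_null)                     # False
```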