Wednesday, 14 March 2018

Chi-Square Test for Independence - Question 17 (A research team...)

Question 17
A research team investigated whether there was any significant correlation between the severity of a certain disease's progression and the age of the patients. During the study, data for n = 200 patients were collected and grouped according to the severity of the disease and the age of the patient. The table below shows the results.

Severity    Below 40    40 - 60    Above 60
Slight         41          34          9
Average        25          25         12
Serious         6          33         15
Let us decide whether the age of the patients and the severity of disease progression are related.

Solution Steps
As usual, we need to understand the problem and decide on which particular test to carry out.

In this case, since the question asks whether there is any significant correlation between severity and age, the appropriate test is the Chi-Square test for independence. The null hypothesis of this test is always that the two variables are independent, that is, 'there is no correlation between the age and the severity'. That is the hypothesis we are going to test.

Step 1: State the null and alternative hypotheses
H0: there is no significant correlation between the severity and the age (the two are independent)
Ha: there is a significant correlation between the severity and the age

Since we are going to use Excel to simplify solving this problem, I have transferred the table to MS Excel. This is shown in Table 1. You can get the completed Excel sheet from here

Table 1

Step 2: Calculate the totals
In this step we calculate the total for each row and each column, as well as the grand total. I have done this using Excel formulas, as you can see in Table 2.

Table 2
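If you would like to check the totals outside Excel, the same sums can be sketched in a few lines of Python. This is only a cross-check, not the method used in the sheet; the names observed, row_totals, col_totals and grand_total are my own illustrative choices.

```python
# Observed counts from the question: rows are severity, columns are age group.
observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]

# Row totals: sum across each severity row.
row_totals = [sum(row) for row in observed]

# Column totals: zip(*observed) transposes the table so each tuple is one age group.
col_totals = [sum(col) for col in zip(*observed)]

# The grand total should match the n = 200 stated in the question.
grand_total = sum(row_totals)

print("Row totals:   ", row_totals)
print("Column totals:", col_totals)
print("Grand total:  ", grand_total)
```

The grand total coming out at 200 is a quick sanity check that the table was typed in correctly.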

Step 3: Calculate the expected values
The expected values are calculated by multiplying the corresponding row total and column total and dividing by the grand total. For example, the first expected value, which corresponds to Slight and Below 40 (row total 84, column total 72, grand total 200), would be calculated as follows:

E = (84 * 72) / 200 = 30.24

Do this for all 9 observed values. I have used Excel to generate these values automatically, and they are shown in Table 3.

Table 3
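The expected-value rule (row total times column total, divided by the grand total) can also be sketched in Python as a cross-check on the spreadsheet. The variable names here are illustrative assumptions, not anything taken from the Excel sheet.

```python
# Observed counts: rows = slight, average, serious; columns = below 40, 40 - 60, above 60.
observed = [[41, 34, 9], [25, 25, 12], [6, 33, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

A useful property to verify is that the expected values add up to the same grand total as the observed values.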

Step 4: Calculate the Squared Difference (O - E)^2
Here O is the observed value from Table 2 and E is the corresponding expected value calculated in Table 3. The first squared difference would be

(41 - 30.24)^2 = 10.76^2 = 115.78

Do this for all the observed values and their corresponding expected values. The resulting set of values is given in Table 4.

Table 4
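For readers who prefer code to spreadsheet formulas, here is a minimal Python sketch of the squared differences. The helper expected_counts is a hypothetical name I introduced for illustration; it simply recomputes the expected values from the observed table.

```python
observed = [[41, 34, 9], [25, 25, 12], [6, 33, 15]]


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


expected = expected_counts(observed)

# Squared difference (O - E)^2 for every cell, pairing each O with its E.
squared_diff = [
    [(o - e) ** 2 for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for row in squared_diff:
    print([round(v, 2) for v in row])
```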

Step 5: Calculate the Components
Each component is the squared difference calculated in Step 4 divided by the corresponding expected value. For the first value it would be

(41 - 30.24)^2 / 30.24 = 115.78 / 30.24 = 3.83

If you repeat this for all the values, the resulting table is Table 5.
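The components can be reproduced with a short Python sketch as well. As before, this is only a cross-check under my own naming (components and expected_counts are illustrative, not from the sheet).

```python
observed = [[41, 34, 9], [25, 25, 12], [6, 33, 15]]


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


expected = expected_counts(observed)

# Component for each cell: (O - E)^2 / E.
components = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for row in components:
    print([round(v, 2) for v in row])
```

The largest component belongs to the Serious / Below 40 cell, which tells you where the observed data departs most from what independence would predict.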

Step 6: Calculate the Test Statistic

This is the sum of all the components calculated in Table 5. I calculated this using the SUM() formula in Excel, but you can do it by hand to verify.

Test Statistic = 3.83 + 0.56 + 2.48 + 0.32 + 0.43 + 0.06 + 9.29 + 2.68 + 2.87 = 22.52
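If you want to verify the sum without Excel, a tiny Python sketch is enough. The list below just contains the nine rounded components quoted above; the name rounded_components is my own.

```python
# The nine rounded components from the sum above, in row order.
rounded_components = [3.83, 0.56, 2.48, 0.32, 0.43, 0.06, 9.29, 2.68, 2.87]

# The test statistic is the sum of all components.
test_statistic = sum(rounded_components)

print(round(test_statistic, 2))
```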

Step 7: Look up the Critical Value from the Chi-Square Table
Get the statistical table from here
First we calculate the degrees of freedom:
df = (rows - 1) * (columns - 1) = (3-1) * (3-1) = 4
alpha = 0.01

The critical value from the table of the Chi-Square distribution (alpha = 0.01, df = 4) is written as

K(0.01, 4) = 13.28
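If you do not have a statistical table at hand, the critical value for this particular case can be approximated in plain Python. The sketch below relies on one assumption worth flagging: for exactly 4 degrees of freedom the upper-tail probability of the Chi-Square distribution has the closed form exp(-x/2) * (1 + x/2), so the code is not a general replacement for the table. The function names are hypothetical.

```python
import math


def chi2_upper_tail_df4(x):
    """P(X > x) for a Chi-Square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)


def critical_value_df4(alpha, lo=0.0, hi=100.0, tol=1e-9):
    """Find x with upper-tail probability alpha by bisection (df = 4 only)."""
    # The upper-tail probability decreases as x grows, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_upper_tail_df4(mid) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail small enough: move left
    return (lo + hi) / 2


df = (3 - 1) * (3 - 1)          # (rows - 1) * (columns - 1)
critical = critical_value_df4(0.01)

print("df =", df)
print("critical value =", round(critical, 2))
```

For other degrees of freedom, use the table or a statistics library rather than this shortcut.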

Step 8: State your conclusion
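Since the calculated value of the test statistic (22.52) is greater than the critical value (13.28), we reject the null hypothesis that age and severity are unrelated, and conclude that there is a significant relationship between the age of the patients and the severity of the disease.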
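The comparison in this step can be written out as a small Python sketch. The p-value line is an extra that is not part of the worked solution; it uses the closed-form upper tail exp(-x/2) * (1 + x/2), which is valid only for 4 degrees of freedom. All variable names are illustrative.

```python
import math

test_statistic = 22.52   # from Step 6
critical_value = 13.28   # from Step 7 (alpha = 0.01, df = 4)
alpha = 0.01

# Decision rule: reject the null hypothesis when the statistic exceeds the critical value.
reject_null = test_statistic > critical_value

# Equivalent check via the p-value (closed form valid for df = 4 only).
p_value = math.exp(-test_statistic / 2) * (1 + test_statistic / 2)

print("Reject null hypothesis:", reject_null)
print("p-value below alpha:   ", p_value < alpha)
```

Both checks always agree, because the critical value is by definition the point where the p-value equals alpha.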

The complete tables are shown below. You can also download the sheet for free.

Download this Excel sheet for free