Monday, 26 March 2018

What is Run Test in Statistics - A Simple Explanation with Step by Step Examples

Good to see you again! Hope you are doing fine.
😀
The Run Test is one of the most interesting statistical tests, and it is also one of the easiest to understand. If you have any difficulty following this lesson, let me know in the comment box below 👇👇👇 and I will clarify.
 

In this simple lesson, we are going to explain, in very simple and clear terms, the concept of the run test in statistics.

Content
  1. What is Run Test?
  2. What is a Run?
  3. Example of Runs 
  4. Run Test Procedure
    • Hypothesis
    • Test Statistic and Decision Rule
    • Critical Value
  5. Example 1 and Solution
  6. Example 2 and Solution
  7. Example 3 and Solution
  8. Solving With the Formula for Large Samples
  9. Final Notes


1. What is Run Test


The run test is a statistical test used to determine whether the data obtained from a sample is random. That is why it is also called the Run Test for Randomness.
Randomness of the data is judged from the number and nature of the runs present in the data of interest.

2. What is a Run?


A run is a sequence of similar or like events, items or symbols that is preceded by and followed by an event, item or symbol of a different type, or by none at all.
Randomness of the series is unlikely when there appear to be either too many or too few runs. A run test is carried out to decide whether the number of runs is consistent with randomness.
The Run Test, when performed, helps us to decide whether a sequence of events, items or symbols is the result of a random process.

3. Example of Runs


A data scientist carrying out research interviewed 10 persons during a survey. We denote the gender of each person by M for male and F for female.
Assume the respondents were chosen in one of the following orders:

Scenario 1
M M M M M F F F F F

Scenario 2
F M F M F M F M F M

Scenario 3
F F F M M F M M F F

Scenario 1 has only 2 runs, so it cannot be considered random: there are too few runs.

Scenario 2 has 10 runs, the maximum possible for 10 observations, because the gender changes at every step. That is too many runs, so it would not be considered random either.

Scenario 3 has 5 runs, which is neither obviously too few nor obviously too many, so we need to perform a test to determine the randomness of the data.
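
If you would like to check the run counts of the three scenarios yourself, here is a small Python sketch. The function name count_runs is just my own choice for illustration; it simply starts a new run every time the symbol changes.

```python
def count_runs(symbols):
    """Count runs: a new run starts whenever the symbol changes."""
    runs = 0
    previous = object()  # sentinel that never equals a real symbol
    for symbol in symbols:
        if symbol != previous:
            runs += 1
            previous = symbol
    return runs


scenario_1 = "M M M M M F F F F F".split()
scenario_2 = "F M F M F M F M F M".split()
scenario_3 = "F F F M M F M M F F".split()

for name, data in [("Scenario 1", scenario_1),
                   ("Scenario 2", scenario_2),
                   ("Scenario 3", scenario_3)]:
    print(name, "has", count_runs(data), "runs")
```

Running it gives 2, 10 and 5 runs for the three scenarios.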


4. Run Test Procedure


First, we assume that the data available for the analysis consists of a sequence of observations, recorded in order of occurrence, which we can categorize into two mutually exclusive types.

Next, you need to determine the total sample size and the number of observations of each type, as presented below:

n = total sample size
n1 = the number of observations of one type
n2 = the number of observations of the other type

Hypothesis
Next, state the null and alternate hypotheses. There are three possible forms:

A. TWO-SIDED
H0: the pattern of occurrence is random
H1: the pattern of occurrence is not random

B. ONE-SIDED
H0: the pattern of occurrence is random
H1: the pattern of occurrence is not random (because there are too few runs to be attributed to chance)

C. ONE-SIDED
H0: the pattern of occurrence is random
H1: the pattern of occurrence is not random (because there are too many runs to be attributed to chance)


Test Statistic and Decision Rule
The test statistic is r = the total number of runs.
The decision rule is also called the acceptance or rejection criterion. It depends on the calculated test statistic and on the lower and upper limits (critical values) obtained from statistical tables.

Table 1: Decision Rule


Critical Value
The critical values are read from the statistical table for the run test using n1 and n2.
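
The two-sided decision can be sketched as a tiny Python helper. Please treat it as a sketch under one assumption: I am assuming the common table convention in which the tabulated limits themselves belong to the rejection region, so H0 is rejected when r is less than or equal to the lower value or greater than or equal to the upper value. Check the note printed with your own table. The function name run_test_decision is hypothetical.

```python
def run_test_decision(r, lower, upper):
    """Two-sided run test decision from tabulated critical values.

    Assumption: the table's limits are part of the rejection region,
    i.e. reject H0 when r <= lower or r >= upper.
    """
    if r <= lower or r >= upper:
        return "reject H0"
    return "do not reject H0"


# r = 10 runs with table limits 7 and 18
print(run_test_decision(10, 7, 18))
```

With r = 10 and limits 7 and 18, the helper reports that H0 is not rejected.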








We can solve some examples to clarify this.  



5. Example 1


On a commuter train, the conductor wants to see whether the passengers board the train in a random manner. He observes the first 25 people, with the following sequence of males (M) and females (F).

F F F M M F F F F M F M M M F F F F M M F F F M M

Test for randomness at α = 0.05

Solution Steps
Step 1: State the null and alternate hypotheses
H0: The pattern of occurrence of males and females entering the train is random
H1: The pattern of occurrence of males and females entering the train is not random

Step 2: Find the test statistic (number of runs)
You can easily get this by grouping each run as shown below:

FFF  MM   FFFF   M   F   MMM   FFFF   MM   FFF   MM

Test statistic, r = 10
n1 = number of females = 15
n2 = number of males = 10

Step 3: Find the critical value
We can find the lower and upper critical values from the statistical table for the run test
n1 = 15, n2 = 10
Lower critical value = 7
Upper critical value = 18

Step 4: Make your decision
Since r = 10 lies between 7 and 18, we do not reject the null hypothesis

Step 5: Draw a Conclusion
There is not enough evidence to reject the claim that the pattern of occurrence of males and females entering the train is determined by a random process
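
The grouping in Step 2 can also be done in Python. This is a minimal sketch using itertools.groupby from the standard library, which collects consecutive equal items, so each group is exactly one run. The variable names are my own.

```python
from collections import Counter
from itertools import groupby

sequence = "F F F M M F F F F M F M M M F F F F M M F F F M M".split()

# Each group of consecutive equal symbols is one run
runs = ["".join(group) for _, group in groupby(sequence)]

r = len(runs)               # test statistic: number of runs
counts = Counter(sequence)  # how many F and how many M
n1 = counts["F"]
n2 = counts["M"]

print("  ".join(runs))
print("r =", r, " n1 =", n1, " n2 =", n2)
```

It prints the same ten groups as in Step 2, with r = 10, n1 = 15 and n2 = 10.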



6. Example 2


We have 20 people who enrolled in a drug abuse program. Test the claim that the ages of the people, in the order in which they enrolled, occur at random, at α = 0.05.

The data are as follows:
18, 36, 19, 22, 25, 44, 23, 27, 27, 35, 19, 43, 37, 32, 28, 43, 46, 19, 20, 22

Solution Steps
Step 1: State the hypotheses and identify the claim

The claim is the null hypothesis H0, and its opposite is the alternate hypothesis H1.
H0: The pattern of occurrence of ages of the people enrolled in a drug abuse program is determined by a random process (claim)
H1: The pattern of occurrence of ages of the people enrolled in a drug abuse program is not random

Step 2: Find the test statistic (number of runs)
To find the number of runs, we first arrange the data in ascending order and find the median of the data set. Here the 10th and 11th values in sorted order are both 27, so the median is 27.

Then compare each value in the original sequence with the median. Replace it with an A if it is above the median and with a B if it is below the median. Values equal to the median (here the two 27s) are left out. (You can also use the mean instead of the median.)
I have done this using Excel and the result is shown below:

Age:    18  36  19  22  25  44  23  27  27  35  19  43  37  32  28  43  46  19  20  22
Label:   B   A   B   B   B   A   B   -   -   A   B   A   A   A   A   A   A   B   B   B

We can now arrange the labels according to runs and we would have the output below:
B  A  BBB  A  B  A  B  AAAAAA  BBB

From the above we have

Test statistic, r = 9
n1 = number of A's (values above the median) = 9
n2 = number of B's (values below the median) = 9


Step 3: Find the Critical Value
n1 = 9, n2 = 9
From the statistical table for the run test, we get the critical values

Lower critical value = 5
Upper critical value = 15

Step 4: Make the decision
Since the statistic r = 9 lies between the lower and upper critical values (5 and 15), we do not reject H0


Step 5: Draw your conclusion
There is not enough evidence to reject the claim that the pattern of occurrence of ages of people in the program is determined by a random process
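
For numeric data like this, the conversion to A and B can be sketched in Python as well. This is only a sketch, and it assumes the usual convention of leaving out values that are exactly equal to the median; the variable names are my own.

```python
from itertools import groupby
from statistics import median

ages = [18, 36, 19, 22, 25, 44, 23, 27, 27, 35,
        19, 43, 37, 32, 28, 43, 46, 19, 20, 22]

m = median(ages)

# A = above the median, B = below; values equal to the median are skipped
labels = ["A" if age > m else "B" for age in ages if age != m]

runs = ["".join(group) for _, group in groupby(labels)]
r = len(runs)
n1 = labels.count("A")
n2 = labels.count("B")

print("median =", m)
print("  ".join(runs))
print("r =", r, " n1 =", n1, " n2 =", n2)
```

The median is 27, the two 27s drop out, and the remaining 18 labels give r = 9 with n1 = 9 and n2 = 9.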



7. Example 3:


Table 1.0 shows the departures from normal of daily temperatures recorded at Atlanta, Georgia during February 1969. We would like to know whether we may conclude that the pattern of departures above and below normal is the result of a non-random process.


Solution Steps

Step 1: State the null and the alternate hypothesis
H0: The pattern of occurrence of negative and positive deviations from normal is determined by a random process
H1: The pattern of occurrence of negative and positive deviations from normal is not determined by a random process (claim)

Step 2: Find the test statistic (number of runs)
To get the number of runs, we need to classify each departure from normal as above or below zero. Departures above 0 are recorded as A and those below 0 are recorded as B.
If we do this, we would have the arrangement as follows:

AAAAAA B A B AA BBBBB AAAAAAAA BBBBBB

Test Statistic (number of runs) r = 8
n1 = number of A = 17
n2 = number of B = 13

Step 3: Find the critical value
n1 = 17, n2 = 13

Using the statistical table, we find:
Lower critical value = 10
Upper critical value = 22

Step 4: Make the decision
Since r = 8, which is lower than the lower critical value of 10, we reject the null hypothesis (H0)

Step 5: Draw the conclusion
There is enough evidence to support the claim that the pattern of occurrence of positive and negative departures from normal is not random
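
If you want to double-check the tallies in this example, a short Python sketch can read them straight off the grouped arrangement from Step 2. It assumes the same table convention as before (reject when r is less than or equal to the lower value or greater than or equal to the upper value), and the variable names are my own.

```python
grouped = "AAAAAA B A B AA BBBBB AAAAAAAA BBBBBB"

runs = grouped.split()  # each space-separated block is one run

# Sanity check: neighbouring runs must be of different types
assert all(a[0] != b[0] for a, b in zip(runs, runs[1:]))

r = len(runs)
n1 = sum(len(run) for run in runs if run[0] == "A")
n2 = sum(len(run) for run in runs if run[0] == "B")

lower, upper = 10, 22  # critical values from Step 3
reject = r <= lower or r >= upper

print("r =", r, " n1 =", n1, " n2 =", n2)
print("reject H0" if reject else "do not reject H0")
```

This reproduces r = 8, n1 = 17, n2 = 13 and the decision to reject H0.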

8. Formula for Large Samples


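What if the sample size is large? In this case we could use a formula to solve it. The formula calculates the test statistic from the number of runs r together with n1 and n2.
The formula is given by:

z = (r - μ) / σ

where
μ = 2·n1·n2 / (n1 + n2) + 1
σ = sqrt[ 2·n1·n2·(2·n1·n2 - n1 - n2) / ( (n1 + n2)^2 · (n1 + n2 - 1) ) ]

Then we can look up the critical value in the table of the standard normal distribution (for example, ±1.96 for a two-sided test at α = 0.05)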
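
Here is a rough Python sketch of the large-sample approach, using the usual normal approximation for the number of runs (mean 2·n1·n2/(n1 + n2) + 1, with the matching variance). The function name runs_z is only my illustrative choice, and I plug in the numbers from Example 1 purely as a check, even though that sample is small enough for the table.

```python
import math


def runs_z(r, n1, n2):
    """z statistic for the run test under the normal approximation."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (r - mean) / math.sqrt(variance)


# Example 1: r = 10 runs, 15 females, 10 males
z = runs_z(r=10, n1=15, n2=10)

# Two-sided p-value from the standard normal distribution
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print("z =", round(z, 3))
print("reject H0" if abs(z) >= 1.96 else "do not reject H0")
```

For Example 1 this gives z of about -1.279, which is inside ±1.96, so the decision agrees with the one from the table.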

9. Final Notes


You have now completed the lesson on run tests. Thumbs up to you! One thing you can be sure of is that it does not get more complicated than this.
Just as a quiz, try to solve the three examples using the formula presented, and compare the results with those obtained without using the formula.