Kruskal-Wallis Non-Parametric AOV
Non-parametric AOV, as with other non-parametric tests, uses ranked data. The non-parametric form of AOV is called the Kruskal-Wallis test, and the test statistic is:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

where $R_i$ is the sum of the ranks in category $i$, $n_i$ is the number of observations in category $i$, $k$ is the number of categories, and $N$ is the total number of pooled observations.
The procedure is simple:
1. Rank the pooled set of N observations from lowest to highest, with the lowest rank being 1.
2. Sum the ranks in each category.
3. Plug the numbers into the equation.
Interest on International Debt, 2007

| Country | Region | Interest07 | Country | Region | Interest07 | Country | Region | Interest07 |
|---|---|---|---|---|---|---|---|---|
| Cameroon | AF | 0.90 | Argentina | LA | 6.55 | Bangladesh | SE | 2.11 |
| Chad | AF | 1.34 | Bolivia | LA | 2.71 | Bhutan | SE | 9.69 |
| Ethiopia | AF | 1.25 | Brazil | LA | 7.25 | Cambodia | SE | 1.54 |
| Gabon | AF | 6.75 | Colombia | LA | 7.88 | Lao PDR | SE | 1.50 |
| Kenya | AF | 1.06 | Costa Rica | LA | 5.58 | Malaysia | SE | 6.77 |
| Mali | AF | 1.14 | Ecuador | LA | 5.36 | New Guinea | SE | 0.75 |
| Namibia | AF | 1.37 | Guatemala | LA | 5.85 | Philippines | SE | 5.20 |
| Nigeria | AF | 0.78 | Honduras | LA | 1.81 | Thailand | SE | 2.00 |
| Tanzania | AF | 0.77 | Jamaica | LA | 7.21 | Vietnam | SE | 2.99 |
| Uganda | AF | 0.79 | Nicaragua | LA | 1.82 | | | |
| Zambia | AF | 1.32 | Paraguay | LA | 5.30 | | | |
| | | | Peru | LA | 6.92 | | | |
| | | | Venezuela | LA | 6.58 | | | |
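Ranked Interest on International Debt, 2007

| Country | Region | Rank | Country | Region | Rank | Country | Region | Rank |
|---|---|---|---|---|---|---|---|---|
| Cameroon | AF | 5 | Argentina | LA | 25 | Bangladesh | SE | 17 |
| Chad | AF | 10 | Bolivia | LA | 18 | Bhutan | SE | 33 |
| Ethiopia | AF | 8 | Brazil | LA | 31 | Cambodia | SE | 13 |
| Gabon | AF | 27 | Colombia | LA | 32 | Lao PDR | SE | 12 |
| Kenya | AF | 6 | Costa Rica | LA | 23 | Malaysia | SE | 28 |
| Mali | AF | 7 | Ecuador | LA | 22 | New Guinea | SE | 1 |
| Namibia | AF | 11 | Guatemala | LA | 24 | Philippines | SE | 20 |
| Nigeria | AF | 3 | Honduras | LA | 14 | Thailand | SE | 16 |
| Tanzania | AF | 2 | Jamaica | LA | 30 | Vietnam | SE | 19 |
| Uganda | AF | 4 | Nicaragua | LA | 15 | | | |
| Zambia | AF | 9 | Paraguay | LA | 21 | | | |
| | | | Peru | LA | 29 | | | |
| | | | Venezuela | LA | 26 | | | |
| Rank Sums | | 92 | | | 310 | | | 159 |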
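The ranking and rank-sum steps can be checked with a short Python sketch. It uses only the interest values from the table above; the function name `rank_sums` is illustrative, and the simple ranking shown assumes no tied values, which holds for this data set.

```python
# Interest on international debt (2007) by region, copied from the table above.
interest = {
    "AF": [0.90, 1.34, 1.25, 6.75, 1.06, 1.14, 1.37, 0.78, 0.77, 0.79, 1.32],
    "LA": [6.55, 2.71, 7.25, 7.88, 5.58, 5.36, 5.85, 1.81, 7.21, 1.82,
           5.30, 6.92, 6.58],
    "SE": [2.11, 9.69, 1.54, 1.50, 6.77, 0.75, 5.20, 2.00, 2.99],
}


def rank_sums(groups):
    """Rank the pooled values (lowest = 1) and sum the ranks per group.

    Assumption: every value is distinct, so no tie handling is needed.
    """
    pooled = sorted(v for values in groups.values() for v in values)
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    return {name: sum(rank_of[v] for v in values)
            for name, values in groups.items()}


sums = rank_sums(interest)
print(sums)  # {'AF': 92, 'LA': 310, 'SE': 159}
```

The three sums add up to 561, which equals N(N+1)/2 for N = 33 and is a quick way to confirm that no rank was skipped or counted twice.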
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

$$H = \frac{12}{33(33+1)} \left( \frac{92^2}{11} + \frac{310^2}{13} + \frac{159^2}{9} \right) - 3(33+1)$$

$$H = (0.010695 \times 10970.8) - 102$$

$$H = 15.33$$

The Kruskal-Wallis H table is very limited in the sample sizes it covers, so it is really only useful for very small samples. For larger sample sizes, or where k > 5, the critical value of H is approximated from the χ² table with k - 1 degrees of freedom, where k is the number of groups.

Therefore: H = 15.33 with df = k - 1 = 2. Critical value (α = 0.05): χ² critical = 5.991. Since 15.33 > 5.991, reject H0. There is a significant difference in interest rates in 2007 among the regions (H = 15.33, p < 0.001).
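The test statistic and the decision can be reproduced from the rank sums alone. The sketch below is a minimal check, not a general-purpose routine: the function names are illustrative, and the p-value uses the fact that for exactly 2 degrees of freedom the upper-tail χ² probability is exp(-x/2). With the factor 12/(N(N+1)) left unrounded, H comes out at about 15.33; rounding that factor to 0.011 would inflate it to 18.68, although the decision is the same either way.

```python
import math


def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def chi2_upper_tail_df2(x):
    """Upper-tail chi-square probability, valid only for 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Rank sums and group sizes for AF, LA, SE from the ranked table.
h = kruskal_wallis_h([92, 310, 159], [11, 13, 9])
critical = -2.0 * math.log(0.05)  # chi-square critical value, df = 2, alpha = 0.05
p_value = chi2_upper_tail_df2(h)

print(round(h, 2), round(critical, 3), p_value < 0.001)  # 15.33 5.991 True
```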
Occasionally we find that we have tied ranks. There are two additional procedures that must be performed:
1. Give the tied observations the average of the ranks they would otherwise occupy.
2. Apply the following adjustment to H:

$$C = 1 - \frac{\sum_i (t_i^3 - t_i)}{N^3 - N}$$

where $t_i$ is the number of observations tied at a given rank, and the sum runs over all sets of tied ranks. The corrected statistic is $H_{corrected} = H / C$.
Chilean Nitrate Processing Facilities
[Figure: Chilean nitrate processing facilities, grouped as North, Middle, and South]
| North | Prod | Middle | Prod | South | Prod |
|---|---|---|---|---|---|
| Jazpampa | 40000 | Agua Santa | 150000 | Buen Retiro | 53000 |
| La Patria | 45000 | Amelia | 50000 | Cala Cala | 25000 |
| Paccha | 100000 | Aurora | 50000 | Humberstone | 80000 |
| San Patricio | 40000 | Democracia | 50000 | Mercedes | 40000 |
| Santa Rita | 45000 | Primitiva | 300000 | Paposo | 35000 |
| Union | 35000 | Puntunchara | 60000 | Pena Chica | 40000 |
| | | Rosario de Huara | 180000 | San Donato | 75000 |
| | | San Jorge | 100000 | Sebastopol | 70000 |
| | | Santa Rosa de Huara | 70000 | | |
| | | Slavia | 40000 | | |
| North | Rank | Middle | Rank | South | Rank |
|---|---|---|---|---|---|
| Jazpampa | 19 | Agua Santa | 3 | Buen Retiro | 11 |
| La Patria | 15.5 | Amelia | 13 | Cala Cala | 24 |
| Paccha | 4.5 | Aurora | 13 | Humberstone | 6 |
| San Patricio | 19 | Democracia | 13 | Mercedes | 19 |
| Santa Rita | 15.5 | Primitiva | 1 | Paposo | 22.5 |
| Union | 22.5 | Puntunchara | 10 | Pena Chica | 19 |
| | | Rosario de Huara | 2 | San Donato | 7 |
| | | San Jorge | 4.5 | Sebastopol | 8.5 |
| | | Santa Rosa de Huara | 8.5 | | |
| | | Slavia | 19 | | |
| Rank Sums | 96 | | 87 | | 117 |

These data were ranked from largest production to smallest.
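Averaging tied ranks is easy to get wrong by hand, so a short Python sketch of the step may help. The grouping of facilities into regions below is the one consistent with the stated rank sums (96, 87, 117) and group sizes (6, 10, 8); the helper name `average_ranks_descending` is illustrative.

```python
from collections import defaultdict

# Production by region; values are listed in the table's row order.
production = {
    "North": [40000, 45000, 100000, 40000, 45000, 35000],
    "Middle": [150000, 50000, 50000, 50000, 300000, 60000,
               180000, 100000, 70000, 40000],
    "South": [53000, 25000, 80000, 40000, 35000, 40000, 75000, 70000],
}


def average_ranks_descending(values):
    """Map each distinct value to its rank (largest = 1), averaging tied positions."""
    positions = defaultdict(list)
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        positions[value].append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


pooled = [v for values in production.values() for v in values]
rank_of = average_ranks_descending(pooled)
rank_sums = {name: sum(rank_of[v] for v in values)
             for name, values in production.items()}

print(rank_of[40000], rank_of[50000])  # 19.0 13.0
print(rank_sums)  # {'North': 96.0, 'Middle': 87.0, 'South': 117.0}
```

The five facilities producing 40000 occupy positions 17 through 21, so each receives the average rank 19.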
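| | North | Middle | South |
|---|---|---|---|
| Rank Sum | 96 | 87 | 117 |
| n | 6 | 10 | 8 |

N = 24

$$H = \frac{12}{24(24+1)} \left( \frac{96^2}{6} + \frac{87^2}{10} + \frac{117^2}{8} \right) - 3(24+1)$$

$$H = 0.02\,(1536 + 756.9 + 1711.1) - 75$$

$$H = (0.02 \times 4004) - 75$$

$$H = 5.08$$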
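| Facility | Rank | Tied |
|---|---|---|
| Primitiva | 1 | |
| Rosario de Huara | 2 | |
| Agua Santa | 3 | |
| Paccha | 4.5 | t |
| San Jorge | 4.5 | t |
| Humberstone | 6 | |
| San Donato | 7 | |
| Santa Rosa de Huara | 8.5 | t |
| Sebastopol | 8.5 | t |
| Puntunchara | 10 | |
| Buen Retiro | 11 | |
| Amelia | 13 | t |
| Aurora | 13 | t |
| Democracia | 13 | t |
| La Patria | 15.5 | t |
| Santa Rita | 15.5 | t |
| Jazpampa | 19 | t |
| San Patricio | 19 | t |
| Slavia | 19 | t |
| Mercedes | 19 | t |
| Pena Chica | 19 | t |
| Union | 22.5 | t |
| Paposo | 22.5 | t |
| Cala Cala | 24 | |

We have many tied ranks:
- 4 sets of 2 tied ranks (ranks 4.5, 8.5, 15.5, and 22.5)
- 1 set of 3 tied ranks (rank 13)
- 1 set of 5 tied ranks (rank 19)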
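$$C = 1 - \frac{\sum_i (t_i^3 - t_i)}{N^3 - N}$$

$$C = 1 - \frac{(2^3 - 2) + (2^3 - 2) + (2^3 - 2) + (2^3 - 2) + (3^3 - 3) + (5^3 - 5)}{24^3 - 24}$$

$$C = 1 - \frac{6 + 6 + 6 + 6 + 24 + 120}{13800}$$

$$C = 1 - \frac{168}{13800}$$

$$C = 0.988$$

So the corrected statistic is:

$$H_{corrected} = \frac{H}{C} = \frac{5.08}{0.988} = 5.14$$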
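The tie correction can be verified with a few lines of Python. This is a minimal sketch using the tie-set sizes, rank sums, and group sizes from the nitrate example; the function name `tie_correction` is illustrative.

```python
def tie_correction(tie_sizes, n_total):
    """C = 1 - sum(t^3 - t) / (N^3 - N), with one t per set of tied ranks."""
    return 1.0 - sum(t**3 - t for t in tie_sizes) / (n_total**3 - n_total)


# Four sets of 2 tied ranks, one set of 3, and one set of 5; N = 24.
tie_sizes = [2, 2, 2, 2, 3, 5]
c = tie_correction(tie_sizes, 24)

# Uncorrected H from the rank sums (96, 87, 117) and group sizes (6, 10, 8).
h = 12.0 / (24 * 25) * (96**2 / 6 + 87**2 / 10 + 117**2 / 8) - 3 * 25
h_corrected = h / c

print(round(c, 3), round(h, 2), round(h_corrected, 2))  # 0.988 5.08 5.14
```

Because C can never exceed 1, dividing by it can only leave H unchanged or make it larger.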
df = k - 1 = 3 - 1 = 2
Thus we get: Critical χ² value = 5.991. Since 5.14 < 5.991, we fail to reject H0. There is no significant difference in nitrate production among the three oficina groups (Kruskal-Wallis χ² = 5.14, 0.05 < p < 0.10). SPSS confirms our results.
In terms of correcting for ties: Ties in the data make the H value somewhat smaller than it should be, so the correction increases the size of H. Small ties (where 2 observations are tied) do not influence the results very much unless there are a very large number of them. Situations with multiple large ties (where 4 or 5 observations are tied) and few untied ranks will have an influence on the results.
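The reason large ties matter more than small ones is that each tied set contributes t³ - t to the correction, which grows quickly with t. The sketch below shows the contribution of a single tied set of each size, assuming N = 24 (the sample size of the nitrate example) purely for illustration.

```python
def tie_penalty(t):
    """Contribution of one set of t tied observations to the correction sum."""
    return t**3 - t


n_total = 24  # assumed sample size, borrowed from the nitrate example
denominator = n_total**3 - n_total

# Penalty and resulting C if the data contained just one tied set of size t.
penalties = {t: tie_penalty(t) for t in (2, 3, 4, 5)}
single_set_c = {t: 1.0 - p / denominator for t, p in penalties.items()}

for t in penalties:
    print(t, penalties[t], round(single_set_c[t], 4))
```

A single five-way tie carries the same weight as twenty two-way ties (120 versus 6), which is why a handful of pairs barely moves H while a few large tied sets can.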
Thoughts on tied ranks: If your data has a very large number of ties then it lacks variation. A lack of variation in the data makes it difficult to say anything meaningful about any differences you may happen to find.