The World Cup is over: Germany is the new world champion, and Brazil was brutally walloped by the Germans in the semi-final. There were a lot of predictions before the tournament, and most of them (like FiveThirtyEight's) forecast Brazil as the new world champion. The main reason for this prediction was that Brazil has traditionally had an easy time against foreign teams on home soil.
So I wondered whether there is a statistically significant correlation between a team's travel distance to the World Cup venue and its number of victories in the tournament. I collected data (number of games played, victories, goals, and travel distance) from the last four World Cups (2002: Japan and South Korea, 2006: Germany, 2010: South Africa, 2014: Brazil). To keep the model simple, the travel distance is the bee-line distance from a country's capital to the capital of the host country. For example, the travel distance from Germany to Brazil is recorded as 9682 kilometres, because the distance between Berlin and Brasilia is approximately 9682 km. The data source (csv, German) is available here.
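The article does not say which tool produced the bee-line distances. A minimal sketch, assuming a spherical Earth and the haversine formula, computes the great-circle distance between two capitals. The coordinates below are rough values chosen for illustration (an assumption, not taken from the article's data set). For Berlin and Brasilia this gives roughly 9,600 km, close to but not exactly the 9682 km quoted above, since the result depends on the chosen coordinates and Earth model.

```python
import math

# Mean Earth radius in km (spherical approximation; an assumption)
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("bee-line") distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate capital coordinates (hypothetical illustration values: latitude, longitude)
BERLIN = (52.52, 13.405)
BRASILIA = (-15.7939, -47.8828)

distance = haversine_km(*BERLIN, *BRASILIA)
print(round(distance))
```

A geodesic on an ellipsoid would be slightly more accurate, but for a rough model like this one the spherical formula is usually good enough.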
First, I drew a scatterplot with each team's travel distance to the venue on the x-axis and the total number of games won on the y-axis.
As we can see, the plot is pretty noisy. The only thing we can derive is that, in the last four World Cups, teams with a travel distance of more than 15,000 kilometres never won more than one game, with the exception of Brazil in 2002. That year, Brazil won all seven possible games (three in the group stage and four in the knockout stage).
An interesting feature is the “victory gap” between 4,000 and 8,000 km and the “victory hill” from 8,000 to 12,000 km.
A possible explanation is that many countries lie close together (as in Europe), and many of these countries had respectably good results even when they had to travel a long distance to South Africa or to Japan/South Korea.
To get a better understanding of whether travel distance and victories are correlated, I performed a Spearman's rank correlation test in R (note that neither “victories” nor “distance to venue” is normally distributed). This is the result:
Spearman's rank correlation rho
data: wmdata$distance. and wmdata$victories
S = 402701.1, p-value = 0.08632
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.1522073
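The S in this output is not a separate measure of association. Assuming R's definition for Spearman's test, S = (n^3 - n)(1 - rho)/6, so it can be derived from rho and the sample size n. Inverting it with the printed values gives n = 128, which is consistent with 32 teams in each of the four World Cups. A small sketch to check the arithmetic (the function names are hypothetical helpers, not part of R):

```python
def spearman_s(n, rho):
    """S statistic as reported by R for Spearman's test: (n^3 - n) * (1 - rho) / 6."""
    return (n ** 3 - n) * (1 - rho) / 6

def sample_size_from_s(s, rho):
    """Recover n from S and rho; n^3 - n is close enough to n^3 that rounding the cube root works."""
    target = 6 * s / (1 - rho)  # equals n^3 - n
    return round(target ** (1 / 3))

# Values printed by R above
S_REPORTED = 402701.1
RHO_REPORTED = -0.1522073

n = sample_size_from_s(S_REPORTED, RHO_REPORTED)
print(n)  # 128
print(spearman_s(n, RHO_REPORTED))  # about 402701.06, which R prints as 402701.1
```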
The rho value is -1 for a perfect negative correlation and 1 for a perfect positive correlation; a value close to 0 means there is no monotonic correlation. In our case, rho is -0.15, a weak negative correlation, and with a p-value of 0.086 it is not significant at the conventional 5% level. Note, however, that Spearman's correlation only measures how well the relationship between two variables can be described by a monotonic function. Our data, with its victory gap and victory hill, is not monotonic, so the test does not fit it perfectly.
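Spearman's rho is Pearson's correlation computed on the ranks of the data, with tied values receiving the average of their ranks. The self-contained sketch below uses toy data (hypothetical, not the World Cup data) to illustrate the monotonicity caveat: a hill-shaped relationship like the one in the scatterplot can produce a rho of 0 even when y depends completely on x.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

xs = list(range(-3, 4))
print(spearman_rho(xs, [x ** 3 for x in xs]))  # monotonic increasing: approximately 1
print(spearman_rho(xs, [-x for x in xs]))      # monotonic decreasing: approximately -1
print(spearman_rho(xs, [x ** 2 for x in xs]))  # valley shape: 0, although y is fully determined by x
```

In practice, `scipy.stats.spearmanr` in Python or `cor.test(..., method = "spearman")` in R computes the same coefficient together with a p-value.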
Conclusion: There is at most a weak correlation between travel distance and a team's performance in the World Cup. I also tested whether a team is more successful if its country is located in the same climate zone as the venue; again, no correlation.