The Wisdom of Crowds: Predicting Euro 2012 Group Play Outcomes

This article is more than 10 years old.

The 2012 European Championship kicks off this Friday, and a tournament this big never lacks for predictions about its outcome.  I am not immune to such prognostication, but my access to sophisticated models is limited.  I have therefore returned to a method I used during the group stage of the 2010 World Cup: the model for international match play found in Soccernomics.

The second edition of the book was released just in time for the tournament, and in what looks like a deliberately timed update it includes a European-specific version of the model introduced in the first edition.  The model uses home pitch advantage, population (a measure of the available pool of potential players), GDP per capita (a measure of wealth that indicates how much time and how many resources a country can invest in the national team), and number of games played (an estimate of the national team’s experience) to predict the goal differential within a match.  Per Chapter 17 (page 369) in the second edition, the following statistics hold true for European national teams:

  1. Home pitch advantage is worth half a goal per match, lower than the two-thirds of a goal seen worldwide.
  2. Experience is also worth less than on the world stage, given a continent full of knowledge networks and no shortage of opportunities for friendlies within the historical home of the game.  Having played double the number of matches of the opposition provides only a 0.3 goal differential benefit.
  3. Having double the opposition’s population gives a national team a 0.25 goal differential benefit.
  4. Having double the opposition’s per capita GDP gives a national team a 1/6 goal differential benefit.
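
As a rough illustration of how these four effects combine, here is a minimal Python sketch.  It is not the book's equation: it assumes the effects are additive and that each "doubling" effect scales with the base-2 logarithm of the ratio between the two teams.  The function name, the dictionary keys, and the input figures are all hypothetical.

```python
import math

# Per-match effects for European teams, as listed above (in goals).
HOME_ADVANTAGE = 0.5
EXPERIENCE_PER_DOUBLING = 0.3
POPULATION_PER_DOUBLING = 0.25
GDP_PER_DOUBLING = 1 / 6


def predicted_goal_diff(team, opponent, venue="neutral"):
    """Expected goal differential for `team` against `opponent`.

    ASSUMPTION: effects are additive and logarithmic in the ratio, so each
    doubling adds its coefficient. `venue` is "home", "away", or "neutral"
    from the point of view of `team`.
    """
    home_term = {"home": HOME_ADVANTAGE, "away": -HOME_ADVANTAGE, "neutral": 0.0}[venue]
    return (
        home_term
        + EXPERIENCE_PER_DOUBLING * math.log2(team["matches"] / opponent["matches"])
        + POPULATION_PER_DOUBLING * math.log2(team["population"] / opponent["population"])
        + GDP_PER_DOUBLING * math.log2(team["gdp_per_capita"] / opponent["gdp_per_capita"])
    )


# Hypothetical inputs, invented purely for illustration (not real teams).
side_a = {"matches": 800, "population": 40e6, "gdp_per_capita": 30_000}
side_b = {"matches": 400, "population": 20e6, "gdp_per_capita": 15_000}

print(round(predicted_goal_diff(side_a, side_b, venue="home"), 3))
```

In this made-up case the home side has exactly double the opposition's matches, population, and per capita GDP, so the prediction is simply 0.5 + 0.3 + 0.25 + 1/6, or a little over 1.2 goals.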

These observations are then applied to the more general worldwide model that was disclosed in the UK edition of Wired magazine prior to World Cup 2010.  Adjusting the coefficients to achieve the above relationships yields a revised equation that can be used to estimate the goal differential for each match, and thus the summed goal differential across the three group play matches.  That sum is then used to rank the teams within the group to give an idea as to where they might finish.

At first it might seem relatively straightforward to apply the above equation to the countries involved in the Euro 2012 tournament and come up with predictions for the outcome of the group play stage.  After all, the data seems readily available, with GDP per capita compiled in the CIA World Factbook and European nations readily reporting their population totals.  The difficulty in generating the numbers for the model comes in identifying a reliable source for the number of matches played.

For the purposes of their analysis the authors of Soccernomics use Russell Gerrard’s Archive of International Football Results.  The database is very comprehensive, but it appears not to have been updated since 2001, which restricted Soccernomics’ analysis to no later than that calendar year.  Missing the last eleven years’ worth of data when evaluating the potential outcome of a tournament held today seems too big a gap to leave unaddressed.  Another resource might be the Rec.Sport.Soccer Statistics Foundation (RSSSF), but this user-maintained database can be very disjointed in its coverage.  A number of countries only have data through 2008 unless one spends a good bit of time piecing together multiple, different data sets.  That would be too much work in the short amount of time before the start of the tournament.  Thus, there appeared to be no readily available resource for the number of matches played by each national team.

That’s when I reached out to the RSSSF’s host, the Infostrada Sports Group.  I have known one of their managers for about a year now, and he was able to put me in contact with one of their employees responsible for maintaining the RSSSF data.  I put my request in for the data, and 24 hours later I had every match count I needed including the different permutations for national teams that played under multiple nationalities (Croatia as Yugoslavia, Germany under both East and West Germany, etc.).  I can’t thank Infostrada enough, and I would highly recommend contacting them if you’re having trouble identifying sources for the soccer statistics you need for your own analyses.

There’s one more adjustment to be made prior to making Soccernomics-based predictions about Euro 2012 group play.  The equation listed above applies to all of Europe, but not all teams perform to the expected goal differential.  A number of teams overachieve and another large number underachieve.  Figure 17.6 (page 370) in the second edition of Soccernomics outlines the historical under- and over-performance per match for each European team.  Assuming that each team participating in the European Championship performs per its historical norm (a potentially dangerous assumption given only three matches in group play), the overall predicted goal differential for each team needs to be adjusted by three times its historical over- or under-performance versus the model.  Summing the predicted goal differential for each team over the three matches and applying that adjustment yields the table of results shown below.
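
The adjustment described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names are hypothetical, the teams are placeholders, and every number is invented rather than taken from Soccernomics.

```python
def adjusted_group_totals(match_predictions, historical_residual):
    """Sum each team's per-match predictions and add its historical
    over/under performance once per match played (three in group play).

    match_predictions: team -> list of predicted goal differentials
    historical_residual: team -> average over/under performance per match
    """
    totals = {}
    for team, diffs in match_predictions.items():
        residual = historical_residual.get(team, 0.0)
        totals[team] = sum(diffs) + len(diffs) * residual
    return totals


def rank_group(totals):
    """Order teams from highest to lowest adjusted goal differential."""
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical group of four placeholder teams; all figures are invented.
predictions = {
    "Team A": [0.4, 0.1, -0.2],
    "Team B": [-0.4, 0.3, 0.5],
    "Team C": [-0.1, -0.3, 0.6],
    "Team D": [0.2, -0.5, -0.6],
}
residuals = {"Team A": 0.2, "Team B": -0.1, "Team C": 0.0, "Team D": 0.1}

totals = adjusted_group_totals(predictions, residuals)
print(rank_group(totals))
```

In this invented example Team B has the best raw sum, but its negative historical residual drops it behind Team A and Team C once the three-match adjustment is applied, which shows how much the adjustment can matter over so few matches.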

It might be tempting to draw too many conclusions from the table above, including erroneous ones about who will qualify for the knockout rounds.  Caution is especially warranted given how the model performed during the 2010 World Cup.  On an individual match basis the model can only predict the outcome of “loss” versus “not loss”, which makes predicting final table position by group a bit problematic, as only 9 points are available to each team.  So the overall predicted goal differential may be reasonably useful, but one should not draw too many conclusions from that analysis alone.

One way to counteract this limitation of the Soccernomics goal differential model is to use other models to build a wider picture of the predicted outcomes and draw conclusions based upon the “wisdom of the crowd”.  Two publicly available models are Infostrada’s Euro Club Index-based (ECI) model (Group A, B, C, and D) and ESPN’s Soccer Power Index-based (SPI) model (Group A, B, C, and D).  Both models look to recent results from each of the teams and those of their group opponents to determine the likelihood of each team making it out of the group stage of the tournament.  With two spots available, each model’s projections sum to 200% for each group, with the magnitude of separation between teams’ percentages indicating the relative likelihood of making it into the knockout stage.

The table below provides the projected finish positions by group from each of the methods – Soccernomics, ECI, and SPI.  The order of table finish in the ECI and SPI columns is based upon the percentages assigned to each team making it to the knockout round within each index.  On the right of each group is the projected order of finish based upon an average projected finish position over the three indexes.
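
The ranking procedure just described might look like the Python sketch below.  The function names are hypothetical, and the tie-breaking rule (ties keep the order of the first method listed) is an assumption, since no rule is stated here.  The ECI figures are the Group D percentages quoted later in this article; the three orderings of placeholder teams are invented for illustration.

```python
def order_by_probability(advance_pct):
    """Order teams from most to least likely to reach the knockout round."""
    return sorted(advance_pct, key=advance_pct.get, reverse=True)


def consensus_order(orderings):
    """Average each team's finish position (1 = first) across methods.

    orderings: method name -> list of teams in projected finishing order.
    ASSUMPTION: ties keep the order of the first method listed, because
    Python's sort is stable.
    """
    teams = next(iter(orderings.values()))
    average = {
        team: sum(order.index(team) + 1 for order in orderings.values()) / len(orderings)
        for team in teams
    }
    return sorted(teams, key=average.get), average


# ECI Group D percentages as quoted later in this article.
eci_group_d = {"England": 57, "Ukraine": 53, "Sweden": 47, "France": 44}
print(order_by_probability(eci_group_d))

# Invented orderings for four placeholder teams, not real projections.
orderings = {
    "Soccernomics": ["Team A", "Team B", "Team C", "Team D"],
    "ECI": ["Team B", "Team A", "Team C", "Team D"],
    "SPI": ["Team A", "Team C", "Team B", "Team D"],
}
final_order, average_position = consensus_order(orderings)
print(final_order)
```

Averaging positions rather than percentages lets the Soccernomics goal differential ranking be combined with the two probability-based models, at the cost of ignoring how close the teams are within each method.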

The following conclusions can be drawn from the table above:

  • Perhaps the most competitive group is Group A.  There is wide variation in the finishing positions given the quality of teams within the group, and Russia's edge in the SPI is a slim 3%.  Similarly, Russia is tied with Greece at a 54% likelihood of advancing in the ECI.  Where one comes down on this group depends on whether or not Poland realizes a home pitch advantage.  The Soccernomics model clearly gives such an advantage to Poland, and the SPI gives a similar benefit.  If either of the two host nations is to realize such a benefit, it will likely be Poland given the quality of their team relative to Ukraine's.  Anything can happen in this wide-open group, but I'm sticking with the numbers and saying it will be Russia and Poland who make it out of the group.
  • Group B has the starkest contrast in teams.  It's a toss-up as to whether Germany or the Netherlands finishes in the top spot, but both the SPI and ECI have them dominating this group with at least 135 of the available 200 percentage points split between them.  The ECI has a more even split between Portugal and Denmark (35% to 31%), while the SPI has a wider split with Portugal at 49% and Denmark at 15% (the lowest chances of any team).  Germany and the Netherlands may be the surest bets for qualifying out of the group, but an upset by the Danes or Portugal can certainly make things fun and is not unlikely given the compressed format of the tournament.  Still, I am going with the Germans and the Dutch to make it out of the group.
  • Group C comes down to one sure thing (Spain, ECI = 79%, SPI = 89%) and a scrum for the one remaining spot.  No one seems to be giving Ireland a shot at the second spot (ECI = 28%, SPI = 25%).  The ECI favors Croatia for the final spot, while the SPI favors Italy.  I am going to assume that the distraction of the Italian match-fixing scandal and the pluckiness of the Croatian team see Croatia through to the next round with Spain, who will be attempting to become the first European national team to win three major tournaments in a row.
  • Finally, for all of the lowered expectations for England at this tournament, all of the models see them as an odds-on favorite to qualify from Group D.  The SPI has them at a 71% likelihood of advancing, while the ECI has them at 57%.  The grouping in the ECI is much tighter than in the SPI, with Ukraine at 53%, Sweden at 47%, and France at 44%.  I personally think the ECI's ranking for Ukraine is a bit high given their historical performance, even factoring in home pitch advantage (see Soccernomics table above).  Thus, it comes down to Sweden and France, who are separated by tenths of a percent in the SPI.  I'll go out on a limb and say Sweden gets through, leaving the French camp in further chaos with a Euro disaster following their meltdown at World Cup 2010.

So that is what the numbers say about group play.  Luckily the numbers don't determine who wins - it's the 90 minutes of action on the pitch that do.  Enjoy the opening weekend of Euro 2012, no matter where you're viewing it.  Just keep these numbers in mind when evaluating the relative success of each national team when the dust settles on group play in a few weeks.