About World Football Rankings

The Rating System

Process

The InOnIt.com rankings use international match data from friendlies ("A" internationals), minor and major tournaments, and World Cup qualifying and finals. The algorithm attempts to model scoring. It models the number of goals scored in a given match by a team against a particular opponent as a Poisson process.
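As a minimal sketch of what "modeled as a Poisson process" implies, the snippet below computes the probability of each goal count for a single side in a single match. The rate of 1.5 goals per match and the name `poisson_pmf` are illustrative assumptions, not values or code taken from the rankings themselves.

```python
import math

def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of exactly `goals` goals when the mean number per match is `rate`."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)

# Hypothetical rate: a side expected to average 1.5 goals against this opponent.
rate = 1.5
for goals in range(6):
    print(f"P({goals} goals) = {poisson_pmf(goals, rate):.3f}")
```

Under this model a single number (the rate) fixes the whole distribution of a side's score, which is why the ratings only need to determine that rate for each pairing.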

Given the match data, the algorithm performs a best-fit analysis: it assigns arbitrary initial ratings to individual sides and adjusts them to reflect actual results. In essence, the process assumes that the ability to score (or prevent scoring) is transitive. If A will score twice as often as B against a given opponent (C), the model assumes that A will score twice as often as B against any other opponent D (although the rate of scoring for both A and B will change based on the quality of D's defense).
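One way to read the transitivity assumption is as a multiplicative model, in which a side's scoring rate is the product of its own attacking strength and a factor for the opponent's defense. The sketch below illustrates that reading only. The function `expected_goals` and all of the numbers are hypothetical, and the source does not say that this exact form is used.

```python
def expected_goals(attack: float, opponent_defense: float) -> float:
    """Scoring rate under an assumed multiplicative model (lower defense value = stingier)."""
    return attack * opponent_defense

# Hypothetical ratings: A attacks twice as well as B; D defends better than C.
attack = {"A": 2.0, "B": 1.0}
defense = {"C": 1.0, "D": 0.5}

for opponent in ("C", "D"):
    rate_a = expected_goals(attack["A"], defense[opponent])
    rate_b = expected_goals(attack["B"], defense[opponent])
    print(f"vs {opponent}: A {rate_a:.2f}, B {rate_b:.2f}, ratio {rate_a / rate_b:.2f}")
```

Both rates fall against the stronger defense D, but the ratio between A and B stays the same, which is the property the paragraph above describes.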

Because of the best-fit analysis, it's possible for a result between A and B to affect C's rating. The new result shifts the relative value of previous results against A and B for every side that has played either of them, which in turn shifts the ratings of those sides' opponents, their opponents' opponents, and so forth. This differs from the FIFA ratings and from the proposed alternative Elo ratings (based on the chess rating system).
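The knock-on effect described above can be seen in a toy fit. The sketch below assumes a multiplicative Poisson model (expected goals = attack of the scorer times defense factor of the opponent) and fits it by alternating maximum-likelihood updates. The fitting method, the function names, and the match data are all assumptions made for illustration, and home advantage and match weighting are ignored.

```python
from collections import defaultdict

def fit_ratings(matches, iterations=500):
    """Fit multipliers so that expected goals = attack[scorer] * defense[opponent].

    matches: list of (side1, side2, goals1, goals2). Assumes every side has both
    scored and conceded at least once, so no multiplier collapses to zero.
    """
    # Each match yields two scoring observations: (scorer, defender, goals).
    obs = []
    for s1, s2, g1, g2 in matches:
        obs += [(s1, s2, g1), (s2, s1, g2)]
    sides = sorted({scorer for scorer, _, _ in obs})
    attack = dict.fromkeys(sides, 1.0)
    defense = dict.fromkeys(sides, 1.0)  # larger value = leakier defense
    for _ in range(iterations):
        # Best attack values given the current defense values.
        scored, faced = defaultdict(float), defaultdict(float)
        for scorer, defender, goals in obs:
            scored[scorer] += goals
            faced[scorer] += defense[defender]
        attack = {t: scored[t] / faced[t] for t in sides}
        # Best defense values given the updated attack values.
        conceded, met = defaultdict(float), defaultdict(float)
        for scorer, defender, goals in obs:
            conceded[defender] += goals
            met[defender] += attack[scorer]
        defense = {t: conceded[t] / met[t] for t in sides}
    return attack, defense

def expected_goals(ratings, scorer, opponent):
    attack, defense = ratings
    return attack[scorer] * defense[opponent]

# Hypothetical results; the added match does not involve C at all.
earlier = [("A", "B", 1, 1), ("A", "C", 2, 1), ("B", "C", 1, 1)]
later = earlier + [("A", "B", 0, 3)]

before = expected_goals(fit_ratings(earlier), "C", "A")
after = expected_goals(fit_ratings(later), "C", "A")
print(f"C's expected goals against A: {before:.2f} before, {after:.2f} after")
```

In this toy data the second figure should come out higher than the first: A's heavy loss to B makes A's defense look leakier, so C's earlier result against A is re-valued even though C did not play.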

A previous version of the algorithm used goal differential as the main variable and did not attempt to model individual goals for each side. During the development of that algorithm, several variables were tuned to minimize the least-squares error of the algorithm's predictions against actual results.

Essentially, all matches were divided into ten groups. The algorithm used nine groups to generate ratings and then used those ratings to predict the results of the matches in the tenth group. This analysis was repeated with each group in turn as the target group (and the others as the rating groups). This produced an error function that depended upon a single variable (the variable being optimized; for example, the value of playing on home field, the amount of regression toward the mean that is the best predictor, or the amount to value newer results over older ones). Using straightforward numerical methods, a tuning program found local minima of this function in order to find the best possible values for the various constants.

One interesting finding from this analysis is that more "important" games (e.g., World Cup finals and qualifiers) have more predictive value than other results (even for predicting "unimportant" results). This was not shocking (national teams deploy their best players in more important matches), but was not wholly expected. It's a nice verification that the model is measuring something real.
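The ten-group tuning procedure described above can be sketched roughly as follows. Everything here is a stand-in: the rating rule (`fit`, an average goal difference shrunk toward zero), the synthetic match data, and the golden-section search are assumptions chosen to keep the example small, not the tuning program itself. The single variable being tuned is the amount of regression toward the mean.

```python
import random
from collections import defaultdict

def fit(matches, shrink):
    """Toy rating: average goal difference, pulled toward 0 by `shrink` phantom matches."""
    total, played = defaultdict(float), defaultdict(int)
    for home, away, margin in matches:
        total[home] += margin
        played[home] += 1
        total[away] -= margin
        played[away] += 1
    return {t: total[t] / (played[t] + shrink) for t in total}

def cv_error(matches, shrink, groups=10):
    """Mean squared error when each group is predicted from ratings built on the others."""
    squared_error = 0.0
    for target in range(groups):
        rating_set = [m for i, m in enumerate(matches) if i % groups != target]
        target_set = [m for i, m in enumerate(matches) if i % groups == target]
        ratings = fit(rating_set, shrink)
        for home, away, margin in target_set:
            predicted = ratings.get(home, 0.0) - ratings.get(away, 0.0)
            squared_error += (margin - predicted) ** 2
    return squared_error / len(matches)

def golden_section_min(f, lo, hi, tol=1e-3):
    """Locate a local minimum of a one-variable function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Synthetic data: 12 hypothetical sides with hidden strengths and noisy margins.
random.seed(0)
teams = [f"T{i}" for i in range(12)]
strength = {t: random.gauss(0, 1) for t in teams}
matches = []
for _ in range(300):
    home, away = random.sample(teams, 2)
    margin = round(strength[home] - strength[away] + random.gauss(0, 1.5))
    matches.append((home, away, margin))

best = golden_section_min(lambda s: cv_error(matches, s), 0.0, 50.0)
print(f"best shrink = {best:.2f}, cross-validated error = {cv_error(matches, best):.3f}")
```

The same loop could be rerun with a different single variable (home-field value, or the weight given to newer results) in place of `shrink`. Because the search only finds a local minimum, the starting interval matters.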

These constants have been adapted for use with the new algorithm. The constants can be re-computed for the new algorithm at some point, but computing them the first time was extremely computationally expensive, and there is no apparent reason to expect their values to change appreciably because of the change in model from goal differential to goals.

FIFA World Rankings

The FIFA World Rankings measure ... well, something. But certainly not the quality of a side. Their algorithm is quite complex, and may serve some overarching purpose of "fairness" or creating incentives for national federations to behave in a particular way. But their composition has the flavor of the outcome of a series of "design meetings" that created arbitrary cutoffs (the double-the-top-seven system, for example) and/or an attempt to get the ratings to match some sort of preconceived notions by iteratively tweaking them with special rules and exceptions (the continental multipliers make little sense) until the inaugural set looked a certain way. Much as any American loves any rankings that (7-May-2006) have the United States fourth in the world, InOnIt.com must bow to reality and argue that the FIFA ratings, while perhaps a masterful bit of lawmaking, are a horrible mess at modeling anything (if in fact they seek to model at all; FIFA's page suggests that they do -- "a reliable measure for comparing national A-teams").

This is not an article about the FIFA world rankings. But developing these rankings provided some insights into the FIFA rankings:

The weights that FIFA uses to emphasize "important" games (e.g., World Cup finals) over "unimportant" games (e.g., friendlies) are actually quite reasonable; similar weights were found to have better predictive value than either "heavier" weights (devaluing less important matches further) or no weights at all.
FIFA's model for degrading older results is within the realm of reason as well. It seems to discount results from 1-3 years ago too little and results from 6+ years ago too much, but these are quibbles.
Based on the information provided about FIFA's rating system, it almost certainly underestimates the value of home field (and thus would systematically overestimate the ability of teams that play many matches at home).

Those things said, the FIFA rankings make little sense conceptually:

A bad result can never count for negative points against a team.
The algorithm for comparing regional strength, though not well-specified, makes little sense in the abstract and generates numbers which are hard to justify.
Counting the best-N-results (in an attempt to exclude outliers on the negative side?) emphasizes outliers on the positive side, and also systematically biases the ratings toward sides that play lots of matches. This would perhaps be a healthy instinct if it were to prevent sides with very small sample sizes from zooming to the top of the ratings with good results -- in other words, if it were a way of introducing some very primitive regression toward the mean by requiring sides to play seven matches. But given that the results encompass eight years, the sides at the top play more matches (and even more high-scoring matches, given that they advance farther in World Cup qualifying and finals), and that the very best sides will eventually have bad results unless they win the World Cup, this seems a misguided attempt, and it's not clear what its original intent was.
The United States is ranked fourth. (OK, so that's not really "conceptual.")

Self-critique

The most important way in which scoring in football may differ from a true Poisson process is that a team that is leading will attempt to change tactics (and possibly personnel) in order to protect its lead, while a trailing team will make corresponding changes. It is possible to argue that these changes in fact cancel one another out -- while one side attempts to attack more (which will tend to make both it and its opponent score more goals), the other side defends more (which will tend to make both it and its opponent score fewer). But it's not clear whether the effect is symmetrical (there are good theoretical reasons it should be).

One area in which the scores of the two sides ought not to be independent arises from FIFA's 3-1-0 scoring system. Because both teams are penalized for a draw, the system ought to induce more attacking when the score is level than when it is not. Whether this occurs in practice is not clear. In competitions where the 3-1-0 scoring system is not relevant (e.g., friendlies and the knockout stages of competitions), it of course would not apply (and in friendlies, there is no clear payoff system at all).

Whether the differences between sides in their ability to score or prevent scoring form the sort of transitive linear relationship modeled here is not clear.

Other commentary

Feel free to give feedback (positive or negative) on these ratings, and say whether you are happy to be identified.

One commenter notes (5-May-2006):

I calculated each team's % diff from the high rating (for France, this is 100*(2503-2419)/2503). The resulting curve is very near to being a perfect log function. It's almost eerie.

Perhaps someone mathematically inclined can help think this through?

World Cup Odds

With scoring in a match being modeled as a Poisson process, with a rate calculable given each side's offensive and defensive ratings, it's straightforward to do a Monte Carlo simulation of the World Cup and then use it to calculate the probabilities of selected outcomes, and thus odds of their occurrence. A few details:

Germany is treated as the home side for all matches, and hence has greater fortunes in the simulations than their rating alone would suggest.
In the knockout round, extra time is treated as a match with the same Poisson distributions but one-third the length (no provision is made for the sides not being level after one extra period; this would be a good enhancement).
Penalty shootouts are decided by a coin flip.
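The knockout-round rules above can be sketched as a small Monte Carlo simulation of a single tie. The scoring rates passed in are hypothetical inputs (in the real rankings they would come from the sides' offensive and defensive ratings, and from home advantage in Germany's case), and the helper names are invented for this example. The Poisson sampler uses Knuth's multiplication method, which is adequate for the small rates seen in football.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's method: count uniform draws until their running product falls to exp(-rate)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def knockout_win_prob(rate_a, rate_b, trials=20000, seed=0):
    """Estimate the probability that side A advances from a single knockout match."""
    rng = random.Random(seed)
    wins_a = 0
    for _ in range(trials):
        goals_a = sample_poisson(rate_a, rng)
        goals_b = sample_poisson(rate_b, rng)
        if goals_a == goals_b:
            # Extra time: same scoring rates, one-third the length of a full match.
            goals_a += sample_poisson(rate_a / 3, rng)
            goals_b += sample_poisson(rate_b / 3, rng)
        if goals_a == goals_b:
            wins_a += rng.random() < 0.5   # shootout treated as a coin flip
        else:
            wins_a += goals_a > goals_b
    return wins_a / trials

# Hypothetical rates: A expected to score 1.6 goals, B expected to score 1.1.
p = knockout_win_prob(1.6, 1.1)
print(f"P(A advances) = {p:.3f}, odds against = {(1 - p) / p:.2f} to 1")
```

Simulating a whole tournament would repeat this for every fixture in the bracket and count how often each side reaches a given stage. The conversion from probability to odds shown in the last line is the usual (1 - p) / p.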