CCSG By The Numbers: An Introduction to Statistics (2024 Remastered)

AUTHOR'S NOTE: Due to a website migration, and further updates to my methodology, it was necessary to rewrite the article I penned in the winter of 2023/24. This is essentially the same article as that lost piece, with a few updated segments and a few new paragraphs. The player used as an exemplar also changed, both because an updated card was needed and because using an ATO (Atlético Ottawa) player risked spoiling upcoming articles.

How can we, as fans, tell if a player is good or not? The answer that comes to mind is simply to watch the game – the so-called eye test. By watching the game as a whole, watching individual players, and watching how the two interact, you should be able to tell the good from the bad. As much as the following article might seem like it's trying to convince you otherwise, the eye test is still necessary to give context to the statistics, and one should not be used without the other.

But what if you wanted to know exactly how good a player is, where they excel, where they might have flaws, and how they stack up against their peers? This is where statistics come in. For the last couple of years, I have been working to quantify these strengths and weaknesses, and the goal of this series is both to help you familiarize yourself with statistical methods and to introduce you to my specific model of statistical analysis for ranking players against their peers.

STATISTICS:

The first step is data collection. Take, for example, a hypothetical Player A and Player B (hereafter referred to as A and B), and say that they both scored seven goals during the CPL season. Broadly speaking, they are equally valuable players: the main goal of football is to score goals, and they both contributed the same amount towards that end. However, they might not be equally efficient. The next step would be to look at how many games each played, which adds further context. But what if they both played the same number of games?

If both players played a hypothetical 90 minutes (one full game) without being substituted, who would score more goals? This is the idea behind per90 statistics, which help normalize every metric. By dividing a player's goals by their minutes played and then multiplying by 90 to create a full-game baseline, we get a Goals p90 statistic (Goals / Minutes x 90). If A scored their seven goals in 600 minutes and B scored their seven in 1,000, A achieved a Goals p90 of 1.05 – meaning that over a full game they would score around one goal – while B achieved 0.63, meaning they would take around a game and a half to score one. By using these methods, we can see that A is much more efficient at scoring goals on a 90-minute basis, regardless of how many games they actually played. We could, therefore, say they had more goal-scoring impact.
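If you prefer to see the arithmetic spelled out, here is a minimal sketch of that calculation in Python, using the hypothetical A and B numbers from above:

```python
def per_90(total: float, minutes: float) -> float:
    """Normalize a season total to a per-90-minute (one full game) rate."""
    return total / minutes * 90

# Hypothetical Players A and B from the example above.
print(round(per_90(7, 600), 2))   # Player A: 1.05 Goals p90
print(round(per_90(7, 1000), 2))  # Player B: 0.63 Goals p90
```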

A final comparative metric used in sports is the Percentile Ranking. In the simplest terms: in a sample of data, how good was the result, really? The percentile formula spits out a number that shows what portion of the sample the data point was higher than. Take, for example, A's 1.05 Goals p90 from earlier. If I were to run that calculation for every player in the CPL, how many players would they outperform in terms of scoring goals? In 2024, the answer would be literally every single player, because scoring at a goal-per-game pace is unheard of. The formula would thus return a percentile ranking of 100%, because they were better at scoring goals than 100% of the players in the CPL. For context, a ranking of 50% would mean that they sit right in the middle of the pack.
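And here is a rough sketch of a percentile rank computed the way I describe it above; the league sample is made up, and spreadsheet percentile functions may treat ties slightly differently:

```python
def percentile_rank(value: float, sample: list[float]) -> float:
    """Return what percentage of the sample the value is higher than."""
    below = sum(1 for x in sample if x < value)
    return below / len(sample) * 100

# Made-up league-wide Goals p90 values; Player A's 1.05 sits above them all.
league_goals_p90 = [0.10, 0.22, 0.35, 0.48, 0.63, 0.71, 0.84]
print(percentile_rank(1.05, league_goals_p90))  # 100.0
```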

INITIAL OUTPUT:

To begin my analysis, I took 18 different statistics – Goals, Assists, Passes, Tackles, etc. – from every CPL player, converted them into p90 numbers, and then ranked them against each other using the percentile formula. Next, I added a colour gradient to help with comprehension (Blue = Above Average, Grey = Average, and Orange = Below Average) and included the raw number below the corresponding box. My spreadsheet then generates something like this:

Above is former Ottawa Fury player Jeremy Gagnon-Lapare's (JGL) statistical profile from the 2024 season, which can be used to assess his strengths and weaknesses. I know there are a lot of numbers and colours there, but you can start by focusing on the darker blue boxes, which hold his better metrics. He was really good at completing Long Balls (top middle), winning Aerial Duels (top right), crossing (bottom left), and completing passes (bottom middle). For instance, his 92.3 Long Ball percentile rank means he was better than 92.3% of CPL players in that metric. These strengths make sense for a good midfielder.
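To give a sense of how those coloured boxes come together, here is a simplified sketch: a percentile rank goes in and a colour band comes out. The 40/60 cut-offs are illustrative assumptions, not the exact boundaries in my spreadsheet, and only the 92.3 Long Ball figure is taken from the profile above:

```python
def colour_band(percentile: float) -> str:
    """Map a percentile rank to the profile's colour gradient.

    The 40/60 cut-offs are illustrative assumptions, not the spreadsheet's
    exact boundaries.
    """
    if percentile >= 60:
        return "Blue (above average)"
    if percentile >= 40:
        return "Grey (average)"
    return "Orange (below average)"

# JGL's 92.3 Long Ball rank plus two made-up ranks for illustration.
profile = {"Long Ball %": 92.3, "Pass %": 85.0, "Goals": 15.0}
for metric, pct in profile.items():
    print(f"{metric}: {pct} -> {colour_band(pct)}")
```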

His worst categories were Blocks, Clearances, and Goals, which also makes sense for a midfielder. He wouldn't have been in a position to block many shots or make many clearances anyway, and his non-existent scoring also tracks. This is an example of using the eye test and knowledge of the game to contextualize the statistics because, as I mentioned in the introduction, the two should be used together when evaluating players. All in all, you could tell simply by watching him that he was a good player in 2024, and his statistical profile clearly backs this up.

MY MODEL:

I mentioned above that his strengths and weaknesses make sense for his position, and therein lies the final portion of this article: the rating system that I came up with. How can we judge players who play two different positions and thus have statistical profiles that vary wildly from one another? Take, for example, a Striker and a Centre-Back, the two most diametrically opposed positions on the field. The simplest answer would be to just average out their percentile ranks, right?

However, different positions have different jobs on the field, and therefore their actions and the resulting statistics will differ in both quantity and quality. For example, a Striker's job is to score goals, not to block shots, and a Centre-Back's job is the opposite (although a goal would naturally be appreciated). In theory, a Striker could record zero blocked shots simply because they are almost never that deep behind the ball in their own half. Their percentile ranking for blocked shots would, therefore, be 0%, but they shouldn't be penalized for that, because that isn't their role. Conversely, if a Striker is not scoring any goals but is blocking a lot of shots, their rating should not be the same as a colleague doing the opposite and actually contributing what is expected of a Striker.

Long story short, I tried to balance each statistic based on how important it is to fulfilling a player's role on the field. For example, a Striker's Goals and Assists (G/A) percentile rankings matter a lot more than their blocked shots, because scoring is what they are there to do. With all of the stats weighted such that the total weight equals 100, G/A accounts for more than half of a Striker's final output. The weight of each statistic, of course, changes by position. Each positional weighting takes into account the 18 statistics I collected and ranked, and produces a Weighted Percentile Average (WPA).

It sounds fancy, but don't worry about it. When looking at my work, all you need to know at a glance is that 60 is typically the average in any given season, anything from 65 to 75 is good to great, and 80+ is amazing; the highest grades in a season typically fall in the high 80s. JGL's simple WPA grade for 2024 was 62.2, which was above average. Note that this does NOT mean he was better than 62.2% of CPL players in 2024. Even though it uses percentile rankings, the WPA acts more like a rating system akin to the ones you would see on football apps or websites.
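For the mechanically minded, the sketch below shows the weighting arithmetic with invented numbers. The four buckets and their weights are placeholders purely to show how a WPA is assembled; the real model spreads 100 points of weight across all 18 statistics for each position, and those exact weights aren't listed here:

```python
def weighted_percentile_average(percentiles: dict[str, float],
                                weights: dict[str, float]) -> float:
    """Weighted Percentile Average; weights are assumed to sum to 100."""
    return sum(percentiles[stat] * weights[stat] for stat in weights) / 100

# Placeholder striker weights (G/A worth just over half, as described above),
# collapsed into four buckets instead of the full 18 statistics.
striker_weights = {"G/A": 55, "Chance Creation": 20, "Passing": 15, "Defending": 10}
striker_ranks   = {"G/A": 90, "Chance Creation": 60, "Passing": 55, "Defending": 5}
print(weighted_percentile_average(striker_ranks, striker_weights))  # 70.25
```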

One thing of note here is that the relative weights of each metric are ultimately subjective. My experience playing and watching the game has led me to value certain characteristics in a player and a position, and I have weighted the stats accordingly (some adjustments were also necessary to make the grades come out relatively equal – more on that later). Someone else might have slightly different priorities and thus slightly different weights; every advanced metric model carries its author's biases. While the relative weights are mostly personal, there were some compromises towards general consensus, so while my model might never be fully objective, I hope it reflects a certain universality.

The next adjustment takes into account a player's minutes played alongside their WPA grade. The grade can get boosted under two conditions. The first is if they played better than average for more minutes than average, because it is harder to play at a high level over a larger sample size; I consider this a reward mechanism because of that difficulty. The second is if they played fewer minutes than average and graded lower than average, because they perhaps didn't have enough opportunities to make an impact; I would consider this a reduction in punishment rather than a reward.

Conversely, the grade can get lowered under the opposite conditions. If a player graded worse than average but played more minutes than average, their grade gets lowered because they had the opportunities and couldn't make the most of them; this is entirely a way to penalize those who kept getting selected but couldn't perform. Finally, the grade also gets lowered if a player graded better than average but over a smaller sample size, to ensure they are not rated the same as someone who played at the same high level over more minutes. By taking minutes and the WPA grade into consideration, I came up with a formula that returns a scalable factor, which then gets added to or subtracted from the original grade. Running JGL's rating through this formula – he played better than average for more minutes than average – boosts it slightly.
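I won't reproduce the actual formula here, but a factor that behaves the way I just described can be sketched as something proportional to the product of the two deviations from average: positive when a player is above average in both minutes and WPA (or below average in both), negative otherwise. The averages, the scaling constant, and the 1,800-minute figure below are placeholders, not the numbers my spreadsheet uses:

```python
def minutes_adjustment(wpa: float, minutes: float,
                       avg_wpa: float = 60.0, avg_minutes: float = 1200.0,
                       scale: float = 5.0) -> float:
    """Scalable factor added to (or subtracted from) the simple WPA grade.

    Positive when WPA and minutes are both above average or both below
    average; negative otherwise. The averages and the scaling constant
    are placeholder assumptions, not the real model's values.
    """
    wpa_deviation = (wpa - avg_wpa) / avg_wpa
    minutes_deviation = (minutes - avg_minutes) / avg_minutes
    return scale * wpa_deviation * minutes_deviation

# Better than average (62.2) over an assumed 1,800 minutes -> a small boost.
print(round(minutes_adjustment(62.2, 1800), 2))  # 0.09
```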

A final adjustment is made to make every positional group comparable. Due to inherent imbalances in the metrics used, the weights themselves not always being perfectly balanced, and simply the nature of certain positions (for reasons too complex to lay out briefly, Full Backs get extremely shortchanged by an unadjusted grade), not all simple WPA grades are created equal. Some come out boosted across the board, some have ranges too small to leave their highest achievers among the top players, and so on. Thus, I manipulate each positional group's grades so that their medians and ranges come out approximately equal. Doing this to JGL's 2024 mark gives him a final Adjusted WPA of 66.2.
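Mechanically, you can think of that last step as rescaling each positional group onto a common median and range. The target values and the simple linear rescale in the sketch below are assumptions for illustration; my actual adjustment doesn't follow this exact formula:

```python
from statistics import median

def rescale_group(grades: list[float], target_median: float = 60.0,
                  target_range: float = 30.0) -> list[float]:
    """Linearly rescale one positional group's grades onto a common
    median and range. The targets and the purely linear rescale are
    illustrative assumptions."""
    spread = max(grades) - min(grades)
    mid = median(grades)
    return [target_median + (g - mid) * target_range / spread
            for g in grades]

# Hypothetical Full Back grades that come out compressed and low.
print([round(g, 1) for g in rescale_group([48, 52, 55, 57, 63])])
# [46.0, 54.0, 60.0, 64.0, 76.0] -> median 60, range 30
```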

All this information and more subsequently gets presented in something that looks like this:

You can see his personal information, as well as his position, games, and minutes, in the central boxes; his Adjusted WPA grade (66.2) on the left; where that ranks among all midfielders (5th of 26); and where it ranks league-wide (72nd out of 169 outfield players). The Team Factor box can be ignored for now; it means that, in order to properly reflect his team's success relative to their players' average grades, 2.57 can be subtracted from his grade (if the factor were positive, you would add it instead). Essentially, Halifax's average grades were higher than expected relative to the other CPL teams given their final league position (6th) – the largest such gap in 2024 – and so their players' grades can be decreased if you value the results of the collective over the results of the individual. This subjectivity is why I kept the Team Factor out of the Adjusted WPA grade above it, but those scores are tracked season to season and kept in the same database for curiosity's sake.

The boxes on the right are the same numbers as the statistical profile from before, just aggregated and weighted slightly. The same colour gradients from earlier apply: blue is above average and orange is below average. For example, "Ball Distribution" takes into account Pass Attempts, Pass Accuracy, Long Ball Accuracy, and Cross Accuracy (weighted more towards Pass %), and then percentile-ranks that aggregate against the league again. Using this number, you could say that JGL was better than 84.5% of the league at passing – one of his consistent strengths. The radar chart in the bottom middle takes the seven categories I deemed the most important and universal (G/A: Goals + Assists; Chance Creation: Key Passes; Passing: Ball Distribution; Touches; Strength: Duel/Aerial %; Tackling: Tackle %/Interceptions; and Defending: Blocks/Clearances) and arranges them clockwise. You can tell the general strengths and weaknesses of a player at a glance by looking at the area and shape of the radar chart.
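As a final sketch, an aggregate like "Ball Distribution" is just another weighted average of its component percentile ranks, which then gets percentile-ranked against the league one more time. The component weights and ranks below are placeholders that simply lean towards Pass %, as described above; they are not my actual values:

```python
def ball_distribution(component_ranks: dict[str, float]) -> float:
    """Aggregate passing-related percentile ranks into a single score.

    The weights are illustrative only; the real model leans more heavily
    on Pass % but uses its own exact split.
    """
    weights = {"Pass Attempts": 0.2, "Pass %": 0.4,
               "Long Ball %": 0.2, "Cross %": 0.2}
    return sum(component_ranks[k] * w for k, w in weights.items())

# Hypothetical component ranks; in the full model the aggregate is then
# percentile-ranked against the rest of the league once more.
print(round(ball_distribution({"Pass Attempts": 65, "Pass %": 88,
                               "Long Ball %": 92, "Cross %": 75}), 1))  # 81.6
```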

All of this info has been collated into one, hopefully simple, card in order to quickly assess a player's season. I use these cards on my various social media accounts (@CPLNumbers on Twitter/X, BlueSky, and Instagram), and for various CCSG articles detailing a team's season, potential and incoming transfers, and more. Feel free to reach out to me on any of these!

About Alexander:

When he isn't busy playing or watching sports (or going to school at uOttawa), Alexander is busy managing his Atlético Ottawa database, which he started in 2020 and which tracks everything you can think of about the club and its players. He also runs a Twitter account dedicated to analyzing and rating CPL players using statistics, CPL by the Numbers.