The future of soccer is data-driven: American Soccer Insights Summit shows what numbers can do
Data can tell us a lot about soccer. I spent two days with some of the people uncovering new insights.
Sitting in an auditorium at Rice University on Saturday afternoon, watching a presenter show off the statistics equations underpinning her analysis of Chinese Super League players, I was transported back to the University of Washington classroom where I realized, in the Waterloo of my undergraduate career, "I'm a journalism major; it was a mistake to sign up for this genetics course and I should really probably just stick to words."
But I was in the Houston auditorium, for the inaugural American Soccer Insights Summit, on a mission to become more immersed in this data-rich world. Austin resident and Athletic writer John Muller, who spoke at the conference, titled his presentation, "Numbers Talk — Now Make Them Tell a Story," and that's a good encapsulation of why I've integrated more data into my Austin FC coverage.
Numbers help us understand the game better. You don't need to know how to code in R or Python to appreciate soccer, but because people are doing that work, soccer clubs can be better informed in signing players, assessing upcoming opponents, and improving their own squads' performances; media can paint a more thorough picture in their coverage; and fans can understand soccer more completely.
Conference attendee Matt Barger, part of the team at American Soccer Analysis, created a series of charts I seized on several months ago. His analysis showed that Austin FC's 2024 designated players were what I termed "expensively average."
You might be familiar with another conference attendee, Sebastián Bush, who runs the indispensable MLS Analytics account on Bluesky and created a dashboard that allows multiple variations of x-y axis plotting to make sense of the 2024 season, including helping to visualize whether goals do indeed change games.
Just one of many visualizations possible with this dashboard
The conference also featured two presenters with Austin FC ties. Hayden Van Brewer worked as the club's manager of data and analytics from March 2022 to October 2024 and is now with SkillCorner, which provided the datasets driving many of the weekend's presentations. Zayne M. Thomajan, among the first Austin FC employees, rose to the role of senior manager of business and sporting operations during all three sporting director eras (and in the process helped current sporting director Rodolfo Borrell learn the ways of MLS roster rules) before moving on to Gotham FC and her current post at Chicago Fire FC.
How Verde's rivals use data
But the conference also featured some rivals who are getting an edge from how they're using data.
One of the first presenters, Mazatlán FC director of analysis Alejandro Dávila, largely focused on data analysis to improve the coaches' training programs for a squad that beat Verde in the 2023 Leagues Cup and made a run to the quarterfinals of the 2024 Leagues Cup.
His club invests in data and in his work to gain a competitive advantage in a league where the biggest clubs can address their issues by acquiring more expensive players – or, as he put it, "Monterrey can just buy Sergio Ramos." Indeed, when we got together Friday night at Pitch 25, we watched a León team that had recently acquired James beat a valiant Mazatlán team 2-1 (with Verde alumnus Emiliano Rigoni scoring the opening goal for León).
One of the keynote presentations, from Houston Dynamo FC president Pat Onstad and data scientist Ethan Creager, walked the audience through their data-driven process for assessing potential new Dynamo players.
It also started with Onstad dropping news of a trade with the Philadelphia Union (which turned out to be one of the league's first cash-for-player deals, sending midfielder Jack McGlynn Houston's way), and included an interesting nugget in which, if not for his salary, Onstad might have made an offer to turn Sebastián Driussi from Verde to Forever Orange. (More precisely, despite the team's relaunched motto, Orange for a Time.)
A whole new stat
As much as I appreciated the news, I was there to better understand how numbers could help augment words in soccer coverage, and in the process, saw the collaborative spirit of those doing this work.
For example, Temple University Ph.D. candidate Rob Oakley created a whole new category of measurement to help assess defensive work beyond tackles and interceptions. He dubs it expected possession value prevention, or EPVp, which gauges when a defender stands up an attacker, forcing that attacker to abandon a move that would advance an attack (via a backpass or some other retreating action). In part because of their roles in engaging with wide attacks, fullbacks and central midfielders in the NWSL dataset registered higher EPVp totals on average than other positions.
(If you're familiar with the successful take-on stat, which measures how many times an attacker takes on and then gets past a defender, the EPVp stat aims to quantify the defensive work that results in an unsuccessful take-on, separately from an interception. Players can also register negative EPVp for an action, as when an attacker facing that defender does make a successful take-on or registers a shot-creating action.)
Audience members, mostly data researchers, offered suggestions to expand the work. I suggested one more dimension to explore: whether there's a correlation between a team's collective EPVp and its expected goals against.
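For readers who do code, here is a minimal sketch of how that team-level check might look in Python. It is not Oakley's method or anything shown at the conference. It assumes you already have one season-long EPVp total and one expected-goals-against total per team, and it uses a plain Pearson correlation. The function names and the four teams with their numbers are made-up placeholders, not real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def epvp_xga_correlation(team_epvp, team_xga):
    """Correlate team EPVp totals with team xGA, matching teams by name."""
    teams = sorted(set(team_epvp) & set(team_xga))
    return pearson_r([team_epvp[t] for t in teams],
                     [team_xga[t] for t in teams])


# HYPOTHETICAL placeholder values, invented purely for illustration.
team_epvp = {"Team A": 4.0, "Team B": 3.0, "Team C": 2.0, "Team D": 1.0}
team_xga = {"Team A": 20.0, "Team B": 26.0, "Team C": 29.0, "Team D": 35.0}

r = epvp_xga_correlation(team_epvp, team_xga)
print(f"Pearson r between team EPVp and xGA: {r:.2f}")
```

A value of r near -1 would suggest that teams with more EPVp tend to concede fewer expected goals, and a value near 0 would suggest no linear relationship. Either way, a correlation over a single season's worth of teams would be a starting point for a research question, not a conclusion.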
'Secrecy is the enemy of progress'
Stefan Szymanski, the University of Michigan professor best known for his pioneering Soccernomics book, summed up the collaborative spirit driving this work in his keynote address by noting, "Secrecy is the enemy of progress," encouraging the community of researchers to be more open about sharing models.
As more and more data becomes available – and technology is greatly enabling this growth – we can become more observant of the game and more conversant about why the teams and players we love are excelling or struggling.
But it takes work to do that. It takes the work of people fascinated enough by data to gather it, create a research question around it, assemble a model that helps answer it, and then translate it into texts and visuals to communicate it.
And the people doing this work are driven by a love of soccer. Conference organizers encouraged attendees to wear a favorite jersey on Friday, resulting in nearly 100 percent participation and an incredible range of team affiliations on display, including a remarkable Ballard FC kit and the debut Portland Hearts of Pine kit I'm currently coveting.
In a recent American Soccer Analysis podcast, Arman Kafai (who did data analysis for FC Dallas and has helped us explore why "xG means nothing anymore" after Verde's season-extending win against Portland in October) emphasized the importance of effectively communicating data to those who can use it – in his case, then-FC Dallas head coach Nico Estévez, who was receptive to data's role in informing his decisions.
I came away from the conference more determined than ever to consume data, to think about research that might be done, and to help share the work that this community is doing. I may never progress to scraping data and learning R, but I'm committed to further incorporating numbers into the quest for narrative that drives my coverage.
Verde All Day is a reader-supported online publication covering Austin FC. Additional support is provided by Austin Telco Federal Credit Union. You can comment here if you’re a subscriber, or reach out via Bluesky.