This report explains, in simple terms, the steps we followed to build a chart that compares each Premier League team's actual goals scored (GF) to the goals they were expected to score (xG) in the 2023-2024 season. The chart shows which teams scored more than expected (overperformed) and which scored less (underperformed).
We do not describe the data in detail here, as everything is documented at the link below (the source of our data).
Premier League Data Library [https://pypi.org/project/premier-league/#matchstatistics]
MatchStatistics is a class for retrieving and analyzing detailed match-level statistics in the form of ML datasets from Premier League games and other top European leagues. It provides access to extensive game data including team performance metrics, player statistics, and match events for ML training or analysis.
Data Structure: The data is stored in a SQLite database [premier_league.db]
Core Tables: League, Team, Game, GameStats
RankingTable fetches team ranking data for a given season and league.
League: English Premier League
Season: 2023-2024
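Before pulling anything, we confirmed the structure of the local database. The snippet below is a minimal sketch of that check, assuming the SQLite file premier_league.db named in the data-structure note above (against a local PostgreSQL instance the equivalent would be a query on information_schema). Only the four core table names are documented, so any column names beyond that are assumptions.

    import sqlite3

    # Connect to the locally initialised database (file name taken from the
    # documentation note above; adjust the path if yours differs).
    conn = sqlite3.connect("premier_league.db")

    # List the tables that actually exist, to confirm the documented core tables
    # (League, Team, Game, GameStats) are present before writing any joins.
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    print([name for (name,) in tables])

    # Peek at the columns of one table to locate the GF / xG fields we need.
    # GameStats is one of the documented core tables; its columns are not listed
    # in the documentation excerpt, so this step is purely exploratory.
    for column in conn.execute("PRAGMA table_info(GameStats)"):
        print(column)

    conn.close()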
Units compared: each team's actual goals scored (GF) against its expected goals (xG). The data were prepared as follows:
o Due to the large amount of data across multiple tables, it was more efficient to use PostgreSQL and join only the information required, so we initialized the database locally as per the documentation instructions.
o The necessary data were later added to a DataFrame for sanity checks and further manipulation.
o We pulled match-level data from the database for every game in the 2023-2024 Premier League season.
o For each team we added up their total goals scored (GF) and total expected goals (xG) across all of their matches.
For the final league standings the amount of data is much smaller, so we loaded this information directly into a Pandas DataFrame.
We then merged the two DataFrames into a single table, which we later used for data checks and visualisation.
The resulting dataset is complete, with no missing values (see the sketch below for the query, aggregation, merge and completeness check).
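The listing below is a minimal sketch of the pull, aggregation and merge described above. It assumes the SQLite file premier_league.db and hypothetical table and column names (goals_for, xg, the join keys, and a final_table_2023_24.csv export of the ranking); these must be adapted to the real schema, and the same SQL would need only minor changes against a local PostgreSQL instance.

    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("premier_league.db")

    # Hypothetical query: join the core tables and keep only the 2023-2024
    # English Premier League fixtures. Table and column names are assumptions
    # and must be adapted to the real schema.
    match_stats = pd.read_sql_query(
        """
        SELECT t.name AS team, gs.goals_for, gs.xg
        FROM GameStats AS gs
        JOIN Game AS g   ON g.id = gs.game_id
        JOIN Team AS t   ON t.id = gs.team_id
        JOIN League AS l ON l.id = g.league_id
        WHERE l.name = 'English Premier League'
          AND g.season = '2023-2024'
        """,
        conn,
    )
    conn.close()

    # Sum actual and expected goals per team over the whole season.
    per_team = match_stats.groupby("team", as_index=False)[["goals_for", "xg"]].sum()

    # Final league positions: in the actual run these came via the library's
    # RankingTable helper; here we assume they were saved to a small CSV with
    # hypothetical columns 'team' and 'position'.
    ranking = pd.read_csv("final_table_2023_24.csv")

    # Merge the two DataFrames into the single table used for checks and plotting.
    df = per_team.merge(ranking, on="team", how="inner")

    # Sanity checks: one row per club (20 in the Premier League), no missing values.
    print(df.shape)
    print(df.isna().sum())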

Layout:

Expected goals (xG) gives a sense of how many good scoring opportunities a team created. Comparing GF to xG therefore helps separate the quality of chances a team created from how well, or how luckily, it finished them.
This helps coaches, analysts and fans understand whether a team's scoring record was sustainable or might regress toward the xG level in the future.
Scatterplot of each team's final league position (x) against their GF − xG (y); blue = overperformed, red = underperformed. Positive y means scored more than expected.
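A minimal plotting sketch is shown below, using the merged DataFrame df from the earlier sketch; the column names (team, position, goals_for, xg) remain assumptions.

    import matplotlib.pyplot as plt

    # Per-team difference between actual and expected goals (positive = overperformed).
    df["diff"] = df["goals_for"] - df["xg"]

    # Blue for overperformers, red for underperformers, as in the chart description.
    colours = ["blue" if d > 0 else "red" for d in df["diff"]]

    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(df["position"], df["diff"], c=colours)
    ax.axhline(0, color="grey", linewidth=0.8)  # reference line where GF equals xG

    # Label each point with the team name so outliers are easy to spot.
    for _, row in df.iterrows():
        ax.annotate(row["team"], (row["position"], row["diff"]),
                    textcoords="offset points", xytext=(4, 4), fontsize=8)

    ax.set_xlabel("Final league position (1 = champions)")
    ax.set_ylabel("GF - xG (positive = scored more than expected)")
    ax.set_title("Premier League 2023-2024: actual vs expected goals")
    plt.tight_layout()
    plt.show()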

The scatterplot shows only a weak relationship with no clear direction; notable outliers such as Liverpool (3rd, diff = -4.3) and Luton (18th, diff = +4.9) suggest that goal over/underperformance only partially explains final league position.
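The strength of that relationship can be quantified rather than eyeballed; a short check on the same merged DataFrame (a sketch under the same column-name assumptions) would be:

    # Spearman rank correlation between final position and GF - xG.
    # A value close to 0 would support the 'weak relationship' reading above.
    corr = df["position"].corr(df["diff"], method="spearman")
    print(f"Spearman correlation (position vs GF - xG): {corr:.2f}")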
The analysis of GF vs xG differences gives useful insight, but it is not the whole story. The gap between goals scored and expected goals speaks to finishing quality or luck, while final league position is driven by points (wins, draws and losses), which also depend on defence, consistency and game management.
It does, however, explain an important part of how teams ended up where they did: large positive gaps boosted some teams' attacking output and helped the top sides, while large negative gaps (notably Everton's) aligned with poor outcomes. Final league position is multifactorial, so to make stronger causal claims we should extend the analysis by combining GF/xG with defensive statistics and a points-based breakdown.