Note
This note was transcribed by Claude.
Overview
Lecture 4 (03.03.2026) focused on inferential statistics — how to determine whether observed differences between groups are statistically significant. The lecturer described this as the most difficult session in the course. The lecture built directly on Lecture 3, which covered descriptive statistics and the Shapiro-Wilk normality test.
Recap from Lecture 3
- Students learned descriptive statistics and applied the Shapiro-Wilk normality test in Jamovi.
- Key threshold: p = 0.05
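  - Shapiro-Wilk p-value > 0.05 = no evidence against normality, so the data is treated as normally distributed (parametric)
  - Shapiro-Wilk p-value < 0.05 = the data departs significantly from normality, so it is treated as non-normally distributed (non-parametric)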
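The course runs this check in Jamovi. For readers who want to reproduce the rule outside Jamovi, the following is a minimal sketch using SciPy's `scipy.stats.shapiro`. The helper name `normality_label`, the synthetic sample, and the choice to treat a p-value of exactly 0.05 as non-parametric are assumptions made for illustration, not part of the lecture.

```python
import numpy as np
from scipy import stats


def normality_label(values, alpha=0.05):
    """Run Shapiro-Wilk and label the data using the lecture's 0.05 threshold.

    Hypothetical helper: p > alpha is labelled "parametric", anything else
    "non-parametric" (the p == alpha case is an assumption of this sketch).
    """
    _, p = stats.shapiro(values)
    p = float(p)
    label = "parametric" if p > alpha else "non-parametric"
    return p, label


# Synthetic values drawn from a normal distribution, not real match data.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=1.5, scale=0.5, size=30)

p_value, label = normality_label(sample)
print(f"Shapiro-Wilk p = {p_value:.3f} -> treat as {label}")
```

A sample with one extreme outlier, such as nine identical values and one very large one, should come back labelled non-parametric.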
Core Concept: Why Inferential Statistics?
Running example: comparing Benfica’s expected goals (xG) against opposition xG across a season.
- Descriptive statistics can show one group has a higher average than another.
- However, a higher mean alone does not prove the difference is meaningful.
- Inferential statistics tests whether the difference is statistically significant — i.e., unlikely to have occurred by chance.
- Significance threshold: p < 0.05.
Three Types of Inferential Analysis
- Differences (comparing means): e.g., comparing Benfica xG vs. opposition xG. Main focus of this lecture.
- Correlations: e.g., testing whether more passes are associated with higher xG. The coefficient ranges from -1 to 1: close to 1 = strong positive relationship, close to 0 = no relationship, close to -1 = strong inverse relationship.
- Categories (frequencies): e.g., counting whether a team takes more out-swinging or in-swinging corners.
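To make the correlation scale concrete, here is a small sketch that computes a Pearson correlation coefficient by hand. The lecture does not say which coefficient it has in mind, so Pearson is an assumption, and the pass counts and xG values are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented numbers, not real match data.
passes = [300, 350, 400, 450, 500]
xg_rising = [0.8, 1.0, 1.2, 1.4, 1.6]   # xG grows with passes
xg_falling = [1.6, 1.4, 1.2, 1.0, 0.8]  # xG shrinks with passes

print(round(pearson_r(passes, xg_rising), 3))   # close to 1
print(round(pearson_r(passes, xg_falling), 3))  # close to -1
```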
The Statistical Decision Tree
The central framework of the lecture. Three questions guide which test to use:
Question 1: Same subject or different subjects?
- Same subject (dependent/paired): Comparing the same entity under different conditions.
  - Benfica at home vs. Benfica away
  - A player’s performance in league vs. cup
  - Benfica in Champions League vs. domestic league
- Different subjects (independent): Comparing different entities.
  - Benfica vs. Porto
  - Benfica xG vs. opposition xG
  - Messi vs. Ronaldo vs. Neymar
Question 2: How many groups?
- Two groups = t-test variant
- Three or more groups = ANOVA variant
Question 3: Parametric or non-parametric?
Determined by the Shapiro-Wilk test, run separately on each group. Important rule: if comparing two groups and one is normal but the other is non-normal, treat the overall comparison as non-parametric.
Complete Decision Tree
Two Groups
| Subject type | Distribution | Test |
|---|---|---|
| Different subjects (independent) | Parametric | Independent samples t-test (Student’s) |
| Different subjects (independent) | Non-parametric | Mann-Whitney U test |
| Same subject (dependent/paired) | Parametric | Paired samples t-test |
| Same subject (dependent/paired) | Non-parametric | Wilcoxon signed-rank test |
Three or More Groups
| Subject type | Distribution | Test |
|---|---|---|
| Different subjects (independent) | Parametric | One-way ANOVA |
| Different subjects (independent) | Non-parametric | Kruskal-Wallis H test |
| Same subject (dependent/paired) | Parametric | Repeated measures ANOVA |
| Same subject (dependent/paired) | Non-parametric | Friedman test |
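The two tables can be read as a lookup keyed on the three questions. The sketch below encodes them that way. The function name `choose_test` and its signature are hypothetical. Applying the "one non-normal group makes the whole comparison non-parametric" rule to three or more groups is an assumption, since the lecture states it only for two groups.

```python
ALPHA = 0.05

# Keys: (same_subject, exactly_two_groups, parametric) -> test name from the tables.
TESTS = {
    (False, True, True): "Independent samples t-test",
    (False, True, False): "Mann-Whitney U test",
    (True, True, True): "Paired samples t-test",
    (True, True, False): "Wilcoxon signed-rank test",
    (False, False, True): "One-way ANOVA",
    (False, False, False): "Kruskal-Wallis H test",
    (True, False, True): "Repeated measures ANOVA",
    (True, False, False): "Friedman test",
}


def choose_test(same_subject, shapiro_p_values, alpha=ALPHA):
    """Pick a test from the decision tree.

    same_subject: True for paired/dependent designs, False for independent ones.
    shapiro_p_values: one Shapiro-Wilk p-value per group.
    """
    n_groups = len(shapiro_p_values)
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    # A single non-normal group makes the whole comparison non-parametric.
    parametric = all(p > alpha for p in shapiro_p_values)
    return TESTS[(same_subject, n_groups == 2, parametric)]


# Two independent groups, one normal (p = 0.40) and one not (p = 0.01).
print(choose_test(False, [0.40, 0.01]))  # Mann-Whitney U test
```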
Examples Used to Illustrate
| Scenario | Subject type | Groups | Example test |
|---|---|---|---|
| Benfica xG vs. opposition xG | Different | 2 | Mann-Whitney U (one normal, one non-normal) |
| Benfica home vs. away | Same | 2 | Paired t-test or Wilcoxon |
| Benfica win vs. draw vs. loss | Same | 3 | Repeated measures ANOVA or Friedman |
| Benfica vs. Porto vs. Sporting | Different | 3 | One-way ANOVA or Kruskal-Wallis H |
| Benfica home vs. Porto away | Different | 2 | Independent t-test or Mann-Whitney U |
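The lecture runs these tests in Jamovi. As a rough equivalent for anyone working in Python, the sketch below maps most of the test names to functions in `scipy.stats`. Repeated measures ANOVA is left out because SciPy does not provide it. The helper `run_test` is hypothetical and the xG values are invented.

```python
from scipy import stats

# Test names from the decision tree mapped to SciPy functions.
# Repeated measures ANOVA is omitted: scipy.stats has no function for it.
SCIPY_TESTS = {
    "Independent samples t-test": stats.ttest_ind,
    "Mann-Whitney U test": stats.mannwhitneyu,
    "Paired samples t-test": stats.ttest_rel,
    "Wilcoxon signed-rank test": stats.wilcoxon,
    "One-way ANOVA": stats.f_oneway,
    "Kruskal-Wallis H test": stats.kruskal,
    "Friedman test": stats.friedmanchisquare,
}


def run_test(name, *groups):
    """Run the named test on the given groups and return (statistic, p_value)."""
    statistic, p_value = SCIPY_TESTS[name](*groups)
    return float(statistic), float(p_value)


# Invented xG values, not real Benfica data.
benfica_xg = [2.1, 2.4, 1.9, 2.8, 2.2, 2.6, 2.0, 2.5]
opposition_xg = [0.6, 0.9, 0.4, 1.1, 0.7, 0.8, 0.5, 1.0]

stat, p = run_test("Mann-Whitney U test", benfica_xg, opposition_xg)
print(f"U = {stat:.1f}, p = {p:.4f}, significant = {p < 0.05}")
```

Because every invented Benfica value is larger than every opposition value, the p-value here falls well below 0.05.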
Data Organization in Excel
For same-subject (paired) comparisons
- Data organized side by side in columns (e.g., Column A = home values, Column B = away values)
- Each row = a matched pair (1st home game with 1st away game, etc.)
- Number of observations in each column must be equal
For different-subject (independent) comparisons
- Data in a single column with all values stacked
- A grouping variable column identifies which group each value belongs to (e.g., “Benfica” or “Opposition”)
- In Jamovi, use the “Split by” function for descriptive statistics
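The two layouts can be sketched with pandas as follows. The column names and all values are invented for illustration. The `groupby` call is only a rough analogue of Jamovi's "Split by", not a description of what Jamovi does internally.

```python
import pandas as pd

# Paired layout: one column per condition, each row is a matched pair.
paired = pd.DataFrame({
    "home": [58, 61, 55],  # invented possession values
    "away": [52, 49, 57],
})

# Independent layout: all values stacked in one column, plus a grouping column.
benfica = [1.8, 2.1, 1.4]  # invented xG values
opposition = [0.9, 1.2]    # the groups may differ in size
stacked = pd.DataFrame({
    "xg": benfica + opposition,
    "group": ["Benfica"] * len(benfica) + ["Opposition"] * len(opposition),
})

# Per-group descriptive statistics, similar in spirit to "Split by".
group_means = stacked.groupby("group")["xg"].mean()

print(paired)
print(stacked)
print(group_means)
```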
Practical data preparation steps
- Start with full dataset in Excel
- Filter to relevant competition (e.g., league only)
- Filter to relevant team
- Add column for condition variable (e.g., “Home” / “Away”)
- Keep only the dependent variable column(s) needed
- Reorganize into appropriate format
- Remove all filters before importing to Jamovi
- Delete extraneous columns
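The course does these steps by hand in Excel. The same sequence could be sketched in pandas as below, assuming a dataset with the hypothetical columns `competition`, `team`, `venue`, `possession`, and `shots`. The sketch assumes the `venue` column already exists, so the "add a condition column" step is skipped. All values are invented.

```python
import pandas as pd

# Invented stand-in for the full dataset.
raw = pd.DataFrame({
    "competition": ["League", "League", "Cup", "League", "League", "League"],
    "team": ["Porto", "Porto", "Porto", "Benfica", "Porto", "Porto"],
    "venue": ["Home", "Away", "Home", "Home", "Home", "Away"],
    "possession": [62, 55, 70, 58, 64, 51],
    "shots": [14, 9, 20, 12, 16, 8],
})

# Filter to the relevant competition, then to the relevant team.
league = raw[raw["competition"] == "League"]
porto = league[league["team"] == "Porto"]

# Keep only the condition column and the dependent variable.
slim = porto[["venue", "possession"]]

# Reorganize into the paired layout: one column per condition.
home = slim.loc[slim["venue"] == "Home", "possession"].reset_index(drop=True)
away = slim.loc[slim["venue"] == "Away", "possession"].reset_index(drop=True)
if len(home) != len(away):
    raise ValueError("paired layout needs the same number of home and away rows")
prepared = pd.DataFrame({"home": home, "away": away})

print(prepared)
```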
Practical Exercise
15-minute timed exercise:
- Task: Compare FC Porto’s ball possession at home vs. away (2023-24 Liga Portugal, 10 games each)
- Dependent variable: Possession percentage
- Independent variable: Match venue (home/away)
- Analysis type: Same subject, two groups (paired)
- Steps:
  - Organize Excel data (filter Porto league matches, label home/away, extract possession)
  - Import into Jamovi
  - Run Shapiro-Wilk normality test on both conditions
  - Select appropriate test based on normality results
  - Determine if there is a statistically significant difference
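The analysis part of the exercise (normality check, test selection, significance decision) can be sketched in Python with SciPy as below. This is an illustration, not the Jamovi workflow used in class. The function `compare_paired` is hypothetical, and the possession values are invented rather than taken from Porto's 2023-24 season.

```python
from scipy import stats


def compare_paired(cond_a, cond_b, alpha=0.05):
    """Same subject, two groups: choose and run the test per the decision tree."""
    if len(cond_a) != len(cond_b):
        raise ValueError("paired comparison needs equal-length samples")

    # Shapiro-Wilk on each condition separately.
    _, p_a = stats.shapiro(cond_a)
    _, p_b = stats.shapiro(cond_b)

    # Both conditions must look normal to use the parametric test.
    if p_a > alpha and p_b > alpha:
        test_name = "Paired samples t-test"
        _, p = stats.ttest_rel(cond_a, cond_b)
    else:
        test_name = "Wilcoxon signed-rank test"
        _, p = stats.wilcoxon(cond_a, cond_b)

    return {"test": test_name, "p_value": float(p), "significant": bool(p < alpha)}


# Invented possession percentages for 10 home and 10 away games.
home = [62, 58, 65, 60, 57, 63, 66, 59, 61, 64]
away = [54, 51, 58, 49, 53, 55, 57, 50, 52, 56]

result = compare_paired(home, away)
print(result)
```

In this invented data every home value exceeds its paired away value, so whichever test is selected reports a significant difference.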
Key Terminology
| Term | Definition |
|---|---|
| Dependent variable | The variable being measured (e.g., possession, xG) |
| Independent variable | The condition or grouping factor (e.g., home/away) |
| Parametric data | Treated as normally distributed (Shapiro-Wilk p > 0.05) |
| Non-parametric data | Treated as not normally distributed (Shapiro-Wilk p < 0.05) |
| Statistical significance | p < 0.05 on inferential test |
| Paired/Dependent | Comparing the same entity across conditions |
| Independent | Comparing different entities |
Software and Tools
| Tool | Purpose |
|---|---|
| Jamovi | Primary statistical software for all analyses |
| Microsoft Excel / Office 365 | Data preparation before importing to Jamovi |
| Moodle | Lecture recordings and materials |
What Comes Next
- Next lecture will cover effect size — measuring the magnitude of a difference, beyond just whether it is significant.
- Five analysis aims to complete across practical exercises.