Note

This note was transcribed by Claude.

Overview

Lecture 4 (03.03.2026) focused on inferential statistics — how to determine whether observed differences between groups are statistically significant. The lecturer described this as the most difficult session in the course. The lecture built directly on Lecture 3, which covered descriptive statistics and the Shapiro-Wilk normality test.


Recap from Lecture 3

  • Students learned descriptive statistics and applied the Shapiro-Wilk normality test in Jamovi.
  • Key threshold: p = 0.05
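    • Shapiro-Wilk p-value > 0.05 = no significant departure from normality, so the data is treated as normally distributed (parametric)
    • Shapiro-Wilk p-value < 0.05 = significant departure from normality, so the data is treated as non-normally distributed (non-parametric)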

Core Concept: Why Inferential Statistics?

Running example: comparing Benfica’s expected goals (xG) against opposition xG across a season.

  • Descriptive statistics can show one group has a higher average than another.
  • However, a higher mean alone does not prove the difference is meaningful.
  • Inferential statistics tests whether the difference is statistically significant, i.e., whether a difference this large would be unlikely to arise by chance alone.
  • Significance threshold: p < 0.05.

Three Types of Inferential Analysis

  1. Differences (comparing means): e.g., comparing Benfica xG vs. opposition xG. Main focus of this lecture.
  2. Correlations: e.g., testing whether more passes are associated with higher xG. The coefficient runs from -1 to 1: close to 1 = strong positive relationship, close to 0 = no linear relationship, close to -1 = strong negative (inverse) relationship.
  3. Categories (frequencies): e.g., counting whether a team takes more out-swinging or in-swinging corners.
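
To make the correlation scale concrete, here is a minimal sketch of Pearson's r using only the Python standard library. The lecture works in Jamovi, so the function name `pearson_r` and the sample numbers are assumptions made up for illustration, not course material.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented per-match values, for illustration only.
passes = [420, 515, 388, 602, 476, 551]
xg = [1.1, 1.6, 0.9, 2.2, 1.4, 1.8]
print(f"Pearson r = {pearson_r(passes, xg):.2f}")
```

A value near 1 here only says the two variables move together in this sample; it does not show that more passes cause higher xG.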

The Statistical Decision Tree

The central framework of the lecture. Three questions guide which test to use:

Question 1: Same subject or different subjects?

  • Same subject (dependent/paired): Comparing the same entity under different conditions.
    • Benfica at home vs. Benfica away
    • A player’s performance in league vs. cup
    • Benfica in Champions League vs. domestic league
  • Different subjects (independent): Comparing different entities.
    • Benfica vs. Porto
    • Benfica xG vs. opposition xG
    • Messi vs. Ronaldo vs. Neymar

Question 2: How many groups?

  • Two groups = t-test variant
  • Three or more groups = ANOVA variant

Question 3: Parametric or non-parametric?

Determined by the Shapiro-Wilk test, run on each group. Important rule: if comparing two groups and one is normal but the other is non-normal, treat the overall comparison as non-parametric.
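
The rule above can be sketched as a small Python check. The function name `is_parametric` is hypothetical, and the p-values are invented. Treating a p-value of exactly 0.05 as non-parametric is an assumption, since the notes only cover the "greater than" and "less than" cases.

```python
ALPHA = 0.05  # threshold from the lecture


def is_parametric(shapiro_p_values, alpha=ALPHA):
    """True only if every group's Shapiro-Wilk p-value is above alpha.

    Assumption: a p-value exactly equal to alpha counts as non-parametric.
    """
    return all(p > alpha for p in shapiro_p_values)


# Invented p-values: one group looks normal, the other does not.
print(is_parametric([0.21, 0.03]))
```

Because a single non-normal group is enough to make the whole comparison non-parametric, the check uses `all` rather than `any`.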


Complete Decision Tree

Two Groups

Subject type | Distribution | Test
Different subjects (independent) | Parametric | Independent samples t-test (Student’s)
Different subjects (independent) | Non-parametric | Mann-Whitney U test
Same subject (dependent/paired) | Parametric | Paired samples t-test
Same subject (dependent/paired) | Non-parametric | Wilcoxon signed-rank test

Three or More Groups

Subject type | Distribution | Test
Different subjects (independent) | Parametric | One-way ANOVA
Different subjects (independent) | Non-parametric | Kruskal-Wallis H test
Same subject (dependent/paired) | Parametric | Repeated measures ANOVA
Same subject (dependent/paired) | Non-parametric | Friedman test
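
The full decision tree can also be written as a lookup, which may help when checking answers to the practical exercises. This is a sketch only: the function name `choose_test` and its parameters are hypothetical, and the test names are copied from the two tables above.

```python
def choose_test(paired, n_groups, parametric):
    """Return the test name from the decision tree.

    paired     -- True for same subject (dependent), False for different subjects
    n_groups   -- number of groups being compared (2 or more)
    parametric -- True only if every group passed the Shapiro-Wilk check
    """
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")

    two_groups = n_groups == 2
    tests = {
        # (paired, two_groups, parametric): test
        (False, True, True): "Independent samples t-test",
        (False, True, False): "Mann-Whitney U test",
        (True, True, True): "Paired samples t-test",
        (True, True, False): "Wilcoxon signed-rank test",
        (False, False, True): "One-way ANOVA",
        (False, False, False): "Kruskal-Wallis H test",
        (True, False, True): "Repeated measures ANOVA",
        (True, False, False): "Friedman test",
    }
    return tests[(bool(paired), two_groups, bool(parametric))]


# Benfica xG vs. opposition xG: different subjects, two groups, one non-normal.
print(choose_test(paired=False, n_groups=2, parametric=False))
```

The three arguments map one-to-one onto Questions 1 to 3, so each combination of answers leads to exactly one test.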

Examples Used to Illustrate

Scenario | Subject type | Groups | Example test
Benfica xG vs. opposition xG | Different | 2 | Mann-Whitney U (one normal, one non-normal)
Benfica home vs. away | Same | 2 | Paired t-test or Wilcoxon
Benfica win vs. draw vs. loss | Same | 3 | Repeated measures ANOVA or Friedman
Benfica vs. Porto vs. Sporting | Different | 3 | One-way ANOVA or Kruskal-Wallis H
Benfica home vs. Porto away | Different | 2 | Independent t-test or Mann-Whitney U

Data Organization in Excel

For same-subject (paired) comparisons

  • Data organized side by side in columns (e.g., Column A = home values, Column B = away values)
  • Each row = a matched pair (1st home game with 1st away game, etc.)
  • Number of observations in each column must be equal

For different-subject (independent) comparisons

  • Data in a single column with all values stacked
  • A grouping variable column identifies which group each value belongs to (e.g., “Benfica” or “Opposition”)
  • In Jamovi, use the “Split by” function for descriptive statistics
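
The two layouts can be illustrated outside Excel with a short Python sketch. The helper names `to_paired_rows` and `to_stacked_rows` are hypothetical and the numbers are invented; the point is only the shape of the data.

```python
def to_paired_rows(first, second):
    """Wide layout: one row per matched pair (e.g., home value, away value)."""
    if len(first) != len(second):
        raise ValueError("paired data needs the same number of observations per column")
    return list(zip(first, second))


def to_stacked_rows(groups):
    """Long layout: one (group label, value) row per observation."""
    return [(label, value) for label, values in groups.items() for value in values]


# Invented numbers, for illustration only.
home_xg = [1.8, 2.1, 1.4]
away_xg = [1.2, 1.6, 0.9]
print(to_paired_rows(home_xg, away_xg))

benfica_xg = [1.8, 2.1, 1.4]
opposition_xg = [0.7, 1.1]
print(to_stacked_rows({"Benfica": benfica_xg, "Opposition": opposition_xg}))
```

The paired helper refuses unequal column lengths, mirroring the requirement that each row is a matched pair, while the stacked layout accepts groups of different sizes.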

Practical data preparation steps

  1. Start with full dataset in Excel
  2. Filter to relevant competition (e.g., league only)
  3. Filter to relevant team
  4. Add column for condition variable (e.g., “Home” / “Away”)
  5. Keep only the dependent variable column(s) needed
  6. Reorganize into appropriate format
  7. Remove all filters before importing to Jamovi
  8. Delete extraneous columns
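
For anyone who prefers to script the preparation, steps 2 to 5 could look roughly like the sketch below. The lecture does this by hand in Excel; the column names (`team`, `competition`, `home_team`, `possession`), the function `prepare_rows`, and the sample rows are all assumptions invented for illustration.

```python
def prepare_rows(rows, team, competition, variable):
    """Filter to one team and competition, label the venue, keep one variable."""
    prepared = []
    for row in rows:
        if row["competition"] != competition or row["team"] != team:
            continue  # steps 2 and 3: drop other competitions and teams
        venue = "Home" if row["home_team"] == team else "Away"  # step 4
        prepared.append({"venue": venue, variable: row[variable]})  # step 5
    return prepared


# Invented rows, for illustration only.
matches = [
    {"team": "Porto", "competition": "League", "home_team": "Porto", "possession": 61},
    {"team": "Porto", "competition": "Cup", "home_team": "Porto", "possession": 58},
    {"team": "Porto", "competition": "League", "home_team": "Braga", "possession": 54},
    {"team": "Braga", "competition": "League", "home_team": "Braga", "possession": 46},
]
for row in prepare_rows(matches, team="Porto", competition="League", variable="possession"):
    print(row)
```

Building a new list rather than hiding rows means nothing filtered is left behind, which is the same reason the manual workflow removes filters and deletes extra columns before import.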

Practical Exercise

15-minute timed exercise:

  • Task: Compare FC Porto’s ball possession at home vs. away (2023-24 Liga Portugal, 10 games each)
  • Dependent variable: Possession percentage
  • Independent variable: Match venue (home/away)
  • Analysis type: Same subject, two groups (paired)
  • Steps:
    1. Organize Excel data (filter Porto league matches, label home/away, extract possession)
    2. Import into Jamovi
    3. Run Shapiro-Wilk normality test on both conditions
    4. Select appropriate test based on normality results
    5. Determine if there is a statistically significant difference
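
As a rough cross-check for the parametric branch of this exercise, the paired samples t statistic can be computed by hand: take the home-minus-away difference for each matched pair, then divide the mean difference by its standard error. The sketch below uses only the Python standard library. The possession values are invented, not the real Porto data, and the function name `paired_t_statistic` is hypothetical. Jamovi reports the p-value directly, whereas this sketch compares t against the standard two-tailed critical value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired samples t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples need equal numbers of observations")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two matched pairs")
    # Mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Invented possession percentages for 10 home and 10 away games.
home = [62, 58, 65, 60, 57, 63, 61, 59, 64, 66]
away = [55, 57, 60, 52, 58, 56, 54, 59, 53, 57]

t, df = paired_t_statistic(home, away)
T_CRITICAL_DF9 = 2.262  # two-tailed critical value for alpha = 0.05, df = 9
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRITICAL_DF9}")
```

If the Shapiro-Wilk check fails for either condition, this calculation does not apply and the Wilcoxon signed-rank test should be used instead.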

Key Terminology

Term | Definition
Dependent variable | The variable being measured (e.g., possession, xG)
Independent variable | The condition or grouping factor (e.g., home/away)
Parametric data | Follows normal distribution (Shapiro-Wilk p > 0.05)
Non-parametric data | Does not follow normal distribution (Shapiro-Wilk p < 0.05)
Statistical significance | p < 0.05 on inferential test
Paired/Dependent | Comparing the same entity across conditions
Independent | Comparing different entities

Software and Tools

Tool | Purpose
Jamovi | Primary statistical software for all analyses
Microsoft Excel / Office 365 | Data preparation before importing to Jamovi
Moodle | Lecture recordings and materials

What Comes Next

  • Next lecture will cover effect size — measuring the magnitude of a difference, beyond just whether it is significant.
  • Five analysis aims to complete across practical exercises.