25 Years of the Big Mac Index

R
tidyverse
economics
ggplot2
data visualization
Visualizing global currency valuation through the lens of a hamburger
Published

November 18, 2025

Author note: I’m letting ClaudeAI take the wheel here (maybe because Anthropic has partnered with The Economist and this’ll be cute for them).

AI Collaboration Note

This blog post was created through a collaborative process with Claude (Anthropic). Sections explicitly generated by Claude are marked with 🤖. The code comparison section demonstrates different approaches to the same visualization problem—showing how human and AI problem-solving can complement each other.

The Big Mac Index: Economics Made Delicious

🤖 Section written by Claude

In September 1986, The Economist journalist Pam Woodall introduced the Big Mac Index as a semi-humorous illustration of Purchasing Power Parity (PPP). The theory behind it—PPP—suggests that exchange rates should adjust so that identical goods cost the same in different countries. Since McDonald’s Big Macs are sold in nearly identical form across 50+ countries, they make for a surprisingly useful economic indicator.

Why a Big Mac?

McDonald’s Big Mac was chosen because of the prevalence of the fast food chain worldwide, and because the sandwich remains largely the same across all countries. The Big Mac Index works because:

  • Standardization: The recipe is consistent globally
  • Local Production: Big Macs contain locally-sourced ingredients and the price reflects many local economic factors, such as ingredient prices, local wages, and advertising costs
  • Ubiquity: McDonald’s operates in most major economies
  • Simplicity: Easy to understand and compare—the index gave rise to the term “burgernomics”

Although it was never intended as a rigorous tool for exchange-rate evaluation, the Big Mac Index is now globally recognized: economists widely cite it as a reasonable real-world gauge of purchasing power parity, and it features in many academic textbooks and reports. The Economist has published the index regularly since 1986, creating a rich dataset for analysis.
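The arithmetic behind "burgernomics" is simple enough to sketch in a few lines of R. The implied PPP exchange rate is the local Big Mac price divided by the US price; comparing it with the market rate gives the over- or under-valuation. The numbers below are illustrative, not actual index values:

```r
# Illustrative numbers, not actual index values
local_price <- 75.0   # Big Mac price in local currency units
us_price    <- 5.69   # Big Mac price in USD
actual_fx   <- 16.0   # market exchange rate (local units per USD)

implied_ppp <- local_price / us_price        # rate that would equalize prices
valuation   <- implied_ppp / actual_fx - 1   # < 0 means the currency looks undervalued

round(implied_ppp, 2)  # 13.18
round(valuation, 3)    # -0.176, i.e. about 17.6% undervalued
```

This is the same calculation behind the over/under-valuation columns in the full index dataset.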

Data Source

🤖 Section written by Claude

The data comes from The Economist’s GitHub repository, where they maintain historical Big Mac prices and calculated currency valuations. The dataset includes:

  • Raw prices: Local currency and USD-converted prices
  • Full index: Includes over/under-valuation calculations against the US dollar
  • Time span: From April 2000 to January 2025 in this analysis

Original Approach: Unedited Code with Commentary

Below is the original code exactly as written, with my commentary on the thought process and decisions.

Code
library(tidyverse)
library(ggtext)
library(extrafont)
library(plotly)

fonts()

🤖 Claude’s Commentary: This setup is clean and minimal. Loading plotly suggests the author may have experimented with interactive versions (though it’s not used in the final plot). The fonts() call displays available fonts—useful during development but could be removed for production.

Code
full <- read_csv("data/big-mac-full-index.csv")
raw <- read_csv("data/big-mac-raw-index.csv")

🤖 Claude’s Commentary: Both datasets are loaded, though only full is used. This is common during exploratory analysis—keeping raw available in case you need to cross-reference or pivot approaches.

Code
delta <- 
  full |> 
  filter(date %in% as.Date(c('2000-04-01', '2025-01-01'))) |> 
  select(date, name, dollar_price) |> 
  pivot_wider(names_from = date,
              values_from = dollar_price,
              names_repair = 'minimal') |> 
  select(name, before = `2000-04-01`, after = `2025-01-01`) |> 
  filter(!is.na(before), !is.na(after)) |> 
  arrange(name) 

percent_change <-
  delta |> 
  mutate(pct_change = round(((after-before)/before), digits = 2),
         pct_label = scales::percent(pct_change)) |> 
  arrange(desc(pct_change))

🤖 Claude’s Commentary: This is a smart approach to calculating change over time. The use of pivot_wider() is particularly elegant—it transforms dates into columns, making it easy to calculate the difference. The names_repair = 'minimal' argument keeps the raw date strings as column names; since they are not syntactic R names, they must be wrapped in backticks (as in `2000-04-01`) when selected. Creating both numeric (pct_change) and formatted (pct_label) versions shows good data hygiene—you keep the raw numbers for potential calculations while having display-ready labels.

Potential consideration: This approach filters to countries present in both time periods. Countries that started reporting after 2000 or stopped before 2025 are excluded, which is the right call for a “change over 25 years” metric.
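To make the pivot_wider() trick concrete, here is a minimal, self-contained sketch on toy data (the country names and prices are made up for illustration):

```r
library(tidyverse)

toy <- tibble(
  date = as.Date(rep(c("2000-04-01", "2025-01-01"), each = 2)),
  name = rep(c("Atlantis", "Erewhon"), times = 2),  # fictional countries
  dollar_price = c(2.50, 1.80, 7.00, 4.50)
)

wide <- toy |>
  pivot_wider(names_from = date, values_from = dollar_price,
              names_repair = "minimal") |>
  # Backticks are required: the raw date strings are not syntactic R names
  mutate(pct_change = (`2025-01-01` - `2000-04-01`) / `2000-04-01`)

# Atlantis: (7.00 - 2.50) / 2.50 = 1.8, a 180% increase
```

Each date becomes a column, so the change over the full period is an ordinary column-to-column calculation.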

Code
destaca <- 
  full |> 
  filter(date %in% as.Date('2025-01-01'), !is.na(dollar_price)) |> 
  arrange(date, desc(dollar_price)) %>% 
  .[c(1:5, 50:54), ]

labels <- 
  destaca |> 
  transmute(label = paste0(name, ", $", round(dollar_price, digits = 2)))

🤖 Claude’s Commentary: This is where things get interesting. The use of .[c(1:5, 50:54), ] is base R subsetting—grabbing rows 1-5 (most expensive) and 50-54 (least expensive). This works but is brittle: if the dataset has fewer than 54 rows, out-of-range indices silently return NA-filled rows rather than erroring. Also, the specific row indices assume a certain ordering.

Design insight: The author is making an aesthetic choice here—exactly 10 countries highlighted, split between extremes. This creates visual balance.

The transmute() for labels is efficient—it’s like mutate() but keeps only the new column. The formatted labels are ready-to-use for annotation.
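A more defensive alternative, sketched here on toy data: slice_max() and slice_min() take the top and bottom n rows by a variable, so the selection adapts to however many countries report in a given year (the country names below are placeholders):

```r
library(tidyverse)

# Toy stand-in for the filtered 2025 prices; names are fictional
prices_2025 <- tibble(
  name = paste0("Country", 1:12),
  dollar_price = c(8.1, 7.5, 7.2, 6.8, 6.1, 5.0, 4.2, 3.3, 2.9, 2.5, 2.2, 2.0)
)

destaca <- bind_rows(
  slice_max(prices_2025, dollar_price, n = 5),  # five most expensive
  slice_min(prices_2025, dollar_price, n = 5)   # five least expensive
)

nrow(destaca)  # 10, regardless of how many countries are in the data
```

Unlike hardcoded row indices, this keeps working if countries are added to or dropped from the index.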

Code
full |> 
  mutate(check = if_else(is.na(dollar_price) & date == as.Date('2025-01-01'), 'na', 'good'),
         hl = if_else(name %in% destaca$name, "yes", "no")) |> 
  filter(check == 'good') |> 
  select(date, name, dollar_price, hl) |> 
  ggplot(aes(x = date, y = dollar_price, group = name)) +
  geom_line(aes(color = hl),
            show.legend = FALSE) +
  labs(title = '25 years of the Big Mac Index from *The Economist*',
       subtitle = 'Most and Least Expensive Countries to by a Big Mac, USD')+
  scale_x_date(date_labels = '%Y',
               limits = as.Date(c('1999-06-01', '2034-01-01')),
               expand = c(0, 0)) +
  scale_y_continuous(breaks = c(0:8)) +
  scale_color_manual(values = c("#FFCC00", "#05007B", "black")) +
  annotate('text',
           x = as.Date('2033-04-09'),
           y = 0.3:8.3, 
           color = "#FFCC00",
           label = paste0("$", 0:8),
           size = 6,
           family = 'Cambria') +
  annotate("text",
           x = as.Date("2025-02-28"),
           y = c(8, 7.35, 7, 6.6, 5.95, 3.4, 3.05, 2.7, 2.35, 2),
           label = labels$label,
           family = 'Cambria',
           size = 4.5,
           color = "#05007B",
           hjust = 0) +
  theme_minimal() +
  theme(
    axis.ticks.y = element_blank(),
    axis.text.y = element_blank(),
    axis.text.x = element_text(color = "#FFCC00", size = 10),
    panel.grid = element_blank(),
    text = element_text(family = "Cambria", face = "bold", color = "#FFCC00"),
    plot.background = element_rect(fill = "#DA291C", color = "#DA291C"),
    panel.background = element_rect(fill = "#DA291C", color = "#DA291C"),
    axis.title = element_blank(),
    plot.title = element_markdown(),
  ) 

🤖 Claude’s Commentary on this approach:

What works really well:

  1. The filtering logic: The check variable flags rows where the 2025 price is missing and drops them. Note that it removes only those NA rows, so countries without a 2025 price still contribute their earlier history to the background lines
  2. Color palette: scale_color_manual() supplies three colors, though hl has only two levels ("no"/"yes"), so only the first two are mapped to lines and the third (black) goes unused; together with the dark-blue labels, the palette still creates a clear visual hierarchy
  3. Manual annotation positioning: Those y-coordinates (c(8, 7.35, 7, 6.6, 5.95...)) were clearly hand-tuned to avoid overlaps—this is often the only way to get perfect label placement
  4. Extended x-axis: The limits = as.Date(c('1999-06-01', '2034-01-01')) is clever—it extends well into the future to create space for labels on the right
  5. McDonald’s branding: The color scheme (#DA291C red, #FFCC00 yellow, #05007B dark blue) immediately evokes the brand

Potential improvements:

  1. The y-coordinates for labels are hardcoded: If the data changes or different countries are highlighted, these positions would need manual adjustment
  2. Typo in subtitle: “by a Big Mac” should be “buy a Big Mac”
  3. The x-position for y-axis labels ('2033-04-09') seems arbitrary—unclear why this specific date
  4. No legend: While show.legend = FALSE cleans up the plot, there’s no legend explaining what the colored lines mean (though it’s somewhat obvious from context)

Overall assessment: This is a strong visualization with intentional design choices. The manual positioning shows attention to detail, and the McDonald’s color scheme is a sophisticated branding choice. The code is readable and the pipeline logic is clear. The brittleness around hardcoded positions is the main weakness—but sometimes that’s necessary for publication-quality plots.
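One way to address the hardcoded label positions, sketched on toy data (this assumes the ggrepel package, which the original post does not use): geom_text_repel() nudges overlapping labels apart automatically, so the y-coordinates no longer need hand-tuning when the highlighted countries change.

```r
library(ggplot2)
library(ggrepel)

# Deliberately crowded label positions; names and prices are illustrative
crowded <- data.frame(
  x     = as.Date("2025-01-01"),
  y     = c(8.0, 7.9, 7.8, 2.1, 2.0),
  label = c("A, $8.00", "B, $7.90", "C, $7.80", "D, $2.10", "E, $2.00")
)

p <- ggplot(crowded, aes(x, y, label = label)) +
  geom_point() +
  # Repel vertically only, pushing labels to the right of the points
  geom_text_repel(direction = "y", hjust = 0, nudge_x = 90)

p  # labels spread out vertically instead of overlapping
```

The tradeoff is less control over exact placement, which is why hand-tuning can still win for a one-off publication plot.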

Alternative Approach: Claude’s Take

🤖 This entire section written and coded by Claude

Here’s how I would approach the same visualization, with a focus on making the code more robust and self-documenting:

Code
library(tidyverse)
library(ggtext)
library(scales)

# Read data
full <- read_csv("data/big-mac-full-index.csv", show_col_types = FALSE)
Code
# Filter to complete time series (countries with 2025 data)
countries_with_2025 <- full |> 
  filter(date == as.Date('2025-01-01'), !is.na(dollar_price)) |> 
  pull(name)

plot_data <- full |> 
  filter(name %in% countries_with_2025,
         date >= as.Date('2000-01-01')) |> 
  select(date, name, dollar_price)

# Identify extreme countries more robustly
extremes_2025 <- plot_data |> 
  filter(date == max(date)) |> 
  arrange(desc(dollar_price)) |> 
  slice(c(1:5, (n()-4):n())) |>  # Top 5 and bottom 5
  mutate(
    label = paste0(name, ", ", dollar(dollar_price, accuracy = 0.01)),
    # Calculate label positions with spacing
    y_pos = dollar_price,
    x_pos = max(plot_data$date) + days(60)
  )

# Add highlight flag to plot data
plot_data <- plot_data |> 
  mutate(
    highlight = case_when(
      name %in% extremes_2025$name ~ "labeled",
      TRUE ~ "background"
    )
  )

🤖 My reasoning:

  • Rather than hardcoding row indices, I use slice(c(1:5, (n()-4):n())) which adapts to any dataset size
  • I calculate label positions relative to the data (max(date) + days(60)) rather than hardcoding dates
  • I use dollar() from scales for consistent currency formatting
  • I create the highlight flag more explicitly with case_when()
Code
# Define McDonald's colors
mc_red <- "#DA291C"
mc_yellow <- "#FFCC00"
mc_blue <- "#27251F"  # Using a dark neutral instead of dark blue

ggplot(plot_data, aes(x = date, y = dollar_price, group = name)) +
  # Background lines for all countries
  geom_line(
    data = filter(plot_data, highlight == "background"),
    color = alpha("black", 0.3),
    linewidth = 0.3
  ) +
  # Highlighted lines for labeled countries
  geom_line(
    data = filter(plot_data, highlight == "labeled"),
    color = mc_yellow,
    linewidth = 0.8
  ) +
  # Country labels
  geom_text(
    data = extremes_2025,
    aes(x = x_pos, y = y_pos, label = label),
    hjust = 0,
    size = 4,
    color = mc_yellow,
    fontface = "bold",
    inherit.aes = FALSE
  ) +
  # Y-axis price labels on the right
  scale_y_continuous(
    breaks = 0:8,
    limits = c(0, 8.5),
    sec.axis = sec_axis(
      ~ .,
      breaks = 0:8,
      labels = dollar_format()(0:8)
    )
  ) +
  scale_x_date(
    date_labels = "%Y",
    expand = expansion(mult = c(0.02, 0.25))  # More room on right for labels
  ) +
  labs(
    title = "25 years of the Big Mac Index from *The Economist*",
    subtitle = "Most and least expensive countries to **buy** a Big Mac, USD",
    caption = "Data: The Economist Big Mac Index | Visualization: Claude"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    # McDonald's brand colors
    plot.background = element_rect(fill = mc_red, color = NA),
    panel.background = element_rect(fill = mc_red, color = NA),
    
    # Text styling
    text = element_text(color = mc_yellow, face = "bold"),
    plot.title = element_markdown(size = 16, margin = margin(b = 5)),
    plot.subtitle = element_markdown(size = 11, margin = margin(b = 15)),
    plot.caption = element_text(size = 8, hjust = 0),
    
    # Axes
    axis.text = element_text(color = mc_yellow),
    axis.text.y.right = element_text(color = mc_yellow, face = "bold"),
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    
    # Remove gridlines
    panel.grid = element_blank(),
    
    # Margins
    plot.margin = margin(20, 20, 20, 20)
  )

🤖 Key differences in my approach:

  1. Separate geoms for background vs. highlighted lines: This gives me more control over their appearance (alpha, linewidth)
  2. Used sec.axis(): Creates a secondary axis for y-axis labels instead of manual annotation—this automatically positions them
  3. Dynamic positioning: All positions are calculated from the data, not hardcoded
  4. Used expansion(): Modern ggplot2 way to control axis padding
  5. Added caption: Credits the data source and visualization creator
  6. Fixed typo: “buy” instead of “by”
  7. Used alpha(): For transparency on background lines

Tradeoffs:

  • Original approach: More control over exact positions but requires manual tweaking
  • My approach: More automated but might not position labels perfectly without adjustment

Neither approach is definitively “better”—it depends on whether you value precision (manual positioning) or maintainability (calculated positioning).

Synthesis: Different Problem-Solving Styles

The comparison above illustrates two valid approaches to data visualization:

Human approach (original):

  • Iterative and artistic
  • Manual fine-tuning for pixel-perfect results
  • Quick, pragmatic solutions (base R subsetting, hardcoded dates)
  • Relies on visual inspection and adjustment

LLM approach (Claude’s version):

  • Systematic and programmatic
  • Aims for robustness and reusability
  • More verbose but self-documenting
  • Prioritizes reproducibility

Best practice: Combine both approaches! Use automated methods for the initial layout, then add manual tweaks for publication quality. The original code’s hand-tuned label positions are exactly what you’d do after my automated approach got you 90% there.

Design Insights

Both versions share these strong design choices:

  1. McDonald’s branding: The red-yellow color scheme immediately connects to the subject matter
  2. Minimalist axes: Removing gridlines lets the data shine
  3. Strategic highlighting: Only labeling extremes prevents visual clutter
  4. Extended canvas: Creating space for labels outside the data region
  5. Bold typography: Ensures readability against the colored background

Saving the Plot

Code
# With no plot argument, ggsave() saves the most recently displayed plot
ggsave("big_mac_2025.png",
       width = 7,
       height = 6,
       units = "in",
       dpi = 300)

Sources


On Using LLMs for Data Analysis Blogging

🤖 Reflective evaluation by Claude

What worked well in this collaboration:

Content generation: LLMs excel at writing contextual introductions, explaining technical concepts, and providing historical background. The sections on PPP theory and the Big Mac Index’s history add educational value that might be tedious to research and write manually.

Code commentary: Having an LLM review and comment on code provides a “pair programming” experience. It can spot potential issues (like hardcoded indices), explain design decisions, and suggest alternatives—all valuable for readers learning data visualization.

Alternative implementations: Showing two approaches to the same problem (human intuition vs. algorithmic thinking) creates richer learning material than either approach alone.

Limitations and cautions:

Accuracy verification needed: The historical information about the Big Mac Index should be verified against primary sources. LLMs can hallucinate dates, names, or details. (Note: I did search for and cite sources, but independent verification is always good practice.)

Code execution: I cannot run R code or verify that the plots actually render correctly. The “Claude’s approach” code is untested and might have bugs or produce unexpected results. Always run and verify LLM-generated code.

Subjective judgments: My “commentary” on the original code reflects general programming principles, but there’s no objectively “right” way to write this analysis. Your approach may be better for your specific context.

Loss of authentic voice: Over-relying on LLM-generated prose can make your blog sound generic. The original code and your design decisions are more valuable than any prose I can generate.

The real value:

This collaboration model works because it combines human creativity and domain expertise with LLM efficiency at research and documentation. You brought the interesting visualization idea and aesthetic sensibility. I helped structure the narrative and provide alternative perspectives. Neither alone would produce this exact result.

The comparison between two coding approaches is particularly valuable—it gives readers insight into the decision-making process, not just the final solution. This “showing your work” approach is pedagogically stronger than a single “correct” implementation.

Final recommendation: Use LLMs as collaborative tools for your blog, not ghostwriters. Your unique perspective and visualization skills are the core value. LLM assistance with context, explanation, and code review enhances that value without replacing it.


Original code and visualization by [Your Name] | AI collaboration with Claude (Anthropic) | Generated 2025-12-19