
Data & Code

This page is a small hub for datasets, helper functions, and relics of scripts that turned out to be useful more than once (even more so when your favourite AI assistant is offline). Most things here are lightweight: a mix of Python, Stata, and a few project-specific utilities.

Feel free to use anything that helps. If you see something broken, strange, or improvable, send me a note or ping me on GitHub.

API access

Minimal working examples for the data sources I return to most. Each entry shows the auth requirements upfront.

Utilities

Small reusable pieces I keep reaching for across projects.

Country identifiers

Maps between ISO3, ISO2, UN M49, and country names — the glue layer every multi-source merge eventually needs.

Python · country lookup (ISO3 ↔ name ↔ UN numeric)

Quick mapping between ISO codes and country names.

country_identifier.py
```python
import pandas as pd
import pycountry

# ----- dummy input -----
df = pd.DataFrame({"ISO3": ["HKG", "GBR", "CHE", "ETH"]})

# ----- helper class -----
class CountryCodeConverter:
    """Two-way lookup between ISO3 codes and zero-padded UN numeric codes."""

    def __init__(self):
        self.alpha3_to_num = {}
        self.num_to_alpha3 = {}
        for c in pycountry.countries:
            if hasattr(c, "alpha_3") and hasattr(c, "numeric"):
                iso3 = c.alpha_3
                num = c.numeric.zfill(3)
                self.alpha3_to_num[iso3] = num
                self.num_to_alpha3[num] = iso3

    def to_un_numeric(self, iso3):
        return self.alpha3_to_num.get(str(iso3).upper())

cc = CountryCodeConverter()

def lookup(iso3, field):
    """Return one pycountry field for an ISO3 code, or None if the code is unknown."""
    country = pycountry.countries.get(alpha_3=iso3)
    return getattr(country, field) if country else None

# ----- generate complementary identifiers -----
df["ISO2"] = df["ISO3"].apply(lambda x: lookup(x, "alpha_2"))
df["UN_numeric"] = df["ISO3"].apply(cc.to_un_numeric)
df["country"] = df["ISO3"].apply(lambda x: lookup(x, "name"))
```
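One thing that tends to bite when merging on UN numeric codes: sources often ship them as integers or floats, so the leading zeros are lost (Afghanistan's 004 arrives as 4 or 4.0) and no longer match the zero-padded strings the converter stores. Below is a minimal sketch of a normaliser. `normalise_m49` is a hypothetical helper name, not part of pycountry, and it only checks the shape of the code, not whether the code is actually assigned to a country.

```python
import math

def normalise_m49(value):
    """Return a UN M49 code as a zero-padded 3-character string, or None.

    Hypothetical helper: accepts ints, whole-number floats, and digit strings.
    """
    if value is None:
        return None
    if isinstance(value, float):
        # NaN (missing) and fractional values cannot be valid codes
        if math.isnan(value) or not value.is_integer():
            return None
        value = int(value)
    text = str(value).strip()
    # Only plain ASCII digits, at most three of them, count as a code
    if not (text.isascii() and text.isdigit()) or len(text) > 3:
        return None
    return text.zfill(3)

# Usage:
print(normalise_m49(4), normalise_m49(756.0), normalise_m49("8"))
```

On a DataFrame this could be applied with something like `df["UN_numeric"].apply(normalise_m49)` before looking codes up in `num_to_alpha3`.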

Datetime conversion

For reshaping quarterly/annual series, building year–quarter indices, and fixing date formats that should have been standard but weren't.

Python · datetime parsing & quarters

Convert dates into year, month, and quarter (using crisis dates as an example).

datetime.py
```python
import pandas as pd

# Note: save this under a name other than datetime.py when running it locally;
# a file with that name shadows Python's standard-library datetime module.

# ----- sample df -----
df = pd.DataFrame({
    "label": ["Dot-com", "GFC", "COVID"],
    "date_str": ["2000-03-10", "2008-09-15", "2020-03-11"],
})

# ----- convert to datetime -----
df["date"] = pd.to_datetime(df["date_str"], format="%Y-%m-%d")

# ----- extract components -----
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["quarter"] = df["date"].dt.quarter
df["year_quarter"] = df["year"].astype(str) + "Q" + df["quarter"].astype(str)
```
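Going the other way is handy when reshaping quarterly series: given a label in the `2008Q3` form built above, recover the first and last calendar day of that quarter. Below is a minimal standard-library sketch. `quarter_bounds` is a hypothetical helper name, and it assumes labels follow exactly the `YYYYQn` pattern.

```python
import re
from datetime import date, timedelta

# Matches labels such as "2008Q3" (four-digit year, quarter 1 to 4)
_LABEL = re.compile(r"(\d{4})Q([1-4])")

def quarter_bounds(label):
    """Return (first_day, last_day) of the quarter named by a 'YYYYQn' label."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a year-quarter label: {label!r}")
    year, quarter = int(match.group(1)), int(match.group(2))
    start = date(year, 3 * (quarter - 1) + 1, 1)
    # The last day is one day before the next quarter starts
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, 3 * quarter + 1, 1)
    return start, next_start - timedelta(days=1)

# Usage:
print(quarter_bounds("2008Q3"))
```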

File export

A simple wrapper to export DataFrames with a consistent datestamp in the filename — useful for sharing outputs without overwriting previous runs. The Stata version additionally handles column-name truncation and Latin-1 encoding issues that can silently corrupt .dta exports from Python.

Python · file export in Excel & Stata formats

Assume df is the DataFrame to be exported.

export_to_excel.py
```python
import pandas as pd
from datetime import datetime

def output_to_excel(df, filename):
    """Write df to ./<filename>_<datestamp>.xlsx (needs an Excel engine such as openpyxl)."""
    # %d%b%Y gives something like "01Jan2026";
    # use %Y%m%d for "20260101" if you prefer filenames that sort chronologically
    today = datetime.now().strftime("%d%b%Y")
    output = f"./{filename}_{today}.xlsx"
    df.to_excel(output, index=False)
    print(f"Excel file saved: {output}")

# Usage:
output_to_excel(df, "Filename")
```
export_to_stata.py
```python
import pandas as pd
from datetime import datetime

def _to_latin1(value):
    """Return value as Latin-1-safe text; missing values become "" (Stata's missing string)."""
    if pd.isna(value):
        return ""
    # Characters that Latin-1 cannot encode are replaced with '?'
    return str(value).encode("latin-1", errors="replace").decode("latin-1")

def output_to_stata(df, filename):
    # %d%b%Y gives something like "01Jan2026";
    # use %Y%m%d for "20260101" if you prefer filenames that sort chronologically
    today = datetime.now().strftime("%d%b%Y")
    df = df.copy()  # avoid mutating the caller's DataFrame

    # Clean column names: swap hyphens and spaces for underscores, drop other
    # invalid characters, truncate to 32 characters (Stata's limit)
    df.columns = (
        df.columns
        .str.replace("-", "_", regex=False)
        .str.replace(" ", "_", regex=False)
        .str.replace(r"[^0-9a-zA-Z_]", "", regex=True)
        .str[:32]
    )

    # Make string columns Latin-1 compatible. Going through _to_latin1 rather
    # than astype(str) keeps missing values from turning into the text "nan".
    for col in df.select_dtypes(include=[object]):
        df[col] = df[col].apply(_to_latin1)

    output = f"./{filename}_{today}.dta"
    df.to_stata(output, write_index=False, version=117)
    print(f"Stata file saved: {output}")

# Usage:
output_to_stata(df, "Filename")
```
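A caveat on the truncation step: two long column names that share their first 32 characters collapse into the same name once truncated, and Stata requires variable names to be unique. Below is a minimal sketch of one way to guard against that, using only the standard library. `stata_safe_names` is a hypothetical helper, and the numeric-suffix scheme and the leading-underscore rule for names starting with a digit are my assumptions, not something the export function above does.

```python
import re

STATA_MAX_NAME = 32  # maximum length of a Stata variable name

def stata_safe_names(names):
    """Clean, truncate, and de-duplicate column names for a Stata export."""
    seen = set()
    result = []
    for raw in names:
        name = re.sub(r"[-\s]", "_", str(raw))      # hyphens/whitespace -> underscore
        name = re.sub(r"[^0-9a-zA-Z_]", "", name)   # drop everything else invalid
        if not name or name[0].isdigit():
            name = "_" + name                       # names cannot start with a digit
        name = name[:STATA_MAX_NAME]
        candidate, n = name, 1
        while candidate in seen:
            # Make room for the suffix so the result still fits in 32 characters
            suffix = f"_{n}"
            candidate = name[:STATA_MAX_NAME - len(suffix)] + suffix
            n += 1
        seen.add(candidate)
        result.append(candidate)
    return result

# Usage:
print(stata_safe_names(["GDP per capita", "gdp-growth (%)", "2020"]))
```

With a DataFrame, the result could be assigned back via `df.columns = stata_safe_names(df.columns)` before calling `to_stata`.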

Visualisation (cool graph(s))

Chart templates built on top of the API sources above.

Country ball plot

Country-year scatter using flag images as markers for the top economies per year. Two variants: flag version for presentations, ISO3-label version for publications and offline use. Data pulled live from IMF WEO — no local file needed. View the full guide →

Figure: African GDP per capita country ball plot (preview)
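The full guide covers the plotting itself. The data-prep step behind "top economies per year" can be sketched roughly as below, assuming the data has already been reduced to (year, code, value) rows. `top_n_per_year` is a hypothetical helper, and the codes and numbers in the usage example are made-up placeholders, not IMF WEO figures.

```python
from collections import defaultdict

def top_n_per_year(rows, n):
    """Map each year to the codes of its n largest values, largest first.

    rows: iterable of (year, code, value) tuples.
    """
    by_year = defaultdict(list)
    for year, code, value in rows:
        by_year[year].append((code, value))
    return {
        year: [code for code, _ in
               sorted(items, key=lambda item: item[1], reverse=True)[:n]]
        for year, items in sorted(by_year.items())
    }

# Usage with placeholder codes and made-up values (not real data):
rows = [
    (2000, "AAA", 3.0), (2000, "BBB", 5.0), (2000, "CCC", 1.0),
    (2001, "AAA", 4.0), (2001, "BBB", 2.0), (2001, "CCC", 6.0),
]
print(top_n_per_year(rows, 2))
```

Only the selected codes then need a marker (flag image or ISO3 label) in each year's scatter.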