API for UN Comtrade bulk data

21 Mar 2024: Found a Comtrade GitHub page with their official API guideline and code to use their new platform. They have explained better than I could: link here

21 Sep 2023: Comtrade decided to make an upgrade that makes the platform somehow LESS user-friendly... This API guide is not compatible to the current Comtrade platform.

UN Comtrade Database is one of the best source when it comes to bilateral trade data by product code. As of early 2022, it covers more country-year observations than WTO and ITC. This blogpost aims to guide you through step-by-step to fetch large amount of data from Comtrade via Stata-Python function.

*I used to be a Python dummy (to a certain extent, still am), it was my buddy, Satyam Anand, at Graduate Institute (now at George Town) introduced the API magic to me. All credit goes to him.

Too long didn’t read: jump to the final do.file here

Step 0 - Preamble: Is your Stata ready for this?

1) Check if Python is already integrated with your Stata, you can simply type “python search” in the Stata command box. If there is no Python installation found, follow this

. python search
----------------------------------------------------------------------------------------------------------------------
 Python environments found:  
 /usr/local/bin/python3
----------------------------------------------------------------------------------------------------------------------

2) Check if you have the necessary python packages for the task. Follow this guide to install the missing package(s)

. python which json
<module 'json' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py
> '>
. python which numpy
<module 'numpy' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/n
> umpy/__init__.py'>

. python which pandas
<module 'pandas' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/
> pandas/__init__.py'>

. python which requests
<module 'requests' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package
> s/requests/__init__.py'>

Step 1 - Identify your data of interest

As a working example, let’s say we want to get:

  • Data: country i export to China, for all countries in the world

  • Frequency: Annual

  • Time period: 2000 to 2021

  • Classification: All 2-digit SITC Rev. 2

Once, you put the according info on Comtrade - Get data you should see the below page with “View API call” in the bottom

! The requested data is therefore accessible via: http://comtrade.un.org/api/get?max=1000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2

Note that, I changed max=502 to max=1000, I will explain below what does that mean and why.

Step 2 - Understand the API call

/api/get?max=502&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2 (What are we looking at ??)

API code Range Category In simple words
max=502 [1, 10000] Limit Download limits on number of observations, guest limit is 10000
type=C [C,S] Data type C is Commodities (merchandise trade data); S is Services (trade in services data)
freq=A [A,M] Obs. frequency A is annual frequency; M is monthly
px=S2 S[i], H[i], i = [1,4] Classification system S2 is SITC Revision 2 ; S[i] is SITC Revision i ; H0 is HS 1992 (oldest), H4 is HS 2012 (relatively latest)
ps=2021 [1962,2021] Time period 2021 is year 2021; it is possible to do up to 5 years per request, but it might run into the download limit, hence I will do it by year here
r=all All, UN codes Reporting area all is all countries (Not aggregated)
p=156 All, UN codes Partner(s) 156 is China’s UN numerical country code
rg=2 [1,2,all] Trade flow 2 is export from reporting area to partner, 1 is import, all is both (not recommended, because download ceiling AGAIN!)
cc=AG2 [TOTAL, AG1, AG2, AG3, AG4, AG5, AG6, ALL] Classification code AG2 is all of the 2-digit codes, AG6 is the most detailed option that are internationally comparable.

See more in SITC Revision 2, UN numerical country code and API documentation

Some of you might already notice the pattern:
/api/get?[API code 1]&[API code 2]&….
while each block of [API code] is in a format of
[x] = [choice]

We will make use of this pattern for the Python code below.

! Now, recall our objective is to fetch all bilateral export to China from 2000 to 2021, that means the above API setting is applicable to all years once we introduce a simple loop on ps = [2000, 2001, 2002, …, 2021]

Step 3 - Check the API link

We have this link generated by the click and select interface from UN Comtrade
http://comtrade.un.org/api/get?max=1000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2

We better make sure the API link is valid before put it in work

  1. Install JSON formatter on Chrome extension

  2. Copy and paste the API link

  3. You should see the below:

! Note that all our data of interest begins from

“dataset” : (red box)

Step 4 - Test version

Jump to code

1) Open do-file, set up the destination folder

local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours
*country i export to china  
cd `filepath'/output/i_X_CHN

2) Initiate python in Stata do-file

python: 

[python code]

end

3) [python code part] Add necessary packages

import json
import numpy        as  np
import pandas       as  pd
import requests

4) Copy and paste the valid API link we have AND fetch the JSON file

url       = f'http://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2'

result    = requests.get(url).json()

5) Recall that we found our data of interest from “dataset” : in Step 3

*Indent is essential

if 'dataset' in result: 
	df        = pd.DataFrame(result['dataset'])
	df        = df.replace({None: np.nan})
	df.columns= [i[:32] for i in df.columns]

6) Store the data in Stata format (.dta)

        df.to_stata(f'i_X_China_2021.dta')

end 
*exiting python environment

7) Check your work

use ./i_X_China_2021, clear

^All countries export to China in year 2021

Test Version:

local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours
*country i export to china  
cd `filepath'/output/i_X_CHN

python:
import json
import numpy        as  np
import pandas       as  pd
import requests

url       = f'http://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2'

result    = requests.get(url).json()
if 'dataset' in result: 
	df        = pd.DataFrame(result['dataset'])
	df        = df.replace({None: np.nan})
	df.columns= [i[:32] for i in df.columns]

	df.to_stata(f'i_X_China_2021.dta')

end

use ./i_X_China_2021, clear

Step 5 - Loop it through years

Now that we have a single-year do-file up and running, we can easily extend it with a loop from 2000 to 2021 and storing each year as one .dta file.

1) Compartmentalize the API link

def Comtrade_Scraper   (ps: int,
                       type: str=   'C',
                       freq: str=   'A',
                       px  : str=  'S2',
                       r   : str= 'all',
                       p   : int=     156,
                       rg  : int=     2,
                       cc  : str= 'AG2'):

Note that ps is the only compartment without an equal sign as it is our only varying part and it is set to be loops by the later command

2) Define the ps (refers to year) being the varying input

    """
    Wrapper for creating URLs to access the Comtrade API

    ARGUMENTS
    *********
    Required
    ps   = year
    """

3) Assemble the API link

    base      = 'https://comtrade.un.org/api/get?max=10000'
    url       = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'

4) JSON to .dta file

a little twist here is that now we name the file as i_X_China_{ps}.dta where {ps} refers to the year [2000, 2021]

    result    = requests.get(url).json()
    if 'dataset' in result: 
        df        = pd.DataFrame(result['dataset'])
        df        = df.replace({None: np.nan})
        df.columns= [i[:32] for i in df.columns]

        df.to_stata(f'i_X_China_{ps}.dta')
        return df

Note that I also added return df, it is just a small trick to return the dataframe right before converting it into .dta file. The practical use is that you can keep an eye on which year has ran and quickly vet if there is any issue from the no. of obs. (red box) and/or no. of columns (blue box).

5) Introduce the loop

Python iterates with increments by 1 and stops before the ending number. In other words, we have to put (2000, 2022) for our period of interest year 2000 to 2021.

for i in range(2000,2022): Comtrade_Scraper(i)

6) Assemble all

Final do-file:

local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours
*country i export to china  
cd `filepath'/output/i_X_CHN

python:
import json
import numpy        as  np
import pandas       as  pd
import requests

def Comtrade_Scraper   (ps: int,
                       type: str=   'C',
                       freq: str=   'A',
                       px  : str=  'S2',
                       r   : str= 'all',
                       p   : int=     156,
                       rg  : int=     2,
                       cc  : str= 'AG2'):
    """
    Wrapper for creating URLs to access the Comtrade API

    ARGUMENTS
    *********
    Required
    ps   = year
    """
    base      = 'https://comtrade.un.org/api/get?max=10000'
    url       = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'

    result    = requests.get(url).json()
    if 'dataset' in result: 
        df        = pd.DataFrame(result['dataset'])
        df        = df.replace({None: np.nan})
        df.columns= [i[:32] for i in df.columns]

        df.to_stata(f'i_X_China_{ps}.dta')
        return df

for i in range(2000,2022): Comtrade_Scraper(i)
end

*Explains below in 8) Final touch
di "`c(pwd)'" // Display path to current folder
local files : dir "`c(pwd)'" files "*.dta" 
foreach x of local files {
    di "`x'" // Display file name
	append using `x'
}
save i_X_China_2000_2021.dta, replace

7) Result

8) Final touch

Appending all the by-year files into long format

di "`c(pwd)'" // Display path to current folder
local files : dir "`c(pwd)'" files "*.dta" 
foreach x of local files {
    di "`x'" // Display file name
	append using `x'
}
save i_X_China_2000_2021.dta, replace

Variant version:

Some final notes:

An obvious shortcoming is that the download data limit restricting researchers access to the most granular data possible (6-digit product code). Some organisation has premium site license subscription, e.g. Geneva Graduate Institute. By connecting via their IP addresses, I could download up to 100’000 obs. per request, which is sufficient to cover all country i-to-country j by 4-digit product code (75’803 country-pair-product obs. for year 2019).

To go further from there, you will need to find an authentication token/ valid account at Comtrade from powerful friend(s), colleague(s) or supervisor(s).

Long page done and you have made it! Hope you all enjoy this guide! Any questions / suggestion, you can find me here.

More resources:

Stata-Python integration blogpost by Chuck Huber

Previous
Previous

CV in 1-min