API with Python via Google Colab

One reader pointed out that Stata-Python integration is only available from Stata 16 onwards, so users with older versions cannot make use of the convenient API. Here is a variant of the original guide in which I show you how to download the data via Google Colab, a free, online Python platform that requires minimal setup.

Step 0 - Preamble

  1. Set up Google Colab in your browser (preferably Google Chrome) if you haven't already.

  2. Set up your folder on Google Drive. Recommended folder name: Comtrade, with two subfolders: code and i_X_CHN.
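If you prefer to create these folders from a notebook cell rather than by clicking through Drive, here is a minimal standard-library sketch. It assumes Drive is already mounted at /content/drive (Step 1) and that MyDrive is your Drive root; the function name make_comtrade_folders is just for illustration.

```python
import os


def make_comtrade_folders(drive_root: str = '/content/drive/MyDrive') -> list:
    """Create Comtrade/code and Comtrade/i_X_CHN under drive_root (hypothetical helper)."""
    created = []
    for sub in ('code', 'i_X_CHN'):
        path = os.path.join(drive_root, 'Comtrade', sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Running make_comtrade_folders() twice is harmless, because exist_ok=True skips folders that are already there.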

Step 1 - Mount Google Drive in the Colab session

from google.colab import drive
drive.mount('/content/drive')

My personal trick is to put the Drive-mounting code in its own cell, separate from the rest of the code, so you don't need to re-authorize on every re-run.

Step 2 - Paste the Python code and adjust the file directory

  1. Copy and paste everything between python: and end from the final do-file.

  2. Add the following shortcut to the output directory:

root = '/content/drive/MyDrive/Comtrade/i_X_CHN'

In addition, adjust the data export line:

df.to_stata(f'{root}/i_X_CHN_{ps}.dta')
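The export line simply joins the folder shortcut and the year, so each year gets its own .dta file. To double-check where the files will land before running the full loop, you could preview the paths with a small sketch like this (output_path is a hypothetical helper that mirrors the f-string above):

```python
root = '/content/drive/MyDrive/Comtrade/i_X_CHN'


def output_path(ps: int, folder: str = root) -> str:
    """Return the file the adjusted export line writes for year ps."""
    return f'{folder}/i_X_CHN_{ps}.dta'


# Preview the first few target files without downloading anything
for ps in range(2000, 2003):
    print(output_path(ps))
```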

You should then be done and end up with this:

from google.colab import drive
drive.mount('/content/drive')
import json
import numpy        as  np
import pandas       as  pd
import requests

root = '/content/drive/MyDrive/Comtrade/i_X_CHN'
def Comtrade_Scraper   (ps: int,
                       type: str=   'C',
                       freq: str=   'A',
                       px  : str=  'S2',
                       r   : str= 'all',
                       p   : int=     156,
                       rg  : int=     2,
                       cc  : str= 'AG2'):
    """
    Wrapper for creating URLs to access the Comtrade API

    ARGUMENTS
    *********
    Required
    ps   = year
    """
    base      = 'https://comtrade.un.org/api/get?max=10000'
    url       = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'

    result    = requests.get(url).json()
    if 'dataset' in result: 
        df        = pd.DataFrame(result['dataset'])
        df        = df.replace({None: np.nan})
        df.columns= [i[:32] for i in df.columns]

        df.to_stata(f'{root}/i_X_CHN_{ps}.dta')
        return df

for i in range(2000,2022): Comtrade_Scraper(i)
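When a year comes back empty, it helps to paste the exact request URL into a browser and look at the raw response. The sketch below rebuilds the same URL as Comtrade_Scraper without sending it. The name comtrade_url is hypothetical, and the comments describing each parameter reflect my reading of the legacy Comtrade API, so please check them against the official documentation.

```python
from urllib.parse import urlencode


def comtrade_url(ps: int, type: str = 'C', freq: str = 'A', px: str = 'S2',
                 r: str = 'all', p: int = 156, rg: int = 2, cc: str = 'AG2') -> str:
    """Rebuild the request URL used by Comtrade_Scraper, without sending it."""
    base = 'https://comtrade.un.org/api/get?max=10000'
    params = {
        'type': type,  # C = commodities
        'freq': freq,  # A = annual
        'px': px,      # S2 = SITC Rev. 2 classification
        'ps': ps,      # period (year)
        'r': r,        # reporter(s): 'all'
        'p': p,        # partner: 156 = China
        'rg': rg,      # trade flow: 2 = exports
        'cc': cc,      # AG2 = all 2-digit codes
    }
    # urlencode keeps dict order, so the query matches the original f-string
    return f'{base}&{urlencode(params)}'


print(comtrade_url(2004))
```

Using urlencode instead of a hand-written f-string also escapes any special characters if you later pass values such as comma-separated code lists.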

Output files should look like this:

Some drawbacks:

  • There are some glitches, e.g. the 2004 data was not downloaded (the file is only 315 bytes).

  • Re-running the code block too many times hits the request limit (somehow more often than with the Stata version).
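One way to soften both drawbacks is to skip years that are already on Drive, pause between requests, and retry years that come back empty. The sketch below assumes fetch behaves like Comtrade_Scraper (it writes the .dta file and returns a DataFrame, or None when the response has no 'dataset'). The helper name, the 2-second pause and the 1024-byte "looks empty" threshold are my own guesses, not documented API limits, so adjust them to whatever limits the API currently enforces.

```python
import os
import time


def download_with_retries(years, fetch, folder, pause=2.0, retries=3, min_bytes=1024):
    """Call fetch(year) for each year; return the years that never returned data.

    Hypothetical helper: fetch is assumed to save the file itself and return
    a DataFrame-like object (or None) for that year.
    """
    failed = []
    for ps in years:
        path = f'{folder}/i_X_CHN_{ps}.dta'
        # Skip years already saved, unless the file is suspiciously small
        # (heuristic based on the 315-byte file seen for 2004)
        if os.path.exists(path) and os.path.getsize(path) >= min_bytes:
            continue
        for _ in range(retries):
            df = fetch(ps)
            time.sleep(pause)  # space out requests to stay under the rate limit
            if df is not None and len(df) > 0:
                break
        else:
            # Every attempt came back empty or without a 'dataset'
            failed.append(ps)
    return failed
```

In the notebook you would call it as failed = download_with_retries(range(2000, 2022), Comtrade_Scraper, root) and re-run later for the years left in failed.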

If you already have a Python setup, e.g. Anaconda or similar, you are probably better off using that than Google Colab. Colab is convenient, but it is not without cost.