{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) 2021, salesforce.com, inc. \n", "All rights reserved. \n", "SPDX-License-Identifier: BSD-3-Clause \n", "For full license text, see the LICENSE file in the repo root or https://opensource.org/licenses/BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# This is a notebook that uses real-world data and fits models to calibrate the COVID-19 simulation environment\n", "\n", "### What does it mean to calibrate the simulation?\n", "\n", "Simply put, we want to set the parameters of the simulation so that its behavior is consistent with real-world data.\n", "\n", "To go into a bit more detail, the point of the simulation is to capture how public health policy decisions influence the spread of COVID-19 and influence the economy.\n", "We model the spread of COVID-19 through a set of **SIR (+vaccination)** models, one for each US State.\n", "We model the economy through a model of **unemployment** (again, one for each US State) and relief payments provided by the federal government.\n", "\n", "Both the SIR disease dynamics and unemployment respond to public health policy decisions.\n", "This notebook quantifies those responses as measured in the real world data, and exports that quantification as parameters that the simulation can use.\n", "\n", "Another function of the simulator is to quantify **social welfare**, which is the metric that we want to maximize.\n", "Our simulation defines social welfare as \n", "$$W = \\alpha H + (1-\\alpha)E$$\n", "where $H$ and $E$ are indices representing the health outcome (i.e. COVID deaths) and economic outcome (i.e. GDP). 
\n", "$\\alpha$ is a weighting term (between 0 and 1).\n", "\n", "We don't know what the actual goals/priorities were of real-world policymakers, but we want to calibrate $\\alpha$ (separately for each US State) so that the resulting definition of social welfare is most consistent with public health policy decisions made in the real world." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime, timedelta\n", "import json\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import os\n", "import pickle\n", "import scipy\n", "import scipy.stats\n", "from scipy.signal import convolve\n", "from scipy.optimize import minimize\n", "from tqdm import tqdm\n", "\n", "import ai_economist" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Install torch 1.8.0 or higher (required for the unemployment fits)\n", "!pip install 'torch>=1.8.0'\n", "\n", "import torch\n", "import torch.nn as nn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before running this notebook, \n", "1. Please use the **gather_real_world_data.ipynb** notebook to download and pre-process the raw real-world data.\n", "2. 
Please set the base directory path below; it is the folder into which the real-world data have been downloaded, and it corresponds to BASE_DATA_DIR_PATH in the gather_real_world_data.ipynb notebook (and defaults to \"/tmp/covid19_data\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "BASE_DATA_DIR_PATH = \"/tmp/covid19_data\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the latest dated folder in BASE_DATA_DIR_PATH to fetch the real-world data\n", "Note: if you do not wish to use the latest data, you will need to manually overwrite the data_dir variable" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_dir = os.path.join(BASE_DATA_DIR_PATH, sorted(os.listdir(BASE_DATA_DIR_PATH))[-1])\n", "print(\"In this notebook, we will use the real world data saved in '{}'\".format(data_dir))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read in the model constants" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(os.path.join(data_dir, \"model_constants.json\"), \"r\") as fp:\n", " model_constants = json.load(fp)\n", " \n", "DATE_FORMAT = model_constants[\"DATE_FORMAT\"]\n", "STRINGENCY_POLICY_KEY = model_constants[\"STRINGENCY_POLICY_KEY\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## With the real-world data in hand, we can now calibrate the simulation\n", "\n", "As described in the introduction above, that means fitting (1) the **SIR (+vaccination)** disease models, (2) the **unemployment** models, and (3) the social welfare weights $\\alpha$, one of each per US State." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Set data / fit parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set up dictionary to write fitted parameters\n", "fitted_params_dict = {'settings': {}}\n", "fitted_params_filename = \"fitted_params.json\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, choose the dates we'll use to split the data.\n", "\n", "- Data until the last date for **training** can be used to directly fit parameters.\n", "\n", "- Data until the last date for **validation** can be used for measuring fit quality.\n", "\n", "Our default calibration settings use data in 2020 for fitting (training+validation) so that data in 2021 can be held out as a test set."
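] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check on these date-string splits (using a toy date-indexed DataFrame, not the real data): pandas label-based slicing is inclusive on both ends, so the boundary date lands in both the training and validation slices, mirroring how the fitting code below slices policy_df and beta_df." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Toy stand-in for one of the real-world dataframes (illustrative only)\n", "toy_dates = pd.date_range('2020-01-01', '2020-12-31', freq='D')\n", "toy_df = pd.DataFrame({'CA': range(len(toy_dates))}, index=toy_dates)\n", "\n", "toy_train = toy_df[:'2020-11-30']  # used to directly fit parameters\n", "toy_val = toy_df['2020-11-30':'2020-12-31']  # used to measure fit quality\n", "\n", "print(len(toy_train), len(toy_val))  # 335 32 (2020 is a leap year; the boundary day appears in both)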
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Do fitting up until the last day in the train set:\n", "fitted_params_dict['settings']['LAST_DATE_IN_TRAIN_SET'] = '2020-11-30'\n", "# Cross-validation uses non-training data up until:\n", "fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET'] = '2020-12-31'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, set some parameters governing how the SIR and unemployment models use their data. The provided default settings yield good fits for the default date range (above)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Length of the convolutional filters to use in the unemployment fitting\n", "fitted_params_dict['settings']['FILTER_SIZE_UNEMPLOYMENT'] = 600\n", "# Weight of regularization term to enforce state-by-state similarity in unemployment fits\n", "fitted_params_dict['settings']['SIMILARITY_REGULARIZATION_UNEMPLOYMENT'] = 0.5\n", "\n", "# Weight of regularization term to enforce state-by-state similarity in Beta fits (for SIR model)\n", "fitted_params_dict['settings']['SIMILARITY_REGULARIZATION_SIR'] = 1.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, set the environment configuration that we will use when we calibrate $\\alpha$ values (which determine how social welfare is computed).\n", "\n", "Note! $\\alpha$ calibration is sensitive to these choices, so we will want to take care to use the same settings for calibration and any downstream experiments." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Env settings for calibrating alphas. 
Default settings reflect env defaults.\n", "fitted_params_dict['settings']['env'] = {\n", " 'economic_reward_crra_eta': 2,\n", " \"start_date\": '2020-03-22',\n", " \"infection_too_sick_to_work_rate\": 0.1,\n", " \"pop_between_age_18_65\": 0.6,\n", " \"risk_free_interest_rate\": 0.03,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the real-world data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(os.path.join(data_dir, \"dataframes.pkl\"), \"rb\") as fp:\n", "    dataframes = pickle.load(fp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Beta Fits\n", "\n", "Here, we calibrate the **SIR** model.\n", "\n", "Within the SIR model, the **beta** parameter controls the rate at which infections spread from infected people to susceptible people. It reflects the probability of disease transmission when an infected and a susceptible person come into contact, and the rate at which such contacts occur.\n", "\n", "As such, the public health policy can affect beta (intuitively, through its influence on how often people interact with one another).\n", "\n", "Our simulation models the relationship between public health policy (i.e. \"stringency\") and beta via a time-delayed linear model. \n", "Specifically, we model beta of US State $i$ at time $t$ as\n", "\n", "$$\\beta_{i,t} = \\beta_{i}^{\\pi} \\cdot \\pi_{i,t-d} + \\beta_{i}^{0},$$\n", "\n", "where $\\pi_{i,t-d}$ is the stringency level $d$ days ago, $\\beta_{i}^{\\pi}$ is the slope, and $\\beta_{i}^{0}$ is the intercept.\n", "\n", " \n", " \n", "The next block of code finds the time delay $d$ and then fits the slope and intercept parameters for each US State. \n", "Each State uses the same delay.\n", "In addition, we regularize the State-to-State differences in slope/intercept parameters to help reduce overfitting to noise."
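] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the delayed linear model concrete, here is a tiny sketch with made-up slope, intercept, and delay values (not the fitted ones): a stringency increase lowers beta, but only after the delay $d$ has elapsed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Illustrative (made-up) parameters for one state\n", "toy_slope = -0.01      # beta decreases as stringency rises\n", "toy_intercept = 0.15   # beta at stringency 0\n", "toy_delay = 30         # days before a policy change affects beta\n", "\n", "# Stringency history: tightened from level 2 to level 8 on day 60\n", "toy_stringency = np.concatenate([np.full(60, 2.0), np.full(120, 8.0)])\n", "\n", "# beta_t = slope * stringency_(t - d) + intercept\n", "ts = np.arange(toy_delay, len(toy_stringency))\n", "toy_beta = toy_slope * toy_stringency[ts - toy_delay] + toy_intercept\n", "\n", "print(round(toy_beta[0], 4), round(toy_beta[-1], 4))  # 0.13 before the change propagates, 0.07 after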
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE!** Although we're directly fitting linear models of beta, our ground-truth data is COVID-19 deaths. The best test of this fit is how well our simulation reproduces the COVID-19 death data given the real-world policy choices." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is the subset of the data we will use:\n", "beta_df = dataframes[\"beta\"]\n", "policy_df = dataframes[\"policy\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find the optimal delay in the response of beta (IT SHOULD BE >0)\n", "\n", "Step through many possible policy delays to find where the empirical correlation between delayed policy and beta is strongest. Since all US States will use the same delay, we aggregate the data across states.\n", "\n", "Intuitively, we expect higher stringency to lead to smaller beta, so we're actually looking for the delay where the correlation is smallest (i.e. most negative)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "delays = list(range(-90, 90))\n", "\n", "fits = []\n", "\n", "for delay in delays:\n", " D = fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']\n", " if delay < 0:\n", " x = policy_df[:D].values[-delay:].flatten()\n", " y = beta_df[:D].values[:delay].flatten()\n", " elif delay == 0:\n", " x = policy_df[:D].values.flatten()\n", " y = beta_df[:D].values.flatten()\n", " else:\n", " x = policy_df[:D].values[:-delay].flatten()\n", " y = beta_df[:D].values[delay:].flatten()\n", " keep = np.logical_not(np.logical_or(np.isnan(y), np.isnan(x)))\n", "\n", " x = x[keep]\n", " y = y[keep]\n", " fit = scipy.stats.linregress(x, y)\n", " \n", " fits.append(fit)\n", "\n", "_, (ax0, ax1) = plt.subplots(1, 2, figsize=(16, 6));\n", "ax0.plot(delays, [f.rvalue for f in fits]);\n", "ax0.set_xlabel('Policy-vs-Beta Delay');\n", "ax0.set_ylabel('Correlation r-value');\n", "\n", "ax1.plot(delays, [f.slope for f in fits]);\n", "ax1.set_xlabel('Policy-vs-Beta Delay');\n", "ax1.set_ylabel('Slope of Linear Fit');\n", "\n", "BETA_DELAY = delays[np.argmin(np.array([f.rvalue for f in fits]))]\n", "ax0.set_title('Optimal delay = {}'.format(BETA_DELAY));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What does the aggregate policy vs. beta trend look like at this delay?\n", "\n", "This plot shows the raw policy & beta data (blue) after applying the delay. The black dashed line shows the linear fit. The red points show the beta mean at each of the stringency levels." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert BETA_DELAY > 0\n", "\n", "x = policy_df[:fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']].values[:-BETA_DELAY].flatten()\n", "y = beta_df[:fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']].values[BETA_DELAY:].flatten()\n", "keep = np.logical_not(np.logical_or(np.isnan(y), np.isnan(x)))\n", "\n", "x = x[keep]\n", "y = y[keep]\n", "fit = scipy.stats.linregress(x, y)\n", "\n", "print('Fit Intercept: {:+f}'.format(fit.intercept))\n", "print('Fit Slope: {:+f}'.format(fit.slope))\n", "\n", "_, ax = plt.subplots(1, 1, figsize=(8, 7))\n", "\n", "ax.plot(x, y, 'o', alpha=0.03);\n", "xL = ax.get_xlim()\n", "ax.plot(xL, [fit.intercept + fit.slope*x_ for x_ in xL], 'k--', linewidth=5);\n", "ax.set_xlim(xL);\n", "\n", "xs = np.sort(np.unique(x))\n", "ys = np.zeros_like(xs, dtype=float)\n", "for i, x_ in enumerate(xs):\n", " ys[i] = np.nanmean(y[x==x_])\n", "\n", "ax.plot(xs, ys, 'ro-', markersize=13);\n", "\n", "ax.set_ylim([0, 2*np.nanmax(ys)]);\n", "ax.set_xlabel('Policy ({})'.format(STRINGENCY_POLICY_KEY), fontsize=16);\n", "ax.set_ylabel('Estimated Beta ({} days later)'.format(BETA_DELAY), fontsize=16);\n", "ax.set_title('Effect of Policy on Transmission Rate', fontsize=20);\n", "ax.grid(b=True, axis='y');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code sets up the regularized, State-by-State linear regression. It separates data for training/validation and defines the fitting procedure."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def shift_datestring(dstring, delta):\n", " return datetime.strftime(\n", " datetime.strptime(dstring, DATE_FORMAT) + timedelta(delta), DATE_FORMAT\n", " )\n", "\n", "x_train_d0 = '2020-01-01'\n", "x_train_dT = fitted_params_dict['settings']['LAST_DATE_IN_TRAIN_SET']\n", "x_val_dT = fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']\n", "y_train_d0 = shift_datestring(x_train_d0, BETA_DELAY)\n", "y_train_dT = shift_datestring(x_train_dT, BETA_DELAY)\n", "y_val_dT = shift_datestring(x_val_dT, BETA_DELAY)\n", "\n", "x_data = policy_df[x_train_d0:x_train_dT].values.T\n", "y_data = beta_df[y_train_d0:y_train_dT].values.T\n", "\n", "x_data_val = policy_df[x_train_dT:x_val_dT].values.T\n", "y_data_val = beta_df[y_train_dT:y_val_dT].values.T\n", "\n", "n_states = x_data.shape[0]\n", "\n", "def predict_fn(x, weights):\n", " slopes = weights[:n_states, None]\n", " intercepts = weights[n_states:, None]\n", " return x*slopes + intercepts\n", "\n", "def loss_fn(weights, w_sse_lambda): \n", " y_hat = predict_fn(x_data, weights)\n", " y_sse = np.nansum((y_data - y_hat)**2)\n", " \n", " slopes = weights[:n_states]\n", " intercepts = weights[n_states:]\n", " s_sse = np.sum((slopes - np.mean(slopes))**2)\n", " i_sse = np.sum((intercepts - np.mean(intercepts))**2)\n", " \n", " return y_sse + w_sse_lambda*(s_sse*np.nanmean(x_data) + i_sse)\n", " \n", "def do_fit(w_sse_lambda):\n", " w_bounds = [(None, 0)] * n_states\n", " i_bounds = [(0, None)] * n_states\n", " res = minimize(\n", " loss_fn,\n", " np.zeros(n_states * 2),\n", " args=(w_sse_lambda,),\n", " bounds=w_bounds + i_bounds,\n", " )\n", "\n", " weights = res.x\n", " slopes = weights[:n_states]\n", " intercepts = weights[n_states:]\n", " y_hat = predict_fn(x_data, weights)\n", " \n", " return weights, slopes, intercepts, y_hat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code gives us a look at the fit 
parameters/quality for different regularization levels (lambda).\n", "\n", "Notice how stronger regularization leads to more State-by-State similarity across the fit parameters. This also leads to slightly worse training $r^2$ and slightly better validation $r^2$, as one would typically expect.\n", "\n", "For the default data split, validation $r^2$ is poor, but, as described above, we are more interested in fitting the *SIR model* (not just fitting betas). So, calibration settings should focus on how well the simulation predicts real-world COVID-19 deaths given the real-world policy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_, ax = plt.subplots(1, 1, figsize=(16, 6))\n", "_, axes = plt.subplots(2, 2, figsize=(16, 8))\n", "ax0, ax1, ax2, ax3 = axes.flatten()\n", "\n", "ax.plot(y_data.flatten(), label='Real Data');\n", "\n", "# (regularization_amount, plot_color, plot_style)\n", "plot_specs = [\n", " ( 0.0, 'r', ':'),\n", " ( 5.0, 'g', '--'),\n", " (10.0, 'c', '-.'),\n", "]\n", "for LAMBDA, color, linestyle in plot_specs:\n", " weights, slopes, intercepts, y_hat = do_fit(w_sse_lambda=LAMBDA)\n", " y_hat_val = predict_fn(x_data_val, weights)\n", " print('lambda = {:5.1f}, r^2 = {:5.3f}, r^2 (val) = {:5.3f}'.format(\n", " LAMBDA,\n", " 1 - np.nanvar((y_hat - y_data)) / np.nanvar(y_data),\n", " 1 - np.nanvar((y_hat_val - y_data_val)) / np.nanvar(y_data_val)\n", " ))\n", " \n", " ax.plot(y_hat.flatten(), color=color, linestyle=linestyle, label='Predicted (lambda={})'.format(LAMBDA));\n", " ax0.plot(slopes, color=color, linestyle=linestyle, label='lambda={}'.format(LAMBDA));\n", " ax1.plot(intercepts, color=color, linestyle=linestyle, label='lambda={}'.format(LAMBDA));\n", " ax2.plot(intercepts+slopes, color=color, linestyle=linestyle, label='lambda={}'.format(LAMBDA));\n", " ax3.plot(y_data.flatten(), y_hat.flatten(), 'o', color=color, alpha=0.1, label='lambda={}'.format(LAMBDA));\n", "\n", "LIM = [0, 0.3]\n", 
"ax3.plot(LIM, LIM, 'k--');\n", "ax3.set_xlim(LIM);\n", "ax3.set_ylim(LIM);\n", "\n", "for ax_ in [ax, ax0, ax1, ax2, ax3]:\n", " ax_.legend();\n", " \n", "ax.set_ylabel('Beta');\n", "ax0.set_ylabel('Slope');\n", "ax1.set_ylabel('Intercept');\n", "ax2.set_ylabel('Beta @ stringency=1');\n", "ax3.set_ylabel('Predicted Beta');\n", "\n", "ax0.set_xlabel('State Index');\n", "ax1.set_xlabel('State Index');\n", "ax2.set_xlabel('State Index');\n", "ax3.set_xlabel('Actual Beta');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Perform the actual calibration fit" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# (if you want to tweak the regularization setting, you can do so here)\n", "# fitted_params_dict['settings']['SIMILARITY_REGULARIZATION_SIR'] = ...\n", "\n", "_, slopes, intercepts, y_hat = do_fit(\n", " w_sse_lambda=fitted_params_dict['settings']['SIMILARITY_REGULARIZATION_SIR']\n", ")\n", "\n", "# Update the calibration fit dictionary\n", "fitted_params_dict.update(\n", " {\n", " \"BETA_DELAY\": BETA_DELAY,\n", " \"BETA_SLOPES\": slopes.tolist(),\n", " \"BETA_INTERCEPTS\": intercepts.tolist()\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Unemployment Fits\n", "\n", "Here, we calibrate the **unemployment** model.\n", "\n", "The simulation models unemployment using a time-series model that predicts a State’s unemployment rate based on its history of stringency level increases and decreases.\n", "\n", "Each change triggers a combination of exponentially-decaying unemployment responses. We learn a shared set of those responses by learning a set of 5 decay lambdas. 
Each State separately fits its own model that says how strongly each of those underlying responses is triggered by a change in stringency.\n", "\n", "In math, the unemployment model looks like this:\n", "\n", "$$\n", "\\tilde{U}_{i,t}^k = \\sum_{t'=t-L}^{t'=t} e^{\\frac{t'-t}{\\lambda_k}} \\cdot \\Delta \\pi_{i, t'}, \\\\\n", "U_{i,t} = \\texttt{softplus}\\left( \\sum_k w_{i, k} \\cdot \\tilde{U}_{i,t}^k \\right) + U^0_i.\n", "$$\n", "\n", "$\\tilde{U}_{i,t}^k$ describes the unemployment driven by the $k$-th component of the response. $\\lambda_k$ is the decay constant for that component. $L$ is the total length of the response filters, and $\\Delta \\pi_{i,t}$ is how much the stringency changed in State $i$ at time $t$.\n", "\n", "$U_{i,t}$ is the actual unemployment according to the model. It adds baseline unemployment $U^0_i$ to excess unemployment coming from the policy choices. Excess unemployment is a State-specific weighted sum of each $\\tilde{U}_{i,t}^k$, with weights $w_{i,k}$, passed through a $\\texttt{softplus}$ function so that excess unemployment is $\\geq 0$.\n", "\n", "Similar to the way we fit beta (above), we regularize State-to-State variability in $w_{i,k}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is the subset of the data we will use for calibrating the unemployment model\n", "unemployment_df = dataframes[\"unemployment\"]\n", "policy_df = dataframes[\"policy\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model set-up\n", "\n", "For a model like this, we take advantage of the tools available through PyTorch. In particular, we instantiate the unemployment model as a PyTorch module, and write a simple wrapper to handle the data feeding, parameter updates, etc."
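] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before the full PyTorch implementation, here is a minimal NumPy sketch of the same idea, using made-up lambdas, weights, and baseline (illustrative only): stringency *changes* are convolved with exponential filters, weighted, passed through a softplus, and added to baseline unemployment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def toy_softplus(z):\n", "    return np.log1p(np.exp(z))\n", "\n", "# Illustrative (made-up) values for one state\n", "toy_lambdas = np.array([30.0, 120.0])  # decay constants lambda_k\n", "toy_weights = np.array([0.8, 0.3])     # state-specific weights w_k\n", "toy_baseline = 3.5                     # baseline unemployment U0 (%)\n", "toy_L = 200                            # filter length\n", "\n", "# Stringency changes: a single tightening of +6 on day 20\n", "toy_dpi = np.zeros(300)\n", "toy_dpi[20] = 6.0\n", "\n", "# U-tilde_t^k: convolve the changes with each exponential filter\n", "toy_filters = np.exp(-np.arange(toy_L)[None, :] / toy_lambdas[:, None])\n", "toy_u_tilde = np.stack([np.convolve(toy_dpi, f)[:len(toy_dpi)] for f in toy_filters])\n", "\n", "# U_t = softplus(sum_k w_k * U-tilde_t^k) + U0\n", "toy_unemp = toy_softplus(toy_weights @ toy_u_tilde) + toy_baseline\n", "\n", "# Near softplus(0) + baseline before the shock; peaks just after day 20, then decays\n", "print(round(toy_unemp[0], 2), round(toy_unemp.max(), 2))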
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class SharedConvUnemp(nn.Module):\n", " \"\"\"\n", " The unemployment model.\n", " Given a history of stringency changes, predicts current unemployment.\n", " Embeds the full set of shared and State-specific model weights.\n", " \"\"\"\n", " def __init__(\n", " self, \n", " x_dim,\n", " n_filters,\n", " filter_size,\n", " n_states=51,\n", " signal_bias=True,\n", " initial_lambda_guess=None,\n", " ):\n", " super().__init__()\n", " self.x_dim = int(x_dim)\n", " self.n_filters = int(n_filters)\n", " self.filter_size = int(filter_size)\n", " self.n_states = int(n_states)\n", " \n", " # We can use grouped 1D convolution to do state-specific projection of x --> signal\n", " # Expects input of size [1, x_dim*n_states, time+filter_size] (+filter_size implies pre-padding)\n", " # Output has size [1, n_filters*n_states, time+filter_size]\n", " self.signal = nn.Conv1d(\n", " in_channels=self.n_states*self.x_dim,\n", " out_channels=self.n_states*self.n_filters,\n", " kernel_size=1,\n", " groups=self.n_states,\n", " bias=bool(signal_bias),\n", " )\n", " \n", " # Convolve the learned \"signal\" with a shared bank of exponential filters w/ learnable lambdas\n", " # Expects input of size [n_states, n_filters, time+filter_size] (+filter_size implies pre-padding)\n", " # Output has size [n_states, 1, time+filter_size] (here, +filter_size comes from internal padding)\n", " self.conv_lambdas = nn.Parameter(initial_lambda_guess,\n", " requires_grad=True\n", " )\n", " self.f_ts = torch.tile(\n", " torch.flip(torch.arange(self.filter_size), (0,))[None, None],\n", " (1, self.n_filters, 1)\n", " )\n", " \n", " # State-specific unemployment offsets\n", " self.unemp_bias = nn.Parameter(\n", " torch.tensor(np.ones(self.n_states, dtype=np.float32)*3.5),\n", " requires_grad=True\n", " )\n", " \n", " def get_similarity_regularization_loss(self):\n", " w = 
self.signal.weight.flatten().view(self.n_states, -1)\n", " dev = w - w.mean(0, keepdim=True)\n", " mse = torch.pow(dev, 2).mean()\n", " return mse\n", " \n", " def x2signal(self, x):\n", " # Assume input is [n_states, x_dim, time+filter_size], reshape to size expected by self.signal\n", " x = x.reshape(1, self.n_states*self.x_dim, -1)\n", " signal = self.signal(x)\n", " # Output should be returned to [n_states, n_filters, time+filter_size]\n", " signal = signal.reshape(self.n_states, self.n_filters, -1)\n", " return signal\n", " \n", " def signal2unemp(self, signal):\n", " # Assume input is [n_states, n_filters, time+filter_size]\n", " conv_filters = torch.exp(-self.f_ts / self.conv_lambdas[None, :, None])\n", " unemp = nn.functional.conv1d(signal, conv_filters, padding=self.filter_size-1)\n", " \n", " # Output should be returned as [n_states, time] (i.e. we remove padded outputs)\n", " unemp = unemp.reshape(self.n_states, -1)[:, self.filter_size:-(self.filter_size-1)]\n", " \n", " # Soft clipping + baseline unemployment\n", " return nn.functional.softplus(unemp, beta=1) + self.unemp_bias[:, None]\n", " \n", " def forward(self, x):\n", " signal = self.x2signal(x)\n", " unemp = self.signal2unemp(signal)\n", " return unemp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class SharedConvUnempFitter:\n", " \"\"\"\n", " Wrapper to handle the data feeding and training of the actual unemployment model.\n", " \"\"\"\n", " def __init__(self,\n", " policy_df=None,\n", " unemployment_df=None,\n", " lr=0.01, similarity_regularization_coeff=0.0,\n", " filter_size=600, lambdas=np.array([30, 60, 130, 260, 540])\n", " ):\n", " assert unemployment_df is not None\n", " \n", " self.n_filters = len(lambdas)\n", " self.filter_size = int(filter_size)\n", " self.similarity_regularization_coeff = float(similarity_regularization_coeff)\n", " self.last_training_time_index = unemployment_df[\n", " 
:fitted_params_dict['settings']['LAST_DATE_IN_TRAIN_SET']\n", " ].shape[0]\n", " \n", " # Use this to crop out nans\n", " keep = np.logical_not(np.isnan(unemployment_df[\n", " :fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']\n", " ].values.T[0]))\n", " \n", " # Set up the data\n", " self.x_data, self.x_th = self.preprocess_policy(policy_df[\n", " :fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']\n", " ].values[keep].T)\n", " \n", " self.y_data = unemployment_df[:fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']].values[keep].T\n", " self.y_th = torch.from_numpy(self.y_data.astype(np.float32))\n", " \n", " # Crop out any nan region\n", " \n", " # Create the model\n", " self.x_dim = self.x_data.shape[1]\n", " self.model = SharedConvUnemp(\n", " self.x_dim, self.n_filters, self.filter_size, signal_bias=False,\n", " initial_lambda_guess=torch.tensor(lambdas.astype(np.float32))\n", " )\n", " \n", " # Loss\n", " self.loss = nn.MSELoss()\n", " self.train_loss_history = []\n", " self.val_loss_history = []\n", " \n", " # Create the optimizer\n", " self.optim = torch.optim.Adam(self.model.parameters(), lr=float(lr))\n", " \n", " def preprocess_policy(self, raw_policy_data):\n", " # Expects policy data size is [n_states, t]\n", " pad_pol = np.pad(raw_policy_data, [(0, 0), (self.filter_size, 0)], constant_values=1)\n", " \n", " dpad = np.zeros_like(pad_pol)\n", " dpad[:, 1:] = pad_pol[:, 1:] - pad_pol[:, :-1]\n", "\n", " x_data = dpad[None]\n", " \n", " x_data = x_data.transpose(1, 0, 2)\n", " x_th = torch.from_numpy(x_data.astype(np.float32))\n", " \n", " return x_data, x_th\n", " \n", " def predict(self, numpy=True):\n", " y_hat = self.model(self.x_th.detach())\n", " if numpy:\n", " y_hat = y_hat.data.numpy()\n", " return y_hat\n", " \n", " def get_train_loss(self):\n", " y_hat = self.predict(numpy=False)\n", " y_hat_train = y_hat[:, :self.last_training_time_index]\n", " y_train = self.y_th[:, :self.last_training_time_index]\n", " loss = 
self.loss(y_hat_train, y_train)\n", " return loss\n", " \n", " def get_val_loss(self):\n", " y_hat = self.predict(numpy=False)\n", " y_hat_val = y_hat[:, self.last_training_time_index:]\n", " y_val = self.y_th[:, self.last_training_time_index:]\n", " loss = self.loss(y_hat_val, y_val)\n", " return loss\n", " \n", " def get_losses(self):\n", " y_hat = self.predict(numpy=False)\n", " \n", " y_hat_train = y_hat[:, :self.last_training_time_index]\n", " y_train = self.y_th[:, :self.last_training_time_index]\n", " train_loss = self.loss(y_hat_train, y_train)\n", " \n", " y_hat_val = y_hat[:, self.last_training_time_index:]\n", " y_val = self.y_th[:, self.last_training_time_index:]\n", " val_loss = self.loss(y_hat_val, y_val)\n", " \n", " return train_loss, val_loss\n", " \n", " def update(self):\n", " self.optim.zero_grad()\n", " \n", " # Get the training and val losses\n", " train_loss, val_loss = self.get_losses()\n", " self.train_loss_history.append(float(train_loss))\n", " self.val_loss_history.append(float(val_loss))\n", " \n", " # Add any similarity regularization loss\n", " if self.similarity_regularization_coeff:\n", " similarity_loss = self.model.get_similarity_regularization_loss()\n", " train_loss = train_loss + self.similarity_regularization_coeff*similarity_loss\n", " \n", " # Update\n", " train_loss.backward()\n", " self.optim.step()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Perform the actual calibration fit\n", "\n", "Initialize and fit the unemployment model (this will likely take a few minutes)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# (if you want to tweak the regularization setting, you can do so here)\n", "# fitted_params_dict['settings']['FILTER_SIZE_UNEMPLOYMENT'] = ...\n", "# fitted_params_dict['settings']['SIMILARITY_REGULARIZATION_UNEMPLOYMENT'] = ...\n", "\n", "unemployment_fitter = SharedConvUnempFitter(\n", " policy_df=policy_df,\n", " 
unemployment_df=unemployment_df,\n", " filter_size=fitted_params_dict['settings']['FILTER_SIZE_UNEMPLOYMENT'],\n", " similarity_regularization_coeff=fitted_params_dict['settings']['SIMILARITY_REGULARIZATION_UNEMPLOYMENT'],\n", " lambdas=np.logspace(np.log10(30), np.log10(540), 5),\n", " lr=0.01,\n", ")\n", "\n", "# Recommend training for 350 steps with lr=0.01 and similarity regularization=0.5\n", "for _ in tqdm(range(350)):\n", " unemployment_fitter.update()\n", "\n", "_, ax = plt.subplots(1, 1, figsize=(12, 5))\n", "ax.plot(unemployment_fitter.train_loss_history, label='Training');\n", "ax.plot(unemployment_fitter.val_loss_history, label='Validation');\n", "ax.set_ylabel('Loss (MSE)');\n", "ax.set_xlabel('Training Steps');\n", "ax.legend();\n", "ax.set_ylim(bottom=0);\n", "ax.grid(b=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How well does the predicted US unemployment match the real data?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_, ax = plt.subplots(1, 1, figsize=(12, 5))\n", "\n", "y_hat = unemployment_fitter.predict().mean(0)\n", "t = np.arange(len(y_hat))\n", "\n", "ax.plot(t, unemployment_fitter.y_data.mean(0), 'b', label='Real Data');\n", "ax.plot(\n", " t[:unemployment_fitter.last_training_time_index],\n", " y_hat[:unemployment_fitter.last_training_time_index],\n", " 'r-', label='Predicted (Train)'\n", ");\n", "ax.plot(\n", " t[unemployment_fitter.last_training_time_index:],\n", " y_hat[unemployment_fitter.last_training_time_index:],\n", " 'g-', label='Predicted (Val)'\n", ");\n", "ax.grid(b=True);\n", "\n", "ax.set_xlabel('Time (days)', fontsize=16);\n", "ax.set_ylabel('Avg. State Unemployment Rate (%)', fontsize=16);\n", "ax.set_title('Real vs. 
Predicted Unemployment', fontsize=20);\n", "ax.legend(fontsize=20);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How well does the predicted unemployment match the real data -- *for each state*?\n", "\n", "**WARNING: Make sure the unemployment fits above look reasonable for all states before saving!** If not, re-run the fits above" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Note, since there are 51 \"states\", this won't plot Wyoming.\n", "_, axes = plt.subplots(10, 5, figsize=(16, 40), sharey=True, sharex=True)\n", "axes = axes.flatten()\n", "\n", "y_hat = unemployment_fitter.predict()\n", "t = np.arange(y_hat.shape[1])\n", "for IDX, ax in enumerate(axes):\n", " ax.plot(t, unemployment_fitter.y_data[IDX], 'b-', label='Real Data');\n", " ax.plot(\n", " t[:unemployment_fitter.last_training_time_index],\n", " y_hat[IDX, :unemployment_fitter.last_training_time_index], \n", " 'r-', label='Predicted (Train)'\n", " );\n", " ax.plot(\n", " t[unemployment_fitter.last_training_time_index:],\n", " y_hat[IDX, unemployment_fitter.last_training_time_index:],\n", " 'g-', label='Predicted (Val)'\n", " );\n", " ax.grid(b=True, axis='y');\n", " ax.set_title(unemployment_df.columns[IDX]);\n", " if ax.is_first_col():\n", " ax.set_ylabel('Unemployment Rate (%)');\n", " ax.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## When satisfied with the fits, export them as part of the calibration" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Update fitted_params_dict\n", "# Note: we cast some arrays as np.float64 in order to be able to write out to a json file\n", "fitted_params_dict.update(\n", " {\n", " \"POLICY_START_DATE\": datetime.strftime(policy_df.index[0], format=DATE_FORMAT),\n", " \"FILTER_LEN\": unemployment_fitter.filter_size,\n", " \"CONV_LAMBDAS\": [float(x) for x in 
unemployment_fitter.model.conv_lambdas.data.numpy()],\n", " \"UNEMPLOYMENT_BIAS\": [float(x) for x in unemployment_fitter.model.unemp_bias.data.numpy()],\n", " \"GROUPED_CONVOLUTIONAL_FILTER_WEIGHTS\": unemployment_fitter.model.signal.weight.data.numpy().tolist()\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Social Welfare\n", "\n", "The simulator tracks **social welfare**, which is the metric that we want to maximize.\n", "Our simulation defines social welfare as \n", "$W = \alpha H + (1-\alpha)E$ \n", "where $H$ and $E$ are indices representing the health outcome ($H$ decreases with COVID deaths) and economic outcome ($E$ decreases with lost GDP). \n", "$\alpha$ is a weighting term (between 0 and 1).\n", "\n", "We don't know what the actual goals/priorities were of real-world policymakers, but we want to calibrate $\alpha$ (separately for each US State) so that the resulting definition of social welfare is most consistent with public health policy decisions made in the real world.\n", "\n", "*There is no definitive way to extract these priorities from the data. Our method is just one possible choice.*\n", "\n", "The goal of this fitting procedure is to estimate what the outcomes would have been if policymakers had been interested only in health (minimizing deaths) or only in the economy (minimizing GDP loss), and then to compare these against the real-world policy outcomes in order to estimate the underlying policy priorities.\n", "\n", "Another way to look at this is that we want to first measure the *shape* of the health/economy trade-off curve for each State. Then, we want to back out the health/economy prioritization (i.e. 
$\\alpha_i$) that would make the outcomes under the real-world policy preferable to all other outcomes along that trade-off curve.\n", "\n", "**We'll go into specifics as we proceed below...**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** this procedure also gives us the numbers we need to *normalize* $H$ and $E$ so that the Health and Economic Indices have a meaningful scale." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The env requires the fitted params to run, and we require the env to calibrate the fitted params.\n", "# Save some placeholders, which we'll update after calibration, then re-save.\n", "fitted_params_dict.update(\n", " {\n", " \"VALUE_OF_LIFE\": 10000000,\n", " \"INFERRED_WEIGHTAGE_ON_AGENT_HEALTH_INDEX\": [0.5]*51,\n", " \"INFERRED_WEIGHTAGE_ON_PLANNER_HEALTH_INDEX\": 0.5,\n", " \"MAX_MARGINAL_AGENT_ECONOMIC_INDEX\": [1]*51,\n", " \"MAX_MARGINAL_PLANNER_ECONOMIC_INDEX\": 1,\n", " \"MAX_MARGINAL_AGENT_HEALTH_INDEX\": [1]*51,\n", " \"MAX_MARGINAL_PLANNER_HEALTH_INDEX\": 1,\n", " \"MIN_MARGINAL_AGENT_ECONOMIC_INDEX\": [0]*51,\n", " \"MIN_MARGINAL_PLANNER_ECONOMIC_INDEX\": 0,\n", " \"MIN_MARGINAL_AGENT_HEALTH_INDEX\": [0]*51,\n", " \"MIN_MARGINAL_PLANNER_HEALTH_INDEX\": 0,\n", " }\n", ")\n", "with open(os.path.join(data_dir, fitted_params_filename), \"w\") as fp: \n", " json.dump(fitted_params_dict, fp)\n", "\n", "# Define the configuration of the environment that will be built\n", "N_ALPHA_CALIBRATION_DAYS = (\n", " datetime.strptime(\n", " fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET'], DATE_FORMAT\n", " ) - datetime.strptime(\n", " fitted_params_dict['settings']['env']['start_date'], DATE_FORMAT\n", " )\n", ").days\n", "\n", "env_config = {\n", " \"collate_agent_step_and_reset_data\": True,\n", " \"scenario_name\": \"CovidAndEconomySimulation\",\n", " \"path_to_data_and_fitted_params\": data_dir,\n", " \n", " \"components\": [\n", " 
{\"ControlUSStateOpenCloseStatus\": {\n", " \"action_cooldown_period\": 28\n", " }},\n", " {\"FederalGovernmentSubsidy\": {\n", " \"num_subsidy_levels\": 20,\n", " \"subsidy_interval\": 90,\n", " \"max_annual_subsidy_per_person\": 20000,\n", " }},\n", " {\"VaccinationCampaign\": {\n", " \"daily_vaccines_per_million_people\": 3000,\n", " \"delivery_interval\": 1,\n", " \"vaccine_delivery_start_date\": \"2021-01-12\",\n", " }},\n", " ],\n", " \"flatten_masks\": False,\n", " \"flatten_observations\": False,\n", " \"health_priority_scaling_agents\": 1.0,\n", " \"health_priority_scaling_planner\": 1.0,\n", " \"multi_action_mode_agents\": False,\n", " \"multi_action_mode_planner\": False,\n", " \"world_size\": [1, 1],\n", " \"n_agents\": 51,\n", " \"episode_length\": N_ALPHA_CALIBRATION_DAYS,\n", "}\n", " \n", "# NOTE!!!!\n", "# The calibration will be specific to these choices!\n", "# Downstream environments that use this calibration should also use these parameters!\n", "env_config.update(fitted_params_dict['settings']['env'])\n", "\n", "# Build the environment from this partially-finished calibration.\n", "uncalibrated_env = ai_economist.foundation.make_env_instance(**env_config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To estimate $\\hat{\\alpha}_i$ (the hat symbol denotes that this is an estimate), we first collect simulated health/economic outcomes under 3 policies: the actual stringency choices, minimum stringency, and maximum stringency.\n", "\n", "\n", "We use these outcomes to estimate the Pareto frontier in the $\\left( H_i, E_i \\right)$ coordinate space.\n", "\n", "\n", "By definition, the $\\left( H_i, E_i \\right)$ coordinates for the minimum- and maximum-stringency policies define the endpoints of this frontier, at $\\left(0, 1\\right)$ and $\\left(1, 0\\right)$, respectively. 
\n", "(This definition comes from how we normalize the health/economic indices -- that is, they are scaled to reflect the range of outcomes spanning the minimum- and maximum-stringency policies.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Collect the outcomes under the actual policies and 2 extremes: fully-closed and fully-open\n", "\n", "index_results = {}\n", "\n", "for p in ['closed', 'open', 'actual']:\n", "\n", " uncalibrated_env.reset();\n", " for _ in range(uncalibrated_env.episode_length):\n", " if p == 'actual':\n", " t_str = uncalibrated_env.current_date.strftime(DATE_FORMAT)\n", " actions = {\n", " str(idx): policy_df[state][t_str]\n", " for idx, state in uncalibrated_env.us_state_idx_to_state_name.items()\n", " }\n", " \n", " elif p == 'closed':\n", " actions = {str(idx): 10 for idx in range(51)}\n", " \n", " elif p == 'open':\n", " actions = {str(idx): 1 for idx in range(51)}\n", " \n", " else:\n", " raise NotImplementedError\n", "\n", " uncalibrated_env.step(actions);\n", "\n", " health_and_economic_indices = {}\n", " for agent in uncalibrated_env.all_agents:\n", " health_and_economic_indices[agent.idx] = (\n", " float(agent.state[\"Health Index\"] / uncalibrated_env.episode_length), \n", " float(agent.state[\"Economic Index\"] / uncalibrated_env.episode_length),\n", " )\n", " \n", " index_results[p] = health_and_economic_indices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We assume the Pareto frontier for State $i$ has the form $H_i^{x_i} + E_i^{x_i} = 1$, i.e. $E_i = \left(1 - H_i^{x_i}\right)^{1/x_i}$ (this is the parameterization fitted in the code below), and that the coordinates associated with the actual-stringency policy are found along this frontier.\n", "\n", "We set the shape parameter $x_i$ based on this latter assumption, and take $\hat{\alpha}_i$ as the weighting under which the actual-policy outcome maximizes social welfare along the estimated Pareto frontier:\n", "$$\n", "H_i^{\pi} = \arg\max_{H \in [0, 1]} \left[ \hat{\alpha}_i H + \left(1-\hat{\alpha}_i\right) \left(1 - H^{x_i}\right)^{1/x_i} \right],\n", "$$\n", "\n", "where the Economic Index $E_i^{\pi}$ and Health Index $H_i^{\pi}$ values are obtained from running the policies $\pi$ in the simulation.\n", "\n", "\n", "In other words, given the estimate of the Pareto frontier, we find the $\hat{\alpha}_i$ that best rationalizes the outcomes obtained under the actual policy, i.e. the $\hat{\alpha}_i$ under which these outcomes ($E_i^{\pi}$ and $H_i^{\pi}$) are considered optimal." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This function implements the process described above and adds some plotting so we can visualize things\n", "\n", "def estimate_alpha_and_plot_rew_examples(state_idx, do_plot=True, ax=None):\n", " \n", " act_h, act_e = index_results[\"actual\"][state_idx] # actual health index, actual economic index\n", " max_h, min_e = index_results[\"closed\"][state_idx] # max health index, min economic index\n", " min_h, max_e = index_results[\"open\"][state_idx] # min health index, max economic index\n", "\n", " norm_idx_pairs = []\n", " for index_dict in index_results.values():\n", " h_index, e_index = index_dict[state_idx]\n", " nh = (h_index-min_h)/(max_h - min_h)\n", " ne = (e_index-min_e)/(max_e - min_e)\n", " norm_idx_pairs.append([nh, ne])\n", " \n", " # Split out the normalized health / economic indices\n", " norm_idx_pairs = np.array(norm_idx_pairs)\n", " nhs = norm_idx_pairs[:, 0]\n", " nes = norm_idx_pairs[:, 1]\n", "\n", " # We assume the coordinates along the pareto curve satisfy:\n", " # h**pwr + e**pwr = 1, i.e. (h, e) = (h, (1 - h**pwr)**(1/pwr))\n", " # Fit the power term of the estimated pareto curve\n", " def loss_fn(pwr):\n", " nes_hat = (1-(nhs**pwr))**(1/pwr)\n", " return np.sum((nes_hat - nes)**2)\n", " res = minimize(\n", " fun=loss_fn,\n", " x0=2,\n", " bounds=[(1.001, None)],\n", " )\n", " pwr = float(res.x[0])\n", " \n", " # Given the fitted power, 
produce the estimated pareto curve (the set of hs/es coordinates)\n", " policies = np.linspace(0, 1, 1001)\n", " hs = policies**(1/pwr)\n", " es = (1-policies)**(1/pwr)\n", " \n", " # Find the alpha such that the optimal nh/ne coordinate is closest to the ACTUAL policy nh/ne coordinate\n", " nh = (act_h-min_h)/(max_h-min_h)\n", " ne = (act_e-min_e)/(max_e-min_e)\n", " alphas = np.linspace(0, 1, 1001)\n", " d_opt2actual = []\n", " # For each possible alpha ...\n", " for alpha in alphas:\n", " # ... find the optimal nh/ne coordinate for this alpha ...\n", " opt_nh_ne_index = np.argmax(alpha*hs + (1-alpha)*es)\n", " opt_nh = hs[opt_nh_ne_index]\n", " opt_ne = es[opt_nh_ne_index]\n", " # ... and store its distance to the ACTUAL h/e coordinate.\n", " d = np.sqrt(((nh-opt_nh)**2) + (ne-opt_ne)**2)\n", " d_opt2actual.append(d)\n", " # The inferred alpha is that where the distance measured above is lowest\n", " alpha = float(alphas[np.argmin(d_opt2actual)])\n", " \n", " if not do_plot:\n", " return alpha\n", " \n", " if ax is None:\n", " _, ax = plt.subplots(1, 1, figsize=(6, 6))\n", " \n", " # Plot a reward heatmap for the given alpha\n", " h_full, e_full = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101))\n", " r_full = alpha*h_full + (1-alpha)*e_full\n", " ax.imshow(r_full, aspect='auto', origin='lower')\n", " if ax.is_last_row():\n", " ax.set_xlabel('Normalized Health Index');\n", " if ax.is_first_col():\n", " ax.set_ylabel('Normalized Economic Index');\n", " \n", " # Add the observed h/e coordinates from actual policies\n", " for nh, ne in norm_idx_pairs:\n", " ax.plot(nh*100, ne*100, 'bo', markersize=12);\n", " \n", " # Add the estimated pareto boundary\n", " ax.plot(hs*100, es*100, 'w');\n", " \n", " # Mark the optimal point along the boundary for the given alpha\n", " rs = alpha*hs + (1-alpha)*es\n", " ax.plot(hs[rs.argmax()]*100, es[rs.argmax()]*100, 'o', markerfacecolor=\"None\", \n", " markersize=8, markeredgecolor='red', markeredgewidth=2);\n", " \n", 
" ax.set_title('{}; alpha={:4.2f}'.format(\n", " 'USA' if state_idx=='p' else uncalibrated_env.us_state_idx_to_state_name[str(state_idx)],\n", " alpha,\n", " ))\n", " \n", " ax.set_xticklabels(\n", " [\"{:3.2f}\".format(x/100).rstrip('0').rstrip('.') for x in ax.get_xticks()]\n", " )\n", " ax.set_yticklabels(\n", " [\"{:3.2f}\".format(y/100).rstrip('0').rstrip('.') for y in ax.get_yticks()]\n", " )\n", " \n", " return alpha" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Here are some visualizations to make things more intuitive...\n", "\n", "Blue dots are the normalized outcomes from the actual, fully-closed, and fully-open policies.\n", "\n", "The white line connecting these dots estimates the Pareto frontier.\n", "\n", "The red circle is the best outcome on the Pareto frontier for the estimated alpha.\n", "\n", "The estimated alpha is the one that places the red circle closest to the actual-policy blue dot.\n", "\n", "The heatmap background shows the reward at each coordinate (given $\hat{\alpha}_i$)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Single-State plot so things are easier to see.\n", "alpha = estimate_alpha_and_plot_rew_examples(state_idx=50);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Now, for the rest of the States! (With 51 \"states\" and a 10x5 grid, Wyoming -- plotted above -- is omitted here.)\n", "\n", "_, axes = plt.subplots(10, 5, figsize=(16, 35), sharex=True, sharey=True)\n", "\n", "for i, ax in enumerate(axes.flatten()):\n", " alpha = estimate_alpha_and_plot_rew_examples(state_idx=i, ax=ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Perform the actual calibration fit\n", "\n", "At the end of this step, we have finalized the fitted_params_dict and are ready to export the final calibration!" 
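, "\n", "As a quick sanity check of the $\hat{\alpha}_i$ estimation (a hand-worked example; it is not used by the calibration code): if a State's fitted frontier is symmetric about the diagonal and the normalized actual-policy outcome sits exactly at the symmetric point ($H = E$), the frontier's slope there is $-1$, so the welfare $\alpha H + (1-\alpha)E$ is maximized at that point only when $\alpha = 1 - \alpha$. The procedure should therefore recover $\hat{\alpha}_i = 0.5$ for such a State."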
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The fully-closed and fully-open policies give us coordinates for normalizing the indices\n", "fitted_params_dict.update(\n", " {\n", " \"MAX_MARGINAL_AGENT_ECONOMIC_INDEX\": [index_results['open'][i][1] for i in range(51)],\n", " \"MAX_MARGINAL_PLANNER_ECONOMIC_INDEX\": index_results['open']['p'][1],\n", " \"MAX_MARGINAL_AGENT_HEALTH_INDEX\": [index_results['closed'][i][0] for i in range(51)],\n", " \"MAX_MARGINAL_PLANNER_HEALTH_INDEX\": index_results['closed']['p'][0],\n", " \"MIN_MARGINAL_AGENT_ECONOMIC_INDEX\": [index_results['closed'][i][1] for i in range(51)],\n", " \"MIN_MARGINAL_PLANNER_ECONOMIC_INDEX\": index_results['closed']['p'][1],\n", " \"MIN_MARGINAL_AGENT_HEALTH_INDEX\": [index_results['open'][i][0] for i in range(51)],\n", " \"MIN_MARGINAL_PLANNER_HEALTH_INDEX\": index_results['open']['p'][0],\n", " }\n", ") \n", "\n", "# Apply the alpha-estimation procedure to fill in the rest of the fitted params dictionary\n", "fitted_params_dict.update(\n", " {\n", " \"INFERRED_WEIGHTAGE_ON_AGENT_HEALTH_INDEX\": [\n", " estimate_alpha_and_plot_rew_examples(state_idx=i, do_plot=False) for i in range(51)\n", " ],\n", " \"INFERRED_WEIGHTAGE_ON_PLANNER_HEALTH_INDEX\": estimate_alpha_and_plot_rew_examples(\n", " state_idx='p', do_plot=False\n", " ),\n", " }\n", ")\n", "\n", "# We have replaced the placeholders in the original fitted_params_file. All done -- time to re-save!\n", "with open(os.path.join(data_dir, fitted_params_filename), \"w\") as fp: \n", " json.dump(fitted_params_dict, fp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# \n", "# \n", "# Inspection\n", "\n", "Use this section to inspect the behavior of the calibrated simulator. 
As a starting point, we provide some code:\n", "- to re-build an environment instance (now that all the calibration fits have been exported),\n", "- to run that environment using the real-world policies,\n", "- and to compare real-world COVID-19 cases/deaths against what the calibrated simulation predicts given the real-world policies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run the actual policy in the simulation to see what kind of simulated outcomes we get and how they compare.\n", "\n", "# (this will now use the fully-calibrated settings)\n", "calibrated_env = ai_economist.foundation.make_env_instance(**env_config)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run sim, setting agent actions based on the real-world policy\n", "calibrated_env.reset();\n", "for _ in range(calibrated_env.episode_length):\n", " t_str = calibrated_env.current_date.strftime(DATE_FORMAT)\n", " actions = {\n", " str(idx): policy_df[state][t_str]\n", " for idx, state in calibrated_env.us_state_idx_to_state_name.items()\n", " }\n", " calibrated_env.step(actions);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Visualize active cases and deaths, for the real-world and for the simulation w/ real-world policies\n", "_, (ax0, ax1) = plt.subplots(1, 2, figsize=(16, 5))\n", "\n", "\n", "infected_df = dataframes[\"infected\"]\n", "deaths_df = dataframes[\"smoothed_deaths\"]\n", "\n", "ax0.plot(infected_df[\n", " fitted_params_dict['settings']['env']['start_date']:fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']\n", "].sum(1).values, label='Actual Data');\n", "ax0.plot(calibrated_env.world.global_state[\"Infected\"][:, :].sum(1), 'r', label='Simulated');\n", "ax0.set_title('Active COVID-19 Cases');\n", "ax0.set_xlabel('Days Since Start Date');\n", "ax0.grid(b=True, axis='y');\n", "ax0.legend();\n", "\n", 
"ax1.plot(deaths_df[\n", " fitted_params_dict['settings']['env']['start_date']:fitted_params_dict['settings']['LAST_DATE_IN_VAL_SET']\n", "].sum(1).values, label='Actual Data');\n", "ax1.plot(calibrated_env.world.global_state['Deaths'][:, :].sum(1), 'r', label='Simulated');\n", "ax1.set_title('Cumulative COVID-19 Deaths');\n", "ax1.set_xlabel('Days Since Start Date');\n", "ax1.grid(b=True, axis='y');\n", "ax1.legend();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }