This is how I started JupyterLab using SageMaker, then launched a FargateCluster using dask-cloudprovider, following Jacob Tomlinson's excellent blog post.
First I created a SageMaker notebook instance using the AWS Console. The default is an ml.t2.medium instance with a 5GB EBS volume, but that wasn't enough memory or disk for me to create a custom conda environment with xarray, hvplot, etc., so I chose an ml.t3.large instance with 40GB of storage.
Under SageMaker => Notebook => Git Repositories, I added this sagemaker-fargate-test repo so that my sample notebooks would be available whenever I started JupyterLab on the instance.
I then fired up JupyterLab on the SageMaker instance, opened a terminal, and typed:

```bash
conda activate base
conda update conda -y
conda env create -f ~/SageMaker/sagemaker-fargate-test/pangeo_env.yml
```

to update conda and create my custom pangeo environment.
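I won't reproduce the repo's `pangeo_env.yml` here, but a minimal sketch of that kind of environment file looks like this (the package list is illustrative, not the repo's exact set):

```yaml
# Sketch of a pangeo-style conda environment file; the repo copy is authoritative
name: pangeo
channels:
  - conda-forge
dependencies:
  - python=3.7
  - xarray
  - hvplot
  - dask
  - distributed
  - dask-cloudprovider
  - ipykernel   # so the env shows up as a JupyterLab kernel
```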
I then ran `aws configure` and entered my AWS keys. This creates the `~/.aws` directory with credentials, which I copied to the persisted `~/SageMaker` directory. This was a hacky way of giving my SageMaker notebook instance the credentials it needs to create the FargateCluster.
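In terminal commands, that credential step looks like this (the copy destination is the persisted volume described above):

```bash
aws configure                  # prompts for access key, secret key, and default region
cp -r ~/.aws ~/SageMaker/.aws  # ~/SageMaker survives instance stop/start; $HOME does not
```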
I then created a SageMaker "Lifecycle configuration" script, which runs when the SageMaker notebook instance starts. This script just copies the `.condarc` file and the `.aws` credentials directory from persisted space back to the `$HOME` directory. This is the `lifecycle_start_notebook.sh` script in this repo.
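A minimal sketch of what such a start script does (the `lifecycle_start_notebook.sh` in the repo is the authoritative version; paths assume the default `ec2-user` home, and the `chown` is there because lifecycle scripts run as root):

```bash
#!/bin/bash
# Restore per-user config from the persisted ~/SageMaker volume into $HOME,
# which is recreated on every notebook instance restart.
set -e
cp /home/ec2-user/SageMaker/.condarc /home/ec2-user/.condarc
cp -r /home/ec2-user/SageMaker/.aws /home/ec2-user/.aws
chown -R ec2-user:ec2-user /home/ec2-user/.condarc /home/ec2-user/.aws
```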
The last remaining step was to create a Dask worker container for the FargateCluster to run. To build this container, I just added some packages to the daskdev/dask container's Dockerfile.
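The Dockerfile change amounts to extending the stock image with the extra conda packages; a hedged sketch (the package list here is illustrative, not the repo's exact set):

```dockerfile
FROM daskdev/dask:latest
# Add the packages the workers need on top of the stock Dask image
RUN conda install -y -c conda-forge xarray zarr s3fs && conda clean -afy
```

With that image built and pushed to a registry, launching the cluster from the notebook looks roughly like the sketch below (the image name is hypothetical, and in older dask-cloudprovider releases the import was `from dask_cloudprovider import FargateCluster`):

```python
from dask.distributed import Client
from dask_cloudprovider.aws import FargateCluster

# Fargate launches one task per Dask worker from the custom image
cluster = FargateCluster(
    image="myregistry/pangeo-dask-worker:latest",  # hypothetical image name
    n_workers=10,
)
client = Client(cluster)
```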
The sample Hurricane Ike Notebook then ran successfully. Here's a snapshot of the Dask dashboard: