ElastiCluster-ClusterJob Computing Model

Introduction

This is a supplementary page to the paper “Ambitious Data Science Can Be Painless” by Monajemi et al. Here you will learn how to conduct massive computational experiments in the cloud using an approach developed by Hatef Monajemi and Riccardo Murri during the first iteration of the Stats285 course at Stanford University in the Fall of 2017. You will use elasticluster to build your own personal cluster in the cloud and then use clusterjob (CJ) to run jobs on this cluster. Both of these steps are straightforward and painless.
This document contains the details of setting up your (GPU-accelerated) cluster and testing that it works properly with CJ. Once these steps are completed, you can conduct your own massive data science experiments in the cloud painlessly.

Questions and Comments

Please send your questions to CJ’s Google group. For other inquiries, please send an email to hatefmonajemi@gmail.com or riccardo.murri@gmail.com.

Building your cluster in the cloud

To create your own cluster in the cloud, you should take the following 4 steps:

Install ClusterJob
Set up your cloud account
Create your cluster using the 0-install ElastiCluster script
Test that your cluster works with CJ

Part-1: Install ClusterJob

Sign up for ClusterJob on clusterjob.org
Follow CJ Installation Guide to install it on your machine.

Part-2: Set Up Your Cloud Account

Part-3: Create Your Cluster Using Elasticluster

Get the elasticluster 0-install script from GitHub, or download it from this website:

curl -O https://raw.githubusercontent.com/gc3-uzh-ch/elasticluster/master/elasticluster.sh
chmod +x elasticluster.sh

Provide your desired configuration:
```
 ./elasticluster.sh list-templates
 vim ~/.elasticluster/config
```
You may use the default Elasticluster config template or elasticluster_config that is provided with this companion page. You can get elasticluster_config via the following commands:
```
 curl -O https://monajemi.github.io/datascience/assets/files/elasticluster_config
 cp elasticluster_config ~/.elasticluster/config
```
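Note that the cp command above overwrites any configuration already stored at ~/.elasticluster/config. If you want to keep the old file, a minimal sketch of a backup-then-copy step could look like the following. The function name install_config and the backup naming scheme are hypothetical and are not part of ElastiCluster.

```shell
# Hypothetical helper (not part of ElastiCluster): install a new config file,
# keeping a timestamped backup of any existing one.
install_config() {
    src="$1"                                   # new config file to install
    dest="${2:-$HOME/.elasticluster/config}"   # target path (default taken from the text above)
    mkdir -p "$(dirname "$dest")"
    if [ -f "$dest" ]; then
        # Keep the previous config next to the new one, e.g. config.bak.20240101120000
        cp "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)"
    fi
    cp "$src" "$dest"
}

# Usage:
#   install_config elasticluster_config
```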
You must change the contents of the elasticluster config file ~/.elasticluster/config to reflect your own credentials and choice of resources. As an example, on Google Cloud, you should retrieve your project_id, client_id, and client_secret by visiting the Credential Page, and then update ~/.elasticluster/config with these credentials.

See config example on GitHub

gcloud provides useful commands to see the available options. For example,
gcloud compute machine-types list --zones us-west1-a
lists all the machine types that are available in zone us-west1-a.
This information can also be found online on Google.
Similarly, gcloud compute images list lists all the available images.

Advanced Tips

Spin up your cluster
```
./elasticluster.sh -vvvv start gce
```
Note that the cluster named gce is fully defined in your ~/.elasticluster/config. For convenience, you may add an alias to your ~/.bashrc or ~/.bash_profile:
alias elasticluster='/PATH/TO/SCRIPT/elasticluster.sh'
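
If you set up several machines, or rerun your setup notes, it is easy to end up with the same alias line repeated in your shell startup file. As a minimal sketch, the hypothetical helper below (append_once is an illustrative name, not an existing tool) appends a line to a file only when that exact line is not already present.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
append_once() {
    line="$1"   # exact line to add
    file="$2"   # file to add it to, e.g. ~/.bashrc
    touch "$file"
    # -x: match whole lines, -F: treat the line as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (replace /PATH/TO/SCRIPT with the real location of the script):
#   append_once "alias elasticluster='/PATH/TO/SCRIPT/elasticluster.sh'" ~/.bashrc
```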

If you run into an error and are asked to run the setup again, do so using:
```
 elasticluster -vvvv setup gce
```
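
If the setup step needs more than one rerun, a small retry loop can save some typing. The sketch below is only an illustration: retry and RETRY_DELAY are hypothetical names, and the number of attempts is up to you. Because shell aliases are not expanded when a command is passed as an argument, the usage line calls the script by its path rather than through the alias.

```shell
# Hypothetical helper: run a command up to MAX times, stopping at the first success.
# RETRY_DELAY (seconds between attempts) defaults to 10.
retry() {
    max="$1"; shift
    attempt=1
    while :; do
        "$@" && return 0                       # success: stop retrying
        echo "attempt $attempt of $max failed: $*" >&2
        [ "$attempt" -ge "$max" ] && return 1  # out of attempts: give up
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-10}"
    done
}

# Usage:
#   retry 3 ./elasticluster.sh -vvvv setup gce
```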

You can also monitor the progress in your cloud console (e.g., Google Cloud Console or EC2 Console).

If everything goes well, you will see that your cluster is ready! This is the moment you should shout Yay! and congratulate yourself. You now have your own cluster!
Get the IP address of the frontend node using:
```
  elasticluster list-nodes gce
```
example: 35.199.171.137
Log in to your cluster to test it:
```
  ssh <USERNAME>@<FRONTEND_IP>
```
example: ssh hatefmonajemi@35.199.171.137 (for GCP, your gmail ID is your username)
To destroy your cluster:
```
  elasticluster -vvvv stop gce
```
Note that this command will destroy your cluster and you will lose all the data on it. Make sure you copy your data to a safe storage place before you destroy your cluster.
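
One way to make this harder to forget is to wrap the copy and the stop in a single step, so that the cluster is stopped only if the copy succeeded. The following is a minimal sketch under stated assumptions: backup_and_stop and run_or_print are hypothetical names, rsync is assumed to be installed on both ends, and the remote path is whatever directory holds your results. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: copy a remote directory to local storage, then stop the cluster.
# The stop command runs only if the copy succeeded.
backup_and_stop() {
    remote="$1"      # e.g. <USERNAME>@<FRONTEND_IP>:results/
    local_dir="$2"   # local destination directory
    cluster="$3"     # cluster name from ~/.elasticluster/config, e.g. gce
    run_or_print mkdir -p "$local_dir" &&
    run_or_print rsync -az "$remote" "$local_dir" &&
    run_or_print ./elasticluster.sh -vvvv stop "$cluster"
}

# Usage (preview first, then run for real without DRY_RUN):
#   DRY_RUN=1 backup_and_stop <USERNAME>@<FRONTEND_IP>:results/ ./cluster_backup gce
```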

Part-4: Test your cluster works with CJ

After you have launched your cluster successfully, it is time to test it by running a sample job on it using ClusterJob. Follow the instructions on Test Your Cluster With CJ.