ElastiCluster-ClusterJob Computing Model
Introduction
This is a supplementary page to the paper “Ambitious Data Science Can Be Painless” by Monajemi et al. Here you will learn how to conduct massive computational experiments in the cloud using an approach developed by Hatef Monajemi and Riccardo Murri during the first iteration of the Stats285 course at Stanford University in the Fall of 2017. You will use elasticluster to build your own personal cluster in the cloud and then use clusterjob (CJ) to run jobs on this cluster. Both of these steps are straightforward and painless.
This document contains the details of setting up your (GPU-accelerated) cluster and testing that it works properly with CJ. Once these steps are completed, you can conduct your own massive data science experiments in the cloud painlessly.
Questions and Comments
Please send your questions to CJ’s Google group. For other inquiries, please send an email to hatefmonajemi@gmail.com or riccardo.murri@gmail.com.
Building your cluster in the cloud
To create your own cluster in the cloud, you should take the following 4 steps:
- Install ClusterJob
- Set up your cloud account
- Create your cluster using the 0-install ElastiCluster script
- Test that your cluster works with CJ
Part-1: Install ClusterJob
- Sign up for ClusterJob on clusterjob.org
- Follow the CJ Installation Guide to install it on your machine.
Part-2: Set Up Your Cloud Account
- Setup instructions for Google Compute Engine
- Setup instructions for Amazon EC2
- Setup instructions for Microsoft Azure
Part-3: Create Your Cluster Using Elasticluster
- Get the elasticluster 0-install script from GitHub or download it from this website:
curl -O https://raw.githubusercontent.com/gc3-uzh-ch/elasticluster/master/elasticluster.sh
chmod +x elasticluster.sh
- Provide your desired configuration
./elasticluster.sh list-templates
vim ~/.elasticluster/config
You may use the default Elasticluster config template or the elasticluster_config file that is provided with this companion page. You can get elasticluster_config via the following commands:
curl -O https://monajemi.github.io/datascience/assets/files/elasticluster_config
cp elasticluster_config ~/.elasticluster/config
You must change the contents of the elasticluster config file ~/.elasticluster/config to reflect your own credentials and choice of resources. As an example, on Google Cloud, you should retrieve your project_id, client_id, and client_secret by visiting the Credential Page and update ~/.elasticluster/config with these credentials. gcloud provides useful commands to see the available options. For example,
gcloud compute machine-types list --zones us-west1-a
lists all the machine types that are available in zone us-west1-a. This information can also be found online on Google. Similarly,
gcloud compute images list
lists all the available images.
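Before spinning up the cluster, it can help to confirm that the credentials were actually filled in. The following is a minimal sketch and not part of elasticluster: check_config_keys is a hypothetical helper that reports which keys are missing or empty in an INI-style file. The gce_-prefixed key names in the usage comment are an assumption; use whatever names appear in your own config template.

```shell
#!/bin/sh
# Hypothetical helper: report keys that are missing or empty in an INI-style file.
# Usage: check_config_keys FILE KEY [KEY...]
# Returns 0 if every key has a non-empty value, 1 otherwise.
check_config_keys() {
    local file="$1"
    shift
    local key value status=0
    for key in "$@"; do
        # Take the first "key = value" line and keep only the value,
        # with surrounding whitespace stripped.
        value="$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
            | head -n 1 | sed 's/[[:space:]]*$//')"
        if [ -z "$value" ]; then
            echo "missing or empty: ${key}"
            status=1
        fi
    done
    return "$status"
}

# Usage (not executed here; key names are assumed, adjust to your template):
#   check_config_keys ~/.elasticluster/config gce_project_id gce_client_id gce_client_secret
```

The helper only checks that a value is present. It does not validate the credentials themselves, so a typo in a key or secret will still surface only when the cluster is started.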
- Spin up your cluster
./elasticluster.sh -vvvv start gce
Note that the cluster named gce is fully defined in your ~/.elasticluster/config. For convenience, you may add an alias to your ~/.bashrc or ~/.bash_profile:
alias elasticluster='/PATH/TO/SCRIPT/./elasticluster.sh'
If you run into an error and are asked to run the setup again, do so using
elasticluster -vvvv setup gce
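The start-then-setup sequence above can be sketched as a small wrapper. This is an illustration under stated assumptions, not an elasticluster feature: start_cluster and the EC_CMD variable are hypothetical names, and the retry count of 3 is arbitrary. The sketch assumes that the command exits with a non-zero status when start or setup fails.

```shell
#!/bin/sh
# EC_CMD is a hypothetical variable naming the elasticluster executable
# (for example the alias target or ./elasticluster.sh).
EC_CMD="${EC_CMD:-elasticluster}"

# Start a cluster; if "start" fails, re-run "setup" up to max_tries times.
# Usage: start_cluster CLUSTER_NAME [MAX_TRIES]
start_cluster() {
    local cluster="$1" max_tries="${2:-3}" attempt

    if "$EC_CMD" -vvvv start "$cluster"; then
        return 0
    fi
    for attempt in $(seq 1 "$max_tries"); do
        echo "start failed; running setup (attempt ${attempt}/${max_tries})" >&2
        if "$EC_CMD" -vvvv setup "$cluster"; then
            return 0
        fi
    done
    # Every setup attempt failed; leave the decision to the caller.
    return 1
}

# Usage (not executed here):
#   start_cluster gce 3
```

If all attempts fail, the function returns a non-zero status and leaves the cluster as it is, so you can inspect the verbose output before deciding what to do next.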
- You can also monitor the progress at your cloud console (e.g., Google Cloud Console, EC2 Console).
If everything goes well, you will see "your cluster is ready!". This is the moment you should shout Yay! and congratulate yourself. You now have your own cluster!
- Get the IP address of the frontend node using:
elasticluster list-nodes gce
example:
35.199.171.137
- Log in to your cluster to test it
ssh <USERNAME>@<FRONTEND_IP>
example:
ssh hatefmonajemi@35.199.171.137
(For GCP, your Gmail ID is your username.)
- To destroy your cluster:
elasticluster -vvvv stop gce
Note that this command will destroy your cluster, and you will lose all the data on it. Make sure you move your data to safe storage before you destroy your cluster.
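One way to avoid losing data is to tie the two actions together, so that the cluster is stopped only after a copy has succeeded. The following is a minimal sketch under stated assumptions: backup_then_stop, EC_CMD, and COPY_CMD are hypothetical names, scp is just one possible copy tool, and the remote and local paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical variables naming the commands to run.
EC_CMD="${EC_CMD:-elasticluster}"
COPY_CMD="${COPY_CMD:-scp}"

# Copy data off the frontend node, then stop the cluster only if the copy succeeded.
# Usage: backup_then_stop CLUSTER_NAME REMOTE_SOURCE LOCAL_DIR
backup_then_stop() {
    local cluster="$1" remote="$2" local_dir="$3"

    mkdir -p "$local_dir" || return 1
    if ! "$COPY_CMD" -r "$remote" "$local_dir"; then
        echo "backup failed; cluster '${cluster}' left running" >&2
        return 1
    fi
    # Reached only when the copy command reported success.
    "$EC_CMD" -vvvv stop "$cluster"
}

# Usage (not executed here; the paths are placeholders):
#   backup_then_stop gce '<USERNAME>@<FRONTEND_IP>:results' ./cluster-backup
```

A successful exit status from the copy command does not guarantee that you copied everything you need, so it is still worth checking the local directory before relying on this.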
Part-4: Test that your cluster works with CJ
After you have launched your cluster successfully, it is time to test it by running a sample job on it using ClusterJob. Follow the instructions on Test Your Cluster With CJ.