ElastiCluster-ClusterJob Computing Model
Introduction
This is a supplementary page to the paper “Ambitious Data Science Can Be Painless” by Monajemi et al. Here you will learn how to conduct massive computational experiments in the cloud using an approach developed by Hatef Monajemi and Riccardo Murri during the first iteration of the Stats285 course at Stanford University in Fall 2017. You will use ElastiCluster to build your own personal cluster in the cloud and then use ClusterJob (CJ) to run jobs on this cluster. Both steps are straightforward and painless.
This document contains the details of setting up your (GPU-accelerated) cluster and testing that it works properly with CJ. Once these steps are completed, you can enjoy conducting your own massive data science experiments in the cloud painlessly.
Questions and Comments
Please send your questions to CJ’s Google group. For other inquiries, please send an email to hatefmonajemi@gmail.com or riccardo.murri@gmail.com.
Building your cluster in the cloud
To create your own cluster in the cloud, take the following four steps:
- Install ClusterJob
- Set up your cloud account
- Create your cluster using the 0-install ElastiCluster script
- Test that your cluster works with CJ
Part-1: Install ClusterJob
- Sign up for ClusterJob on clusterjob.org
- Follow the CJ Installation Guide to install it on your machine.
Part-2: Set Up Your Cloud Account
- Setup instructions for Google Compute Engine
- Setup instructions for Amazon EC2
- Setup instructions for Microsoft Azure
Part-3: Create Your Cluster Using Elasticluster
- Get the ElastiCluster 0-install script from GitHub or download it from this website:
curl -O https://raw.githubusercontent.com/gc3-uzh-ch/elasticluster/master/elasticluster.sh
chmod +x elasticluster.sh
- Provide your desired configuration
./elasticluster.sh list-templates
vim ~/.elasticluster/config
You may use the default ElastiCluster config template or the elasticluster_config file that is provided with this companion page. You can get elasticluster_config via the following commands:
curl -O https://monajemi.github.io/datascience/assets/files/elasticluster_config
cp elasticluster_config ~/.elasticluster/config
You must change the contents of the ElastiCluster config file ~/.elasticluster/config to reflect your own credentials and choice of resources. As an example, on Google Cloud, you should retrieve your project_id, client_id, and client_secret by visiting the Credentials page and update ~/.elasticluster/config with these credentials. gcloud provides useful commands to see the available options, for example:
gcloud compute machine-types list --zones us-west1-a
lists all the machine types that are available in zone us-west1-a. This information is also available online from Google. Similarly,
gcloud compute images list
lists all the available images.
- Spin up your cluster
./elasticluster.sh -vvvv start gce
Note that the cluster named gce is fully defined in your ~/.elasticluster/config. For convenience, you may add an alias to your ~/.bashrc or ~/.bash_profile:
alias elasticluster='/PATH/TO/SCRIPT/elasticluster.sh'
If you run into an error and are asked to run the setup again, do so using:
elasticluster -vvvv setup gce
- You can also monitor the progress in your cloud console (e.g., Google Cloud Console, EC2 Console).
If everything goes well, you will see
your cluster is ready!
This is the moment you should shout Yay! and congratulate yourself. You now have your own cluster!
- Get the IP address of the frontend node using:
elasticluster list-nodes gce
Example:
35.199.171.137
- Log in to your cluster to test it:
ssh <USERNAME>@<FRONTEND_IP>
Example (for GCP, your Gmail ID is your username):
ssh hatefmonajemi@35.199.171.137
- To destroy your cluster:
elasticluster -vvvv stop gce
Note that this command destroys your cluster, and you lose all the data on it. Make sure you copy your data to a safe storage location before you destroy your cluster.
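The Part-3 steps can also be collected into a small file of helper functions. The following is a minimal sketch, not part of ElastiCluster or ClusterJob, and it rests on several assumptions: the file (hypothetically named cluster_helpers.sh) is sourced from the directory containing elasticluster.sh, the cluster is named gce, and the Google credentials are stored under ElastiCluster's gce_project_id, gce_client_id and gce_client_secret keys. The function names check_config, start_cluster and backup_and_stop are hypothetical. The sketch checks the config file before launch, re-runs setup when start fails (as recommended above), and copies a results directory off the frontend with scp before destroying the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Part-3 workflow (not an official tool).
# Source this file, e.g.:  . ./cluster_helpers.sh
# Assumption: elasticluster.sh sits in the current directory; override
# ELASTICLUSTER with the script's full path otherwise.
ELASTICLUSTER="${ELASTICLUSTER:-./elasticluster.sh}"

# Return 0 if CONFIG defines a [cluster/NAME] section and each assumed
# Google credential key has a non-empty value; report the first problem.
check_config() {
  local cfg="$1" name="$2" key
  grep -q "^\[cluster/${name}\]" "$cfg" \
    || { echo "missing [cluster/${name}] in ${cfg}" >&2; return 1; }
  for key in gce_project_id gce_client_id gce_client_secret; do
    grep -Eq "^${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$cfg" \
      || { echo "missing value for ${key}" >&2; return 1; }
  done
}

# Start cluster NAME; if start fails, re-run "setup" up to RETRIES times
# (default 2). Returns 0 as soon as one attempt succeeds.
start_cluster() {
  local name="$1" retries="${2:-2}" i=1
  "$ELASTICLUSTER" -vvvv start "$name" && return 0
  while [ "$i" -le "$retries" ]; do
    echo "start failed; re-running setup (attempt $i of $retries)" >&2
    "$ELASTICLUSTER" -vvvv setup "$name" && return 0
    i=$((i + 1))
  done
  return 1
}

# Copy REMOTE_DIR from the frontend into LOCAL_DIR, then stop (destroy)
# cluster NAME, but only if the copy succeeded.
backup_and_stop() {
  local name="$1" user="$2" ip="$3" remote_dir="$4" local_dir="$5"
  mkdir -p "$local_dir" || return 1
  scp -r "${user}@${ip}:${remote_dir}" "$local_dir" \
    || { echo "backup failed; cluster left running" >&2; return 1; }
  "$ELASTICLUSTER" -vvvv stop "$name"
}
```

For example, after sourcing the file you could run check_config ~/.elasticluster/config gce && start_cluster gce, and before tearing down run backup_and_stop gce hatefmonajemi 35.199.171.137 results ./results-backup, where results is a hypothetical directory on the frontend. Because stop is only called after scp succeeds, a failed transfer leaves the cluster (and its data) in place.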
Part-4: Test That Your Cluster Works With CJ
After you have launched your cluster successfully, it is time to test it by running a sample job on it using ClusterJob. Follow the instructions on the Test Your Cluster With CJ page.