Test your cluster with CJ
- add your cluster info to
~/CJ_install/ssh_config
. You may use the followng CJ commands to configure or update your cluster.
cj config <CLUSTER_NAME>
cj config <CLUSTER_NAME> --update
cj config-update <CLUSTER_NAME>
Here is an example:
[gce]
Host 35.199.171.137
User hatefmonajemi
Bqs SLURM
Repo /home/hatefmonajemi/CJRepo_Remote
MAT ""
MATlib ""
Python python3.4
Pythonlib pytorch:torchvision:cuda80:pandas:matplotlib:-c soumith
R ""
Rlib ""
[gce]
note that
Host
is the IP address of your frontend node (e.g.,35.199.171.137
)
- go to
~/CJ_install/example/Python
and runsimpleExample
on your cluster:
# update cj
cj update
# test CJ run
cj run simpleExample.py gce -m "Python on CPU test"
cj state
cj ls
- go to
~/CJ_install/example/Python/pytorch/mnist
and runmnist.py
on your cluster using GPU:
cj run mnist.py gce -alloc "--gres=gpu:1" -m "Pytorch on GPU test"
# see state of your job
cj state
# see the run log
cj runlog
# get a summary of all jobs on your cluster
cj summary gce
If everything makes sense, move on to running your own data science experiments.