Wiki of DrStrange: Group GPU Accelerated Computing Server

Welcome to the wiki for DrStrange, the GPU-accelerated computing server in the School of Computer Science and Engineering at the University of New South Wales. This page covers the basics of using the machine; more will be added over time.

Contents
  1. Overview
  2. Getting Started
    1. Accessing the Server
      1. Generating SSH Key Pair
      2. Login
    2. Starting Computing
      1. GNU Screen
      2. Checking GPU Status
      3. Masking GPUs
      4. TensorFlow: Allowing GPU Memory Growth
      5. Installing Python Packages in User's Home Directory with pip

Overview

DrStrange is a midrange system (minicomputer) customized and purchased to run large computing tasks for our group, specifically floating-point-intensive deep learning tasks. It currently consists of two 12-core/24-thread Intel(R) Xeon(R) E5-2697 v2 CPUs, six NVIDIA TITAN X Pascal GPUs, a total of 768 GiB of memory, and an Intel P3608 SSD (1.6 TB, NVMe PCIe 3.0 x8, HET MLC) together with seven Seagate Enterprise Capacity 2.5 HDDs (2 TB, SATA 6 Gb/s, 7200 rpm, 128 MB). DrStrange runs standard CentOS 7.

Globally available software:
GNU GCC 4.8.5
Python 2.7/3.6 with NumPy 1.12.1, pandas 0.20.1, tensorflow-gpu 1.0.1, scikit-tensor 0.1, SciPy 0.19.0
openjdk 1.8.0_131

Getting Started

Accessing the Server

DrStrange runs Linux and is accessed over SSH, so an SSH client is required. Unix-like systems such as macOS (Mac OS X), OpenBSD and most Linux distributions usually ship with the OpenSSH client out of the box. On Microsoft Windows, however, you need to install a client yourself. Any SSH client will do; this wiki uses PuTTY for demonstration.

Note: Password authentication for SSH is disabled for security reasons. You must send your SSH public key to one of the administrators when your account is created, when you change your private key, or when you want to log in from a new host. (Copying your private key file to the new host also works, but is not recommended.)

Generating SSH Key Pair

If you don't have an SSH key pair yet, please follow the instructions below to generate one. Otherwise, you can jump to the Login section.

The Unix-like systems mentioned above usually have the OpenSSH client built in, and it normally ships with a key generator whose binary executable is typically called `ssh-keygen`. A simple example of its usage:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/USERNAME/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/USERNAME/.ssh/id_rsa.
Your public key has been saved in /home/USERNAME/.ssh/id_rsa.pub.
The key fingerprint is:
00:50:7f:29:e2:2c:37:af:b2:8f:cf:e4:50:2f:c8:82 USERNAME@YOURCOMPUTER
The key's randomart image is:
+--[ RSA 2048]----+
|  .oo            |
|     o   .       |
|    . + o        |
|   o . +         |
|  . *   S        |
|.. = +           |
|E + o o          |
| ..* o           |
|  o=*            |
+-----------------+
                        

Tutorial for "KEY GENERATOR FOR PUTTY ON WINDOWS" can be found here.

Login

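This section is for those who are unfamiliar with, or have little experience of, Linux systems. Experienced users can skip ahead to Starting Computing. To log in, run the following in a terminal, replacing USERNAME with your account name: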

$ ssh USERNAME@drstrange.cse.unsw.edu.au
                        

As the server authenticates by SSH key, you should now be logged in and see the message below:

		You are accesing a computer owned by:

	            Service Oriented Computing Group,
		    School of Computer Science and Engineering,
		    UNSW Sydney

	    ***** This service is for authorised clients only *****

 ****************************************************************************
 *                                                                          *
 * WARNING:     It is a criminal offence to:                                *
 *                                                                          *
 *              i.  Obtain access to data without permission                *
 *                        (Penalty 2 years imprisonment)                    *
 *              ii. Damage, delete, alter or insert data without permission *
 *                        (Penalty 10 years imprisonment)                   *
 *                                                                          *
 ****************************************************************************
Last login: Tue Jul  4 15:56:25 2017 from WHO.KNOW.THIS.COMPUTER
                        

Tutorial for "How To Use SSH Keys with PuTTY (Windows users)" can be found here.

Starting Computing

As the server is accessed over the network, and connections from outside the CSE building can be unstable, we first introduce GNU Screen.

GNU Screen

Screen is a full-screen window manager that multiplexes a physical terminal between several processes, typically interactive shells. Using Screen can therefore help your remote processes survive a network disruption. Screen is installed on DrStrange, and you can start it with

$ screen
A new shell occupying the whole terminal window should appear. Don't panic: this is a new shell instance started by Screen, and your earlier jobs have not been killed and are still running. In this Screen shell you can work as usual.
The magic here is that you can detach a Screen session and reattach it later. By default, a session is detached with the key combination `Ctrl-a` followed by `d`.
To list your Screen sessions, run
$ screen -ls
and it will give you output similar to this:
$ screen -ls
There is a screen on:
	11123.pts-1.drstrange	(Detached)
1 Socket in /var/run/screen/S-USERNAME.
                        
In this example 11123 is the PID (process ID) of the detached Screen session, which can be reattached by
$ screen -r 11123
The session with PID 11123 will then be reattached and displayed at the full terminal window size.

Checking GPU Status

It is generally recommended to check GPU status before running a GPU-accelerated program, so that you avoid interrupting jobs owned by someone else. You also may not want to start your job on a busy GPU. GPU status can typically be checked with the `nvidia-smi` command. On our machine, however, a more user-friendly Python script, `gpustat`, is installed and its use is encouraged.
You can check the usage by

$ gpustat -cpu
which shows, for each GPU, its number, name, temperature, utilization rate, memory usage, users and process names, in different colors.
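
For scripted checks, similar information can be read from `nvidia-smi` directly. The following is only a minimal sketch and not part of the DrStrange setup: the helper names `parse_gpu_csv`, `idle_gpus` and `query_idle_gpus`, and the thresholds of 500 MiB and 10% utilization, are assumptions chosen for illustration.

```python
import subprocess

# `nvidia-smi --query-gpu` prints one CSV row per GPU for the requested fields.
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse rows like '0, 11519, 12189, 97' into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total, util = [int(field) for field in line.split(",")]
        gpus.append({"index": index, "mem_used": used,
                     "mem_total": total, "util": util})
    return gpus


def idle_gpus(gpus, max_mem_mib=500, max_util=10):
    """Return indices of GPUs below both (hypothetical) thresholds."""
    return [g["index"] for g in gpus
            if g["mem_used"] <= max_mem_mib and g["util"] <= max_util]


def query_idle_gpus():
    """Run nvidia-smi on the current machine and return the idle GPU indices."""
    output = subprocess.check_output(QUERY).decode()
    return idle_gpus(parse_gpu_csv(output))
```

Calling `query_idle_gpus()` on the server returns a list of GPU indices that look free under these thresholds. A GPU that looks idle may still be reserved by someone, so treat the result as a hint rather than a guarantee.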

Masking GPUs

Some GPU programs are designed to use multiple GPUs, which can greatly reduce the time spent training large models. However, this often prevents other users from running their experiments, especially with TensorFlow-based programs, which by default allocate all available GPU memory at start-up. Our machine has six NVIDIA TITAN X Pascal GPUs, which are quite powerful, so using all six at once is often overkill. To avoid this situation, a mask can be applied with `CUDA_VISIBLE_DEVICES`.
`CUDA_VISIBLE_DEVICES` is an environment variable, which can simply be put in front of a Linux command at each run. The syntax is:

CUDA_VISIBLE_DEVICES=1      # Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1    # Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"  # Same as above; quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3  # Devices 0, 2 and 3 will be visible; devices 1, 4 and 5 are masked
                             
For example, if you have a Python script called `eg.py` and want to run it only on GPU device 0, this can be done via:
$ CUDA_VISIBLE_DEVICES=0 python eg.py
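
The mask can also be set from inside a Python script, as long as this happens before the CUDA runtime is initialised; doing it before importing TensorFlow is the safe choice. The sketch below assumes a hypothetical helper named `mask_gpus`, which is not part of any library.

```python
import os


def mask_gpus(device_ids):
    """Expose only the given GPU indices to CUDA code run from this process."""
    mask = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = mask
    return mask


# Call this before the CUDA-using library (e.g. TensorFlow) is imported.
mask_gpus([0, 2])
# Inside the process the visible devices are renumbered from 0,
# so physical GPU 2 appears as device 1.
```

Setting the variable on the command line, as shown above, remains the simpler option; the in-script form is mainly useful when the choice of GPU is made by the program itself.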

TensorFlow: Allowing GPU Memory Growth

Since TensorFlow is very popular in our group and many users have not yet configured it properly, it is worth describing another handy way to use GPU resources more efficiently: letting the process grow its GPU memory usage only as needed, rather than claiming everything up front. This is done by setting the `allow_growth` option to `True` when creating the session.

import tensorflow as tf

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of all at once.
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
                        
In addition, the `per_process_gpu_memory_fraction` option, which caps the memory a process may use, is also quite handy, especially together with `CUDA_VISIBLE_DEVICES`. An example that allocates only 40% of the total memory of each visible GPU:
import tensorflow as tf

config = tf.ConfigProto()
# Cap this process at 40% of each visible GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config)
                        
More details are available here. You may also find Pete Warden's blog interesting.
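
When you think in terms of a memory budget rather than a percentage, the value for `per_process_gpu_memory_fraction` can be derived with simple arithmetic. The sketch below uses a hypothetical helper `memory_fraction` and assumes a nominal 12 GiB per card; check the actual total reported by `nvidia-smi` before relying on it.

```python
def memory_fraction(budget_mib, total_mib):
    """Fraction of one GPU's memory that corresponds to a budget in MiB."""
    if not 0 < budget_mib <= total_mib:
        raise ValueError("budget must be positive and no larger than total")
    return float(budget_mib) / total_mib


# Assumed figure: a 12 GiB card, i.e. 12 * 1024 MiB.
TOTAL_MIB = 12 * 1024
BUDGET_MIB = 4096

fraction = memory_fraction(BUDGET_MIB, TOTAL_MIB)  # one third of the card
jobs_per_gpu = TOTAL_MIB // BUDGET_MIB             # jobs of this size per GPU

# The result would then be assigned in the session config, e.g.
# config.gpu_options.per_process_gpu_memory_fraction = fraction
```

In practice some memory is taken by the CUDA context itself, so leaving a margin below the computed fraction is a reasonable precaution.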

Installing Python Packages in User's Home Directory with pip

As users on DrStrange normally have no privileged (root) access, installing Python packages globally yourself is not possible. Requests or suggestions for installing commonly useful packages are welcome and can be sent to the administrators or to me (Chaoran). It is often handier, though, to install packages under your own home directory, which requires no special privileges and does not affect other users.
This is easily done by adding the `--user` option to a normal pip3 (or pip2) command. For example:

$ pip3 install --user [PACKAGE_NAME]
where `[PACKAGE_NAME]` is the one you want to install.
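
To see where such packages end up, Python's standard `site` module can report the per-user directory. This is a small sketch using only the standard library; the variable names are illustrative, and the path printed depends on your account and Python version.

```python
import site
import sys

# Directory where `pip install --user` places packages for this interpreter.
user_site = site.getusersitepackages()

# True when this directory is already on the import path.
on_path = user_site in sys.path

print("User site-packages: {}".format(user_site))
print("On sys.path: {}".format(on_path))
```

Run it with the same interpreter you use pip with (python3 for pip3, python2 for pip2), since each has its own user directory.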

author

About Chaoran Huang

Chaoran Huang is a PhD candidate in our group, and he manages the GPU server. More details are available at his personal webpage.