The University of Sheffield
Sheffield-WRGRID

Submitting Jobs and the Sun Grid Engine (SGE)

What is a job? In computing terminology, a job is a completely defined computational task. Because most iceberg users simply need to run an application, the terms "job" and "application" are usually used synonymously in these guides.

SGE controls how computing jobs submitted to iceberg (via the qsub, qsh or qrsh commands) are scheduled.

Every job is allocated time and memory resource. Currently-

Any job exceeding its allocated time or memory gets terminated without any warning!

 SGE commands

Running Jobs (applications) Interactively:  qsh and qrsh

Programs with graphical user interfaces, as well as code development activities such as editing and compiling, are best run interactively.

 Running Jobs (applications) in Batch Mode: qsub, runfluent, runmatlab, runabaqus, runansys, runcfx5par

Time-consuming, non-graphical jobs are better suited to batch mode, using the qsub command.

Batch processing involves preparing and submitting a computing job to the cluster to be run later without any user intervention. SGE job scheduling system will then;

 qsub Command

    USUAL FORMAT:    qsub scriptfile

    FULL SYNTAX:   qsub  [sge-options]  scriptfile  [-- optional_parameters_to_script]

qsub submits a batch job to the iceberg cluster. To submit a job you must;

Your script file will contain a set of Linux commands that will be executed in the order they appear in this file.
We strongly recommend that you specify the shell you are using in the very first line (sometimes referred to as the "bang-line") of your scriptfile.
This first line should therefore read:

  #!/bin/bash   for normal job scripts that use the bash shell (which is the default shell on iceberg)
  or
  #!/bin/csh    for those few jobs using the C shell.

Note: If you are using the module commands to access your software, it is essential to specify this bang-line correctly.
    
The scriptfile can optionally contain lines starting with #$ that provide information about job requirements, as shown in the table below.
For example, to request 8 Gigabytes of memory for the job you would include a line that reads:
   #$ -l mem=8G
Although we highly recommend it for clarity, it is not essential to place the #$ lines at the beginning of the file, as the job scheduler makes two passes over the scriptfile.
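
As an illustration of the points above, a minimal scriptfile might look like the following sketch. The resource values are arbitrary examples rather than recommendations, and the "y" argument to -j is standard SGE syntax assumed to apply on iceberg. Because bash treats every line starting with # as a comment, the #$ lines only take effect when the file is submitted with qsub.

```shell
#!/bin/bash
# Hypothetical example scriptfile; all resource values are illustrative only.
#$ -l h_rt=01:00:00
#$ -l mem=8G
#$ -l rmem=4G
#$ -j y

# Ordinary Linux commands follow and are executed in order.
job_message="Job started on $(uname -n)"
echo "$job_message"
```

If this file were saved as myjob.sh (a hypothetical name), it would be submitted with: qsub myjob.sh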

You can either use a text editor on iceberg to create your scriptfile or prepare it somewhere else and transfer it onto iceberg by using secure ftp.
nedit , gedit , vi , emacs, pico and nano are a few good text editors available on the worker nodes. Note however that the iceberg head-node does not currently have gedit or pico installed. 

List of Useful Options for qsub Command

A few useful qsub options
-l arch=intel*
-l arch=amd*
    Restricts your job to run only on a node with the specified architecture.
    You will not normally need this flag unless your software requires a
    particular architecture (i.e. Intel or AMD).
    If you do not specify this flag, your job can run on the first available node.

-l h_rt=hh:mm:ss
    Specifies the maximum run time (wall-clock) in hours (hh), minutes (mm) and seconds (ss).

-l mem=nnG
    For serial jobs, specifies the total "virtual" memory requirement in Gigabytes.
    For parallel jobs, specifies the memory allocation per processor (and NOT the total memory).
    Note that this is the case for OpenMP jobs as well.
    For example, an OpenMP job with 2 threads needing 24 GBytes in total will need
    to specify -l mem=12G.
    Currently the default is 6G and the maximum total allowed is 128G.

-l rmem=nnG
    For serial jobs, specifies the total "real" (or resident) memory requirement in Gigabytes.
    The real memory specified (rmem) should always be less than or equal to the virtual memory (mem), and must also be less than the physical RAM available on a worker node.
    The size of the real memory allocation relative to the virtual memory allocation affects the amount of paging (page faults) that takes place during the execution of a job, and hence the efficiency of usage.
    For parallel jobs, specifies the real-memory allocation per processor (and NOT the total memory).
    Note that this is the case for OpenMP jobs as well.
    Currently the default is 2G and the maximum total allowed is 48G.

-pe openmpi-ib nn
-pe ompigige nn
-pe openmp nn
    Specifies the parallel environment to use (MPI or OpenMP)
    and the number of processors needed (nn).
    Currently nn must not exceed 32.

-m bea
-M email_address
    Notification options: -m can be followed by any combination of
    b (begin), e (end) or a (abort).
    -M must specify your email address. Specify either both -m and -M or neither.

-o filename
-e filename
-j
    The output from a job is usually sent to two files (e for error and o for normal output)
    whose names are generated from the script name and the JOB_ID number.
    This behavior can be overridden by these parameters.
    The -j option joins the normal and error outputs and is HIGHLY recommended.

-v variable=value
    Passes the defined environment variable to the job's execution environment.

-V
    Passes all the environment variables of the current shell to the job.

-help
    Gives a full listing of qsub parameters/options.
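
Because -l mem is specified per processor for parallel and OpenMP jobs, the value to request is the total memory divided by the number of processors. The sketch below works through the 24 GByte, 2 thread example and prints (but does not run) a matching qsub command. The helper name per_slot_mem_gb, the scriptfile name myjob.sh and the 8 hour run time are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch: derive the per-processor memory request for a parallel job.

# per_slot_mem_gb TOTAL_GB SLOTS
# Prints TOTAL_GB divided by SLOTS, rounded up to a whole Gigabyte.
per_slot_mem_gb() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

total_gb=24   # total memory the job needs
threads=2     # number of OpenMP threads requested with -pe openmp
mem_per_thread=$(per_slot_mem_gb "$total_gb" "$threads")

# Build the command line as a string; it is only printed here, not executed.
qsub_cmd="qsub -pe openmp ${threads} -l mem=${mem_per_thread}G -l h_rt=08:00:00 -j y myjob.sh"
echo "$qsub_cmd"
```

Rounding up keeps the request from falling below what the job needs when the total does not divide evenly.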

 Easy Ways of Running Some Applications in Batch Mode 

We have made a few home-grown commands available on iceberg to make submitting batch jobs easy for a few commonly used applications. These are; 

  runfluent, runansys, runcfx5par, runmatlab and runabaqus

For further information on how to use any of these commands, type its name at the bash prompt on iceberg.

Checking the Progress of your Jobs (i.e. applications) : qstat,  Qstat, qmon  

     FORMAT:   Qstat    

Detailed information about all the SGE commands is available in the man pages. For example, type man qstat or man qsub.

The qmon command is a sophisticated (and at times overly complicated) graphical user interface to SGE. It can monitor the progress of jobs, submit jobs and perform a large number of administrative tasks that only the system administrators need and have the rights to carry out. The first icon (the Job Control icon) of the qmon GUI can be used to check the progress of all jobs on iceberg.

Cancelling already submitted jobs

   FORMAT:   qdel job_ID
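
Both qstat -j and qdel need the job ID that qsub reports at submission time. The following is a minimal sketch of pulling that ID out of the confirmation message. It assumes the usual SGE message format ("Your job NNN ... has been submitted"), uses a made-up sample message instead of calling qsub, and the helper name extract_job_id is hypothetical.

```shell
#!/bin/bash
# Sketch: extract the numeric job ID from a qsub confirmation message.

# Reads the message on standard input and prints only the job ID.
extract_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Made-up sample of what qsub typically prints after a successful submission.
sample='Your job 123456 ("myjob.sh") has been submitted'
job_id=$(printf '%s\n' "$sample" | extract_job_id)
echo "Job ID: $job_id"

# On iceberg the ID could then be used as follows (not executed here):
#   qstat -j "$job_id"    # show details of that job
#   qdel "$job_id"        # cancel that job
```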


Fair Share Mechanism on iceberg

 

 SGE Job Queues

Queue Name   Max. Real Time Allowed   Queue Specific Features
short        8 hours                  For Quick Jobs
long         168 hours                For long running serial jobs
parallel     168 hours                For MPI parallel jobs
openmp       168 hours                Multi-threaded and OpenMP

 Important: Any job that exceeds the "Maximum Allowed Time" limit is killed as soon as the limit is exceeded. It is therefore important that you specify ample time for your jobs. Normally there is no need to specify a particular queue for your job. You simply specify the maximum time and maximum memory (and, in the case of parallel jobs, the number of processors) needed, and SGE will put your job into the correct queue.
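
To make the time limits in the table above concrete, the sketch below converts a requested h_rt value to seconds and reports which limit it falls under. It only illustrates the arithmetic and is not SGE's actual scheduling logic. The function names hms_to_seconds and check_run_time are hypothetical.

```shell
#!/bin/bash
# Illustration only: compare a requested h_rt value with the queue time limits.

# Convert hh:mm:ss to seconds (awk reads "08" as decimal 8).
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Report which time limit a requested run time fits under.
check_run_time() {
    secs=$(hms_to_seconds "$1")
    if [ "$secs" -le $((8 * 3600)) ]; then
        echo "within the 8 hour limit"
    elif [ "$secs" -le $((168 * 3600)) ]; then
        echo "within the 168 hour limit"
    else
        echo "exceeds the 168 hour limit"
    fi
}

check_run_time 02:30:00
check_run_time 72:00:00
check_run_time 200:00:00
```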