Parallel Job Examples

Multiprocessor Jobs Using Job Arrays

SGE provides a convenient way of utilising multiple processors via so-called task arrays. This approach is particularly useful for high-throughput work where there is very little coupling between the individual tasks.

This approach is well suited to parametric studies, where optimum values for experimental parameters are found by repeatedly executing the same set of operations over a range of input values. The Sun Grid Engine (SGE) job array option enables all of these jobs to be submitted with a single command file, letting the scheduler execute the tasks as and when resources become available. This is achieved by using the -t option to qsub as shown below:

#$ -t lower-upper:interval

where lower, upper and interval are integers, for example: -t 0-20:2
The -t option defines the task index range: lower is the lowest index, upper the highest index and interval the step between successive indices. The interval is assumed to be 1 if not specified.
SGE runs your executable once for each number in the task index range, and sets the environment variable $SGE_TASK_ID so that each run of the executable can determine its own task number. The executable can use this value to select its input files or other run-time options according to its task index.

Below is an example that demonstrates the use of the job array feature. In this example eight tasks are created; in each task the user's program myprog.exe runs, reading its data from a file named data.n and writing its output to a file named output.n, where n is the task ID. At the end of a successful run there will therefore be eight output files (output.1, output.2, ..., output.8) corresponding to the eight data files (data.1, data.2, ..., data.8).

#!/bin/bash
#
#$ -t 1-8
#
echo "Task id is $SGE_TASK_ID"
./myprog.exe < data.$SGE_TASK_ID > output.$SGE_TASK_ID

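Assuming the script above is saved as myarray.sh (the file name is illustrative), it is submitted in the usual way:

qsub myarray.sh

SGE then schedules the eight tasks independently; each task appears in the queue as a sub-task of the same job and runs when a slot becomes available.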
The task ID can also be read into your C or Fortran program via the getenv function.
C Program Example:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* Read the task id assigned to this run by SGE from the environment. */
    char *task_id = getenv("SGE_TASK_ID");

    if (task_id != NULL)
        printf("Task ID is %s\n", task_id);

    return 0;
}
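This might be compiled with the PGI compiler as follows (the source file name gettask.c is illustrative):

pgcc gettask.c -o gettask

Within a job array, each task would then print the value of SGE_TASK_ID assigned to it by the scheduler.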

Running and Building Multiprocessor Applications

Applications exploiting iceberg's multi-core processor architecture can be developed using shared-memory and distributed-memory programming techniques. Iceberg provides a number of parallel environments that can run jobs using multiple cores and processors. The table below shows the available parallel environments and the corresponding compilers.

Parallel environment   Compiler                           Type
openmpi-ib             /usr/mpi/openmpi-1.2.8/pgi         MPI over Infiniband (OpenMPI)
mvapich2-ib            /usr/mpi/mvapich2-1.2p1/pgi        MPI over Infiniband (MVAPICH2)
openmp                 -                                  Threaded programming using OpenMP on multi-core servers
ompigige               /usr/local/packages5/openmpi-pgi   MPI over gigabit ethernet (embarrassingly parallel jobs only)

Shared Memory Programming with OpenMP

Source code containing OpenMP compiler directives can be compiled on symmetric multiprocessing (SMP) nodes such as the dual-processor, dual-core and dual-processor, quad-core Sun X2200 servers that make up iceberg. OpenMP source code is compiled using the PGI compilers with the -mp option. To compile C, C++, Fortran 77 or Fortran 90 code, use the -mp flag as follows:

pgf77 [compiler options] -mp filename
pgf90 [compiler options] -mp filename
pgcc  [compiler options] -mp filename
pgCC  [compiler options] -mp filename
A simple makefile example is shown here: openmp makefile.
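As a simple illustration (a sketch, not part of the makefile linked above; the OpenMP calls are standard), the following C program prints a greeting from every thread in the parallel region:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Each thread in the team executes the body of the parallel region. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

This could be compiled with, for example, pgcc -mp hello_omp.c -o hello_omp (the file name hello_omp.c is illustrative).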

The number of parallel execution threads at run time is controlled by setting the environment variable OMP_NUM_THREADS to the appropriate value. For the csh or tcsh shell this is set using,
setenv OMP_NUM_THREADS 2
or for the sh or bash shell use,
export OMP_NUM_THREADS=2

To start an interactive shell with NPROC processors enter,
qsh -pe openmp NPROC -v OMP_NUM_THREADS=NPROC
Note that although the number of processors required is specified with the -pe option, it is still necessary to ensure that the OMP_NUM_THREADS environment variable is set to the correct value.

To submit a multi-threaded job to Sun Grid Engine it is necessary to submit it to a special parallel environment that ensures the job occupies the required number of slots. Using the SGE command qsub, the openmp parallel environment is requested with the -pe option as follows,
qsub -pe openmp 2 -v OMP_NUM_THREADS=2 myjobfile.sh
Alternatively, the following job script, job.sh, may be submitted using qsub job.sh, where job.sh is:

#!/bin/bash
#$ -pe openmp 4
#$ -v OMP_NUM_THREADS=4
#
./executable
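For example, if the OpenMP hello world sketched above were compiled to ./hello_omp and used in place of ./executable, this script would print one greeting line from each of the four threads.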

Parallel Programming With MPI

Iceberg is designed with the aim of running MPI (Message Passing Interface) parallel jobs, and the Sun Grid Engine is able to schedule them. In a message-passing parallel program each process executes the same binary code but follows a different path through the code; this is SPMD (single program, multiple data) execution. Iceberg provides a number of parallel environments, as indicated in the table above; some of these environments use the Infiniband fast interconnect. A simple MPI hello world program is shown below.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank; /* my rank in MPI_COMM_WORLD */
    int size; /* size of MPI_COMM_WORLD */

    /* Always initialise MPI with this call before using any other MPI functions. */
    MPI_Init(&argc, &argv);

    /* Find out how many processes are taking part in the computation. */
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Get the rank of the current process. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("Hello MPI world from C!\n");

    printf("There are %d processes in my world, and I have rank %d\n", size, rank);

    MPI_Finalize();
    return 0;
}

When run on 4 processors, the above program produces output similar to the following (the ordering of lines from different ranks is not deterministic):

Hello MPI world from C!

There are 4 processes in my world, and I have rank 2
There are 4 processes in my world, and I have rank 0
There are 4 processes in my world, and I have rank 3
There are 4 processes in my world, and I have rank 1

To compile C, C++, Fortran 77 or Fortran 90 MPI code using the Portland (PGI) compiler wrappers, type:

mpif77 [compiler options] filename
mpif90 [compiler options] filename
mpicc [compiler options] filename
mpiCC [compiler options] filename
A simple makefile example is shown here: mpi makefile. It illustrates how the MPI compiler wrappers for both the PGI and GNU compilers may be used to build an application (uncomment the relevant lines of the makefile as required).
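As an illustration only (the linked mpi makefile is the authoritative example), a minimal makefile for the MPI hello world program above might look like the following, assuming the source is saved as hello_mpi.c:

# Minimal makefile sketch for the MPI hello world example.
# CC is the MPI compiler wrapper; recipe lines must begin with a tab.
CC = mpicc

hello_mpi: hello_mpi.c
	$(CC) -o hello_mpi hello_mpi.c

clean:
	rm -f hello_mpi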

To submit an MPI job to Sun Grid Engine it is necessary to use the parallel environment corresponding to the compiler used to build the application; the PEs are shown in the table above. The PE ensures that the job occupies the required number of slots. Using the SGE command qsub, the required parallel environment is requested with the -pe option as follows,
qsub -pe mvapich2-ib 4 myjobfile.sh
qsub -pe openmpi-ib 4 myjobfile.sh
qsub -pe ompigige 4 myjobfile.sh

It is important to note that the job script must invoke the executable in the correct way. Example job scripts are shown below for openmpi-ib, mvapich2-ib and ompigige (OpenMPI over gigabit ethernet); the scripts show the alternative invocations for applications compiled with either the PGI or GNU compilers. In the examples, replace [# of MPI processes] with an integer.

Each of the following job scripts, job.sh, may be submitted using qsub job.sh, where job.sh is:

To use the openmpi parallel environment with infiniband:

#!/bin/bash
#$ -l h_rt=1:00:00
#$ -pe openmpi-ib [# of MPI processes]
#
module add mpi/pgi/openmpi
mpirun ./executable


To use the mvapich2 environment with infiniband:
#!/bin/bash
#$ -l h_rt=1:00:00
#$ -pe mvapich2-ib [# of MPI processes]
#
module add mpi/pgi/mvapich2
mpirun ./executable



To use the openmpi parallel environment with the gigabit ethernet :

#!/bin/bash
#$ -l h_rt=1:00:00
#$ -pe ompigige [# of MPI processes]
#
module add mpi/pgi/openmpi
mpirun ./executable
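The parallel environments actually configured on the cluster can be listed with the standard SGE command:

qconf -spl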


The downside of message-passing codes is that they are harder to write than scalar or shared-memory codes. Furthermore, data transfer rates between computers over ethernet connections are usually an order of magnitude slower than those between CPU and memory, creating potential data-transfer bottlenecks in MPI programs.

The solution to this problem on a high-performance cluster such as iceberg is to use a fast interconnect, such as the 40 Gbit/sec Infiniband network. The availability of such high-bandwidth networking makes a scalable parallel machine possible.

For example, a well-written piece of code running on 10 CPUs might be expected to run approximately 10 times faster. However, it is important to take Amdahl's law into account: the speedup of a parallel algorithm is effectively limited by the proportion of operations that must be performed sequentially, i.e. its serial fraction.
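In the usual formulation of Amdahl's law, if s is the serial fraction of the code and N the number of processors, the achievable speedup is

S(N) = \frac{1}{s + (1 - s)/N}

so a code with a serial fraction of s = 0.1 achieves a speedup of only 1/(0.1 + 0.9/10) ≈ 5.3 on 10 processors, and can never exceed 1/s = 10 however many processors are used.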

Useful Links