Finding out the resource requirements of your jobs
The most important parameters that control the scheduling of jobs on iceberg are:
- Real-Time requirements of the job ( -l h_rt= )
- Real-Memory requirements of the job ( -l rmem= )
Time Allocation ( -l h_rt= )
- Default time allocation for all jobs is 8 hours of wall_clock_time.
- You may specify the time allocation for your job by using the -l h_rt= parameter to qsub or qsh
- Currently the maximum time allowed for interactive (qsh) jobs is 8 hours.
- Currently the maximum time allowed for batch jobs ( qsub ) is 168 hours.
- It is always safer to overestimate rather than underestimate the job time allocation.
- Any job exceeding the specified time allocation will be terminated abruptly and without any warning.
- Do NOT specify more than 168 hours for batch jobs or more than 8 hours for interactive jobs, as such jobs will never run.
Note: h_rt stands for hard-limit for real-time.
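The limits above can be checked before submitting a job. The following is a minimal sketch rather than an iceberg-provided tool: the function names h_rt_to_seconds and check_h_rt are hypothetical, and the 168-hour and 8-hour limits are simply copied from the list above.

```shell
# Hypothetical helper: convert an h_rt value in HH:MM:SS form to seconds.
# awk is used so that fields with leading zeros (e.g. 08) are read as decimal.
h_rt_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: succeed (exit status 0) if the requested h_rt is
# within the limit quoted above for the given job type.
#   $1 = job type: batch (qsub) or interactive (qsh)
#   $2 = requested time as HH:MM:SS
check_h_rt() {
    case "$1" in
        batch)       limit_hours=168 ;;   # limit for qsub jobs, from the text above
        interactive) limit_hours=8 ;;     # limit for qsh jobs, from the text above
        *) echo "unknown job type: $1" >&2; return 2 ;;
    esac
    [ "$(h_rt_to_seconds "$2")" -le $(( limit_hours * 3600 )) ]
}

# Example: a 120-hour batch request is within the limit.
if check_h_rt batch 120:00:00; then
    echo "request is within the batch limit"
fi
```

If the limits on the cluster change, only the two limit_hours values need updating.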
Memory Allocation ( -l rmem= )
- The default real memory allocation ( -l rmem= ) for each job is 2 GBytes.
- Jobs attempting to consume more than the allocated real memory ( rmem ) will send a warning to the user but will continue to run.
- Do NOT specify more than 256 GB of real memory ( rmem ) .
- For parallel jobs the memory allocations are treated as per-core rather than for the entire job. For example, a 4-way parallel job with -l rmem=12G will be allocated 12*4 = 48G of memory in total.
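The per-core arithmetic for parallel jobs can be sketched as a one-line calculation. The function name total_rmem_gb is hypothetical and is used only to illustrate the 12*4 = 48G example given above.

```shell
# Hypothetical helper: total memory (in GB) allocated to a parallel job.
#   $1 = per-core rmem request in GB (the value given to -l rmem=)
#   $2 = number of cores
total_rmem_gb() {
    echo $(( $1 * $2 ))
}

# A 4-way parallel job submitted with -l rmem=12G:
total_rmem_gb 12 4    # prints 48
```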
Every iceberg user is allocated a file-storage area of their own. Please read the section on filestore allocation on iceberg for further information. Any attempt to exceed this allocation during the execution of a job can have disastrous consequences. This is because any program or package writing into files will produce a fatal error and stop if the filestore limit happens to be exceeded during that operation.
Filestore limits are not associated with jobs and cannot be specified while submitting a job. Users must make sure that there is sufficient spare space in their filestore areas before submitting any job that is going to produce large amounts of output.
The quota command can be used to check your current filestore allocation and usage.
Finding out your job's requirements
There are a number of ways of finding out the time and memory consumption of your job.
Here are a few methods we recommend:
1. By using the emailing parameters of the qsub command:
Submit your job with qsub, specifying very generous memory and time requirements to ensure that it runs to completion, and use the -M and -m e parameters to receive an email report. The mail message will list the maximum memory usage ( MAX VMem ) as well as the Wall_Clock_time used by the job.
Here is an example job script:
#$ -l h_rt=120:00:00 -l rmem=8G
#$ -m eba -M email@example.com
myprog < mydata.txt > myresults.txt
When it is run, you will receive an email reporting the memory and time usage figures.
2. By using the qtop command:
While your job is running simply type qtop. This will produce a neat table for each running job giving the Virtual_Memory, RSS_Memory and Real-Time usage-so-far.
3. By using the Qstat command:
You can also detect the memory used by your job while it is running by using the qstat command as follows:
- While a job is still running find out its JOB_ID by -
- And check its current usage of memory by-
qstat -F -j job_id | grep mem
The reported figures will indicate:
- the currently used memory ( vmem )
- the maximum memory needed since startup ( maxvmem )
- the cumulative memory_usage*seconds ( mem )
It is the maxvmem figure that you will need to use to determine the -l rmem= parameter for your next job.
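Picking out the maxvmem figure can be scripted. The sketch below assumes the usage information appears as comma-separated key=value pairs on one line, which is the usual Grid Engine layout. The sample line is invented for illustration and is not real output from iceberg, and extract_maxvmem is a hypothetical helper name.

```shell
# Hypothetical helper: read qstat -j style text on standard input and print
# the first maxvmem value found (for an array job this is the first task only).
extract_maxvmem() {
    grep -o 'maxvmem=[^, ]*' | head -n 1 | cut -d= -f2
}

# Invented sample line, used here in place of real qstat output.
sample='usage    1:  cpu=00:10:03, mem=120.5 GBs, io=0.2, vmem=1.8G, maxvmem=2.1G'
printf '%s\n' "$sample" | extract_maxvmem    # prints 2.1G
```

On the cluster, the same function could be fed from the command shown above, for example qstat -F -j job_id | extract_maxvmem. Allowing some headroom above the reported figure when choosing the next -l rmem= value is a sensible precaution.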
4. By using timing commands in your script:
Another way of deducing the wall_clock_time used by a job is to use the date or the timeused command within the script file. The date command is part of the Linux operating system whereas the timeused command is specific to iceberg and provides the usage figures directly rather than having to manually calculate it from two subsequent date commands.
Here are some examples:
USING THE DATE COMMAND:
#$ -l h_rt=10:00:00
date
my_program < my_input
date
When the above script is submitted (via qsub), the job output file will contain the date and time at each invocation of the date command. You can then calculate the difference between these date/times to determine the actual time taken.
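The subtraction can also be done inside the script. This is a minimal sketch, assuming the date command on the system supports the +%s format (seconds since the epoch, as GNU date does). format_elapsed is a hypothetical helper, and sleep 1 stands in for the real program.

```shell
# Hypothetical helper: format a number of seconds as HH:MM:SS.
format_elapsed() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( ($1 % 3600) / 60 )) $(( $1 % 60 ))
}

start=$(date +%s)     # seconds since the epoch before the work starts
sleep 1               # stand-in for: my_program < my_input
end=$(date +%s)       # seconds since the epoch after the work finishes

# Wall-clock time taken, written to the job output file.
format_elapsed $(( end - start ))
```

The reported figure, plus a safety margin, can then be used as the -l h_rt= value for later runs of the same job.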
USING THE TIMEUSED COMMAND
#$ -l h_rt=10:00:00
my_program < my_input
When the above script is submitted, the first invocation of the timeused command will initialise the timer counter because the TIMECOUNTER variable is set to 0. Subsequent invocations will report the time in hours, minutes and seconds since the first invocation.