====== Job submit/management commands ======
===== Job submit command =====
==== Job submit command (qsub command) ====
\\
The qsub command places a job in a queue of the accelerator server.\\
Submission options can also be written in the execution script. \\ For more details, see each manual.\\
Submit jobs to an appropriate queue with the qsub command; do not run programs directly on the front-end node.\\
A program run on the front-end node may be canceled by the administrator because it affects other users.\\
\\
**Usage**\\
$ qsub [-q queue name] [-l select=number of nodes] [-N Job name] [-M e-mail address] [-m mail point] [-l walltime=limit of walltime] [-l license name=number of use licenses] [execution script file]
\\
**List of options**\\
^Option ^Value ^
|-q queue name |Set the queue name. \\ For details of the queues, see [[user_manual:acceleratorserver:queues_list|List of queues]]. |
|-l select=number of nodes |Specify the number of nodes to be used. \\ If this option is omitted, the queue default is used (see [[user_manual:acceleratorserver:queues_list|List of queues]]).|
|-N job name |Set the job name. \\ A job name can be up to 236 characters long. \\ The real-time job reference system displays up to 64 characters. \\ If the ‘-N’ option is not specified, the system assigns a default job name.|
|-M e-mail address |Set the e-mail address that receives notifications. \\ To receive e-mail, the ‘-m’ option is also required. |
|-m mail point |Specify the set of events that cause mail to be sent to the addresses specified with the ‘-M’ option, for example ‘b’ (job begins) and ‘e’ (job ends). \\ This option is required to receive e-mail. |
|-l walltime=limit of walltime |Specify the walltime limit.\\ If this option is omitted, the default walltime of the queue is used (see [[user_manual:acceleratorserver:queues_list|List of queues]]).\\ Setting an appropriate value makes it easier for a queued job to start running. |
|-l license name=number of use licenses |Specify the number of licenses when you use an application whose licenses are managed. \\ If this option is omitted, the job is assumed not to use any license-managed application. \\ For how to specify licenses, see the instructions for running each application.|
\\
**Example**\\ Submit the following jobs. \\
・queue: A_004, number of nodes: 2, limit of walltime: 1 hour, script file: hello.sh
$ qsub -q A_004 -l select=2 -l walltime=1:00:00 hello.sh
\\
The same options written in the execution script:
#!/bin/sh
#PBS -q A_004
#PBS -l select=2
#PBS -l walltime=1:00:00
:
:
:
\\
・queue: DA_002g, script file: hello.sh, recipient address: userA@test.com \\ Mail is sent when the job begins execution (option is ‘-m b’) and terminates (option is ‘-m e’).
$ qsub -q DA_002g -M userA@test.com -m be hello.sh
\\
The same options written in the execution script:
#!/bin/sh
#PBS -q DA_002g
#PBS -M userA@test.com
#PBS -m be
:
:
:
\\
・queue: A_004, job name: TEST, script file: hello.sh
$ qsub -q A_004 -N TEST hello.sh
\\
The same options written in the execution script:
#!/bin/sh
#PBS -q A_004
#PBS -N TEST
:
:
:
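\\
Each of the three examples above pairs a qsub option with a matching #PBS line. As a minimal sketch of that correspondence, the helper pbs_header below prints such a header from shell arguments. The function is hypothetical (it is not part of PBS or of this system), and the queue name and values are simply the ones used in the examples.

```shell
#!/bin/sh
# pbs_header: hypothetical helper that prints "#PBS" directive lines.
# Usage: pbs_header QUEUE [NODES] [WALLTIME] [JOBNAME]
# Empty arguments are skipped, so the queue defaults apply to them.
pbs_header() {
    queue=$1; nodes=${2:-}; walltime=${3:-}; jobname=${4:-}
    echo '#!/bin/sh'
    echo "#PBS -q $queue"
    [ -n "$nodes" ] && echo "#PBS -l select=$nodes"
    [ -n "$walltime" ] && echo "#PBS -l walltime=$walltime"
    [ -n "$jobname" ] && echo "#PBS -N $jobname"
    return 0
}

# Print the header of the first example (queue A_004, 2 nodes, 1 hour).
pbs_header A_004 2 1:00:00
```

The program lines of the job would be appended after this header before the file is passed to qsub.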
------
==== Format of an execution script ====
\\
This section describes the format of the execution script files used to run programs on the accelerator server.\\
To run an application that requires an execution script file, create the file in advance.\\
The /work area has better I/O performance than the /home area, so copy your data to the /work area, run the program there, and move the results back to the source directory, as in the following examples.\\ For more details, see each manual.\\
\\
**Execute a non-MPI program**\\
#!/bin/sh
#PBS -l select=1
#PBS -q queue
#PBS -N jobname
# Copy the job input directory to /work area and move there
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
# Execute your program
program > output file 2> error file
# Move the result to the source directory after execution
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
\\
・Example: execute the program ‘a.out’.
#!/bin/sh
#PBS -l select=1
#PBS -q A_004
#PBS -N sample
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
./a.out > result.out 2> result.err
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
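\\
To see what the staging lines do without submitting a job, the sketch below replays them on temporary directories. PBS_O_WORKDIR, PBS_JOBID and the /work area are normally provided by the system; here they are replaced by stand-ins created with mktemp (an assumption made only for this demonstration), and an echo takes the place of the real program.

```shell
#!/bin/sh
# Dry run of the staging pattern, using temporary stand-ins for the
# directories that PBS and the /work area normally provide.
TOP=$(mktemp -d)
PBS_O_WORKDIR=$TOP/home/job1        # stand-in for the submit directory
PBS_JOBID=12345.demo                # stand-in job ID
WORK_AREA=$TOP/work                 # stand-in for /work/$USER
mkdir -p "$PBS_O_WORKDIR"
echo "input data" > "$PBS_O_WORKDIR/input.txt"

# Copy the job input directory to the work area and move there
DIRNAME=$(basename "$PBS_O_WORKDIR")
WORKDIR=$WORK_AREA/$PBS_JOBID
mkdir -p "$WORKDIR"
cp -raf "$PBS_O_WORKDIR" "$WORKDIR"
cd "$WORKDIR/$DIRNAME" || exit 1

# "Execute" the program: here it only writes a result file
echo "result" > result.out

# Copy the directory back over the source and remove the work copy
cd "$TOP" || exit 1
if cp -raf "$WORKDIR/$DIRNAME" "$PBS_O_WORKDIR/.."; then rm -rf "$WORKDIR"; fi

# The source directory now holds both the input and the result
ls "$PBS_O_WORKDIR"
```

Because the copy back targets the parent of the source directory, the staged directory is merged into the original one, and the work copy is removed only if that copy succeeded.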
\\
**Execute an MPI program**\\
#!/bin/sh
#PBS -l select=nodes
#PBS -q queue
#PBS -N jobname
# Copy the job input directory to /work area and move there
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
# Execute your program
mpirun [ -np MPI total tasks | -ppn MPI tasks per node ] -hostfile $PBS_NODEFILE program > output file 2> error file
# Move the result to the source directory after execution
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
\\
・Example: run a program on 2 nodes with 72 MPI processes using the Intel compiler.
#!/bin/sh
#PBS -l select=2
#PBS -q A_004
#PBS -N mpi
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
mpirun -np 72 -ppn 36 -hostfile $PBS_NODEFILE ./a.out > result.out 2> result.err
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
\\
------
==== Interactive mode ====
\\
Submit a job in interactive mode.\\
Add the -I option (uppercase i) to the qsub command and specify IA_001g, CA_001, CA_001g, IC_001 or CC_001 as the queue.\\
\\
**Usage**\\
$ qsub -I -q queue
\\
**Example**\\
$ qsub -I -q IA_001g
qsub: waiting for job 22351.gpu1 to start
qsub: job 22351.gpu1 ready
-bash-4.2$ ./a.out
\\
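Because only the five queues listed above accept interactive jobs, a wrapper script can check the queue name before calling qsub. The function name is_interactive_queue below is hypothetical; the queue list is copied from this section and would need updating if the queues change.

```shell
#!/bin/sh
# is_interactive_queue: hypothetical check, returns 0 when the queue
# name is one of those this section lists for interactive mode.
is_interactive_queue() {
    case $1 in
        IA_001g|CA_001|CA_001g|IC_001|CC_001) return 0 ;;
        *) return 1 ;;
    esac
}

# Show the decision for one interactive and one batch-only queue.
for q in IA_001g A_004; do
    if is_interactive_queue "$q"; then
        echo "$q: qsub -I -q $q"
    else
        echo "$q: not an interactive queue"
    fi
done
```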
------
==== Submitting jobs to Shared-Queue CA_001, CA_001g and CC_001 ====
\\
CA_001, CA_001g and CC_001 are shared queues: a job shares its node with other jobs.\\
Jobs executed on CA_001 or CA_001g are assigned 1 CPU and 1 GPU by default and can use up to 18 CPUs and 5 GPUs. Execution in interactive mode is also available.\\
Jobs executed on CC_001 are assigned 1 CPU by default and can use up to 18 CPUs. Execution in interactive mode is also available.\\
\\
**Usage**\\
・CA_001 or CA_001g\\
$ qsub -q queue [ -I ] [ -l select=1[:ncpus=number of CPUs][:ngpus=number of GPUs] ] [execution script file]
\\
・CC_001\\
$ qsub -q queue [ -I ] [ -l select=1[:ncpus=number of CPUs] ] [execution script file]
\\
**Example**\\
・queue: CA_001, command to start interactive mode with 2 CPUs and 1 GPU
$ qsub -I -q CA_001 -l select=1:ncpus=2:ngpus=1
qsub: waiting for job 22351.gpu1 to start
qsub: job 22351.gpu1 ready
-bash-4.2$ ./a.out
\\
・queue: CA_001g, script to execute the program a.out with 18 CPUs and 5 GPUs\\
To use the queue CA_001g, submit the job from the /work_da area.
#!/bin/sh
#PBS -l select=1:ncpus=18:ngpus=5
#PBS -q CA_001g
#PBS -N sample
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
mpirun -np 18 -ppn 18 -hostfile $PBS_NODEFILE ./a.out > result.out 2> result.err
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
\\
・queue: CC_001, command to start interactive mode with 2 CPUs
$ qsub -I -q CC_001 -l select=1:ncpus=2
qsub: waiting for job 90289.gpu1 to start
qsub: job 90289.gpu1 ready
-bash-4.2$ ./a.out
\\
・queue: CC_001, script to execute the program a.out with 18 CPUs
#!/bin/sh
#PBS -l select=1:ncpus=18
#PBS -q CC_001
#PBS -N sample
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
mpirun -np 18 -ppn 18 -hostfile $PBS_NODEFILE ./a.out > result.out 2> result.err
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
\\
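The maxima stated at the top of this section (18 CPUs, and 5 GPUs on CA_001 and CA_001g) can be checked before submission. The helper shared_select below is a hypothetical sketch, not a system command: it prints a value for the select option, or fails when a stated maximum is exceeded. Whether other values are accepted is decided by the queue itself, not by this check.

```shell
#!/bin/sh
# shared_select: hypothetical helper for the shared queues.
# Usage: shared_select NCPUS [NGPUS]
# Prints a value for "-l select=", or returns 1 if a stated maximum
# (18 CPUs, 5 GPUs) is exceeded or NCPUS is below 1.
shared_select() {
    ncpus=$1; ngpus=${2:-}
    if [ "$ncpus" -lt 1 ] || [ "$ncpus" -gt 18 ]; then
        echo "ncpus must be between 1 and 18" >&2
        return 1
    fi
    if [ -z "$ngpus" ]; then
        echo "select=1:ncpus=$ncpus"           # CC_001: CPUs only
        return 0
    fi
    if [ "$ngpus" -gt 5 ]; then
        echo "ngpus must not exceed 5" >&2
        return 1
    fi
    echo "select=1:ncpus=$ncpus:ngpus=$ngpus"  # CA_001 / CA_001g
}

shared_select 2 1
shared_select 18
```

The printed value could then be passed on the command line, for example as qsub -q CA_001 -l "$(shared_select 2 1)" followed by the script file.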
===== Job management command =====
==== Display your own job information ====
\\
**Description**\\
Display your own job information on the supercomputer.\\
\\
**Usage**\\
$ statj [-x] [ [job_identifier | destination] ...]
\\
**List of options**\\
^Option ^Value ^
|-x |Displays status of finished, queued, and running jobs. |
\\
**Example**\\
userA@gpu2:~> statj
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------- --------------- --------- ----------------
3413.gpu1 userA A_004 STDIN 231503 1 36 690gb 24:00 R 00:00
------
==== Display job information ====
\\
**Description**\\
Display information about the jobs on the accelerator server.\\
\\
**Usage**\\
Default format:
qstat [-a] [-p] [-J] [-t] [-x] [ [job_identifier | destination] ...]
Long format:
qstat -f [-p] [-J] [-t] [-x] [ [job_identifier | destination] ...]
\\
**List of options**\\
^Option ^Value ^
|-a |Display memory usage, elapsed time, job status, etc. |
|-p |Display the percentage of job completion. |
|-J |Limit the status display to job arrays. |
|-t |Display the status of jobs, job arrays, and subjobs. |
|-x |Display the status of finished, queued, and running jobs. |
|-f |Display the status in long format. |
\\
**Example**\\
userA@gpu2:~> qstat -a
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
3390.gpu1 userA A_004 abinit 193347 4 144 2760gb 72:00 R 47:28
3401.gpu1 userA A_004 prog9_1 121974 4 144 2760gb 72:00 R 47:26
userA@gpu2:~> qstat -p
Job id Name User % done S Queue
---------------- ---------------- ---------------- -------- - -----
3390.gpu1 abinit userA 2 R A_004
3401.gpu1 prog9_1 userA 0 R A_004
userA@gpu2:~> qstat -t
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
3390.gpu1 abinit userA 00:00:01 R A_004
3401.gpu1 prog9_1 userA 00:00:01 R A_004
userA@gpu2:~> qstat -x
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
2235.gpu1 prog9_2 userA 00:00:03 F A_016
2236.gpu1 vasp4 userA 00:00:01 F A_016
2237.gpu1 prog9_1 userA 00:00:01 F A_016
The rest is omitted
...
userA@gpu2:~> qstat -f 3390.gpu1
Job Id: 3390.gpu1
Job_Name = abinit
Job_Owner = userA@gpu2
resources_used.cpupercent = 10
resources_used.cput = 00:00:01
resources_used.mem = 12836kb
resources_used.ncpus = 72
The rest is omitted
...
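\\
Listings in the qstat -a layout are easy to post-process. As a sketch, the awk filter below counts jobs per state. It assumes the column order shown in the example above (state in the 10th field), and it reads two sample rows copied from this page instead of calling qstat, so it can be tried on any machine. The function name count_states is hypothetical.

```shell
#!/bin/sh
# count_states: read "qstat -a" style text on stdin and print the
# number of jobs in each state. Assumes the state is field 10 and
# that job rows start with a numeric job ID such as "3390.gpu1".
count_states() {
    awk '$1 ~ /^[0-9]+\./ { n[$10]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample rows copied from the example above
count_states <<'EOF'
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
3390.gpu1       userA    A_004    abinit     193347   4 144 2760gb 72:00 R 47:28
3401.gpu1       userA    A_004    prog9_1    121974   4 144 2760gb 72:00 R 47:26
EOF
```

On the accelerator server the same function would be fed from the live command, as in qstat -a | count_states.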
------
==== Display queue status ====
\\
**Description**\\
Display information about queues on the accelerator server.\\
\\
**Usage**\\
Default format:
statq [destination ...]
Long format:
statq -f [destination ...]
\\
**List of options**\\
^Option ^Value ^
|-f |Display status in long format. |
\\
**Example**\\
userA@gpu2:~> statq
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq 0 0 no yes 0 0 0 0 0 0 Exec
A_004 0 0 yes yes 0 0 0 0 0 0 Exec
A_008 0 1 yes yes 0 1 0 0 0 0 Exec
A_016 0 0 yes yes 0 0 0 0 0 0 Exec
DA_002g 0 0 yes yes 0 0 0 0 0 0 Exec
DC_002 0 0 yes yes 0 0 0 0 0 0 Exec
C_002 0 0 yes yes 0 0 0 0 0 0 Exec
The rest is omitted
...
userA@gpu2:~> statq -f
Queue: workq
queue_type = Execution
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
enabled = False
started = True
The rest is omitted
...
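\\
A queue accepts new jobs only when it is enabled (Ena) and runs them only when it is started (Str). The sketch below filters the default statq listing for queues where both columns are yes. The function name usable_queues is hypothetical, the field positions are assumed from the example above, and the input is a sample listing rather than live output.

```shell
#!/bin/sh
# usable_queues: hypothetical filter for "statq" default output on stdin.
# Prints the names of queues whose Ena and Str columns are both "yes".
# Assumes the column order of the example: Queue Max Tot Ena Str ...
usable_queues() {
    awk '$4 == "yes" && $5 == "yes" { print $1 }'
}

# Sample rows copied from the example above
usable_queues <<'EOF'
Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq                0     0  no yes     0     0     0     0     0     0 Exec
A_004                0     0 yes yes     0     0     0     0     0     0 Exec
A_008                0     1 yes yes     0     1     0     0     0     0 Exec
EOF
```

With the sample input, workq is dropped because it is not enabled. On the server the filter would be used as statq | usable_queues.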
------
==== Display server status ====
\\
**Description**\\
Display information about servers of the accelerator server.\\
\\
**Usage**\\
Default format:
qstat -B [destination ...]
Long format:
qstat -B -f [destination ...]
\\
**List of options**\\
^Option ^Value ^
|-B |Display server status. |
|-f |Display status in long format. |
\\
**Example**\\
userA@gpu2:~> qstat -B
Server Max Tot Que Run Hld Wat Trn Ext Status
---------------- ----- ----- ----- ----- ----- ----- ----- ----- -----------
gpu1 0 1155 0 1 0 0 0 0 Active
userA@gpu2:~> qstat -Bf
Server: sdb
server_state = Active
server_host = sdb
scheduling = True
max_queued = [u:PBS_GENERIC=200]
The rest is omitted
...
------
==== Cancel the job before the job finished ====
\\
**Description**\\
Use the qdel command to cancel a job on the accelerator server.\\
\\
**Usage**\\
qdel [ -x ] [ -Wsuppress_email=number of e-mails ] job_identifier [job_identifier ...]
\\
**List of options**\\
^Option ^Value ^
|-x |Delete the specified job and its job history. |
|-Wsuppress_email |Set a limit on the number of e-mails sent when deleting jobs. |
\\
**Example**\\
userA@gpu2:~/work/20180712_sample> qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
3413.gpu1 abinit userA 00:00:00 R A_004
3414.gpu1 STDIN_gpu2_22 userA 00:00:00 R A_004
userA@gpu2:~/work/20180712_sample> qdel 3414.gpu1
userA@gpu2:~/work/20180712_sample> qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
3413.gpu1 abinit userA 00:00:00 R A_004
userA@gpu2:~/work/20180712_sample>
\\
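When several jobs have to be cancelled, the job identifiers can be taken from the qstat listing. The sketch below is a dry run: the hypothetical function qdel_by_name only prints the qdel command line for jobs whose name matches, and it reads a sample listing copied from this page. It assumes the default qstat column order (job ID first, job name second).

```shell
#!/bin/sh
# qdel_by_name: hypothetical dry-run helper. Reads default "qstat"
# output on stdin and prints a qdel command for every job whose
# Name column equals the first argument. Nothing is deleted.
qdel_by_name() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+\./ && $2 == name { ids = ids " " $1 }
        END { if (ids != "") print "qdel" ids }'
}

# Sample rows copied from the example above
qdel_by_name STDIN_gpu2_22 <<'EOF'
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
3413.gpu1         abinit           userA             00:00:00 R A_004
3414.gpu1         STDIN_gpu2_22    userA             00:00:00 R A_004
EOF
```

Printing the command first makes it possible to review the job IDs before running it by hand.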
===== Display information about used and remained time of job execution (jobtime command) =====
**Description**\\
Use the jobtime command to display the execution time used by your completed jobs and the time remaining.\\
\\
**Usage**\\
$ jobtime
\\
**Information**\\
^ Section ^ Details ^
|Last Updated |Time of the last update |
|User |User ID |
|Total |Total time available for job execution (hours) |
|Used |Time used (hours) |
|Remained |Time remaining (hours) |
\\
**Example**\\
userA@gpu2:~> jobtime
# Last Updated: 2018/10/01 13:45
# User Total Used Remained (H)
username 500 222.32 277.68
\\
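The Remained column can be checked automatically before a long run is submitted. The sketch below is an illustration only: the function remaining_hours and the 24-hour threshold are hypothetical, the field position is assumed from the example above, and the sample output is used in place of a real jobtime call.

```shell
#!/bin/sh
# remaining_hours: hypothetical filter for "jobtime" output on stdin.
# Skips the comment lines starting with "#" and prints the 4th field
# (the Remained column) of the first data line.
remaining_hours() {
    awk '!/^#/ && NF >= 4 { print $4; exit }'
}

# Sample output copied from the example above
SAMPLE='# Last Updated: 2018/10/01 13:45
# User Total Used Remained (H)
username 500 222.32 277.68'

left=$(printf '%s\n' "$SAMPLE" | remaining_hours)

# awk performs the floating-point comparison that sh cannot do itself
if awk -v left="$left" -v need=24 'BEGIN { exit !(left >= need) }'; then
    echo "enough time left: $left h"
else
    echo "only $left h left"
fi
```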
===== References for submitting job and script =====
==== How to execute a MPI job ====
\\
**Description**\\
Intel MPI is available as the MPI environment.\\
\\
**Usage**\\
Use the mpirun command to execute a job.
mpirun [ -np parallel number ] [ -ppn parallel number per node ] -hostfile $PBS_NODEFILE execution program
\\
To improve job performance, specify the values so that:\\
[ Parallel number (the value of "-np") ] = [ Number of nodes (the value of "#PBS -l select=") ] × [ Parallel number per node (the value of "-ppn") ]
\\
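The relation above can be computed in the job script instead of being typed by hand. The sketch below rests on stated assumptions: it takes the node count from the number of distinct host names in the file named by PBS_NODEFILE, and it creates a temporary stand-in file with two made-up host names when that variable is unset, so the lines can be tried outside a job. The value PPN=36 matches the 2-node, 72-process example in this manual.

```shell
#!/bin/sh
# Outside a PBS job there is no node file, so create a stand-in
# listing two hypothetical hosts (one line per allocated node).
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode02\n' > "$PBS_NODEFILE"
fi

PPN=36                                                # MPI processes per node
NODES=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')  # distinct hosts
NP=$((NODES * PPN))                                   # total MPI processes

# Print the resulting command line instead of running it
echo "mpirun -np $NP -ppn $PPN -hostfile $PBS_NODEFILE ./a.out"
```

Deriving the total from the node count keeps the mpirun line consistent when the select value in the script header is changed.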
**Example**\\
#!/bin/bash
#PBS -j oe
#PBS -l select=1
DIRNAME=`basename $PBS_O_WORKDIR`
WORKDIR=/work/$USER/$PBS_JOBID
mkdir -p $WORKDIR
cp -raf $PBS_O_WORKDIR $WORKDIR
cd $WORKDIR/$DIRNAME
mpirun -np 36 -hostfile $PBS_NODEFILE /usr/local/app/ABINIT/current/src/98_main/abinit < input.files > result.out 2> result.err
cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
\\
~~NOCACHE~~