====== Job submit/management commands ======

===== Job submission =====

==== qsub : Job submit command ====
\\
The qsub command places a job on a job queue of the supercomputer.\\
Submission options can also be written in the execution script.\\
Submit jobs to an appropriate queue with the qsub command; do not execute your program directly on the front end node.\\
A program executed on the front end node may be canceled by the administrator because it affects other users.\\
\\
**Usage**\\

  $ qsub [-q queue name] [-l select=number of nodes] [-N job name] [-M e-mail address] [-m mail point] [-l walltime=limit of walltime] [-l license name=number of use licenses] [execution script file]

\\
**List of options**\\
^Option ^Value ^
|-q queue name |Set the queue name.\\ For details of the queues, see [[user_manual:supercomputer:queues_list|List of queues]]. |
|-l select=number of nodes |Specify the number of nodes to be used.\\ If this option is omitted, the queue default is used (see [[user_manual:supercomputer:queues_list|List of queues]]). |
|-N job name |Set the job name.\\ A job name can be up to 236 characters long.\\ The real-time job reference system displays up to 64 characters.\\ If the ‘-N’ option is not specified, the system assigns a default job name. |
|-M e-mail address |Set the e-mail address that receives the mail.\\ To receive e-mail, the ‘-m’ option is also required. |
|-m mail point |Specify the set of events that cause mail to be sent to the users listed in the ‘-M’ option.\\ Mail is sent only when the ‘-m’ option is specified. |
|-l walltime=walltime |Specify the walltime limit.\\ If this option is omitted, the default walltime of the queue is used (see [[user_manual:supercomputer:queues_list|List of queues]]).\\ Specifying an appropriate value makes it easier for a queued job to start running. 
|
|-l license name=number of use licenses |Specify the number of licenses when you use applications that require managed licenses.\\ If this option is omitted, the job is assumed not to use any application with managed licenses.\\ For how to specify licenses, see how to execute applications. |

\\
**Example**\\
Submit the following jobs.\\
\\
・queue: DP_002, number of nodes: 2, limit of walltime: 10 minutes, script file: hello.sh

  $ qsub -q DP_002 -l select=2 -l walltime=00:10:00 hello.sh

\\
Description in the execution script.

  #!/bin/sh
  #PBS -q DP_002
  #PBS -l select=2
  #PBS -l walltime=00:10:00
  ...

\\
・queue: P_016, script file: hello.sh, recipient address: userA@test.com\\
Mail is sent when the job begins execution (option is ‘-m b’) and terminates (option is ‘-m e’).

  $ qsub -q P_016 -M userA@test.com -m be hello.sh

\\
Description in the execution script.

  #!/bin/sh
  #PBS -q P_016
  #PBS -M userA@test.com
  #PBS -m be
  ...

\\
・queue: DP_002, application: QuantumATK, script file: atk.sh

  $ qsub -q DP_002 -l atk=1 -l atkdp=35 atk.sh

\\
Description in the execution script.

  #!/bin/sh
  #PBS -q DP_002
  #PBS -l atk=1 -l atkdp=35
  ...

------

==== aprun : Program execution command ====
\\
Always use the aprun command to execute a program.\\
A program executed without aprun runs on the I/O node and disturbs other users.
\\
**Usage**\\

  $ aprun [-n MPI total tasks] [-d OpenMP threads] [-N MPI tasks per node] [-S MPI tasks per CPU socket] [-j 0|1|N] [--cc placement method] program

\\
**List of options**\\
^Option ^Description ^
|-n MPI total tasks |Set the total number of MPI tasks. |
|-d OpenMP threads |Set the number of OpenMP threads.\\ (Also set OMP_NUM_THREADS.) |
|-N MPI tasks per node |Set the number of MPI tasks per node. |
|-S MPI tasks per CPU socket |Set the number of MPI tasks per CPU socket. 
|
|-j 0%%|%%1%%|%%N |Set the number of threads per CPU core.\\ 0: Use HyperThreading (default)\\ 1: Do not use HyperThreading\\ N: Use HyperThreading and place N threads per core. |
|--cc placement method |Set the task / thread placement method.\\ depth: Bind the threads so that the process is close to the allocated CPU core\\ (useful when executing OpenMP, MPI + OpenMP programs). |

*To improve job performance, specify values so that:\\
[MPI total tasks (the value of "-n")] = [Number of nodes (the value of "#PBS -l select=")] × [MPI tasks per node (the value of "-N")]

------

==== Format of an execution script ====
\\
This section describes the format of the execution script files used to run programs on the supercomputer.\\
To run an application that requires an execution script file, create the file in advance.\\
The /work area has better I/O performance than the /home area, so copy your data to the /work area, run the program there, and move the results back to the source directory, as in the following examples.\\
\\
**Execute a non MPI program**\\

  #!/bin/sh
  #PBS -l select=1
  #PBS -q queue
  #PBS -N jobname
  # Copy the job input directory to /work area and move there
  DIRNAME=`basename $PBS_O_WORKDIR`
  WORKDIR=/work/$USER/$PBS_JOBID
  mkdir -p $WORKDIR
  cp -raf $PBS_O_WORKDIR $WORKDIR
  cd $WORKDIR/$DIRNAME
  # Execute your program
  aprun program > output file 2> error file
  # Move the result to the source directory after execution
  cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi

\\
・Example: execute the program ‘a.out’.

  #!/bin/sh
  #PBS -l select=1
  #PBS -q P_016
  #PBS -N sample
  DIRNAME=`basename $PBS_O_WORKDIR`
  WORKDIR=/work/$USER/$PBS_JOBID
  mkdir -p $WORKDIR
  cp -raf $PBS_O_WORKDIR $WORKDIR
  cd $WORKDIR/$DIRNAME
  aprun ./a.out > result.out 2> result.err
  cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. 
  ; then rm -rf $WORKDIR; fi

\\
**Execute an MPI program using ESM mode of Cray XC**\\

  #!/bin/sh
  #PBS -l select=nodes
  #PBS -q queue
  #PBS -N jobname
  # Copy the job input directory to /work area and move there
  DIRNAME=`basename $PBS_O_WORKDIR`
  WORKDIR=/work/$USER/$PBS_JOBID
  mkdir -p $WORKDIR
  cp -raf $PBS_O_WORKDIR $WORKDIR
  cd $WORKDIR/$DIRNAME
  # Execute your program
  aprun [ -n MPI total tasks ] [ -N MPI tasks per node ] program > output file 2> error file
  # Move the result to the source directory after execution
  cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi

\\
・Example: run a program on 1 node with 2 MPI processes.

  #!/bin/sh
  #PBS -l select=1
  #PBS -q P_016
  #PBS -N mpi1
  DIRNAME=`basename $PBS_O_WORKDIR`
  WORKDIR=/work/$USER/$PBS_JOBID
  mkdir -p $WORKDIR
  cp -raf $PBS_O_WORKDIR $WORKDIR
  cd $WORKDIR/$DIRNAME
  aprun -n 2 -N 2 ./a.out > result.out 2> result.err
  cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi

\\
・Example: run a program on 2 nodes with 2 MPI processes.

  #!/bin/sh
  #PBS -l select=2
  #PBS -q P_016
  #PBS -N mpi2
  DIRNAME=`basename $PBS_O_WORKDIR`
  WORKDIR=/work/$USER/$PBS_JOBID
  mkdir -p $WORKDIR
  cp -raf $PBS_O_WORKDIR $WORKDIR
  cd $WORKDIR/$DIRNAME
  aprun -n 2 -N 1 ./a.out > result.out 2> result.err
  cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi

------

==== Interactive mode ====
\\
Submit a job in interactive mode.\\
Add the -I option (uppercase i) to the qsub command and specify IP_001 for the queue.
Be sure to use the aprun command to execute a program.
**Usage**\\

  $ qsub -I -q IP_001

\\
**Example**\\

  $ qsub -I -q IP_001
  qsub: waiting for job 220331.sdb to start
  qsub: job 220331.sdb ready
  Directory: /home/userA
  Mon Sep 23 01:03:04 JST 2019
  userA@mom1:~> cd $PBS_O_WORKDIR
  userA@mom1:/work/userA/testdir> aprun -n 32 -j 1 ./a.out

\\
===== Job management commands =====

==== statj : Display your own job information ====
\\
Display information about your own jobs on the supercomputer.\\
\\
**Usage**\\

  $ statj [-x] [ [job_identifier | destination] ...]

\\
**List of options**\\
^Option ^Value ^
|-x |Display the status of finished, queued, and running jobs. |

\\
**Example**\\

  userA@super2:~> statj
                                                              Req'd  Req'd   Elap
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  3413.sdb        userA    P_016    STDIN      231503   1  36  690gb 24:00 R 00:00

------

==== qstat : Display job information ====
\\
Display information about the jobs on the supercomputer.\\
\\
**Usage**\\

  Default format:
  qstat [-a] [-p] [-J] [-t] [-x] [ [job_identifier | destination] ...]
  Long format:
  qstat -f [-p] [-J] [-t] [-x] [ [job_identifier | destination] ...]

\\
**List of options**\\
^Option ^Value ^
|-a |Display memory usage, elapsed time, job status, etc. |
|-p |Display the percentage of job completion. |
|-J |Limit the status display to job arrays. |
|-t |Display the status of jobs. |
|-x |Display the status of finished, queued, and running jobs. |
|-f |Display status in long format. 
|

\\
**Example**\\

  userA@super2:~> qstat -a
                                                              Req'd  Req'd   Elap
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  3390.sdb        userA    P_016    abinit     193347   4 144 2760gb 72:00 R 47:28
  3401.sdb        userA    P_016    prog9_1    121974   4 144 2760gb 72:00 R 47:26
  userA@super2:~> qstat -p
  Job id           Name             User             % done   S Queue
  ---------------- ---------------- ---------------- -------- - -----
  3390.sdb         abinit           userA                   2 R P_016
  3401.sdb         prog9_1          userA                   0 R P_016
  userA@super2:~> qstat -t
  Job id           Name             User             Time Use S Queue
  ---------------- ---------------- ---------------- -------- - -----
  3390.sdb         abinit           userA            00:00:01 R P_016
  3401.sdb         prog9_1          userA            00:00:01 R P_016
  userA@super2:~> qstat -x
  Job id           Name             User             Time Use S Queue
  ---------------- ---------------- ---------------- -------- - -----
  2235.sdb         prog9_2          userA            00:00:03 F P_016
  2236.sdb         vasp4            userA            00:00:01 F P_016
  2237.sdb         prog9_1          userA            00:00:01 F P_016
  ...
  userA@super2:~> qstat -f 3390.sdb
  Job Id: 3390.sdb
      Job_Name = abinit
      Job_Owner = userA@nid00204
      resources_used.cpupercent = 10
      resources_used.cput = 00:00:01
      resources_used.mem = 12836kb
      resources_used.ncpus = 72
      ...

------

==== statq : Display queue status ====
\\
Display information about queues on the supercomputer.\\
\\
**Usage**\\

  Default format:
  statq [destination ...]
  Long format:
  statq -f [destination ...]

\\
**List of options**\\
^Option ^Value ^
|-f |Display status in long format. |

\\
**Example**\\

  userA@super2:~> statq
  Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
  ---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
  workq                0     0  no yes     0     0     0     0     0     0 Exec
  DP_002               0     0 yes yes     0     0     0     0     0     0 Exec
  P_016                0     1 yes yes     0     1     0     0     0     0 Exec
  P_032                0     0 yes yes     0     0     0     0     0     0 Exec
  P_064                0     0 yes yes     0     0     0     0     0     0 Exec
  LP_032               0     0 yes yes     0     0     0     0     0     0 Exec
  LP_064               0     0 yes yes     0     0     0     0     0     0 Exec
  ...
  userA@super2:~> statq -f
  Queue: workq
      queue_type = Execution
      total_jobs = 0
      state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
      enabled = False
      started = True
      ...

------

==== qstat -B : Display server status ====
\\
Display information about the servers of the supercomputer.\\
\\
**Usage**\\

  Default format:
  qstat -B [destination ...]
  Long format:
  qstat -B -f [destination ...]

\\
**List of options**\\
^Option ^Value ^
|-B |Display server status. |
|-f |Display status in long format. |

**Example**\\

  userA@super2:~> qstat -B
  Server             Max   Tot   Que   Run   Hld   Wat   Trn   Ext Status
  ---------------- ----- ----- ----- ----- ----- ----- ----- ----- -----------
  sdb                  0  1155     0     1     0     0     0     0 Active
  userA@super2:~> qstat -Bf
  Server: sdb
      server_state = Active
      server_host = sdb
      scheduling = True
      max_queued = [u:PBS_GENERIC=200]
      ...

------

==== qdel : Cancel the job before the job finished ====
\\
Use the qdel command to cancel a job on the supercomputer.\\
\\
**Usage**\\

  qdel [ -x ] [ -Wsuppress_email= ] job_identifier [job_identifier ...]

\\
**List of options**\\
^Option ^Value ^
|-x |Delete the specified job and its job history. |
|-Wsuppress_email |Set a limit on the number of e-mails sent when deleting jobs. 
|

**Example**

  userA@super2:~/work/20180712_sample> statj
                                                         Req'd  Req'd   Elap
  Job ID     Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  ---------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  3413.sdb   userA    P_016    abinit       3710   3 216 2304gb 72:00 R 00:00
  3414.sdb   userA    DP_002   STDIN       13588   1  72  768gb 00:10 R 00:00
  userA@super2:~/work/20180712_sample> qdel 3414.sdb
  userA@super2:~/work/20180712_sample> statj
                                                         Req'd  Req'd   Elap
  Job ID     Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  ---------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  3413.sdb   userA    P_016    abinit       3710   3 216 2304gb 72:00 R 00:00

\\
===== Display information about used and remained time of job execution =====

==== jobtime : Display information about used and remained time of job execution ====
\\
**Usage**\\

  $ jobtime

\\
**Information**\\
^ Section ^ Details ^
|Last Updated |Time of the last update |
|User |User ID |
|Total |Total time available for job execution |
|Used |Time already used |
|Remained |Time remaining |

\\
**Example**\\

  userA@super2:~> jobtime
  # Last Updated: 2018/10/01 13:45
  # User       Total     Used  Remained (H)
  username       500   222.32    277.68

\\
===== References for submitting job and script =====

==== Specify the parameters affecting performance of job ====
\\
Because Hyper-Threading is enabled on the supercomputer, job performance can be improved by specifying the number of processes per core with the aprun command.\\
\\
**Usage**\\
Specify 1 thread per physical core.

  aprun -j 1 program

\\
**Example**\\

  #!/bin/bash
  #PBS -j oe
  #PBS -l select=1
  #PBS -q P_016
  DIRNAME=`basename $PBS_O_WORKDIR`
  WORKDIR=/work/$USER/$PBS_JOBID
  mkdir -p $WORKDIR
  cp -raf $PBS_O_WORKDIR $WORKDIR
  cd $WORKDIR/$DIRNAME
  aprun -n 36 -N 36 -j 1 ./xhpl_skl_diag_cray_opt > result.out 2> result.err
  cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi

\\
~~NOCACHE~~
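\\
The rule given in the aprun section, [MPI total tasks] = [Number of nodes] × [MPI tasks per node], can be applied mechanically when writing the aprun line of a script. The following is a minimal sketch and not part of the system: ''build_aprun_cmd'' is a hypothetical helper name, and the values 36 tasks per node and ''-j 1'' are simply taken from the example in this section. Adjust them to the queue and program you actually use.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a system command).
# Prints an aprun command line in which -n = (number of nodes) x (tasks per node).
# Arguments: $1 = number of nodes (the value of "#PBS -l select=")
#            $2 = MPI tasks per node (the value of "-N")
#            $3 = program to run
build_aprun_cmd() {
    nodes=$1
    tasks_per_node=$2
    program=$3
    # Total MPI tasks must equal nodes x tasks per node.
    total=$((nodes * tasks_per_node))
    # -j 1 places one thread per physical core, as in the example in this section.
    echo "aprun -n $total -N $tasks_per_node -j 1 $program"
}

# 1 node with 36 MPI tasks per node.
build_aprun_cmd 1 36 ./a.out
```

The helper only prints the command so that it can be checked before it is pasted into an execution script; with 2 as the first argument the same call yields ''-n 72 -N 36''.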