This page explains how to run a job on the calculation servers of the supercomputing system.
We assume that you can log in to the front-end node of the Large-Scale Parallel Computing Server (super.sc.imr.tohoku.ac.jp) with the ssh command.
Jobs (calculations on the supercomputer) running on MASAMUNE-IMR are controlled by the PBS Professional job scheduler.
To run a job on a server, users need to submit it to the corresponding job queue by running a script or command.
PBS then decides the order, start time, and nodes for the queued jobs and assigns each job to calculation nodes accordingly.
To submit a job to a queue, you need the qsub command and a script file that describes what you want to do.
You can submit a job by running the following command.
$ qsub [-q queue_name] [-l select=the_number_of_nodes] [-N job_name] [-M email_address] [-m specification_of_email_notice] [-l walltime=upper_limit_of_running_time] [-l license_name=the_number_of_license] script_file
These options can also be written in the script file itself. We highly recommend specifying them this way.
A typical script file looks like this.
    #!/bin/sh
    #PBS -l select=the_number_of_nodes
    #PBS -q queue_name
    #PBS -N job_name
    (Write what you want to do here.)
The first line specifies that the script is run as a shell script.
The following lines that start with “#PBS” specify PBS options.
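As a rough sketch of how the remaining qsub options shown above map onto “#PBS” lines, the script below also sets a walltime limit and e-mail notification. The walltime value and the e-mail address are placeholders, not site recommendations, and `-m abe` (mail on abort, begin, and end) is the standard PBS Professional notation. The `JOB_DIR` variable is a hypothetical name introduced only for this example.

```shell
#!/bin/sh
#PBS -q P_016
#PBS -l select=1
#PBS -l walltime=01:00:00
#PBS -N hello
#PBS -M user@example.com
#PBS -m abe

# PBS sets PBS_O_WORKDIR to the directory where qsub was run.
# Falling back to $PWD lets the script also be run by hand for a quick check.
JOB_DIR=${PBS_O_WORKDIR:-$PWD}
echo "Job submitted from: $JOB_DIR"
```

Because the “#PBS” lines are ordinary comments to the shell, running this file directly with `sh` only executes the last two commands.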
Let's make a basic job script and run the job.
Create a script file with the following contents and save it as “hello.sh”.
This script submits a job named “hello” to the P_016 queue.
The job uses 1 node.
    #!/bin/sh
    #PBS -l select=1
    #PBS -q P_016
    #PBS -N hello

    # move entire directory to /work area, and go to the directory.
    DIRNAME=`basename $PBS_O_WORKDIR`
    WORKDIR=/work/$USER/$PBS_JOBID
    mkdir -p $WORKDIR
    cp -raf $PBS_O_WORKDIR $WORKDIR
    cd $WORKDIR/$DIRNAME

    # Run a program.
    # Standard output and error output are redirected to result.out and result.err respectively.
    aprun echo "Hello world!" > result.out 2> result.err

    # After running a program, move the results back to the original directory.
    cd; if cp -raf $WORKDIR/$DIRNAME $PBS_O_WORKDIR/.. ; then rm -rf $WORKDIR; fi
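To see what the staging steps in hello.sh do without submitting anything, the following local dry run may help. It is only a sketch: `SANDBOX` and `WORK_ROOT` are hypothetical names, the values of `PBS_O_WORKDIR` and `PBS_JOBID` are stand-ins for what PBS would set, a plain `echo` replaces `aprun`, and `cp -Rp` is used in place of `cp -raf` for portability.

```shell
#!/bin/sh
# Local dry run of the stage-in / run / stage-out pattern. No PBS or aprun needed.
SANDBOX=$(mktemp -d)                      # hypothetical scratch area
PBS_O_WORKDIR="$SANDBOX/home/hello_job"   # stand-in for the submit directory
PBS_JOBID="123456.sdb"                    # stand-in for the job ID
WORK_ROOT="$SANDBOX/work"                 # stand-in for /work/$USER
mkdir -p "$PBS_O_WORKDIR" "$WORK_ROOT"
echo "input data" > "$PBS_O_WORKDIR/input.txt"

# Stage in: copy the whole submit directory into a per-job work directory.
DIRNAME=$(basename "$PBS_O_WORKDIR")
WORKDIR="$WORK_ROOT/$PBS_JOBID"
mkdir -p "$WORKDIR"
cp -Rp "$PBS_O_WORKDIR" "$WORKDIR"
cd "$WORKDIR/$DIRNAME" || exit 1

# Run: write standard output and standard error to separate files.
echo "Hello world!" > result.out 2> result.err

# Stage out: copy the results back, and delete the work copy only if that succeeded.
cd "$SANDBOX" || exit 1
if cp -Rp "$WORKDIR/$DIRNAME" "$PBS_O_WORKDIR/.."; then
    rm -rf "$WORKDIR"
fi
ls "$PBS_O_WORKDIR"
```

Removing the work directory only after a successful copy means a failed copy-back leaves the results in the work area rather than losing them.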
Then submit the script with the qsub command as follows.
If the submission succeeds, the job ID is printed (123456.sdb in this case).
    $ qsub hello.sh
    123456.sdb
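If you submit jobs from your own scripts, it can be handy to keep the job ID that qsub prints. The sketch below uses the sample ID from this page so that it runs anywhere; on the system you would assign the output of `qsub hello.sh` instead. The variable names `SUBMIT_OUTPUT` and `JOB_NUMBER` are hypothetical.

```shell
#!/bin/sh
# Sample value from this page. On the system: SUBMIT_OUTPUT=$(qsub hello.sh)
SUBMIT_OUTPUT="123456.sdb"

# Strip everything from the first dot onward to get the numeric part of the ID.
JOB_NUMBER=${SUBMIT_OUTPUT%%.*}

# With job name "hello", the scheduler's output files in this example are:
echo "hello.o$JOB_NUMBER"
echo "hello.e$JOB_NUMBER"
```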
You can check the job status with the statj command.
    $ statj
    sdb:
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    123456.sdb      username P_016    hello          --   1  72  768gb 24:00 Q    --
Each column shows the following information.
column | description |
---|---|
Job ID | ID assigned to the job |
Username | user ID |
Queue | queue name |
Jobname | job name |
SessID | session ID |
NDS | the number of occupied nodes |
TSK | the number of occupied CPU cores |
Req'd Memory | requested memory size |
Req'd Time | requested running time |
S | status of the job (Q: queued, R: running, E: exiting, H: held) |
Elap Time | elapsed time |
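The status column can also be read from a script. The sketch below assumes that statj prints rows laid out like the sample above, so that the status letter is the tenth whitespace-separated field; this is an assumption based only on that sample. `job_state` is a hypothetical helper, and the sample text stands in for real statj output (the first header row is left out for brevity).

```shell
#!/bin/sh
# Sample rows copied from this page. On the system, pipe `statj` into job_state instead.
SAMPLE_OUTPUT='Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
123456.sdb      username P_016    hello          --   1  72  768gb 24:00 Q    --'

# job_state (hypothetical): print field 10, the S column, for the given job ID.
job_state() {
    awk -v id="$1" '$1 == id { print $10 }'
}

STATE=$(printf '%s\n' "$SAMPLE_OUTPUT" | job_state 123456.sdb)
case "$STATE" in
    Q) echo "queued" ;;
    R) echo "running" ;;
    E) echo "exiting" ;;
    H) echo "held" ;;
    *) echo "not listed" ;;
esac
```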
When the job is finished, the following files will be saved in the current directory.
result.out result.err hello.o123456 hello.e123456
Files named “{job name}.o{job number}” and “{job name}.e{job number}” are the standard output file and the standard error file of the job, respectively.
These files will be empty because the script redirects standard output and standard error to result.out and result.err.
Let's see the result.
    $ cat result.out
    Hello world!
    Application 5598879 resources: utime ~0s, stime ~1s, Rss ~9980, inblocks ~0, outblocks ~0
Congratulations! Now you can run your program on MASAMUNE-IMR system.
You might want to use other services of MASAMUNE-IMR. The user manual provides more server-specific information.
If you want to use a pre-installed application, see Application list / Usage.