CCMS

Center for Computational Materials Science
Institute for Materials Research,
Tohoku University

FAQ

  1. General
  2. Large-scale parallel computing server
  3. Accelerator server
  4. Parallel computing and informatics server
  5. Application software
  6. Node-time amount monitoring system
  7. Supercomputing system account

General

Q.A-1 How to transfer files between a PC and the supercomputing system?
A.A-1 If you have SFTP client software, you can transfer files by port forwarding. You can get information here.
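For example, the following is a minimal sketch using the OpenSSH command-line tools; the gateway host name (gateway.example) and the local port number (10022) are placeholders, so replace them with the values described in the information linked above.
ssh -f -N -L 10022:gpu.sc.imr.tohoku.ac.jp:22 UID@gateway.example    # forward local port 10022 to the SSH port of the server
sftp -P 10022 UID@localhost    # transfer files through the forwarded port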
Q.A-2 How to check the disk quota and the amount of data in my own home directory?
A.A-2 Execute the following command on the Accelerator server (gpu.sc.imr.tohoku.ac.jp) or Visualization server (vis.sc.imr.tohoku.ac.jp).
lfs quota -p $UID -h /home
Disk quotas for prj UID (pid XXX):
Filesystem  used  quota  limit  grace  files  quota  limit  grace
/home       196k  500G   550G   -      15     0      0      -
used: current disk usage
quota: soft limit of the quota
limit: hard limit of the quota
Q.A-3 How to expand the quota size of my home directory?
A.A-3 Contact us. We will send you an application form for a quota increase.
Q.A-4 When we log into the supercomputing system, the following error message appears.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: UNPROTECTED PRIVATE KEY FILE! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0775 for '/home/UID/.ssh/keys/id_rsa' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/home/UID/.ssh/keys/id_rsa": bad permissions
Permission denied (publickey).
A.A-4 Change the permission by using the following command.
chmod 600 /home/UID/.ssh/keys/id_rsa
Q.A-5 Accessing the internet through a proxy.
A.A-5 Set the proxy server name to proxy.imr.tohoku.ac.jp and the port number to 8080.
Execute the following commands if you set it using environment variables.
export http_proxy=http://proxy.imr.tohoku.ac.jp:8080/
export ftp_proxy=http://proxy.imr.tohoku.ac.jp:8080/
export https_proxy=http://proxy.imr.tohoku.ac.jp:8080/
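After these variables are set, tools that honor the proxy environment variables (for example wget or curl) connect through the proxy; the URL below is only an illustration:
wget https://www.example.com/file.tar.gz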
Q.A-6 How is the order of job execution determined?
A.A-6 Priority is calculated for each user based on the amount of resources used in the past, and jobs are executed starting from the user with the highest priority.
The weight of past resource usage decays over time.
Q.A-7 How to reduce the waiting time?
A.A-7 Backfill*1 is enabled on our system. If you reduce the requested walltime (-l walltime), the waiting time can decrease; see the example after the footnote.
*1: A scheduling optimization that allows jobs to run out of order if they can be executed without delaying the highest-priority job.
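For example, if a job actually finishes in about two hours, requesting a walltime close to that value instead of the queue maximum makes the job a better candidate for backfill (the value below is only an illustration):
#PBS -l walltime=2:00:00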

Large-scale parallel computing server

Q.B-1 During the execution of a job, the following message appears.
forrtl: severe (41): insufficient virtual memory
A.B-1 This error is caused by a shortage of memory. By decreasing the number of processes per node with the -N option of the aprun command, you can effectively increase the memory available to each process, as in the example below.
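For example, the following sketch assumes a job that originally ran 36 processes on a single node; spreading the same 36 processes over two nodes halves the number of processes per node and roughly doubles the memory available to each process (request the additional node in the job script accordingly):
aprun -n 36 -N 36 ./a.out    # original: 36 processes on 1 node
aprun -n 36 -N 18 ./a.out    # modified: 18 processes per node on 2 nodes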
Q.B-2 During the execution, the following error message appears.
apsched: claim exceeds reservation's node-count
A.B-2 This error message appears when the requested computational resources exceed the upper limit. Please specify 72 or fewer processes/threads per node.
Q.B-3 The following error message appears when I run a program interactively.
apsched: request exceeds max alloc
A.B-3 Please run it as below, using the debug queue.
$ qsub -I -q IP_001
$ aprun -n 36 -N 36 -j 1 ./a.out

Accelerator server

Q.C-1 The following error message appears when I submit a job to the debug queue.
ERROR: Submitted job uses the area of Lustre File System.
Please submit the job from GPFS area for performance improvement.
1. Create working directory in GPFS
(e.g. Lustre -> /work, GPFS -> /work_da )
2. Prepare input files in working directory
3. Submit a job from working directory
4. Move the files to your home directory
A.C-1 The working directory for the debug queue is /work_da. Create your own working directory under /work_da and submit jobs from it, for example:
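(A minimal sketch; the directory and script names are arbitrary.)
mkdir -p /work_da/UID/test    # create your own working directory in the GPFS area
cd /work_da/UID/test          # prepare the input files here
qsub run.sh                   # submit the job from this directory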
Q.C-2 The following error message appears when I submit a job.
qsub: request rejected as filter hook 'qsub_filter' encountered an exception. Please inform Admin
A.C-2 One cannot submit jobs to the accelerator server from the large-scale parallel computing server. Submit jobs from gpu.sc.imr.tohoku.ac.jp.

Parallel computing and informatics server

Q.D-1 How to run a program on the parallel computing and informatics server?
A.D-1 One can submit jobs from the accelerator server. Submit jobs from gpu.sc.imr.tohoku.ac.jp and specify #PBS -q C_002; a minimal job script sketch is shown below.
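Only the queue name C_002 is taken from this FAQ; the walltime and program name (./a.out) are placeholders to be adjusted for your own job.
#!/bin/bash
#PBS -q C_002
#PBS -l walltime=1:00:00
cd $PBS_O_WORKDIR    # run in the directory from which the job was submitted
./a.out
Submit the script with qsub on gpu.sc.imr.tohoku.ac.jp.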

Application software

Q.E-1 How to use the Gaussian utilities (formchk, cubegen, etc.) on the Large-scale parallel computing server?
A.E-1 Execute the following command to set up the Gaussian environment.
source /work/app/Gaussian/g16.profile
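After sourcing the profile, the utilities can be run directly; for example (molecule.chk is a placeholder file name):
formchk molecule.chk molecule.fchk    # convert a binary checkpoint file into a formatted checkpoint file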
Q.E-2 Which directory contains the pseudo potential files for VASP?
A.E-2 It is found at /work/app/VASP_potential on the Large-scale parallel computing server.
Q.E-3 Which directory contains the precalculated kernel (vdw_kernel.bindat) for the calculation of vdw-DFT?
A.E-3 It is found at /work/app/VASP_potential on the Large-scale parallel computing server.
Q.E-4 During the execution of VASP, the following error message appears.
M_divide: can not subdivide 36 nodes by X
A.E-4 Specify a divisor of the number of cores in the NPAR tag of the INCAR file, for example:
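For a run using 36 cores, any divisor of 36 (1, 2, 3, 4, 6, 9, 12, 18, 36) is acceptable; the value 6 below is only an illustration.
NPAR = 6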
Q.E-5 During the execution of VASP, the following error message appears.
vdW-DF calculation: either NGZhalf or NGXhalf needs to be used for compilation
A.E-5 Use the executable /work/app/VASP5/current/bin/vasp_std.
Q.E-6 During the execution of VASP, the following error message appears.
ERROR: non collinear calculations require that VASP is compiled without the flag -DNGXhalf and -DNGZhalf
A.E-6 Use the executable /work/app/VASP5/current/bin/vasp_ncl.
Q.E-7 During the execution of VASP on the accelerator server, the following error message appears.
A pointer passed to DEALLOCATE points to an object that cannot be deallocated
A.E-7 Use the executable /usr/local/app/VASP5/vasp.5.4.4_mod/bin/vasp_gpu.

Node-time amount monitoring system

Q.F-1 Is node-time monitoring applied only when using the Large-scale parallel computing server?
A.F-1 No. It is applied to the Large-scale parallel computing server, the accelerator server, and the parallel computing and informatics server.
Q.F-2 How can a user check the assigned and remaining node-time?
A.F-2 You can check them with the jobtime command. You can also check them from the Job Time menu of the Real-time job reference system.
Q.F-3 How is the accumulated node-time calculated?
A.F-3 It is calculated by summing the following formula over all jobs executed on the servers.
[elapsed time] x [number of nodes]
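For example, a job that runs for 3 hours on 4 nodes consumes 3 x 4 = 12 node-hours.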
Q.F-4 What happens to a running job if the assigned node-time is used up during execution?
A.F-4 The job is not terminated and continues to run to completion.
Q.F-5 Is the execution time accumulated when a job is terminated due to a system problem?
A.F-5 It is counted in the accumulated node-time; however, the same amount is added back to the user's assigned node-time, so the net node-time is not affected.
Q.F-6 The following error message appears when a job is submitted.
ERROR: No assigned time remained. Check results by jobtime command.
A.F-6 This error message appears when the assigned node-time has been consumed. Check your node-time with the jobtime command. If the remaining time is 0, no more jobs can be submitted to the servers.
Q.F-7 What can be done when the node-time has been consumed?
A.F-7 The user should consult the subject leader. Only the subject leader can reassign node-time within his/her subject group. You can find the subject leader's email address in the Job Time menu of the Real-time job reference system.
Q.F-8 How can the subject leader check the consumed node-time of his/her subject group?
A.F-8 You can check it from the Subject menu of the Real-time job reference system.
Q.F-9 Is it possible to change the node-time assigned to collaborators in the subject group?
A.F-9 The subject leader can adjust the node-time of each user in his/her own subject group, within the initially assigned total amount, using the Real-time job reference system.
Q.F-10 Is it possible to increase the amount of node-time for a subject?
A.F-10 Please contact us.
Q.F-11 If a member of the subject group will use application software on his/her own PC, how should node-time be assigned to him/her?
A.F-11 Assign 0 node-time to that member.

Supercomputing system account

Q.G-1 How to get access to the supercomputing system?
A.G-1 Please see here for qualifications and procedures.
Q.G-2 What happens to home directories and data after the expiration of my account?
A.G-2 Your home directory and data are deleted when your account expires or when you request termination of the account. Be sure to renew your account by the specified date each year. You are also encouraged to download the data you need each year.