Applications
From Molecular Modeling Wiki
General Information
Scope of this page
The description of applications on this page is valid for newer clusters starting from lithium, through uranium, barium, thallium, and newer. It is also partially valid for francium and helium, but there may be some differences.
Parallel calculations
Multiprocessor clusters are built to run calculations in parallel. However, running parallel calculations on a cluster requires special care when integrating parallelized applications with a queueing system, and almost every application needs to be configured and tuned individually. To use the cluster efficiently, it is necessary to follow strict rules; please read the application notes below to find the specific requirements of the installed applications.
List
Turbomole 5.10 / 6.1 [1]
Amber 10 / 11 [2]
Molpro 2008 [3]
Molpro 2009 / 2010 [4]
Gaussian 09 [5]
Turbomole 5.10 / 6.1 / 6.2
The TurboMole program is supplied in the form of binary files (no source code), which are compiled and linked with an HP implementation of the MPI libraries; this imposes strict requirements on the environment setup of parallel calculations. TurboMole needs some variables defined and files created to be able to run in parallel, especially when a parallel run across machines is required. The following instructions describe the use of TurboMole 6.2. To use TurboMole 6.1 or 5.10, please modify the paths by replacing all occurrences of 6.2 with 6.1 or 5.10.
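The version substitution described above can be scripted instead of being done by hand. The following is a minimal sketch, assuming that the paths for the other versions differ from the 6.2 paths only in the version number; the function name turbomole_path_for_version is hypothetical and is not part of the cluster installation.

```shell
# Hypothetical helper: rewrite a TurboMole 6.2 path for another installed version.
# Assumption: the 6.1 and 5.10 paths differ from the 6.2 paths only in the version number.
turbomole_path_for_version() {
    path=$1
    version=$2
    # The dot is escaped so that sed matches the literal string "6.2".
    printf '%s\n' "$path" | sed "s/6\.2/$version/g"
}

# Print the 6.1 counterpart of one of the 6.2 sample scripts.
turbomole_path_for_version \
    /usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_mp_scratch 6.1
```

The same call with 5.10 as the second argument gives the path for the 5.10 installation.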
To submit a parallel TurboMole job:
- Copy a sample script
/usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_mp_scratch
or
/usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_mp
to your data directory and rename it. Unless you have a very good reason to select the second option, use the first sample file (dscf_sample_mp_scratch), which runs all calculations in the scratch directory. This variant should be faster and more friendly to the internal cluster network.
- Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
- If your TurboMole control file contains a $scfintunit section, specify the file location as file=/scratch/<username>/<filename>, where <username> is your login name and <filename> is an arbitrary file name. Please note that saving the file in your home directory will most likely lead to a very slow calculation and may even slow down the entire cluster.
- Submit the calculation with
qsub -q <queue_name> -pe hpmpi <n> <scriptname>
where <queue_name> is a queue name, <n> is the number of processors, and <scriptname> is the name of the script you used in step 1.
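The $scfintunit rule from the steps above can be checked before submission. The following is a minimal sketch, not a tool provided on the cluster: the function name is hypothetical, and it assumes that the section starts at the $scfintunit line, ends at the next line beginning with $, and contains the location as a whitespace-separated file=... field.

```shell
# Hypothetical check: print every file= location in the $scfintunit section
# of a TurboMole control file that does not point to /scratch.
# Assumption: the section runs from the "$scfintunit" line to the next line
# that starts with "$", and file=... is a whitespace-separated field.
scfintunit_files_outside_scratch() {
    awk '
        /^\$scfintunit/ { in_section = 1; next }
        /^\$/           { in_section = 0 }
        in_section {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^file=/ && $i !~ /^file=\/scratch\//)
                    print substr($i, 6)   # drop the leading "file="
        }
    ' "$1"
}
```

Running scfintunit_files_outside_scratch control in the job directory prints nothing when every location is under /scratch; any printed name should be changed to file=/scratch/<username>/<filename> before the job is submitted.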
To submit a single-processor TurboMole job:
- Copy a sample script
/usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_sp_scratch
or
/usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_sp
to your data directory and rename it. Unless you have a very good reason to select the second option, use the first sample file (dscf_sample_sp_scratch), which runs all calculations in the scratch directory. This variant should be faster and more friendly to the internal cluster network.
- Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
- If your TurboMole control file contains a $scfintunit section, specify the file location as file=/scratch/<username>/<filename>, where <username> is your login name and <filename> is an arbitrary file name. Please note that saving the file in your home directory will most likely lead to a very slow calculation and may even slow down the entire cluster.
- Submit the calculation with
qsub -q <queue_name> <scriptname>
where <queue_name> is a queue name and <scriptname> is the name of the script you used in step 1.
Note: The older TurboMole versions (before 5.10) use a different system of parallel libraries and thus cannot be run in multiprocessor mode the same way as described above. Please contact us if you need to run an older version in parallel.
Amber 10 / 11
The Amber suite consists of a set of programs and utilities. It is generally impossible to create a universal submission script, so it is the user's responsibility to prepare a script and submit it properly. The following instructions describe the use of Amber 11. To use Amber 10, please modify the paths by replacing all occurrences of 11 with 10.
To submit a parallel Amber job:
- Copy a sample script
/usr/local/programs/common/amber/amber11/sub/amber_sample_mp
to your data directory and rename it.
- Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
- Submit the calculation with
qsub -q <queue_name> -pe mpi_alt <n> <scriptname>
where <queue_name> is a queue name, <n> is the number of processors, and <scriptname> is the name of the script you used in step 1.
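The steps above can be sketched as a dry run that prints the commands instead of executing them, so that they can be reviewed before anything is copied or submitted. The function name amber_mp_commands and the job script name my_amber_job are hypothetical; the sample path and the qsub options are the ones given above.

```shell
# Hypothetical dry-run helper: print, rather than execute, the commands for
# staging and submitting a parallel Amber 11 job.
# Arguments: queue name, number of processors, name for the copied script.
amber_mp_commands() {
    queue=$1
    slots=$2
    script=$3
    sample=/usr/local/programs/common/amber/amber11/sub/amber_sample_mp
    printf 'cp %s %s\n' "$sample" "$script"
    printf '# edit the MODIFY HERE part of %s before submitting\n' "$script"
    printf 'qsub -q %s -pe mpi_alt %s %s\n' "$queue" "$slots" "$script"
}

# Placeholder values: replace <queue_name> with a real queue on your cluster.
amber_mp_commands "<queue_name>" 8 my_amber_job
```

Printing the commands first makes it harder to skip the editing step between copying the sample and submitting the job.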
To submit a single-processor Amber job:
- Copy a sample script
/usr/local/programs/common/amber/amber11/sub/amber_sample_sp
to your data directory and rename it.
- Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
- Submit the calculation with
qsub -q <queue_name> <scriptname>
where <queue_name> is a queue name and <scriptname> is the name of the script you used in step 1.
Molpro 2008
The Molpro program (current version 2008.1, patch level 42 as of July 29, 2009) can be run on a single processor or on multiple processors using multiprocessing (based on TCGMSG over MPI), multithreading (using parallelized versions of the BLAS/LAPACK MKL libraries), or a combination of both methods.
A pure MPI multiprocessor calculation creates as many processes as specified during the submission; these processes communicate using MPI messages and, if the cluster architecture permits, they can run on different cluster nodes. If each process contains just one thread (see below), the CPU load created by each process should be close to 100%.
A pure multithreaded run creates just one process, but in the parts where the BLAS/LAPACK routines are parallelized, the process splits into as many threads as specified during submission. The whole process (all threads) must run on a single computer and have access to a single memory pool. For multithreaded calculations with M threads, the CPU load of one process should be close to M*100%.
Both methods may be combined: Molpro can run multiple MPI processes, each of them multithreaded. The optimal multiprocessing/multithreading setup largely depends on the type of calculation (its processor time and memory requirements) and on the cluster hardware and capabilities (number of processors, memory per CPU, possibility of internode parallelization); in most cases, the optimum can only be found by testing various combinations.
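When testing combinations, it may help to list every way a given number of processors can be split into processes and threads. The following is a minimal sketch; the function name list_process_thread_splits is hypothetical, and the list does not take the hardware into account.

```shell
# Hypothetical helper: list every way to split a total CPU count into
# MPI processes times threads per process.
list_process_thread_splits() {
    total=$1
    threads=1
    while [ "$threads" -le "$total" ]; do
        # Only thread counts that divide the total evenly are valid splits.
        if [ $((total % threads)) -eq 0 ]; then
            printf 'processes=%s threads=%s\n' $((total / threads)) "$threads"
        fi
        threads=$((threads + 1))
    done
}

list_process_thread_splits 8
```

Because all threads of one process must run on a single computer, splits whose thread count exceeds the number of processors in one node should be discarded.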
To submit a parallel Molpro job, use the M08_MP script
M08_MP <input> <queue_name> <number_of_cpus> <number_of_threads>
where <input> is the name of the input file, <queue_name> is a queue name, <number_of_cpus> is the total number of processors the calculation will occupy by any parallelization method, and <number_of_threads> is the number of threads per process. As follows from this description, <number_of_cpus> must always be a multiple of <number_of_threads>.
- Example 1: to submit a calculation that runs as MPI multiprocessor on <N> processors (one thread per process) use
M08_MP <input> <queue_name> <N> 1
- Example 2: to submit a calculation that runs with no MPI multiprocessing but multithreaded on <M> threads use
M08_MP <input> <queue_name> <M> <M>
- Example 3: to submit a calculation that runs <N> processes with <M> threads each use
M08_MP <input> <queue_name> <N*M> <M>
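The relation between the last two arguments in the examples above can be checked before calling M08_MP. The following is a minimal sketch; the function name molpro_mpi_processes is hypothetical and is not part of the M08_MP script.

```shell
# Hypothetical pre-submission check for the last two M08_MP arguments.
# Prints the resulting number of MPI processes, or fails if <number_of_cpus>
# is not a multiple of <number_of_threads>.
molpro_mpi_processes() {
    cpus=$1
    threads=$2
    if [ "$threads" -lt 1 ] || [ $((cpus % threads)) -ne 0 ]; then
        echo "number_of_cpus ($cpus) must be a multiple of number_of_threads ($threads)" >&2
        return 1
    fi
    echo $((cpus / threads))
}

molpro_mpi_processes 8 1   # Example 1 with N=8: 8 processes, one thread each
molpro_mpi_processes 4 4   # Example 2 with M=4: 1 process with 4 threads
molpro_mpi_processes 8 2   # Example 3 with N=4, M=2: 4 processes, 2 threads each
```

A call such as molpro_mpi_processes 6 4 returns a non-zero status, which signals that the pair of values should not be passed to M08_MP.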
To submit a single-processor Molpro job, use the M08_SP script
M08_SP <input> <queue_name>
where <input> is the name of the input file and <queue_name> is a queue name.
For the input file syntax and a description of the keywords, see the Molpro User's Manual.
Molpro 2009
The Molpro program (current version 2009.1, patch level 21 as of October 27, 2009) can be run exactly the same way as Molpro 2008 above; the corresponding scripts are M09_SP and M09_MP.
Please note that the syntax of the input file has changed in this version. As explained on the Molpro web page: "... This change will, unfortunately, render many inputs incompatible with 2008.1 and this change will affect almost all input files ..." Please read New features of MOLPRO2009.1 for more information.
Note: On October 27, 2009, the Molpro program was upgraded to patch level 21 and recompiled with the maximum number of basis functions changed from 2000 to 3000.
Molpro 2009 / 2010
The Molpro program (current version 2010.1, patch level 2, as of Sep 20, 2010, and 2009.1, patch level 21, as of Oct 27, 2009) can be run exactly the same way as Molpro 2008 above; the corresponding scripts are M10_SP and M10_MP (or M09_SP and M09_MP for the 2009.1 version).
Please note that the syntax of the input file has changed in the 2009 version. As explained on the Molpro web page: "... This change will, unfortunately, render many inputs incompatible with 2008.1 and this change will affect almost all input files ..." Please read New features of MOLPRO2009.1 for more information.
Gaussian 09
A Gaussian calculation can be run as a single-processor job or as a shared-memory multiprocessor job (within a single node). The number of processors used for the calculation is defined in the Gaussian input file with the %nprocshared directive.
To submit a Gaussian job, run the G09 script
G09 <input> <queue_name>
The script looks for the %nprocshared directive in the input file and submits the job to the queuing system according to its value. If %nprocshared is missing in the input file, or if %nprocshared=1, the job becomes a standard single-processor Gaussian job.
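The lookup described above can be illustrated as follows. This is only a sketch of the idea, not the actual G09 script: the function name gaussian_nprocshared is hypothetical, and it assumes that the directive is written as %nprocshared=<number> on its own line, in any letter case.

```shell
# Hypothetical sketch of the lookup described above; not the actual G09 script.
# Prints the value of %nprocshared from a Gaussian input file, or 1 if the
# directive is missing.
# Assumption: the directive has the form "%nprocshared=<number>" on its own
# line, in any letter case.
gaussian_nprocshared() {
    value=$(awk -F= '
        tolower($1) ~ /^[ \t]*%nprocshared$/ {
            gsub(/[ \t\r]/, "", $2)   # strip whitespace around the value
            print $2
            exit                      # use the first occurrence only
        }
    ' "$1")
    # Fall back to a single processor when the directive is absent.
    echo "${value:-1}"
}
```

For an input file containing the line %nprocshared=4, gaussian_nprocshared <input> prints 4; a result of 1 corresponds to a standard single-processor job.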