Applications

From Molecular Modeling Wiki

Revision as of 09:32, 29 September 2010 by Admin

General Information

Scope of this page

The application notes on this page are valid for the newer clusters, starting with lithium and including uranium, barium, thallium, and any newer clusters. They are also partially valid for francium and helium, but some details may differ there.

Parallel calculations

Multiprocessor clusters are built to run calculations in parallel. However, running parallel calculations on a cluster requires special care when integrating parallelized applications with the queueing system, and almost every application has to be configured and tuned individually. Strict rules must be followed to use the cluster efficiently; please read the application notes below for the specific requirements of each installed application.

List

Turbomole 5.10 / 6.1 / 6.2
Amber 10 / 11
Molpro 2008
Molpro 2009
Gaussian 09

Turbomole 5.10 / 6.1 / 6.2

The TurboMole program is supplied as binary files only (no source code), compiled and linked with the HP implementation of the MPI libraries, which imposes strict requirements on the environment setup for parallel calculations. TurboMole needs certain variables to be defined and certain files to be created before it can run in parallel, especially when a parallel run spans several machines. The following instructions describe the use of TurboMole 6.2. To use TurboMole 6.1 or 5.10, modify the paths by replacing all occurrences of 6.2 with 6.1 or 5.10.
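The version substitution described above can be expressed as a small shell function. This is a minimal sketch: tm_sample_path is a hypothetical helper, not part of the cluster installation, and it assumes that every supported version follows the same directory layout as 6.2, as the instructions above state.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not installed on the cluster): print the path of a
# TurboMole sample submission script for one of the versions on this page.
# Usage: tm_sample_path <version> <sample_name>
tm_sample_path() {
    local version="$1" sample="$2"
    case "$version" in
        5.10|6.1|6.2) ;;  # versions documented on this page
        *) echo "unsupported TurboMole version: $version" >&2; return 1 ;;
    esac
    # Same layout as for 6.2, with only the version number replaced
    echo "/usr/local/programs/common/turbomole/turbomole-${version}/sub/${sample}"
}

# Example (step 1 for TurboMole 6.1; the target file name is arbitrary):
# cp "$(tm_sample_path 6.1 dscf_sample_mp_scratch)" ./my_dscf_job
```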

To submit a parallel TurboMole job:

  1. Copy a sample script
    /usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_mp_scratch
    or
    /usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_mp
    to your data directory and rename it. Unless you have a very good reason to choose the second option, use the first sample file (dscf_sample_mp_scratch), which runs all calculations in the scratch directory. This variant should be faster and friendlier to the internal cluster network.
  2. Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
  3. If your TurboMole control file contains a $scfintunit section, specify the file location as file=/scratch/<username>/<filename>, where <username> is your login name and <filename> is an arbitrary file name. Saving this file into your home directory will most likely make the calculation very slow and may even slow down the entire cluster.
  4. Submit the calculation with
    qsub -q <queue_name> -pe hpmpi <n> <scriptname>
    where <queue_name> is a queue name, <n> is the number of processors, and <scriptname> is the name of the script you used in step 1.
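Step 3 is easy to get wrong, so it may help to check the control file before submitting. The sketch below is a hypothetical helper (check_scfintunit is not provided on the cluster). It assumes the usual control file layout, in which $scfintunit is followed by lines such as " unit=30 size=0 file=twoint" until the next $ keyword, and it reports every file= entry that does not point into /scratch/<username>/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit non-zero if any file= entry in the $scfintunit
# section of a TurboMole control file is not under /scratch/<username>/.
# Usage: check_scfintunit [control_file] [username]
check_scfintunit() {
    local control="${1:-control}" user="${2:-$USER}"
    awk -v prefix="/scratch/${user}/" '
        # Any line starting with "$" opens a new section; track whether it is $scfintunit
        /^\$/ { insec = ($1 == "$scfintunit"); next }
        insec {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^file=/) {
                    path = substr($i, 6)          # text after "file="
                    if (index(path, prefix) != 1) {
                        print "not on scratch: " path
                        bad = 1
                    }
                }
            }
        }
        END { exit bad }
    ' "$control"
}

# Example: check_scfintunit control "$USER" && echo "scratch setting looks fine"
```

The same check applies unchanged to the single-processor procedure below.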


Note: The parallel environment for TurboMole (the parameter you enter after the -pe option of qsub) is hpmpi, not mpi. Parallel environments must be used exactly as described here; they may not be chosen freely.


Note: To run the NumForce script, please see the numforce_sample_mp sample submission script in the directory noted above. As NumForce is not a native MPI parallel application, it needs special treatment to run many calculations in pseudo-parallel mode; most notably, the -mfile <file> parameter must be specified on the command line. See the sample script for details.
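The sample script remains the authoritative reference for the -mfile handling. As a hedged illustration only: in a Grid Engine job, the $PE_HOSTFILE variable names a file whose lines contain a host name, a slot count, a queue and a processor range, and a host list with one line per allocated slot can be derived from it. The helper name make_machine_file is hypothetical, and the one-host-per-slot layout is an assumption about what NumForce expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one host name per allocated slot, derived from a
# Grid Engine host file whose lines are "<host> <slots> <queue> <processors>".
# Usage: make_machine_file <pe_hostfile> <output_file>
make_machine_file() {
    local hostfile="$1" out="$2"
    # Repeat each host name (field 1) as many times as its slot count (field 2)
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$hostfile" > "$out"
}

# Inside a job script this might be called as
#   make_machine_file "$PE_HOSTFILE" "$TMPDIR/machines"
# and the result passed to NumForce with -mfile "$TMPDIR/machines"
# (take the remaining NumForce options from numforce_sample_mp).
```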

To submit a single-processor TurboMole job:

  1. Copy a sample script
    /usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_sp_scratch
    or
    /usr/local/programs/common/turbomole/turbomole-6.2/sub/dscf_sample_sp
    to your data directory and rename it. Unless you have a very good reason to choose the second option, use the first sample file (dscf_sample_sp_scratch), which runs all calculations in the scratch directory. This variant should be faster and friendlier to the internal cluster network.
  2. Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
  3. If your TurboMole control file contains a $scfintunit section, specify the file location as file=/scratch/<username>/<filename>, where <username> is your login name and <filename> is an arbitrary file name. Saving this file into your home directory will most likely make the calculation very slow and may even slow down the entire cluster.
  4. Submit the calculation with
    qsub -q <queue_name> <scriptname>
    where <queue_name> is a queue name and <scriptname> is the name of the script you used in step 1.


Note: TurboMole versions older than 5.10 use a different system of parallel libraries and thus cannot be run in multiprocessor mode the way described above. Please contact us if you need to run an older version in parallel.


Amber 10 / 11

The Amber suite consists of a set of programs and utilities. It is generally impossible to create a universal submission script, so it is the user's responsibility to prepare a script and submit it properly. The following instructions describe the use of Amber 11. To use Amber 10, modify the paths by replacing all occurrences of 11 with 10.

To submit a parallel Amber job:

  1. Copy a sample script
    /usr/local/programs/common/amber/amber11/sub/amber_sample_mp
    to your data directory and rename it.
  2. Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
  3. Submit the calculation with
    qsub -q <queue_name> -pe mpi_alt <n> <scriptname>
    where <queue_name> is a queue name, <n> is a number of processors, and <scriptname> is a name of the script you used in step 1.


Note: The parallel environment for Amber (the parameter you enter after the -pe option of qsub) is mpi_alt, not mpi. Parallel environments must be used exactly as described here; they may not be chosen freely.
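The parallel environments prescribed on this page (hpmpi for TurboMole, mpi_alt for Amber) can be captured in a small dry-run helper that prints, rather than runs, the qsub command for a parallel job. It is a hypothetical sketch: qsub_parallel_cmd is not installed on the cluster, and the queue name long in the example is a placeholder for a real queue name.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not run) the qsub command for a
# parallel job, using the parallel environment this page prescribes.
# Usage: qsub_parallel_cmd <application> <queue_name> <n> <scriptname>
qsub_parallel_cmd() {
    local app="$1" queue="$2" nproc="$3" script="$4" pe
    case "$app" in
        turbomole) pe=hpmpi ;;   # TurboMole: hpmpi, not mpi
        amber)     pe=mpi_alt ;; # Amber: mpi_alt, not mpi
        *) echo "no parallel environment documented here for: $app" >&2; return 1 ;;
    esac
    case "$nproc" in
        ''|*[!0-9]*) echo "invalid processor count: $nproc" >&2; return 1 ;;
    esac
    if [ "$nproc" -lt 1 ]; then
        echo "invalid processor count: $nproc" >&2; return 1
    fi
    echo "qsub -q $queue -pe $pe $nproc $script"
}

# Example: qsub_parallel_cmd amber long 16 my_amber_job
# prints:  qsub -q long -pe mpi_alt 16 my_amber_job
```

Printing the command instead of executing it keeps the sketch safe to try; the printed line can then be checked and run by hand.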


To submit a single-processor Amber job:

  1. Copy a sample script
    /usr/local/programs/common/amber/amber11/sub/amber_sample_sp
    to your data directory and rename it.
  2. Edit the script and modify the part starting with the comment MODIFY HERE; do not modify other parts unless you know what you are doing.
  3. Submit the calculation with
    qsub -q <queue_name> <scriptname>
    where <queue_name> is a queue name and <scriptname> is a name of the script you used in step 1.


Molpro 2008

The Molpro program (current version 2008.1, patch level 42, as of July 29, 2009) can be run on a single processor or on multiple processors using multiprocessing (based on TCGMSG over MPI), multithreading (using the parallelized BLAS/LAPACK routines of the MKL libraries), or a combination of both methods.

A pure MPI multiprocessor calculation creates as many processes as specified at submission; these processes communicate using MPI messages and, if the cluster design permits, they can run on different cluster nodes. If each process runs just one thread (see below), the CPU load of each process should be close to 100%.

A pure multithreaded run creates just one process, but in the parts where the BLAS/LAPACK routines are parallelized, the process splits into as many threads as specified at submission. The whole process (all threads) must run on a single computer with access to a single memory pool. For a multithreaded calculation with M threads, the CPU load of the process should be close to M*100%.

Both methods may be combined: Molpro can run multiple MPI processes, each of them multithreaded. The optimal split between multiprocessing and multithreading depends largely on the calculation (its type and its processor time and memory requirements) and on the cluster hardware and capabilities (number of processors, memory per CPU, possibility of internode parallelization); in most cases the optimum can only be found by testing various combinations.

To submit a parallel Molpro job, use the M08_MP script:

M08_MP <input> <queue_name> <number_of_cpus> <number_of_threads>

where <input> is the name of the input file, <queue_name> is a queue name, <number_of_cpus> is the total number of processors the calculation will occupy by all parallelization methods combined, and <number_of_threads> is the number of threads per process. Consequently, <number_of_cpus> must always be a multiple of <number_of_threads>.

  • Example 1: to submit a calculation that runs as an MPI multiprocessor job on <N> processors (one thread per process), use
    M08_MP <input> <queue_name> <N> 1
  • Example 2: to submit a calculation that runs without MPI multiprocessing, as a single process multithreaded on <M> threads, use
    M08_MP <input> <queue_name> <M> <M>
  • Example 3: to submit a calculation that runs <N> processes with <M> threads each, use
    M08_MP <input> <queue_name> <N*M> <M>
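The arithmetic behind these examples is simply: number of MPI processes = <number_of_cpus> / <number_of_threads>. The hypothetical helper below (it is not part of the M08_MP or M09_MP scripts) checks a pair of arguments before submission and prints the resulting number of MPI processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate <number_of_cpus> and <number_of_threads> for
# M08_MP/M09_MP and print the resulting number of MPI processes.
# Usage: molpro_mpi_processes <number_of_cpus> <number_of_threads>
molpro_mpi_processes() {
    local ncpus="$1" nthreads="$2" v
    for v in "$ncpus" "$nthreads"; do
        case "$v" in
            ''|*[!0-9]*) echo "not a non-negative integer: '$v'" >&2; return 1 ;;
        esac
    done
    if [ "$nthreads" -lt 1 ] || [ "$ncpus" -lt "$nthreads" ]; then
        echo "need 1 <= threads <= cpus" >&2; return 1
    fi
    # The total CPU count must split evenly into processes of nthreads each
    # (10# forces base 10, so values such as 08 are not read as octal)
    if [ $((10#$ncpus % 10#$nthreads)) -ne 0 ]; then
        echo "$ncpus CPUs is not a multiple of $nthreads threads" >&2; return 1
    fi
    echo $((10#$ncpus / 10#$nthreads))
}
```

For example, molpro_mpi_processes 12 4 prints 3, which corresponds to Example 3 with N=3 and M=4, that is M08_MP <input> <queue_name> 12 4.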

To submit a single-processor Molpro job, use the M08_SP script:

M08_SP <input> <queue_name>

where <input> is the name of the input file and <queue_name> is a queue name.

For the input file syntax and a description of the keywords, see the Molpro User's Manual.


Molpro 2009

The Molpro program (current version 2009.1, patch level 21, as of October 27, 2009) can be run exactly the same way as Molpro 2008 above; the corresponding scripts are M09_SP and M09_MP.

Please note that the syntax of the input file has changed in this version. As explained on the Molpro web page: ... This change will, unfortunately, render many inputs incompatible with 2008.1 and this change will affect almost all input files ... Please read New features of MOLPRO2009.1 for more information.

Note: On October 27, 2009, the Molpro program was upgraded to patch level 21 and recompiled with the maximum number of basis functions raised from 2000 to 3000.


Gaussian 09

A Gaussian calculation can be run as a single-processor job or as a shared-memory multiprocessor job (within a single node). The number of processors used for the calculation is set in the Gaussian input file with the %nprocshared directive.

To submit a Gaussian job, use the G09 script:

G09 <input> <queue_name>

The script looks for the %nprocshared directive in the input file and submits the job to the queuing system according to its value. If %nprocshared is missing in the input file, or if %nprocshared=1, the job becomes a standard single-processor Gaussian job.
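The decision described above can be sketched as follows. This is not the G09 script itself: g09_nproc is a hypothetical function that only mimics the described behaviour, and it assumes the directive appears on its own Link 0 line such as %nprocshared=8 (matched case-insensitively). Treating a non-numeric value as a single-processor job is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the installed G09 script): report how many CPUs a
# Gaussian input requests through its %nprocshared Link 0 line.
# Usage: g09_nproc <input>
g09_nproc() {
    local input="$1" n
    # Take the first %nprocshared line (case-insensitive) and keep its value
    n=$(grep -i -m1 '^ *%nprocshared *=' "$input" | cut -d= -f2 | tr -d '[:space:]')
    case "$n" in
        ''|*[!0-9]*) echo 1 ;;  # missing (or, by assumption, non-numeric): single-processor
        *) echo "$n" ;;
    esac
}

# Example: g09_nproc water.com   # a result of 1 means a standard single-processor job
```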


