MyCluster

MyCluster from Zenotech is used to set up, run, monitor and manage parallel jobs on a range of high performance computing environments without the user having to know the details of the queueing systems and job scheduler commands.

MyCluster is distributed as part of zCFD, and is automatically available from within the zCFD command line environment. Alternatively, MyCluster can be downloaded and installed separately.

To use MyCluster within the zCFD command line environment, you first need to configure it with your user details. This is so that MyCluster can keep track of your jobs, and email you with alerts. This step only needs to be done once per user.

First enter your details:

(zCFD): mycluster --firstname <Your First Name>
(zCFD): mycluster --lastname <Your Last Name>
(zCFD): mycluster --email <Your Email Address>

MyCluster generates and uses job files for zCFD (and other software packages) for each different computer system. To set up a job file for zCFD, enter (for example):

(zCFD): mycluster --create=my-job.job \
                  --jobname=my-job \
                  --project=my-project \
                  --jobqueue=my-job-queue \
                  --ntasks=12 \
                  --taskpernode=2 \
                  --script=mycluster-zcfd.bsh \
                  --maxtime=12

This will create a job file called my-job.job.

The job file can be given any name you like, but avoid spaces and other special characters in the filename. The project is a billing code used on the system; if you do not know what this code should be, ask your system administrator.

The jobqueue is the name of the queue to submit to on the local job scheduler (for example SLURM, Moab or PBS). To find out which job scheduler is running on your system, type:

(zCFD): mycluster -q

This will also provide a list of job queues, and show how much resource is currently available on each.

The ntasks setting is the total number of processors (sockets) that you want to use. For zCFD this will be equal to the number of mesh partitions that are generated. The number of tasks per node should match the number of compute devices per node (CPU sockets, Xeon Phi devices or Nvidia GPU devices). Usually there are two sockets per node, so set taskpernode to 2; your system administrator should advise if the local setup differs from this.
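
As an illustration (the job name, queue name and device count here are hypothetical), a 16-partition case on a cluster with four GPU devices per node would use ntasks=16 and taskpernode=4, occupying four nodes:

(zCFD): mycluster --create=gpu-job.job \
                  --jobname=gpu-job \
                  --project=my-project \
                  --jobqueue=my-gpu-queue \
                  --ntasks=16 \
                  --taskpernode=4 \
                  --script=mycluster-zcfd.bsh \
                  --maxtime=12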

The script is software specific. The mycluster-zcfd.bsh script is included (along with scripts for a number of other CFD codes) in the MyCluster distribution in the share/ directory, so you do not need to alter the script setting if you are running zCFD.

The maxtime setting is used to prevent failed jobs from over-running and to allow schedulers to prioritise short jobs where they can be accommodated between larger jobs. The default setting is 12 hours (maxtime=12), but any amount of time in hours (up to the local queue maximum) can be specified. The actual limit will be shown in the job file, and it is worth checking whether this is adequate.

Once created, the job file can be re-used for similar simulation tasks on the same computer. If any of the parameters change (size of job, project, etc.) then you will need to create a new job file.
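
As a rough sketch only, on a SLURM-based system the resource requests recorded in the job file correspond to the options above along these lines (the exact file generated by MyCluster will contain additional setup and differ in detail):

#!/bin/bash
# Illustrative SLURM directives only; not the exact output of mycluster --create
#SBATCH --job-name=my-job        # from --jobname
#SBATCH --account=my-project     # from --project (billing code)
#SBATCH --ntasks=12              # from --ntasks (mesh partitions)
#SBATCH --ntasks-per-node=2      # from --taskpernode
#SBATCH --time=12:00:00          # from --maxtime (hours)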

To run zCFD using MyCluster, make sure that the job file is in the directory containing the mesh and input files, change to that directory and enter (for example):

(zCFD): (export PROBLEM=my-mesh.h5;\
         export CASE=my-case;\
         mycluster --submit=my-job.job)

where my-mesh.h5 is the name of the HDF5 mesh file and my-case.py is the name of the Python input parameters file (CASE is set to the case name without the .py extension). This command will submit the job to the queue.
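
The same job file can be reused to submit a different set of input parameters against the same mesh by changing the CASE variable (my-other-case.py here is a hypothetical second parameters file):

(zCFD): (export PROBLEM=my-mesh.h5;\
         export CASE=my-other-case;\
         mycluster --submit=my-job.job)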

To track the progress of your jobs via MyCluster, enter:

(zCFD): mycluster -p

To delete a job from the queue, check the Job ID using mycluster -p and then enter:

(zCFD): mycluster -d <Job ID>

A full list of MyCluster commands can be obtained with:

(zCFD): mycluster --help