Overview
This article outlines the process of running a large simulation job across several computers on a local network or cluster when the job requires more memory than is available on any single computer or node of your cluster.
Important
- MPI is required.
- This feature is supported for FDTD, varFDTD, and EME (in MODE) starting with the 2024 R1 release.
- Distributed computing with the GPU solver for FDTD is not supported. For more information on using GPUs for FDTD simulations, see the Getting started with running FDTD on GPU Knowledge Base article.
- All machines should run the same operating system and the same version of Lumerical.
- Configure your cluster or machines as shown on this page.
- Ensure that simulations run on each computer individually (see the single-machine check after this list).
- On a "local office network", this might not speed up your simulation due to the network's bandwidth.
Run from the Lumerical CAD
- Windows: Configure Intel MPI as shown in this KB.
- Linux: See this page if you haven't installed MPI on Linux. Intel MPI is used in the example. Set the MPI environment on Linux:
source /opt/intel/impi/2019.9.304/intel64/bin/mpivars.sh
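As a quick check (not part of the official setup), you can verify that the environment took effect and that Intel MPI's tools are now on your PATH:
which mpirun
mpirun -V    # should report the Intel MPI Library version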
Resource Configuration
1. Create or open your simulation project in the Lumerical CAD/GUI.
2. Open "Resources".
3. Change the name of the resource for reference.
4. "Edit" the resource.
5. Select "Remote: Intel MPI" (2024 and newer) or "Custom" (2023 R2 and older) as the job launching preset.
6. Enter the paths to the MPI and FDTD engine executables in their corresponding fields.
7. Check "no default options".
8. Enter the total number of MPI processes (-n), processes per node (-ppn), and the hostnames/IPs of the computers (-hosts, comma separated) in the "extra command line options" field under "MPI options" (see the example below).
9. Save to apply your changes.
Note: "Run Tests" will not work with the manual MPI settings (no default options).
Figure: Distributed computing using -hosts list
Using a machine file
- Create a 'machine file' containing the host names of the computers that will run the simulation and the number of ranks per host, separated by a ":" (colon).
#example of host/machine file contents
localhost:4
remote2:8
node3:4
- Follow Steps 1 to 7 above.
- Enter the "-machine" argument followed by the machinefile's path and filename in the "extra command line options", under "MPI options".
- Save and apply settings.
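With the example machine file above, MPI places 4 ranks on localhost, 8 on remote2, and 4 on node3, for 4 + 8 + 4 = 16 ranks in total. If the file is saved as /path/machinefile.txt (a placeholder path), the field would contain:
-machine /path/machinefile.txt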
Figure: Distributed computing using machine file for different MPI processes per host
Run from Terminal
- Open Terminal.
- Change the directory to the location of your simulation file.
- Set the MPI environment. We are using Intel MPI in the example.
- Run the MPI runtime executable with the total number of MPI processes (-n), processes per node (-ppn), and the remote machines (-hosts), followed by the FDTD engine executable.
cd /path/to/simulationfile/
source /opt/intel/impi/2019.9.304/intel64/bin/mpivars.sh
mpirun -n 16 -ppn 4 -hosts localhost,host2,host3,host4 /opt/lumerical/[[verpath]]/bin/fdtd-engine-impi-lcl -t 1 simulationfile.fsp
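In this example, -n 16 -ppn 4 starts 16 processes in total, 4 on each of the four listed hosts, and -t 1 runs each engine process with a single thread.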
Using a machine file
- When using a machine file, enter the "-machine" argument followed by the machine file's path and filename, then the FDTD engine executable.
cd /path/to/simulationfile/
source /opt/intel/impi/2019.9.304/intel64/bin/mpivars.sh
mpirun -machine /path/machinefile.txt /opt/lumerical/[[verpath]]/bin/fdtd-engine-impi-lcl -t 1 simulationfile.fsp
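With the example machine file shown earlier, the ":N" entries determine placement: 4 ranks on localhost, 8 on remote2, and 4 on node3, for 16 in total.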
See Controlling Process Placement with the Intel® MPI Library for more information.
Calculating how many solver licenses are required
The number of solver/accelerator/HPC licenses required to run your jobs depends on the number of cores used, the number of concurrent simulations, the type of job, and the type of license you have purchased. See the Understanding solver, accelerator, and HPC license consumption page for details.
Issues with MPI
Please refer to the Open MPI documentation or Intel MPI support for issues related to running simulation jobs with MPI.
See also
Compute resource configuration use cases – Ansys Optics
Resource configuration elements and controls – Ansys Optics
Lumerical solve, accelerator and Ansys HPC license consumption – Ansys Optics
Configuring resources for parallel jobs across several computers on Windows – Ansys Optics