Getting started with running FDTD on GPU

The FDTD solver in Ansys Lumerical FDTD™ supports running 3D FDTD simulations on GPU starting in the 2023 R2 release. Calculations using the GPU can significantly speed up simulations. The FDTD solver supports single-GPU, multi-GPU, and multi-node multi-GPU calculations.

This page summarizes requirements and limitations of the current FDTD GPU solver, and points to various pages on specific setup instructions for different GPU applications.

Note: Operations other than the solver, such as meshing and script commands, still use the CPU.

Hardware requirements

GPU calculations with FDTD leverage CUDA 12, which in turn requires specific versions of the Nvidia CUDA driver, as well as a specific Compute Capability version. The driver and Compute Capability requirements are as follows:

Driver version:
- 525.60.13 or later for Linux
- 527.41 or later for Windows
Compute Capability: 5.0 or higher (Maxwell microarchitecture or newer). Consult your GPU's specifications to find its Compute Capability. Drivers for older devices were discontinued in January 2023.

All GPU cards following the requirements above are compatible with GPU simulations in FDTD. For a list of cards that were explicitly tested, refer to the Lumerical section of this document on GPU compute capabilities for all Ansys products.
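
As a rough illustration, the driver and Compute Capability minimums listed above can be checked programmatically. The sketch below is not part of the product. The function names are hypothetical, and `query_gpus` assumes that the installed `nvidia-smi` supports the `compute_cap` query field, which may not be true for older drivers.

```python
import subprocess

# Minimum versions quoted in the hardware requirements above.
MIN_DRIVER = {"linux": (525, 60, 13), "windows": (527, 41)}
MIN_COMPUTE_CAPABILITY = (5, 0)


def parse_version(text):
    """Turn '535.104.05' into (535, 104, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_requirements(driver, compute_cap, platform):
    """Return True if one GPU satisfies both the driver and Compute Capability minimums."""
    driver_ok = parse_version(driver) >= MIN_DRIVER[platform]
    cap_ok = parse_version(compute_cap) >= MIN_COMPUTE_CAPABILITY
    return driver_ok and cap_ok


def query_gpus():
    """Ask nvidia-smi for (name, driver version, compute capability) of every GPU.

    Assumption: this nvidia-smi build accepts the 'compute_cap' query field.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version,compute_cap",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(field.strip() for field in line.split(","))
            for line in out.splitlines() if line.strip()]
```

On a Linux machine, for example, each tuple returned by `query_gpus()` could be passed to `meets_requirements(driver, cap, "linux")`. A passing check only confirms the minimums above; it does not mean the card is on the list of explicitly tested cards.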

Note: When using an Nvidia Blackwell chip with FDTD, the Nvidia Fabric Manager may be required if NVLink is used, even if only one instance is used. Without it, you may encounter errors such as “GPU detected by NVML unrecognized by CUDA12.9 runtime: system not yet initialized.” To solve this issue, follow the official Nvidia documentation or the documentation from your system provider on enabling the Nvidia Fabric Manager. You can also run nvidia-smi -q and examine the CliqueId field to confirm whether your Blackwell GPU is initialized.

Licensing

Business (Standard) and Enterprise licenses all support running FDTD on GPU, except for multi-node multi-GPU computations, which are only available on Business and Enterprise licenses.

The calculation of license consumption for GPU depends on the number of streaming multiprocessors (SMs) in the GPU(s) in the resource. For each GPU, you must use all its SMs.

In addition to the SMs in the GPU(s), the total number of CPU cores also affects the number of required licenses, because operations other than running the solver, such as meshing and script commands, still run on the CPU. The software requires the higher of two license counts: the count determined by the total number of CPU cores and the count determined by the SMs in the GPU. You can set the threads column of the resource to “auto” to use as many threads as allowed by the number of licenses checked out by the GPU.
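
The "higher of the two" rule can be illustrated with a short sketch. This page does not state how core counts or SM counts map to license counts, so the sketch takes that mapping as caller-supplied functions. The names `required_licenses` and `placeholder_rule` are hypothetical, and the numbers passed to `placeholder_rule` are placeholders for illustration only, not Ansys licensing rules. Use the license estimation tools for real figures.

```python
import math


def required_licenses(cpu_cores, gpu_sm_counts, licenses_for_cores, licenses_for_sms):
    """Return the license count for one resource: the larger of the CPU and GPU needs.

    licenses_for_cores and licenses_for_sms are supplied by the caller because
    the real mappings are not given on this page.
    """
    total_sms = sum(gpu_sm_counts)  # all SMs of every GPU in the resource are used
    return max(licenses_for_cores(cpu_cores), licenses_for_sms(total_sms))


def placeholder_rule(units_per_license):
    """PLACEHOLDER mapping for illustration only: one license per block of units."""
    return lambda count: math.ceil(count / units_per_license)


# Illustrative values only; they are not taken from any Ansys licensing table.
cores_rule = placeholder_rule(32)
sms_rule = placeholder_rule(16)
cpu_bound = required_licenses(64, [16], cores_rule, sms_rule)      # CPU count dominates
gpu_bound = required_licenses(8, [40, 40], cores_rule, sms_rule)   # GPU count dominates
```

The point of the sketch is the `max`. A machine with many CPU cores and a small GPU can need more licenses than the GPU alone would suggest.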

For more information on the number of licenses required, follow the corresponding section on GPU Jobs in the Ansys optics solve, accelerator, and Ansys HPC license consumption Knowledge Base article. License sharing applies to GPU in the same way it applies to CPU.

To calculate the number of licenses required, you can use the in-product License estimation utility, or the Ansys HPC licensing calculator. For more information, see the License consumption calculation tools section of the Ansys optics solve, accelerator, and Ansys HPC license consumption Knowledge Base article.

Supported simulation objects and limitations

The FDTD GPU solver only supports 3D FDTD simulations.

In general, features available for the CPU in 3D FDTD simulations are all available for the GPU, except for those with caveats and limitations discussed in this section. In local simulations, you can use the GPU memory check feature to check if all simulation objects are compatible with GPU.

Sources

Total Field Scattered Field (TFSF) source: FDTD GPU does not support TFSF sources and an error is shown if a TFSF source is present in the simulation.
Broadband Fixed Angle Source Technique (BFAST) source: FDTD GPU does not support BFAST sources and an error is shown if BFAST is present in the simulation.

Monitors

Time monitors:
- Time monitors reduce performance on GPU. We suggest using them only for debugging and preliminary simulations.
- Spatial interpolation defaults to ‘none’ when using the GPU. Note: The object properties of time monitors are not changed in analysis mode, for example:
```
setnamed("time monitor", "spatial interpolation", "nearest cell");
run("FDTD", "GPU");
getresult("time monitor", "E"); #< should match having "none" spatial interpolation 
switchtolayout;
?getnamed("time monitor", "spatial interpolation"); #< should still be "nearest cell"
```
Frequency domain monitors:
- Partial and total spectral averaging is not supported
- Time apodization defaults to ‘none’ when using the GPU
  
  Note: Object properties of frequency domain monitors will not be changed under analysis mode, in the same way as time monitors above.
Movie monitors: FDTD GPU does not support movie monitors. If a simulation that contains a movie monitor runs on GPU, no movie is recorded.

Materials

FDTD GPU does not support the following categories of materials, and an error is displayed if they are present in the simulation:

2D standard optical conductivity models: Except for PEC material assigned to a 2D geometry, FDTD GPU does not support 2D standard optical conductivity models.
np density index perturbation: FDTD GPU does not support materials with the np density grid attribute. Temperature-dependent index perturbation is supported.
Other advanced permittivity models: FDTD GPU does not support other advanced permittivity models, except for those with the temperature grid attribute.
Flexible Material Plugin Framework: FDTD GPU does not support materials from the Flexible Material Plugin Framework.

Other

PML Types: FDTD GPU does not support the “Uniaxial anisotropic PML (legacy)” PML type, and an error is shown if a simulation runs with this present.
Checkpoints: Checkpoints in FDTD simulations are not supported on GPU.
Parallel engine options: Parallel engine options in the FDTD Solver simulation object are ignored when using the GPU.
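
The limitations in this section can be collected into a simple pre-flight checklist. The sketch below covers a subset of them. It uses a hypothetical data model in which each simulation object is a plain dictionary, which is not the Lumerical object model or API, and it does not replace the GPU memory check feature mentioned above.

```python
# Hypothetical data model: each simulation object is a plain dict with a
# "type" key. This is NOT the Lumerical object model or API.
BLOCKING = {
    "tfsf source": "TFSF sources are not supported",
    "bfast source": "BFAST sources are not supported",
    "legacy uniaxial pml": "the legacy uniaxial anisotropic PML type is not supported",
}
NON_BLOCKING = {
    "movie monitor": "no movie is recorded on GPU",
    "time monitor": "time monitors reduce GPU performance",
}


def check_gpu_compatibility(objects):
    """Split findings into errors (an error is shown) and notes (the run proceeds)."""
    errors, notes = [], []
    for obj in objects:
        kind = obj["type"]
        name = obj.get("name", kind)
        if kind in BLOCKING:
            errors.append(f"{name}: {BLOCKING[kind]}")
        if kind in NON_BLOCKING:
            notes.append(f"{name}: {NON_BLOCKING[kind]}")
        if kind == "time monitor" and obj.get("spatial interpolation", "none") != "none":
            notes.append(f"{name}: spatial interpolation is treated as 'none'")
        if kind == "frequency monitor" and obj.get("time apodization", "none") != "none":
            notes.append(f"{name}: time apodization is treated as 'none'")
    return errors, notes


example = [
    {"type": "tfsf source", "name": "src"},
    {"type": "time monitor", "name": "t1", "spatial interpolation": "nearest cell"},
    {"type": "frequency monitor", "name": "f1", "time apodization": "full"},
    {"type": "plane wave source", "name": "pw"},
]
errors, notes = check_gpu_compatibility(example)
```

Separating blocking errors from non-blocking notes mirrors the text. Some features stop the simulation with an error, while others are silently ignored or overridden.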

GPU simulation types and limitations

You can run FDTD GPU with single simulations and parameter sweeps locally, with a job scheduler, or using Ansys Cloud Burst Compute™. The GPU solver also supports concurrent parametric computing with the local and remote resources, as well as distributed computing either with multiple GPUs on a single node, or with multi-node multi-GPU distributed computing.

For each type of simulation, there are various caveats and limitations, which are discussed in the points below. The subsequent section, GPU resource pages, provides links to instructional pages on how to run different types of GPU simulations.

The following performance tips and caveats apply to all GPU simulations:

The number of SMs used in each GPU is not configurable. You can check the number of SMs using Nvidia’s documentation, or third-party websites. For your local computer, you can also use one of several methods shown in the Check GPU streaming multiprocessor count page. As with CPU, the overall memory bandwidth is more important for performance than the number of SMs (see FDTD benchmark on CPU).
For multiple jobs running on a resource with a single GPU, it is recommended to run them in series rather than in parallel. Running in parallel requires as many licenses as the number of jobs, but takes about the same time as running in series. For example, running two jobs simultaneously on a GPU with 16 SMs or fewer requires two licenses, but running one job after the other on the same machine takes approximately the same time and requires only one license.
You cannot use GPU and CPU solvers simultaneously.
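
The series-versus-parallel recommendation above can be checked with simple arithmetic. The sketch below is a simplification. It assumes that concurrent jobs share one GPU evenly, so that the total wall time roughly equals the back-to-back time, and that each job alone needs one license, as in the 16-SM example. The function name and the per-job duration are hypothetical.

```python
def compare_schedules(n_jobs, minutes_per_job, licenses_per_job=1):
    """Compare license use and approximate wall time on a single GPU.

    Assumption: concurrent jobs share the GPU, so the total time is about
    the same as running the jobs one after the other.
    """
    total_minutes = n_jobs * minutes_per_job
    return {
        "series": {"licenses": licenses_per_job, "minutes": total_minutes},
        "parallel": {"licenses": n_jobs * licenses_per_job, "minutes": total_minutes},
    }


# Two jobs of an illustrative 30 minutes each on one GPU.
plan = compare_schedules(n_jobs=2, minutes_per_job=30)
```

Under these assumptions, both schedules finish at about the same time, but the parallel schedule holds twice as many licenses.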

The following caveats and limitations apply to running simulations that use multiple GPUs on either the local computer or a single remote node:

The solver supports running with multiple GPUs on a single node for both Windows and Linux. By default, the solver uses all available GPUs, but you can specify the GPUs to use with resource configuration.

The following caveats and limitations apply to running simulations that are distributed between multiple GPUs on multiple nodes (multi-node multi-GPU):

Multi-node multi-GPU is only available with Enterprise and Business licensing.
Multi-node multi-GPU is supported with all MPI implementations, but a CUDA-aware MPI is required to maximize performance. Further configuration information is in the Resource configuration for multi-node multi-GPU simulations article.
When using the license estimation utility, ensure that the total number of SMs across all nodes is entered into the SM estimate box.

The following caveats and limitations apply to concurrent computing, for both the local computer and remote nodes:

Concurrent jobs, each assigned to their own GPU, run at full speed. Concurrent jobs assigned to the same GPU may suffer from poor performance.
If there are multiple GPUs on a remote machine, you can select a particular GPU by setting the CUDA_VISIBLE_DEVICES environment variable for your MPI.
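
As a minimal sketch of the GPU selection described above, the CUDA_VISIBLE_DEVICES variable can be set in the environment of a launched process or forwarded through the MPI launcher. The helper names below are hypothetical, and the solver command line is deliberately omitted because it depends on your installation. The `-x` (Open MPI) and `-genv` (MPICH, Intel MPI) options are the launchers' own mechanisms for passing environment variables. Confirm them against the documentation of your MPI version.

```python
import os


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def mpi_env_args(launcher, gpu_index):
    """Return launcher arguments that forward CUDA_VISIBLE_DEVICES to the ranks."""
    value = str(gpu_index)
    if launcher == "openmpi":
        return ["-x", f"CUDA_VISIBLE_DEVICES={value}"]
    if launcher in ("mpich", "intelmpi"):
        return ["-genv", "CUDA_VISIBLE_DEVICES", value]
    raise ValueError(f"unknown launcher: {launcher}")


# Example: arguments to place between the launcher and the solver command.
args_for_second_gpu = mpi_env_args("openmpi", 1)
```

The environment from `env_for_gpu` can be passed as the `env` argument of `subprocess.run` when a job is started from Python. Device indices start at 0, so index 1 selects the second GPU.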

GPU resource pages