Since the 2023 R2 release, FDTD supports GPU calculations. This page summarizes the requirements and current limitations of the FDTD GPU solver.
To run FDTD simulations on a GPU, you need NVIDIA CUDA driver version 450.80.02 or later on Linux, or version 452.39 or later on Windows. Additionally, your NVIDIA GPU must meet the following requirements:
- The GPU must offer Compute Capability 3.0 or greater (Kepler microarchitecture or newer). Driver support for older devices was discontinued in January 2019.
- Unified Memory must be available and enabled:
  - always enabled on desktop, laptop, and bare-metal servers;
  - usually enabled on cloud instances that advertise 'GPU pass-through', including AWS EC2 instances;
  - for other virtual environment service providers, consult the NVIDIA Virtual GPU Software User Guide:
    - the hypervisor must be configured to provide GPU pass-through (where the physical device is dedicated to a particular virtual machine);
    - Unified Memory may need to be enabled for each specific vGPU.
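As a quick sanity check, the installed driver version (as reported by nvidia-smi) can be compared numerically against these minimums. The following is a minimal Python sketch; the `meets_minimum` helper is our own illustration, not part of any Lumerical or NVIDIA tooling.

```python
# Hypothetical helper: compare a dotted driver version string against the
# minimum required by the FDTD GPU solver (450.80.02 on Linux, 452.39 on Windows).
def meets_minimum(version: str, minimum: str) -> bool:
    """Numeric comparison of dotted version strings, e.g. '535.104.05' vs '450.80.02'."""
    a = [int(p) for p in version.split(".")]
    b = [int(p) for p in minimum.split(".")]
    # Pad the shorter list with zeros so the comparison is element-wise.
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    return a >= b
```

A plain string comparison would get this wrong (e.g. "1000.0" < "450.80" lexicographically), which is why the components are compared as integers.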
To monitor GPU usage, use the 'GPU-Util' value reported by the NVIDIA System Management Interface command-line utility (nvidia-smi). Windows users should note that the Windows Task Manager reports only graphics-related GPU utilization.
GPU solver license consumption is similar to that of the CPU solver (see Ansys optics solve, accelerator and Ansys HPC license consumption). For license usage calculation, Streaming Multiprocessors (SMs) are treated the same as CPU cores. For example, under Ansys Standard/Business licensing, one Lumerical Accelerator (engine license) is required for every 32 SMs, with no partial counting: a GPU with 40 SMs requires two licenses to run any job.
FDTD jobs use all available SMs on the GPU; the number of SMs per job is not user configurable. For this reason, there must be enough licenses available to cover all of the GPU's SMs.
Multiple jobs are best run in series rather than in parallel: running in parallel requires as many licenses as there are jobs, yet takes about the same total time as running them serially. For example, running two jobs simultaneously on a GPU with 32 SMs or fewer requires two licenses, but takes approximately the same time as running one job after the other on the same machine with a single license.
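The licensing rule above amounts to rounding the SM count up to the next multiple of 32. A minimal Python sketch (the function name is ours, purely illustrative):

```python
import math

SMS_PER_LICENSE = 32  # one Lumerical Accelerator license per 32 SMs, no partial counting

def licenses_required(total_sms: int) -> int:
    # Every started block of 32 SMs consumes one license,
    # e.g. 40 SMs -> 2 licenses, 32 SMs -> 1 license.
    return math.ceil(total_sms / SMS_PER_LICENSE)
```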
The number of SMs in a GPU can be found:
- in NVIDIA's documentation or on third-party websites;
- by running the Job Manager Configuration test for the GPU resource (localhost only);
- in the log file when the FDTD GPU engine is run:
    start loading CUDA query DLL...
    load CUDA query DLL successfully.
    GPU streaming multiprocessors(SMs): 16
- after the FDTD GPU engine has run, in the FDTD result 'total gpu sms'.
Note: as with the CPU solver, overall memory bandwidth is more important for performance than the number of cores (see FDTD benchmark on CPU).
The FDTD GPU solver can only run 3D FDTD simulations. The 'express mode' option should be enabled in the FDTD object properties (advanced options tab).
The GPU solver is suitable for narrowband simulations, or simulations with non-dispersive materials.
Any movie monitors are disabled.
To run on the GPU, the user toggles the Job Manager resource from 'CPU' to 'GPU'.
- If there are multiple GPUs on the local machine, the user may select a particular GPU in the job manager. When running the engine, the job manager will configure the standard CUDA_VISIBLE_DEVICES environment variable accordingly.
- If there are multiple GPUs on a remote machine, the user may select a particular GPU by specifying the appropriate 'extra command line options' for mpiexec.exe. For example, to select GPU 3 on a remote machine with Microsoft MPI, provide the extra command line option: /env CUDA_VISIBLE_DEVICES 3
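If remote submissions are scripted, the extra-options string follows a simple pattern and can be generated per job. A tiny Python sketch (the helper name is hypothetical):

```python
def msmpi_gpu_option(gpu_index: int) -> str:
    # Builds the Microsoft MPI extra command line option that sets
    # CUDA_VISIBLE_DEVICES in the remote engine's environment.
    return f"/env CUDA_VISIBLE_DEVICES {gpu_index}"
```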
Script Access / Automation
FDTD solver property 'express mode'
setnamed("FDTD", "express mode", true);
FDTD resource manager:
setresource("FDTD", 3, "GPU Device", "Auto");
setresource("FDTD", "GPU", false);
setresource("FDTD", "CPU", true); //< equivalent to previous line
Because the GPU may be on a remote host, the value of "GPU Device" is not validated. The default is "Auto"; to select a specific device, set an integer value.
Only the PML boundary condition is supported. Bloch, periodic, symmetric, anti-symmetric, PEC, and PMC boundary conditions are not supported.
Mode source / ports:
- Frequency dependent mode profiles are not supported.
Total Field Scattered Field (TFSF) source:
- TFSF source is not supported.
Time monitors:
- Time monitors limit GPU performance. We suggest using them only for debugging and preliminary simulations.
- Spatial interpolation is not supported.
Frequency domain monitors:
- Partial and total spectral averaging are not supported.
- Apodization is not supported.
- The FDTD GPU engine cannot be used if any Layer Builder uses a restricted process file.
- The 'Uniaxial anisotropic PML (legacy)' PML type in the simulation boundary conditions is not supported.