Since the 2023 R2 release, FDTD supports GPU calculations. This page summarizes the requirements and current limitations of the FDTD GPU solver.
Hardware requirements
To run your FDTD simulations on a GPU, you need the NVIDIA CUDA driver version 450.80.02 or later on Linux, or version 452.39 or later on Windows. Additionally, your NVIDIA GPU must meet the following requirements:
- The GPU must offer Compute Capability greater than or equal to 3.0 (Kepler microarchitecture or newer).
- Drivers of older devices were discontinued in January 2019.
- Unified Memory must be available and enabled:
- always enabled on desktop, laptop and bare-metal servers
- usually enabled on cloud instances that advertise 'GPU pass-through' (including Azure VMs and AWS EC2 instances)
- other virtual environment service providers should consult the NVIDIA Virtual GPU Software User Guide:
- the hypervisor must be configured to provide GPU pass-through (where the physical device is dedicated to a particular virtual machine)
- unified-memory may need to be enabled for each specific vGPU.
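The driver and Compute Capability thresholds above can be checked with a small stand-alone script. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the version strings are assumed to be in the dotted form that the NVIDIA driver reports (for example '450.80.02').

```python
import platform

# Minimum driver versions quoted in the hardware requirements above.
MIN_DRIVER = {"Linux": (450, 80, 2), "Windows": (452, 39)}


def parse_version(text):
    """Turn a dotted version string such as '450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_supported(driver_version, system=None):
    """Return True if driver_version meets the minimum listed for the OS."""
    system = system or platform.system()
    if system not in MIN_DRIVER:
        raise ValueError(f"no minimum driver version listed for {system!r}")
    return parse_version(driver_version) >= MIN_DRIVER[system]


def compute_capability_is_supported(compute_capability):
    """Return True for Compute Capability 3.0 or higher (for example '8.6')."""
    return parse_version(compute_capability) >= (3, 0)
```

For example, driver_is_supported("452.39", "Windows") is True, while compute_capability_is_supported("2.1") is False.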
To monitor GPU usage, use the 'GPU-Util' value reported by the NVIDIA System Management Interface command-line utility (nvidia-smi). Windows users should note that the Windows Task Manager only reports graphics-related GPU utilization.
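As an illustration, GPU utilization can also be read programmatically from nvidia-smi. The sketch below assumes nvidia-smi is on the PATH and uses its CSV query output; the helper names are hypothetical, and the parser is kept separate so it can be tried on sample text without a GPU.

```python
import subprocess

# nvidia-smi query for the device index, name and 'GPU-Util' percentage.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse 'index, name, utilization' rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        util = parts[-1]
        rows.append({
            "index": int(parts[0]),
            "name": ", ".join(parts[1:-1]),
            # Some devices report a non-numeric placeholder instead of a value.
            "util_percent": int(util) if util.isdigit() else None,
        })
    return rows


def query_utilization():
    """Run nvidia-smi and return the parsed rows (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)
```

Calling query_utilization() repeatedly while a simulation runs gives a simple utilization trace.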
A list of supported GPU cards can be found in the 'List of supported GPU cards' article (see 'See also' at the end of this page).
Licensing Requirements
The GPU solver license consumption is similar to that of the CPU solver (see Ansys optics solve, accelerator and Ansys HPC license consumption). License usage is calculated from the number of Streaming Multiprocessors (SMs) on the GPU rather than from the number of CPU cores.
- For Ansys Standard/Business licensing, a Lumerical Accelerator (engine license) is required for every 16 SMs with no partial counting. For example, a GPU with 40 SMs requires three licenses to run any job.
- For Ansys Enterprise licensing, a Lumerical Solve license enables 4 SMs. Additional SMs will require either Ansys HPC licenses (1 per SM) or Ansys HPC Pack licenses (\( \# SM = 2 \cdot 4^{\# anshpc\_pack} \)).
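The license arithmetic above can be sketched as follows. This is an illustrative calculation only, with hypothetical function names. It reads "no partial counting" as rounding up to whole licenses (consistent with the 40 SM example), and it assumes that HPC Pack capacity is applied to the SMs beyond the 4 enabled by the Lumerical Solve license; confirm both points against the license consumption article.

```python
import math

SMS_PER_ACCELERATOR = 16  # Standard/Business: one engine license per 16 SMs
SMS_PER_SOLVE = 4         # Enterprise: SMs enabled by one Lumerical Solve license


def accelerator_licenses(num_sms):
    """Standard/Business: engine licenses needed, rounding partial blocks up."""
    return math.ceil(num_sms / SMS_PER_ACCELERATOR)


def additional_sms(num_sms):
    """Enterprise: SMs not covered by the Lumerical Solve license."""
    return max(0, num_sms - SMS_PER_SOLVE)


def hpc_pack_capacity(num_packs):
    """SMs enabled by num_packs HPC Pack licenses: 2 * 4**num_packs."""
    return 2 * 4 ** num_packs


def min_hpc_packs(sms_to_cover):
    """Smallest number of HPC Packs whose capacity covers sms_to_cover SMs."""
    if sms_to_cover <= 0:
        return 0
    packs = 1
    while hpc_pack_capacity(packs) < sms_to_cover:
        packs += 1
    return packs


if __name__ == "__main__":
    sms = 40
    extra = additional_sms(sms)
    print("Accelerator licenses:", accelerator_licenses(sms))
    print("HPC licenses (1 per extra SM):", extra)
    print("HPC Packs (assumed to cover the extra SMs):", min_hpc_packs(extra))
```

For a GPU with 40 SMs this gives 3 accelerator licenses under Standard/Business licensing. Under Enterprise licensing it gives 36 HPC licenses, or 3 HPC Packs under the stated assumption.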
Important
- FDTD jobs will use all the available SMs in the GPU.
- The number of SMs per job is not user-configurable.
- For this reason, there must be enough licenses available for the number of SMs on your GPU.
- Multi-GPU support is available on a single machine (local host) on Linux.
- Multi-GPU runs spanning several machines or nodes are not supported.
For multiple jobs, we recommend running them in series rather than in parallel. Running in parallel requires as many licenses as there are jobs, but takes about the same total time as running in series. For example, running two jobs simultaneously on a GPU with 16 SMs or fewer requires two licenses, yet takes approximately as long as running one job after the other on the same machine with only one license.
The number of SMs in the GPU can be found:
- in NVIDIA's documentation or on third-party websites,
- by running the Job Manager Configuration test for the GPU resource (localhost only),
- in the log file when the FDTD GPU engine is run:
start loading CUDA query DLL...
load CUDA query DLL successfully.
GPU streaming multiprocessors(SMs): 16
- or, after the FDTD GPU engine has run, in the FDTD result 'total gpu sms'.
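If the SM count is needed in a script, for example to estimate license usage, it can be extracted from the engine log. The sketch below is a hypothetical helper that assumes the log line has exactly the format shown in the excerpt above; the location of the log file is not assumed.

```python
import re

# Matches the line format shown in the log excerpt above.
SM_LINE = re.compile(r"GPU streaming multiprocessors\(SMs\):\s*(\d+)")


def sm_count_from_log(log_text):
    """Return the SM count reported in the log text, or None if absent."""
    match = SM_LINE.search(log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    excerpt = (
        "start loading CUDA query DLL...\n"
        "load CUDA query DLL successfully.\n"
        "GPU streaming multiprocessors(SMs): 16\n"
    )
    print(sm_count_from_log(excerpt))
```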
Note: as with CPU, the overall memory bandwidth is more important for performance than the number of cores (see FDTD benchmark on CPU).
Simulation Requirements
The FDTD GPU solver can only run 3D FDTD simulations. The “express mode” option should be enabled in the FDTD object properties (advanced options tab).
The GPU solver is suitable for narrowband simulations, or simulations with non-dispersive materials.
Note: While “express mode” is active, certain incompatible features will be automatically adjusted and/or disabled. These disabled features are summarized in the “Current Limitations” section below, and warnings are shown in the object editor window. Additional warnings are also provided when checking the simulation, and in the simulation log when incompatible features are detected.
Resource configuration
- Toggle the Job Manager from 'CPU' to 'GPU'
- If there are multiple GPUs on the local machine, all GPUs are used by default. The user may select a particular GPU in the job manager. When running the engine, the job manager configures the standard CUDA_VISIBLE_DEVICES environment variable.
If "custom" is selected as "GPU Device", a list of GPU devices can be specified for that resource.
- If there are multiple GPUs on a remote machine, the user may select a particular GPU by specifying the appropriate 'extra command line options' for mpiexec.exe. For example, to select GPU 3 on a remote machine with Microsoft MPI, provide the extra command line option: /env CUDA_VISIBLE_DEVICES 3
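To illustrate the device selection described above, the following sketch builds a CUDA_VISIBLE_DEVICES value and the matching Microsoft MPI option string. The function names are hypothetical. The comma-separated form for several devices is standard CUDA syntax, but whether the engine uses more than one of the listed devices is subject to the multi-GPU limits stated earlier on this page.

```python
def cuda_visible_devices(device_indices):
    """Format GPU indices as a CUDA_VISIBLE_DEVICES value, e.g. [0, 2] -> '0,2'."""
    indices = list(device_indices)
    if not indices:
        raise ValueError("at least one device index is required")
    for index in indices:
        if not isinstance(index, int) or index < 0:
            raise ValueError(f"invalid device index: {index!r}")
    return ",".join(str(index) for index in indices)


def msmpi_env_option(device_indices):
    """Build the 'extra command line options' text for Microsoft MPI."""
    return f"/env CUDA_VISIBLE_DEVICES {cuda_visible_devices(device_indices)}"


if __name__ == "__main__":
    print(msmpi_env_option([3]))
```

Other MPI implementations pass environment variables with different options, so the string built by msmpi_env_option applies to Microsoft MPI only.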
Notes
All other calculations done by the CAD use CPU resources. Set the number of threads according to the number of cores on the local computer. With 'auto', the CAD tries to use all available cores on the machine for these calculations.
Script Access / Automation
FDTD solver property 'express mode':
setnamed("FDTD", "express mode", true);
FDTD resource manager:
setresource("FDTD","GPU", true);
setresource("FDTD", 3, "GPU Device", "Auto");
setresource("FDTD", "GPU", false);
setresource("FDTD", "CPU", true); #< equivalent to previous line
To allow for remote-host GPUs, the value of "GPU Device" is not verified. The default is "Auto". To select a specific GPU, set it to an integer value.
Current limitations
Sources
Mode source / ports:
- Frequency dependent mode profiles are not supported.
- Gaussian Sources, Plane Wave Sources, Mode Source and Ports will have their behaviour automatically adjusted when using the GPU:
- A single spatial field profile is always used – when multiple spatial profiles are present, the profile at the simulation center frequency is used.
- Settings related to maximum convolution time window are ignored.
Total Field Scattered Field (TFSF) source:
- TFSF source is not supported and will result in an error.
Monitors
Time monitor:
- Time monitors limit GPU performance. We suggest using them only for debugging and preliminary simulations.
- Spatial interpolation will be defaulted to ‘none’ when using the GPU.
Note: Object properties of time monitors will not be changed under analysis mode, for example:
setnamed("time monitor", "spatial interpolation", "nearest cell");
run("FDTD", "GPU");
getresult("time monitor", "E"); #< should match having "none" spatial interpolation
switchtolayout;
?getnamed("time monitor", "spatial interpolation"); #< should still be "nearest cell"
Frequency domain monitors:
- Partial and total spectral averaging are not supported.
- Time apodization will be defaulted to ‘none’ when using the GPU.
Note: Object properties of frequency domain monitors will not be changed under analysis mode, in the same way as time monitors above.
Movie monitor:
- Movie monitors will not record a movie when using the GPU.
Materials
- Dielectric, (n,k), and PEC materials are fully supported.
- Other material models are automatically adjusted when using the GPU:
- 3D standard optical permittivity models with frequency dependence are treated as an (n,k) material evaluated at the simulation center frequency.
- 3D standard optical permittivity models with negative permittivity at the simulation center frequency, including most metals, will result in an error.
- Except for PEC material assigned to a 2D geometry, 2D standard optical conductivity models are not currently supported and will result in an error.
- Advanced permittivity models, including Index Perturbation and materials created from the Flexible Material Plugin Framework, are not supported and will result in an error.
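The negative-permittivity rule above can be checked ahead of time from a material's (n,k) value at the simulation center frequency. The sketch below is only an illustration of that arithmetic, not the solver's own check: it assumes "negative permittivity" refers to the real part, \( \mathrm{Re}(\varepsilon) = n^2 - k^2 \), and the function names and sample values are hypothetical.

```python
def permittivity_from_nk(n, k):
    """Relative permittivity (n + ik)^2 = (n^2 - k^2) + i(2nk)."""
    return complex(n * n - k * k, 2 * n * k)


def gpu_material_status(n, k):
    """Classify a material by the sign of Re(eps) at the center frequency."""
    eps = permittivity_from_nk(n, k)
    if eps.real < 0:
        return "error: negative permittivity at the center frequency"
    return "ok: treated as an (n,k) material at the center frequency"


if __name__ == "__main__":
    # Illustrative values only, not measured material data.
    print(gpu_material_status(3.5, 0.0))  # dielectric-like: k < n
    print(gpu_material_status(0.2, 3.0))  # metal-like: k > n
```

Because the real part is \( n^2 - k^2 \), the check reduces to whether k exceeds n at the center frequency.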
Other
Layer builder
- The FDTD GPU engine cannot be used if any Layer Builder uses a restricted process file.
PML Type
- The 'Uniaxial anisotropic PML (legacy)' PML type in the simulation boundary conditions is not supported.
See also
- List of licensed features by product
- Ansys optics solve, accelerator, and Ansys HPC license consumption
- List of supported GPU cards