Since the 2023 R2 release, FDTD supports GPU calculations. This page summarizes the requirements and current limitations of the FDTD GPU solver. We are continuing to improve support for GPU acceleration; please check this page or the release notes for updates.
Note: In 2025 R1, parts of the Resource Configuration window and associated script commands have been updated. Instructions for the old Resource Configuration window have been kept where applicable but will be phased out in the future.
Hardware requirements
Starting in 2025 R1, Lumerical products require CUDA 12, which raises the minimum driver version and compute capability requirements. If you have recently updated, ensure your hardware is compatible with the requirements below.
To run your FDTD simulations on GPU, you will need NVIDIA CUDA driver version 525.60.13 or later (Linux) or version 527.41 or later (Windows). Additionally, your NVIDIA GPU must comply with the following:
- The GPU must offer Compute Capability greater than or equal to 5.0 (Maxwell microarchitecture or newer).
  - Drivers for older devices were discontinued in January 2023.
- Unified Memory must be available and enabled:
  - Always enabled on desktop, laptop, and bare-metal servers.
  - Usually enabled on cloud instances that advertise 'GPU pass-through' (including Azure VMs and AWS EC2 instances).
  - Users of other virtual environment service providers should consult the NVIDIA Virtual GPU Software User Guide:
    - The hypervisor must be configured to provide GPU pass-through (where the physical device is dedicated to a particular virtual machine).
    - Unified Memory may need to be enabled for each specific vGPU.
To monitor GPU usage, use the 'GPU-Util' value reported by the NVIDIA System Management Interface (nvidia-smi) command-line utility. Windows users should note that the Windows Task Manager only reports graphics-related GPU utilization.
A list of supported GPU cards can be found in this document.
Licensing Requirements
The GPU solver's license consumption is similar to the CPU solver's (see Ansys optics solve, accelerator and Ansys HPC license consumption). For the license usage calculation, Streaming Multiprocessors (SMs) are counted instead of CPU cores.
- For Ansys Standard/Business licensing, one Lumerical FDTD solve license is required for every 16 SMs, with partial groups counted as full; e.g., a GPU with 40 SMs requires three licenses to run the simulation.
- For Ansys Enterprise licensing, a Lumerical Enterprise solve license enables running on 4 SMs. Additional SMs require either Ansys HPC licenses (1 per SM) or Ansys HPC Pack licenses, where the number of additional SMs enabled is \( \#SM = 2 \times 4^{\#anshpc\_pack} \).
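For example, reading the formula above as the number of additional SMs enabled: one HPC Pack enables \( 2 \times 4^{1} = 8 \) additional SMs and two packs enable \( 2 \times 4^{2} = 32 \). A GPU with 40 SMs would then need either 36 Ansys HPC licenses (40 minus the 4 base SMs) or three HPC Packs, since \( 2 \times 4^{3} = 128 \geq 36 \) while two packs only cover 32.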
Important
- FDTD jobs will use all the available SMs in the GPU.
- The number of SMs per job is not user-configurable.
- For this reason, there must be enough licenses available for the number of SMs on your GPU.
- Running one simulation with multiple GPUs is supported on a single machine on Linux. It is not supported on Windows.
- Concurrent computing is supported with the following caveats:
- Concurrent computing using multiple GPUs on a single machine is supported on Linux.
- On Windows, concurrent computing is only supported if each job is assigned its own GPU (see the sketch after this list).
- Concurrent jobs, each assigned to its own GPU, will run at full speed. Concurrent jobs assigned to the same GPU may suffer from poor performance.
- Concurrent computing across multiple machines is supported on Linux and Windows, see this Knowledge Base article for more details on configuring resources for concurrent computing.
- Distributed computing across multiple machines is not supported.
- GPU and CPU computation cannot be used simultaneously, whether through concurrent or distributed computing.
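As a minimal sketch of assigning each job its own GPU, each resource can be pinned to a specific device using the "GPU X" device types (the resource numbers and device indices below are illustrative):
setresource("FDTD", 1, "device type", "GPU 0"); #< first resource pinned to GPU 0
setresource("FDTD", 2, "device type", "GPU 1"); #< second resource pinned to GPU 1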
For multiple jobs on a resource with a single GPU, run them in series rather than in parallel. Running in parallel requires as many license sets as there are jobs, yet takes about as long as running in series. For example, running two jobs simultaneously on a GPU with 16 SMs or fewer requires two licenses, while running one job after the other on the same machine takes approximately the same total time and requires only one license.
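A minimal sketch of running queued jobs in series from script, using the standard job queue commands (the file names are placeholders):
addjob("simA.fsp", "FDTD"); #< add the first simulation to the FDTD job queue
addjob("simB.fsp", "FDTD"); #< add the second simulation
runjobs; #< run the queued jobs on the active resources; with a single GPU resource they run one after another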
The number of SMs in the GPU can be found in NVIDIA's documentation, on third-party websites, or in one of the following ways:
- by running the Job Manager Configuration test for the GPU resource (localhost only);
- in the log file when the FDTD GPU engine is run:
  start loading CUDA query DLL...
  load CUDA query DLL successfully.
  GPU streaming multiprocessors(SMs): 16
- after the FDTD GPU engine is run, in the FDTD result 'total gpu sms'.
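A minimal sketch of reading that result from script after a GPU run, assuming the result is exposed on the FDTD solver object as described above:
?getresult("FDTD"); #< list the results available on the FDTD solver object
sms = getresult("FDTD", "total gpu sms"); #< SM count reported by the engine
?sms;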
Note: As with CPU, the overall memory bandwidth is more important for performance than the number of cores (see FDTD benchmark on CPU).
Simulation Requirements
The FDTD GPU solver can only run 3D FDTD simulations and is best suited for narrowband simulations. The procedures for configuring resources and running simulations are listed below.
Note: Express mode, which was previously required for GPU simulations and could not simulate dispersive materials, has been removed starting in 2025 R1.
Resource configuration
- Ensure a GPU device is listed and activated in the resource list; on new installations, a GPU resource is created automatically if at least one GPU is present.
- “GPU” automatically selects a GPU resource, “GPU custom” allows the user to edit run options, and “GPU X” selects a specific GPU.
Tip: The “Name” field of each row can be clicked and customized to help you distinguish each resource.
- To manually add a GPU resource that is not present, click the add button on the right-hand side of the list and change its type to GPU.
Note: In 2024 R2.3 or older versions, toggle the global resource configuration flag from 'CPU' to 'GPU' near the top of the window to activate GPU mode.
- If there are multiple GPUs on the local machine, all of them will be used by default. When running the engine, the job manager configures the standard CUDA_VISIBLE_DEVICES environment variable. To specify which GPU devices are used in a multi-GPU setup, change the “Device Type” to “GPU Custom” and edit the “gpu devices” box in the advanced options of the resource, accessed by clicking the “Edit” button on the right.
- If there are multiple GPUs on a remote machine, you may select a particular GPU by specifying the appropriate 'extra command line options' for mpiexec.exe. For example, to select GPU 3 on a remote machine with Microsoft MPI, provide the extra command line option /env CUDA_VISIBLE_DEVICES 3.
- When the “Threads” column is set to “auto”, the software will use as many threads as allowed by the number of licenses checked out (determined by the number of SMs in the GPU; see the “Licensing” section above for more information).
Note: All computation other than the solver, including meshing and script commands, still uses the CPU. The number of threads used for the meshing process can be set via the “Threads” column in the Resource Configuration window, according to the number of cores on the local computer. The number of threads used for design environment operations such as script commands or far field projection can be set in the “design environment” tab of the resource manager. For more information on setting CPU threads, see the Knowledge Base article on Resource configuration elements and controls.
Running Simulation – Single Simulations
To run a simulation on GPU, select the GPU toggle in the “Run Simulation” group under the “FDTD” tab. Then select the desired GPU using the dropdown menu and press run to run a single simulation on GPU. For more information on the dropdown selection, see the Knowledge Base article on the new modern user interface.
Script Access / Automation – Resource and Single Simulation
Setting FDTD resource manager:
setresource("FDTD", 1, "device type", "CPU");
setresource("FDTD", 2, "device type", "GPU"); #< any dropdown option can work in place of “CPU” or “GPU”, such as “GPU custom”, “GPU 0”…etc.
Note: For 2024 R2.3 or earlier, use the code below
setresource("FDTD","GPU", true);
setresource("FDTD", 3, "GPU Device", "Auto");
setresource("FDTD", "GPU", false);
setresource("FDTD", "CPU", true); #< equivalent to previous line
Note: In all versions, the “Device Type” field (“GPU Device” for 2024 R2.3 and earlier) for remote machines is not verified. Set an integer value in the advanced window with “GPU custom” selected as the “Device Type”.
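As a minimal sketch of scripting the multi-GPU selection described in the resource configuration section above, assuming the “gpu devices” advanced option is scriptable under that name (an assumption, not a documented property):
setresource("FDTD", 2, "device type", "GPU custom"); #< make the run options editable
setresource("FDTD", 2, "gpu devices", "0,1"); #< assumed property name; selects GPUs 0 and 1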
Checking GPU specifications:
gpu_specs = gpuspecs; #Return a cell array with GPU specifications of all installed GPUs
Check simulation requirements:
sim_req = runsystemcheck("FDTD","GPU"); #Return a structure with simulation and memory requirements for a GPU simulation of the current file
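A minimal sketch combining the check above with a run call (the two-argument run form also appears in the time monitor example later in this article):
sim_req = runsystemcheck("FDTD", "GPU"); #< check requirements for the current file
?sim_req; #< inspect the reported simulation and memory requirements
run("FDTD", "GPU"); #< run the current simulation on the active GPU resource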
Running Simulation – Optimizations and Sweeps
The type of resource used for a sweep or an optimization is controlled by a CPU/GPU drop-down list near the run button in the “Optimizations and Sweeps” toolbar.
- The selected sweep must be of the solver type “FDTD” to change this option; the solver type is changed in the property window of the sweep entry.
- For optimizations and Monte Carlo analysis, the option can only be changed if the FDTD solver is the first enabled solver in the Object Tree.
- Pressing run with the “CPU” option runs the selected sweep on enabled CPU resources.
- Pressing run with the “GPU” option runs the selected sweep on enabled GPU resources.
- In the context menu, the “Run” button will run the sweep with the option selected in the CPU/GPU drop-down list.
Script Access / Automation – Running Sweeps
Automation is accessed using a mode flag in the runsweep command, as shown below.
runsweep; #< run all sweeps in CPU mode
runsweep("CPU") #< run all sweeps in CPU mode
runsweep("GPU") #< run all sweeps in GPU mode, if there is a sweep incompatible with GPU computing (i.e., non FDTD solver), this will result in error
runsweep("thickness sweep", "GPU") #< run sweep with name of “thickness sweep” using GPU
Note: When accessing sweeps from script, the sweep names “CPU” and “GPU” are ambiguous and will result in errors for the runsweep command.
Accessing Results
Unlike CPU simulation results, GPU simulation results are not stored inside the simulation file. They are automatically stored in an auxiliary folder with the same name as the simulation file, created in the same directory as the simulation file.
The simulation file and auxiliary folder can be moved together to another folder or another filesystem.
GPU simulation results cannot be accessed if the simulation file is moved without the auxiliary folder, or if the auxiliary folder is deleted. If the simulation is switched to layout mode and saved, any stored GPU results are automatically deleted.
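Provided the auxiliary folder is kept next to the simulation file, results are read with the usual commands; a minimal sketch (the file and monitor names are placeholders):
load("simA.fsp"); #< the auxiliary folder "simA" must be in the same directory
E = getresult("time monitor", "E"); #< results are accessed exactly as for CPU runs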
Current limitations
Sources
Mode source / ports:
- Frequency dependent mode profiles are not supported.
- Gaussian Sources, Plane Wave Sources, Mode Source and Ports will have their behaviour automatically adjusted when using the GPU:
- A single spatial field profile is always used – when multiple spatial profiles are present, the profile at the simulation center frequency is used.
- Settings related to maximum convolution time window are ignored.
Total Field Scattered Field (TFSF) source:
- The TFSF source is not supported and will result in an error.
Monitors
Time monitor:
- Time monitors limit GPU performance. We suggest using them only for debugging and preliminary simulations.
- Spatial interpolation will default to ‘none’ when using the GPU.
Note: The object properties of time monitors are not modified in analysis mode; only the behaviour during the GPU run is overridden. For example:
setnamed("time monitor", "spatial interpolation", "nearest cell");
run("FDTD", "GPU");
getresult("time monitor", "E"); #< should match having "none" spatial interpolation
switchtolayout;
?getnamed("time monitor", "spatial interpolation"); #< should still be "nearest cell"
Frequency domain monitors:
- Partial and total spectral averaging are not supported.
- Time apodization will default to ‘none’ when using the GPU.
Note: The object properties of frequency domain monitors are not modified in analysis mode, in the same way as for time monitors above.
Movie monitor:
- Movie monitors will not record a movie when using the GPU.
Materials
- As of 2025 R1, 3D standard optical permittivity models are fully supported, including Sampled 3D, Dielectric, (n,k), and PEC materials.
- 2D standard optical conductivity models are not currently supported and will result in an error, except for PEC material assigned to a 2D geometry.
- Advanced permittivity models, including Index Perturbation and materials created from the Flexible Material Plugin Framework, are not supported and will result in an error.
Other
Layer builder
- The FDTD GPU engine cannot be used if any Layer Builder object uses a restricted process file.
PML Type
- 'Uniaxial anisotropic PML (legacy)' PML type in the simulation boundary conditions is not supported.
See also
- List of licensed features by product
- Ansys optics solve, accelerator, and Ansys HPC license consumption
- List of supported GPU cards