GPU Acceleration for Grating Coupler Optimizations in the Cloud with Ansys Access on Microsoft Azure


In this article, we demonstrate how Ansys Access on Microsoft Azure can be used to leverage GPU resources in the cloud via a virtual desktop. GPU acceleration for Lumerical FDTD enables us to reduce our time to results from over an hour with a local CPU to a few minutes. This enhanced efficiency allows you to meet deadlines, optimize over larger parameter spaces, or run more accurate models.


Overview

Understand the simulation workflow and key results

The Finite-Difference Time-Domain (FDTD) method is a state-of-the-art method for solving Maxwell's equations in complex geometries. However, the highest-accuracy results often require a fine mesh, which can lead to long or even impractical simulation runtimes. In this article, we show how GPU acceleration can be used to speed up the computer-aided design and optimization process. GPU cards are expensive to buy, and without constant usage it can take a long time to see a return on these depreciating assets. Ansys Access on Microsoft Azure is a service that allows users to rent a variety of GPU cards via a simple remote connection. As a result, users can perform computationally demanding simulations without investing in their own GPU cards. The choice of hardware can also be tailored to each application in terms of GPU card model, number of CPU cores, and memory size.

This workflow is divided into two steps:

Step 1: Getting on Ansys Access on Microsoft Azure and setting up GPU acceleration

In the first step, we introduce Ansys Access on Microsoft Azure and provide a guide for setting up a project space with the desired GPU hardware and the Lumerical application installed.

While this guide assumes Ansys Access on Microsoft Azure, most of the steps also apply to Ansys Gateway powered by AWS.

Step 2: Comparison between CPU and GPU – Simulation time and Accuracy

In the second step, we demonstrate with a practical example how to configure and run a simulation with GPU acceleration. We compare the run times of simulations and optimizations on CPU and GPU, as well as the accuracy of the results.

Run and Results

Instructions for running the model and discussion of key results

Step 1: Getting on Ansys Access on Microsoft Azure and setting up a Virtual Desktop

As a starting point, we assume that the user can connect to a project space on Ansys Access on Microsoft Azure. For more details, see Installation and Setup (ansys.com).

Configuration of a Virtual Desktop

Open the page Creating a Virtual Desktop (ansys.com) and follow the step-by-step instructions to create a virtual desktop in your project space.

Once connected to an Azure account, a user can configure their own virtual desktop by selecting “New Resource” in the top right corner of the project space page. The properties are selected in a step-by-step process, starting with the datacenter location and the desired operating system (Linux or Windows).

At this stage, the user may select applications so that the desired Ansys tools are pre-installed on the virtual desktop. When an application is selected, you are prompted to provide license information, either a license server or elastic licensing. For more information on how to configure the Ansys license manager for shared access to the licenses on a server, see Configuring the Ansys license manager for shared access – Ansys Optics.

Multiple applications can be added, and additional applications can also be added later as needed. For the practical example in this article, we use the latest available version of Ansys Lumerical.

The most critical configuration step is the choice of hardware. For best practices on selecting hardware for different needs, see Virtual Machine Types and Sizes (ansys.com). Key hardware parameters available in the datacenter are displayed, such as the number of cores, the memory size, the GPU card, and the price. We recommend checking the datasheets of the GPU cards separately to estimate their performance and memory size, since these are not specified directly on the selection page. See also Recommended Virtual Machine Sizes for VDI Workflows (ansys.com).

Since this article focuses on GPU acceleration, the simulations were run with an NVIDIA Tesla V100 GPU card (instance Standard_NC6s_v3).

Beyond the technical specifications of the hardware, the price information can also be seen at this stage. For budgeting and best practices for managing costs, see Budgeting (ansys.com) and Best Practices for Managing Costs (ansys.com). Depending on the hardware selected, the price varies from a fraction of a dollar to around $15/h.

NVIDIA drivers are required to enable accelerated graphics and GPU compute on Azure virtual machines with NVIDIA GPUs. When you create a virtual desktop and select a virtual machine size with NVIDIA GPUs, GPU driver options are displayed. The available options and the default selection depend on the virtual machine type. Factors to consider are the application you are installing, the machine's operating system, and your intended use of the GPU: accelerated graphics, GPU compute, or none. In our case, we select the GRID drivers. For more details on driver selection, see NVIDIA GPU Driver Support (ansys.com).

Connect to the Virtual Desktop

Start the virtual desktop and connect to it with a remote desktop connection

Before connecting to a virtual desktop, you can review the settings configured during its creation and add applications. Note that costs accrue as soon as the status of the virtual desktop indicates “Running”. Some options are available to disconnect automatically after a period of inactivity, but it is best practice to remember to disconnect when it is not in use.

Finally, the “User Settings” of the Azure account can be checked before accessing the virtual desktop. You can toggle the preferences to allow the use of multiple screens and the mapping of the local drive. The latter option is useful for transferring files between a local computer and the virtual desktop: with mapping enabled, local files are accessible on the virtual desktop under the Network tab of the file explorer.

Transferring files this way can be slow. For larger files, it is recommended to use online storage services and to download the files directly from the virtual desktop once the connection is established. For further information on data management, see General Guidelines for Transferring Files (ansys.com).

Once the virtual desktop has been created, it can be accessed by pressing “Start” and “Connect”. It is sometimes more stable to download the RDP file and to connect to the virtual desktop with a superadmin session. For further details, see Starting a Virtual Desktop (ansys.com) .

Step 2: Comparison between CPU and GPU – Simulation time and Accuracy

Quick description of the base project

Check out the article Grating coupler – Ansys Optics. Several steps in that workflow are flagged as long-running, namely the 3D optimization and the S-parameter extraction, and their data are provided pre-computed so that anyone can go through the workflow more quickly.
(Optional) Open the file grating_coupler_3D.fsp and run the “position optimization” and “S-parameters” sweeps. Check how long they take to run on CPU. Run times may differ from the numbers presented in this article, since they depend on the local machine.

To demonstrate the performance of GPU acceleration, we use the design of a grating coupler connecting a single-mode fiber at the surface of a photonic chip to an integrated waveguide. Details on this design can be found in the article Grating coupler – Ansys Optics.

This workflow goes through the design and optimization of the grating coupler parameters and the fiber position, and the extraction of the S-parameters. The simulations can be performed on a CPU with a coarse mesh, but running the various optimizations and sweeps required for the design takes a significant amount of time. In addition, it is often recommended to perform the final analysis with a finer mesh. In the original article, the mesh accuracy is set to 2.

Setting up for GPU acceleration

Open the file grating_coupler_3D.fsp and set it up for GPU computation:
1. In the tab "FDTD - Run Simulation", select “GPU”.
2. In the tab "File - Configure", select "Resources" and click "Run tests" to confirm that the GPU is supported, and check the number of streaming multiprocessors (SMs).

Setting the FDTD solver for GPU computation is a straightforward process. Open the Resource Configuration window and toggle the FDTD solver to GPU mode. It is important to set the number of threads to match the number of CPU cores, since meshing is performed on the CPU. If multiple GPU cards are available, a specific card can be selected.

It is good practice to run a test to quickly check whether the GPU card is supported and how many streaming multiprocessors (SMs) are available. The number of SMs matters because it determines license consumption: FDTD jobs use all the available SMs on the GPU, and the number of SMs per job is not user-configurable. For this reason, make sure that enough licenses are available for the GPU's SMs. For more details on the GPU settings and license consumption, see FDTD GPU Solver Information – Ansys Optics and Ansys HPC license consumption – Ansys Optics.


Acceleration of the optimization

Go to the “Optimizations and Sweeps” window, select the optimization item named “position optimization”, and run it.
1. Once it has completed, right-click the sweep and select “Apply the best parameter”. This sets the x position of the fiber to the optimal location. Note that the optimal position is already set for the fiber in the provided file.
2. You can verify in the properties of the fiber object that its x position is set to the best position computed by the sweep.
Once again, go to the “Optimizations and Sweeps” window, select the sweep item named “S-parameters”, and run the sweep. Once the sweep has completed, the scattering parameters of the device become available. To see them, right-click the S-parameters sweep item and choose “Visualize” followed by “S parameters” in the context menu. Select the scalar operation “Abs^2” to see the power S-parameters. Note that the sweep runs two simulations, one for each port. If you run out of memory, you can run them separately.
In the object tree, select the FDTD object and raise the mesh accuracy to 4. Repeat the 2 previous steps.
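Selecting “Abs^2” turns each complex S-parameter into a power transmission or reflection. The minimal Python sketch below shows that conversion, plus the usual dB form used for coupling-loss plots. The helper names and the S21 value are hypothetical, for illustration only, and are not taken from the simulation.

```python
import math


def power_s_parameter(s: complex) -> float:
    """Power S-parameter |S|^2, i.e. the "Abs^2" scalar operation."""
    return abs(s) ** 2


def power_to_db(power: float) -> float:
    """Convert a power ratio to decibels: 10 * log10(P)."""
    if power <= 0:
        raise ValueError("power must be positive to express it in dB")
    return 10.0 * math.log10(power)


# Hypothetical complex S21 at one wavelength (illustrative, not a simulation result).
s21 = 0.6 + 0.3j
p21 = power_s_parameter(s21)
print(f"|S21|^2 = {p21:.3f} ({power_to_db(p21):.2f} dB)")
```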

The sweeps and optimizations can be run in a reasonable time even on CPU when the mesh accuracy is low. Even in this case, GPU acceleration significantly reduces the run time.

If a finer mesh is required for the simulations, the run time on CPU jumps from a couple of hours to more than a full day. When such heavy computations must be run multiple times, the time saved by the GPU becomes extremely significant.
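A back-of-the-envelope estimate helps explain why refining the mesh is so costly in 3D. The number of cells grows as the cube of the points per wavelength, and the stable time step shrinks in proportion to the cell size, so the total work grows roughly as the fourth power. The sketch below assumes that mesh accuracy 1 corresponds to 6 points per wavelength, with 4 more per level (so accuracy 2 is about 10 and accuracy 4 about 18). This mapping is an assumption for illustration, and real nonuniform meshes with override regions will not follow it exactly.

```python
def points_per_wavelength(mesh_accuracy: int) -> int:
    """ASSUMED mapping: accuracy 1 -> 6 points per wavelength, +4 per level."""
    if not 1 <= mesh_accuracy <= 8:
        raise ValueError("mesh accuracy is expected to be an integer from 1 to 8")
    return 6 + 4 * (mesh_accuracy - 1)


def relative_work_3d(acc_from: int, acc_to: int) -> float:
    """Rough ratio of 3D FDTD work between two mesh accuracies.

    Cells scale as ppw**3 and the number of time steps as ppw (the time step
    shrinks with the cell size), so total work scales roughly as ppw**4.
    """
    ratio = points_per_wavelength(acc_to) / points_per_wavelength(acc_from)
    return ratio ** 4


print(f"accuracy 2 -> 4: ~{relative_work_3d(2, 4):.1f}x more work")
```

Under these assumptions, going from accuracy 2 to 4 means roughly an order of magnitude more work, the same order as the jump in CPU run time described above.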

Of course, the computation time and the speedup provided by GPU acceleration depend on the reference CPU and on the GPU cards used to perform the simulations. In our example, the FDTD solver ran at around 300 Mnodes/s on CPU versus around 6500 Mnodes/s on GPU. Note that the Azure instance selected for the GPU computation (Standard_NC6s_v3) had only 6 cores. The CPU computations used for comparison in the table above were run on a separate shared machine with 24 cores (Intel® Xeon® Gold 6342 CPU @ 2.80GHz) and 2 sockets (48 threads). Since Azure bills by running time, it makes more sense to run on local hardware whatever can be done there. Using a CPU with a high thread count also gives a fairer comparison between CPU and GPU.
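The throughput figures above can be turned into a rough solver-time estimate, since FDTD work is essentially the number of mesh nodes multiplied by the number of time steps. This Python sketch uses the two speeds reported in this article; the problem size is hypothetical, and meshing, data collection and post-processing are ignored.

```python
CPU_MNODES_PER_S = 300.0   # solver speed reported in this article (48-thread CPU machine)
GPU_MNODES_PER_S = 6500.0  # solver speed reported in this article (Tesla V100)


def solver_time_s(nodes: float, time_steps: int, mnodes_per_s: float) -> float:
    """Solver-only time estimate: node updates divided by throughput (Mnodes/s)."""
    return nodes * time_steps / (mnodes_per_s * 1e6)


# Hypothetical problem size, for illustration only.
nodes, steps = 50e6, 20_000

cpu_s = solver_time_s(nodes, steps, CPU_MNODES_PER_S)
gpu_s = solver_time_s(nodes, steps, GPU_MNODES_PER_S)
print(f"CPU ~{cpu_s / 60:.1f} min, GPU ~{gpu_s / 60:.1f} min, "
      f"speedup ~{cpu_s / gpu_s:.1f}x")
```

The raw throughput ratio is about 22x, whereas the end-to-end S-parameters sweep in this article went from about 6 h to about 40 min (about 9x), so treat such estimates as solver-only figures.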

In general, GPU and CPU calculations can differ slightly. In our case, we verified that the results are identical for practical purposes: if you zoom in sufficiently on the curves, you may see an insignificant difference starting at the 7th decimal place.
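One way to make the CPU/GPU comparison quantitative is to compute the largest pointwise difference between two curves sampled at the same wavelengths. The Python sketch below is hypothetical: the helper names and sample values are illustrative, not exported results. A tolerance of 1e-6 accepts differences that first appear in the 7th decimal place.

```python
def max_abs_difference(a, b):
    """Largest pointwise |a_i - b_i| between two curves sampled at the same points."""
    if len(a) != len(b):
        raise ValueError("curves must be sampled at the same wavelengths")
    return max(abs(x - y) for x, y in zip(a, b))


def curves_agree(a, b, abs_tol=1e-6):
    """True if the two curves agree to within abs_tol at every sample."""
    return max_abs_difference(a, b) <= abs_tol


# Hypothetical |S21|^2 samples from a CPU run and a GPU run (illustrative values).
cpu_curve = [0.4123001, 0.5231004, 0.4877002]
gpu_curve = [0.4123003, 0.5231001, 0.4877002]
print(f"max |CPU - GPU| = {max_abs_difference(cpu_curve, gpu_curve):.1e}")
print("agree:", curves_agree(cpu_curve, gpu_curve))
```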

For this example, the position optimization converges to the same result with the coarse and fine meshes. However, for the S-parameter calculation, even though mesh accuracy 2 provides a good estimate, the finer mesh improves the accuracy of the peak position. This refinement of the peak position, and of the overall accuracy of the simulations, can be critical for the final steps of a system analysis.
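To quantify how far the peak moves between the coarse and fine meshes, one option is to locate the maximum of each |S|^2 spectrum and refine it with a three-point parabolic fit, which gives sub-sample resolution on a uniformly spaced wavelength grid. This is a generic technique chosen for illustration, not the method used in the original article; the helper name and the spectra below are hypothetical.

```python
def peak_wavelength(wavelengths, power):
    """Wavelength of maximum power, refined with a 3-point parabolic fit.

    Assumes the wavelength samples are uniformly spaced.
    """
    i = max(range(len(power)), key=power.__getitem__)
    if i == 0 or i == len(power) - 1:
        return wavelengths[i]  # peak at the band edge: no neighbours to fit
    y0, y1, y2 = power[i - 1], power[i], power[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0.0:
        return wavelengths[i]  # flat top: keep the sampled maximum
    step = wavelengths[1] - wavelengths[0]
    # Vertex of the parabola through the three points, in units of the step.
    offset = 0.5 * (y0 - y2) / curvature
    return wavelengths[i] + offset * step


# Hypothetical spectra (wavelength in um, |S21|^2) from a coarse and a fine mesh.
wl = [1.50, 1.51, 1.52, 1.53, 1.54]
coarse = [0.30, 0.36, 0.38, 0.35, 0.29]
fine = [0.29, 0.35, 0.38, 0.37, 0.31]
shift_nm = (peak_wavelength(wl, fine) - peak_wavelength(wl, coarse)) * 1e3
print(f"peak shift: {shift_nm:.2f} nm")
```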

GPU acceleration provides a significant speedup of the FDTD computation, which opens the door to running more complex simulations at higher accuracy. GPU cards are expensive, but Ansys Access enables users to access the hardware they need, for the time they need it. This not only keeps the cost of GPU acceleration reasonable, but also allows a high level of customization, since different types of hardware are available with all the required Ansys tools pre-configured and ready to use.

Important model settings

Description of important objects and settings used in this model

Note that the x position of the fiber is already set to the best position obtained by running the “position optimization” sweep.
One limitation of the GPU is its memory, which is commonly much smaller than the CPU RAM. In this example, setting the mesh accuracy to an even finer level would have exceeded the memory of the GPU card (16 GB). You can check the memory requirements before running the simulation with “Check > Check simulation and memory requirements”. After a simulation has run, the run time and peak memory usage can be read in the generated “p0.log” file. In our example, an FDTD run with a mesh accuracy of 4 had a peak memory usage of 12.1 GB on CPU and 13.8 GB on GPU. It is common for GPU simulations to use somewhat more peak memory than CPU simulations.
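Before launching a finer-mesh run, the memory numbers above can serve as a quick sanity check. The sketch below scales a CPU memory figure by the GPU/CPU ratio observed in this single example (13.8 GB / 12.1 GB) and compares it with a fraction of the 16 GB card. Both the overhead factor and the 90% margin are assumptions, and the memory check built into the software remains the number to rely on.

```python
GPU_MEMORY_GB = 16.0        # memory of the Tesla V100 card used in this article
GPU_OVERHEAD = 13.8 / 12.1  # ASSUMPTION: ratio from this article's single data point
SAFETY_MARGIN = 0.9         # ASSUMPTION: keep about 10% of the card free


def fits_on_gpu(cpu_memory_gb: float,
                gpu_memory_gb: float = GPU_MEMORY_GB,
                overhead: float = GPU_OVERHEAD,
                margin: float = SAFETY_MARGIN) -> bool:
    """Rough check: does the scaled CPU memory figure fit on the GPU card?"""
    return cpu_memory_gb * overhead <= margin * gpu_memory_gb


print(fits_on_gpu(12.1))  # measured CPU peak at mesh accuracy 4 in this article
print(fits_on_gpu(15.0))  # hypothetical larger model
```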
The price varies depending on the hardware selected. In our case, it was around $6.7/h. Running the S-parameters sweep took about 6 h on CPU and about 40 min on GPU, so the cost of the GPU run was about $4.5. This can be compared to the price of a new GPU card, which can easily start at $1000, and to the time it would take to amortize this investment depending on usage.
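The cost arithmetic can be written out explicitly. This Python sketch uses the hourly rate, sweep time and card price quoted above. The break-even figure only compares the card's purchase price with rental hours and ignores the host machine, power and maintenance, so it is a rough lower bound on the usage needed to justify buying a card. Since billing applies whenever the virtual desktop is “Running”, idle time should be added to the session hours.

```python
HOURLY_RATE_USD = 6.7      # Standard_NC6s_v3 price observed in this article
GPU_SWEEP_HOURS = 40 / 60  # S-parameters sweep on GPU (about 40 min)
CARD_PRICE_USD = 1000.0    # starting price quoted in this article for a new GPU card


def session_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost of keeping the virtual desktop in the Running state for `hours`."""
    return hours * rate


def break_even_hours(card_price: float = CARD_PRICE_USD,
                     rate: float = HOURLY_RATE_USD) -> float:
    """Rental hours whose cost equals the card's purchase price (card only)."""
    return card_price / rate


print(f"GPU sweep: ~${session_cost(GPU_SWEEP_HOURS):.2f}")
print(f"Break-even: ~{break_even_hours():.0f} h of rental")
```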

Taking the Model Further

Information and tips for users that want to further customize the model

GPUs are not the only resource available on Ansys Access on Microsoft Azure. Some instances also provide high-end CPUs with many cores and large memory.
Multi-GPU instances are also available for Linux users, as well as autoscaling clusters.

Additional Resources

Related Publications

"Design fiber-to-waveguide coupling for photonic integrated circuits," Proc. SPIE 12427, Optical Interconnects XXIII, 124270B (8 March 2023)
