The resource configuration controls how the RCWA solver uses your computer's hardware resources.
The Resource Configuration window can be opened by clicking the Resources button:
The RCWA resource configuration is set in the RCWA solver tab:
Processes
Unlike the FDTD solver, the RCWA solver can only be run with a single process. This means it isn't possible to distribute a single RCWA simulation over multiple nodes.
Threads
A single run of the RCWA solver can consist of multiple simulations, one for each combination of the selected incident angles and frequencies/wavelengths. As RCWA is a frequency domain solver, each of these simulations is independent and can be run in a separate thread.
The Threads column in the Resource table above assigns a maximum number of threads for RCWA to use. When there is sufficient RAM, RCWA uses the Threads column to control the maximum number of concurrent simulations to run. If there is insufficient RAM for one simulation per thread, RCWA automatically decreases the number of concurrent simulations to within the RAM limit. Excess threads are used for linear algebra acceleration. For example, if there are 5 simulations to run, but 8 threads assigned in the Resource table as seen above, the remaining 3 threads are used to accelerate linear algebra operations.
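The allocation rule described above can be sketched in a few lines of Python. This is only a simplified model for reasoning about the settings, not the solver's actual scheduler: the function name `allocate_threads` and the `ram_limited_max` parameter (the largest number of simulations that fit in RAM at once) are hypothetical names introduced for illustration.

```python
def allocate_threads(threads, n_simulations, ram_limited_max=None):
    """Return (concurrent_simulations, spare_threads) under a simplified model.

    threads          -- value of the Threads column (assumed >= 1)
    n_simulations    -- angle/frequency combinations in the run (assumed >= 1)
    ram_limited_max  -- hypothetical cap on concurrent simulations imposed by RAM
    """
    # At most one simulation per thread, and never more than there is work for.
    concurrent = min(threads, n_simulations)
    # If RAM cannot hold that many simulations, reduce the concurrency.
    if ram_limited_max is not None:
        concurrent = min(concurrent, ram_limited_max)
    # Threads not running their own simulation are left for linear algebra.
    spare = threads - concurrent
    return concurrent, spare


print(allocate_threads(8, 5))                      # the example from the text
print(allocate_threads(8, 20, ram_limited_max=3))  # RAM-limited case
```

With 8 threads and 5 simulations, the model gives 5 concurrent simulations and 3 spare threads, matching the example in the text.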
Capacity
Multiple runs of the RCWA solver (called "jobs") can be run concurrently, for example when using the parameter sweep utility, the optimization utility, or the addjob/runjobs script commands. The maximum number of jobs run concurrently is set by the Capacity field of the Resources table. For full resource utilization, it is generally best to make the total number of jobs a multiple of the Capacity value, so that no job slots sit idle while the last jobs finish.
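To see why a multiple of Capacity helps, the following Python sketch counts idle job slots. It assumes, purely for illustration, that all jobs take the same time and therefore run in equal waves. Real jobs may differ in duration, and `batch_utilization` is a hypothetical helper, not part of the product.

```python
import math


def batch_utilization(n_jobs, capacity):
    """Return (waves, idle_slots) assuming every job takes the same time."""
    # Number of waves needed to run all jobs, `capacity` jobs at a time.
    waves = math.ceil(n_jobs / capacity)
    # Slots left empty in the final wave.
    idle_slots = waves * capacity - n_jobs
    return waves, idle_slots


print(batch_utilization(10, 4))  # 10 jobs, Capacity 4: last wave is half empty
print(batch_utilization(8, 4))   # 8 jobs, Capacity 4: every wave is full
```

In this model, 10 jobs with a Capacity of 4 need 3 waves and leave 2 slots idle in the last wave, while 8 jobs fill both waves completely.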
Run within design environment
The run within design environment checkbox determines whether the RCWA solver runs inside the design environment, in the same way as script commands such as FFTs, instead of using the listed resource(s). This option applies only to single simulations, not to sweeps.
You can also set this option by script using the getresource and setresource commands.
# Gets the status of the checkbox, 1 if enabled, 0 if disabled
getresource("RCWA", "run within design environment");
# Sets the checkbox; you can use boolean values (true/false) or 1/0
setresource("RCWA", "run within design environment", 1);
Optimum Resource Configuration
To fully utilize your computing resources, the total number of threads should equal the number of physical cores on your computer. The total number of threads is the number of concurrently running jobs times the number of threads per job, so the product of Threads and Capacity should equal the number of physical cores.
Note: Our testing suggests that hyperthreading does not typically increase the speed of the RCWA solver, which is why physical cores are specified here.
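As a quick aid, the Python sketch below lists every (Threads, Capacity) pair whose product equals a given number of physical cores. `resource_options` is a hypothetical helper written for this page. Which pair is best still depends on the workload.

```python
def resource_options(physical_cores):
    """List (threads, capacity) pairs with threads * capacity == physical_cores."""
    return [
        (threads, physical_cores // threads)
        for threads in range(1, physical_cores + 1)
        if physical_cores % threads == 0  # keep only exact divisors
    ]


for threads, capacity in resource_options(8):
    print(f"Threads = {threads}, Capacity = {capacity}")
```

For 8 physical cores, this gives the pairs (1, 8), (2, 4), (4, 2), and (8, 1).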
Examples
Below are some example optimum resource configurations on a computer with 8 physical cores.
Running a single RCWA simulation over multiple angles and frequencies
There is only a single job so Capacity can be left as 1. There are multiple incident angles and frequencies which can be run concurrently on separate threads so we want a high number of threads. To make Capacity*Threads equal to the number of physical cores, Threads is set to 8.
Running a parameter sweep of a geometry parameter with 8 parameter values for an RCWA simulation with a single incident angle and a single frequency
The simulation is running for a single incident angle and frequency, so each job only has one simulation. This means Threads can be set to 1. The jobs can be run concurrently by increasing the Capacity. To make Capacity*Threads equal to the number of physical cores, Capacity is set to 8.
Running a convergence test of k-vectors with the parameter sweep utility for an RCWA simulation with multiple incident angles and frequencies
This is a more complicated case because there are multiple simulations and multiple jobs. Increasing the number of k-vectors increases the simulation time, so the jobs with a high number of k-vectors will take much longer than the other jobs. For this reason, it is best to set Capacity to 1 and Threads to 8. Otherwise, the cores will not be fully utilized when the fast, low k-vector jobs are finished and the high k-vector jobs are still running.
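The effect can be illustrated with a toy timing model in Python. Everything here is an assumption made for illustration: the per-simulation costs are made-up relative numbers, simulations within a job are assumed to share threads perfectly, acceleration from spare threads is ignored, and `sweep_time` is a hypothetical helper rather than a description of the actual scheduler.

```python
import math


def sweep_time(cost_per_simulation, sims_per_job, threads, capacity):
    """Estimate total sweep time under an idealized model.

    cost_per_simulation -- one relative cost per job (grows with k-vector count)
    sims_per_job        -- angle/frequency combinations in each job
    """
    # Each job runs its simulations `threads` at a time.
    job_times = [math.ceil(sims_per_job / threads) * c for c in cost_per_simulation]
    # Jobs start in order, each on whichever of the `capacity` slots frees up first.
    slots = [0] * capacity
    for t in job_times:
        slots[slots.index(min(slots))] += t
    return max(slots)


costs = [1, 2, 4, 8, 16, 32]  # made-up costs for six k-vector settings
print(sweep_time(costs, sims_per_job=16, threads=8, capacity=1))
print(sweep_time(costs, sims_per_job=16, threads=1, capacity=8))
```

In this model, Capacity 1 with Threads 8 finishes in 126 time units, whereas Capacity 8 with Threads 1 takes 512 units. In the second case the total is set by the single slowest job, which runs on one thread while the other cores sit idle.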