The default configuration runs as a single-process simulation. The job can also be run using multiple MPI processes, with each process utilizing 1 core.
When multiple processes are used, the cells in the simulation are divided as evenly as possible across the processes. When the number of cells is not an exact multiple of the number of processes, the lower-rank processes each receive one additional cell. For example, with three processes and eight cells, rank 0 and rank 1 each have three cells while rank 2 has two cells. If the number of processes is greater than the number of cells, one process is used per cell and the remaining processes are left idle.
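The distribution rule above can be expressed as a short calculation. The sketch below is only an illustration of the stated rule, not the EME Engine's own code, and the function name `cells_per_rank` is hypothetical. It returns one entry per rank that actually receives cells, so surplus processes do not appear.

```python
def cells_per_rank(num_cells: int, num_procs: int) -> list[int]:
    """Number of cells assigned to each MPI rank (hypothetical helper).

    Cells are split as evenly as possible and the lower ranks absorb the
    remainder. Processes beyond the number of cells receive nothing, so at
    most num_cells entries are returned.
    """
    if num_cells < 1 or num_procs < 1:
        raise ValueError("num_cells and num_procs must be positive")
    active = min(num_procs, num_cells)     # extra processes stay idle
    base, extra = divmod(num_cells, active)
    # The first `extra` ranks each take one additional cell.
    return [base + 1 if rank < extra else base for rank in range(active)]


print(cells_per_rank(8, 3))  # [3, 3, 2]
print(cells_per_rank(4, 6))  # [1, 1, 1, 1]
```

The first call reproduces the three-process, eight-cell example from the text; the second shows that with six processes and four cells only four ranks receive work.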
Each process calculates and normalizes the modes and finds the overlaps for its assigned cells. Those results are then sent back to the rank 0 process, which calculates the periodic overlaps (over effective segments) and the overlaps with the ports. The time taken for the communication between processes and for each step of the algorithm is reported as a simulation benchmark. All stages except the initial mode calculation are multi-threaded according to the EME Engine settings in the Resource Configuration.
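The workflow described above follows a scatter, compute, gather pattern. The sketch below is a single-machine analogy, not the EME Engine's implementation: threads stand in for MPI ranks, `solve_cell` is a hypothetical placeholder for the per-cell mode and overlap work, and the assumption that each rank owns a contiguous block of cells is made only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(num_cells: int, num_procs: int) -> list[range]:
    # Even split with the remainder going to the lower ranks.
    # Contiguous blocks are an assumption of this sketch.
    active = min(num_procs, num_cells)
    base, extra = divmod(num_cells, active)
    blocks, start = [], 0
    for rank in range(active):
        count = base + 1 if rank < extra else base
        blocks.append(range(start, start + count))
        start += count
    return blocks


def solve_cell(cell: int) -> str:
    # Hypothetical placeholder for mode calculation, normalization and
    # overlaps of one cell; a real solver would return mode data here.
    return f"modes[{cell}]"


def run_rank(cells: range) -> list[str]:
    # Work done independently by one "rank" on its assigned cells.
    return [solve_cell(c) for c in cells]


def simulate(num_cells: int, num_procs: int) -> list[str]:
    blocks = partition(num_cells, num_procs)
    # Threads stand in for MPI ranks; map() returns results in rank order.
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        per_rank = list(pool.map(run_rank, blocks))
    # "Rank 0" gathers the per-rank results. Because the blocks are
    # contiguous, concatenating in rank order restores the cell order
    # needed for the periodic and port overlaps.
    return [result for rank_results in per_rank for result in rank_results]


print(simulate(8, 3))
```

In a real MPI run the gather step is where "eme communication time" would accumulate; in this in-process analogy it costs essentially nothing.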
The most useful metric for measuring performance is “total eme simulation time”, which is the wall-clock time required for the EME Engine to complete the simulation. The time spent on MPI communication between processes is given as “eme communication time”. The additional simulation benchmarks break down the overall runtime into the separate stages of the algorithm.
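Two derived quantities are often useful when comparing runs: the share of the total time spent on communication, and the speedup of one run relative to a baseline run. The helpers below are a minimal sketch with hypothetical names; they assume both times passed to each function use the same unit (for example seconds) and simply take the reported values as inputs.

```python
def communication_fraction(total_time: float, comm_time: float) -> float:
    """Fraction of the total EME simulation time spent on MPI communication."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return comm_time / total_time


def speedup(baseline_time: float, parallel_time: float) -> float:
    """How many times faster a run is than the baseline run."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return baseline_time / parallel_time
```

Feeding in “total eme simulation time” and “eme communication time” from the same run gives the communication overhead; feeding in the total times of a single-process run and a multi-process run gives the speedup.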
Since the 2022 R1.3 release, performance metrics are also provided as a result of the EME object, making it easier to extract this information. The “eme communication time” was added in 2022 R2 when the ability to run EME with multiple processes via MPI became available.
Choosing a simulation
Devices requiring multiple cells (for example, devices with tapered sections) benefit from EME parallelization. For our tests, we use the simulation file from the Edge coupler Application Gallery example.
Amazon EC2 instance types
Amazon Elastic Compute Cloud (EC2) allows you to set up and run your own virtual machines (instances). Various instance types are available, with hardware configurations optimized for different use cases.
- For information on EC2 pricing, see: Amazon EC2 instance pricing
- For more information on EC2 instance types, see: Amazon EC2 instance types
Intel Xeon processors
- C6i instances are the latest generation of Compute-optimized instances, powered by 3rd generation Intel Xeon Scalable processors.
AMD EPYC processors
- C6a instances are the latest generation of Compute-optimized instances, powered by 3rd generation AMD EPYC 7003 processors.
For each instance type, the tests were run on Windows Server 2016 using Microsoft MPI. We ran the simulation in two configurations: 1 process with 24 threads, and 12 processes with 1 thread each:
| Instance type | Description | Simulation time |
|---|---|---|
| C6i | Intel Xeon Scalable 8375C 2.9 GHz | |
| C6a | AMD EPYC 7R13 2.65 GHz | |