EME Performance Benchmarks

Resource Configuration

The default configuration runs as a single-process simulation. The job can also be run using multiple MPI processes, with each process utilizing 1 core.

When multiple processes are used, the cells in the simulation are divided as evenly as possible across the processes. When the number of cells is not an exact multiple of the number of processes, the lower-ranked processes each receive one additional cell. For example, with three processes and eight cells, rank 0 and rank 1 each have three cells while rank 2 has two. If the number of processes is greater than the number of cells, one process is used per cell and the additional processes are not used.
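The distribution rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this article, not the EME Engine's actual implementation; the function name cells_per_rank and its arguments are hypothetical.

```python
def cells_per_rank(num_cells: int, num_processes: int) -> list[int]:
    """Return the number of cells assigned to each rank, in rank order.

    Hypothetical helper that mirrors the rule described in the text:
    cells are split as evenly as possible, lower ranks take the remainder,
    and processes beyond the number of cells receive no work.
    """
    if num_cells < 0 or num_processes < 1:
        raise ValueError("need num_cells >= 0 and num_processes >= 1")

    # Only as many processes as there are cells can do useful work.
    active = min(num_cells, num_processes)
    if active == 0:
        return [0] * num_processes

    base, extra = divmod(num_cells, active)
    # The first `extra` ranks each take one additional cell.
    counts = [base + 1 if rank < extra else base for rank in range(active)]
    # Remaining processes (if any) are idle.
    return counts + [0] * (num_processes - active)


if __name__ == "__main__":
    print(cells_per_rank(8, 3))  # the example from the text: [3, 3, 2]
    print(cells_per_rank(3, 5))  # more processes than cells: [1, 1, 1, 0, 0]
```

A quick check like this can help when choosing a process count: the wall time of the per-cell stages is bounded by the rank holding the most cells, so process counts that leave a large remainder give less benefit.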

Each process calculates and normalizes the modes and finds the overlaps for its assigned cells. The results are then sent back to the rank 0 process, which calculates the periodic overlaps (over effective segments) and the overlaps with the ports. The time taken for the communication between processes and for each of the steps is reported as a simulation benchmark. All stages, except for the initial mode calculation, are multi-threaded based on the EME Engine settings in the Resource Configuration.

Measuring Performance

The most useful metric for measuring performance is “total eme simulation time”, which is the real time required for the EME Engine to complete the simulation. The time spent on MPI communication between processes is given as “eme communication time”. The additional simulation benchmarks break down the overall runtime into the separate stages of the algorithm.

Since the 2022 R1.3 release, performance metrics are also provided as a result of the EME object, making it easier to extract this information. The “eme communication time” was added in 2022 R2 when the ability to run EME with multiple processes via MPI became available.

Choosing a simulation

Devices requiring multiple cells (for example with tapered sections) will benefit from the EME parallelization. For our tests, we use the simulation file from the Edge coupler Application Gallery example.

Amazon EC2 instance types

Amazon Elastic Compute Cloud (EC2) lets you set up and run your own virtual machines (instances). Various instance types are available, based on hardware configurations optimized for different use cases.

For information on EC2 pricing, see: Amazon EC2 instance pricing
For more information on EC2 instance types, see: Amazon EC2 instance type

Intel Xeon processors

C6i instances are the latest generation of Compute-optimized instances, powered by 3rd generation Intel Xeon Scalable processors.

AMD EPYC processors

C6a instances are the latest generation of Compute-optimized instances, powered by 3rd generation AMD EPYC 7003 processors.

Simulation results

For each instance type, the tests were run on Windows Server 2016, using Microsoft MPI. We ran the simulation in two configurations: 1 process with 24 threads, and 12 processes with 1 thread each:

Instance type	Description	Simulation time (1 process, 24 threads)	Simulation time (12 processes, 1 thread)
c6i.12xlarge	Intel Xeon Scalable 8375C, 2.9 GHz, 24 cores, 96 GB RAM	2 min 59 s	23.5 s
c6a.12xlarge	AMD EPYC 7R13, 2.65 GHz, 24 cores, 96 GB RAM	1 min 54 s	32.6 s
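To read the table more easily, the ratio between the two run times can be computed for each instance type. The snippet below is a minimal sketch: the dictionary and function names are illustrative only, and the times are the table values converted to seconds. The two configurations do not use the same number of cores (24 threads versus 12 single-threaded processes), so the ratio should be read as a comparison of the two setups rather than as a parallel efficiency.

```python
# Run times from the table above, converted to seconds.
# Keys and structure are illustrative, not produced by any Lumerical tool.
RUN_TIMES_S = {
    "c6i.12xlarge": {"1 process, 24 threads": 2 * 60 + 59, "12 processes, 1 thread": 23.5},
    "c6a.12xlarge": {"1 process, 24 threads": 1 * 60 + 54, "12 processes, 1 thread": 32.6},
}


def mpi_vs_threads_ratio(times: dict[str, float]) -> float:
    """Ratio of the multi-threaded run time to the multi-process run time."""
    return times["1 process, 24 threads"] / times["12 processes, 1 thread"]


if __name__ == "__main__":
    for instance, times in RUN_TIMES_S.items():
        ratio = mpi_vs_threads_ratio(times)
        print(f"{instance}: multi-process run is {ratio:.1f}x faster")
```

For the values in the table, this gives a ratio of roughly 7.6 for c6i.12xlarge and roughly 3.5 for c6a.12xlarge.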
