This article describes configurations you need to perform on your cluster prior to using Ansys Lumerical for distributed or concurrent computing.
Notes:
- Install Lumerical simulation suite on your cluster, preferably on a shared filesystem.
- Running multiple simulations across several computers simultaneously (concurrent computing) requires as many licenses as there are computers running the simulations. See the Knowledge Base article "Ansys optics solve, accelerator, and Ansys HPC license consumption" for more information on license estimation and sharing.
- Concurrent computing is currently supported by all products.
- Distributed computing is only available for FDTD and varFDTD.
Configure your firewall
- Many Linux clusters communicate across a private network and therefore firewall security may not be required. If no firewall is in use in your network, this step may be skipped.
- The MPI processes communicate using a range of ports. It's easiest to simply disable the firewall on all nodes. An alternate solution is to configure MPI to use a specific range of ports, then create exceptions for those ports. See your MPI's documentation for details.
- If you want to leave the firewall turned on, two additional firewall exceptions are required:
- In some configurations, MPI requires the use of the SSH programs to start remote processes on the compute nodes during parallel execution. Ensure that SSH port 22 is allowed to accept incoming TCP/IP connections on all of your compute nodes.
- Open/allow access to the TCP ports used by the license manager.
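The exceptions above can be sketched as a small shell helper. This is a minimal sketch, assuming the nodes use firewalld (other firewalls need different commands). The function name open_cluster_ports and the RUN variable are hypothetical, and the port numbers you pass must come from your own license manager and MPI configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, assuming firewalld is the firewall on each node.
# By default it only PRINTS the commands (dry run); set RUN=sudo to apply them.
open_cluster_ports() {
  local run="${RUN:-echo}"
  local port

  # Allow incoming SSH (TCP port 22) so MPI can start remote processes.
  $run firewall-cmd --permanent --add-service=ssh

  # Allow each TCP port or port range given as an argument,
  # for example the license manager ports or an MPI port range.
  for port in "$@"; do
    $run firewall-cmd --permanent --add-port="${port}/tcp"
  done

  # Reload so the permanent rules take effect.
  $run firewall-cmd --reload
}
```

For example, open_cluster_ports 12345 50000-50100 prints the commands for one single port and one range (both values are placeholders). Review the output, then repeat with RUN=sudo on each node to apply the rules.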
Shared network storage
Setup
When running distributed and concurrent jobs, the leading host (Rank 0 host), which is the first host/node that appears in the list, must have access to the simulation file.
When the leading host is a local host, the computer only needs to have access to the local folder where the simulation file is stored.
When the leading host is a remote host, the leading host should be able to access the same file as the computer that initiates the job. For example, if the simulation file on the computer that initiates the job is stored in
/simulations/lumerical/myHPCSim.fsp
The leading host should also be able to access the file with the path
/simulations/lumerical/myHPCSim.fsp
One solution is to set up shared network storage (or S3 storage when using AWS) and place the simulation files in a location that all nodes can access with the same path.
For example, on Linux clusters where your home directory is common to all nodes in the system, you can save simulation files there. Alternatively, you can create a network storage location, then map that location to a common drive or directory.
For more information on creating a network file system, see your operating system's documentation.
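Whether every node sees the simulation file at the same path can be checked before submitting a job. The following is a minimal sketch, assuming passwordless SSH to the nodes is already working. The function name check_shared_path and the REMOTE_SH variable are hypothetical, and the quoting assumes the path contains no single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a file is readable at the same path on
# every listed node. REMOTE_SH can be overridden to change the remote command.
check_shared_path() {
  local file="$1"; shift
  # BatchMode makes ssh fail instead of prompting for a password.
  local remote="${REMOTE_SH:-ssh -o BatchMode=yes}"
  local node missing=0

  for node in "$@"; do
    if $remote "$node" "test -r '$file'"; then
      echo "$node: ok"
    else
      echo "$node: cannot read $file"
      missing=1
    fi
  done

  # Non-zero exit status if any node could not read the file.
  return "$missing"
}
```

For example, check_shared_path /simulations/lumerical/myHPCSim.fsp node01 node02, where node01 and node02 are placeholder host names.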
Limitation for Windows
On Windows, shared network storage will not work by default for the Intel MPI. To set up shared network storage access on Windows, please refer to Intel’s documentation on User Authorization.
When possible, we recommend you use Linux for remote distributed or concurrent computing with Lumerical. Alternatively, you can also use a non-network shared location to store the file.
As a last resort, it is also possible to use Microsoft MPI. However, you must manually start the listening service on each node.
Configure login credentials
Windows
- Your user account should have a unique username and a password.
- We do not recommend using the default Administrator account on Windows.
- When using Intel MPI, register your "user" credentials as shown here.
On Windows, use Intel MPI as the Job launching preset. Microsoft MPI or Local Computer are used when running only on the local machine.
Linux
Configure your compute nodes to allow remote login without a password. MPI uses SSH to start remote jobs. If passwordless login is not configured, you will have to type your password each time MPI is called to run a simulation.
Creating a passwordless SSH login
- On your primary computer, enter the following command to create a set of SSH keys.
$ ssh-keygen -t rsa
- Press enter several times to accept all the defaults and an empty passphrase.
- This creates your public/private key pair and saves it in the .ssh folder in your home directory
$HOME/.ssh
- Append your public key to your own authorized_keys file
$ cd ~/.ssh
$ cat >> authorized_keys < id_rsa.pub
- Next, you must place your public key in the text file $HOME/.ssh/authorized_keys on each compute node
$ ssh <node name> "mkdir -p ~/.ssh; chmod 700 ~/.ssh"
$ cat ~/.ssh/id_rsa.pub | ssh <node name> "cat >> ~/.ssh/authorized_keys"
$ ssh <node name> "chmod 600 ~/.ssh/authorized_keys"
- Verify that you can now log in to the node without entering a password
$ ssh <node name>
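When there are many compute nodes, the per-node steps above can be repeated in a loop. The following is a minimal sketch, assuming the OpenSSH ssh-copy-id utility is installed and the key pair is in the default location. The function name setup_node_keys and the SSH_CMD and COPY_CMD variables are hypothetical. You will still be asked for your password once per node while the key is being copied.

```shell
#!/usr/bin/env bash
# Overridable commands (hypothetical variable names).
# BatchMode makes ssh fail instead of prompting, so it can test key-based login.
SSH_CMD="${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5}"
COPY_CMD="${COPY_CMD:-ssh-copy-id -i $HOME/.ssh/id_rsa.pub}"

# Hypothetical helper: install the public key on each node, then verify login.
setup_node_keys() {
  local node failed=0

  for node in "$@"; do
    # Skip nodes that already accept key-based login.
    if $SSH_CMD "$node" true 2>/dev/null; then
      echo "$node: passwordless login already works"
      continue
    fi

    # Copy the public key (prompts for the account password).
    if ! $COPY_CMD "$node"; then
      echo "$node: key copy failed"
      failed=1
      continue
    fi

    # Confirm that login now works without a password.
    if $SSH_CMD "$node" true; then
      echo "$node: passwordless login configured"
    else
      echo "$node: still prompts for a password"
      failed=1
    fi
  done

  return "$failed"
}
```

For example, setup_node_keys node01 node02 node03, where the node names are placeholders for your own host names.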
On shared network file system
If your home directory is shared with all compute nodes, you only have to run the above commands once. After completing this step, you should be able to log in to any of the compute nodes without entering a password.
Install Lumerical on a shared filesystem
See Shared filesystem installation on Linux for details.
Configure license
Refer to the "Configuring the License.ini file" section in this guide for details on setting up the global or system-wide license server configuration.
Configure resources
If you have a GUI connection to the cluster, you can run Lumerical simulations from the CAD/GUI and configure your resources depending on your use case.
- Open Lumerical CAD and open "Resources" to configure your resources.
- If you have a Job Scheduler installed on your cluster, see Job scheduler integration - resource configuration.
- Add or edit each resource as needed.