Simulation workflows can be demanding in terms of computational resources, both for running the simulations and for processing their results, so using High Performance Computing (HPC) has become a necessity, whether on premises or in the cloud.
Ansys Lumerical provides the lumslurm Python module to facilitate these workflows by automating job submission for simulations and post-processing while taking care of their inter-dependencies. This is particularly useful in the cloud, where instances can be started and stopped dynamically as needed, optimizing usage and cost.
Prerequisites
This module is available with Ansys Lumerical 2023 R2.2 and later. As its name indicates, it is designed to submit jobs to the Slurm job scheduler. Other job schedulers are currently not supported.
Configuration
The lumslurm module includes a default configuration for the mpirun command, the MPI library, the FDTD engine and GUI, the PYTHONPATH, and the Python executable:
# default config
mpirun = 'mpirun'
mpilib = None
fdtd_engine = '/opt/lumerical/[[verpath]]/bin/fdtd-engine-ompi-lcl'
fdtd_gui = '/opt/lumerical/[[verpath]]/bin/fdtd-solutions'
pythonpath = '/opt/lumerical/[[verpath]]/api/python'
python = '/opt/lumerical/[[verpath]]/python/bin/python'
A custom configuration can be defined either at the system level, using a lumslurm.config file located in the same folder as lumslurm.py, or at the user level, using a .lumslurm.config file located in the $HOME directory. The configuration is stored in JSON format:
{
"mpirun": "/<openmpi_install_path>/linx64/bin/mpirun",
"mpilib": "/<openmpi_install_path>/linx64/lib",
"fdtd_engine": "/apps/lumerical/releases/[[verpath]].2/bin/fdtd-engine-ompi-lcl",
"fdtd_gui": "/opt/lumerical/[[verpath]]/bin/fdtd-solutions,
"pythonpath": "/opt/lumerical/[[verpath]]/api/python",
"python": "/opt/lumerical/[[verpath]]/python/bin/python"
}
Available functions
fdtd_memory_estimate
Estimate the memory requirements for an FDTD simulation file.
def fdtd_memory_estimate(filename)
: filename: string
Simulation filename possibly including path
Returns a dict with fields memory, gridpoints and time_steps, where the memory field is the memory requirement in bytes.
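As a quick illustration, the estimate can be inspected before submitting a job. This is a minimal sketch; the simulation file name is a placeholder.
import lumslurm

# 'my_simulation.fsp' is a placeholder file name
estimate = lumslurm.fdtd_memory_estimate('my_simulation.fsp')
print('Memory (GB):', estimate['memory'] / 1e9)
print('Grid points:', estimate['gridpoints'])
print('Time steps: ', estimate['time_steps'])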
partition_info
Query slurm for information about partitions.
def partition_info()
Returns a dict where the key is the partition name and the value is a dict with the partition details.
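For example, the available partitions and their details can be listed as follows:
import lumslurm

# Print every partition reported by Slurm along with its details
for name, details in lumslurm.partition_info().items():
    print(name, details)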
suggest_partitions
Suggest slurm partitions for solving fsp_file.
This function selects the slurm partitions with nodes with an optimal number of CPUs for the solve job. The solve job's memory requirements are estimated to determine the number of CPUs to use.
The suggested partitions will all contain nodes with the same number of CPUs.
def suggest_partitions(fsp_file)
: fsp_file: string
Simulation filename possibly including path
Returns a string: comma separated list of slurm partitions.
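A minimal sketch, assuming a placeholder simulation file name:
import lumslurm

# Returns a comma separated list of partition names
partitions = lumslurm.suggest_partitions('my_simulation.fsp')
print(partitions)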
run_solve
Run an FDTD solve job using slurm.
def run_solve(fsp_file,
partition='auto',
nodes=1,
processes_per_node='auto',
threads_per_process='auto',
gpus_per_node=None,
block=False)
: fsp_file: string
name of fsp file, possibly including path
: partition: string (optional)
name of slurm partition to run the job on. Multiple partitions allowed as comma separated list. Default value of 'auto' will result in the partition being selected automatically. See suggest_partitions()
: nodes: int (optional)
number of nodes for distributed solve. Default value of 1 runs solve on a single node
: processes_per_node: int (optional)
number of processes to run on each node. Default value 'auto' will automatically determine the number of processes based on available CPU and number of threads_per_process (if specified)
: threads_per_process: int (optional)
number of threads to use for each process. Default value of 'auto' will automatically determine the number of threads based on available CPU and the number of processes_per_node (if specified)
: block: Boolean (optional)
Wait until the solve job completes before returning from this function. The default value of False will queue the job and return; set to True to wait for completion.
Returns a string with slurm job ID for the solve job.
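A minimal sketch of a blocking solve on a single node; the file name and partition name are placeholders.
import lumslurm

# 'my_simulation.fsp' and 'compute' are placeholder names
job_id = lumslurm.run_solve('my_simulation.fsp',
                            partition='compute',
                            nodes=1,
                            block=True)
print('Solve job', job_id, 'completed')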
run_script
Run a script job using slurm. A script job can be Lumerical script or Python. For Python, code can be passed as a string or a file. For Lumerical script, only files are supported.
def run_script(script_file=None,
script_code=None,
fsp_file=None,
partition=None,
threads='auto',
dependency=None,
job_name=None,
block=False)
: script_file: string (optional)
Filename of script possibly including path. If not specified then it is assumed you will supply the script_code parameter. Filename should end with lsf or py extension
: script_code: string (optional)
string containing Python code. If not specified, it is assumed you will supply the script_file parameter
: fsp_file: string
name of fsp file, possibly including path. For Python scripts, the fsp file will be passed as the second command line argument. For lsf scripts, the fsp file will be loaded before the script is run
: partition: string (optional)
name of slurm partition to run the job on. Multiple partitions allowed as comma separated list. Default value of None will result in Slurm's default partition being selected
: threads: int (optional)
number of threads to use for each process. Default value of 'auto' will automatically determine the number of threads based on available CPU
: dependency: string (optional)
ID or IDs of other slurm job(s) that must complete before this job will run. If none supplied then there is no dependency
: job_name: string (optional)
Name to use for slurm job. This will be displayed in squeue output
: block: boolean (optional)
Wait until the script job completes before returning from this function. The default value of False will queue the job and return.
Returns a string with slurm job ID for the script job.
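A minimal sketch of chaining a script job to a solve job via the dependency argument; the file, script and partition names are placeholders.
import lumslurm

# Queue the solve job without blocking
solve_id = lumslurm.run_solve('my_simulation.fsp', partition='compute')

# Queue a Lumerical script job that only starts once the solve job has completed
script_id = lumslurm.run_script(script_file='analysis.lsf',
                                fsp_file='my_simulation.fsp',
                                dependency=solve_id,
                                job_name='analysis')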
run_py_script
Run a Python script job using Slurm. Optional arguments can be passed to the script.
def run_py_script(py_file,
data_file=None,
partition=None,
threads='auto',
dependency=None,
job_name=None,
args=[],
block=False)
: py_file: string
Python script file, possibly including path.
: data_file: string (optional)
name of fsp file. The fsp file will be passed as the second command line argument for Python.
: partition: string (optional)
name of slurm partition to run the job on. Multiple partitions allowed as comma separated list. Default value of None will result in Slurm's default partition being selected
: threads: int (optional)
number of threads to use for each process. Default value of 'auto' will automatically determine the number of threads based on available CPU
: dependency: string (optional)
ID or IDs of other slurm job(s) that must complete before this job will run. If none supplied then there is no dependency
: job_name: string (optional)
Name to use for slurm job. This will be displayed in squeue output
: args: list (optional)
List of arguments to pass to the Python script.
: block: boolean (optional)
Wait until the script job completes before returning from this function. The default value of False will queue the job and return.
Returns a string with slurm job ID for the script job.
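A minimal sketch; the script name, data file and argument values are placeholders.
import lumslurm

# Run a Python analysis script and pass extra command line arguments to it
job_id = lumslurm.run_py_script('analysis.py',
                                data_file='my_simulation.fsp',
                                args=['--output', 'results.csv'],
                                job_name='py-analysis')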
run_batch
Run a set of FDTD solve jobs with optional post-processing scripts. The post-processing script or code is run after every solve job. The fsp file is supplied as a command line argument (Python) or loaded before the script runs (lsf).
The optional collect script can be run when all solve and post-processing script jobs have been completed.
Solve jobs and script jobs can run on different slurm partitions.
For solve jobs you can configure distributed solves with nodes>1. You can easily set the number of processes per node and the number of threads per process. If no values are provided, the function will choose good default values.
def run_batch(fsp_file_pattern,
postprocess_script=None,
postprocess_code=None,
collect_script=None,
collect_code=None,
solve_partition='auto',
solve_nodes=1,
solve_processes_per_node='auto',
solve_threads_per_process='auto',
solve_gpus_per_node=None,
script_partition=None,
script_threads='auto',
job_name=None,
block=True)
: fsp_file_pattern: string
pattern for fsp file name, possibly including path. Pattern follows glob syntax.
: postprocess_script: string (optional)
Filename of post-process script possibly including path. Post-process script is run after every solve job. Filename should end with lsf or py extension.
: postprocess_code: string (optional)
string containing Python code for post-process script.
: collect_script: string (optional)
Filename of collect script possibly including path. The collect script is run after all solve jobs and post-process script jobs have completed. Filename should end with lsf or py extension.
: collect_code: string (optional)
string containing Python code for collect script.
: solve_partition: string (optional)
name of slurm partition to run solve jobs on. Multiple partitions allowed as comma separated list. Default value of 'auto' will result in the partition being selected automatically. See suggest_partitions()
: solve_nodes: int (optional)
number of nodes for distributed solve. Default value of 1 runs solve on a single node
: solve_processes_per_node: int (optional)
number of processes to run on each node. Default value 'auto' will automatically determine the number of processes based on available CPU and number of threads_per_process (if specified)
: solve_threads_per_process: int (optional)
number of threads to use for each process. Default value of 'auto' will automatically determine the number of threads based on available CPU and the number of processes_per_node (if specified)
: script_partition: string (optional)
name of slurm partition to run script jobs on. Multiple partitions allowed as comma separated list. Default value of None will result in Slurm's default partition being selected
: script_threads: int (optional)
number of threads to use for each script process. Default value of 'auto' will automatically determine the number of threads based on available CPU
: job_name: string (optional)
Name to use for slurm job. This will be displayed in squeue output
: block: boolean (optional)
Wait until the jobs complete before returning from this function. Default is True. A value of False will queue the jobs and return.
Returns a string with slurm job ID for the solve job.
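A minimal sketch of a batch run; the file pattern, script names and partition names are placeholders.
import lumslurm

# Solve every fsp file matching the pattern, post-process each result,
# then run the collect script once all jobs have finished
lumslurm.run_batch('designs/*.fsp',
                   postprocess_script='postprocess.py',
                   collect_script='collect.py',
                   solve_partition='compute',
                   script_partition='analysis')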
run_sweep
Run an FDTD sweep or nested sweeps as independent jobs for solves and result computation. Each solve job for a parameter value is run as a slurm job. Solve jobs can be distributed or single node.
Results are computed as a post-processing step for each solve job in their own slurm job. This greatly accelerates heavy post-processing like far field projections.
Sweep results are collected by a final slurm job that runs when all the solve and post-processing jobs have finished.
Solve jobs and script jobs can run on different slurm partitions. For solve jobs you can configure distributed solves with nodes>1. You can easily set the number of processes per node and the number of threads per process. If no values are provided, the function will choose good default values.
def run_sweep(fsp_file,
sweep_name,
solve_partition='auto',
solve_nodes=1,
solve_processes_per_node='auto',
solve_threads_per_process='auto',
solve_gpus_per_node=None,
script_partition=None,
script_threads='auto',
block=True)
: fsp_file: string
fsp file name, possibly including path.
: sweep_name: string
name of the sweep in the fsp file that you will run (solve). For nested sweeps, include the full hierarchy with :: separators, e.g. outersweep::innersweep
: solve_partition: string (optional)
name of slurm partition to run solve jobs on. Multiple partitions allowed as comma separated list. Default value of 'auto' will result in the partition being selected automatically. See suggest_partitions()
: solve_nodes: int (optional)
number of nodes for distributed solve. Default value of 1 runs solve on a single node
: solve_processes_per_node: int (optional)
number of processes to run on each node. Default value 'auto' will automatically determine the number of processes based on available CPU and number of threads_per_process (if specified)
: solve_threads_per_process: int (optional)
number of threads to use for each process. Default value of 'auto' will automatically determine the number of threads based on available CPU and the number of processes_per_node (if specified)
: script_partition: string (optional)
name of slurm partition to run script jobs on (result computations and data collection). Multiple partitions allowed as comma separated list. Default value of None will result in Slurm's default partition being selected
: script_threads: int (optional)
number of threads to use for each script process. Default value of 'auto' will automatically determine the number of threads based on available CPU
: block: boolean (optional)
Wait until the jobs complete before returning from this function. Default is True. A value of False will queue the jobs and return.
Returns a string with slurm job ID for the solve job.
Example
The following example is based on the Micro-LED example.
import lumslurm
lumslurm.run_sweep('micro_LED_cylindrical.fsp',
'position::polarization',
solve_partition='c6i-8xlarge',
script_partition='c6i-32xlarge')
Here we run a parameter sweep over the source position and orientation. Each individual simulation is submitted as a separate solve job on the c6i-8xlarge partition, while the analysis jobs are submitted to the c6i-32xlarge partition. Each analysis job is queued and only starts once the corresponding solve job has completed.