HDF5 is an open binary file format for storing and managing large, complex datasets. The format was developed by the HDF Group and is widely used in scientific computing. Lumerical's optical and electrical solvers have built-in script commands that can read and import data from HDF5 files. Many free tools are also available online for browsing and manipulating HDF5 files. The HDF Group provides a cross-platform, Java-based file explorer, HDFView. This utility is useful for visually exploring the file structure, but it is not suitable for extracting the data. In this article, we describe how Lumerical's script commands can be used in conjunction with HDFView to import data from HDF5 files.
Exploring HDF5 files in HDFView
Download the HDF5 file processdata.h5. Open HDFView, use its file browser to navigate to the folder where you downloaded processdata.h5, and open the file. The following view of the file will appear:
Note the tree structure of the file in the file browser panel on the left-hand side. HDF5 files are organized as a hierarchy of groups, each of which can contain sub-groups, datasets, and attributes.
Groups
Groups are containers that enable a hierarchical organization of the file. Each group can contain sub-groups, attributes, and datasets. Groups appear as folder icons in HDFView. In this example, the HDF5 file contains a group named "vertex". Double-click the group "vertex" to expand it.
Attributes
Attributes are additional pieces of information that describe the nature of the data stored in a group or dataset. They are stored as label-value pairs. Select the group "vertex". The information panel at the bottom of the viewer lists the group size and attributes. In this group (vertex), the attributes are labeled MATLAB_class and MATLAB_fields. The MATLAB_class attribute shows that the data is in MATLAB 'struct' format, and the MATLAB_fields attribute shows that the fields within the 'struct' data are x, y, and z.
Datasets
Datasets store the raw data in HDF5 files. They appear as matrix icons in the HDFView tree. Select the dataset 'doping'. Information about the dataset is shown in the information panel. HDF5 supports complex organization of binary data, which is beyond the scope of this document. For more information, please consult the HDF Group website.
In this simple example, the data (floating point numbers) is stored in arrays. In the case of the scalar quantity 'doping', the array has dimension 1x18905 (1 row, 18905 columns). In the case of 'elements', the data is stored as a 4x87007 array (4 rows, 87007 columns).
Double-clicking a dataset brings up the TableView of the dataset. Below is a screenshot showing partial data from the dataset 'elements'.
Accessing data in the HDF5 files using script commands
Lumerical's script environment offers three commands to read data from HDF5 files: h5info, h5read, and h5readattr. In this part of the example, we will see how the data in the processdata.h5 file can be read inside Lumerical's script environment. To get started, download the script file readh5file.lsf to the same folder as the HDF5 file. In CHARGE (or FDTD or MODE), run the script file; the data from the HDF5 file will be read and loaded into the script workspace. The different components of the script file are discussed below.
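Because the commands in this article refer to processdata.h5 by a relative file name, it can help to confirm that the file is visible from the current working directory before reading it. The lines below are a minimal sketch of such a check, not an excerpt from readh5file.lsf; they assume the standard script commands fileexists and pwd, and the message text is only illustrative.
filename = "processdata.h5";
# read the file structure only if the file can be found
if (fileexists(filename)) {
    data = h5info(filename);
    ?data;
} else {
    # report the folder in which the script looked for the file
    ?"Cannot find " + filename + " in " + pwd;
}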
Retrieving the file structure
The h5info command reads the structure of the HDF5 file into a 'struct' that can be accessed in the script workspace. In the script prompt window, execute the h5info command on the test file provided (prefixing a statement with "?" displays its result).
filename = "processdata.h5";
data = h5info(filename);
?data;
> Struct with fields:
> Attributes
> Datasets
> Datatypes
> FileName
> Groups
> Name
The 'data' struct contains information about the HDF5 tree structure. More details about the data can be found by inspecting each of the elements of the struct.
?data.Groups;
> Cell array with 1 elements
?data.Datasets;
> Cell array with 2 elements
From HDFView, we know that the HDF5 file has 1 group and 2 datasets at the root of the hierarchy tree. The script commands above confirm this: 'data' has 1 group and 2 datasets. Further information about each group and dataset can be found by inspecting the corresponding cell arrays. For example, the script below returns the name of the group (vertex), the names of its two attributes (MATLAB_class and MATLAB_fields), and the names of its three datasets (x, y, and z). Similar commands can be used to get information about the datasets 'doping' and 'elements'.
?data.Groups{1}.Name;
> /vertex
?data.Groups{1}.Attributes;
> Cell array with 2 elements
?data.Groups{1}.Attributes{1}.Name;
> MATLAB_class
?data.Groups{1}.Attributes{2}.Name;
> MATLAB_fields
?data.Groups{1}.Datasets;
> Cell array with 3 elements
?data.Groups{1}.Datasets{1}.Name;
> /vertex/x
?data.Groups{1}.Datasets{2}.Name;
> /vertex/y
?data.Groups{1}.Datasets{3}.Name;
> /vertex/z
Note that the structure information retrieved from the file using the h5info command only contains information about the groups (including attributes) and datasets, but not the data itself.
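The same inspection can be automated with loops instead of indexing each cell by hand. The lines below are a minimal sketch, assuming that length returns the number of elements in the cell arrays produced by h5info and that each cell array holds at least one element, as is the case for processdata.h5. The variable names info, grp, i, g, and d are chosen only for illustration.
info = h5info("processdata.h5");
# list the datasets stored at the root of the file
for (i = 1:length(info.Datasets)) {
    ?info.Datasets{i}.Name;
}
# list each group, followed by the datasets it contains
for (g = 1:length(info.Groups)) {
    grp = info.Groups{g};
    ?grp.Name;
    for (d = 1:length(grp.Datasets)) {
        ?grp.Datasets{d}.Name;
    }
}
Each printed name is the path of that group or dataset within the file.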
Accessing data
To retrieve the actual data from the HDF5 file, use the h5read command. Each group and dataset in an HDF5 file is located by its path, which is similar to a path in your computer's file system (e.g. C:\Users\ or /home/). An attribute is located by the path of the group or dataset it belongs to, together with its own name. When exploring the file structure (see the preceding section), note that the 'Name' of each group and dataset gives the path to that object.
For example, the dataset 'x' has the path (Name) '/vertex/x'. To read the data in 'x', execute the following command in the script prompt. The script below reads x, shows its dimensions (18905x1), and prints the value of the 1000th element.
x = h5read("processdata.h5","/vertex/x");
?size(x);
> result:
> 18905 1
?x(1000);
> result:
> 1.35405e-006
The same result can be obtained with the following command (assuming that the variables filename and data have already been created by the code in the preceding section).
x = h5read(filename,data.Groups{1}.Datasets{1}.Name);
In the downloaded script file, the data from the other datasets are read using similar commands.
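As an illustration of what such commands might look like (a sketch based on the paths listed by h5info above, not the actual contents of readh5file.lsf), the three coordinate arrays of the 'vertex' group can be read and checked as follows. The checks with length, min, and max are additions made here for illustration.
filename = "processdata.h5";
# paths as reported by h5info for the datasets in the 'vertex' group
x = h5read(filename, "/vertex/x");
y = h5read(filename, "/vertex/y");
z = h5read(filename, "/vertex/z");
# each coordinate array should hold one value per vertex
?length(x);
?length(y);
?length(z);
# extent of the geometry along x, in the units stored in the file
?min(x);
?max(x);
If the three lengths differ, the arrays do not describe the same set of vertices and the file should be inspected again in HDFView.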
Reading attributes
The h5read command cannot read the attributes of a group or dataset. To read attributes, use the script command h5readattr. Because a single group or dataset can have multiple attributes that all share the same path, the command takes three arguments: the name of the file, the path of the group or dataset to which the attribute belongs, and the name of the attribute itself.
For example, recall that the group 'vertex' has the path (Name) '/vertex' and an attribute named 'MATLAB_class'. In the script prompt, execute the following command. The result shows that the data in the group 'vertex' is in 'struct' format.
?h5readattr("processdata.h5","/vertex","MATLAB_class");
> struct
The same result can be obtained with the following command (assuming that the variables filename and data have already been created by the code in the section on retrieving the file structure).
?h5readattr(filename,data.Groups{1}.Name,data.Groups{1}.Attributes{1}.Name);
> struct
In the downloaded script, the units for the 'doping' dataset and the 'x' dataset are read using similar commands.
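To read every attribute of a group without typing each name, the attribute names reported by h5info can be passed to h5readattr in a loop. This is a minimal sketch rather than the downloaded script; it assumes that length works on the Attributes cell array and that "?" can display whatever type each attribute value has. The names grp, a, and attr_name are illustrative.
filename = "processdata.h5";
data = h5info(filename);
grp = data.Groups{1};
# print each attribute name of the group, followed by its value
for (a = 1:length(grp.Attributes)) {
    attr_name = grp.Attributes{a}.Name;
    ?attr_name;
    ?h5readattr(filename, grp.Name, attr_name);
}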
Using imported data
Once the data is imported from the HDF5 file into the script workspace, there are numerous ways to use it in a simulation or calculation, depending on the nature of the data. HDF5 files are often used to import data in finite element format, as in the example above. Such data usually contains geometric information that can be used to create structures and doping profiles. Once the data is in the script workspace, the 'Dataset builder' wizard can be used to create an unstructured dataset, build structures, and create doping profiles.