Last modified: 25 September 2012
As the term implies, variable-length strings are strings of varying lengths; they can be arbitrarily long, anywhere from 1 character to thousands of characters.
HDF5 provides the ability to create a variable-length string datatype.
Like all string datatypes, this type is based on the atomic string datatype: H5T_C_S1 in C or H5T_FORTRAN_S1 in Fortran.
While these datatypes default to one character in size,
they can be resized to specific fixed lengths
or to variable length.
Variable-length strings will transparently accommodate ASCII strings or UTF-8 strings. The character set (H5T_CSET_ASCII, the default, or H5T_CSET_UTF8) is selected with H5Tset_cset in the process of creating the datatype.
The following HDF5 calls create a C-style variable-length string datatype, vls_type_c_id:

    vls_type_c_id = H5Tcopy(H5T_C_S1);
    status = H5Tset_size(vls_type_c_id, H5T_VARIABLE);

In a C environment, variable-length strings will always be NULL-terminated, so the buffer to hold such a string must be one byte larger than the string itself to accommodate the NULL terminator.
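The buffer-size rule can be illustrated in plain C, without any HDF5 calls. The sketch below is only an illustration: the helper names vl_buffer_size and vl_string_dup are hypothetical and are not part of the HDF5 Library. Note that strlen counts bytes, not characters, so a UTF-8 string containing multi-byte characters needs a buffer sized by its byte count plus one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 function): number of bytes a buffer
 * needs in order to hold s as a C string, i.e. the bytes of the string
 * plus one for the NULL terminator. */
size_t vl_buffer_size(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hypothetical helper (not an HDF5 function): allocate a correctly sized
 * buffer and copy s into it. Returns NULL if allocation fails; the caller
 * is responsible for calling free() on the result. */
char *vl_string_dup(const char *s)
{
    size_t n = vl_buffer_size(s);
    char *buf = malloc(n);

    if (buf != NULL)
        memcpy(buf, s, n);   /* n includes the terminator, so it is copied too */
    return buf;
}
```

For example, vl_buffer_size("HDF5") is 5: four bytes for the characters and one for the terminator.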
In Fortran, strings are normally of fixed length.
Variable-length strings come into play only when data
is shared with a C application that uses them.
For such situations, the datatype class H5T_STRING is predefined by the HDF5 Library to accommodate variable-length strings.
The first HDF5 call below creates a Fortran string, vls_type_f_id, that will handle variable-length string data. The second call sets the string padding value to space padding:

    CALL h5tcopy_f(H5T_STRING, vls_type_f_id, hdferr)
    CALL h5tset_strpad_f(vls_type_f_id, H5T_STR_SPACEPAD_F, hdferr)

While Fortran-style strings are generally space-padded, they may be NULL-terminated in cases where the data is also used in a C environment.
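To make the difference between the two conventions concrete, the following plain-C sketch converts a space-padded field into a NULL-terminated string. It is an assumption-laden illustration rather than anything the HDF5 Library provides: the function name spacepad_to_cstring is hypothetical, and the sketch treats every trailing space as padding, so a string that genuinely ends in spaces would lose them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 function): convert a space-padded
 * field of exactly field_len bytes, with no terminator, into a
 * NULL-terminated C string in dst.
 * dst must have room for field_len + 1 bytes.
 * Returns the length of the result, excluding the terminator. */
size_t spacepad_to_cstring(const char *field, size_t field_len, char *dst)
{
    size_t len = field_len;

    assert(field != NULL && dst != NULL);

    /* Walk back over trailing spaces, which are assumed to be padding. */
    while (len > 0 && field[len - 1] == ' ')
        len--;

    memcpy(dst, field, len);
    dst[len] = '\0';
    return len;
}
```

Given the 8-byte field "HDF5" followed by four spaces, the function writes "HDF5" into dst and returns 4. Interior spaces, as in "a b", are preserved.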
Note: Under the covers, variable-length strings are stored in a heap, potentially impacting efficiency in the following ways: