Activity 10: Parallelization and multi-processing in MsSpec
As you can see from the previous examples, a complete simulation may require several multiple scattering calculations, for instance to calculate the total intensity of a substrate or to optimize the geometry of a system. As the calculations are often time consuming, it can be useful to distribute these tasks over several processors to make the most of hardware resources. Although MsSpec is not fully parallelized, the code does offer a number of features, which we will explore here.
Matrix inversion parallelization
When available during installation, MsSpec will link with the system LAPACK library. It will be used to invert the matrix in the inversion option of the MsSpec calculator. To allow MsSpec to use this shared memory parallelism, you need to set the number of cores to be used in the OMP_NUM_THREADS environment variable.
You can set this variable just for the execution of your script. For example:
$ OMP_NUM_THREADS=12 python my_script.py
will use 12 cores for inverting the matrix in your script.
It is also possible to set the environment variable inside your Python script.
import os
os.environ['OMP_NUM_THREADS'] = '12'  # environment values must be strings
This may be useful for technical reasons, or to use a different number of cores in different parts of your script.
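As a minimal sketch of the in-script approach, the snippet below wraps the assignment in a small helper. The name `set_omp_threads` is hypothetical and not part of MsSpec. Whether a change made after the OpenMP runtime has started is honoured depends on the runtime, so the safest assumption is to call it before importing `msspec`.

```python
import os


def set_omp_threads(n):
    """Hypothetical helper: store the thread count in OMP_NUM_THREADS."""
    n = int(n)
    if n < 1:
        raise ValueError("the number of threads must be at least 1")
    # os.environ only accepts strings, so the integer is converted here.
    os.environ["OMP_NUM_THREADS"] = str(n)


# Never ask for more threads than the OS reports (fall back to 1 if unknown).
available = os.cpu_count() or 1
set_omp_threads(min(4, available))
print(os.environ["OMP_NUM_THREADS"])
```

Capping the request at `os.cpu_count()` is a design choice of this sketch: oversubscribing cores rarely speeds up a matrix inversion.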
Process-based parallelism
Another kind of parallelization used in MsSpec is multiprocessing. Quite often, you need to run different independent calculations. MsSpec provides a simple looper that can be useful for multiprocessing. Let’s demonstrate it with the previous example CO/Fe(001).
This script is the multiprocessed version of the previous one. You can see that the previous nested for loops are now replaced by some declarative content (lines 63-67) and the definition of a process function (whose name can be changed).
With the msspec.looper package, the user defines Sweep objects that are parameters of the calculation or of the cluster. The process function must accept as many arguments as there are parameters to sweep (plus **kwargs).
A Looper object is created (line 76) and the process function is assigned to its pipeline attribute (line 77). When MsSpec runs the looper, it combines all parameter values into unique individual sets and distributes the calculations over the number of processors specified in the ncpu option.
63  # 1) Multiprocess calculations
64  theta = Sweep(key='theta', comments="The molecule tilt angle",
65                start=50, stop=60, step=1, unit='degree')
66  phi = Sweep(key='phi', comments="The molecule azimuthal angle",
67              values=[0,45], unit='degree')
68
69  def process(theta, phi, **kwargs):
70      cluster = create_cluster(theta=theta, phi=phi, height=0.6, bond_length=1.157)
71      i = kwargs.get('sweep_index')
72      data = compute_polar_scan(cluster, folder=f'calc_{i:d}')
73      dset = data[-1]
74      return dset.theta, dset.cross_section
75
76  looper = Looper()
77  looper.pipeline = process
78  df = looper.run(theta, phi, ncpu=4)
79
80  # Black magic to convert the pandas dataframe object 'df' to the
81  # parameters dict and the results list (will be easier in a future
82  # version ;-) ).
83  parameters = df.to_dict('list')
84  results = np.reshape(parameters.pop('output'), (df.shape[0]*2,-1))
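To make the idea of combining sweep values into individual sets more concrete, here is a small standalone sketch using only the Python standard library. It is not how msspec.looper is implemented. The sweep values, `build_parameter_sets` and the dummy `process` function are assumptions made for illustration only.

```python
from itertools import product
from multiprocessing import Pool

# Hypothetical sweep values, chosen only for illustration.
sweeps = {
    "theta": [50, 55, 60],  # degrees
    "phi": [0, 45],         # degrees
}


def build_parameter_sets(sweeps):
    """Return one dict per unique combination of the swept values."""
    keys = list(sweeps)
    return [dict(zip(keys, combo)) for combo in product(*sweeps.values())]


def process(params):
    """Stand-in for an expensive calculation; returns a cheap dummy value."""
    return params["theta"] + params["phi"]


parameter_sets = build_parameter_sets(sweeps)

if __name__ == "__main__":
    # Each parameter set is an independent task, so a pool of worker
    # processes can work through them in parallel.
    ncpu = 2
    with Pool(processes=ncpu) as pool:
        outputs = pool.map(process, parameter_sets)
    for params, output in zip(parameter_sets, outputs):
        print(params, output)
```

With three tilt angles and two azimuthal angles, this sketch builds six independent tasks. Each additional sweep multiplies that count by its number of values.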
In the paper discussed in Activity 9: Comparing simulation and experiment with R-factors, experimental values of the anisotropy suggest an adsorption height between 0.2 and 0.6 Å. Modify the script to add another sweep for varying the adsorption height of the CO molecule.
# 1) Multiprocess calculations
theta = Sweep(key='theta', comments="The molecule tilt angle",
              start=50, stop=60, step=1, unit='degree')
phi = Sweep(key='phi', comments="The molecule azimuthal angle",
            values=[0,45], unit='degree')
height = Sweep(key='height', comments="The molecule adsorption height",
               start=0.2, stop=0.6, num=3, unit='angström')

def process(theta, phi, height, **kwargs):
    cluster = create_cluster(theta=theta, phi=phi, height=height, bond_length=1.157)
    i = kwargs.get('sweep_index')
    data = compute_polar_scan(cluster, folder=f'calc_{i:d}')
    dset = data[-1]
    return dset.theta, dset.cross_section

looper = Looper()
looper.pipeline = process
df = looper.run(theta, phi, height, ncpu=4)

# Black magic to convert the pandas dataframe object 'df' to the
# parameters dict and the results list (will be easier in a future
# version ;-) ).
parameters = df.to_dict('list')
results = np.reshape(parameters.pop('output'), (df.shape[0]*2,-1))
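The final reshape in the script can look opaque, so here is a small NumPy-only sketch of what it does. It assumes each entry of the 'output' column is a pair of equal-length arrays, as returned by the process function. The numbers are made up and the pandas step is left out.

```python
import numpy as np

# Made-up stand-ins for the (dset.theta, dset.cross_section) pairs that
# the process function returns, one pair per calculation.
angles = np.array([0.0, 10.0, 20.0])
outputs = [
    (angles, np.array([1.0, 2.0, 3.0])),
    (angles, np.array([4.0, 5.0, 6.0])),
]

# Same call as in the script: two rows per calculation, and as many
# columns as there are points in a scan.
results = np.reshape(outputs, (len(outputs) * 2, -1))

print(results.shape)  # (4, 3)
print(results[0::2])  # even rows: the angle arrays
print(results[1::2])  # odd rows: the cross-section arrays
```

The reshaped array therefore interleaves the two returned quantities: row 2k holds the angles of calculation k and row 2k+1 holds its cross section.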