Activity 10: Parallelization and multi-processing in MsSpec

As the previous examples show, a complete simulation may require several multiple scattering calculations, for instance to calculate the total intensity of a substrate or to optimize the geometry of a system. Because these calculations are often time-consuming, it can be useful to distribute them over several processors to make the most of the available hardware. Although MsSpec is not fully parallelized, the code does offer a number of features for this, which we will explore here.

Matrix inversion parallelization

When available during installation, MsSpec links against the system LAPACK library, which is then used to invert the matrix when the inversion option of the MsSpec calculator is selected. To let MsSpec use this shared-memory parallelism, set the OMP_NUM_THREADS environment variable to the number of cores to use.

You can set this variable just for the execution of your script. For example:

$ OMP_NUM_THREADS=12 python my_script.py

will use 12 cores for inverting the matrix in your script.

It is also possible to set the environment variable inside your Python script:

import os

os.environ['OMP_NUM_THREADS'] = '12'  # environment values must be strings

This may be useful for technical reasons, or to use a different number of cores in different parts of your script.
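If the thread count only needs to change for part of a script, one option is a small context manager that sets OMP_NUM_THREADS and restores the previous value afterwards. The sketch below uses only the standard library. The name omp_threads is hypothetical and not part of MsSpec, and whether a change made mid-script is honored depends on when the underlying library reads the variable.

```python
import os
from contextlib import contextmanager


@contextmanager
def omp_threads(n):
    """Temporarily set OMP_NUM_THREADS (hypothetical helper, not an MsSpec API)."""
    previous = os.environ.get("OMP_NUM_THREADS")
    # os.environ only accepts strings, so convert the integer explicitly.
    os.environ["OMP_NUM_THREADS"] = str(n)
    try:
        yield
    finally:
        # Restore the previous state, including the "unset" case.
        if previous is None:
            os.environ.pop("OMP_NUM_THREADS", None)
        else:
            os.environ["OMP_NUM_THREADS"] = previous


with omp_threads(4):
    # The inversion-heavy part of the script would go here.
    print(os.environ["OMP_NUM_THREADS"])
```

Outside the with block, the variable returns to whatever value it had before (or is unset again).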

Process-based parallelism

Another kind of parallelization used in MsSpec is multiprocessing. Quite often, you need to run several independent calculations. MsSpec provides a simple looper that can be useful for multiprocessing. Let’s demonstrate it with the previous CO/Fe(001) example.

This script is the multiprocessed version of the previous one. The nested for loops are now replaced by some declarative content (the Sweep definitions for theta and phi) and by the definition of a process function (whose name can be changed).

With the msspec.looper package, the user defines Sweep objects, which are parameters of the calculation or of the cluster. The process function must accept as many arguments as there are parameters to sweep (plus **kwargs).

A Looper object is created and the process function is assigned to its pipeline attribute. When the looper runs, it combines all parameter values into unique individual sets, and MsSpec distributes the calculations over the number of processors specified by the ncpu option.
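To make the idea of "unique individual sets" concrete, here is a minimal standard-library sketch of how swept values can be combined with a Cartesian product. It is not the MsSpec implementation: the combine function, the sweep values, and the zero-based numbering and ordering of sweep_index are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical sweep values, chosen only for illustration.
sweeps = {
    "theta": [50, 55, 60],  # degrees
    "phi": [0, 45],         # degrees
}


def combine(sweeps):
    """Return one dict per unique combination of the swept values."""
    keys = list(sweeps)
    return [
        # Each set carries its own index, similar in spirit to the
        # 'sweep_index' entry read from **kwargs in the process function.
        dict(zip(keys, values), sweep_index=i)
        for i, values in enumerate(product(*(sweeps[k] for k in keys)))
    ]


parameter_sets = combine(sweeps)
for params in parameter_sets:
    print(params)
```

With three theta values and two phi values this yields six parameter sets. Each set corresponds to one independent call of the process function, which is what makes the work easy to spread over several processes.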

# 1) Multiprocess calculations 
theta = Sweep(key='theta', comments="The molecule tilt angle",
              start=50, stop=60, step=1, unit='degree')
phi = Sweep(key='phi', comments="The molecule azimuthal angle",
            values=[0,45], unit='degree')

def process(theta, phi, **kwargs):
    cluster = create_cluster(theta=theta, phi=phi, height=0.6, bond_length=1.157)
    i = kwargs.get('sweep_index')
    data = compute_polar_scan(cluster, folder=f'calc_{i:d}')
    dset = data[-1]
    return dset.theta, dset.cross_section

looper = Looper()
looper.pipeline = process
df = looper.run(theta, phi, ncpu=4)

# Black magic to convert the pandas dataframe object 'df' to the 
# parameters dict and the results list (will be easier in a future
# version ;-) ).
parameters = df.to_dict('list')
results = np.reshape(parameters.pop('output'), (df.shape[0]*2,-1))
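The reshape step can be illustrated on its own. The sketch below assumes that each entry of the 'output' column holds the (theta, cross_section) pair returned by process, with arrays of equal length. The data are made up, and a plain list stands in for the column.

```python
import numpy as np

# Stand-in for the 'output' column: one (theta, cross_section) pair per row.
# The values are invented for illustration only.
angles = np.array([0.0, 10.0, 20.0])
output = [
    (angles, np.array([1.0, 2.0, 3.0])),  # first parameter set
    (angles, np.array([4.0, 5.0, 6.0])),  # second parameter set
]

n_rows = len(output)
# Flatten the pairs so that rows alternate: theta, cross_section, theta, ...
results = np.reshape(output, (n_rows * 2, -1))

print(results.shape)
```

Under this assumption, even rows of results contain the angles and odd rows contain the matching cross sections, one pair per parameter set.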

Quiz

In the paper discussed in Activity 9: Comparing simulation and experiment with R-factors, experimental values of the anisotropy suggest an adsorption height between 0.2 and 0.6 Å. Modify the script to add another sweep that varies the adsorption height of the CO molecule.

# 1) Multiprocess calculations 
theta = Sweep(key='theta', comments="The molecule tilt angle",
              start=50, stop=60, step=1, unit='degree')
phi = Sweep(key='phi', comments="The molecule azimuthal angle",
            values=[0,45], unit='degree')
height = Sweep(key='height', comments="The molecule adsorption height",
               start=0.2, stop=0.6, num=3, unit='angström')

def process(theta, phi, height, **kwargs):
    cluster = create_cluster(theta=theta, phi=phi, height=height, bond_length=1.157)
    i = kwargs.get('sweep_index')
    data = compute_polar_scan(cluster, folder=f'calc_{i:d}')
    dset = data[-1]
    return dset.theta, dset.cross_section

looper = Looper()
looper.pipeline = process
df = looper.run(theta, phi, height, ncpu=4)

# Black magic to convert the pandas dataframe object 'df' to the 
# parameters dict and the results list (will be easier in a future
# version ;-) ).
parameters = df.to_dict('list')
results = np.reshape(parameters.pop('output'), (df.shape[0]*2,-1))
