Use OpenMP for Effective Parallelization
The OpenMP API is a powerful tool for parallel programming on multicore/SMT systems. It is supported by many compilers, and thus we can easily use it within our MEX files to speed up our computations even more.
For our cases, using OpenMP means executing the for loops that compute the predicted state samples/measurement samples/log-likelihood values in parallel by running multiple threads. Bascially, OpenMP works with #pragma preprocessor directives. Enabling a parallel for loop execution boils down to write a
#pragma omp parallel for
statement before a for loop. Thus, our previous examples simply have to be changed according to
// Compute a predicted state sample in each iteration #pragma omp parallel for for (int i = 0; i < numSamples; ++i) { const double px = stateSamples(0, i); const double py = stateSamples(1, i); const double dir = stateSamples(2, i); const double speed = stateSamples(3, i); const double turnRate = stateSamples(4, i); const double speedNoise = noiseSamples(0, i); const double turnRateNoise = noiseSamples(1, i); const double predSpeed = speed + speedNoise; const double predTurnRate = turnRate + turnRateNoise; const double predDir = dir + deltaT * predTurnRate; predictedStates(0, i) = px + std::cos(predDir) * deltaT * predSpeed; predictedStates(1, i) = py + std::sin(predDir) * deltaT * predSpeed; predictedStates(2, i) = predDir; predictedStates(3, i) = predSpeed; predictedStates(4, i) = predTurnRate; }
// Compute a measurement sample in each iteration #pragma omp parallel for for (int i = 0; i < numSamples; ++i) { const double px = stateSamples(0, i); const double py = stateSamples(1, i); measurements.col(i) = Eigen::Vector2d(std::sqrt(px * px + py * py), std::atan2(py, px)); }
// Compute a log-likelihood value in each iteration #pragma omp parallel for for (int i = 0; i < numSamples; ++i) { const double px = stateSamples(0, i); const double py = stateSamples(1, i); Eigen::Vector2d vec(std::sqrt(px * px + py * py), std::atan2(py, px)); vec -= measurement; const double logExpVal = -0.5 * vec.transpose() * invNoiseCov * vec; // Note that the computed likelihood values are only correct up to proportionality // as we omit the normalization part of the Gaussian PDF! logValues(i) = logExpVal; }
However, compiling C++ code with OpenMP requires to set certain compiler flags. Fortunately, the Nonlinear Estimation Toolbox provides a compileMexOpenMP() function in addition to compileMex(). It automatically enables OpenMP when compiling the C++ code. Thus, the modified C++ files now have to be compiled with
>> compileMexOpenMP('targetSystemEquation.cpp')
>> compileMexOpenMP('polarMeasurementEquation.cpp')
>> compileMexOpenMP('polarMeasLogLikelihood.cpp')
in order to recognize the added #pragma statements.
Multithreading does not necessarily speed up the code execution. If not enough samples are passed to the MEX files, the overhead of managing the threads, cache issues etc. can make the execution times even worse! So carefully evaluate your code to decide if multithreading is a benefit.
Of course, OpenMP offers much more than demonstrated here. For example, you can define critical sections or control the number of threads. For more information on that please refer to the OpenMP documentation.