Tuesday, July 30, 2013

How to link Intel MKL LAPACK

This article summarizes how to link existing C++ code with Intel MKL LAPACK without modifying a single line of the original source code, which was written against the standard LAPACK libraries and compiled with g++.

Intel MKL comprises a great many things, most of which I do not need for my purpose. In my case, I only want to know whether MKL performs better than GotoBLAS in my application, so I don't want to change any part of my code; I just want to replace GotoBLAS with MKL. This article is written for complete newcomers to Intel MKL and is not intended as a proper introduction to MKL.

Replacing LAPACK with MKL turns out to be very simple. Until now I linked my code with GotoBLAS using the flag -lgoto2, or with standard LAPACK using -llapack. For MKL, this website helps a lot:

http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

After I filled in my platform information (GNU C/C++ and libgomp), the following flags were suggested:

 -L$MKLROOT/lib/intel64 -lmkl_rt -ldl -lpthread -lm

I replaced -lgoto2 with the above flags and recompiled, and my code ran smoothly with MKL. In my application it was much faster than GotoBLAS in the multithreaded cases but slower when running single-threaded.
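The swap can be sketched as shell commands. This is only an illustration: the file names solver.cpp and solver are hypothetical placeholders, and the fallback value for MKLROOT is an assumed install location that is used only when the variable is not already set. The commands are echoed rather than executed so the two link lines can be compared first.

```shell
# Assumed install prefix, used only if MKLROOT is not already set.
MKLROOT="${MKLROOT:-/opt/intel/mkl}"

# Flags suggested by the link line advisor for GNU C/C++ with libgomp.
MKL_FLAGS="-L$MKLROOT/lib/intel64 -lmkl_rt -ldl -lpthread -lm"

# solver.cpp and solver are hypothetical names for the source and executable.
OLD_CMD="g++ solver.cpp -o solver -lgoto2"
NEW_CMD="g++ solver.cpp -o solver $MKL_FLAGS"

# Print both link lines; nothing is compiled here.
echo "before: $OLD_CMD"
echo "after:  $NEW_CMD"
```

Since -lmkl_rt refers to a shared library, the dynamic loader also has to find it at run time, for example by adding $MKLROOT/lib/intel64 to LD_LIBRARY_PATH.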

The following page summarizes how to control the number of threads that MKL uses for the LAPACK part:

http://software.intel.com/sites/products/documentation/hpc/mkl/lin/MKL_UG_managing_performance/Using_the_Intel_MKL_Parallelism.htm

In short, MKL uses the number of physical cores as the default number of threads. Hyperthreading is not taken into account, because very little extra performance can be squeezed out of it when the code is as highly optimized and efficient as MKL. If you want to control the number of threads, several environment variables can be used; they are described clearly here:

http://software.intel.com/sites/products/documentation/hpc/mkl/lin/MKL_UG_managing_performance/Using_Additional_Threading_Control.htm

These environment variables are separate from their OpenMP counterparts such as OMP_NUM_THREADS, and for MKL calls they take precedence over them. In my case, for example, when I want to use only two threads to solve my linear system Ax=b, I enter the following command in bash:

export MKL_NUM_THREADS=2

before I run my executable. The run time changed after I set the variable, which confirms that the setting took effect.
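The variable can also be set for a single run instead of the whole shell session, which makes it easier to compare thread counts. A small sketch, where run_with_threads is a hypothetical helper name (not part of MKL) and ./solver stands in for the executable:

```shell
# Run a command with MKL_NUM_THREADS set only for that command.
# run_with_threads is a hypothetical helper, not part of MKL.
run_with_threads() {
    n="$1"      # requested number of MKL threads
    shift       # the remaining arguments are the command to run
    MKL_NUM_THREADS="$n" "$@"
}

# Demonstration with a stand-in command that prints the value it sees.
run_with_threads 2 sh -c 'echo "MKL_NUM_THREADS=$MKL_NUM_THREADS"'

# With a real executable (./solver is a placeholder), runs could be timed in bash:
#   time run_with_threads 1 ./solver
#   time run_with_threads 2 ./solver
```

Setting the variable per command keeps the surrounding shell unchanged, so later runs are not affected by an earlier export.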