A **parallel program** is designed to run on several processors at the same time. Its appeal is speed: it can be much faster than a non-parallel equivalent (up to 1000 times).

R++ is interested in three sources of parallelism that are available on current office computers: multi-core processors, graphics boards and the cloud.

- The **multi-core** solution uses all the CPUs of a computer simultaneously.
- **Graphics boards** (GPUs) are massively parallel electronic circuits with thousands of processors. They have very little memory, which would be a handicap in general but does not matter in the specific case of mathematical calculations such as solving linear systems.
- Finally, the **cloud** is a network of connected computers.

The languages and architectures we tested are: R with the MICE package, C mono-core, C via the graphics board (CUDA) and C multi-core (6, 8, 10 and 12 cores).

The processors we used are:

- CPU: Intel Xeon E5645 @ 2.4 GHz
- GPU: Tesla C2050, 3 GB, 1.15 GHz
- Multi-core: 12-core Intel Xeon E5645 @ 2.4 GHz

For the test, matrices of different sizes were studied: the number of variables ranges from 100 to 1,000 and the number of observations for each of them ranges from 1,000 to 1,000,000.

For each matrix size we generated 10% missing values and set the number of imputations to 5.
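As an illustration of this kind of test setup, here is a minimal Python sketch that builds a matrix and blanks out about 10% of its cells. The function name `make_test_matrix`, the Gaussian values and the fact that each cell is dropped independently at random are assumptions made for the example; the text does not say how the actual test matrices were generated.

```python
import random

def make_test_matrix(n_obs, n_vars, missing_rate=0.10, seed=0):
    """Return an n_obs x n_vars list of lists; about `missing_rate` of the cells are None."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_obs):
        row = []
        for _ in range(n_vars):
            # Assumption: each cell is missing independently with probability `missing_rate`.
            if rng.random() < missing_rate:
                row.append(None)
            else:
                # Assumption: observed values are standard normal draws.
                row.append(rng.gauss(0.0, 1.0))
        matrix.append(row)
    return matrix

# Smallest size mentioned in the text: 1,000 observations of 100 variables.
data = make_test_matrix(n_obs=1000, n_vars=100)
missing = sum(cell is None for row in data for cell in row)
print(f"{missing / (1000 * 100):.1%} of cells are missing")
```

The seed makes the matrix reproducible, which matters when the same input has to be fed to several implementations for a timing comparison.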

As part of his final year internship, Chai Anchen compared the efficiency of the bootstrap when using the different languages and architectures.

Multiple imputation is a method that makes it possible to carry out statistical analyses of incomplete data sets without underestimating the variance. The principle is:

- Initialise all the missing values. When a value is missing it is replaced with one of the possible values, which leads to a full set of data.
- “Predict” the missing values of the first variable from all the other variables, using a linear regression. The missing values of the first variable are then replaced with the predicted values.
- Repeat for each of the other variables, each time replacing the missing values with the predicted values.

Then the statistical analysis can be conducted with the completed data set.
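The three steps of the principle can be sketched in plain Python as follows. This is a simplified illustration and not the code that was benchmarked: the helper names (`solve`, `fit_linear`, `without`, `impute`) and the `n_sweeps` parameter are hypothetical, the regression is fitted with the normal equations, and a single completed data set is produced. The MICE package does more than this (for instance, it produces several imputed data sets). The sketch also assumes that every variable has at least one observed value and that the predictors are not perfectly collinear.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + x for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def without(row, j):
    """Return the row with column j removed (the predictors for variable j)."""
    return row[:j] + row[j + 1:]

def impute(data, n_sweeps=1, seed=0):
    """Return a completed copy of `data`, where None marks a missing value."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]
    # Step 1: initialise each missing value with one of the observed values.
    for j in range(n_vars):
        observed = [row[j] for row in data if row[j] is not None]
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = rng.choice(observed)
    # Steps 2 and 3: regress each variable on all the others, in turn,
    # and overwrite its missing values with the predictions.
    for _ in range(n_sweeps):  # assumption: one pass over the variables by default
        for j in range(n_vars):
            if not any(missing[i][j] for i in range(len(filled))):
                continue  # nothing to impute for this variable
            train = [r for i, r in enumerate(filled) if not missing[i][j]]
            beta = fit_linear([without(r, j) for r in train], [r[j] for r in train])
            for i, row in enumerate(filled):
                if missing[i][j]:
                    x = [1.0] + without(row, j)
                    row[j] = sum(b * v for b, v in zip(beta, x))
    return filled

# Toy data: y = 2x + 1, with y missing for x = 3 and x = 7.
data = [[float(x), None if x in (3, 7) else 2.0 * x + 1.0] for x in range(10)]
completed = impute(data)
print(completed[3], completed[7])
```

Because the toy relation is exactly linear, the two imputed values should come out close to 7 and 15. The inner regression (building and solving a linear system for each variable) is the part that dominates the cost on large matrices, which is why it is a natural candidate for the parallel architectures compared here.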

Unsurprisingly, C is faster than R.

The performance of the GPU and of multi-core is less clear-cut. They do better than the single CPU, but only above a certain volume of data. Below that threshold, splitting up the data and communicating with the graphics board or the cores slows down the entire operation.
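The threshold effect can be illustrated with a deliberately simple cost model, sketched below in Python. The model is an assumption made for the example and was not fitted to the measurements: the serial time grows linearly with the amount of data, while the parallel time pays a fixed overhead (splitting and communication) and then divides the work evenly among the workers. In practice the communication cost also grows with the data, so the real threshold will differ. All numeric values are made up.

```python
def serial_time(n_items, cost_per_item):
    """Time to process everything on a single processor."""
    return n_items * cost_per_item

def parallel_time(n_items, cost_per_item, n_workers, overhead):
    """Fixed overhead plus the work divided evenly among the workers (assumed model)."""
    return overhead + n_items * cost_per_item / n_workers

def break_even_items(cost_per_item, n_workers, overhead):
    """Data volume at which both times are equal.

    Solves overhead + n*c/p = n*c for n, which gives n = overhead * p / (c * (p - 1)).
    """
    return overhead * n_workers / (cost_per_item * (n_workers - 1))

# Hypothetical figures: 1 microsecond per item, 1 millisecond of overhead, 12 workers.
COST, OVERHEAD, WORKERS = 1e-6, 1e-3, 12
threshold = break_even_items(COST, WORKERS, OVERHEAD)
for n in (100, 1_000_000):
    faster = "parallel" if parallel_time(n, COST, WORKERS, OVERHEAD) < serial_time(n, COST) else "serial"
    print(n, "items:", faster, "is faster")
```

Under these made-up figures the serial version wins for 100 items and the parallel version wins for a million, with the crossover at `threshold` items.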

In the end:

- For small calculations C is better (although it makes little difference for a single calculation: waiting 1 or 10 milliseconds doesn’t matter).
- For big data sets the GPU is clearly more effective. It’s 1.24 times faster than the 12-core multi-core version, 2.4 times faster than the 4-core multi-core version, 3.7 times faster than C and 819 times faster than R.
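The ratios quoted for big data sets also imply how the slower configurations compare with one another. The short Python sketch below only does that arithmetic. The dictionary keys are labels chosen for the example, and "C" is assumed to mean the mono-core C version.

```python
# How many times faster the fastest configuration is than each alternative,
# exactly as quoted in the list above.
fastest_vs = {"12-core": 1.24, "4-core": 2.4, "C": 3.7, "R": 819.0}

def relative_speedup(slower, faster):
    """How many times faster `faster` is than `slower`, derived from the quoted ratios."""
    return fastest_vs[slower] / fastest_vs[faster]

print(round(relative_speedup("R", "C")))            # 221
print(round(relative_speedup("C", "12-core"), 2))   # 2.98
```

Read this way, mono-core C is already roughly 221 times faster than R on big data sets, and using 12 cores brings about a further factor of 3 over mono-core C.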

For further information

**Download Chai Anchen’s thesis.** It explains in detail the methods and results above.