Performance
From OpenCV on the Cell
This page compares the execution times of the original PPE code and the SPE-optimized code.
The conditions for measurement were as follows:
- Six SPEs are used.
- The image size is 1024x768.
- Each result is the average of 10 executions.
The image size may differ depending on the function.
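As an illustration of the averaging step, the sketch below times a block 10 times and reports the mean in milliseconds. It is not taken from the benchmark suite: `average_ms` and the pure-Ruby absolute-difference workload are hypothetical stand-ins for the real OpenCV calls.

```ruby
require 'benchmark'

# Run the given block `runs` times and return the mean wall-clock time in ms.
# (Hypothetical helper, not part of the cvcell benchmark programs.)
def average_ms(runs = 10)
  total = 0.0
  runs.times { total += Benchmark.realtime { yield } }
  total / runs * 1000.0
end

# Stand-in workload: absolute difference over a 1024x768 single-channel buffer.
a = Array.new(1024 * 768) { |i| i % 256 }
b = Array.new(1024 * 768) { |i| (i * 7) % 256 }

ms = average_ms(10) { a.zip(b).map { |x, y| (x - y).abs } }
puts format('%.3f ms (mean of 10 runs)', ms)
```

Averaging over several runs smooths out one-off effects such as cache warm-up, which matters when individual measurements are well under a millisecond.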
If you are interested in the benchmark programs, check them out and run them as follows. They require Ruby and eRuby:
$ svn co https://cvcell.svn.sourceforge.net/svnroot/cvcell/trunk/benchmark
$ cd image-benchmark
$ make bench-spe bench-ppe
Then run viewresult.rb:
$ ruby viewresult.rb
cvAbsDiff,16u : 11.437 3.606 3.2
cvAbsDiff,8u : 8.655 2.329 3.7
cvAbsDiff,32s : 9.625 7.348 1.3
cvAdd,16u : 8.255 0.625 13.2
cvAdd,8u : 10.431 0.484 21.6
cvAdd,32s : 8.831 0.822 10.7
cvAddWeighted,16u : 61.750 9.951 6.2
cvAddWeighted,8u : 15.854 5.098 3.1
cvAddWeighted,32s : 50.629 15.874 3.2
cvAnd,16u : 5.850 1.389 4.2
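The output columns are not labeled. In the sample above, the third number matches the first divided by the second, so a plausible reading is PPE time, SPE time, and speedup factor. The following sketch parses a few of the lines under that assumption and recomputes the factor; `parse_line` and the `Row` fields are hypothetical names, not part of viewresult.rb.

```ruby
# Lines copied from the sample output above.
SAMPLE = <<~OUT
  cvAbsDiff,16u : 11.437 3.606 3.2
  cvAdd,8u : 10.431 0.484 21.6
  cvAnd,16u : 5.850 1.389 4.2
OUT

# Assumed column meaning: first time, second time, listed factor.
Row = Struct.new(:function, :variant, :first_ms, :second_ms, :factor)

# Split "name,variant : a b c" into its parts.
def parse_line(line)
  name, values = line.split(':', 2)
  function, variant = name.strip.split(',', 2)
  first_ms, second_ms, factor = values.split.map(&:to_f)
  Row.new(function, variant, first_ms, second_ms, factor)
end

SAMPLE.each_line do |line|
  r = parse_line(line)
  recomputed = (r.first_ms / r.second_ms).round(1)
  puts format('%-10s %-4s listed=%.1f recomputed=%.1f',
              r.function, r.variant, r.factor, recomputed)
end
```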
The following table is generated by the 'gentable.rb' program, which is included in image-benchmark.
| Target function | Type / parameters | PPE original code [ms] | SPE optimized code [ms] | Ratio (0 % = fast, 100 % = slow) | Pure SPE (without PPE-SPE overhead) [ms] | Comment |
|---|---|---|---|---|---|---|
| cvAbsDiff | 16u | 6.104 | 0.953 | 15.6 % | 0.926 | |
| | 8u | 8.513 | 0.766 | 9.0 % | 0.733 | |
| | 32s | 9.578 | 1.521 | 15.9 % | 1.493 | |
| cvAbsDiffS | 16u | 7.910 | 1.006 | 12.7 % | 0.975 | |
| | 8u | 9.927 | 0.813 | 8.2 % | 0.747 | |
| | 32s | 6.446 | 1.648 | 25.6 % | 1.619 | |
| cvAdd | 16u | 7.921 | 0.466 | 5.9 % | 0.435 | |
| | 8u | 9.380 | 0.574 | 6.1 % | 0.546 | |
| | 32s | 8.568 | 0.709 | 8.3 % | 0.662 | |
| cvAddS | 16u | 6.630 | 0.447 | 6.7 % | 0.415 | |
| | 8u | 10.770 | 0.586 | 5.4 % | 0.560 | |
| | 32s | 5.074 | 0.703 | 13.9 % | 0.666 | |
| cvAddWeighted | 16u | 60.791 | 8.348 | 13.7 % | 0.506 | |
| | 8u | 13.269 | 4.837 | 36.5 % | 0.542 | |
| | 32s | 49.539 | 12.679 | 25.6 % | 0.925 | |
| cvAnd | 16u | 4.283 | 0.429 | 10.0 % | 0.403 | |
| | 8u | 2.110 | 0.272 | 12.9 % | 0.249 | |
| | 32s | 8.778 | 0.667 | 7.6 % | 0.641 | |
| cvAndS | 16u | 3.084 | 0.463 | 15.0 % | 0.422 | |
| | 8u | 1.517 | 0.258 | 17.0 % | 0.234 | |
| | 32s | 5.642 | 0.692 | 12.3 % | 0.657 | |
| cvCalcArrHist | 32x1 | 16.950 | 2.483 | 14.6 % | 2.403 | |
| | 32x32 | 16.974 | 2.548 | 15.0 % | 2.443 | |
| cvCmp | cmpop0 | 9.073 | 0.315 | 3.5 % | 0.291 | |
| | cmpop1 | 8.664 | 0.317 | 3.7 % | 0.286 | |
| cvCmpS | cmpop0 | 8.174 | 0.356 | 4.4 % | 0.286 | |
| | cmpop1 | 12.641 | 0.308 | 2.4 % | 0.283 | |
| cvConvertScale | 8u32s | 8.620 | 1.192 | 13.8 % | 1.156 | |
| | 16u8u | 26.575 | 1.041 | 3.9 % | 0.888 | |
| | 8u16u | 9.486 | 0.917 | 9.7 % | 0.865 | |
| | 32s8u | 31.901 | 1.391 | 4.4 % | 1.363 | |
| cvCvtColor | BGR2GRAY | 8.408 | 0.336 | 4.0 % | 0.261 | |
| | BGR2YCrCb | 43.793 | 4.682 | 10.7 % | 4.632 | |
| | BGR2HSV | 33.864 | 1.509 | 4.5 % | 1.458 | |
| | BGR2Lab | 64.543 | 7.854 | 12.2 % | 7.675 | |
| | GRAY2BGR | 3.919 | 1.770 | 45.2 % | 1.714 | |
| cvDilate | ksize=3,ch=3,shape=cross | 49.078 | 0.621 | 1.3 % | 0.541 | |
| | ksize=9,ch=3,shape=cross | 138.701 | 2.405 | 1.7 % | 2.254 | |
| | ksize=7,ch=3,shape=ellipse | 268.906 | 2.614 | 1.0 % | 2.530 | |
| | ksize=5,ch=1,shape=ellipse | 47.699 | 0.683 | 1.4 % | 0.483 | |
| | ksize=9,ch=3,shape=rect | 88.022 | 0.624 | 0.7 % | 0.549 | |
| | ksize=9,ch=1,shape=cross | 46.356 | 0.913 | 2.0 % | 0.799 | |
| | ksize=9,ch=3,shape=ellipse | 455.859 | 3.573 | 0.8 % | 3.412 | |
| | ksize=7,ch=3,shape=cross | 112.006 | 1.971 | 1.8 % | 1.891 | |
| | ksize=7,ch=1,shape=ellipse | 89.874 | 0.968 | 1.1 % | 0.888 | |
| | ksize=9,ch=1,shape=rect | 29.397 | 0.375 | 1.3 % | 0.300 | |
| | ksize=7,ch=3,shape=rect | 71.837 | 0.545 | 0.8 % | 0.484 | |
| | ksize=5,ch=1,shape=rect | 18.087 | 0.290 | 1.6 % | 0.216 | |
| | ksize=7,ch=1,shape=cross | 37.312 | 0.756 | 2.0 % | 0.677 | |
| | ksize=3,ch=3,shape=ellipse | 49.005 | 0.647 | 1.3 % | 0.539 | |
| | ksize=7,ch=1,shape=rect | 23.568 | 0.370 | 1.6 % | 0.258 | |
| | ksize=5,ch=1,shape=cross | 26.823 | 0.541 | 2.0 % | 0.425 | |
| | ksize=5,ch=3,shape=ellipse | 142.909 | 1.420 | 1.0 % | 1.316 | |
| | ksize=3,ch=1,shape=ellipse | 16.340 | 0.329 | 2.0 % | 0.226 | |
| | ksize=3,ch=3,shape=rect | 38.528 | 0.396 | 1.0 % | 0.316 | |
| | ksize=9,ch=1,shape=ellipse | 153.068 | 1.269 | 0.8 % | 1.187 | |
| | ksize=5,ch=3,shape=cross | 81.294 | 1.220 | 1.5 % | 1.142 | |
| | ksize=5,ch=3,shape=rect | 54.903 | 0.437 | 0.8 % | 0.357 | |
| | ksize=3,ch=1,shape=cross | 16.549 | 0.307 | 1.9 % | 0.228 | |
| | ksize=3,ch=1,shape=rect | 12.453 | 0.276 | 2.2 % | 0.193 | |
| cvDiv | 16u | 7.728 | 2.190 | 28.3 % | 2.125 | |
| | 8u | 6.996 | 2.203 | 31.5 % | 2.175 | |
| | 32s | 6.198 | 2.294 | 37.0 % | 2.266 | |
| cvDotProduct | 32f | 9.035 | 0.519 | 5.7 % | 0.493 | |
| | 64f | 15.523 | 1.783 | 11.5 % | 1.746 | |
| cvErode | ksize=3,ch=3,shape=cross | 49.686 | 0.666 | 1.3 % | 0.542 | |
| | ksize=9,ch=3,shape=cross | 139.481 | 2.336 | 1.7 % | 2.256 | |
| | ksize=7,ch=3,shape=ellipse | 272.361 | 2.621 | 1.0 % | 2.540 | |
| | ksize=5,ch=1,shape=ellipse | 47.741 | 0.562 | 1.2 % | 0.480 | |
| | ksize=9,ch=3,shape=rect | 90.582 | 0.670 | 0.7 % | 0.549 | |
| | ksize=9,ch=1,shape=cross | 45.695 | 0.881 | 1.9 % | 0.800 | |
| | ksize=9,ch=3,shape=ellipse | 455.103 | 3.513 | 0.8 % | 3.411 | |
| | ksize=7,ch=3,shape=cross | 111.957 | 1.975 | 1.8 % | 1.894 | |
| | ksize=7,ch=1,shape=ellipse | 89.806 | 0.973 | 1.1 % | 0.893 | |
| | ksize=9,ch=1,shape=rect | 29.859 | 0.380 | 1.3 % | 0.299 | |
| | ksize=7,ch=3,shape=rect | 73.287 | 0.554 | 0.8 % | 0.463 | |
| | ksize=5,ch=1,shape=rect | 18.782 | 0.289 | 1.5 % | 0.215 | |
| | ksize=7,ch=1,shape=cross | 37.309 | 0.753 | 2.0 % | 0.677 | |
| | ksize=3,ch=3,shape=ellipse | 49.041 | 0.620 | 1.3 % | 0.543 | |
| | ksize=7,ch=1,shape=rect | 24.235 | 0.337 | 1.4 % | 0.257 | |
| | ksize=5,ch=1,shape=cross | 26.703 | 0.504 | 1.9 % | 0.425 | |
| | ksize=5,ch=3,shape=ellipse | 143.704 | 1.390 | 1.0 % | 1.322 | |
| | ksize=3,ch=1,shape=ellipse | 16.324 | 0.308 | 1.9 % | 0.226 | |
| | ksize=3,ch=3,shape=rect | 38.318 | 0.395 | 1.0 % | 0.315 | |
| | ksize=9,ch=1,shape=ellipse | 150.434 | 1.266 | 0.8 % | 1.185 | |
| | ksize=5,ch=3,shape=cross | 81.829 | 1.219 | 1.5 % | 1.141 | |
| | ksize=5,ch=3,shape=rect | 60.526 | 0.441 | 0.7 % | 0.350 | |
| | ksize=3,ch=1,shape=cross | 16.549 | 0.307 | 1.9 % | 0.232 | |
| | ksize=3,ch=1,shape=rect | 14.341 | 0.276 | 1.9 % | 0.194 | |
| cvFilter2D | kernelsize=3 | 22.174 | 1.009 | 4.6 % | 0.949 | |
| | kernelsize=5 | 22.717 | 2.560 | 11.3 % | 2.495 | |
| | kernelsize=7 | 23.014 | 4.321 | 18.8 % | 4.269 | |
| | kernelsize=9 | 20.911 | 7.046 | 33.7 % | 6.995 | |
| cvFindStereoCorrespondence | 8u | 1396.111 | 27.077 | 1.9 % | 22.200 | |
| cvGEMM | ch=1 | 419.555 | 46.960 | 11.2 % | 45.802 | |
| | ch=2 | 933.260 | 84.262 | 9.0 % | 81.926 | |
| cvInRange | 8u | 26.461 | 2.611 | 9.9 % | 0.287 | |
| cvInRangeS | 8u | 20.624 | 2.531 | 12.3 % | 0.287 | |
| cvLUT | 8u_16u | 3.053 | 0.949 | 31.1 % | 0.860 | |
| | 8u_32s | 3.102 | 0.948 | 30.6 % | 0.900 | |
| | 8u_8u | 6.991 | 0.891 | 12.7 % | 0.818 | |
| cvMahalanobis | 32f | 8.209 | 1.620 | 19.7 % | 1.564 | |
| cvMax | 16u | 14.120 | 0.533 | 3.8 % | 0.499 | |
| | 8u | 9.995 | 0.351 | 3.5 % | 0.303 | |
| | 32s | 17.656 | 0.771 | 4.4 % | 0.724 | |
| cvMaxS | 16u | 12.302 | 0.488 | 4.0 % | 0.463 | |
| | 8u | 6.996 | 0.329 | 4.7 % | 0.303 | |
| | 32s | 14.251 | 0.776 | 5.4 % | 0.734 | |
| cvMin | 16u | 14.129 | 0.521 | 3.7 % | 0.502 | |
| | 8u | 10.290 | 0.328 | 3.2 % | 0.308 | |
| | 32s | 17.858 | 0.750 | 4.2 % | 0.735 | |
| cvMinS | 16u | 12.398 | 0.493 | 4.0 % | 0.463 | |
| | 8u | 7.004 | 0.334 | 4.8 % | 0.308 | |
| | 32s | 14.213 | 0.758 | 5.3 % | 0.733 | |
| cvMorphologyEx | ksize=9,ch=1,op=4 | 69.375 | 1.340 | 1.9 % | 0.268 | |
| | ksize=7,ch=3,op=1 | 144.339 | 1.689 | 1.2 % | 0.936 | |
| | ksize=5,ch=3,op=3 | 137.571 | 1.862 | 1.4 % | 0.545 | |
| | ksize=5,ch=1,op=1 | 36.800 | 0.696 | 1.9 % | 0.336 | |
| | ksize=3,ch=1,op=3 | 34.596 | 0.967 | 2.8 % | 0.268 | |
| | ksize=3,ch=3,op=2 | 105.161 | 1.377 | 1.3 % | 0.538 | |
| | ksize=7,ch=3,op=2 | 174.377 | 1.643 | 0.9 % | 0.546 | |
| | ksize=7,ch=1,op=0 | 48.318 | 0.995 | 2.1 % | 0.501 | |
| | ksize=5,ch=3,op=4 | 138.328 | 1.844 | 1.3 % | 0.550 | |
| | ksize=5,ch=1,op=2 | 46.564 | 0.888 | 1.9 % | 0.268 | |
| | ksize=3,ch=1,op=4 | 34.533 | 1.000 | 2.9 % | 0.267 | |
| | ksize=3,ch=3,op=3 | 105.953 | 1.648 | 1.6 % | 0.544 | |
| | ksize=7,ch=3,op=3 | 174.072 | 2.256 | 1.3 % | 0.545 | |
| | ksize=7,ch=1,op=1 | 48.533 | 0.966 | 2.0 % | 0.461 | |
| | ksize=5,ch=1,op=3 | 46.342 | 1.055 | 2.3 % | 0.268 | |
| | ksize=3,ch=3,op=4 | 105.121 | 1.632 | 1.6 % | 0.538 | |
| | ksize=7,ch=3,op=4 | 174.413 | 2.292 | 1.3 % | 0.540 | |
| | ksize=7,ch=1,op=2 | 58.143 | 0.973 | 1.7 % | 0.268 | |
| | ksize=5,ch=1,op=4 | 45.997 | 1.013 | 2.2 % | 0.266 | |
| | ksize=7,ch=1,op=3 | 57.406 | 1.267 | 2.2 % | 0.269 | |
| | ksize=9,ch=3,op=0 | 179.900 | 2.168 | 1.2 % | 1.137 | |
| | ksize=7,ch=1,op=4 | 57.343 | 1.285 | 2.2 % | 0.267 | |
| | ksize=9,ch=3,op=1 | 188.808 | 2.059 | 1.1 % | 1.134 | |
| | ksize=9,ch=3,op=2 | 213.900 | 1.821 | 0.9 % | 0.548 | |
| | ksize=9,ch=1,op=0 | 58.627 | 1.045 | 1.8 % | 0.491 | |
| | ksize=5,ch=3,op=0 | 110.894 | 1.257 | 1.1 % | 0.647 | |
| | ksize=3,ch=1,op=0 | 24.961 | 0.649 | 2.6 % | 0.328 | |
| | ksize=9,ch=3,op=3 | 207.612 | 2.616 | 1.3 % | 0.546 | |
| | ksize=9,ch=1,op=1 | 58.650 | 1.007 | 1.7 % | 0.513 | |
| | ksize=5,ch=3,op=1 | 111.285 | 1.267 | 1.1 % | 0.611 | |
| | ksize=3,ch=1,op=1 | 25.252 | 0.656 | 2.6 % | 0.363 | |
| | ksize=9,ch=3,op=4 | 207.352 | 2.680 | 1.3 % | 0.546 | |
| | ksize=9,ch=1,op=2 | 68.803 | 1.083 | 1.6 % | 0.291 | |
| | ksize=7,ch=3,op=0 | 145.843 | 1.675 | 1.1 % | 0.852 | |
| | ksize=5,ch=3,op=2 | 138.029 | 1.438 | 1.0 % | 0.538 | |
| | ksize=5,ch=1,op=0 | 36.834 | 0.708 | 1.9 % | 0.335 | |
| | ksize=3,ch=1,op=2 | 34.640 | 0.876 | 2.5 % | 0.268 | |
| | ksize=3,ch=3,op=0 | 76.167 | 1.035 | 1.4 % | 0.495 | |
| | ksize=9,ch=1,op=3 | 69.709 | 1.346 | 1.9 % | 0.268 | |
| | ksize=3,ch=3,op=1 | 76.596 | 1.087 | 1.4 % | 0.480 | |
| cvMul | 16u | 10.576 | 3.799 | 35.9 % | 3.772 | |
| | 8u | 12.864 | 1.832 | 14.2 % | 1.786 | |
| | 32s | 11.303 | 3.853 | 34.1 % | 3.809 | |
| cvNot | 16u | 2.521 | 0.407 | 16.1 % | 0.382 | |
| | 8u | 1.272 | 0.235 | 18.5 % | 0.209 | |
| | 32s | 5.236 | 0.614 | 11.7 % | 0.586 | |
| cvOr | 16u | 4.306 | 0.428 | 9.9 % | 0.417 | |
| | 8u | 2.195 | 0.272 | 12.4 % | 0.247 | |
| | 32s | 8.693 | 0.681 | 7.8 % | 0.658 | |
| cvOrS | 16u | 3.087 | 0.447 | 14.5 % | 0.414 | |
| | 8u | 1.546 | 0.264 | 17.1 % | 0.233 | |
| | 32s | 5.569 | 0.688 | 12.4 % | 0.664 | |
| cvPerspectiveTransform | 3d | 25.596 | 4.512 | 17.6 % | 4.354 | |
| | 2d | 18.104 | 2.181 | 12.0 % | 2.079 | |
| cvRandArr | normal-16u | 531.870 | 417.865 | 78.6 % | 417.805 | This is running on a single SPE. The slow result is due to sequential operation. We need an algorithm that is suited to SIMD. |
| | normal-32s | 466.241 | 414.113 | 88.8 % | 414.055 | |
| | normal-8u | 523.787 | 417.631 | 79.7 % | 417.570 | |
| cvScaleAdd | ch=1 | 3.219 | 0.614 | 19.1 % | 0.585 | |
| | ch=2 | 8.125 | 1.041 | 12.8 % | 1.004 | |
| cvScaleAdd | ch=1 | 3.219 | 0.633 | 19.7 % | 0.588 | |
| | ch=2 | 8.064 | 1.047 | 13.0 % | 1.007 | |
| cvSub | 16u | 8.291 | 0.459 | 5.5 % | 0.434 | |
| | 8u | 9.317 | 0.298 | 3.2 % | 0.277 | |
| | 32s | 8.517 | 0.688 | 8.1 % | 0.661 | |
| cvSubRS | 16u | 6.636 | 0.449 | 6.8 % | 0.456 | |
| | 8u | 8.327 | 0.290 | 3.5 % | 0.276 | |
| | 32s | 5.086 | 0.696 | 13.7 % | 0.663 | |
| cvSubS | 16u | 6.590 | 0.446 | 6.8 % | 0.415 | |
| | 8u | 10.813 | 0.590 | 5.5 % | 0.560 | |
| | 32s | 5.117 | 0.738 | 14.4 % | 0.659 | |
| cvSum | 8u3c | 4.593 | 5.849 | 127.3 % | 5.820 | |
| | 8u1c | 1.688 | 0.750 | 44.4 % | 0.722 | |
| | 32s3c | 25.983 | 4.083 | 15.7 % | 4.055 | |
| | 32s1c | 7.865 | 0.599 | 7.6 % | 0.573 | |
| cvTranspose | 64f | 50.214 | 1.658 | 3.3 % | 0.825 | |
| | 32s | 25.758 | 1.042 | 4.0 % | 0.627 | |
| cvXor | 16u | 4.302 | 0.462 | 10.7 % | 0.399 | |
| | 8u | 2.128 | 0.269 | 12.6 % | 0.244 | |
| | 32s | 8.697 | 0.711 | 8.2 % | 0.642 | |
| cvXorS | 16u | 3.077 | 0.437 | 14.2 % | 0.411 | |
| | 8u | 1.547 | 0.277 | 17.9 % | 0.252 | |
| | 32s | 5.570 | 0.682 | 12.2 % | 0.661 | |
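
The listed percentages are consistent with the SPE time divided by the PPE time, and the gap between the SPE time and the "Pure SPE" time gives the PPE-SPE overhead. The sketch below recomputes both for three rows copied from the table; `ratio_percent` and `overhead_ms` are hypothetical helper names, not part of gentable.rb.

```ruby
# Ratio column: SPE time as a percentage of PPE time (smaller means faster).
def ratio_percent(ppe_ms, spe_ms)
  (spe_ms / ppe_ms * 100.0).round(1)
end

# Time spent outside the SPE computation itself.
def overhead_ms(spe_ms, pure_spe_ms)
  (spe_ms - pure_spe_ms).round(3)
end

# Rows copied from the table: function, type, PPE, SPE, pure SPE (all in ms).
ROWS = [
  ['cvAbsDiff',     '16u',   6.104, 0.953, 0.926],
  ['cvAddWeighted', '16u',  60.791, 8.348, 0.506],
  ['cvSum',         '8u3c',  4.593, 5.849, 5.820]
].freeze

ROWS.each do |fn, type, ppe, spe, pure|
  puts format('%-14s %-5s ratio=%5.1f %%  overhead=%.3f ms',
              fn, type, ratio_percent(ppe, spe), overhead_ms(spe, pure))
end
```

Reading the two figures together separates two kinds of slow results. cvAddWeighted spends most of its SPE-side time in overhead, while cvSum 8u3c is slower than the PPE even in the pure SPE computation.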
An example of poor performance
The following function has been omitted as a candidate for optimization, for the reasons below.
- In general, the smaller the data being processed, the larger the overhead becomes relative to the processing time.
- The arguments of this function typically contain only a small amount of data.
The table below shows that the PPE code runs faster than the PPE-SPE version.
"Pure SPE" is the execution time excluding DMA and PPE-SPE communication time.
| Target function | Type / parameters | PPE original code [ms] | SPE optimized code [ms] | Pure SPE (without PPE-SPE overhead) [ms] |
|---|---|---|---|---|
| cvCompareHist | 32x1 | 0.001 | 0.043 | 0.002 |
| | 32x32 | 0.004 | 0.073 | 0.021 |
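
To make the overhead argument concrete, the sketch below computes how much of the SPE-side time is overhead for the two cvCompareHist rows and applies a simple rule: offload only when the total SPE time beats the PPE time. The helper names and the decision rule are illustrative assumptions, not part of the benchmark suite.

```ruby
# Measurements copied from the cvCompareHist table, in milliseconds.
COMPARE_HIST = {
  '32x1'  => { ppe: 0.001, spe: 0.043, pure_spe: 0.002 },
  '32x32' => { ppe: 0.004, spe: 0.073, pure_spe: 0.021 }
}.freeze

# Share of the SPE-side time that is overhead rather than computation.
def overhead_share(t)
  (t[:spe] - t[:pure_spe]) / t[:spe] * 100.0
end

# Offloading only helps when the whole SPE path beats the PPE.
def offload_pays?(t)
  t[:spe] < t[:ppe]
end

COMPARE_HIST.each do |type, t|
  verdict = offload_pays?(t) ? 'offload' : 'stay on the PPE'
  puts format('%-6s overhead share = %.0f %%, decision: %s',
              type, overhead_share(t), verdict)
end
```

With these figures the overhead accounts for roughly 95 % (32x1) and 71 % (32x32) of the SPE-side time, and in both cases the PPE code alone finishes sooner.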
