Performance

From OpenCV on the Cell


This page compares the execution times of the original code on the PPE with those of the optimized code on the SPEs.

The conditions for measurement were as follows:

  • Six SPEs are used.
  • The image size is 1024x768.
  • Each result is the average of 10 executions.

The image size may differ depending on the function.
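
The averaging condition above can be sketched as follows. This is a minimal illustration in Python, not the project's benchmark code; the helper name `average_ms` and the stand-in workload are assumptions made for the example.

```python
import time

def average_ms(func, runs=10):
    """Return the mean wall-clock time of `func` in milliseconds over `runs` calls."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / runs * 1000.0

# Hypothetical stand-in workload: absolute difference of two 1024x768
# single-channel buffers, matching the image size listed above.
def workload():
    a = bytes(1024 * 768)
    b = bytes(1024 * 768)
    return bytes(abs(x - y) for x, y in zip(a, b))
```

Calling `average_ms(workload)` returns one averaged figure in milliseconds, comparable in form to a single cell of the tables on this page.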

To run the benchmark programs yourself, check them out and build them as follows. The programs require Ruby and eRuby:

$ svn co https://cvcell.svn.sourceforge.net/svnroot/cvcell/trunk/benchmark
$ cd image-benchmark
$ make bench-spe bench-ppe

Then run viewresult.rb:

$ ruby viewresult.rb
           cvAbsDiff,16u :               11.437                3.606   3.2
            cvAbsDiff,8u :                8.655                2.329   3.7
           cvAbsDiff,32s :                9.625                7.348   1.3
               cvAdd,16u :                8.255                0.625  13.2
                cvAdd,8u :               10.431                0.484  21.6
               cvAdd,32s :                8.831                0.822  10.7
       cvAddWeighted,16u :               61.750                9.951   6.2
        cvAddWeighted,8u :               15.854                5.098   3.1
       cvAddWeighted,32s :               50.629               15.874   3.2
               cvAnd,16u :                5.850                1.389   4.2
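
The listing does not label its columns. As a hedged reading, the last column matches the first time divided by the second, rounded to one decimal place, which suggests the columns are PPE time, SPE time, and speedup. The Python sketch below parses such lines and recomputes that value; the function name `parse_line` and the column interpretation are assumptions, not part of the benchmark tools.

```python
def parse_line(line):
    """Split one result line into (name, first_ms, second_ms, ratio)."""
    name, values = line.split(":", 1)
    first_ms, second_ms, ratio = (float(v) for v in values.split())
    return name.strip(), first_ms, second_ms, ratio

# Lines copied from the listing above.
sample = [
    "cvAbsDiff,16u :               11.437                3.606   3.2",
    "cvAdd,8u :               10.431                0.484  21.6",
    "cvAnd,16u :                5.850                1.389   4.2",
]

for line in sample:
    name, first_ms, second_ms, ratio = parse_line(line)
    # Assumed relationship: ratio = first / second, shown to one decimal place.
    computed = first_ms / second_ms
    print(f"{name}: {computed:.1f} (listed {ratio})")
```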


The following table is generated by the 'gentable.rb' program, which is included in image-benchmark.

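Columns, left to right: target function and variant; PPE original code [ms]; SPE optimized code [ms]; ratio of SPE time to PPE time (0 % = fast, 100 % = slow); Pure SPE, without PPE-SPE overhead [ms]; comment. Rows that begin with a variant continue the function named above them.
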
cvAbsDiff 16u 6.104 0.953 15.6 % 0.926  
8u 8.513 0.766 9.0 % 0.733  
32s 9.578 1.521 15.9 % 1.493  
cvAbsDiffS 16u 7.910 1.006 12.7 % 0.975  
8u 9.927 0.813 8.2 % 0.747  
32s 6.446 1.648 25.6 % 1.619  
cvAdd 16u 7.921 0.466 5.9 % 0.435  
8u 9.380 0.574 6.1 % 0.546  
32s 8.568 0.709 8.3 % 0.662  
cvAddS 16u 6.630 0.447 6.7 % 0.415  
8u 10.770 0.586 5.4 % 0.560  
32s 5.074 0.703 13.9 % 0.666  
cvAddWeighted 16u 60.791 8.348 13.7 % 0.506  
8u 13.269 4.837 36.5 % 0.542  
32s 49.539 12.679 25.6 % 0.925  
cvAnd 16u 4.283 0.429 10.0 % 0.403  
8u 2.110 0.272 12.9 % 0.249  
32s 8.778 0.667 7.6 % 0.641  
cvAndS 16u 3.084 0.463 15.0 % 0.422  
8u 1.517 0.258 17.0 % 0.234  
32s 5.642 0.692 12.3 % 0.657  
cvCalcArrHist 32x1 16.950 2.483 14.6 % 2.403  
32x32 16.974 2.548 15.0 % 2.443  
cvCmp cmpop0 9.073 0.315 3.5 % 0.291  
cmpop1 8.664 0.317 3.7 % 0.286  
cvCmpS cmpop0 8.174 0.356 4.4 % 0.286  
cmpop1 12.641 0.308 2.4 % 0.283  
cvConvertScale 8u32s 8.620 1.192 13.8 % 1.156  
16u8u 26.575 1.041 3.9 % 0.888  
8u16u 9.486 0.917 9.7 % 0.865  
32s8u 31.901 1.391 4.4 % 1.363  
cvCvtColor BGR2GRAY 8.408 0.336 4.0 % 0.261  
BGR2YCrCb 43.793 4.682 10.7 % 4.632  
BGR2HSV 33.864 1.509 4.5 % 1.458  
BGR2Lab 64.543 7.854 12.2 % 7.675  
GRAY2BGR 3.919 1.770 45.2 % 1.714  
cvDilate ksize=3,ch=3,shape=cross 49.078 0.621 1.3 % 0.541  
ksize=9,ch=3,shape=cross 138.701 2.405 1.7 % 2.254  
ksize=7,ch=3,shape=ellipse 268.906 2.614 1.0 % 2.530  
ksize=5,ch=1,shape=ellipse 47.699 0.683 1.4 % 0.483  
ksize=9,ch=3,shape=rect 88.022 0.624 0.7 % 0.549  
ksize=9,ch=1,shape=cross 46.356 0.913 2.0 % 0.799  
ksize=9,ch=3,shape=ellipse 455.859 3.573 0.8 % 3.412  
ksize=7,ch=3,shape=cross 112.006 1.971 1.8 % 1.891  
ksize=7,ch=1,shape=ellipse 89.874 0.968 1.1 % 0.888  
ksize=9,ch=1,shape=rect 29.397 0.375 1.3 % 0.300  
ksize=7,ch=3,shape=rect 71.837 0.545 0.8 % 0.484  
ksize=5,ch=1,shape=rect 18.087 0.290 1.6 % 0.216  
ksize=7,ch=1,shape=cross 37.312 0.756 2.0 % 0.677  
ksize=3,ch=3,shape=ellipse 49.005 0.647 1.3 % 0.539  
ksize=7,ch=1,shape=rect 23.568 0.370 1.6 % 0.258  
ksize=5,ch=1,shape=cross 26.823 0.541 2.0 % 0.425  
ksize=5,ch=3,shape=ellipse 142.909 1.420 1.0 % 1.316  
ksize=3,ch=1,shape=ellipse 16.340 0.329 2.0 % 0.226  
ksize=3,ch=3,shape=rect 38.528 0.396 1.0 % 0.316  
ksize=9,ch=1,shape=ellipse 153.068 1.269 0.8 % 1.187  
ksize=5,ch=3,shape=cross 81.294 1.220 1.5 % 1.142  
ksize=5,ch=3,shape=rect 54.903 0.437 0.8 % 0.357  
ksize=3,ch=1,shape=cross 16.549 0.307 1.9 % 0.228  
ksize=3,ch=1,shape=rect 12.453 0.276 2.2 % 0.193  
cvDiv 16u 7.728 2.190 28.3 % 2.125  
8u 6.996 2.203 31.5 % 2.175  
32s 6.198 2.294 37.0 % 2.266  
cvDotProduct 32f 9.035 0.519 5.7 % 0.493  
64f 15.523 1.783 11.5 % 1.746  
cvErode ksize=3,ch=3,shape=cross 49.686 0.666 1.3 % 0.542  
ksize=9,ch=3,shape=cross 139.481 2.336 1.7 % 2.256  
ksize=7,ch=3,shape=ellipse 272.361 2.621 1.0 % 2.540  
ksize=5,ch=1,shape=ellipse 47.741 0.562 1.2 % 0.480  
ksize=9,ch=3,shape=rect 90.582 0.670 0.7 % 0.549  
ksize=9,ch=1,shape=cross 45.695 0.881 1.9 % 0.800  
ksize=9,ch=3,shape=ellipse 455.103 3.513 0.8 % 3.411  
ksize=7,ch=3,shape=cross 111.957 1.975 1.8 % 1.894  
ksize=7,ch=1,shape=ellipse 89.806 0.973 1.1 % 0.893  
ksize=9,ch=1,shape=rect 29.859 0.380 1.3 % 0.299  
ksize=7,ch=3,shape=rect 73.287 0.554 0.8 % 0.463  
ksize=5,ch=1,shape=rect 18.782 0.289 1.5 % 0.215  
ksize=7,ch=1,shape=cross 37.309 0.753 2.0 % 0.677  
ksize=3,ch=3,shape=ellipse 49.041 0.620 1.3 % 0.543  
ksize=7,ch=1,shape=rect 24.235 0.337 1.4 % 0.257  
ksize=5,ch=1,shape=cross 26.703 0.504 1.9 % 0.425  
ksize=5,ch=3,shape=ellipse 143.704 1.390 1.0 % 1.322  
ksize=3,ch=1,shape=ellipse 16.324 0.308 1.9 % 0.226  
ksize=3,ch=3,shape=rect 38.318 0.395 1.0 % 0.315  
ksize=9,ch=1,shape=ellipse 150.434 1.266 0.8 % 1.185  
ksize=5,ch=3,shape=cross 81.829 1.219 1.5 % 1.141  
ksize=5,ch=3,shape=rect 60.526 0.441 0.7 % 0.350  
ksize=3,ch=1,shape=cross 16.549 0.307 1.9 % 0.232  
ksize=3,ch=1,shape=rect 14.341 0.276 1.9 % 0.194  
cvFilter2D kernelsize=3 22.174 1.009 4.6 % 0.949  
kernelsize=5 22.717 2.560 11.3 % 2.495  
kernelsize=7 23.014 4.321 18.8 % 4.269  
kernelsize=9 20.911 7.046 33.7 % 6.995  
cvFindStereoCorrespondence 8u 1396.111 27.077 1.9 % 22.200  
cvGEMM ch=1 419.555 46.960 11.2 % 45.802  
ch=2 933.260 84.262 9.0 % 81.926  
cvInRange 8u 26.461 2.611 9.9 % 0.287  
cvInRangeS 8u 20.624 2.531 12.3 % 0.287  
cvLUT 8u_16u 3.053 0.949 31.1 % 0.860  
8u_32s 3.102 0.948 30.6 % 0.900  
8u_8u 6.991 0.891 12.7 % 0.818  
cvMahalanobis 32f 8.209 1.620 19.7 % 1.564  
cvMax 16u 14.120 0.533 3.8 % 0.499  
8u 9.995 0.351 3.5 % 0.303  
32s 17.656 0.771 4.4 % 0.724  
cvMaxS 16u 12.302 0.488 4.0 % 0.463  
8u 6.996 0.329 4.7 % 0.303  
32s 14.251 0.776 5.4 % 0.734  
cvMin 16u 14.129 0.521 3.7 % 0.502  
8u 10.290 0.328 3.2 % 0.308  
32s 17.858 0.750 4.2 % 0.735  
cvMinS 16u 12.398 0.493 4.0 % 0.463  
8u 7.004 0.334 4.8 % 0.308  
32s 14.213 0.758 5.3 % 0.733  
cvMorphologyEx ksize=9,ch=1,op=4 69.375 1.340 1.9 % 0.268  
ksize=7,ch=3,op=1 144.339 1.689 1.2 % 0.936  
ksize=5,ch=3,op=3 137.571 1.862 1.4 % 0.545  
ksize=5,ch=1,op=1 36.800 0.696 1.9 % 0.336  
ksize=3,ch=1,op=3 34.596 0.967 2.8 % 0.268  
ksize=3,ch=3,op=2 105.161 1.377 1.3 % 0.538  
ksize=7,ch=3,op=2 174.377 1.643 0.9 % 0.546  
ksize=7,ch=1,op=0 48.318 0.995 2.1 % 0.501  
ksize=5,ch=3,op=4 138.328 1.844 1.3 % 0.550  
ksize=5,ch=1,op=2 46.564 0.888 1.9 % 0.268  
ksize=3,ch=1,op=4 34.533 1.000 2.9 % 0.267  
ksize=3,ch=3,op=3 105.953 1.648 1.6 % 0.544  
ksize=7,ch=3,op=3 174.072 2.256 1.3 % 0.545  
ksize=7,ch=1,op=1 48.533 0.966 2.0 % 0.461  
ksize=5,ch=1,op=3 46.342 1.055 2.3 % 0.268  
ksize=3,ch=3,op=4 105.121 1.632 1.6 % 0.538  
ksize=7,ch=3,op=4 174.413 2.292 1.3 % 0.540  
ksize=7,ch=1,op=2 58.143 0.973 1.7 % 0.268  
ksize=5,ch=1,op=4 45.997 1.013 2.2 % 0.266  
ksize=7,ch=1,op=3 57.406 1.267 2.2 % 0.269  
ksize=9,ch=3,op=0 179.900 2.168 1.2 % 1.137  
ksize=7,ch=1,op=4 57.343 1.285 2.2 % 0.267  
ksize=9,ch=3,op=1 188.808 2.059 1.1 % 1.134  
ksize=9,ch=3,op=2 213.900 1.821 0.9 % 0.548  
ksize=9,ch=1,op=0 58.627 1.045 1.8 % 0.491  
ksize=5,ch=3,op=0 110.894 1.257 1.1 % 0.647  
ksize=3,ch=1,op=0 24.961 0.649 2.6 % 0.328  
ksize=9,ch=3,op=3 207.612 2.616 1.3 % 0.546  
ksize=9,ch=1,op=1 58.650 1.007 1.7 % 0.513  
ksize=5,ch=3,op=1 111.285 1.267 1.1 % 0.611  
ksize=3,ch=1,op=1 25.252 0.656 2.6 % 0.363  
ksize=9,ch=3,op=4 207.352 2.680 1.3 % 0.546  
ksize=9,ch=1,op=2 68.803 1.083 1.6 % 0.291  
ksize=7,ch=3,op=0 145.843 1.675 1.1 % 0.852  
ksize=5,ch=3,op=2 138.029 1.438 1.0 % 0.538  
ksize=5,ch=1,op=0 36.834 0.708 1.9 % 0.335  
ksize=3,ch=1,op=2 34.640 0.876 2.5 % 0.268  
ksize=3,ch=3,op=0 76.167 1.035 1.4 % 0.495  
ksize=9,ch=1,op=3 69.709 1.346 1.9 % 0.268  
ksize=3,ch=3,op=1 76.596 1.087 1.4 % 0.480  
cvMul 16u 10.576 3.799 35.9 % 3.772  
8u 12.864 1.832 14.2 % 1.786  
32s 11.303 3.853 34.1 % 3.809  
cvNot 16u 2.521 0.407 16.1 % 0.382  
8u 1.272 0.235 18.5 % 0.209  
32s 5.236 0.614 11.7 % 0.586  
cvOr 16u 4.306 0.428 9.9 % 0.417  
8u 2.195 0.272 12.4 % 0.247  
32s 8.693 0.681 7.8 % 0.658  
cvOrS 16u 3.087 0.447 14.5 % 0.414  
8u 1.546 0.264 17.1 % 0.233  
32s 5.569 0.688 12.4 % 0.664  
cvPerspectiveTransform 3d 25.596 4.512 17.6 % 4.354  
2d 18.104 2.181 12.0 % 2.079  
cvRandArr normal-16u 531.870 417.865 78.6 % 417.805 This runs on a single SPE. The result is slow because the operation is sequential; an algorithm better suited to SIMD is needed.
normal-32s 466.241 414.113 88.8 % 414.055
normal-8u 523.787 417.631 79.7 % 417.570
cvScaleAdd ch=1 3.219 0.614 19.1 % 0.585  
ch=2 8.125 1.041 12.8 % 1.004  
cvScaleAdd ch=1 3.219 0.633 19.7 % 0.588  
ch=2 8.064 1.047 13.0 % 1.007  
cvSub 16u 8.291 0.459 5.5 % 0.434  
8u 9.317 0.298 3.2 % 0.277  
32s 8.517 0.688 8.1 % 0.661  
cvSubRS 16u 6.636 0.449 6.8 % 0.456  
8u 8.327 0.290 3.5 % 0.276  
32s 5.086 0.696 13.7 % 0.663  
cvSubS 16u 6.590 0.446 6.8 % 0.415  
8u 10.813 0.590 5.5 % 0.560  
32s 5.117 0.738 14.4 % 0.659  
cvSum 8u3c 4.593 5.849 127.3 % 5.820  
8u1c 1.688 0.750 44.4 % 0.722  
32s3c 25.983 4.083 15.7 % 4.055  
32s1c 7.865 0.599 7.6 % 0.573  
cvTranspose 64f 50.214 1.658 3.3 % 0.825  
32s 25.758 1.042 4.0 % 0.627  
cvXor 16u 4.302 0.462 10.7 % 0.399  
8u 2.128 0.269 12.6 % 0.244  
32s 8.697 0.711 8.2 % 0.642  
cvXorS 16u 3.077 0.437 14.2 % 0.411  
8u 1.547 0.277 17.9 % 0.252  
32s 5.570 0.682 12.2 % 0.661  
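
The Ratio column in the table above is consistent with SPE time divided by PPE time, expressed as a percentage, so smaller values mean a larger gain. A minimal sketch in Python, using rows copied from the table; the helper names are illustrative and not part of the benchmark tools.

```python
def ratio_percent(ppe_ms, spe_ms):
    """SPE time as a percentage of PPE time (lower is better for the SPE)."""
    return spe_ms / ppe_ms * 100.0

def speedup(ppe_ms, spe_ms):
    """How many times faster the SPE version is than the PPE version."""
    return ppe_ms / spe_ms

# (function, variant, PPE [ms], SPE [ms]) taken from the table.
rows = [
    ("cvAbsDiff", "16u", 6.104, 0.953),
    ("cvCvtColor", "BGR2GRAY", 8.408, 0.336),
    ("cvSum", "8u3c", 4.593, 5.849),
]

for func, variant, ppe_ms, spe_ms in rows:
    print(f"{func} {variant}: {ratio_percent(ppe_ms, spe_ms):.1f} % "
          f"({speedup(ppe_ms, spe_ms):.1f}x)")
```

A ratio above 100 %, as for cvSum 8u3c, means the SPE version is slower than the PPE original.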


An example of poor performance

The following function has been excluded as a candidate for optimization, for these reasons:

  • In general, when the data to be processed is small, the ratio of overhead to processing time increases.
  • The arguments to this function typically contain only a small amount of data.


The table below shows that execution on the PPE alone is faster than execution using the PPE together with the SPEs. "Pure SPE" is the execution time excluding DMA and PPE-SPE communication time.

Target function        PPE original   SPE optimized   Pure SPE (without
                       code [ms]      code [ms]       PPE-SPE overhead) [ms]
cvCompareHist 32x1     0.001          0.043           0.002
cvCompareHist 32x32    0.004          0.073           0.021
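
To illustrate why a small input makes offloading unattractive, the Python sketch below estimates the PPE-SPE overhead as the difference between the SPE and Pure SPE columns and reports its share of the SPE time. This decomposition is an assumption based on the description of "Pure SPE", and the helper name is hypothetical. A cvAbsDiff row from the main table is included for contrast.

```python
def overhead_share(spe_ms, pure_spe_ms):
    """Fraction of the SPE time spent outside pure SPE execution."""
    return (spe_ms - pure_spe_ms) / spe_ms

# (label, PPE [ms], SPE [ms], Pure SPE [ms]) taken from the tables on this page.
rows = [
    ("cvCompareHist 32x1", 0.001, 0.043, 0.002),
    ("cvCompareHist 32x32", 0.004, 0.073, 0.021),
    ("cvAbsDiff 16u", 6.104, 0.953, 0.926),
]

for label, ppe_ms, spe_ms, pure_ms in rows:
    share = overhead_share(spe_ms, pure_ms) * 100.0
    # Offloading only pays off when the total SPE time beats the PPE time.
    verdict = "offload" if spe_ms < ppe_ms else "keep on PPE"
    print(f"{label}: overhead {share:.1f} % of SPE time -> {verdict}")
```

For cvCompareHist, most of the SPE time is overhead under this reading, whereas for cvAbsDiff on a full image the overhead is a small fraction of the total.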
