Performance

From OpenCV on the Cell


This page compares the execution times of OpenCV functions on the PPE and on the SPEs.

The conditions for measurement were as follows:

  • Six SPEs are used.
  • The image size is 640x480 pixels.
  • Each result is the average of 10 executions.
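
The averaging step can be sketched as follows. This is only a generic illustration in Python, not the benchmark harness itself, which is not shown on this page. The names `average_ms` and `fake_workload` are hypothetical, and the workload is a stand-in for one image operation on a 640x480 buffer.

```python
import time

def average_ms(func, runs=10):
    """Call func `runs` times and return the mean wall-clock time in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return 1000.0 * total / runs

def fake_workload():
    # Hypothetical stand-in for one operation on a 640x480 single-channel image.
    data = bytearray(640 * 480)
    return sum(data)

print(f"{average_ms(fake_workload):.3f} ms")
```

Averaging over several runs smooths out run-to-run jitter, which matters here because many of the measured times are well below one millisecond.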

If you are interested in the benchmark programs, get and run them as follows. They require Ruby and eRuby:

$ svn co https://cvcell.svn.sourceforge.net/svnroot/cvcell/trunk/benchmark
$ cd image-benchmark
$ make bench-spe bench-ppe bench-spe-pure

Then run viewresult.rb:

$ ruby viewresult.rb
           cvAbsDiff,16u :               11.437                3.606   3.2
            cvAbsDiff,8u :                8.655                2.329   3.7
           cvAbsDiff,32s :                9.625                7.348   1.3
               cvAdd,16u :                8.255                0.625  13.2
                cvAdd,8u :               10.431                0.484  21.6
               cvAdd,32s :                8.831                0.822  10.7
       cvAddWeighted,16u :               61.750                9.951   6.2
        cvAddWeighted,8u :               15.854                5.098   3.1
       cvAddWeighted,32s :               50.629               15.874   3.2
               cvAnd,16u :                5.850                1.389   4.2
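
The numeric columns in this output are not labeled. In every row shown, the third value equals the first divided by the second, rounded to one decimal place, so it appears to be a speedup factor. A minimal check in Python, using three rows copied from the output above (the function name `speedup` is hypothetical, and which column belongs to which processor is not stated here):

```python
def speedup(first_ms, second_ms):
    """Return first_ms / second_ms rounded to one decimal place."""
    return round(first_ms / second_ms, 1)

# (first timing, second timing, third column as printed by viewresult.rb)
rows = {
    "cvAbsDiff,16u": (11.437, 3.606, 3.2),
    "cvAdd,8u": (10.431, 0.484, 21.6),
    "cvAnd,16u": (5.850, 1.389, 4.2),
}

for name, (first_ms, second_ms, shown) in rows.items():
    print(name, speedup(first_ms, second_ms), shown)
```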


The following table is generated by the 'gentable.rb' program, which is included in image-benchmark.

Columns: target function, variant, PPE original code [ms], SPE optimized code [ms], ratio of SPE time to PPE time (0 % = fast, 100 % = slow), pure SPE time without PPE-SPE overhead [ms], comment.
cvAbsDiff 16u 2.356 0.182 7.7 % 0.134  
8u 3.350 0.138 4.1 % 0.090  
32s 3.629 0.282 7.8 % 0.227  
cvAbsDiffS 16u 3.075 0.203 6.6 % 0.151  
8u 4.354 0.135 3.1 % 0.086  
32s 2.466 0.302 12.2 % 0.252  
cvAdd 16u 2.881 0.185 6.4 % 0.134  
8u 3.556 0.139 3.9 % 0.090  
32s 3.251 0.279 8.6 % 0.226  
cvAddS 16u 2.591 0.198 7.6 % 0.149  
8u 4.057 0.135 3.3 % 0.087  
32s 1.905 0.303 15.9 % 0.247  
cvAddWeighted 16u 23.663 0.653 2.8 % 0.149  
8u 5.171 0.446 8.6 % 0.088  
32s 19.934 0.990 5.0 % 0.250  
cvAnd 16u 1.658 0.180 10.9 % 0.133  
8u 0.828 0.139 16.8 % 0.090  
32s 3.273 0.281 8.6 % 0.225  
cvAndS 16u 1.225 0.213 17.4 % 0.148  
8u 0.530 0.159 30.0 % 0.086  
32s 2.092 0.302 14.4 % 0.248  
cvCalcArrHist 32x1 6.558 1.004 15.3 % 0.908  
32x32 6.538 1.058 16.2 % 0.910  
cvCmp cmpop0 3.514 0.143 4.1 % 0.091  
cmpop1 3.355 0.139 4.1 % 0.090  
cvCmpS cmpop0 3.165 0.136 4.3 % 0.089  
cmpop1 4.965 0.136 2.7 % 0.089  
cvConvertScale 8u32s 3.275 0.545 16.6 % 0.504  
16u8u 10.485 0.431 4.1 % 0.397  
8u16u 3.687 0.406 11.0 % 0.373  
32s8u 12.714 0.690 5.4 % 0.642  
cvCvtColor BGR2GRAY 3.460 0.207 6.0 % 0.154  
BGR2YCrCb 16.488 1.988 12.1 % 1.924  
BGR2HSV 12.978 0.683 5.3 % 0.624  
BGR2Lab 24.602 3.031 12.3 % 2.967  
GRAY2BGR 1.441 0.768 53.3 % 0.703  
cvDilate ksize=3,ch=1,shape=rect 5.344 0.205 3.8 % 0.114  
ksize=7,ch=3,shape=cross 49.044 0.834 1.7 % 0.732  
ksize=7,ch=1,shape=ellipse 39.792 0.462 1.2 % 0.367  
ksize=5,ch=1,shape=ellipse 21.152 0.292 1.4 % 0.204  
ksize=7,ch=3,shape=rect 30.499 0.310 1.0 % 0.220  
ksize=9,ch=3,shape=cross 60.694 0.995 1.6 % 0.893  
ksize=9,ch=1,shape=cross 20.240 0.433 2.1 % 0.338  
ksize=9,ch=3,shape=ellipse 200.637 1.463 0.7 % 1.353  
ksize=7,ch=3,shape=ellipse 119.366 1.088 0.9 % 0.982  
ksize=9,ch=3,shape=rect 37.507 0.361 1.0 % 0.267  
ksize=9,ch=1,shape=rect 12.625 0.256 2.0 % 0.166  
ksize=7,ch=1,shape=rect 10.147 0.234 2.3 % 0.147  
ksize=5,ch=1,shape=rect 7.663 0.213 2.8 % 0.128  
ksize=7,ch=1,shape=cross 16.526 0.376 2.3 % 0.285  
ksize=5,ch=3,shape=ellipse 63.040 0.601 1.0 % 0.494  
ksize=3,ch=3,shape=ellipse 21.067 0.341 1.6 % 0.247  
ksize=3,ch=1,shape=ellipse 7.089 0.208 2.9 % 0.122  
ksize=5,ch=3,shape=cross 35.018 0.522 1.5 % 0.424  
ksize=5,ch=1,shape=cross 11.712 0.268 2.3 % 0.181  
ksize=9,ch=1,shape=ellipse 66.979 0.593 0.9 % 0.491  
ksize=5,ch=3,shape=rect 23.243 0.297 1.3 % 0.179  
ksize=3,ch=3,shape=cross 21.103 0.344 1.6 % 0.247  
ksize=3,ch=1,shape=cross 7.066 0.210 3.0 % 0.122  
ksize=3,ch=3,shape=rect 16.293 0.243 1.5 % 0.150  
cvDiv 16u 2.732 0.211 7.7 % 0.160  
8u 2.280 0.218 9.6 % 0.168  
32s 2.512 0.285 11.3 % 0.232  
cvDotProduct 32f 3.470 0.247 7.1 % 0.212  
64f 6.069 0.727 12.0 % 0.686  
cvEqualizeHist 8u 8.382 0.832 9.9 % 0.327  
cvErode ksize=3,ch=1,shape=rect 5.078 0.200 3.9 % 0.114  
ksize=7,ch=3,shape=cross 49.053 0.863 1.8 % 0.736  
ksize=7,ch=1,shape=ellipse 39.764 0.468 1.2 % 0.367  
ksize=5,ch=1,shape=ellipse 21.087 0.289 1.4 % 0.202  
ksize=7,ch=3,shape=rect 27.510 0.312 1.1 % 0.220  
ksize=9,ch=3,shape=cross 60.625 1.005 1.7 % 0.903  
ksize=9,ch=1,shape=cross 20.207 0.427 2.1 % 0.338  
ksize=9,ch=3,shape=ellipse 200.581 1.464 0.7 % 1.358  
ksize=7,ch=3,shape=ellipse 119.097 1.085 0.9 % 0.983  
ksize=9,ch=3,shape=rect 33.323 0.363 1.1 % 0.267  
ksize=9,ch=1,shape=rect 11.237 0.279 2.5 % 0.166  
ksize=7,ch=1,shape=rect 9.145 0.250 2.7 % 0.147  
ksize=5,ch=1,shape=rect 7.007 0.236 3.4 % 0.129  
ksize=7,ch=1,shape=cross 16.420 0.405 2.5 % 0.285  
ksize=5,ch=3,shape=ellipse 63.085 0.588 0.9 % 0.489  
ksize=3,ch=3,shape=ellipse 21.049 0.335 1.6 % 0.245  
ksize=3,ch=1,shape=ellipse 7.080 0.207 2.9 % 0.121  
ksize=5,ch=3,shape=cross 35.051 0.518 1.5 % 0.416  
ksize=5,ch=1,shape=cross 11.722 0.267 2.3 % 0.178  
ksize=9,ch=1,shape=ellipse 66.994 0.590 0.9 % 0.490  
ksize=5,ch=3,shape=rect 21.385 0.267 1.2 % 0.180  
ksize=3,ch=3,shape=cross 21.047 0.334 1.6 % 0.245  
ksize=3,ch=1,shape=cross 7.062 0.210 3.0 % 0.121  
ksize=3,ch=3,shape=rect 15.534 0.241 1.6 % 0.150  
cvFilter2D kernelsize=3 8.782 0.356 4.1 % 0.305  
kernelsize=5 8.817 0.367 4.2 % 0.313  
kernelsize=7 8.793 0.508 5.8 % 0.450  
kernelsize=9 7.902 4.518 57.2 % 4.455  
cvFindStereoCorrespondence 8u 361.165 16.753 4.6 % 13.929  
cvGEMM ch=1 415.827 46.363 11.1 % 45.233  
ch=2 908.749 83.554 9.2 % 81.557  
cvInRange 8u 10.129 0.518 5.1 % 0.157  
cvInRangeS 8u 7.732 0.442 5.7 % 0.131  
cvLUT 8u_8u 2.725 0.378 13.9 % 0.327  
8u_16u 1.152 0.398 34.5 % 0.347  
8u_32s 1.158 0.483 41.7 % 0.435  
cvMahalanobis 32f 8.146 1.610 19.8 % 1.553  
cvMax 16u 2.357 0.181 7.7 % 0.134  
8u 4.062 0.140 3.4 % 0.091  
32s 3.649 0.297 8.1 % 0.226  
cvMaxS 16u 1.618 0.200 12.4 % 0.150  
8u 2.752 0.137 5.0 % 0.087  
32s 2.424 0.304 12.5 % 0.251  
cvMin 16u 2.206 0.181 8.2 % 0.134  
8u 3.856 0.140 3.6 % 0.091  
32s 3.485 0.276 7.9 % 0.228  
cvMinS 16u 1.614 0.202 12.5 % 0.150  
8u 2.752 0.136 4.9 % 0.089  
32s 2.398 0.301 12.6 % 0.250  
cvMorphologyEx ksize=9,ch=1,op=4 27.432 0.907 3.3 % 0.094  
ksize=7,ch=3,op=1 57.827 1.106 1.9 % 0.598  
ksize=5,ch=3,op=2 55.826 0.787 1.4 % 0.184  
ksize=5,ch=1,op=1 14.634 0.630 4.3 % 0.328  
ksize=3,ch=3,op=4 42.809 0.962 2.2 % 0.185  
ksize=3,ch=1,op=2 14.081 0.540 3.8 % 0.093  
ksize=7,ch=3,op=2 68.986 0.862 1.2 % 0.185  
ksize=7,ch=1,op=0 19.336 0.722 3.7 % 0.335  
ksize=5,ch=1,op=2 18.350 0.569 3.1 % 0.094  
ksize=7,ch=3,op=3 68.924 1.386 2.0 % 0.186  
ksize=7,ch=1,op=1 19.199 0.687 3.6 % 0.354  
ksize=5,ch=3,op=3 55.666 1.197 2.2 % 0.186  
ksize=5,ch=1,op=3 18.142 0.765 4.2 % 0.094  
ksize=3,ch=1,op=3 13.814 0.732 5.3 % 0.093  
ksize=7,ch=3,op=4 68.888 1.417 2.1 % 0.187  
ksize=7,ch=1,op=2 22.906 0.612 2.7 % 0.094  
ksize=5,ch=3,op=4 55.655 1.165 2.1 % 0.185  
ksize=5,ch=1,op=4 18.188 0.777 4.3 % 0.093  
ksize=3,ch=1,op=4 13.813 0.710 5.1 % 0.093  
ksize=7,ch=1,op=3 22.744 0.861 3.8 % 0.094  
ksize=9,ch=3,op=0 70.726 1.349 1.9 % 0.734  
ksize=7,ch=1,op=4 22.857 0.875 3.8 % 0.094  
ksize=9,ch=3,op=1 70.735 1.358 1.9 % 0.764  
ksize=9,ch=3,op=2 81.795 0.959 1.2 % 0.184  
ksize=9,ch=1,op=0 23.946 0.744 3.1 % 0.393  
ksize=3,ch=3,op=0 31.805 0.712 2.2 % 0.364  
ksize=9,ch=3,op=3 81.757 1.619 2.0 % 0.188  
ksize=9,ch=1,op=1 23.845 0.750 3.1 % 0.417  
ksize=3,ch=3,op=1 31.768 0.718 2.3 % 0.398  
ksize=9,ch=3,op=4 81.793 1.630 2.0 % 0.185  
ksize=9,ch=1,op=2 27.533 0.758 2.8 % 0.093  
ksize=7,ch=3,op=0 57.935 1.125 1.9 % 0.700  
ksize=5,ch=3,op=0 44.528 0.917 2.1 % 0.490  
ksize=5,ch=1,op=0 14.663 0.630 4.3 % 0.295  
ksize=3,ch=3,op=2 42.998 0.705 1.6 % 0.185  
ksize=3,ch=1,op=0 10.379 0.588 5.7 % 0.286  
ksize=9,ch=1,op=3 27.610 0.922 3.3 % 0.094  
ksize=5,ch=3,op=1 44.480 0.945 2.1 % 0.472  
ksize=3,ch=3,op=3 42.813 0.995 2.3 % 0.185  
ksize=3,ch=1,op=1 10.359 0.553 5.3 % 0.274  
cvMul 16u 4.254 0.184 4.3 % 0.136  
8u 5.006 0.145 2.9 % 0.095  
32s 4.304 0.281 6.5 % 0.229  
cvNot 16u 0.974 0.198 20.3 % 0.146  
8u 0.409 0.137 33.5 % 0.085  
32s 1.932 0.300 15.5 % 0.246  
cvOr 16u 1.650 0.181 11.0 % 0.135  
8u 0.817 0.148 18.1 % 0.090  
32s 3.273 0.283 8.6 % 0.226  
cvOrS 16u 1.196 0.199 16.6 % 0.148  
8u 0.507 0.135 26.6 % 0.086  
32s 2.128 0.303 14.2 % 0.248  
cvPerspectiveTransform 3d 29.865 5.162 17.3 % 5.082  
2d 21.359 2.440 11.4 % 2.367  
cvRandArr normal-16u 206.601 182.178 88.2 % 182.119 This runs on a single SPE. It is slow because the operation is sequential; an algorithm better suited to SIMD is needed.
normal-8u 204.119 181.861 89.1 % 181.805
normal-32s 180.506 180.444 100.0 % 180.372
cvScaleAdd ch=1 3.757 0.375 10.0 % 0.323  
ch=2 9.431 0.662 7.0 % 0.590  
cvScaleAdd ch=1 3.767 0.374 9.9 % 0.322  
ch=2 9.447 0.662 7.0 % 0.592  
cvSub 16u 3.260 0.183 5.6 % 0.135  
8u 3.593 0.143 4.0 % 0.091  
32s 3.275 0.283 8.6 % 0.227  
cvSubRS 16u 2.583 0.200 7.7 % 0.148  
8u 3.227 0.137 4.2 % 0.088  
32s 1.920 0.301 15.7 % 0.247  
cvSubS 16u 2.589 0.199 7.7 % 0.148  
8u 4.078 0.139 3.4 % 0.088  
32s 1.911 0.303 15.9 % 0.248  
cvSum 8u3c 1.719 0.279 16.2 % 0.226  
8u1c 0.399 0.167 41.9 % 0.122  
32s3c 9.112 0.457 5.0 % 0.395  
32s1c 2.954 0.247 8.4 % 0.198  
cvTranspose 64f 3.792 0.575 15.2 % 0.517  
32s 2.054 0.335 16.3 % 0.285  
cvXor 16u 1.658 0.183 11.0 % 0.133  
8u 0.832 0.138 16.6 % 0.091  
32s 3.298 0.281 8.5 % 0.225  
cvXorS 16u 1.213 0.200 16.5 % 0.148  
8u 0.474 0.140 29.5 % 0.087  
32s 2.164 0.374 17.3 % 0.248  
cvIntegral 8u32s,mode=0 1.687 0.869 51.5 % 0.278  
8u32s,mode=1 3.579 2.054 57.4 % 0.811  
8u32s,mode=2 5.503 2.330 42.3 % 0.753  
cvHaarDetectObjects lena,flag=0 1454.043 147.656 10.2 % 141.958  
lena,flag=1 1499.969 1502.872 100.2 % 0.285  
lena,flag=2 2690.345 88.363 3.3 % 83.755  
cvPyrDown 8u 2.347 0.155 6.6 % 0.108  
cvCornerHarris ksize=3 46.130 1.398 3.0 % 1.335  
ksize=5 45.843 1.540 3.4 % 1.477  
cvCornerMinEigenVal ksize=3 67.329 1.640 2.4 % 1.568  
ksize=5 67.155 1.771 2.6 % 1.710  
cvGoodFeaturesToTrack harris 55.266 4.109 7.4 % 0.262  
mineigen 79.428 5.132 6.5 % 0.263  
cvMinMaxLoc int 2.591 0.114 4.4 % 0.062  
float 3.002 0.114 3.8 % 0.062  
cvThreshold uchar 2.825 0.093 3.3 % 0.044  
float 3.408 0.173 5.1 % 0.124  
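
The ratio column in the table above matches the SPE time divided by the PPE time, and the gap between the SPE and pure SPE columns can be read as the PPE-SPE overhead. A minimal sketch in Python, using rows copied from the table (the function names are hypothetical, and treating the difference as overhead is an interpretation of the column heading rather than something gentable.rb is known to compute):

```python
def ratio_percent(ppe_ms, spe_ms):
    """SPE time as a percentage of PPE time, rounded to one decimal place."""
    return round(100.0 * spe_ms / ppe_ms, 1)

def overhead_ms(spe_ms, pure_spe_ms):
    """Time attributed to PPE-SPE overhead: total SPE time minus pure SPE time."""
    return round(spe_ms - pure_spe_ms, 3)

# (PPE [ms], SPE [ms], pure SPE [ms]) copied from the table
rows = {
    "cvAbsDiff 16u": (2.356, 0.182, 0.134),
    "cvAddWeighted 16u": (23.663, 0.653, 0.149),
}

for name, (ppe, spe, pure) in rows.items():
    print(name, ratio_percent(ppe, spe), "%", overhead_ms(spe, pure), "ms")
```

For cvAddWeighted 16u, most of the SPE time is overhead rather than computation, which is why the pure SPE column is worth reading alongside the ratio.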



An example of poor performance

The following function has been omitted as a candidate for optimization, for these reasons:

  • In general, the smaller the data being processed, the larger the overhead becomes relative to the processing time.
  • The arguments of this function typically contain only a small amount of data.


This table shows that, for this function, the PPE alone is faster than the PPE-SPE combination. "Pure SPE" is the execution time excluding DMA and PPE-SPE communication time.

Columns: target function, variant, PPE original code [ms], SPE optimized code [ms], pure SPE time without PPE-SPE overhead [ms].
cvCompareHist 32x1 0.001 0.043 0.002
32x32 0.004 0.073 0.021
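
The effect of overhead on small inputs can be worked out from the two rows above. The sketch below, in Python, computes how many times slower the SPE path is than the PPE and what share of the SPE time is not pure SPE computation. The function names are hypothetical, and attributing the whole difference to DMA and PPE-SPE communication follows the definition of "Pure SPE" given above.

```python
def slowdown(ppe_ms, spe_ms):
    """How many times slower the SPE path is than the PPE path."""
    return round(spe_ms / ppe_ms, 1)

def overhead_share_percent(spe_ms, pure_spe_ms):
    """Share of the SPE time that is not pure SPE computation, in percent."""
    return round(100.0 * (spe_ms - pure_spe_ms) / spe_ms, 1)

# (PPE [ms], SPE [ms], pure SPE [ms]) for cvCompareHist, copied from the table
rows = {
    "32x1": (0.001, 0.043, 0.002),
    "32x32": (0.004, 0.073, 0.021),
}

for name, (ppe, spe, pure) in rows.items():
    print(name, slowdown(ppe, spe), overhead_share_percent(spe, pure))
```

For the 32x1 case the overhead accounts for roughly 95 % of the SPE time, so even a much faster SPE kernel could not beat the PPE here.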
