Introduction
In pull request #3451 I mentioned two new typemaps for working with lists of 2-D arrays (and 3-D arrays), either as an input argument, or lists of 2-D arrays (and 3-D arrays) as an in-place argument.DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3 DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3 DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4 DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4
When working with lists of arrays, using these new typemaps vs using the existing ones avoids copying data into a contiguous block of memory first.
The following for example is very costly (both in memory allocation and in time the operation takes):
a = numpy.ones((1024,1024),numpy.uint8) la = [a]*100 b = numpy.mean(numpy.array(la,float),axis=0).astype(numpy.uint8)
Unfortunately, wrapping a C function with the usual
DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3
will not speed things up much. A contiguous memory block of size DIM1*DIM2*DIM3 must still be allocated.
For this particular operation however, contiguity between slices is not necessary. If the slices themselves are contiguous, all that is needed is a list of pointers to the individual slices and the slices dimensions.
Limitations
For both the INPUT and INPLACE datatypes, the arrays in the list must all have the same type and size. For in-place operation, the slices must also be contiguous.What happens in the background
IN_ARRAY3 / IN_ARRAY4 typemaps
When a list of 2-D or 3-D arrays (slices / cubes) is sent as an argument to the SWIG wrapped function, the following hapens:- We check that the Python object is a list.
- Two arrays are created:
- object_array is a list of pointers, with a maximum size of n, the number of elements in the list.
- is_new_object_array records whether a new contiguous object needed creating
- **object_array, dim1 (number of slices), dim2, dim3 (height and width of the slices) are passed to the C function. For lists of 3-D arrays, dim2,dim3,dim4 are the depth,height and width of the cubes.
- After completion of the wrapped function, the destructor loops through object_array and new_object_array and destroys temporary objects (object_array != NULL && new_object_array[i] == 1). The destructor may also be called if an error occurred (one of the slices is a different size for example).
- new_object_array is freed.
- object_array is freed.
INPLACE_ARRAY3 / INPLACE_ARRAY4 typemaps
Almost the same process as before, but without the creation of temporary contiguous arrays (so no new_object_array needed). If one of the slices / cubes is a different size or isn't contiguous, we have an error state and the object_array is freed.An example, processing video sequences
The reason I implemented these new typemaps was to process large lists of image arrays loaded from video sequences. Potentially,
thousands of frames.
import cv2 import mytest fn = r"ATM.mpg" vidcap = cv2.VideoCapture(fn) max_count = 50 count = 0 stack = [] success,image = vidcap.read() if success: while success: stack.append(mytest.grayscale(image)) count += 1 success,image = vidcap.read() if count >= max_count: break if count == 0: sys.exit(1)Just as a sidenote, frames are immediately converted from RGB to grayscale using the grayscale function wrapped in mytest (see mytest.c).
For testing, I used the ATM sequence available from http://ivylab.kaist.ac.kr/demo/vs/dataset.htm (many thanks to KAIST for making these available!).
IN_ARRAY3 ** example code
Let's get back to our original example. We want to accelerateb = numpy.mean(numpy.array(la, float),axis=0).astype(numpy.uint8)
for a list of 2-D arrays.
Here's what I wrote (see mytest.c):
void average(unsigned char **ArrayIn, int Zdim, int Ydim, int Xdim, unsigned char **pArrayOut, int *YdimOut, int *XdimOut) { int z,i; unsigned char *ArrayOut = NULL; float temp; //allocating memory for the output image if (*pArrayOut == NULL) { *pArrayOut = (unsigned char *)calloc(Ydim*Xdim,sizeof(unsigned char)); } ArrayOut = *pArrayOut; if (ArrayOut == NULL) { errno = ENOMEM; goto end; } #pragma omp parallel for \ default(shared) private(z,temp) for (i = 0; i < Ydim*Xdim; i++) { temp = 0; for (z = 0; z < Zdim; z++) { temp += ArrayIn[z][i]; } temp = temp / Zdim; if (temp > 255) temp = 255; ArrayOut[i] = (unsigned char)temp; } end: *pArrayOut = ArrayOut; *YdimOut = Ydim; *XdimOut = Xdim; }
This, together with the typemap definition in mytest.i is all the code needed! The new IN_ARRAY3 typemap takes care of generating an **ArrayIn, which is then passed on to the average function (together with Zdim, Ydim, Xdim).
As mentioned previously, the output array is of type ARGOUTVIEWM_ARRAY2, which means the array we created inside the C function will be freed when the Python numpy array is destroyed. These will be available as standard in the next version of numpy.i .
INPLACE_ARRAY3 ** example code
If for some reason, we wanted to subtract this averaged image from each frame in the stack (as an in-place operation), here is what we would do:void bsub(unsigned char **ArrayInpl, int Zdim, int Ydim, int Xdim, unsigned char *ArrayIn2, int Ydim2, int Xdim2) { int i,z; if (Ydim!=Ydim2 || Xdim!=Xdim2) { errno = E2BIG; goto end; } #pragma omp parallel for \ default(shared) private(i) for (z = 0; z < Zdim; z++) { for (i = 0; i< Ydim*Xdim; i++) ArrayInpl[z][i] = abs(ArrayInpl[z][i]-ArrayIn2[i]); } end: return; }
As usual, all the code is available from the EZwidgets googlecode SVN
Wrapping up
the mytest library can be compiled using the provided setup_mytest.py with the following command:>> python setup_mytest.py build_ext --inplace
and some tests are performed with test_sequence.py:
>> python test_sequence.py loaded 50 (h576 x w704) frames testing mytest average... mytest average took 0.01 seconds, sum=51430105 testing numpy.average . . . . . numpy.mean took 2.45 seconds, sum=51430105 testing inplace background subtraction... bsub took 0.02 seconds (inplace op, done once) saving images...
Particularly telling, is the difference between averaging using the mytest function and using numpy.mean.
The average image generated from the 50 first frames and abs(frame 33 - average) follow.
If you have any comments, please head back to pull request #3451 . Thank you.