Tuesday 18 June 2013

New numpy.i typemaps for working with lists of 2-D and 3-D arrays.

Introduction

In pull request #3451 I mentioned two new typemaps for working with lists of 2-D arrays (and 3-D arrays), either as an input argument, or lists of 2-D arrays (and 3-D arrays) as an in-place argument.

DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3
DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3
DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3,
  DIM_TYPE DIM4
DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3,
  DIM_TYPE DIM4

When working with lists of arrays, using these new typemaps vs using the existing ones avoids copying data into a contiguous block of memory first.

The following for example is very costly (both in memory allocation and in time the operation takes):

a = numpy.ones((1024,1024),numpy.uint8)
la = [a]*100
b = numpy.mean(numpy.array(la,float),axis=0).astype(numpy.uint8) 

Unfortunately, wrapping a C function with the usual
DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3
will not speed things up much. A contiguous memory block of size DIM1*DIM2*DIM3 must still be allocated.

For this particular operation however, contiguity between slices is not necessary. If the slices themselves are contiguous, all that is needed is a list of pointers to the individual slices and the slices dimensions.

Limitations

For both the INPUT and INPLACE datatypes, the arrays in the list must all have the same type and size. For in-place operation, the slices must also be contiguous.

What happens in the background

IN_ARRAY3 / IN_ARRAY4 typemaps

When a list of 2-D or 3-D arrays (slices / cubes) is sent as an argument to the SWIG wrapped function, the following hapens:
  • We check that the Python object is a list.
  • Two arrays are created:
    • object_array is a list of pointers, with a maximum size of n, the number of elements in the list.
    • is_new_object_array records whether a new contiguous object needed creating
  • **object_array, dim1 (number of slices), dim2, dim3 (height and width of the slices) are passed to the C function. For lists of 3-D arrays, dim2,dim3,dim4 are the depth,height and width of the cubes.
  • After completion of the wrapped function, the destructor loops through object_array and new_object_array and destroys temporary objects (object_array != NULL && new_object_array[i] == 1). The destructor may also be called if an error occurred (one of the slices is a different size for example).
  • new_object_array is freed.
  • object_array is freed.

INPLACE_ARRAY3 / INPLACE_ARRAY4 typemaps

Almost the same process as before, but without the creation of temporary contiguous arrays (so no new_object_array needed). If one of the slices / cubes is a different size or isn't contiguous, we have an error state and the object_array is freed.

An example, processing video sequences

The reason I implemented these new typemaps was to process large lists of image arrays loaded from video sequences. Potentially, thousands of frames.

OpenCV has all the tools necessary for reading frames from a video file. Here is some code to store frames (as grayscale) into a list:
import cv2
import mytest

fn = r"ATM.mpg"
vidcap = cv2.VideoCapture(fn)

max_count = 50
count = 0
stack = []

success,image = vidcap.read()
if success:
    while success:
        stack.append(mytest.grayscale(image))
        count += 1

        success,image = vidcap.read()
        if count >= max_count:
            break

if count == 0:
    sys.exit(1)
Just as a sidenote, frames are immediately converted from RGB to grayscale using the grayscale function wrapped in mytest (see mytest.c).

For testing, I used the ATM sequence available from http://ivylab.kaist.ac.kr/demo/vs/dataset.htm (many thanks to KAIST for making these available!).

IN_ARRAY3 ** example code

Let's get back to our original example. We want to accelerate
b = numpy.mean(numpy.array(la, float),axis=0).astype(numpy.uint8)
for a list of 2-D arrays.
Here's what I wrote (see mytest.c):
void average(unsigned char **ArrayIn, int Zdim, int Ydim, int Xdim, unsigned char **pArrayOut, int *YdimOut, int *XdimOut)
{
    int z,i;

    unsigned char *ArrayOut = NULL;
    float temp;

    //allocating memory for the output image
    if (*pArrayOut == NULL) {
        *pArrayOut = (unsigned char *)calloc(Ydim*Xdim,sizeof(unsigned char));
    }
    ArrayOut = *pArrayOut;

    if (ArrayOut == NULL) {
        errno = ENOMEM;
        goto end;
    }

    #pragma omp parallel for        \
        default(shared) private(z,temp)

    for (i = 0; i < Ydim*Xdim; i++)
    {
        temp = 0;
        for (z = 0; z < Zdim; z++)
        {
            temp += ArrayIn[z][i];
        }
        temp = temp / Zdim;
        if (temp > 255)
            temp = 255;

        ArrayOut[i] = (unsigned char)temp;
    }

end:
    *pArrayOut = ArrayOut;
    *YdimOut = Ydim;
    *XdimOut = Xdim;
}

This, together with the typemap definition in mytest.i is all the code needed! The new IN_ARRAY3 typemap takes care of generating an **ArrayIn, which is then passed on to the average function (together with Zdim, Ydim, Xdim).

As mentioned previously, the output array is of type ARGOUTVIEWM_ARRAY2, which means the array we created inside the C function will be freed when the Python numpy array is destroyed. These will be available as standard in the next version of numpy.i .

INPLACE_ARRAY3 ** example code

If for some reason, we wanted to subtract this averaged image from each frame in the stack (as an in-place operation), here is what we would do:
void bsub(unsigned char **ArrayInpl, int Zdim, int Ydim, int Xdim, unsigned char *ArrayIn2, int Ydim2, int Xdim2)
{
    int i,z;

    if (Ydim!=Ydim2 || Xdim!=Xdim2) {
        errno = E2BIG;
        goto end;
    }

    #pragma omp parallel for        \
        default(shared) private(i)

    for (z = 0; z < Zdim; z++)
    {
        for (i = 0; i< Ydim*Xdim; i++)            
            ArrayInpl[z][i] = abs(ArrayInpl[z][i]-ArrayIn2[i]);
    }

end:
    return;
}

As usual, all the code is available from the EZwidgets googlecode SVN

Wrapping up

the mytest library can be compiled using the provided setup_mytest.py with the following command:
>> python setup_mytest.py build_ext --inplace

and some tests are performed with test_sequence.py:
>> python test_sequence.py

loaded 50 (h576 x w704) frames
testing mytest average...
  mytest average took 0.01 seconds, sum=51430105

testing numpy.average . . . . .
  numpy.mean took 2.45 seconds, sum=51430105

testing inplace background subtraction...
  bsub took 0.02 seconds (inplace op, done once)

saving images...

Particularly telling, is the difference between averaging using the mytest function and using numpy.mean.

The average image generated from the 50 first frames and abs(frame 33 - average) follow.



If you have any comments, please head back to pull request #3451 . Thank you.