Haskell and OpenCV: Envisioning

March 14, 2016: Reformatting

Some time ago I forked Noam Lewis's HOpenCV bindings to the fine OpenCV library to fill them out with pieces I needed for several projects at work, and to experiment with how such bindings could be used. Over time I've built up some useful components, and, in a fit of non-procrastination, I've recently pushed many updates and assembled a fun demo program.

To try things out, grab the code and cabal install this alternate universe version of HOpenCV. I'm not going to go into too much detail here about how the library works, but performance deserves a mention, as does some explication of usage.

The first thing that strikes the inquisitive Haskeller when attempting to work with a native library such as OpenCV through the FFI is that image processing feels rather functional: we process an input image to produce an output image. Now, in practice, much OpenCV code is visibly caught between a naming rock and a naming hard place. Supposing an Image type in C, one sees things like,

...
cvCvtColor(rgbImg, grayImg, CV_RGB2GRAY);
cvThreshold(grayImg, grayImg, 100, 255, CV_THRESH_BINARY);
...

The programmer coverts an RGB image to a grayscale image, then thresholds the grayscale image before moving on. An issue one might have with this code is the choice of names: grayImg is a fine choice of name for a grayscale version of rgbImg, but using the same name for the thresholded version? That seems less fine. Note that the re-use of the same image value is perfectly reasonable as the intermediate value is not used, and the allocation savings can add up. To avoid having names that are either inappropriate for what they end up bound to, or are inappropriate for what they start out bound to, an alternative is to name all such re-used images something like tempImg. This alternative has the advantage of avoiding misleadingly suggestive names, at the expense of cluttering code up with tokens that are devoid of content for the reader.

If you are coming at this from a functional programming point of view, you want to write the active part of this little snippet as, cvThreshold 100 255 . cvCvtColor CV_RGB2GRAY.

To cut a long story short(er), this is how you string together operations in the version of HOpenCV you built from my github repository. Before we get to the literate Haskell part of this post, let's take one quick diversion into the FFI and in-place updates piece.

In order to avoid restricting every use of OpenCV to the IO monad, one can maintain referential transparency by never performing in-place updates, and wrapping all the calls into OpenCV with unsafePerformIO. The first part of this course of action is effected by always allocating a fresh image for any OpenCV operation to use as a destination. The two parts together are made somewhat prettier by relying on a small set of core combinators for wrapping calls into OpenCV. One of the combinators I use is called cv. Using cv results in compositions that look like, cv f . cv g . cv h.

The great part of function composition like this is that those hard-to-name intermediate values are out of our hair: the . solved our naming problem!

Remember that, in order to make OpenCV calls safe, we duplicate input before operating on the duplicate in-place. This means that the example composition is actually more like, cv f dup . cv g dup . cv h dup where dup clones the input value, and cv calls the OpenCV function, e.g. f, passing it the cloned value to be mutated in-place. What we can do instead is strike up a profitable conversation with the amazing programmable inliner GHC comes with to have this kind of computational pipeline rewritten to the moral equivalent of, cv (f.g.h) dup.

Et voila! We have a functional interface to OpenCV that pays a bare minimum in allocation to maintain purity. We can lift sub-expressions out of compound expressions and not have to worry about varying semantics.

So lets see about tamping down some constant factors of our processing code! The VideoFunhouse example application includes an effect that turns an image into a kind of blue print using four shades of blue, and thick edge tracing. This involves making quite a few passes over the input pixel data to threshold, mask, find edges, etc. On an older Core 2 Duo 2.33 GHz MacBook Pro, processing a 640x480 image takes 34.5 milliseconds (ms). Not bad? If we bump up the RTS's allocation size with +RTS -A8M to reflect that we're working with large-ish arrays (images), we knock that time down to 30.1ms. This is a dual core processor, so let's make sure GHC uses the cores with +RTS -N. This gets us to 27.7ms.

Now we'll try out two different options, one at a time. If we enable the rewrite rules described above for fusing operation compositions for in-place updates, we get performance all the way down to 23.9 ms. Alternately, we can use Control.Parallel lightweight par and pseq annotations to suggest to the RTS that it evaluate the two main components of the image processing effect in parallel. This option moves us from 27.7ms down to 19.9ms.

If we use par and the rewrite rules? 15.8ms per frame. This is more than twice as fast as where we started. We get a 21% improvement by mutating in-place whenever we're sure it is safe to do so. We are also getting a 34% speedup from the par annotations. The program used to measure the performance of these options using Criterion is here.

The example programs are browsable via the above links, but here is an example standalone program that runs Canny edge detection on a live webcam feed and shows the results in a window.

import AI.CV.OpenCV.HighCV

main = createCameraCapture (Just 0) >>= runWindow . fmap proc
  where proc = canny 70 110 3 . convertRGBToGray

I hope this library may be of some help to anyone interested in computer vision and Haskell. Moving forward, the biggest issue is how little of OpenCV is wrapped by this library, while the next only slightly smaller issue is that the code shows many obvious signs of having gone through several generations. The breadth of the OpenCV API will likely require automatically generated bindings, but making the Haskell side of the interface takes some thought, so I am not sure what the best way to reasonable API coverage is.