Developing machine vision software with Ruby instead of C/C++
07 Nov 2012
When I started doing a PhD in machine vision in 2004, I didn't know what I was in for. I thought I would learn about various object recognition algorithms, implement them in C++, and then try to come up with something new. I was motivated to implement 2D object recognition and tracking algorithms, and I was hoping to eventually get into 3D object recognition and tracking or Visual SLAM (simultaneous localisation and mapping).
The trouble began when I realised how many representations of images there are. I am not even talking about colour spaces or compressed images and videos; it is enough to consider single-channel grey-scale images. Virtually every C/C++ library for handling images comes with its own data structures for representing them. As a result, using more than one C/C++ library at a time means writing a lot of code to convert between different image representations.
It gets worse. CPUs usually offer arithmetic on 8-bit, 16-bit, 32-bit, and 64-bit integers, each of which can be signed or unsigned. There are also single-precision and double-precision floating point numbers, which gives 10 or more native data types. A C/C++ library that merely wants to support the basic binary operations (addition, subtraction, division, multiplication, exponentiation, comparisons, ...) for array-scalar, scalar-array, and array-array combinations quickly ends up with literally thousands of possible combinations. This leads to a combinatorial explosion of methods, as one can see in the Framewave library, for example.
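To make the arithmetic concrete, here is a rough count in Ruby. The list of operations, and in particular the set of six comparisons, is an assumption chosen for illustration; a real library would differ in the details. All constant and method names are hypothetical.

```ruby
# Native element types: signed and unsigned integers of four widths,
# plus single- and double-precision floats.
INTEGER_BITS = [8, 16, 32, 64]
TYPES = INTEGER_BITS.flat_map { |bits| ["int#{bits}", "uint#{bits}"] } +
        %w[float32 float64]

# Binary operations; the exact set of comparisons is an assumption.
OPERATIONS = %w[add sub mul div pow eq ne lt le gt ge]

# Operand shapes: array-scalar, scalar-array, and array-array.
SHAPES = [%i[array scalar], %i[scalar array], %i[array array]]

# One native function is needed for every combination of
# (operation, operand shapes, left element type, right element type).
def kernel_signatures(types = TYPES, operations = OPERATIONS, shapes = SHAPES)
  operations.product(shapes, types, types)
end

signatures = kernel_signatures
puts "#{TYPES.size} element types"
puts "#{signatures.size} native functions"
```

With these assumptions the count is 11 operations × 3 shape combinations × 10 × 10 type pairs = 3300 functions, and that is before considering colour or complex-valued elements.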
In the end, I wrote a thesis about a different way of implementing machine vision systems. The thesis shows how one can implement machine vision software using a popular dynamically typed programming language, namely Ruby.
The listing below shows an IRB (Interactive Ruby) session to illustrate the result. Comment lines (preceded by '#') show the output of the IRB REPL (read-eval-print loop). The session first opens a video display showing the camera image. After the window is closed, it opens a video display showing the thresholded camera image.
See the picture below for an example of a thresholded image.
The example has just 7 lines of code. The REPL furthermore facilitates experimentation with machine vision software in an unprecedented way. The system achieves real-time performance by generating C programs for the required operations, compiling them to Ruby extensions, and linking them on the fly.
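The following is a minimal sketch of the code-generation idea, not the actual HornetsEye implementation. It assumes that overloaded Ruby operators record an expression tree instead of computing a result, and that the tree is then turned into C source. Every class and method name here (`Expr`, `Input`, `Constant`, `BinaryOp`, `generate_c`) is hypothetical, and the steps of compiling the C source into a Ruby extension and loading it are left out.

```ruby
# Hypothetical sketch; none of these names come from HornetsEye.

# Base class: Ruby operators are ordinary methods, so defining them here
# records the operation in a tree instead of executing it immediately.
class Expr
  %w[+ - * / >= <=].each do |op|
    define_method(op) { |other| BinaryOp.new(op, self, Expr.wrap(other)) }
  end

  # Turn plain Ruby numbers into constant nodes.
  def self.wrap(value)
    value.is_a?(Expr) ? value : Constant.new(value)
  end
end

# An input array with a name and a C element type.
class Input < Expr
  attr_reader :name, :ctype

  def initialize(name, ctype)
    @name = name
    @ctype = ctype
  end

  def to_c
    "#{name}[i]"
  end

  def inputs
    [self]
  end
end

# A scalar constant embedded directly in the generated code.
class Constant < Expr
  def initialize(value)
    @value = value
  end

  def to_c
    @value.to_s
  end

  def inputs
    []
  end
end

# An element-wise binary operation on two sub-expressions.
class BinaryOp < Expr
  def initialize(op, left, right)
    @op = op
    @left = left
    @right = right
  end

  def to_c
    "(#{@left.to_c} #{@op} #{@right.to_c})"
  end

  def inputs
    (@left.inputs + @right.inputs).uniq
  end
end

# Emit a C function that evaluates the expression for every element.
def generate_c(function_name, expr, result_ctype)
  params = expr.inputs.map { |input| "const #{input.ctype} *#{input.name}" }
  params << "#{result_ctype} *result" << 'int n'
  <<~C
    void #{function_name}(#{params.join(', ')})
    {
      for (int i = 0; i < n; i++)
        result[i] = #{expr.to_c};
    }
  C
end

image = Input.new('image', 'unsigned char')
source = generate_c('threshold', image >= 128, 'unsigned char')
puts source
```

The expression `image >= 128` looks like ordinary Ruby, yet the loop that would run is plain C specialised for one element type. Since the generated code comes from the expression itself, none of the type combinations has to be written out by hand. In this sketch the comparison stores 0 or 1 per pixel; a real threshold would probably scale that to the display range.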
I released the software as software libre under the name HornetsEye. My thesis is now available for download, too: Efficient Implementations of Machine Vision Algorithms using a Dynamically Typed Programming Language (BibTeX).
Here's the abstract:
Current machine vision systems (or at least their performance critical parts) are predominantly implemented using statically typed programming languages such as C, C++, or Java. Statically typed languages however are unsuitable for development and maintenance of large scale systems.
When choosing a programming language, dynamically typed languages are usually not considered due to their lack of support for high-performance array operations. This thesis presents efficient implementations of machine vision algorithms with the (dynamically typed) Ruby programming language. The Ruby programming language was used, because it has the best support for meta-programming among the currently popular programming languages. Although the Ruby programming language was used, the approach presented in this thesis could be applied to any programming language which has equal or stronger support for meta-programming (e.g. Racket (former PLT Scheme)).
A Ruby library for performing I/O and array operations was developed as part of this thesis. It is demonstrated how the library facilitates concise implementations of machine vision algorithms commonly used in industrial automation. That is, this thesis is about a different way of implementing machine vision systems. The work could be applied to prototype and in some cases implement machine vision systems in industrial automation and robotics.
The development of real-time machine vision software is facilitated as follows:
- A just-in-time compiler is used to achieve real-time performance. It is demonstrated that the Ruby syntax is sufficient to integrate the just-in-time compiler transparently.
- Various I/O devices are integrated for seamless acquisition, display, and storage of video and audio data.
In combination these two developments preserve the expressiveness of the Ruby programming language while providing good run-time performance of the resulting implementation.
To validate this approach, the performance of different operations is compared with the performance of equivalent C/C++ programs.
I hope that my work has shown that the choice of programming language plays a fundamental role in the implementation of machine vision systems and that those choices should be revisited.
See also:
- HornetsEye: Ruby computer vision library (developed as part of this thesis)
- OpenCV: C/C++ real-time computer vision library
- NumPy: Python numerical arrays
- NArray: Ruby numerical arrays
- Lush: Lisp dialect for large-scale numerical and graphic applications
- Halide: a language for image processing and computational photography
- Maru: a symbolic expression evaluator that can compile its own implementation language
The thesis is now also available on Figshare.com.