
Functional image processing in D

I’ve recently completed an overhaul of the graphics package of my D library. The goals for the overhaul were inspired by D’s std.algorithm and std.range modules:

  • Present everything as small, composable components
  • Avoid implicit copying and prefer lazy evaluation
  • Use templates for efficient code

From its first iteration, all components of the image processing package have been templated on the color type. This is not a conventional way to implement graphics libraries – most libraries either abstract away the exact color type an image uses behind an OOP interface, or simply convert all images to a single in-memory pixel format. For most cases, however, this is wasteful and inefficient: usually, the programmer already knows the exact format an image will be in, the notable exceptions being applications in which the image data comes from user input (e.g. image editors). Instead, this library declares all image types as templates, with the image’s color type as a template parameter.

I’m rather pleased with the result of this overhaul, so I’d like to share some highlights in this article.


The library starts with the definition of a view:

/// A view is any type which provides a width, height,
/// and can be indexed to get the color at a specific
/// coordinate.
enum isView(T) =
	is(typeof(T.init.w) : size_t) && // width
	is(typeof(T.init.h) : size_t) && // height
	is(typeof(T.init[0, 0])     );   // color information

This method of defining static interfaces is similar to the one used by D’s std.range, e.g. for isInputRange. Rather than declaring an interface in the OOP sense, D static interfaces are conventionally defined by testing whether a type implements certain features: checking that the operations the type is expected to support compile without errors, or evaluate to a certain type. Usually the IsExpression is used for this, or alternatively the __traits(compiles, ...) trait.
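As a standalone sketch of this idiom (hasArea, Sized, and Named are hypothetical, not library types):

```d
/// A minimal static-interface check, mirroring the isView idiom:
/// a type qualifies if it has w and h members convertible to size_t.
enum hasArea(T) =
	is(typeof(T.init.w) : size_t) &&
	is(typeof(T.init.h) : size_t);

struct Sized { int w, h; }     // satisfies the check
struct Named { string name; }  // does not

static assert( hasArea!Sized);
static assert(!hasArea!Named);

void main() {}
```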

Similar to std.range.ElementType, we define a template to extract the type a view uses to indicate a pixel’s colors:

/// Returns the color type of the specified view.
alias ViewColor(T) = typeof(T.init[0, 0]);

Next, we define some view specializations:

/// Views can be read-only or writable.
enum isWritableView(T) =
	isView!T &&
	is(typeof(T.init[0, 0] = ViewColor!T.init));

/// Optionally, a view can also provide direct pixel
/// access. We call these "direct views".
enum isDirectView(T) =
	isView!T &&
	is(typeof(T.init.scanline(0)) : ViewColor!T[]);

Again, this is similar to how isForwardRange is defined: check that the type implements all of a base type’s features, as well as some additional features specific to this specialization.

Since the implementation of the view’s pixel access primitives can be derived from a direct view’s scanline primitive, we declare a template mixin which does just that:

/// Mixin which implements view primitives on top of
/// existing direct view primitives.
mixin template DirectView()
{
	alias COLOR = typeof(scanline(0)[0]);

	/// Implements the view[x, y] operator.
	ref COLOR opIndex(int x, int y)
	{
		return scanline(y)[x];
	}

	/// Implements the view[x, y] = c operator.
	COLOR opIndexAssign(COLOR value, int x, int y)
	{
		return scanline(y)[x] = value;
	}
}

For example, here’s the definition of the Image template, which defines a concrete in-memory image:

/// An in-memory image.
/// Pixels are stored in a flat array.
struct Image(COLOR)
{
	int w, h;
	COLOR[] pixels;

	/// Returns an array for the pixels at row y.
	COLOR[] scanline(int y)
	{
		assert(y>=0 && y<h);
		return pixels[w*y..w*(y+1)];
	}

	mixin DirectView;

	this(int w, int h)
	{
		size(w, h);
	}

	/// Does not scale image
	void size(int w, int h)
	{
		this.w = w;
		this.h = h;
		if (pixels.length < w*h)
			pixels.length = w*h;
	}
}

Arrays in D are represented as simply a pointer and a length (and allocate only when resized / concatenated), so the expression pixels[w*y..w*(y+1)] does not create a copy.
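This aliasing behavior can be demonstrated in isolation (makeRow is a hypothetical stand-in for scanline):

```d
/// Slicing a D array yields a window into the same storage, not a copy.
int[] makeRow(int[] pixels, int w, int y)
{
	return pixels[w*y .. w*(y+1)]; // no allocation, no copy
}

void main()
{
	auto pixels = new int[12];       // a 4x3 "image"
	auto row = makeRow(pixels, 4, 1);
	row[0] = 42;                     // writes through to the shared storage
	assert(pixels[4] == 42);
}
```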

A unit test statically verifies that, indeed, an Image satisfies the isDirectView interface’s conditions:

unittest
{
	static assert(isDirectView!(Image!ubyte));
}

So, what can we do with this model?

Well, for one, we can define images which are not actually backed by any in-memory pixel store, but rather generate their pixels on-demand:

/// Returns a view which calculates pixels
/// on-demand using the specified formula.
template procedural(alias formula)
{
	alias fun = binaryFun!(formula, "x", "y");
	alias COLOR = typeof(fun(0, 0));

	auto procedural(int w, int h)
	{
		struct Procedural
		{
			int w, h;

			auto ref COLOR opIndex(int x, int y)
			{
				return fun(x, y);
			}
		}
		return Procedural(w, h);
	}
}

This eponymous template uses std.functional.binaryFun to accept a predicate in the form of either a string expression (which will be mixed in) or a delegate literal (lambda). Since the function has an auto return type and returns a struct declared inside the function body, Procedural is an example of a Voldemort type.
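std.functional.binaryFun can be exercised on its own; note the custom parameter names, which let string predicates refer to x and y instead of the defaults a and b:

```d
import std.functional : binaryFun;

// String predicate, mixed in with its parameters named "x" and "y".
alias fromString = binaryFun!("x * 10 + y", "x", "y");

// Lambda predicate; parameter names are irrelevant here.
alias fromLambda = binaryFun!((x, y) => x * 10 + y);

void main()
{
	assert(fromString(2, 3) == 23);
	assert(fromLambda(2, 3) == 23);
}
```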

The simplest example of a procedural image is one of a solid color:

/// Returns a view of the specified dimensions
/// and same solid color.
auto solid(COLOR)(COLOR c, int w, int h)
{
	return procedural!((x, y) => c)(w, h);
}

Note how the color type of the returned view is inferred from the type of the c parameter, so e.g. solid(RGB(1, 2, 3), 10, 10) will return a view with RGB pixels, even though the view’s type has no fully-qualified name.


Another thing we can do with this model is create views that transform other views in some way or another. We define another template mixin for the common code:

/// Mixin which implements view primitives on top of
/// another view, using a coordinate transform function.
mixin template Warp(V)
{
	V src;

	auto ref ViewColor!V opIndex(int x, int y)
	{
		warp(x, y);
		return src[x, y];
	}

	static if (isWritableView!V)
	ViewColor!V opIndexAssign(ViewColor!V value, int x, int y)
	{
		warp(x, y);
		return src[x, y] = value;
	}
}

Note the line static if (isWritableView!V). It indicates that the view[x, y] = c operation should only be defined if the underlying view src supports it. This way, the warped view will only be writable if the underlying view is.
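The effect of gating an operator behind static if can be sketched standalone (Wrapper is a hypothetical type, not part of the library):

```d
struct Wrapper(bool writable)
{
	int value;

	int opIndex(int x, int y) { return value; }

	// This operator is compiled in only when the flag is set,
	// much like the static if in the Warp mixin.
	static if (writable)
	int opIndexAssign(int v, int x, int y) { return value = v; }
}

void main()
{
	auto rw = Wrapper!true(0);
	rw[0, 0] = 42;
	assert(rw[0, 0] == 42);

	// The read-only variant has no [x, y] = v operator at all.
	static assert(!is(typeof(Wrapper!false.init[0, 0] = 1)));
}
```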

With this in hand, we can implement a cropping view, which presents a rectangular portion of another view:

/// Crop a view to the specified rectangle.
auto crop(V)(auto ref V src, int x0, int y0, int x1, int y1)
	if (isView!V)
{
	assert( 0 <=    x0 &&  0 <=    y0);
	assert(x0 <     x1 && y0 <     y1);
	assert(x1 <= src.w && y1 <= src.h);

	static struct Crop
	{
		mixin Warp!V;

		int x0, y0, x1, y1;

		@property int w() { return x1-x0; }
		@property int h() { return y1-y0; }

		void warp(ref int x, ref int y)
		{
			x += x0;
			y += y0;
		}

		static if (isDirectView!V)
		ViewColor!V[] scanline(int y)
		{
			return src.scanline(y0+y)[x0..x1];
		}
	}

	static assert(isDirectView!V == isDirectView!Crop);

	return Crop(src, x0, y0, x1, y1);
}

The if (isView!V) template constraint verifies that the first argument satisfies the conditions of the isView interface.

As above, crop uses isDirectView to provide direct pixel access if the underlying image supports it. Direct pixel access is useful when working with pixels in bulk, which can yield much better performance than accessing each pixel in turn. For example, when blitting one image to another, it is much faster to use array slice copies (D’s type-safe equivalent of memcpy) than to assign each pixel individually:

/// Blits a view onto another.
/// The views must have the same size.
void blitTo(SRC, DST)(auto ref SRC src, auto ref DST dst)
	if (isView!SRC && isWritableView!DST)
{
	assert(src.w == dst.w && src.h == dst.h, "View size mismatch");
	foreach (y; 0..src.h)
	{
		static if (isDirectView!SRC && isDirectView!DST)
			dst.scanline(y)[] = src.scanline(y)[];
		else
		{
			foreach (x; 0..src.w)
				dst[x, y] = src[x, y];
		}
	}
}

The same idea as crop can be used to implement a view that tiles another, or does nearest-neighbor scaling. (More complicated scaling algorithms are better implemented in an imperative style.) The code is similar to crop, so I have not included it here.

Even though crop takes the source as a regular argument, the intended usage of the function and its siblings is as if it was a method of the source view: someView.nearestNeighbor(100, 100).tile(1000, 1000).crop(50, 50, 950, 950). This capability is provided by a language feature called “Uniform Function Call Syntax” (UFCS for short), which allows writing a.fun(b...) instead of fun(a, b...). Its biggest benefit is that it allows chaining (a.fun1().fun2().fun3() instead of fun3(fun2(fun1(a)))), which both Phobos and this package take advantage of.
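UFCS is a general language feature, not specific to this package; a minimal sketch:

```d
int doubled(int x) { return x * 2; }
int plus(int x, int y) { return x + y; }

void main()
{
	// fun(a) and a.fun() are interchangeable...
	assert(doubled(21) == 21.doubled);

	// ...which makes left-to-right chaining possible.
	assert(1.plus(2).doubled.plus(4) == 10);
}
```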


For simple transformations which do not change the view size, we can define a helper function that simply applies a user-specified formula to each pixel’s coordinates:

/// Return a view of src with the coordinates transformed
/// according to the given formulas
template warp(string xExpr, string yExpr)
{
	auto warp(V)(auto ref V src)
		if (isView!V)
	{
		static struct Warped
		{
			mixin Warp!V;

			@property int w() { return src.w; }
			@property int h() { return src.h; }

			void warp(ref int x, ref int y)
			{
				auto nx = mixin(xExpr);
				auto ny = mixin(yExpr);
				x = nx; y = ny;
			}

			private void testWarpY()()
			{
				int y;
				y = mixin(yExpr);
			}

			/// If the x coordinate is not affected and y does not
			/// depend on x, we can transform entire scanlines.
			static if (xExpr == "x" &&
				__traits(compiles, testWarpY()) &&
				isDirectView!V)
			ViewColor!V[] scanline(int y)
			{
				return src.scanline(mixin(yExpr));
			}
		}

		return Warped(src);
	}
}

warp uses a tricky method of inspecting the user-supplied formulas. The function testWarpY is declared as a template, yet with zero template arguments – this will cause the compiler to not perform semantic analysis on its body until it is instantiated. And since the function does not have an x symbol in its scope, it can only be instantiated successfully if the yExpr expression does not refer to x. The __traits(compiles, testWarpY()) static expression checks for just that. This allows us to define the direct view scanline primitive only if we can be sure that we can do so safely. Example:

/// Return a view of src with the x coordinate inverted.
alias hflip = warp!(q{w-x-1}, q{y});

/// Return a view of src with the y coordinate inverted.
alias vflip = warp!(q{x}, q{h-y-1});

/// Return a view of src with both coordinates inverted.
alias flip = warp!(q{w-x-1}, q{h-y-1});

The q{...} syntax is just a fancy way to declare a string literal. This syntax is conventionally used to contain D code, which is usually later mixin’d somewhere. The expression has access to all the symbols at the mixin site – in our case, the warp and testWarpY methods of the Warped struct.
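Token strings and mixin scoping in a standalone sketch (code and eval are hypothetical names):

```d
enum code = q{a + b};  // a token string: exactly the string "a + b"

int eval(int a, int b)
{
	// The mixed-in expression sees this function's local a and b.
	return mixin(code);
}

void main()
{
	static assert(is(typeof(code) == string));
	assert(eval(2, 3) == 5);
}
```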

Since vflip satisfies the first two conditions needed to declare the scanline method, someView.vflip() will be a direct view if someView is. This was achieved without special-casing the vflip declaration.
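The zero-argument-template trick itself can be reproduced in miniature (refersOnlyToY and probe are hypothetical stand-ins for the scanline check and testWarpY):

```d
template refersOnlyToY(string expr)
{
	// Zero-argument template function: the compiler does not
	// analyze its body unless it gets instantiated.
	private void probe()()
	{
		int y;
		auto r = mixin(expr);
	}

	// Instantiation succeeds only if expr compiles in a scope
	// that declares y but not x.
	enum refersOnlyToY = __traits(compiles, probe());
}

void main()
{
	static assert( refersOnlyToY!"y + 1");
	static assert(!refersOnlyToY!"x + y");
}
```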

Because the abstraction we use does not rely on runtime polymorphism, the compiler is free to inline calls across all transformation layers. Flipping an image horizontally twice is a no-op – and, indeed, i[5, 5] and i.hflip().hflip()[5, 5] produce identical machine code. D compilers with more advanced backends can perform more advanced optimizations: for example, if we define a flipXY function which flips the X and Y axes of a view, and a rotateCW function (to rotate an image 90° clockwise) as src.flipXY().hflip(), then four successive rotateCW calls (a full 360°) get optimized away to nothing.


Let’s move on to performing operations on the pixels themselves. std.algorithm’s flagship function is map, which returns a range which lazily applies an expression over another range. Our colorMap applies the idea to colors:

/// Return a view which applies a predicate over the
/// underlying view's pixel colors.
template colorMap(alias pred)
{
	alias fun = unaryFun!(pred, false, "c");

	auto colorMap(V)(auto ref V src)
		if (isView!V)
	{
		alias OLDCOLOR = ViewColor!V;
		alias NEWCOLOR = typeof(fun(OLDCOLOR.init));

		struct Map
		{
			V src;

			@property int w() { return src.w; }
			@property int h() { return src.h; }

			auto ref NEWCOLOR opIndex(int x, int y)
			{
				return fun(src[x, y]);
			}
		}

		return Map(src);
	}
}

With colorMap, declaring a function which inverts an image’s colors is as simple as:

alias invert = colorMap!q{~c};

colorMap does not require that the input and output colors have the same type. This allows using it for color type conversion: read("image.bmp").parseBMP!RGB().colorMap!(c => BGRX(c.b, c.g, c.r)) will present an RGB bitmap as a BGRX view.
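The unaryFun adapter that colorMap relies on can be exercised on its own; a quick sketch using Phobos’s default parameter name a (the package passes "c" instead):

```d
import std.functional : unaryFun;

// String form: mixed in, with the argument named "a" by default.
alias invertBits = unaryFun!"~a";

// Lambda form works equally well.
alias addOne = unaryFun!(x => x + 1);

void main()
{
	assert(invertBits(0) == -1); // ~0 == -1 for int
	assert(addOne(41) == 42);
}
```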


Image processing is often an embarrassingly parallelizable task. D’s std.parallelism helps in making parallel image processing trivial:

/// Splits a view into segments and
/// calls fun on each segment in parallel.
/// Returns an array of segments which
/// can be joined using vjoin or vjoiner.
template parallel(alias fun)
{
	auto parallel(V)(auto ref V src, size_t chunkSize = 0)
		if (isView!V)
	{
		auto processSegment(R)(R rows)
		{
			auto y0 = rows[0];
			auto y1 = y0 + rows.length;
			auto segment = src.crop(0, y0, src.w, y1);
			return fun(segment);
		}

		import std.range : iota, chunks;
		import std.parallelism : taskPool, parallel;

		if (!chunkSize)
			chunkSize = taskPool.defaultWorkUnitSize(src.h);

		auto range = src.h.iota.chunks(chunkSize);
		alias Result = typeof(processSegment(range.front));
		auto result = new Result[range.length];
		foreach (n; range.length.iota.parallel(1))
			result[n] = processSegment(range[n]);
		return result;
	}
}

Even though our parallel function template shares the name with the one we use from std.parallelism, there is no conflict, as they have different signatures and operate on distinct types.

With this, an operation can be split across multiple threads by replacing image.process() with image.parallel!(segment => segment.process()).vjoin().
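The std.parallelism machinery underneath can be exercised standalone; this sketch splits rows among worker threads in the same manner (rowTotals is a hypothetical stand-in for per-segment work):

```d
import std.algorithm : sum;
import std.parallelism : parallel;
import std.range : iota;

long[] rowTotals(int h)
{
	auto result = new long[h];
	// Each row is processed on whatever worker thread picks it up;
	// the work unit size of 1 mirrors the package's parallel loop.
	foreach (y; h.iota.parallel(1))
		result[y] = iota(y * 10, y * 10 + 10).sum; // stand-in per-row work
	return result;
}

void main()
{
	auto totals = rowTotals(100);
	assert(totals[0] == 45);  // 0 + 1 + ... + 9
	assert(totals[1] == 145); // 10 + 11 + ... + 19
}
```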


Practical examples:

  1. Creating a small animated image:

    import std.algorithm;
    import std.parallelism;
    import std.range;
    import std.stdio;
    import std.string;
    
    import ae.utils.geometry;
    import ae.utils.graphics.color;
    import ae.utils.graphics.draw;
    import ae.utils.graphics.gamma;
    import ae.utils.graphics.image;
    
    void main()
    {
    	enum W = 4096;
    
    	const FG = L16(0);
    	const BG = L16(ushort.max);
    
    	auto image = Image!L16(W, W);
    	image.fill(BG);
    
    	enum OUTER = W/2 * 16/16;
    	enum INNER = W/2 * 13/16;
    	enum THICK = W/2 *  3/16;
    
    	image.fillCircle(W/2, W/2, OUTER, FG);
    	image.fillCircle(W/2, W/2, INNER, BG);
    	image.fillRect(0, W/2-INNER, W/2, W/2+INNER, BG);
    	image.fillRect(W/2-THICK/2, W/2-INNER, W/2+THICK/2, W/2+INNER, FG);
    
    	enum frames = 32;
    	foreach (n; frames.iota.parallel)
    		image
    		.rotate(TAU * n / frames, BG)
    		.copy
    		.downscale!(W/16)
    		.lum2pix(gammaRamp!(ushort, ubyte, ColorSpace.sRGB))
    		.toPNG
    	.toFile("loading-%02d.png".format(n));
    }

    This program draws the initial image at a higher resolution, using 16-bit luminance, and converts it to 8-bit sRGB after downscaling. Downscaling from a higher resolution avoids aliasing, and the gamma conversion is required for accurate resizing.

    Output (after converting the PNG frames to an animated GIF):

  2. Rendering a grayscale image of the Mandelbrot set:

    auto mandelbrot(int w, int h)
    {
    	import std.algorithm, std.range;
    	import ae.utils.graphics.view;
    	return procedural!((x, y)
    	{
    		auto c = (2.0*x/w - 1.5) + (2.0*y/h - 1.0)*1i;
    		return cast(ubyte)(1+
    			recurrence!((a, n) => c + a[n-1]^^2)(0+0i)
    			.take(ubyte.max)
    			.countUntil!(z => z.re^^2 + z.im^^2 > 4));
    	})(w, h);
    }
    
    void main()
    {
    	import std.stdio;
    	import ae.utils.graphics.image;
    	mandelbrot(500, 500).toPNG().toFile("mandel.png");
    }

    This program’s use of the recurrence, take, and countUntil D range primitives, together with D’s native support for complex numbers, allows a much terser implementation of an algorithm that usually requires a dozen lines. (Built-in complex number support is in the process of being deprecated in favor of std.complex, though.)

    Output:
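Since built-in complex numbers are on their way out, the same escape-time formula can be written against std.complex; this is a standalone sketch (escapeTime is a hypothetical helper, not part of the package):

```d
import std.algorithm : countUntil;
import std.complex : complex, sqAbs;
import std.range : recurrence, take;

/// Iterations before |z| exceeds 2 for the point c = re + im*i,
/// or 0 if it never escapes within ubyte.max steps.
ubyte escapeTime(double re, double im)
{
	auto c = complex(re, im);
	return cast(ubyte)(1 +
		recurrence!((a, n) => c + a[n-1] * a[n-1])(complex(0.0, 0.0))
		.take(ubyte.max)
		.countUntil!(z => sqAbs(z) > 4));
}

void main()
{
	assert(escapeTime(0, 0) == 0); // in the set: never escapes
	assert(escapeTime(2, 0) == 3); // escapes on the third term
}
```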


The templated approach promises great performance advantages. As a simple benchmark, this program downscales a directory of images by 25%:

import std.file;
import std.path;
import std.stdio;

import ae.utils.graphics.color;
import ae.utils.graphics.gamma;
import ae.utils.graphics.image;

void main()
{
	alias BGR16 = Color!(ushort, "b", "g", "r");
	auto gamma = GammaRamp!(ushort, ubyte)(2.2);
	foreach (de; dirEntries("input", "*.bmp", SpanMode.shallow))
	{
		static Image!BGR scratch1;
		static Image!BGR16 scratch2, scratch3;

		de
		.read
		.parseBMP!BGR(scratch1)
		.parallel!(segment =>
			segment
			.pix2lum(gamma)
			.copy(scratch2)
			.downscale!4(scratch3)
			.lum2pix(gamma)
			.copy
		)(4)
		.vjoin
		.toBMP
		.toFile(buildPath("output-d", de.baseName));
	}
}

I’ve compared it with the equivalent ImageMagick command:

convert           \
 input/*.bmp      \
 -depth 16        \
 -gamma 0.454545  \
 -filter box      \
 -resize 25%      \
 -gamma 2.2       \
 -depth 8         \
 output-im/%02d.bmp

The D program runs about 4-5 times quicker. Of course, it’s not a fair comparison: even though both use 16-bit color depth, gamma correction, multithreading, and target the same CPU architecture, the D program consists of code specifically optimized for this task. Short of some sort of JIT, this cannot be matched by generic image-processing libraries.


The graphics package is available on GitHub. Thanks to David Ellsworth for his input to this article.
