2012 0007 0008

Beating Interview Problems to Death with Parallel Haskell

Like anyone for whom graduation is becoming more imminent, I’ve been taking a look at the latest trends in the typical technology interview process. While many of the Fizz Buzzes being thrown around aren’t exactly exciting highlights of problem solving… well, you can always just beat them to death.

The Run Length Encoding algorithm is a nice, compact, and slightly real-world interview problem that has been making the rounds for years now. The basic idea is that “runs” of data, e.g. aaaabbbbbbb, are compressed into tuples, e.g. 4a7b, which may be a smaller representation if there is a large amount of adjacent repeated information. While real-world use cases for such a naïve compression scheme aren’t abundant, the algorithm is straightforward and can be implemented in a dozen lines or so in most languages. If you’ve got regexes or similar libraries at your disposal, you can manage even fewer. In Haskell’s case, one:

rleMap l = map (\e -> (head e, length e)) (group l)

which converts a string (or any arbitrary list of items) into a list of tuples, each of which has the character and its count. The function has type

rleMap :: (Eq a) => [a] -> [(a, Int)]
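As a quick sanity check, here’s the one-liner as a standalone program (`group` comes from `Data.List`):

```haskell
import Data.List (group)

-- pair each run's character with the run's length
rleMap :: Eq a => [a] -> [(a, Int)]
rleMap l = map (\e -> (head e, length e)) (group l)

main :: IO ()
main = print (rleMap "aaaabbbbbbb")  -- [('a',4),('b',7)]
```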

Simple and easy. But where’s the fun in calling it quits now? Let’s MapReduce our RLE algorithm to make it easier to parallelize and potentially Hadoop-friendly. We’ve already got our map function, so let’s create a reduce:

rleReduce :: [(Char,Int)] -> [(Char,Int)] -> [(Char,Int)]
rleReduce [] [] = []
rleReduce a  [] = a
rleReduce [] b  = b
rleReduce a b
  | fst (last a) == fst (head b) =
      init a ++ [(fst (last a), snd (last a) + snd (head b))] ++ tail b
  | otherwise = a ++ b

This is a less common component of RLE implementations (I haven’t spotted this particular bit of code anywhere else, so there’s likely room for improvement), but no less straightforward: simply concatenate two RLE’d lists if the last character of the first and the first character of the second differ; if they match, merge those two tuples (summing their counts) and concatenate the rest as normal.
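For instance, merging two encoded chunks whose boundary splits a run of a’s (a standalone sketch of the reduce above):

```haskell
rleReduce :: [(Char,Int)] -> [(Char,Int)] -> [(Char,Int)]
rleReduce a [] = a
rleReduce [] b = b
rleReduce a b
  | fst (last a) == fst (head b) =
      -- boundary tuples share a character: merge them, summing counts
      init a ++ [(fst (last a), snd (last a) + snd (head b))] ++ tail b
  | otherwise = a ++ b

main :: IO ()
main = do
  print (rleReduce [('a',4)] [('a',3),('b',2)])  -- [('a',7),('b',2)]
  print (rleReduce [('a',4)] [('b',2)])          -- [('a',4),('b',2)]
```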

Now, it’s simply a matter of splitting the RLE target into pieces, mapping over the pieces, and reducing them back into a cohesive RLE-encoded document:

splitRLE n s = foldl rleReduce [] $ map rleMap $ chunkn n s

(chunkn is a simple hand-rolled routine that splits a string into n similar-sized pieces.) As expected, splitting the list apart and recombining is needless overhead without parallelization:
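chunkn itself isn’t shown here; a minimal sketch (the even-split policy is my own assumption) might be:

```haskell
-- split a list into (up to) n similar-sized pieces
chunkn :: Int -> [a] -> [[a]]
chunkn n s = go s
  where
    size = max 1 ((length s + n - 1) `div` n)  -- ceiling of length/n
    go [] = []
    go xs = let (h, t) = splitAt size xs in h : go t

main :: IO ()
main = print (chunkn 3 "aaabbbcccd")  -- ["aaab","bbcc","cd"]
```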

# No splitting (rleMap s)
> ghc -O2 prle --make
> /usr/bin/time -f '%E' ./prle large.txt 1>/dev/null

# Nonparallel splitting (foldl rleReduce [] $ map rleMap $ chunkn n s)
> ghc -O2 prle --make
> /usr/bin/time -f '%E' ./prle large.txt 1>/dev/null

If we parallelize it using a simple parMap,

-- parMap and rdeepseq come from Control.Parallel.Strategies
parallelRLE n s = foldl rleReduce [] $ (parMap rdeepseq) rleMap $ chunkn n s

we might expect some improvement:

# parallelRLE n s = foldl rleReduce [] $ (parMap rdeepseq) rleMap $ chunkn n s
> ghc -O2 prle --make -threaded -rtsopts

# Parallel map 1 core
> /usr/bin/time -f '%E' ./prle large.txt +RTS -N1 1>/dev/null

# Parallel map 2 cores 
> /usr/bin/time -f '%E' ./prle large.txt +RTS -N2 1>/dev/null

# Parallel map 4 cores
/usr/bin/time -f '%E' ./prle large.txt +RTS -N4 1>/dev/null

Unfortunately, the bookkeeping and garbage collection overwhelm the problem very quickly, never achieving better performance.

I’m running the above on a randomly-generated 12MB text file, and no amount of coaxing could make the parallelized version do any better. While we could have written our RLE algorithm in plain C without much more trouble and not have encountered such performance obstacles, one does not simply parallelize C by swapping in a parMap either (see also: this). Thus, we deep-dive into some Haskell optimization to get a performant version.

There is one particularly painful bottleneck: plain Haskell lists are not ideal for handling bulk data of the sort we need, because Haskell’s String type is represented as a [Char]. Since there’s no reason to use a boxed linked list just to scan over characters, we instead turn to Data.ByteString for reading the input, and to Data.Sequence to handle the RLE-encoded tuples. Data.Sequence specifically removes the large penalty of concatenating the lists together in the reduce step, as adding to either end of a Seq is a constant-time operation. This is in contrast to [], where only adding an element to the head is constant time. Importing these

import Prelude as P
import Data.ByteString.Lazy.Char8 as BL
       ( ByteString
       , length
       , take
       , null
       , splitAt
       , group
       , head
       , pack
       , readFile)
import Data.Sequence as S

lets us rewrite our map

rleMap :: BL.ByteString -> Seq (Char,Int)
rleMap l = fromList $ P.zip (map BL.head c) (map (fromIntegral . BL.length) c)
  where c = BL.group l

and reduce

rleReduce :: Seq (Char,Int) -> Seq (Char,Int) -> Seq (Char,Int)
rleReduce a b = rleReduce' (viewr a) (viewl b)
  where rleReduce' EmptyR EmptyL = S.empty
        rleReduce' EmptyR _ = b
        rleReduce' _ EmptyL = a
        rleReduce' (rs :> r) (l :< ls)
          | fst r == fst l = (rs |> (fst r, snd r + snd l)) >< ls
          | otherwise = a >< b

Optionally, the ViewPatterns extension lets us pattern-match on Data.Sequence views directly. Rewriting with these in mind allows the new reduce to resemble the old one fairly closely:

{-# LANGUAGE ViewPatterns #-}
rleReduce (viewr -> EmptyR) (viewl -> EmptyL) = S.empty
rleReduce (viewr -> EmptyR) b = b
rleReduce a (viewl -> EmptyL) = a
rleReduce a@(viewr -> (rs :> r)) b@(viewl -> (l :< ls))
  | fst r == fst l = (rs |> (fst r, snd r + snd l)) >< ls
  | otherwise = a >< b
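The Seq operations used in the reduce can be sanity-checked in isolation (a standalone sketch):

```haskell
import Data.Sequence (Seq, fromList, (|>), (<|), (><))

main :: IO ()
main = do
  let s = fromList [('a',4),('b',7)] :: Seq (Char,Int)
  print (s |> ('c',1))             -- append on the right end is O(1)
  print (('z',2) <| s)             -- prepend on the left end is O(1)
  print (s >< fromList [('c',1)])  -- concatenation is O(log(min(n1,n2)))
```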

Now we finally define a new parallelRLE

parallelRLE :: Int -> BL.ByteString -> Seq (Char, Int)
parallelRLE n s = foldl rleReduce empty $ (parMap rseq) rleMap $ chunkn n s

and wrap it all up in an IO action

import System.Environment (getArgs)
import GHC.Conc (numCapabilities)

main :: IO ()
main = do
     [fileName] <- getArgs
     s <- BL.readFile fileName
     print (parallelRLE numCapabilities s)

With an improved algorithm and IO wrapper, it’s time for a more complete benchmark:

[Performance plot: wall-clock time vs. number of cores for the parallelized RLE]

This was run on a 0.5GB file, as the smaller 12MB file used above runs so fast that it is essentially instant. Between 2 and 5 processors, we get a nicely ramped speedup. After 5 processors, the bookkeeping overhead rears its ugly head again, reversing the trend, and around 48 processors (my system maximum), the parallelized version ends up running as slowly as the unparallelized one. This is certainly not the end of possible optimizations, but we have to stop sometime.

While I’m no Haskell expert, parallelization which costs no more than swapping in a parMap and paying homage to the Big O gods is a very compelling reason to hammer out any other toy interview questions with Haskell in the future.

Get the code here. Feedback welcome.

2012 0005 0019

Google Completion in Emacs

While most emacs setups include dabbrev-expand for within-buffer completion, I’ve always wanted (purely for reasons of amusement) to take it further: completion via Google’s search suggestions. I was going to do this as a weekend project, but an ugly version was surprisingly simple.

Conveniently, curl is all we need to fetch the completions for a query string as JSON:

> echo -en $(curl -H "Accept: application/json" \
    "http://suggestqueries.google.com/complete/search?client=firefox&q=query" 2>/dev/null)

["query",["query","query xiv","query letter","query_posts","query shark","query access","query tracker","query string","query letter sample","queryperformancecounter"]]

using a (very platform-dependent) echo trick to convert the escaped unicode sequences into their proper characters.
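In isolation, the trick (assuming a recent GNU bash, whose echo -e expands \uXXXX escapes; other shells’ echo may not) looks like:

```shell
# echo -en expands the JSON's escaped unicode into real characters
json='["caf\u00e9"]'
echo -en "$json"  # prints ["café"] on bash >= 4.2
```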

With this, a quick hack in elisp is all that’s necessary to parse the results into emacs and insert it into the current buffer:

(defun google-request (query)
 (shell-command-to-string
  (format
   "echo -en $(curl -H \"Accept: application/json\" \"http://suggestqueries.google.com/complete/search?client=firefox&q=%s\" 2>/dev/null)"
   query)))

(defun google-preprocess (query)
 (let ((l (split-string
	   (apply 'string
		  (removel
		   '(?\" ?\[ ?\])
		   (string-to-list query)))
	   ",")))
  (if (> (length (car (cdr l))) 0)
    (remove (car l) (cdr l))
   nil)))

(defun google-complete ()
 (interactive)
 (end-of-thing 'word)
 (let ((s (thing-at-point 'word)))
  (when s
   (let ((q (google-preprocess (google-request s))))
    (when q
     (insert (substring
	      (car q)
	      (length s))))))))

(defun removel (el l)
 (cond (el (removel (cdr el) (remove (car el) l)))
       (t l)))

Since it went more swiftly than anticipated, I generalized the code to parsing any delimited shell output and wrapped it in a minor mode with some key bindings and customize variables. Right now, I’m uncreatively calling it shell-parse.el.

After activating shell-parse-mode, it has support for scrolling through the list of completions forwards (shell-parse-complete) and backwards (shell-parse-complete-backwards) with the C-Tab and C-Shift-Tab keys, respectively. Using M-x customize-mode <Enter> shell-parse-mode, you can swap out the curl command with any shell snippet that will kick back completions, and change the delimiter as well.

You can grab shell-parse.el on github. Simply load-file the shell-parse.el script in .emacs and it should be ready to go.

It has a few todos scattered through it yet, and is not very idiomatic emacs or portable, but that’s what github’s issue tracker is for, after all.

2012 0004 0007

Preprocessors and Graphviz

Graphviz is a useful toolset for describing and rendering graphs. One of the features the graphviz language doesn’t have, however, is a C-like preprocessor to #include files. Which, granted, isn’t a particularly common use case when building graphs, but one I found desirable when dealing with a large number of graphs with a shared subset of nodes, differing by how they are connected.

Initially, I grafted together an unwieldy script that used an ugly grep+sed combination to grab the line number and substitute the included file contents: essentially a poor man’s preprocessor. Thankfully, a particularly useful gist (initially illustrated with JavaScript) that I serendipitously found on Reddit came to the rescue. The key call is this:

cpp -P -undef -Wundef -std=c99 -nostdinc -Wtrigraphs \
    -fdollars-in-identifiers -C < input.dot > output.dot

In your input .dot file, standard C include syntax

#include "common.doth"

will work as expected, despite the fact that it is a completely different language.

This solution is highly verbose if you don’t drop it in a build chain and forget about it. Which is straightforward using a standard makefile:

SOURCES = $(wildcard *.dot)
OBJECTS = $(SOURCES:.dot=.doto)
IMAGES  = $(SOURCES:.dot=.png)

all: $(IMAGES)

%.png: %.doto
	dot -Tpng $< > $@

%.doto: %.dot
	cpp -P -undef -Wundef -std=c99 -nostdinc -Wtrigraphs \
		-fdollars-in-identifiers -C < $< | \
		gvpr -c 'N[$$.degree==0]{delete(NULL,$$)}' > $@

clean:
	-rm $(IMAGES) $(OBJECTS) *~

In the above example, I introduced an intermediate doto file (analogous to an object file) and doth (header file) to recreate a C-like build process.

Another ingredient above is the piped invocation of gvpr, which removes nodes of degree 0 (so that included nodes that are not attached to anything in the current graph are dropped from the output). Remember that in makefiles, the $ must be escaped by using $$. Unfortunately, the delete function in gvpr is broken in a number of Debian-based distros (at least), but the latest version works bug-free in Arch.

So, given a nodes.doth file with these contents:

    node1 [label = "1"];
    node2 [label = "2"];
    node3 [label = "3"];

and a graph.dot file as such:

graph g {
    #include "nodes.doth"

    node1 -- node2;
    node2 -- node3;
    node3 -- node1;
}

the makefile will use cpp to generate the following intermediate file,

graph g {
    node1 [label = "1"];
    node2 [label = "2"];
    node3 [label = "3"];
    node1 -- node2;
    node2 -- node3;
    node3 -- node1;
}

which will then be compiled by graphviz’s dot command into an image. While obviously not necessary with this toy example, scaling up to more nodes shared by multiple graphs is much more pleasant when the nodes don’t have to be duplicated in each graph.

Very little of this is exclusive to graphviz; the approach extrapolates to other problems fairly easily. And, since this literally uses the C preprocessor to do the job, there are many more tricks to be explored.

Quite helpful to have on hand when the need arises, and a testament to the quality of cpp that it can be used for arbitrary metaprogramming in other languages.

2012 0003 0024

QR Codes En Masse

For the upcoming POSSCON here in Columbia, we had need of QR codes for the brochure. Lots of them. And while there are some great online resources, I wanted to create QR codes in batch.

Of course, the online services could be batch processed with a dose of curl magic, but there is a more UNIX way: libqrencode.

Creating a discrete QR code image is straightforward with the qrencode command:

qrencode -o output.png -s 50 "http://www.malloc47.com"

The -s parameter controls the size of the QR “dots” and therefore the output resolution.

This is great if you don’t need a vectorized format, but for print-quality work where you may not know the eventual DPI, vectorized output (e.g. eps or svg) is preferable.

Again, the vast repositories of libre software come to the rescue here: potrace is designed for exactly this. Annoyingly, it only handles bitmap (or the easy-to-generate pnm) format, so a little imagemagick will take care of this:

convert output.png output.bmp

Now we can convert to a vector format easily:

potrace -e -o output.eps output.bmp # -e is for EPS
potrace -s -o output.svg output.bmp # -s is for SVG

We can wrap it all up into a nice (bash) script:

#!/bin/bash
set -e
qrencode -o $1.png -s 50 "$2"
convert $1.png $1.bmp
potrace -e -o $1.eps $1.bmp
rm $1.png $1.bmp

which takes a file name prefix and a string to be encoded. To generate a large set of QR codes with this script, simply create a file with prefix–URL (or whatever data is to be encoded) pairs, each on a separate line,

google http://www.google.com
amazon http://www.amazon.com
reddit http://www.reddit.com

and then loop over this file, line-by-line:

while read line ; do ./create-qr-code.sh $line ; done < list.text

which conveniently gives us google.eps, amazon.eps, and reddit.eps files for their respective URLs.

If there is uncertainty that your URLs are good (i.e. don’t kick back 404s), then you can augment the above script with this nice curl snippet (courtesy of this post on SO):

#!/bin/bash
if curl -s --head "$2" | head -n 1 | grep -q "HTTP/1.[01] [23].." ; then
    set -e
    qrencode -o $1.png -s 50 "$2"
    convert $1.png $1.bmp
    potrace -e -o $1.eps $1.bmp
    rm $1.png $1.bmp
else
    echo "URL error: $2" 1>&2
fi

This will let you know which URLs don’t come back with clean headers so you can give them further attention. It won’t capture everything that might go wrong, but it does give you a programmatic way to verify that all is well.

Incidentally, all the tools used here can be installed on Arch with

pacman -S qrencode potrace imagemagick curl

Not exactly the prettiest shell glue, but it certainly beats slowly copy & pasting in and out of a browser.

2012 0003 0020

Flirting with the Front End

Every few years, when I’m not teaching introductory web programming, I revisit front end development, oftentimes in the form of retooling my site. Last time, it was a Wordpress-driven theme with cobbled-together JavaScript snippets for random bits of functionality:

Old Site

Serviceable, at least.

Before this, I used a generic Wordpress theme, the specifics of which I don’t recall. Rolling back all the way to the mid-90s, I had a fortunecity site, which was–as typical of sites in the 90s–equal parts bland and garish:

Old Site

Yes, it had a Christmas theme for the title page. And yes, the header, navigation bar, and footer (on individual pages) are all java applets. Not exactly a usability panacea.

And now, I’ve transitioned to Jekyll, for a few reasons:

  • It’s hard to get faster than static pages for serving content.
  • Github can handle more traffic than the shared hosting I was using previously.
  • A Jekyll deploy on github can’t use external plugins. Which is, by most accounts, a downside, but it forces me to find front-end solutions for what I want rather than turning to the back end for everything.
  • I wanted to build everything from scratch. The limited Liquid DSL used by Jekyll is leaner than full-blown PHP, and more satisfying for building from the ground-up (all my Wordpress themes started from–at minimum–a skeleton theme, just to cover the essentials needed by Wordpress).
  • Having everything in a git repo is both satisfying for my current work flow and avoids the pain of database backups.

So here it is. I avoided jQuery (convenient as it is) to keep things lean and loading quickly, and rampantly bludgeoned the site with HTML5/CSS3 without much regard for backwards compatibility. To further optimize queries, I used Liquid includes to aggregate all the js and css into single files. For JavaScript:

(function () {
    "use strict";
{% include cookies.js %}
{% include mini-clock.js %}
{% include check-time.js %}
{% include event-handler.js %}
}());

wrapping everything in a single closure with "use strict" gets some extra exception goodness. Doing this may cause JSLint to complain about indentation issues, and if you don’t add event handlers with JavaScript (e.g. you use the onclick or onload attributes in your HTML tags), you may run into scope issues as well. All of this together provided a nearly 20-point speed bump on Google Page Speed.
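That “exception goodness” can be seen in a standalone sketch (not code from this site): under "use strict", assigning to an undeclared variable throws a ReferenceError instead of silently creating a global:

```javascript
(function () {
    "use strict";
    try {
        undeclaredVariable = 1;  // implicit global: an error in strict mode
    } catch (e) {
        console.log("caught: " + e.name);  // prints "caught: ReferenceError"
    }
}());
```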

I opted for a dual-themed site, determined by the time of day. The clock drawn in the HTML5 Canvas element in the upper-left shows when the transition will occur, or you can override it with the icon next to the clock.

All in all, a good transition so far.