2013-02-17

Laptops are the Stenotypes of Software Engineers

Increasingly, I’ve been asked variants on the question “what will happen to desktops/laptops,” particularly in light of the proliferation of smartphones and tablets. This has resulted in several good conversations, and I’ve begun to use the following analogy when discussing this with colleagues in non-engineering disciplines:

Laptop computers will become the stenotypes of software engineers.

The stenotype is a niche device used by stenographers (most prominently, court reporters) to transcribe dialog in real-time at blindingly-fast words-per-minute rates. Fellow Emacs users might appreciate how the stenotype works: instead of typing single letters, multiple keys are “chorded” together to allow many more combinations with many fewer keys. And instead of producing single letters, many of these combinations produce syllables or whole words. These physical optimizations are coupled with conventions among stenographers to wantonly omit or abbreviate words where there is little ambiguity of meaning, which further improves efficiency—the fewer characters put to a page, the less chance of typos. This leads to the output of the stenotype being difficult to read by those who are not well-versed in the conventions (called “theories”) used by stenographers.

One could describe the typical QWERTY keyboard as the antithesis of the stenotype. Designed to reduce jams in old typewriting systems—a constraint clearly nonexistent in modern hardware—mainstream keyboards are considered far more accessible than a stenotype. However, even considering later iterations, such as the Dvorak layout, these keyboards cannot hope to match the ruthless efficiency of a stenotype in real-time transcription work.

Few would reach for a stenotype to write a letter to their mother (unless she was a stenographer herself!), and no trained court reporter would care to wander into the courtroom with a QWERTY keyboard. Despite digital recorders making inroads into court reporting and closed-caption transcription, stenotypes are still available (hint: ebay), persisting thanks to an entrenched base of stenographers who remain ruthlessly efficient at these highly-specialized tasks.

Which brings us back to computing.

Software engineers are not your average users. We don’t have an average computing workload, and we have a completely disparate set of tools and conventions. Dell didn’t even try to lampshade this fact with their Sputnik Ubuntu laptop aimed squarely at developers. And, while tablets have proven capable of everything from typical computing tasks to basic software development, many developers still share this sentiment:

If it doesn’t have a keyboard, I feel that my thoughts are being forced out through a straw.
—Joey Hess

The imminent death of the laptop is greatly exaggerated; after all, the stenotype lives on to this day. However, the fate of the laptop as we know it—available in every imaginable color, style, shape, and size, and brandishing shiny logos to reinforce its reputation as a status symbol—is less certain. This fact was made starker for me when I realized that installing Android x86 on my Eee PC gave it more in common with modern computing platforms than my primary development laptop (a Thinkpad x120e with Arch Linux and the minimal xmonad window manager). And I know I’m not the only developer who has, consciously or unconsciously, increased the specialization of my desktops/laptops while using smartphones/tablets for non-work activities. Who really wants to pull out and boot up a laptop for light internet reading when there’s an instant-on smartphone or tablet within reach? I’m becoming more convinced that this is what the real “death” of the laptop looks like.

For software engineers, the role of laptops (and desktops even more so) is slowly morphing into one similar to that of stenotypes. It’s arguable that this is inevitable: artists don’t use college-ruled paper, chemists don’t conduct titrations in a coffee mug, and firefighters don’t roll up in a Corolla to put out house fires. Professions evolve better and more efficient methodologies, and when professionals outgrow the prevailing tools available to consumers, they develop new ones. This has already been well underway on the software side—you won’t find lawyers writing briefs with vim, after all.

If consumer computing devices become increasingly ill-suited to software engineering, it makes sense that a professional-grade tool should fill the gap. The most natural candidates are the form factors we already have: laptops and desktops. But that certainly doesn’t limit innovation to current devices—a positive side effect of this “death” of laptops is that it provides a great opportunity to rethink what sort of device would benefit developers most, without being strictly constrained by a 1980s design intended for general computing. And this prospect may not be as far away as it might seem, given the recent rise of hardware startups.

The next time you pull out a laptop in a coffee shop, I don’t anticipate you’ll get the same quizzical looks you might receive if you brought a stenotype with you. But, like the stenotype, I do think that the proliferation of tablets and other more consumer-oriented devices will necessitate a professional class of devices that are less common and more specialized. And in the meantime, that might mean the stylishness of laptops will begin to wane. I’m okay with that.

[Incidentally, you can turn a conventional keyboard into a stenotype-like device with Plover, an open-source stenotype software package.]

2013-01-29

Producing LaTeX from NumPy Arrays

For my comprehensive exam, I needed to quickly convert some NumPy arrays into nice-looking LaTeX array elements. The TeX Stack Exchange site has a good answer for tabular environments, but it wasn’t quite suited to the array environment. The usual answer here would be Pweave but, being short on time, I ended up rolling my own function instead:

import sys

def to_latex(a, label='A'):
    """Print a 2D NumPy array as a LaTeX array environment."""
    sys.stdout.write('\\[ '
                     + label
                     + ' = \\left| \\begin{array}{'
                     + ('c' * a.shape[1])   # center every column
                     + '}\n')
    for r in a:
        sys.stdout.write(str(r[0]))
        for c in r[1:]:
            sys.stdout.write(' & ' + str(c))
        sys.stdout.write('\\\\\n')
    sys.stdout.write('\\end{array} \\right| \\]\n')
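For example, calling it on a small throwaway array (values chosen purely for illustration) prints:

import numpy as np

to_latex(np.array([[1, 2], [3, 4]]))
# \[ A = \left| \begin{array}{cc}
# 1 & 2\\
# 3 & 4\\
# \end{array} \right| \]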

Here’s an incomplete snippet of it in action, where I convolve an array t with four different filters, producing a LaTeX formula for each result:

import scipy.signal

# t (the input array) and h1..h4 (the filters) are defined elsewhere
filters = (('A \\oplus H_1', h1),
           ('A \\oplus H_2', h2),
           ('A \\oplus H_3', h3),
           ('A \\oplus H_4', h4))

for label, f in filters:
    t2 = scipy.signal.convolve(t, f, 'same')
    to_latex(t2.astype('uint8'), label=label)

I’ll likely get around to expanding this into a full package sometime in the future, since there’s a lot that is hard coded (the \[ \] environment, stringification of the array, the fact that all columns are centered, etc.). A gist of the function is available here.
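As a rough sketch of the direction such a package might take (a hypothetical signature, not the eventual API), the hard-coded pieces simply become parameters:

# Hypothetical generalization of to_latex: environment delimiters,
# column alignment, and stringification are all injectable.
def to_latex_general(a, label='A', delim=('\\left|', '\\right|'),
                     align=None, fmt=str):
    cols = align if align is not None else 'c' * a.shape[1]
    rows = '\\\\\n'.join(' & '.join(fmt(c) for c in row) for row in a)
    return ('\\[ ' + label + ' = ' + delim[0]
            + ' \\begin{array}{' + cols + '}\n'
            + rows + '\\\\\n\\end{array} ' + delim[1] + ' \\]')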

2012-11-24

pythonbrew+opencv+debian

There are a number of ways to go about building a modern development environment for scientific computing and computer vision in python. If you’re used to developing on the bleeding edge, however, the latest debian stable makes it a chore to get started with the latest and greatest. It ships with python2.6 instead of 2.7, and opencv is notoriously out of date in a number of distributions, debian included. I typically use Arch, but the server-class machines I have access to were running debian, so I had to bootstrap my setup into this environment.

Challenge accepted.

Thankfully, pythonbrew (or pythonz) comes to the rescue by making it easy to handle multiple pythons for a single account (without having to install them system-wide) as well as providing wrappers around virtualenv. However, not everything is rosy. The python you choose has to be built with shared libraries if you want to install opencv later:

pythonbrew install --configure="--enable-shared" 2.7.3 
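Before going further, it’s worth verifying that the new interpreter really was built with shared-library support; the standard library’s sysconfig module makes this a one-liner:

# Run under the freshly-brewed python; prints 1 when --enable-shared took effect
import sysconfig
print sysconfig.get_config_var('Py_ENABLE_SHARED')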

After this, you can bootstrap a virtualenv as usual:

pythonbrew venv init
pythonbrew venv create debian
pythonbrew venv use debian

and install any requisite packages you might need (at minimum, numpy/scipy):

pip install numpy
pip install scipy
pip install pymorph
pip install matplotlib
pip install distutils

Unfortunately, there’s no such pip package for opencv. Thankfully, the debian installation guide isn’t too far out of date, and many of the packages it lists for apt-get are still relevant.

wget http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.4.3/OpenCV-2.4.3.tar.bz2
tar xjvf OpenCV-2.4.3.tar.bz2
cd OpenCV-2.4.3
mkdir {build,release}
cd release

At this point, we need to delve into where pythonbrew puts all its related files in order to configure opencv correctly. First, your installed python will be available in one of two places (here python 2.7.3 is used as an example):

~/.pythonbrew/venvs/Python-2.7.3/{venv-name}/bin/python
~/.pythonbrew/pythons/Python-2.7.3/bin/python
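If you’re ever unsure which of these two binaries a given shell is actually using, python itself can tell you:

# With the virtualenv active, both paths should point inside
# ~/.pythonbrew/venvs/ rather than ~/.pythonbrew/pythons/
import sys
print sys.executable   # full path of the running interpreter
print sys.prefix       # root of the active (virtual)env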

All virtualenvs based on a particular version of python will have a copy of that python binary for use in their own isolated environment. In addition, the virtualenv has an include directory that you should use, since all your additional packages installed into the virtualenv will place their headers in this directory:

~/.pythonbrew/venvs/Python-2.7.3/{venv-name}/include/python2.7

The hitch, however, is that, unlike a typical native python install, the virtualenv does not have a copy or symlink of the shared library we specifically built when first compiling python with pythonbrew. This means that cmake’s attempt to locate this library will fail, so we must explicitly point opencv to

~/.pythonbrew/pythons/Python-2.7.3/lib/libpython2.7.so

for it to build correctly.

Speaking of cmake, there is a bug in the cmake included in debian that prevents it from building opencv correctly. I was lazy and simply grabbed a binary of the latest cmake,

wget http://www.cmake.org/files/v2.8/cmake-2.8.9-Linux-i386.tar.gz

which worked on my debian build, but it’s better to compile it if you plan to continue using it for more than a one-off build.

Finally, understanding opencv’s cmake flags is important for getting everything stitched together:

PYTHON_EXECUTABLE=~/.pythonbrew/venvs/Python-2.7.3/{venv-name}/bin/python
PYTHON_INCLUDE_DIR=~/.pythonbrew/venvs/Python-2.7.3/debian/include/python2.7
PYTHON_LIBRARY=~/.pythonbrew/pythons/Python-2.7.3/lib/libpython2.7.so

Additionally, if you find that numpy isn’t autodetected, you can specify

PYTHON_NUMPY_INCLUDE_DIR=~/.pythonbrew/venvs/Python-2.7.3/debian/lib/python2.7/site-packages/numpy/core/include

You can also specify your virtualenv path to install the python libraries

PYTHON_PACKAGES_PATH=~/.pythonbrew/venvs/Python-2.7.3/{venv-name}/lib/python2.7/site-packages

or just symlink/copy the resulting cv2.so and cv.py files there later.

Putting it all together, I used this command to generate the makefile which compiles correctly against pythonbrew’s python (where debian is my virtualenv name):

~/cmake-2.8.9-Linux-i386/bin/cmake \
-D CMAKE_INSTALL_PREFIX=../build \
-D BUILD_NEW_PYTHON_SUPPORT=ON \
-D BUILD_PYTHON_SUPPORT=ON \
-D BUILD_EXAMPLES=OFF \
-D PYTHON_EXECUTABLE=~/.pythonbrew/venvs/Python-2.7.3/debian/bin/python \
-D PYTHON_INCLUDE_DIR=~/.pythonbrew/venvs/Python-2.7.3/debian/include/python2.7 \
-D PYTHON_LIBRARY=~/.pythonbrew/pythons/Python-2.7.3/lib/libpython2.7.so \
-D PYTHON_NUMPY_INCLUDE_DIR=~/.pythonbrew/venvs/Python-2.7.3/debian/lib/python2.7/site-packages/numpy/core/include \
-D PYTHON_PACKAGES_PATH=~/.pythonbrew/venvs/Python-2.7.3/debian/lib/python2.7/site-packages \
../
make
make install

Depending on what you’re doing, there may be other tricks with LD_LIBRARY_PATH to make specific things work, but your pythonbrewed python should be primed to access opencv from here.
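As a final sanity check, fire up the virtualenv’s python and make sure the bindings actually import (assuming the 2.4.3 tarball used above):

# Should print the version of the freshly-built opencv (2.4.3 here)
import cv2
print cv2.__version__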

2012-09-15

Anatomy of a Chrome Extension

I launched nonpartisan.me a few weeks back, which exists primarily in the form of a Google Chrome extension (there’s a Firefox add-on too). Since I released it with all of the source, now is a great time to dissect the (very simple) code. As you will notice from the site and the small bit of press it picked up, nonpartisan.me has a very simple premise: filter out political keywords from the various newsfeeds (specifically Facebook, Twitter, and Google+).

This was my first attempt at a Chrome extension, and it’s surprisingly straightforward. All such extensions require a manifest.json, which looks like this for nonpartisan.me:

{
    "name"             : "nonpartisan.me",
    "version"          : "0.2.1",
    "manifest_version" : 2,
    "description"      : "Removes partisanship from your news feeds",
    "icons"            : { "16": "icon16.png",
                           "48": "icon48.png",
                          "128": "icon128.png" },
    "homepage_url"     : "http://nonpartisan.me",
    "page_action"      : {"default_icon" : "icon48.png",
                          "default_title": "nonpartisan'ed" },
    "permissions"      : ["tabs",
                          "http://www.facebook.com/",
                          "http://www.twitter.com/",
                          "http://plus.google.com/"],
    "options_page"     : "options.html",
    "content_scripts"  : [
    {
        "matches": ["*://*.facebook.com/*"],
        "js"     : ["jquery.js","common.js","fb.js","nonpartisan.js"],
        "run_at" : "document_end"
    },
    {
        "matches": ["*://twitter.com/*"],
        "js"     : ["jquery.js","common.js","tw.js","nonpartisan.js"],
        "run_at" : "document_end"
    },
    {
        "matches": ["*://plus.google.com/*"],
        "js"     : ["jquery.js","common.js","gp.js","nonpartisan.js"],
        "run_at" : "document_end"
    }],
    "background": {"scripts"   : ["common.js","background.js"],
                   "persistent": false }
}

The real meat here is content_scripts, which lists the javascript we wish to trigger after a page is loaded, greasemonkey-style. A particularly nice feature of content scripts is that they work in an isolated environment separate from any javascript that the page itself may include. Thus we can add jquery to the list of javascript that is run without fear of clashing with a page’s global namespace.

You can think of every element in the "js" array as a separate <script> tag in an HTML page, so the files are loaded in the given order, all into a single namespace. Rather clumsily, I chose to simply put a callback module (which is called plugin here) in the individual fb.js, tw.js, and gp.js files which is then used by the core component, nonpartisan.js, as a simple means of avoiding any hard-coded per-site values in the actual filtering code.
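To make that concrete, here’s a sketch of what one of those per-site modules might look like (the selectors are illustrative placeholders, not the real ones; the actual fb.js lives in the repository):

// Hypothetical fb.js: exposes the site name and a callback that tells
// nonpartisan.js which DOM elements to watch and filter.
var plugin = {
    site: "facebook",
    cb: function (keywords, nonpartisan) {
        nonpartisan("#stream",   // element to watch for new children
                    "#stream",   // parent whose children get filtered
                    keywords);
    }
};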

With this, and the pseudo-regex "matches" field that specifies which pages trigger the content script, we can run arbitrary code on websites we specify. For nonpartisan.me, the filtering code looks like this:

"use strict";
var nonpartisan = function(plugin) {

    function nonpartisan (watch,parent,keywords) {
        function kill (parent,removeList){
            $(parent).each(function () {
                var el = $(this);
                if(el.css('display') !== 'none') {
                    el.find('*').each(function () {
                        var toCheck = $(this).text().toLowerCase();
                        if(toCheck.length > 0 &&
                           (removeList.some(function (value) {
                               return (toCheck.search("\\b"+value.toLowerCase()+"\\b") >=0);
                           }))
                          ) {
                            el.css({'display':'none'});
                            return false;
                        }
                    });
                }
            });
        }

        if($(parent) && $(watch)) {
            var numChildren = $(parent).children().length;
            setInterval(function () {
                var newNumChildren = $(parent).children().length;
                if(numChildren !== newNumChildren) {
                    kill(parent,keywords);
                    numChildren = newNumChildren;
                }
            },
                        500);
            kill(parent,keywords);
        }
    }

    // get parameters from plugin and trigger nonpartisan() here...

}(plugin);

The first chunk–the kill function–works as advertised: given a parent element and a set of keywords, the function iterates over every child element and determines if any of the nested elements within (i.e. el.find('*')) contains any of the keywords. Instead of deleting DOM nodes, which may break the page’s own javascript (I discovered this the hard way), it’s easier to call el.css({'display':'none'}); to simply hide unwanted elements. For efficiency, the inner .each() loop terminates (by returning false) as soon as any nested child matches, potentially saving a small amount of needless searching.

The second chunk starts a timer (if indeed the parent is even found on the current page) that checks if the number of children of the parent element has changed and, if so, re-triggers the filtering process to determine if there are any new children to be hidden. This helps handle AJAX-driven sites, like the “infinite scrolling” facebook newsfeed, which may mutate the DOM at any time. Both of these functions are wrapped up into another easy-to-call function inside of the high-level nonpartisan module.

And that really is all there is to a typical greasemonkey-like Chrome extension, but that’s certainly not the end of what a complete and helpful extension can provide. The trickier bit is persisting configuration options. The downside of sandboxing content scripts is that they exist in a transient execution context, meaning there’s no localStorage to persist program options. The details of the plumbing used to kick-off the process and handle options were omitted from the above snippet, so we’ll dig more into this now to illustrate how to handle persistent options.

Chrome provides a nice solution to the problem of not having localStorage available to content scripts by providing a background script which does have its own localStorage, which it can transmit to a content script via the chrome.extension.onMessage listener. We can then fill in the omitted component of the above snippet with:

chrome.extension.sendMessage({method: "config"}, function (response) {
    if(!response.sites[plugin.site]) return;
    var l = response.filter;
    if(l && l.length>0) {
        plugin.cb(l,nonpartisan);
    }
    // get default values from common.js
    else {
        l = [];
        for(var index in choices) {
            l = l.concat(choices[index]);
        }
        plugin.cb(l,nonpartisan);
    }
});

This sends a message, requesting "config" from the background.js script, which returns, among other things, the list of keywords we wish to filter. This list was saved in localStorage in background.js’s execution context. Recall that plugin is the module that specifies the particular settings for the page being filtered. Thus we pass along the list of words to filter and the nonpartisan() callback function to the plugin module, and it subsequently executes nonpartisan() on the appropriate elements on the DOM. The background.js file used in nonpartisan.me is a bit more involved, but it nonetheless essentially acts as a broker, converting Chrome’s internal message-passing API calls to localStorage requests.
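For completeness, the receiving end in background.js can be as simple as a listener that reads localStorage and ships it back (a minimal sketch; the key names and defaults here are illustrative, and the real background.js does more):

// Minimal broker: content scripts can't touch localStorage, so answer
// their "config" requests from the background page's copy.
chrome.extension.onMessage.addListener(
    function (request, sender, sendResponse) {
        if (request.method === "config") {
            sendResponse({
                sites  : JSON.parse(localStorage["sites"]  || "{}"),
                filter : JSON.parse(localStorage["filter"] || "[]")
            });
        }
    });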

Of course, there’s only so much utility to be gained from localStorage without supplying the user with the ability to configure the various options that may be saved therein. This is done by a typical html page, specified by "options_page". Since there’s not much magic there–it’s just a plain html page with enough javascript to persist the settings–I will omit the gory details, which you can poke through yourself in the repository, if you’re so inclined.

So that’s an extension. Writing the above was literally a matter of minutes and some quality time with the Chrome API specifications. As is always the case (especially when I’m working outside of my area of expertise, say with making the amateurish logo), the real work is doing the little bits of spit-and-polish to handle the various configuration options, throwing together the webpage, creating the icons and promotional images for the Chrome Web Store, etc. But it’s still good to know that the Chrome team has made the extension-building process as simple and well documented as they have.

2012-08-11

Poor Man's LDAP

In addition to being a researcher and backend web developer, I’ve also worn the system administrator hat for a number of years. While the likes of LDAP, Active Directory, NIS, and their ilk can work quite well for managing medium-to-large networks, I’ve more often been tasked with managing small-scale (< 20 machines) heterogeneous Linux networks, where deploying LDAP with full Kerberos authentication would be overkill. Typical requirements I’ve encountered in small lab settings are simple user account and home folder sharing, and (relatively) similar package installations.

With this in mind, I did what probably every sysadmin in the same situation would do: scrape together a simple set of scripts to handle basic file synchronization for me. Specifically, I noticed two prevalent requirements among config files being synced:

  • machines and/or distros have a common header or footer that must be included (e.g., a list of system users in /etc/passwd), and

  • specific machines (e.g., servers) shouldn’t have some files synced with the rest of the machines (e.g., file shares might be different on a server).
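Mechanically, both requirements reduce to simple file assembly. As a generic sketch of the idea (this is not pmldap’s actual code):

# Generic illustration: prepend a shared header to a machine-specific
# fragment, unless this host has opted out of syncing that file.
import os
import socket

def merge(header, fragment, excludes):
    host = socket.gethostname()
    if os.path.basename(fragment) in excludes.get(host, []):
        return None  # e.g., don't sync file shares onto the server
    with open(header) as h, open(fragment) as f:
        return h.read() + f.read()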

Thus, Poor Man's LDAP was born.

While nothing more than a collection of scripts–no different, in all likelihood, than what many other sysadmins have implemented–they will hopefully be of use to those who, like me, are graduate students or otherwise non-full-time sysadmins who don’t have time to do things the “right” way.

I’m dogfooding pmldap on my research lab’s network, where we (currently) have 5 Fedora machines (various versions between 10 and 16) and 5 Debian machines (all on stable). Since my recent patch, pmldap now supports groups, which are useful for running yum commands only on the Fedora machines and apt commands only on the Debian boxes. Files being synchronized include: fstab, group, hosts, hosts.allow, hosts.deny, passwd, shadow, and sudoers.

Also in the repo are a few convenience tools that I’ve found useful:

  • authorize-machine bootstraps a machine by setting up ssh keys

  • setup bootstraps config files from a remote machine so they can be merged with the desired additions

  • cmd runs an arbitrary command on all machines (or a particular group of machines)

  • useradd is a feature-incomplete reimplementation of the native useradd command that works on local passwd, shadow, and group files to add new users that can later be synchronized across the network

Since I hadn’t stumbled across something of this scope to fit the small-scale-network use case, I’m hopeful that pmldap will be of use to anyone in a similar situation.

You’ll find it on github here.