
Introducing Vim

Vim Example

So you want to learn the way of vim? What have you heard? The learning curve is high; you’ll need to type obscure commands just to quit? There is learning ahead, true, but it’s not like that at all. Frankly, you can learn to use vim in 20 minutes with a tutor you already have on your system.

Try it out:

$ vimtutor

In a week you’ll surpass your speed in your old editor. (Disclaimer: any bashing of other editors here does not mean to include emacs. Emacs is also awesome and you should learn it too.) It’s not really difficult at all, but it is foreign.

What makes vim so different? Modes.


Vim is a modal editor. When you think of modals you probably imagine pop-up menus in a web page. That’s not wrong, per se, but it misses the point. For an idea of what I mean by modes, instead think of a car. If your car is in Drive, when you push the gas you’ll find yourself going forward. If your car is in Reverse the opposite will happen. If you’re in Park, you’ll get a cool revving sound that will make you look cool to 12 year olds. These are modes. You perform an action and based on the mode of your system (in this case a car), different things happen.

User Experience professionals and Human-Computer Interaction experts have been bashing mode-based interfaces for years. They love to explain how modes are the cause of confusion and errors, and unnecessary complexity. Perhaps they’re right, or perhaps people find modal systems difficult because they’re unfamiliar. Whatever the case, you’re about to take a journey that will give you a blazing, clear example of a modal system working well.

When you first open vim, you’ll find yourself in Normal Mode. Drew Neil describes this in his fabulous book, Practical Vim, as being akin to an artist with his paint brush hovering over a canvas. When you are painting, you don’t spend the majority of your time with your brush making strokes. A good painter plans his strokes, choosing carefully where to put down his brush before making a careful and tactical mark.

Putting the brush to the page is akin to Insert Mode in vim. You get what you type, for the most part. You can enter Insert mode from Normal mode by typing the i key.

There are other modes in vim too, like Visual Mode, Visual-Block Mode, Select Mode, Command-Line Mode, and Ex Mode. For now let’s just focus on Normal and Insert, though.


Normal mode allows you to move quickly across your document, skipping around by paragraphs, code blocks, jumping to function definitions, visually to the middle or bottom of the screen, and pretty much anywhere else you want to go in a matter of a few keystrokes. No, seriously, I’m not exaggerating here. Here’s a great diagram covering the movement keys available to you in vim without any special plugins or configuration.

Vim Movement Commands

Most commands in vim, whether they be actions or movement, have a mnemonic device to help you remember. A number of these are listed in the graphic above.


So what about when you want to do something? Delete something, copy to the clipboard, replace a string? These are the verbs of vim and their mnemonics are strong as well.

  • d = delete
  • c = change
  • y = yank (like copy, but from a time before copy/paste existed)
  • p = put (like paste)
  • v = visual (select)

There are a few more, but in day-to-day usage, these will get 99% of your work done. Not a whole lot to memorize, is it?

If you want to delete something, press d. But wait… what are you deleting? The power of vim starts to become apparent when you realize that the d command on its own does nothing at all. In order to complete the verb, we need to tell it where to apply it.

Think of this combo as “Verb -> Movement”, and it is the introduction to composition.


All of those fancy movement keys you learned earlier aren’t just for jumping around Normal mode. They’re also part of every action you’ll take. I think this is best illustrated with an example.

If we want to delete a word we add the delete command with the movement command for a single word: d and w -> dw. If you want to yank everything to the bottom of the current paragraph, following the same pattern you would get: y}.

Vim Composition

In vim you can see the keystrokes being entered in the lower right corner of the screen. This is related to something called Operator Pending Mode, but for our purposes it means you can get a sense of the commands I’m entering to accomplish the text manipulations seen here.


“Verb -> Movement” is a powerful thing, and it can be made more so by understanding the benefits of this encapsulation. If a full command is considered one of these complete compositions, then we can operate on the whole piece as well.

You can enter a number prior to your verb and the entire command will be repeated that number of times. Also, you can repeat the last command, whatever that composition might have been, by pressing the . key. Finally, your undo levels are defined by your commands as well. Each time you press u it will undo the last full composition.
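To make the grammar concrete, here is a small Python sketch that splits a Normal mode command into an optional count, a verb, and a movement. It is only an illustration of the idea, not how vim parses keys internally. The `parse_command` name and the short verb table are assumptions made for this example, and it only handles simple commands like the ones shown above.

```python
import re

# Three of the verbs listed earlier; the movement is whatever follows the verb.
VERBS = {"d": "delete", "c": "change", "y": "yank"}

# An optional count, one verb, then a movement such as w, } or $.
COMMAND = re.compile(r"^(?P<count>[1-9][0-9]*)?(?P<verb>[dcy])(?P<motion>.+)$")


def parse_command(keys):
    """Split keys like '3dw' into (count, verb name, movement)."""
    match = COMMAND.match(keys)
    if match is None:
        raise ValueError("not a simple verb + movement command: " + keys)
    count = int(match.group("count") or 1)
    return count, VERBS[match.group("verb")], match.group("motion")


print(parse_command("dw"))   # (1, 'delete', 'w')
print(parse_command("y}"))   # (1, 'yank', '}')
print(parse_command("3dw"))  # (3, 'delete', 'w')
```

Because the whole command is one unit, the count, the . repeat, and u all have something well defined to act on.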


There’s a lot to vim. It is a tool you can keep learning for years and years and never fully master. It is also easily accessible to new learners. It has a fantastic support group (the #vim channel, for starters), tons of tutorials, guides, books, etc. The basics are quick to pick up. It is installed on almost every system in the world, and easily added on the few (Windows) that it’s missing from. It is an investment that will pay for itself immediately, and continue to benefit you for as long as you write.

Now, if you haven’t already, go launch vimtutor and start vimming.

Have questions about vim or want more guidance? Disqus below!

The gifs included in this post were generated with imagemagick, ttyrec, and sugyan’s version of ttygif. Here’s the script I put together to make things easier.


# This will record a vim session with no .vimrc and create vim.gif.
# The scripts are designed to run on default sized term window.
# The cropping is sized for iTerm, so you may need to tweak it.
# Dependencies
# brew (or apt-get) install ttyrec imagemagick
# ttygif -

# Record the vim session with no .vimrc
ttyrec -e "vim -u NONE"

# Convert session to gif

# Crop gif and get rid of the ugly header
convert tty.gif -coalesce -repage 0x0 -crop 570x354+0+22 +repage vim.gif

# cleanup
rm tty.gif
rm ttyrecord

For Parents

My son was born in the summer of 2012, capping off one of my most intense years. He was a month early, arriving the very night after being blessed in the womb by over one hundred Jesuit priests. Despite not having the crib assembled, we weren’t completely caught off guard. For one thing, my camera was charged and ready to go.

Photos of the kid completely took over my Flickr account. Where once you might find artistic shots of shadows across empty parks, now there was a chubby faced little boy. It’s one of the many wonders of being a new parent. You take thousands of pictures that most folks will only glance at out of politeness, but still there is an enormous pride and joy in seeing them for yourself.

The number one audience for my son’s pictures is our parents. Grandparents love pictures of their grandkids. It’s their God-given right to see them as much as possible and to proudly tout them to their friends. This is fairly simple if one’s parents are local, or if you send prints, but what happens when your family isn’t all that savvy at browsing Flickr?

I’m a developer and a developer solves problems. Here was a very simple issue:

My parents need a very simple interface to see pictures of my son. They only care about pictures of him, not my entire collection. They may look at these pictures on their phones or on the computer. The site should update automatically and involve as little work as possible by me to maintain it.

That’s not a hard solve when you get right down to it. What I decided to do was pull in my photos of my son from Flickr using their API and display the results in a single column infinite-scrolling website. For videos, I host my content on Vimeo, which I feel has a much better quality than YouTube and sees far fewer comment trolls. Here is the final site.

Toggling between pictures and videos is done with the icon in the top left corner. Each page follows the same rules: responsive display of content, scrolling to see more. Finally, in the header of the home page I added a tiny bit of JavaScript to display my boy’s current age in either weeks, months, or years (appropriately for his total age).
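The age label is simple date arithmetic. Here is a rough Python sketch of the idea, not the JavaScript actually running on the site. The cut-offs (weeks up to twelve weeks, months up to two years) and the birth date are placeholders chosen for the example.

```python
from datetime import date


def describe_age(born, today):
    """Return an age label in weeks, months, or years."""
    days = (today - born).days
    if days < 12 * 7:  # assumed cut-off: under twelve weeks, count weeks
        return str(days // 7) + " weeks"
    months = (today.year - born.year) * 12 + (today.month - born.month)
    if today.day < born.day:
        months -= 1  # the current month is not complete yet
    if months < 24:  # assumed cut-off: under two years, count months
        return str(months) + " months"
    return str(months // 12) + " years"


BORN = date(2012, 7, 1)  # hypothetical birth date, for illustration only

print(describe_age(BORN, date(2012, 8, 5)))   # 5 weeks
print(describe_age(BORN, date(2013, 7, 1)))   # 12 months
print(describe_age(BORN, date(2015, 1, 15)))  # 2 years
```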

There were some challenges, especially with Flickr. First of all, when querying a set or album in Flickr’s API, the result is nice enough to give you the id of the image, but not the URL for the actual image source. Therefore, you have to waste two API calls before you can get the image. Knowing which size you’re getting is a little tricky too. I decided to handle this in Python, which has a really friendly Flickr API library and is frankly a joy to write in.

import flickrapi

# config is the site's own settings module (API key, secret, and set id).
flickr = flickrapi.FlickrAPI(config.get_api_key(), config.get_api_secret())
set = flickr.walk_set(config.get_set_id(), 500)

Getting a set is a pretty simple affair. It’s backwards, of course, because nothing is easy.

rev_set = list(set)[::-1]

But Python makes working with lists easy.

In my Python script, I decided I’d go ahead and cache the direct URLs for the images that are requested into a local sqlite3 database. Instead of wasting a ton of API calls for the same thing again and again, I can do a quick lookup in the table to see if I have the info already, then just request the new bits. Doing this server-side means only the first person to access the list will make the request and store it. It was surprisingly easy to set up in Python as well.

import sqlite3 as lite

con = lite.connect('web.db')
cur = con.cursor()
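For a sense of how a lookup-then-fetch cache like this can work, here is a minimal sketch. The table layout, the function names, and the `fake_fetch` stand-in for the Flickr call are all assumptions made for the example. It uses an in-memory database so it runs anywhere.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open the cache database and make sure the lookup table exists."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS photo_urls ("
        "photo_id TEXT PRIMARY KEY, source TEXT NOT NULL)"
    )
    return con


def get_source(con, photo_id, fetch_source):
    """Return the cached URL, calling fetch_source only on a cache miss."""
    row = con.execute(
        "SELECT source FROM photo_urls WHERE photo_id = ?", (photo_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    source = fetch_source(photo_id)
    con.execute(
        "INSERT INTO photo_urls (photo_id, source) VALUES (?, ?)",
        (photo_id, source),
    )
    con.commit()
    return source


calls = []


def fake_fetch(photo_id):
    """Stand-in for the real API request; records how often it is called."""
    calls.append(photo_id)
    return "https://example.invalid/" + photo_id + ".jpg"


con = open_cache()
first = get_source(con, "123", fake_fetch)
second = get_source(con, "123", fake_fetch)
print(first == second, len(calls))  # True 1
```

The second lookup never touches `fake_fetch`, which is the whole point: only the first visitor pays for the API call.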

Getting Python to run on the server is a little pain in the butt, too, but I won’t get into that here. Suffice to say, once I can properly format and respond back with a JSON feed of the images I want to display, jQuery can take that and put it to work.

function getPage ( page ) {
    $.getJSON ('' + page, function (data) {
        var photos =;
        for (var i=0; i < photos.length; ++i ) {
            var photo = photos[i];
            var img = $('<img></img>');
            var ahref= $('<a></a>');
            img.attr('src', photo.source );
            img.attr('alt', photo.photo_title);
            ahref.attr('href', '' + photo.photo_owner + '/' + photo.photo_id + '/');
            ahref.attr('data-label', photo.photo_title);
        if (!isScrollListenerActive) {
            setTimeout( addScrollListener, 2000 );

I’ve wrapped this method up with a faux pagination implementation. Adding a really simple scroll listener allows me to detect if we’ve scrolled to the bottom of the display. If so, bam, next page loads. Infinite scrolling in 6 or 7 lines of code.
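On the server side, faux pagination can be as small as slicing the cached list and saying whether anything is left. This is a sketch under assumptions: the `get_page` name, the page size, and the JSON field names are made up for the example and are not the site's real feed format.

```python
import json


def get_page(photos, page, per_page=10):
    """Return one page of photos as a JSON string, counting pages from 0."""
    start = page * per_page
    chunk = photos[start:start + per_page]
    return json.dumps({
        "page": page,
        "has_more": start + per_page < len(photos),
        "photos": chunk,
    })


# 25 fake photo records standing in for the cached Flickr data.
photos = [{"photo_id": str(i), "photo_title": "Photo " + str(i)} for i in range(25)]

last = json.loads(get_page(photos, 2))
print(len(last["photos"]), last["has_more"])  # 5 False
```

The scroll listener only needs to remember the current page number and stop asking once `has_more` comes back false.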

That’s really all there is to it. The photo site loads quickly and works across devices. I bookmarked it on the home screen of my mom’s cell phone and now she can see the latest pictures any time she wants.

The video portion of the page is even easier. Vimeo provides an RSS feed of albums, so getting that working in JavaScript was just a matter of converting XML to JSON.

class XMLtoJSON {
    // Declared static so it can be called as XMLtoJSON::Parse() below.
    public static function Parse ($url) {

        $fileContents = file_get_contents($url);
        $fileContents = str_replace(array("\n", "\r", "\t"), '', $fileContents);
        $fileContents = trim(str_replace('"', "'", $fileContents));
        $simpleXML = simplexml_load_string($fileContents);
        $json = json_encode($simpleXML);
        return $json;
    }
}

header('Content-Type: application/json');
print XMLtoJSON::Parse("");

I found the class portion of this code with a quick google search on converting XML to JSON. Toss in a header and output the results, done.
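For comparison, the same XML to JSON conversion can be sketched in Python with only the standard library. This is not the code the site uses. The function names and the sample feed are assumptions, and the sketch ignores XML attributes to stay short.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Turn an element into a string (leaf) or a dict of its children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags, like several <item> entries, collect into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Convert an XML document string into a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})


SAMPLE = (
    "<rss><channel><title>Videos</title>"
    "<item><title>One</title></item>"
    "<item><title>Two</title></item>"
    "</channel></rss>"
)

feed = json.loads(xml_to_json(SAMPLE))
print(feed["rss"]["channel"]["item"][1]["title"])  # Two
```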

Vimeo does have one annoying quirk. When you make the request to embed the player on your page, you need to tell it the width of the video or it can be a little ridiculous.

var width = (window.innerWidth > 0) ? window.innerWidth : screen.width;
width = Math.min(640, width);
var url = '//' +encodeURIComponent(videoUrl) + '&callback=embedVideo&width=' + width;

Frankly there’s nothing innovative here. I used Python for half the site and PHP for the other. There’s no rhyme or reason to it, and my code isn’t particularly elegant. Still, it was a very effective solution to my little problem. That counts for something.

Vim in Context

I’ve mentioned before that I firmly love vim and use it for just about everything. Well, that’s probably underselling it.

I spent most of this Saturday taking my copious notes and materials that I use to write my book and moving them into a Markdown work-flow, powered by vim. At the heart of this move is an absolutely wonderful series of plugins written by Reed Esau. The most famous of these is vim-pencil (links to the others are at the bottom of his README) which has too many awesome writing enhancements to count here. With his deft hand at writing in Vimscript, he’s turned my code editor into my everything editor.

I’ve been using vim to author this blog and my other blog for some time, but writing prose in vim for anything longer than these posts had a bit of overhead to it. Vim-pencil and its sister plugins really eased things out. Still, taking on the writing of my book would bring the challenge to another level.

I had two immediate problems I needed to solve to make the book authoring via vim a reality. They were:

  1. A custom spelling dictionary for the vast number of made up words, places, and people.
  2. A tool to handle abbreviations for long, cumbersome names—some names having special characters as well.

These weren’t insurmountable, but they did take some effort to solve that I thought was worthy of a post. Both problems necessitated the same core ability: to define part of my vim configuration on a per-project basis.

Take the spelling dictionary for example. I have thousands of words that I’ve created from the gibberish of my mind and weaseled into my novel. I don’t want these words to show up in my blog posts, code, emails, or anything else without vim highlighting the misspellings and making suggestions. I don’t want to muddy up my whole environment with that spelling dictionary.


On the other side of that argument, by default the spelling dictionaries in vim are stored in your ~/.vim/ directory on the system you’re working on. This means that if I use the full ubiquity of vim—AKA use it on multiple systems—then my dictionary won’t be everywhere. Not a good solve.

Luckily others have paved the way forward for setting up a .vimrc solution on a per-project basis. I tried a few plugins and they all seem to work to some degree or another, but the most robust and solid plugin I found was Localvimrc.

Localvimrc allows me to create a .lvimrc file anywhere on my file system. If I edit a file, the plugin searches up the directory tree for any of those .lvimrc files and executes them in order (highest to lowest by default). There are a number of configuration options, but my setup only required these:

" Local vimrc loading
let g:localvimrc_sandbox=0
let g:localvimrc_ask=0

No more prompting or security sandboxing for me! I’m happy to see those options enabled by default, but they’re not necessary in my setup.
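The upward search is easy to picture in code. The sketch below is a Python illustration of the idea, not the plugin's actual Vimscript. The `find_local_vimrcs` name is made up, and it reads "highest to lowest" as outermost directory first, which is an assumption.

```python
import tempfile
from pathlib import Path


def find_local_vimrcs(file_path, name=".lvimrc"):
    """List every `name` file from the edited file's directory up to the
    filesystem root, outermost directory first."""
    start = Path(file_path).resolve().parent
    found = [d / name for d in (start, *start.parents) if (d / name).is_file()]
    return found[::-1]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    chapter = root / "book" / "ch1"
    chapter.mkdir(parents=True)
    (root / ".lvimrc").write_text('" project-wide settings\n')
    (root / "book" / ".lvimrc").write_text('" book-only settings\n')
    notes = chapter / "notes.md"
    notes.write_text("draft\n")

    found = find_local_vimrcs(notes)
    # Keep only the two files created here, relative to the temp root.
    relative = [p.relative_to(root).as_posix() for p in found[-2:]]

print(relative)  # ['.lvimrc', 'book/.lvimrc']
```

Running the outer file first means the file closest to what you are editing gets the last word on any setting.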

So, now I have a .lvimrc file that will execute after my .vimrc setup is complete. This is good! It’s time to solve the problems.

Solution #1 - local project dictionary

In this code block, I read in the directory location of the current file (my .lvimrc) and I use that path and some vim options to create a new spelling dictionary in the same folder. I give it a dot-name so it’s hidden from view, and all is well.

" Spellfile to use project based file
let s:safespelllang = join(split(&spelllang, "_"), "-")
let s:safeencoding = join(split(&encoding, "_"), "-")
let s:spellfileurl = expand('<sfile>:p:h') . '/.proj.' . s:safespelllang . '.' . s:safeencoding . '.add'
let &spellfile = s:spellfileurl

Solution #2 - local project abbreviations

The native vim abbreviation tools are robust, but a bit wordy. Tim Pope’s Vim-Abolish plugin comes to the rescue here. It’s really a collection of three separate features, but one of them does exactly what I want. Here’s my setup:

"Abolish Setup in this project

" We'll use this location to save new abbreviations as
" they are added.
let g:abolish_save_file = expand('<sfile>:p')

" quit if no Abolish
if !exists(":Abolish")
  finish
endif

" Setup abolish abbreviations below

In a similar solution to the dictionary, I’m using the location of the .lvimrc file in the file itself. In this case, I’m letting the vim-abolish plugin know that any new abbreviations I add on the fly should be saved to the end of my .lvimrc by default. Handy! Now I can set up any that I want to use and not even bother to open the file.

The best part of all of this is that my .lvimrc file and my project dictionary files are part of my git repository now. When I clone the project down on another machine, as long as my vim setup is the same, so is my writing environment.


Wearable Devices

With wearable devices on the rise, small screen and no-screen user experience is a hot topic. We’ve been witnessing powerhouses in industrial and technical design take on the problem over the last two years, but the market’s response continues to be lackluster. What is it about this format that is holding us back?

On the one hand, some argue that the limited screen size means we can only show very little information at a time. It’s tantamount to wearing hundreds of dollars of technology for what amounts to a status-bar. Some tech enthusiasts make the plunge readily, but the price and limited offerings are clearly a barrier to most. Most recently Apple made a wild attempt to drive new use cases into their release by spelling out oddball new interactions. Their idea, it seems, is to overwhelm their audience with these nonsensical uses until they overlook the fact that the device is no more powerful than a fitbit glued to a swatch.

I empathize with that stance. Small screens do have innate problems finding useful interactions. Still, the cell phone had a similar issue as well before industry rose to meet the challenge. How long ago was it when playing a game with a friend required being face-to-face, or at least on a desktop? The immediacy of our interactions blended to meet the challenge of the interface quickly as ingenuity spread virally across the medium. That’s why I believe the real challenge to wearables is not in the size of their screen, but rather in the size of their input.

RSVP, or rapid serial visual presentation, gives users an alternate way of taking in text that is at once futuristic and suddenly familiar. By flashing one word quickly in place of another, our eyes pick up the flow of a sentence, paragraph, and so on automatically but without wasting all the time tracking from one place on a screen to another. RSVP has been around for years, but gained internet fame when Spritz introduced their demo a few months ago. Already their technology has found its way into smart watches, handheld applications and web plugins. Familiarity and ease-of-use combine quickly before our eyes as one of the key challenges of the small screen is tackled.

What then of input, though? Where are our tiny keyboards? Microsoft is trying pretty much anything they can think of at this point. We’ve had incredibly smart folks working on how to turn handwriting into text since the 80s (and maybe earlier). Will this finally be the push that makes it happen? Perhaps we’ll see a new type of keyboard emerge that meshes the all-popular Swipe with our small interfaces. One thing is abundantly clear. QWERTY is not getting it done.

We should not expect to see a keyboard interface as we know it on our watches or goggles. The challenge on mobile phones was enough to take us through the world of T9 and other predictive text implementations and out again into the world of quick thumbs. I really can’t say where we’re going to end up with any certainty, but there’s one part of the puzzle I find incredibly frustrating. Our best and brightest seem to feel that introducing a new interface or even a completely new input style is not a big deal for end users, and yet they ignore altogether one alternate input method that we’ve been using successfully for over a hundred years. Morse Code.

Dots and dashes. Tap and longer tap. The method is there, easy to use, and blindingly fast with only meager practice. So maybe it’s a bit more intimidating than writing out each letter with your fingernail, but remember we don’t need to listen to it anymore, just input. The small screen and RSVP style feedback can show us the results in plain text. We just need to be able to put our thoughts down quickly, and without distractions. You don’t even need to look at your screen to use Morse code. Does it need work? Likely. I wouldn’t want to put the same ham radio style shortcuts and rigmarole into a consumer level interface. How about our smart friends at Samsung, Sony, Apple and so on task a few of their brilliant minds with making Morse for the Masses?
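To show how little machinery tap input needs, here is a minimal Python sketch. The 200 ms threshold for a long tap, the function names, and the convention of spaces between letters and " / " between words are assumptions for the example. A real device would also need to infer those gaps from timing.

```python
# International Morse code for the letters A to Z.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}


def taps_to_symbols(durations_ms, long_tap_ms=200):
    """Turn tap durations into dots and dashes (the threshold is assumed)."""
    return "".join("-" if d >= long_tap_ms else "." for d in durations_ms)


def decode(message):
    """Decode letters separated by spaces and words separated by ' / '."""
    words = message.strip().split(" / ")
    return " ".join(
        "".join(MORSE[symbol] for symbol in word.split()) for word in words
    )


print(MORSE[taps_to_symbols([80, 320, 90])])        # R
print(decode("-- --- .-. ... . / -.-. --- -.. ."))  # MORSE CODE
```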


If you want to start a war between coders, just ask them to start describing their tools. Text editors, programming languages, and even platforms form entrenched camps of dispute. Even so, like the great religions of the world, we cannot help but proselytize about the virtues of our one true way. This post is no different.

I’ve been coding a long time and I’ve had all that time to become embittered and crotchety about my development environments. I won’t lie, when I see someone coding in Textmate, I judge. Sure, there’s an element of jest, but I do it all the same.

Not all tools are created equally. This is my set up. It is the one true way. If you’re doing something else, may locusts descend upon your backups, and may your keyboards be sticky.

On a serious note, though: Using a good set of tools, whether hardware or software, can make an enormous difference in a developer’s speed, accuracy, and even happiness. I’ve come to these through a rather convoluted past, and they work really well for me. I will happily promote them to others, but I think it’s more important that you have tools than that you have my tools.

If you’re reading this and you’re a coder and you haven’t spent time really customizing your environment, setting things in just the right way and just the right place, and using just the right brushes, you need to stop what you’re doing and get to it. I don’t care how brilliant your mind is, if you’re writing in Notepad, or TextWrangler, then you’re not at your best.

The list below is a snapshot of what I’m doing now. I’ll give the reasons for each and try and highlight some of the benefits. I hope some of you will find this interesting and useful. Perhaps you’ll even find something here to add to your own collection. If you have questions or comments, let’s Disqus.

Dvorak Simplified Keyboard

The Dvorak Simplified Keyboard is the first fundamental difference in my computer interface. For those unfamiliar, the Dvorak key-map rose in response to the outdated keyboard layout at the time, QWERTY. The history in short is this: QWERTY was designed in the age of typewriters, when speed led to jams. The letters used most often were spaced apart to avoid these mechanical problems. Unfortunately, while we’ve outgrown these problems, their patchwork solution has remained. QWERTY became the de facto standard, and its proponents have promulgated it through the ages.

Dvorak was created a bit more scientifically, with a focus on speed and conservation of movement. The most commonly used keys (in English) were placed on the home row. The left hand home row contains all the vowels, for instance. Again, unfortunately, time was not kind to Dvorak. Like Betamax, it has lost out. Still, there are users, and quite a number of them. It is available on all major operating systems, and for those that know it, we cannot live without it.

My typing speed is much greater than it was in QWERTY, but the real reason for my adoption was health, not eking out those last few WPM. I had begun to develop repetitive stress injuries in my wrists from prolonged typing. Dvorak has all but cured that issue.

Now I won’t lie. Switching from QWERTY to Dvorak was not easy. It took me at least a month to make the switch, during which time I was not working and my constant typing was not necessary. I also lost my ability to type in QWERTY as I developed the new layout. I know some folks have managed to hold on to both, but not me. I can type my name, common passwords, and that’s about it. Anything else requires me to hunt and peck.


At the heart of my operating systems is the UNIX philosophy, a tiny piece of wisdom:

Even though the UNIX system introduces a number of innovative programs and techniques, no single program or idea makes it work well. Instead, what makes it effective is the approach to programming, a philosophy of using the computer. Although that philosophy can’t be written down in a single sentence, at its heart is the idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools.

In short, “do one thing, and do it well.”

With a UNIX based operating system, you gain the power of composition. You no longer just do a task, but have the power to chain them together, feeding the output of one thing into the input of another. Before you know it, little commands like sed, awk, grep, and so on become instruments of magic.

find "${SRC}" -type f -exec grep -H 'TODO:' {} \; 2> /dev/null | grep -v -e -e -e pre-commit | awk '{for (i=1; i<=NF-1; i++) $i = $(i+1); NF-=1; print}' | sed -e "s/.*TODO:[ ${TAB}]*//" | sed -e "s/^/- /" >> $TODO 2> /dev/null

Take the line above as an example. It’s one line of a script I’m using in a pre-commit hook for my latest front-end web boilerplate. Whenever I commit code to my repo, this spiders my source directory and finds any TODO comments I’ve littered throughout the code. It parses them and returns a markdown formatted list of them, which the script then outputs to a README file inside the repository. With some basic use of built-in utilities I can compose a sophisticated script that keeps an up-to-date TODO list for my active projects. How neat is that?
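If the pipeline is hard to read, here is roughly the same job sketched in Python. It is a sketch, not a translation of the hook: the `collect_todos` name is made up, it skips the file exclusions the grep -v step performs, and it prints the list instead of appending to a README.

```python
import re
import tempfile
from pathlib import Path

# Everything after "TODO:" and any spaces or tabs that follow it.
TODO_PATTERN = re.compile(r"TODO:[ \t]*(.*)")


def collect_todos(src_dir):
    """Return a markdown list item for every TODO: comment under src_dir."""
    items = []
    for path in sorted(Path(src_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for line in text.splitlines():
            match = TODO_PATTERN.search(line)
            if match:
                items.append("- " + match.group(1).strip())
    return items


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "sub").mkdir()
    (src / "a.py").write_text("# TODO: fix the header\nx = 1\n", encoding="utf-8")
    (src / "sub" / "b.sh").write_text("echo hi  # TODO:   add tests\n", encoding="utf-8")
    todo_lines = collect_todos(src)

print("\n".join(todo_lines))
```

The shell version gets there by chaining five small tools, which is exactly the composition the UNIX philosophy is talking about.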

Windows is getting better at this sort of thing through projects like cygwin. Apple means nothing to me, but their decision to buy out NeXT and use it to create OSX was fantastic. That was the game changer that saved the operating system, and it’s the only reason I use their products now. Build a consumer friendly UI on top of the UNIX philosophy and you combine ease of use with true power.


Running OSX or Debian or Ubuntu or whatever is great, but there’s a lot of customization that can be done to make things more personal. The first step of that is your dotfiles.

These are my dotfiles.

In them I define my environments, common aliases, and even some helper functions, along with my Git settings and shortcuts and my vim customization (coming soon). I’m very proud of my dotfiles, from the organization and installation to my prompt.


The other half of my working OS is my collection of binfiles that I carry with me from machine to machine.

This is my bin repo. They tie in to my dotfiles quite closely. Some of these are handy things I use all the time, and others are extremely specific tasks I do for work that should never, ever be run unless you know what you’re doing. It’s like a fun minefield. Enjoy!


I work predominantly at the command line. I build my projects there, use source control, and—as you’ll see in a moment—do my development there. Sometimes it’s necessary to do more than one thing at a time. I could make a new tab, but there are better options. The best option I’ve found for session management is tmux. It’s the inheritor of the old screen program and it enables you to create sessions, windows and panes, jump around, re-size, and dance across your system with ease.

Right now I am in tmux writing this post. I am in the second window, first pane, of the session called “personal”. The pane to my right is running make devserver, a script that both runs a development webserver and watches the file system for changes to this blog, re-compiling it as they happen. It is a part of Pelican, my blog platform, which I’ve written about in the past.

I have context to my activities, whether it be work or play. This is thanks to tmux. Like everything else, tmux customization is key.

Here is my tmux configuration. It’s a part of my dotfiles repo.


Finally we come to the most important part of my tool box, vim. If arguing developer tools can start a war, arguing with a vim (or emacs) user must signal the end of days.

If you don’t know what vim is, shame on you. Also, go read this explanation in six kilobytes. I couldn’t possibly do a better job than that.

Suffice it to say, vim is what makes my system work. The reason I can develop entirely in the console is because I have a fully featured IDE right there at the command line. I have more power at my fingertips without a mouse than pretty much anyone I’ve encountered in my career. Sublime Text 3 is a great editor. PHPStorm is a great editor. And yet they’re worthless next to vim (or emacs. Seriously… not gonna fight you guys).

I code in vim. I author in vim. I take notes in vim. I’ve done presentations in vim. I rebind my keys in first person shooters based on the HJKL navigation in vim. I play vimgolf. I’ve gotten on the high score board for it too.

The best thing I can say for vim is that it makes my desires transparent. I want to move this block of code to another area, done. I want to mark this particular word so I can jump back to it later, even from another file… done. I want to reverse every line of the file (why? no idea): That’s as easy as typing :g/^/m0.

Vim isn’t easy. It’s a power tool. If you haven’t bothered to learn a real editor yet, or if you’re just starting out your career, then do yourself a favor and master vim. I’m serious, it will change your life.

You can do pretty much anything in vim out of the box, but if you want to simplify some things or don’t want to code it yourself, there’s probably a great plugin that someone has made already to help you. I’d recommend hitting up VimAwesome to see what’s popular. I’ll call out a few of my favorites below as well.


Regular Expressions

This post is a HOWTO guide I wrote for my development team. I thought it would have some better shareability here.

Regular expressions, or regex, are a symbolic language that can define or identify a sequence of characters. This language can then be used to test, match, or replace a given body of text.

  • By test we mean it can evaluate if text is equivalent to the regex we defined. This is typically used for validation of things like email addresses, zip codes, phone numbers, and so on.

  • By match we mean it will evaluate parts of a body of text and return back the portion that matches our regex. We use this to parse text, grabbing the bits we want and discarding the rest.

  • By replace we refer to a combination of match and substitution. We match something, then replace the matched portion with new content, updating the original string.
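Here are the three operations side by side in Python's `re` module. Python writes a regex as a string rather than between slashes. The phone number pattern and sample text are made up for illustration.

```python
import re

text = "Call 555-0100 or 555-0199 before noon."
phone = re.compile(r"\d{3}-\d{4}")  # three digits, a dash, four digits

# Test: does the whole string fit the pattern?
is_phone = phone.fullmatch("555-0100") is not None

# Match: pull out every part of the text that fits.
found = phone.findall(text)

# Replace: swap each matched part for new content.
masked = phone.sub("XXX-XXXX", text)

print(is_phone)  # True
print(found)     # ['555-0100', '555-0199']
print(masked)    # Call XXX-XXXX or XXX-XXXX before noon.
```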

Getting Started

In the examples below I will show you sample regular expressions in three-line groupings. The top line represents the string we are attempting to test/match/replace. Our regular expressions will appear below it between the /.../ characters. Finally, the output of the operation will appear on lines starting with >. For example:

"Sample String"
/regex/
> Output

Note: The > character is just there to show you the result. It isn’t part of the result itself.

You can go to this page to try out any of these regular expression examples or make up your own. Simply copy the string we are trying to match to the big section on the bottom.

Note: Don’t copy the string’s surrounding quotes. It may break later examples.

Then, copy your regex, or retype it onto the top line.

Note: When copying the regex, the surrounding slashes will disappear in the testing tool.

Basic Structure

Regular expressions vary by program, platform, and language, but not by very much. In the examples I’m going to teach you, you will see almost no difference if you are using these in Unix’s sed command or using them in a Windows copy of Excel.

A typical regex looks something like this:

/#?[0-9A-Fa-f]{6}/

Most of you are probably looking at that and seeing:


That’s where most people stop when it comes to regular expressions. They see the gibberish and say, “That’s way over my head.” I’m here to tell you that despite it looking complex, regex is actually extremely simple.

Regex, like most symbolic languages, treats each character as if it were a whole word. When you learn the words (and there are not many), the giant string of gibberish becomes an elegantly simple sentence. In the example above, the sentence would read in English:

Look for an optional pound sign followed by exactly six characters that can be lowercase or capital A through F, or a digit 0 through 9.

This is a regular expression that matches a 24-bit hexadecimal color value (think RGB). As you can see, writing the rules in regex was much simpler than doing so in English.

Now, let’s take our example and break it apart into its components to see what each one does.

  • /.../ - The outer forward slashes denote the start and end of a regular expression. This is the format you’ll see in JavaScript, ActionScript, sed, vi, Sublime Text, and many more. VBScript in Excel often uses "..." instead.
  • #? - In this case, the # sign means a literal pound sign. The question mark after it means that it is optional. If it’s not there, that’s ok too.
  • [...] - Everything between square brackets is summed up as a single character. This enables us to say, “the next character will be…” then lay out all the rules for it inside the brackets.
  • 0-9 - A number between 0 and 9, inclusive
  • A-F - A character between capital A and capital F
  • a-f - Lowercase is ok too
  • {6} - Whatever the last character rule was, it applies to exactly 6 characters

That is a fairly complex regular expression. If you didn’t follow along for everything, that’s ok. We’ll go over each rule in sequence in a moment. For now just try to understand that regex isn’t made up of sorcery. Each character has a rule, and if you learn them, that’s all there is to it.
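
To see the pieces working together, here is a small JavaScript sketch that assembles them into one pattern. The sample strings are my own assumptions.

```javascript
// The pieces listed above, assembled into a single pattern.
const hexColor = /#?[0-9A-Fa-f]{6}/;

// Hypothetical sample strings.
const withHash = "color: #1A2b3C;".match(hexColor);
const withoutHash = "FFCC00".match(hexColor);
const tooShort = "#FFF".match(hexColor);

console.log(withHash[0]);    // "#1A2b3C"
console.log(withoutHash[0]); // "FFCC00"
console.log(tooShort);       // null, because only three characters follow the #
```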

Matching Literals

The simplest way to match something using Regex is to use a literal string.

For example:

"The quick brown fox jumps over the lazy dog."
> quick

"The quick brown fox jumps over the lazy dog."
> jum

"The quick brown fox jumps over the lazy dog."

When searching a string for a literal, we get back either the literal we just searched for (if found) or nothing (if not found). Normal alphanumeric characters are automatically literals in regex, so you can type them exactly as they are.

You can probably see that using regular expressions with literals isn’t very helpful. It works, but it doesn’t get you anything a normal find wouldn’t get. Still, it’s important to know because we can use literals with any of the other techniques you’re about to learn.
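
In JavaScript, the literal searches described above might look like the sketch below. The word "cat" is an assumed example of a literal that isn’t in the string.

```javascript
const sentence = "The quick brown fox jumps over the lazy dog.";

// A literal that is present comes back as the match.
const quick = sentence.match(/quick/);
// A literal does not have to be a whole word.
const jum = sentence.match(/jum/);
// A literal that is absent gives us null ("cat" is an assumed example).
const cat = sentence.match(/cat/);

console.log(quick[0]); // "quick"
console.log(jum[0]);   // "jum"
console.log(cat);      // null
```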

Optional, Zero or More, One or More

Wildcards are very common in search parameters. There are three types of wildcards in regex:

? - Optional. You’ve seen this before. It means that whatever character preceded it is optional. We will match the string whether it is there or not.

> pam

> am

* - Zero or more. This will match the string if there are 0 of the preceding character or 5000. Any number is ok.

> aaaaah

> h

+ - One or more. Just like the asterisk, but we require at least one character to match.

> aaaaah


See the subtle difference between * and +? Good!
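
If you want to try all three side by side, here is a JavaScript sketch. The patterns and input strings are assumptions I picked so that they produce the outputs shown above.

```javascript
// ? makes the p optional.
const optionalPresent = "spam".match(/p?am/)[0];
const optionalAbsent = "sam".match(/p?am/)[0];

// * allows zero or more a characters before the h.
const starMany = "aaaaah".match(/a*h/)[0];
const starNone = "h".match(/a*h/)[0];

// + needs at least one a before the h.
const plusMany = "aaaaah".match(/a+h/)[0];
const plusNone = "h".match(/a+h/);

console.log(optionalPresent, optionalAbsent); // "pam" "am"
console.log(starMany, starNone);              // "aaaaah" "h"
console.log(plusMany, plusNone);              // "aaaaah" null
```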

Any character

Sometimes vagueness is a good thing. What if you want to match any character at all? In that case the . character is your friend. Unlike a literal, the . doesn’t just match a period; it matches any single character (in most flavors, anything except a line break).

> Sup

> Superman

> Supercalifragilisticexpialadocius

In our second example I’ve mixed the . character with the * wildcard we learned in the last section. This regular expression matches zero or more of any character!
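
Here is a short JavaScript sketch of the dot on its own and combined with *. The patterns and input strings are assumptions. The last two lines show how JavaScript treats line breaks, which relies on the s flag from ES2018.

```javascript
// Each . stands for exactly one character.
const threeChars = "Superman".match(/S../)[0];
// .* means zero or more of any character, so it runs to the end of the line.
const wholeWord = "Superman".match(/Sup.*/)[0];
// In JavaScript the dot stops at a line break unless the s flag is set.
const firstLine = "Super\nman".match(/Sup.*/)[0];
const bothLines = "Super\nman".match(/Sup.*/s)[0];

console.log(threeChars); // "Sup"
console.log(wholeWord);  // "Superman"
console.log(firstLine);  // "Super"
console.log(bothLines === "Super\nman"); // true
```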


Sets

Ok, now we’re getting to the core of regex. Remember those square brackets from our example in the beginning? Those were a set, and sets are what make regex amazing. They let us define a bunch of rules that all apply to a single character.

While we are in between brackets, all letters or special characters are interpreted as valid options for the single character that the bracket represents. For instance, if we wanted to match only even digits we could write [02468]. If we wanted to match any lowercase letter we could cheat a bit and use a range like [a-z]. Or maybe a combination of the two like [a-z02468]. Let’s see how they match in some examples:

"Pennsylvania 6-5000"
> 6

"Pennsylvania 6-5000"
> 6

"Pennsylvania 6-5000"
> 6-5000

"Pennsylvania 6-5000"
> Pennsylvania

Notice how in the second example we still only match a single six despite looking for bigger numbers. Regular expressions only find the first occurrence of a match (by default) and the - prevented our regex from extending to the 5000. In the third example we accounted for the - character and were able to find the whole number section. Finally in the last example note how the space after Pennsylvania ended our match.
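
The four results above can be reproduced in JavaScript with sets like the ones below. These exact patterns are my assumption; any sets that cover the same characters would behave the same way.

```javascript
const phone = "Pennsylvania 6-5000";

// A single digit.
const oneDigit = phone.match(/[0-9]/)[0];
// One or more digits: the match stops at the dash.
const digits = phone.match(/[0-9]+/)[0];
// Digits or a dash: now the match carries on through 5000.
const digitsAndDash = phone.match(/[0-9\-]+/)[0];
// Letters only: the space ends the match.
const letters = phone.match(/[A-Za-z]+/)[0];

console.log(oneDigit);      // "6"
console.log(digits);        // "6"
console.log(digitsAndDash); // "6-5000"
console.log(letters);       // "Pennsylvania"
```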

If we want to match special characters, spaces, and the like in our brackets, we may need to escape them. That means prefixing the character with a backslash (\). By prefixing those characters, we tell the regex to treat them as literals and not use them for their special purpose. To match all of our sample string using this method, we might write something like:

/[A-Za-z0-9\ \-]+/

Notice the first \ has a space after it. We can even escape empty spaces, although a space has no special meaning inside brackets, so that backslash is optional. Now all of these characters are considered valid matches, and we are looking for one or more of them.
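
A quick JavaScript check of that pattern, as a sketch. The second pattern is my own variation: it drops both backslashes and relies on the dash being the last character in the set, where it is read as a literal.

```javascript
const phone = "Pennsylvania 6-5000";

// The pattern from above, with the space and dash escaped.
const escaped = phone.match(/[A-Za-z0-9\ \-]+/)[0];
// Assumed variation: no escapes, dash placed last so it is a literal.
const unescaped = phone.match(/[A-Za-z0-9 -]+/)[0];

console.log(escaped);   // "Pennsylvania 6-5000"
console.log(unescaped); // "Pennsylvania 6-5000"
```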

Negative Sets

What if we wanted to match all characters except for one? That would be an enormous bracket, wouldn’t it? What if there were a shortcut? Regex solves this for us as well.

Introducing the ^ character! The caret serves two purposes in regular expressions depending on whether it is inside a square bracket or not. In this section we’ll just cover what happens when it is inside the brackets.

> abcdef

By putting the caret inside the square brackets as the very first character it means that anything in those brackets does NOT match. It is the complete opposite of a normal set. Handy!
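
As a sketch in JavaScript, with an input string and patterns that I am assuming for illustration:

```javascript
// Anything that is NOT a digit, one or more times.
const notDigits = "abcdef123".match(/[^0-9]+/)[0];
// Anywhere but the first position, the caret is just a literal character.
const literalCaret = "2^8".match(/[0-9^]+/)[0];

console.log(notDigits);    // "abcdef"
console.log(literalCaret); // "2^8"
```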


Let’s take a little break and review what you’ve learned.

  • Literals
    • Do this by just typing the chars, and using \ to escape regex symbols you want to match.
  • Any Character
    • Use the . (dot) to match any one char.
  • Sets
    • Use […] to make a set, including ranges of characters to match like [0-9]
  • Negative Sets
    • Put a ^ inside a set and it inverts: [^a-z].
  • Optional Modifier
    • Put a ? after a regex symbol, character, or set and it will make that thing optionally matched.
  • One or More
    • Put a + after a regex symbol, character, or set and it will match one-or-more of them.
  • Zero or More
    • Put a * after a regex symbol, character, or set and it will match zero-or-more of them.

Beginnings and Endings

Sometimes you want to match something at the very beginning or very end of a string. This happens a lot when testing for validation, but also when trying to grab the first or last word with a match. There are two characters responsible for this behavior, ^ and $.

Remember when I mentioned that the caret worked differently outside of a bracket? Here it is! The caret marks a search as applying only to the beginning of a string.

"Somewhere I have never travelled, gladly beyond"
> Some

"Somewhere I have never travelled, gladly beyond"

Despite travelled being a valid match, it is not at the beginning of the string. Therefore this regex returns nothing.

We can test the end of the string by putting the $ character at the end of our regular expression.

"Somewhere I have never travelled, gladly beyond"
> beyond

"Somewhere I have never travelled, gladly beyond"

Since travelled isn’t at the end of the string, it doesn’t match either.
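
The four cases above can be sketched in JavaScript like this. The patterns are my assumption, based on the outputs and the explanations around them.

```javascript
const line = "Somewhere I have never travelled, gladly beyond";

// ^ ties the pattern to the start of the string.
const atStart = line.match(/^Some/);
const notAtStart = line.match(/^travelled/);

// $ ties the pattern to the end of the string.
const atEnd = line.match(/beyond$/);
const notAtEnd = line.match(/travelled$/);

console.log(atStart[0]); // "Some"
console.log(notAtStart); // null
console.log(atEnd[0]);   // "beyond"
console.log(notAtEnd);   // null
```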

You can use both of these together to test an entire string in its entirety. This is very common with validation tests. Here is a simple zip-code validation example. We’re almost ready to build something like this ourselves!

/^[0-9]{5}([- \/]?[0-9]{4})?$/
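
Here is how that zip-code pattern behaves in JavaScript. The sample inputs are made up.

```javascript
const zip = /^[0-9]{5}([- \/]?[0-9]{4})?$/;

// Hypothetical inputs paired with the result of testing each one.
const results = {
  "90210": zip.test("90210"),
  "90210-1234": zip.test("90210-1234"),
  "902101234": zip.test("902101234"),
  "9021": zip.test("9021"),
  "90210-12": zip.test("90210-12"),
};

console.log(results);
// Five digits pass, with or without a four-digit extension.
// Anything shorter, or with a partial extension, fails.
```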

This or That

Sometimes you need to match one thing or another. Maybe your string is valid if it ends in .com or .org. The | operator will do that for you.

In this example, we want to write a regular expression that will match either strings that are all alphabetical or all numeric, but not ones that do both. See if you can pick this regex apart into its pieces and follow along.


Here’s an example that will match either .com or .org:
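
A sketch of both ideas in JavaScript follows. These two patterns are my own guesses at what fits the descriptions above, and the domain names are placeholders.

```javascript
// All letters OR all digits, but never a mix of the two.
const alphaOrNumeric = /^([A-Za-z]+|[0-9]+)$/;

const allLetters = alphaOrNumeric.test("hello");
const allDigits = alphaOrNumeric.test("12345");
const mixed = alphaOrNumeric.test("abc123");

// Ends in .com or .org. The dot is escaped so it means a literal period.
const comOrOrg = /\.(com|org)$/;

const com = comOrOrg.test("example.com");
const org = comOrOrg.test("example.org");
const net = comOrOrg.test("example.net");

console.log(allLetters, allDigits, mixed); // true true false
console.log(com, org, net);                // true true false
```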


Some special characters

You have quite a library of tools at your disposal and you can now accomplish most simple tasks with regex. This section aims to simplify some of those tasks by introducing a few special characters to make your lives easier.

\w - Word character. Matches any character that is alphanumeric or an underscore.

\d - Digit character. Matches any digit 0-9.

\s - Whitespace character. Matches spaces, tabs, or line breaks.

\W - NOT word character. Matches anything that isn’t a word character.

\D - NOT digit character. Matches anything not a digit character.

\S - NOT whitespace character. Matches anything not a whitespace character.

In addition to these shortcut characters, there are a few special characters that don’t have a specific keyboard representation. For these we use the special character syntax as well.

\t - Tab.

\n - New line.

\r - Carriage Return.

\xFF - A hexadecimal character represented by its code.

\\ - A backslash. You need to escape a backslash in order to test it because a single backslash is reserved for indicating the start of a special character. This rule follows for other special regex characters, such as: .+*?^$[]{}()|/
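
Here are a few of these in action, as a JavaScript sketch. The sample string is made up for the example.

```javascript
// Hypothetical sample string containing a tab character.
const sample = "Order #42:\tshipped";

const firstWord = sample.match(/\w+/)[0]; // first run of word characters
const firstGap = sample.match(/\W+/)[0];  // first run of non-word characters
const number = sample.match(/\d+/)[0];    // first run of digits
const hasTab = /\t/.test(sample);         // is there a tab anywhere?

// \x41 is the hexadecimal code for a capital A.
const hexA = /\x41/.test("A");
// Escaped special characters match themselves.
const backslash = /\\/.test("C:\\temp");
const dot = /\./.test("example.com");

console.log(firstWord);            // "Order"
console.log(firstGap);             // " #"
console.log(number);               // "42"
console.log(hasTab, hexA);         // true true
console.log(backslash, dot);       // true true
```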

Specifying a number of characters

Occasionally you need an exact number of characters. In the case of zip codes, you either need 5 or 9 digits. If you need to specify the number of characters, curly braces will denote this.

/.{4}/ - will match any 4 characters

/\d{3,5}/ - will match between 3 and 5 digits
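
Counts can be exact, as in {4}, or a range written with a comma, as in {3,5}. Leaving off the upper bound, as in {3,}, means "at least this many". A small JavaScript sketch with made-up inputs:

```javascript
// Exactly four of any character.
const exact = "abcdefgh".match(/.{4}/)[0];
// Between three and five digits; the match takes as many as it can.
const range = "1234567".match(/\d{3,5}/)[0];
// Two digits are not enough to satisfy {3,5}.
const tooFew = "12".match(/\d{3,5}/);
// Three or more digits, with no upper limit.
const atLeast = "1234567".match(/\d{3,}/)[0];

console.log(exact);   // "abcd"
console.log(range);   // "12345"
console.log(tooFew);  // null
console.log(atLeast); // "1234567"
```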


Grouping

The last piece of regular expressions I want to cover is grouping. By wrapping all or part of your expressions in parentheses you can match not only the entire string, but smaller portions as well.

In a real world example from JavaScript, we have grabbed the CSS class names off of a button. The string we have looks like this:

button button_2 draggable index_15

We want to get the number off of the button_2 portion of this string. Let’s start by searching for the first occurrence of one or more digits.

> 2

But what if those classes might not be in that order? What if index_15 happened to be first? We need to look more carefully for the right class name.

> button_2

This gets us the right class no matter what, but we have too much information. We only want the number, not the whole word.

> button_2, 2

By wrapping part of our regular expression in parentheses, that portion is returned as an additional match. In all of our previous examples we were getting back matches that were a list with only one item. Once we start adding grouping to our regex, those lists will grow. In JavaScript, this list is an Array, and we can easily grab the second item from it. Your various programs may find different ways of getting at these lists.
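
The three steps above might look like this in JavaScript. The patterns are my assumption, inferred from the outputs shown.

```javascript
const classes = "button button_2 draggable index_15";

// Step 1: the first run of digits anywhere in the string.
const firstNumber = classes.match(/\d+/)[0];
// Step 2: digits that follow "button_", wherever that class sits.
const wholeClass = classes.match(/button_\d+/)[0];
// Step 3: the same search, with the digits wrapped in a group.
const grouped = classes.match(/button_(\d+)/);

// The group still finds the right number when the order changes.
const reordered = "index_15 button_2".match(/button_(\d+)/)[1];

console.log(firstNumber); // "2"
console.log(wholeClass);  // "button_2"
console.log(grouped[0]);  // "button_2"
console.log(grouped[1]);  // "2"
console.log(reordered);   // "2"
```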

Global Searches

Regular expressions will capture only the first match by default. They can be configured to act globally, though, and return all matches in a string. This feature is supported by almost every implementation of regex, but often in different ways. The most common way is to append a g after the end of the regex.
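
In JavaScript the g goes right after the closing slash. A minimal sketch, reusing the class string from the grouping section:

```javascript
const classes = "button button_2 draggable index_15";

// Without g we only get the first match.
const first = classes.match(/\d+/);
// With g we get every match in the string.
const all = classes.match(/\d+/g);

console.log(first[0]); // "2"
console.log(all);      // [ "2", "15" ]
```

One thing to be aware of: with g, JavaScript’s match returns only the full matches and leaves out any capture groups.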


Ignore Case

If you want your regex to ignore the case of characters it is matching, you can usually dictate that in a similar way to how you make the search global. Instead of appending a g to the regex, you should append an i.
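
A short JavaScript sketch, with sample strings I made up. The last line shows that the two flags can be combined.

```javascript
// Case matters by default.
const strict = /vim/.test("I love VIM");
// The i flag ignores case.
const relaxed = /vim/i.test("I love VIM");
// Flags can be combined: every match, in any case.
const every = "Vim, vim, VIM".match(/vim/gi);

console.log(strict);  // false
console.log(relaxed); // true
console.log(every);   // [ "Vim", "vim", "VIM" ]
```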


Greedy vs Lazy Searches

*? - Matches 0 or more of the preceding token. This is a lazy match, and will match as few characters as possible before satisfying the next token.

+? - Matches 1 or more of the preceding token. This is a lazy match, and will match as few characters as possible before satisfying the next token.
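
The difference is easiest to see side by side. In this JavaScript sketch the HTML snippet is an assumed example:

```javascript
// Hypothetical sample markup.
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as it can, so the match runs to the LAST >.
const greedy = html.match(/<.+>/)[0];
// Lazy: .+? grabs as little as it can, so the match stops at the FIRST >.
const lazy = html.match(/<.+?>/)[0];

console.log(greedy); // "<b>bold</b> and <i>italic</i>"
console.log(lazy);   // "<b>"
```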

Group without creating Capture Group

At times you may want to use the grouping feature of regular expressions but do not want it to return a new capture group. For instance, if you want to look for either the word “center” or “centre”, you might do something like this:

> center, er

But by using a special syntax in the group, you can omit the second capture group.

> center
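
In JavaScript the special syntax is ?: placed right after the opening parenthesis. The two patterns in this sketch are my assumption of what produces the outputs above.

```javascript
// A normal group adds a second item to the result.
const capturing = "center".match(/cent(er|re)/);
// A non-capturing group (?:...) still groups, but adds nothing to the result.
const nonCapturing = "center".match(/cent(?:er|re)/);

console.log(capturing.length, capturing[0], capturing[1]); // 2 "center" "er"
console.log(nonCapturing.length, nonCapturing[0]);         // 1 "center"
```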

Advanced Techniques

Positive Lookahead

Matches a group after your main expression without including it in the result.


Negative Lookahead

Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded).
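
Both lookaheads, sketched in JavaScript with a made-up price string. The syntax is (?=...) for positive and (?!...) for negative. The negative example also uses \b, a word boundary, which this guide doesn’t cover elsewhere. Without it the regex would back up and happily return part of a number.

```javascript
// Hypothetical sample string.
const prices = "100 dollars, 200 euros";

// Positive lookahead: digits that ARE followed by " dollars".
const dollars = prices.match(/\d+(?= dollars)/)[0];
// Negative lookahead: a whole number that is NOT followed by " dollars".
const notDollars = prices.match(/\d+\b(?! dollars)/)[0];

console.log(dollars);    // "100"
console.log(notDollars); // "200"
```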


Positive Lookbehind

Matches a group before your main expression without including it in the result.

Note: Older JavaScript engines cannot perform lookbehinds. Support was added to the language in ES2018.


Negative Lookbehind

Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded).

Note: Older JavaScript engines cannot perform lookbehinds. Support was added to the language in ES2018.
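
Here is a sketch of both lookbehinds, assuming a JavaScript engine that implements the ES2018 lookbehind syntax (current versions of Node and the major browsers do). The syntax is (?<=...) for positive and (?<!...) for negative. The sample string is made up, and \b is a word boundary that keeps the negative version from matching part of a number.

```javascript
// Hypothetical sample string.
const order = "Pay $42 for 7 nights";

// Positive lookbehind: digits that come right after a dollar sign.
const price = order.match(/(?<=\$)\d+/)[0];
// Negative lookbehind: a whole number NOT preceded by a dollar sign.
const nights = order.match(/(?<!\$)\b\d+/)[0];

console.log(price);  // "42"
console.log(nights); // "7"
```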



This past weekend I finally shed my WordPress blogs and moved into the world of static site publishing. This site and my personal blog are now built using Pelican, a Python-based static site generator.

What does that mean, exactly? Well, for one thing, it means I no longer have to worry about someone exploiting a vulnerability in my server-side code to run malicious code or take over my website. My blogs may not be very popular or have much appeal to them from that perspective, but it’s best to be safe anyway. Additionally, since the server no longer has the burden of compiling my web pages whenever they are requested, the site content serves much faster and with fewer CPU cycles. Since I use cloud hosting and pay based upon my traffic and CPU usage, this actually saves me money! (Not a lot, but some.)

The experience of migrating content from WordPress to Pelican wasn’t very difficult. Setting up Disqus for comments was a breeze as well, though it did take a bit of vim work to convert the URLs to their new locations. All in all, it was about 4 hours of work for both blogs, much of which was spent cleaning up small formatting errors in the generated files.

I’m really looking forward to building this blog from the command line going forward. Now that I’ve written this post (in markdown mind you) I can build, test, and publish it by typing:

make html
make serve
make rsync_upload

Open Source Science

Mark this one down in the list of cool ideas I’ll never follow through with.

“GitHub meets Academic Publishing.”

Here’s the full idea: We create an open system for people to share their scientific studies by providing them with all the tools, visualizations, and data warehousing necessary to truly host the science. Then, the community can rate the project, duplicate the results, grow from it, or reference it in another work. The interconnectedness that’s already inherent in academic publishing becomes a network in itself.

There will obviously be studies that don’t measure up to the rigorous evaluation of peers and those that are above the heads of many folks. To solve this, we first invite a group of verified scientists. Who are these folks? They’re people that have been published in academic peer reviewed journals in the past. This status gives their opinions on their peers extra weight. What they say has vastly more influence than the average Joe. It’s not hopeless for the rest of the world, though. When a verified scientist rates another project highly, the authors of that work gain reputation. They in turn can raise the reputation of others they approve of. As you move farther from the verified folks, the effect is lessened.

As the system grows, so too can the list of verified scientists and their spheres of influence. Everyone can benefit from what we all recognize as good science, and all the results are free and open to the public.

Iteration ideas: teams, university/college connections, certificate or degrees to add to reputation, invite system for colleagues, bounties on challenging tasks/experiments, bounties on verification through independent duplication of results. Science-on-demand.

git changelist

Today I needed to get a list of all the files that had changed in a git repository over the last two weeks. I played around with some great git commands, awk, and sort to make the following git alias (toss it in the [alias] block of your .gitconfig):

# $1 is the date you pass in; the trailing # swallows the arguments git appends
changelist = "!git whatchanged --since=\"$1\" --oneline | awk '/^:/ {print $6}' | sort -u; #"

To use:

git changelist "2 weeks ago"

You can use a lot of different unix date formats in there.


Chess Ratings

See it in action

The development team where I work will soon be celebrating the launch of our new company website with a good old-fashioned chess tournament. Now, like any good development team, we have our fair share of geeks; geeks with interests in a wide variety of geekery. One such geek is a big fan of fantasy sports, so we tasked him with organizing said tournament. As a result, we will be doing a round-robin tournament to establish a relative Elo rating for each player, then use these ratings to seed a double elimination bracket tournament (I’m about 60% sure I got the names right for all that stuff). Anyway, the key component for the round robin is having a method to establish our ratings.

The Elo rating system is the most widely used in the chess world, and with good reason. When you have a sport played by some of the greatest minds in the world, it only makes sense to have an overly complex and highly accurate way of showing relative strength. In fact, it’s so impressive that just about nobody outside of official chess organizations actually does it properly. The rest of the world kind of estimates an Elo rating, or approximates it. I am happy to be one of those folks.

I have neither the time nor the care to implement a 100% accurate chess rating system. All I need for the tournament is something that works decently well. So, I built it!

My chess ratings page allows you to enter the starting rating for each player, pick the outcome of the game, and it will show you the new ratings. How do I do this? Well, I use a formula I lifted from! I didn’t steal their code or anything. I just followed the instructions on their FAQ (mostly).

Why don’t you go try it out! And if you’re interested in my algorithms and junk, here’s the main code.
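
For a rough idea of what an update like this involves, here is a minimal JavaScript sketch of the textbook Elo formula. It is not necessarily what the ratings page does. The function names, the rounding, and the K-factor of 32 are all assumptions made for illustration.

```javascript
// Expected score for player A against player B (standard Elo formula).
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// score is from A's point of view: 1 for a win, 0.5 for a draw, 0 for a loss.
// kFactor controls how far one game can move a rating; 32 is an assumed default.
function updateRatings(ratingA, ratingB, score, kFactor = 32) {
  const change = kFactor * (score - expectedScore(ratingA, ratingB));
  return {
    a: Math.round(ratingA + change),
    b: Math.round(ratingB - change),
  };
}

// Two evenly rated players, and A wins.
const afterWin = updateRatings(1500, 1500, 1);
// The same two players draw instead.
const afterDraw = updateRatings(1500, 1500, 0.5);

console.log(afterWin);  // { a: 1516, b: 1484 }
console.log(afterDraw); // { a: 1500, b: 1500 }
```

Whatever one player gains the other loses, so the total number of points in the pool stays the same, apart from any rounding.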