Ashton Kemerling

Internationalization Golf

Martin Grüner had a fun article about his experience writing an internationalized app. I thought it would be fun to share my own experiences.

My first job out of college was working on a Common Lisp (CL) web application. The application was only a few years younger than me, and had originally been written in CL due to a particularly good HTML/XML library available in CL at the time. Unfortunately, in the intervening years the HTML library stopped being state of the art, and the whims of enterprise software engineering had left CL behind for web development, resulting in a serious lack of common programming conveniences.

Right after joining the company, I was informed that the salesperson had a potential lead with clients in a Spanish-speaking country. The application at this point was English only, but the QA engineer was married to a native Spanish speaker who was willing to help translate the application. All we needed to do was wire the application up for internationalization and localization. I was tasked with picking or creating a library and interspersing it throughout the application. The only criteria were that it look good and be capable of displaying different languages to different users depending on their browser’s “accept-language” header, so compiling or packaging up a new application with hard-coded languages was not an option.

I eventually decided that all the existing libraries were insufficient and we needed to make our own. I’m still not sure whether that was the right choice. CL doesn’t have the strongest library ecosystem around, but I was also a very young engineer and more susceptible to the “Not Invented Here” syndrome than I am now. That said, a quick perusal of the current offerings turns up libraries whose home pages are 404s, libraries that are nothing but FFI bindings to a GNU C library, and those whose list of defects includes “no documentation”, “undocumented code” and “slow PO parser”.

Compounding the issue was CL’s format function. CL has an exceedingly powerful formatter that is capable of unwrapping loops and interspersing the correct combination of “,” and “and” for a list of strings. This function was used with (reckless) abandon throughout the codebase, something for which I deserve some blame. The lack of dedicated template files made matters worse; it’s a lot easier to reach for format when you’re producing HTML in the handler function itself.

There was no way I was going to explain to the (non-technical) translator how to deal with format directives like this: “~#[NONE~;~a~;~a and ~a~:;~a, ~a~]~#[~; and ~a~:;, ~a, etc~].”, and I didn’t want to dig through 500 KLOC and unroll all the directives. So any translation system I made would need to support at least a subset of the CL formatting directives while hiding them for the translator’s sanity.

Worse still, CL’s formatter accepts positional arguments only, to my knowledge. Thus there’s no obvious way to convince the formatter to change the order of the parameters if the target language has a different sentence structure than English. So my system would need to deal with that.
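To make the ordering problem concrete, here is a minimal sketch in Ruby rather than CL, since Ruby's format happens to support both positional and named slots. The sentences and the names sender and recipient are made up for illustration; they are not from the original application.

```ruby
# Positional directives: arguments are consumed strictly in order, so a
# translation that needs them in a different order has no way to say so.
english = format("%s sent a message to %s", "Ana", "Luis")

# Named references: each slot names its argument, so a translated template
# is free to move the slots around without touching the calling code.
args = { sender: "Ana", recipient: "Luis" }
english_named = format("%<sender>s sent a message to %<recipient>s", args)
reordered     = format("%<recipient>s received a message from %<sender>s", args)

puts english        # Ana sent a message to Luis
puts english_named  # Ana sent a message to Luis
puts reordered      # Luis received a message from Ana
```

The named form is roughly what the custom library had to emulate on top of a purely positional formatter.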

The final format I settled on would look something like this. The programmer (me) would change (format nil “~a” var) to (jibberish:format “<~a:variable-name>” “Descriptive sentence for translator” :variable-name var). We could then convince our code to print out a file for a language like this:

####################################
File: filename.cl
####################################

Original: <variable-name>
Translation: Translation goes here
Note for translator: Descriptive sentence for translator
Original Argument Order: [variable-name]

And so on. At runtime the language file would be parsed into an in-memory hash map, which would allow us to replace the format string with the new one, re-order the argument list according to the translator’s needs, and strip the identifiers from the format string, leaving the raw format directives.
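The runtime step can be sketched roughly as follows. This is a hypothetical re-creation in Ruby, not the original CL code: the function names translator_view and localize, the regular expression, and the sample sentences are all assumptions, and the sketch only handles directives that contain no colon.

```ruby
# Matches a tagged slot such as "<~a:today>": directive "~a", identifier "today".
PLACEHOLDER = /<([^:>]+):([^>]+)>/

# What the translator sees: identifiers only, with the directives hidden.
def translator_view(template)
  template.gsub(PLACEHOLDER) { "<#{$2}>" }
end

# Given the source template, the translator's string, and the named arguments,
# return the raw format string plus the arguments in the translator's order.
def localize(template, translation, args)
  # Remember which directive each identifier carried in the source string.
  directives = template.scan(PLACEHOLDER).map { |directive, name| [name, directive] }.to_h
  ordered = []
  raw = translation.gsub(/<([^>]+)>/) do
    name = $1
    ordered << args.fetch(name.to_sym) # KeyError if an identifier was renamed
    directives.fetch(name)
  end
  [raw, ordered]
end

template = "<~a:sender> wrote to <~a:recipient>"
raw, ordered = localize(template, "<recipient> fue contactado por <sender>",
                        sender: "Ana", recipient: "Luis")
# raw     is "~a fue contactado por ~a"
# ordered is ["Luis", "Ana"]
```

Using fetch means an identifier the translator has altered fails loudly with a KeyError instead of silently producing a broken page.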

So, how’d I do? Mixed results. The conversion was long and painful, requiring that each format directive and raw string be touched. A huge portion of the “Notes for translator” were either blank due to difficulty and fatigue, or were something along the lines of “Description, part 1/n” due to multiple calls being joined together in HTML. Changes in formatting calls required that the matching translation entry be hunted down in every translation file for the translation to still work.

But technical challenges are usually not a big deal; after all, that’s a fairly large portion of what engineers do. Probably the worst problem with my system was how it worked for non-technical folk. The first attempt at giving this file to a translator resulted in her helpfully translating all the variable names into Spanish, changing “It is <today>” to “que es <hoy>”, which resulted in some rather exciting errors. I think my own design sabotaged me in this particular instance, as it required way too much careful explanation to be usable. It was my first encounter with the major difference between using a program whose internals you are familiar with and explaining its use to someone else who has never seen anything similar.

I think if I had to do it again, I would’ve probably spent more time and modified the way the program generated strings instead. I suspect that building a custom library to avoid refactoring everything was a case of false economy, as the time saved up front would have been lost in the time required to train translators and maintain overly fragile translation files. I also learned that if you plan on selling an application in another language market, you need to think about that before you start writing code. Internationalization limits some of the choices you can make with your software and design, and it’s a lot easier to use that restricted set up front than it is to unwind them after the fact.

As a final irony, while attempting to write this post in Octopress (which uses Jekyll), it crashed several times because the SASS files in Octopress use Unicode, which Octopress appears to hate out of the box. You have to change a few environment variables to convince it that Unicode is OK.

Thoughts on RubyMine

As part of my new job at Pivotal Labs I’ve been pair programming almost every day. The obvious challenge with pair programming, especially in a popular language like Ruby, is in choosing what tools to work with. Vim, Emacs, RubyMine, TextMate: the choices are various and divisive.

To keep the peace among engineers and to make provisioning machines easier, it makes sense to dictate one set of tools for all your employees. Pivotal has decided to standardize on RubyMine with a dark color scheme and a few custom configurations.

The Good

RubyMine, being set up for Ruby in particular, works very well with navigating, indenting, and colorizing Ruby code. With the exception of setting a variable as the result of an if expression, I have never seen RubyMine indent code incorrectly or get confused on coloration.

It also does a decent job of navigating to Ruby classes and functions, something that is quite hard since Ruby lacks explicit import semantics. It definitely makes hunting down odd test harness functions a breeze, and has saved me in the past. It is also good at identifying the view that corresponds to a controller method, but due to the way that Tracker is laid out, I haven’t had a chance to use this feature often.

And as the cherry on top, RubyMine includes a “TextMate like” quick find feature. In large code bases this will save you about a minute trying to find a particular file, so long as you have an idea of what it’s called!

The Bad

If you only have unique controllers and models, you will absolutely love the ability to jump to a class definition. Since we use a lot of similarly named controllers inside namespaces to control API versions, it often gets confused about what version I want. The quick find feature sometimes ignores the path if you provide it, which makes copying file paths from stack traces occasionally unreliable.

JavaScript and LESS support is mediocre. No real complaints, but nothing to set it apart from other environments in my opinion. Maybe my colleagues who work on the front end would have a more nuanced opinion.

Also sometimes with large files the coloring or error checking can lag behind. This usually kicks in at files longer than 3000 lines, which is not unusual for test files. For the most part this is just an annoyance which doesn’t affect editing in any serious way.

It has git integration, which lags behind the offerings from Emacs, Vim, and Eclipse in my opinion. I end up using the command line instead of the built in tools.

The Ugly

RubyMine is incredibly dim-witted when it comes to parentheses and quotes. If you wish to put an escaped quote at the end of a string, RubyMine will let you escape your ending quote, then insert a matching pair right afterwards when you try to fix the mistake! To add insult to injury, you must move the cursor before fixing the issue lest RubyMine delete both of the extra quotes. Very frustrating.

RubyMine is also one of the most memory-hungry programs I use. At least once a week it will grind my machine to a halt due to memory usage, which is impressive on a machine with 16 GB of memory.

Conclusions

Out of the box, RubyMine works very well. I think it is a good compromise for large teams. But if you have time to learn some more complicated tools, I believe you would be way better off learning a more customizable editor like Emacs or Vim. It might take some effort, but these editors can do just as much as their commercial cousins and will continue to do well no matter what language you decide to use in the future.

Org Mode

Recently I’ve taken an interest in making myself more productive, and at least so far it’s going well. I personally attribute my current productivity partly to a new “shut up and get back to work” mentality, and partly to a new (to me) organization system.

The problem I’ve had is that every single organization app is broken in some way. The only truly flexible system uses paper and pen, and I don’t really want to deal with that.

The solution appears to be org-files. Think of them as a close cousin to Markdown, but for organization instead of HTML. The systems that use org-files support basically any kind of workflow imaginable because the storage mechanism is just text, so in the worst case you can always just edit the file manually.

Org-files are originally from the Emacs plugin org-mode, but there are now org-file readers for Vim, iPhone, and Android.

Example File.

#+LAST_MOBILE_CHANGE: 2013-10-01 19:59:54
#+TODO: TODO IN-PROGRESS DONE


* Butler                                                           :projects:
Butler is an Emacs plugin for Jenkins/Hudson use. Currently it supports:
+ Viewing jobs.
+ Encrypted authentication info.
+ Triggering non-parameterized jobs.
** Wish List.
*** Parameterized jobs
Possible to get the parameter information along with the job status in one query.
*** DONE Better formatting.
DEADLINE: <2013-11-02 Sat>
CLOCK: [2013-05-06 Mon 20:21]--[2013-05-06 Mon 20:40] =>  0:19
:PROPERTIES:
:ID:       6A634BFA-F18E-4C92-A48D-DC3254A67CAE
:END:
Tabular mode?
*** DONE COMPLETED Job Progress
*** TODO Improved HTTP
:PROPERTIES:
:ORDERED:  t
:ID:       F720BD2F-4966-455B-8F14-5530393340CD
:END:
**** DONE Avoid roundtrip to encrypted auth file
**** TODO Silence message output
:PROPERTIES:
:ID:       DE9913E9-AC68-42D3-B7D2-613BB775A16B
:END:
**** TODO Auto refresh
 :PROPERTIES:
:ID:       01EC3DB8-ABCA-4063-8999-AA68D8D05528
:END:
**** TODO Console output
:PROPERTIES:
:ID:       112F3FD6-7D23-485F-8B5E-43F0B0BA839E
:END:

Okay Ashton, What the Hell is That?

Let’s break this down bit by bit. Any line starting with #+ is used to tell the org-mode reader something. In this case it lets us know the last time we synced this to our mobile device, and the possible TODO states that an item can be in. It’s not unusual to see formatting directives at the top of the file to control color, indentation, and similar settings. A lot of settings can either be set globally, or on a per-file basis.

Next we have the individual headings. Each heading starts with a number of asterisks. The number of asterisks indicates the level. So ** Wish List. is a child of * Butler. Most editors allow you to fold headings to show or hide their children, for convenience.
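As a rough illustration of how little machinery the heading structure needs, here is a minimal Ruby sketch that reads the level of each heading from its leading asterisks. The Heading struct and parse_headings function are hypothetical names for this example and are not part of org-mode or any org-file reader.

```ruby
Heading = Struct.new(:level, :title)

# Collect every heading line; the number of leading asterisks is its level.
# Non-heading lines (notes, list items, property drawers) are skipped.
def parse_headings(text)
  headings = []
  text.each_line do |line|
    match = line.match(/\A(\*+)\s+(.*?)\s*\z/)
    headings << Heading.new(match[1].length, match[2]) if match
  end
  headings
end

sample = "* Butler\nButler is an Emacs plugin.\n** Wish List.\n*** TODO Improved HTTP\n"
parse_headings(sample).each do |heading|
  puts "#{'  ' * (heading.level - 1)}#{heading.title}"
end
```

A heading is a child of the nearest preceding heading with a lower level, which is all an editor needs in order to fold a subtree.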

Each heading can optionally have a TODO state, which is displayed in front of the heading title. The default is one of blank, TODO, IN-PROGRESS, and DONE. A simple keystroke in most editors cycles from one state to the next. It’s also possible to support multiple paths that a project can flow through. I personally use the default along with REFERENCE and ABANDONED, which allows me to easily filter for abandoned projects, along with notes that aren’t actionable yet.

Org-mode supports 3 different kinds of time tracking and scheduling. Each heading can have up to one deadline entry, one scheduled entry, and multiple clock entries. Deadlines are for when an item should be finished, and scheduled is for when a project should be started. Clocks are for recording the amount of time used for an item. You can see an example of both the deadline and clock under the “Better formatting.” header. Org-mode also supports a very flexible type of repeat scheduling.

With any good organization system you’ll need the ability to tag and filter items. Org provides two ways to do this, properties and tags. Tags are generally shared across multiple entries, with values like “work” and “home”. They are displayed on the far right of the tagged line, such as “:projects:” on the Butler line above. You can tag a line with as many tags as you want.
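The TODO state and the tags can both be pulled out of a single heading line. The Ruby sketch below is an assumption-laden illustration: parse_heading_line is a hypothetical helper, the state list is copied from the #+TODO line of the example file rather than read from it, and real org readers handle more cases (priorities, per-file keyword sets, and so on).

```ruby
# States taken from the "#+TODO:" line in the example file.
TODO_STATES = %w[TODO IN-PROGRESS DONE].freeze

# Split one heading line into its TODO state, title, and tags.
def parse_heading_line(line)
  body = line.sub(/\A\*+\s+/, "").strip
  tags = []
  # Tags sit at the far right, wrapped in colons: ":work:home:".
  if (match = body.match(/\s+:([^\s:]+(?::[^\s:]+)*):\z/))
    tags = match[1].split(":")
    body = match.pre_match
  end
  state, rest = body.split(" ", 2)
  if TODO_STATES.include?(state)
    { state: state, title: rest.to_s, tags: tags }
  else
    { state: nil, title: body, tags: tags }
  end
end

butler = parse_heading_line("* Butler                       :projects:")
done   = parse_heading_line("*** DONE Better formatting.")
```

Filtering by tag or by state then becomes an ordinary select over the parsed headings.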

Unlike tags, properties are composed of keys and values, many of which are unique to a specific entry. If you use any kind of syncing system, each item will get an ID property used to enforce uniqueness. A lot of systems will also respect the LOCATION property if you sync to a system that understands this, such as iCal.

And finally you can have notes and text underneath each entry. Most org-mode readers understand free text, unordered lists (- or +), numbered lists, and check boxes (- [ ] and - [X]). Many of these can be easily manipulated with simple keyboard macros, such as the check boxes.

As you can see, org-mode basically provides a superset of all the organizational features available. It’s up to you how you wish to nest these, mark them due, and prioritize. That’s why I love it: I’m not locked into one particular layout that may or may not fit me. Instead I can slowly evolve it as my systems mature. Expect more posts in the future about Emacs-specific configuration settings to make org-mode easier to use.

The Best of Lisp™

Edit: Dan Benjamin himself informed me that I missed the sarcasm boat here. I’m leaving the relevant bits, and cutting out the now irrelevant criticism.

I’m slowly working my way through the Back to Work podcast. I’m way behind, so please excuse that this post references an episode over 2 years old at this point.

Episode 11 was about “future proofing your passion”, among other things. In it, Merlin mentions that one thing you can do as a low-risk investment in a programming career is to stay current on new languages. He mentions Scala, Erlang, and Lisp as a few. With an unusual amount of sarcasm, Dan expresses disdain for and a lack of interest in anything other than Ruby, even going as far as saying “Ruby took the best of Lisp”.

While Dan may have been joking, it is probably worth examining the relationship between Ruby and Lisp. As someone who has worked in Ruby and Lisp professionally, I feel uniquely qualified to talk about their relative strengths. While there are a ton of cases where Ruby clearly learned and copied from Lisp, there is also a laundry list of areas where Ruby failed to learn from Lisp’s successes and failures.

Functions

Being a functional language, functions are pretty important to Common Lisp. Ruby thankfully took some of the highlights from CL during its development, like a heavy emphasis on using map & filter over iterative loops, but it missed some important things.

CL only has one kind of function, the function. There are some nice syntax features for the way you define them, and a few extra features for how they’re called, but they’re all just functions. Some might be anonymous, while others might be bound to a symbol. Sometimes you call them by name, or with funcall or apply if you need to pass them around. Sometimes they’re a multimethod from CLOS which dispatches depending on the class of the argument, but at the end of the day they’re all just functions which behave the same. Ruby picked up on the idea of using different syntax for some functions (think Blocks), but for some reason included Procs and Lambdas as well. Most Rubyists I’ve met express confusion over what the exact differences are, and always say “Just use one kind and one kind only”.
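For readers unsure what the differences actually are, the two that bite most often are argument checking and the meaning of return. A short Ruby sketch, with made-up method and variable names:

```ruby
# Arity: a lambda checks its argument count, a proc pads or drops arguments.
pair_proc   = proc   { |a, b| [a, b] }
pair_lambda = lambda { |a, b| [a, b] }

proc_result = pair_proc.call(1) # missing argument becomes nil
lambda_error =
  begin
    pair_lambda.call(1)
  rescue ArgumentError => e
    e.class # the lambda refuses the wrong number of arguments
  end

# Return: inside a lambda, `return` leaves only the lambda.
def return_from_lambda
  lambda { return :from_lambda }.call
  :after_lambda
end

# Inside a proc, `return` leaves the method that created the proc.
def return_from_proc
  proc { return :from_proc }.call
  :after_proc
end

p proc_result        # [1, nil]
p lambda_error       # ArgumentError
p return_from_lambda # :after_lambda
p return_from_proc   # :from_proc
```

Blocks behave like procs on both counts, which is part of why mixing the three casually leads to surprises.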

Common Lisp also has much better syntax for parameters. In particular it provides for required and optional positional parameters, enumerated keyword parameters, and extra parameters to be collected into a list. Ruby learned well by also allowing default parameters to be evaluated at call time, but it ended up with a way worse syntax. Particularly bad is the way Ruby handles extra keyword arguments, simply dumping them into a hash, optionally merging it on top of your default arguments hash if needed. CL provides a way for default values to be defined for keyword arguments in the method header, which is far cleaner than the Ruby way.
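The options-hash idiom described above looks roughly like this in practice. The method names, the host, and the default values are invented for the example. For comparison, the second method uses the keyword-argument syntax that Ruby 2.0 and later accept in the method header, which is closer to what CL offers.

```ruby
# Options-hash idiom: the "keyword" arguments arrive as one hash, which is
# merged over a hash of defaults inside the method body.
def connect_with_options(host, options = {})
  settings = { port: 80, timeout: 5 }.merge(options)
  "#{host}:#{settings[:port]} (timeout #{settings[:timeout]}s)"
end

# Keyword arguments (Ruby 2.0+): defaults live in the method header, and an
# unknown keyword raises ArgumentError instead of being silently accepted.
def connect_with_keywords(host, port: 80, timeout: 5)
  "#{host}:#{port} (timeout #{timeout}s)"
end

puts connect_with_options("example.test", port: 8080)
puts connect_with_keywords("example.test", timeout: 30)
```

With the hash idiom a misspelled option name is merged in and ignored, which is one concrete cost of the older style.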

DSLs

Ruby often gets compared to Common Lisp in its ability to produce Domain Specific Languages, or DSLs. Both blocks and Ruby’s special dispatch mechanisms are used and abused by programs such as Chef to create an easy interface between high level concepts and low level details.
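A minimal sketch of the two mechanisms working together: a block evaluated against a builder object, with method_missing turning unknown method calls into settings. The Recipe class and the package and port words are hypothetical and are not Chef's actual API.

```ruby
class Recipe
  attr_reader :settings

  def initialize
    @settings = {}
  end

  # Special dispatch: any unknown method called with arguments is recorded
  # as a setting. A fuller version would also define respond_to_missing?.
  def method_missing(name, *args)
    return super if args.empty?
    @settings[name] = args.length == 1 ? args.first : args
  end
end

# The block runs with a Recipe as self, so bare words inside it become
# method calls on that Recipe.
def recipe(&block)
  builder = Recipe.new
  builder.instance_eval(&block)
  builder.settings
end

config = recipe do
  package "nginx"
  port 8080
end

p config
```

Everything inside the block is still ordinary Ruby syntax, which is the limit the comparison with Lisp macros is about.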

And Ruby is pretty good at DSLs, I would estimate that it can hit a good 60-80% of the DSL cases fairly easily. But especially once you step outside the bounds of configuration DSLs like Chef, Common Lisp truly shines. In Ruby you can’t really add new syntax or special characters, and you can’t easily force it to recompile a given section of code to behave in a specific way.

But you can do that in Lisp. Lisp’s macros are so powerful, many languages are first prototyped as a large section of Lisp macros. Clojure was originally implemented in a few hundred lines of Lisp macros. This involved adding new meanings to symbols that don’t exist in CL, like { and }. But with Lisp’s reader and regular macros, this is perfectly reasonable.

Fragmentation

One of the biggest weaknesses of Common Lisp was its weak standard. The standard did not specify a lot of common behavior, such as threading. The result was a proliferation of both libraries that were implementation specific, and libraries designed to bridge the gap, like Bordeaux Threads. At this point every library and application is so set in its ways that the chances of the standard being unified and fixed are almost nil.

Yet somehow Ruby ended up with no spec at all, only a reference implementation. It’s so bad that the Rubinius people proudly proclaim that they created 20,000 specifications to as closely as possible match MRI, the reference implementation. While this is a laudable effort, I think the general opinion is that only the reference implementation is to be trusted, so most libraries assume that you’re not using Rubinius, MacRuby, or anything else. This is sad, because it means that the Ruby community is very unlikely to use anything other than MRI, no matter how good a competing runtime might be.

The Takeaway

I love Ruby. It’s a good language. But if you want truly mind-opening programming moments, Lisp is the only way to go. No other language can give you the sense of “I can do anything” quite like it. I highly recommend every Rubyist try Common Lisp for a bit, to at least understand where their language of choice came from.

Advanced Fear

One of the scariest moments in most young adults’ lives is the realization of mortality. Teenagers and younger children often understand on an intellectual level that people die, but emotionally that’s something that happens to other people. There are very few things quite like the moment when an adult realizes that they must die, that there’s nothing they can do to stop it, and that there’s not a lot they can do to delay it.

Most people stop there. But if you move beyond the “go to work, make money, enjoy weekend” stage of your life, you’ll get to experience what I call “Opportunity Cost Fear”.

Every action you take prevents you from taking another. Spending $10 on a lunch means you can’t spend that $10 on anything else. This is called opportunity cost. It applies to every limited resource in the world. Your time, attention, money, emotions, energy, and physical strength are all subject to opportunity cost, even though many people and organizations do not think about it.

Those who wish to accomplish many things can suffer a crippling fear because of it. Spending time on this blog post means I’m spending time not reading blogs, writing software, or talking with Leah. It also means I’m not spending my time going to the gym to be healthy, or trying to make friends in a new city. And the fact that I can’t have everything I want, just like the fact that I can’t live forever, can be deeply terrifying if you care about doing multiple things.

Just like the fear of death, the best way out of opportunity cost fear is to press forward. The big risk with both is that the fear will cripple you to the point where the worst might as well have happened. Are you really living if you spend every moment terrified of your own end? What’s the point worrying about opportunity cost if the fear prevents you from executing on any of your plans? In both these cases, the fear alone can be just as bad as the actual outcome.

Now, I’m not saying that the rational choice is disregard for the objects of your fear. If you’re afraid of death, riding a motorcycle without a helmet on will neither eliminate your fear nor increase your lifespan. You should be aware of why you fear a thing, and take the most rational steps to reduce it. Nothing you do can ever eliminate opportunity cost, but you can make sure that you’re spending your time as best you can. You’ll never do it perfectly, but it’s better than nothing at all.

Time, Attention, and Pairing

Merlin Mann gives a great talk about Time and Attention. The gist of his talk is that in order to create great work, you need to balance time vs. attention. Without using your time, you’ll never create. But you must use your attention to determine if you’re creating something great. Great producers balance between these two to both produce, and produce good work.

Part of my new job at Pivotal Tracker involves pair programming. While a lot of my programming friends express doubt about pairing, I’m very impressed with both how productive I am pairing, and how quickly I’m learning the code base while pairing.

The way that a good pair switches between who is “driving” is very similar to Merlin’s talk. The person who is typing is responsible for producing, while the person who isn’t is responsible for ensuring that the pair is headed in the right direction. Pairs have a much easier time finding this balance and staying on track than individual workers do.

It’s hard to alternate between time and attention effectively when working alone. There’s no overt signal for when one is using time or attention, making it easy to mistake “research” (reading Wikipedia) for using your attention. As a pair this balance is much easier to achieve, because each member is tasked with one role, and it’s obvious when that role is not being fulfilled. The non-driving member can’t goof off on Reddit or Wikipedia for the sake of “research” because it will prevent the driving member from working. And the driving member can’t slow down too much without the other member taking over typing from them. This helps ensure that the pair produces better code faster.

I’m not sure if the creators of pair programming are familiar with Merlin’s model of work, but accidentally or not they’ve made it much easier for programmers to follow his advice. I’d recommend any company to give pairing a try and see what it does for the quality and quantity of work that their teams produce.

Moving to Denver

You know those hilarious “pranks” where someone posts something embarrassing on someone else’s Facebook account? Apparently in the improv community of Chicago the standard prank is to post something saying that they’ve accepted a lead role, and that they’re moving to Denver, Colorado. We found this out when some improv actors at our going away party took some convincing that we were actually moving, and that it wasn’t an elaborate prank.

The main difference for us is that I’ve not decided to give up software engineering for comedy. Instead I’ve accepted a role as an engineer at Pivotal Tracker, which I personally think is a better deal. Pivotal is a great company with a unique and productive style, and I’m thrilled to be working with them.

While there’ll always be a special place in my heart for Chicago, I’m rather glad we’ve made the move. It was stressful and very expensive, but living in Colorado has already been a very pleasant experience. The weather is shockingly nice, apartments are nicer for the price, and I can safely walk to work.

Expect photos from the mountains and trails above Denver in the coming months.

The Dangers of Partisanship

Anyone who knows me personally knows that I am a huge Clojure fan. I could go on and on about why it’s the best; the regularity, the macros, ClojureScript, core.logic, etc. etc.

But last week I attended LambdaJam, which was awesome by the way, and I came back with a different opinion. No, I don’t think Haskell or Erlang tops Clojure, and I’m still probably going to reach for Leiningen for all my personal projects. But I learned that it’s a good idea to keep an eye on other languages, since no language will ever be so good that you never need to touch anything else.

In particular I learned the joy of the APL-based languages, especially J. I learned that the F# people have a novel way of dealing with statically typing external resources via type providers, and I learned that Haskell can indeed be tight and clean despite what all those monad tutorials imply. I will probably take each of these subjects apart in separate posts in the future; I’m still trying to pick up the bits and pieces of my thoughts after several intense keynotes overloaded my brain.

What’s the takeaway? That functional languages are a lot more diverse than most OOP/procedural languages, and that there’s a lot that can be learned from them. So instead of picking your favorite and only learning that, you really should try to learn several to cover your bases.

Stealing Terminology

One of my favorite tricks is to borrow terminology from other walks in life. This is particularly important for me, since it’s all too easy for engineers to end up living and talking in pure engineering speak. Looking to other careers for the correct words to describe something allows one to express ideas that might not be easily communicated otherwise.

If you look carefully, you can see examples of this in engineering blog posts. People talk about fighting entropy, or the metaphysics of data. These are terms that did not originally have any engineering context whatsoever, being brought in to express ideas and concepts about another subject.

My most recent borrowed term is Force Multiplier. A force multiplier is anything that allows a group or individual to be more effective without an increase in effort. It’s normally a military term referencing how better weather forecasts, equipment, or intelligence can allow a unit to accomplish much more without increasing effort/losses. I like to use it in reference to things like unit tests and automation that make me more effective as an engineer without costing me more time. If I don’t have to babysit a build waiting to type in the next step, then I can get back to work while a CI server works for me. I’m instantly more effective without actually spending more time at work. With a comprehensive suite of unit tests I can spend less time worrying about and validating whether my refactoring has damaged the quality of the system, thus I can get more work done in the same period of time.

I’m sure there are far more examples of this than I’ve listed, but they’re always fun to find and analyze.

Fast Cheap Good

In the distant past (1950s or so), project managers and engineers came up with what is known as the project management triangle: fast, cheap, or good; pick two.

While software engineering can be very different from mechanical, it does at least share the same project management setup. Quality software designed cheaply will be late, cheap software released early will be poor in quality, and quality software released on time will be expensive. These differences come from the quality and number (thus cost) of the managers and engineers, the choice of methodologies, scope of features, and internal organizational setups.

What is different is the fact that software engineers aren’t limited by physics the way that our mechanical brethren are. With few exceptions for high performance computing, the limitation of most software projects is the imagination and effort of its engineers, not hard limits in manufacturing technologies or physics. Combine this with a fad-heavy market for programming methods (scrum! extreme! agile! pair!), and it can be very tempting to assume that we can find the perfect balance with the correct management processes and the right methodology.

This is false, of course. Management and methodology are about dealing with the communication overhead when enough people are working on a project. The pipe dream of management and methodology is for a group of N producers to produce N times more than one person alone. This is of course rubbish, as The Mythical Man-Month demonstrated handily: it’s simply impossible to manage or process your way to good, cheap, and fast.

So what’s the point of it all then? Why don’t we just go back to waterfall? Because the point of agile, scrum, pair programming and friends is not to get us all the way to good, cheap, and fast. The point is to go from choosing one of three, to choosing two of three. A poorly managed, low discipline team can only choose one of good, cheap, and fast; and this is of course worthless. Cheap software that’s late will probably still run out of VC and be beaten by the competition. Bad and cheap software will struggle to take over the market, and software that’s both bad and late probably shouldn’t be written. However a well managed, well disciplined team can survey the market, measure the competition, and knowingly choose what compromises they wish to make in speed, quality, and cost. Poorly managed teams blunder into one of the choices, usually cheap and bad, and end up having very little control over their own fate.