Ragged Clown

It's just a shadow you're seeing that he's chasing…


May
12
2012

What’s your trick, Python?

The folks who wrote Pragmatic Programming recommend that you learn a new language frequently because, with each language, you’ll learn a new trick or a new way of thinking about programming that you never thought of before. When you go back to your old language you’ll take your new trick with you. Last year, I learned Objective C.

There is a lot to hate about Objective C. When I first started learning it, I felt like I had been time-warped back to 1987 along with some aliens from the planet Zarg but, over the last year, the language has improved so dramatically and so many of the rough edges have been smoothed that I could almost recommend it.

It’s still an ugly language, of course. The moments when you are confronted with bits of C in the middle of your Objective C method are like discovering that your ice cream topping is cod-liver oil.

It’s verbose too. The libraries feel like they were designed by colonial administrators in early-nineteenth century India. But, with automatic reference counting (ARC), it is no longer daunting to programmers who have forgotten how to alloc and dealloc.

Objective C has a couple of nice tricks though. My favourite is the fact that nil is an object and you can call its methods. In most languages, this would explode (or at least start a small fire):

collection = nil;
for (int i = 0; i < collection.length; i++) {
  id item = [collection objectAt: i];
  [item doSomething];
}

but it’s perfectly natural in Objective C. You can happily call methods on nil, it will return nil or 0 so you can just get on with your work without dodging NullPointerExceptions at every turn.

I like Objective C’s syntax for calling methods too, strange as it is. There is something heart-warming about the way that the method name wraps itself around the arguments so that in,

[object populate: collection
        fromFile: filename]

the method name is actually populate:fromFile:. It feels more comfortable than named arguments, in my humble opinion, and the way Xcode wraps the method call and aligns the colons makes it easy to read. If only the method names weren’t designed by colonial civil servants who mistook verbosity for clarity, it would be pleasant even. The names in the Cocoa libraries have that odd do the needful feel about them, like the authors learned grammar in a faraway country, probably one with steam trains, punkah wallahs and government forms in triplicate and it’s hard to love a language that doesn’t have a syntax for accessing array elements.

Ruby is the biggest trickster of them all. My only complaint about that language is that sometimes – especially in Rails – the whole language feels like one big trick. Every time I come back to it, I am constantly saying – “Wow! You can do that? That is awesome! Wait! How does that work again?”.

Ruby taught me blocks:

collection.each { |item|  item.do_something }

Sure, every language has blocks or lambdas these days, but there is just something very soothing about the simplicity of Ruby’s syntax that puts me at ease. In C#, I have to concentrate really hard to get the syntax right and, in Objective C, I doubt there is anyone in the world who remembers how to make a callback without looking it up online. I like to imagine that there was one primordial Objective C block written in a prototype at One Infinite Loop in 1994 and it has been copy-pasted ever since.

The trait I like most about Ruby is its humanity. If it seems like you can do something, you can. All these expressions work and do exactly what you might expect:

2.years.ago
3.times { print 'Ho! ' }
Date.today + 5.days
[1..100].each do |number|
  puts "#{number} is even." if number.even?
end

If only the Objective C folks would glance at the Ruby libraries and learn that terse does not have to be obscure and that verbosity is not intrinsically a good thing. Just ask the COBOL people.

A couple of years ago there was a debate online about the relative benefits of adding methods to objects to make a programmer’s life easier. The proposition was that such methods result in bloat which makes the API harder to learn but, really, how can you seriously argue that this:

if( array.length > 0 )
  element = array[array.length-1];

is more humane than this:

element = array.last

Meta-programming takes the Ruby language into the astroplane where the angels live and foolish mortals tread carefully. Here’s a builder for generating an xml file:

xml.slimmers do
  @slimmers.each do |slimmer|
    xml.slimmer do
      xml.name slimmer.first_name
    end
  end
end

And here’s the code for parsing some xml (OK, it’s not meta-programming but it is neat and tidy):

xml = File.read('posts.xml')
parser = XML::Parser.new
doc = parser.parse xml
doc.find('//posts/post').each do |post|
  puts post['title']
end

In Objective C, that would be over 7 million lines of code.

C# learned all of Java’s tricks and smoothed away its rough edges. It added lots of little tricks of its own to make it at least 9% better than Java. But its big, new trick is LINQ.

LINQ is essentially a functional language rammed right in the middle of a curly-braced imperative language. Once you get the hang of it, it’s amazing. I never did get the hang of it though and wrote all of my LINQ by typing it out in longhand and then clicking the helpful green squigglies that cause Resharper to turn this:

public IList<Album> FindAlbumsToGiveAway(IList<Album> albums)
{
  var badAlbums = new List<Album>();

  foreach (Album album in albums)
  {
    if (album.Genre == "Country")
    {
      badAlbums.Add(album);
    }
  }
  return badAlbums;
}

into this:

public IList<Album> FindAlbumsToGiveAway(IList<Album> albums)
{
  return albums.Where(album => album.Genre == "Country").ToList();
}

or, more ambitiously, into:

public IList<Album> FindAlbumsToGiveAway(IList<Album> albums)
{
  return from album in albums
         where album.Genre == "Country";
         select album
}

if I was in a functional mood (example stolen shamelessly from Alvin Ashcraft).

To achieve its lofty status of 9% better than Java, C# has had to add about 83% more syntax and therein lies its downfall. There is no way that one person can fit all that syntax into their brain unless he dedicates a lifetime to learning it, and why would anyone do that when there are so many finer languages to learn?

Less syntax is more, et cetera paribus, and this:

frequency = {}

is nicer than this:

Dictionary<string, int> frequency = new Dictionary<string, int>();

which brings us to Python, the language where whitespace is syntax.

At first blush, significant whitespace is Python’s big trick. There’s no need to add loop delimiters; just indent correctly – and you were going to do that anyway, right? – and Python will know what you mean. Once you get used to it, indenting loops is just so easy and obvious that you wonder a) why all the other languages didn’t copy it years ago and b) if Python has a better trick for me to learn.

Since Ruby, I am no longer impressed by parallel assignment,

a,b = 2,3

or generators,

def fib():
     a, b = 1, 1
     while True:
         yield a
         a, b = b, a + b 

sequence = fib() 

sequence.next()
>>> 1 

sequence.next()
>>> 1 

sequence.next()
>>> 2

or default values for arguments,

def f(a, b=100):
  return a + b

f(2)
>>> 102

or the myriad other ways that Ruby and Python are more pleasant to use than Java or C# (OK. I am a still a little bit impressed by generators).

List comprehension is a nice little trick,

numbers = range(1..100)
squares = [x*x for x in numbers]

but it’s not dramatically better than Ruby’s collect method,

numbers = 1..100
squares =  numbers.collect { |x| x*x }

or C#’s,

var numbers = Enumerable.Range(4, 3);
var squares = numbers.Select(x => x * x);

(OK, it’s a lot better than C#’s)

In fact, Python is so similar to Ruby that I feel forced to compare based on æsthetic terms alone and, æsthetically Python loses big time. If Guido and Matz were cousins, Guido would be the awkward, bookish cousin who is perfectly happy typing underbar underbar init underbar underbar open paren self close paren colon instead of initialize. Python has a strong mark of the geek about it.

Python also throws a lot of exceptions and you can barely shake a stick without causing a ShakenStickException. I mean, honestly, what is exceptional about getting something from a hash without checking to see if it’s in the hash first? Even Java gets that right, for Gosling’s sake!

Python’s inclination to hurl exceptions at the slightest provocation has cured me of the last traces of a youthful folly that said you should write the happiest of happy paths inline and put the rarer cases in exception handlers. Exceptions are nasty things and shouldn’t be tossed around lightly and guard clauses are not much better. The PragProgs (again) have a coding kata that requires you to minimize the number of boundary conditions in the implementation of a linked list. It’s a fine aspiration and finessing boundary conditions seems to result in less complexity and complexity is where the bugs hide.

The one Python feature that I haven’t seen anywhere else is the tuple. They are said to be magnificent and the distinction between

[1,2,3]

and

(1,2,3)

is allegedly profound but so far the significance escapes me. I’d be delighted if a commenter would help me understand or point me to some other feature that would make them choose Python over Ruby.

All this harsh buzz over Python might make you wonder why I would be foolish enough to decide to choose Python rather than Ruby at my new gig. The answer is that there is a specific library, nltk, I needed to use.

The natural language toolkit does cool stuff like this:

text = 'Mary had a little lamb. Its fleece was white as snow.'
sentences = nltk.sent_tokenize(text)
>>> ['Mary had a little lamb.', 'Its fleece was white as snow.']

which is harder than it looks. Once you have your sentences, you can find the words and, teleporting back to 6th grade language arts (assuming you grew up in America) or first year Latin (if you didn’t) you can analyse the parts of speech with:

words = [nltk.word_tokenize(sentence) for sentence in sentences]
>>> [
  ['Mary', 'had', 'a', 'little', 'lamb', '.'],
  ['Its', 'fleece', 'was', 'white', 'as', 'snow', '.']
]
parts_of_speech = nltk.pos_tag(words[0])
>>> [('Mary', 'NNP'), ('had', 'VBD'),
    ('a', 'DT'), ('little', 'RB'), ('lamb', 'NN'), ('.', '.')]

Hmmm. I think Mr Hickey would’ve gone with adjective rather than adverb for ‘little’ there. So would I. Anyhoo…

Once you have your parts of speech, you can diagram the sentence automatically (ssshhhh. Don’t tell your middle school kids):

That’s gotta be handy for something, right?

Now that we are stuck with Python, we get to wrestle with Django which is like Rails but brought to you by the same people that thought def __init__(self): was a good idea. I’m sure it’ll be great when it catches up with the state of the art but, Dudes! A separate language for templating!? I’m already learning a new language? You’re gonna make me learn another one for generating HTML? Didn’t you learn anything from JSP?

I think the folks who decided that separate languages for templates are descended from the folks who thought separate drinking fountains were a good idea. Is it really easier for designer folks to type

<ul>
{% for slimmer in slimmers %}
    <li>{{ slimmer.name|lower }}</li>
{% endfor %}
</ul>

than

<ul>
{% for slimmer in slimmers %}
    <li>{{ slimmer.name.lower() }}</li>
{% endfor %}
</ul>

Suddenly that significant whitespace business doesn’t seem so clever, does it? But, seriously, separate is rarely equal when it comes to template languages and the soft bigotry of low expectations hurts those it aims to help.

Template languages are the one area where the microsofties are ahead of the game with their Razor template syntax. It reduces the number of angle brackets and other unwanted syntax by 83%. Guaranteed!

<ul>
@foreach (var slimmer in slimmers)
{
   <li>@slimmer.name.ToLower();</li>
}
</ul>

How, you might wonder, if you know all these languages, are you supposed to keep all the various syntaxes straight? The plain answer is… I don’t. I immediately forget everything I knew about the previous language about two weeks after I stopped using it. That makes for embarrassing interviews when they ask me a Java question and, despite having 12 years of Java on my resume, I can’t remember how to construct and initialize a List, or is that a Vector? Or an ArrayList? One of them, anyway.

Fortunately for the forgetful among us, there is JetBrains. Even more fortunately, they have just released a brilliant Python IDE, PyCharm, to go along with the also brilliant, RubyMine and IntelliJ. They also have the brilliant Resharper for the microsofties but you have to use it inside the not-quite-so-brilliant Visual Studio and they don’t get along entirely well together. They both enjoy a lot of memory consumption for a start.

PyCharm amazes me a little bit every day despite my 10 years of being amazed by JetBrains. The type inference system is, frankly, spooky. PyCharm knows the type of a variable that I merely whispered to a colleague the day before and knows all its methods and parameters, what it likes to have for lunch and its taste in science fiction. It handles renaming and more sophisticated refactorings even better than Resharper and it doesn’t even have .NET’s type system to help it along.

So. Python.

To summarize:

  • It’s not quite Ruby.
  • It’s jolly excellent at text mining.
  • It’s a lot nicer than C# (except in html templates) or Java.
  • PyCharm. Oh yeah.

I’m happy with our choice so far but ask me again when I get good enough to stop needing to refer to my cheat sheet every ten minutes. I might have a more informed opinion.