Erlang's typing system: less is more?

Posted on Tue 16 October 2007

Erlang has a very rudimentary typing system: there are some primitive types (atoms, binaries, integers and floats), functions (it's a functional language), aggregate types (lists and tuples) and a bunch of Erlangisms (pid, port, and reference, a kind of UUID). The record type is really a tuple in disguise: a record is a tuple whose first element is the record name, followed by the record fields in their declaration order.
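To make the "tuple in disguise" point concrete, here is a minimal sketch. The module name record_demo and the point record are made up for this illustration; the assertions only rely on standard record syntax and the element/2 and tuple_size/1 built-ins.

```erlang
-module(record_demo).
-export([demo/0]).

%% Hypothetical record, declared only for this illustration.
-record(point, {x, y}).

demo() ->
    P = #point{x = 1, y = 2},
    %% The record and the plain tagged tuple are the very same term.
    true = (P =:= {point, 1, 2}),
    %% The record name sits in position 1, so field x is element 2.
    1 = element(2, P),
    %% Record field syntax is compiled down to positional access.
    2 = P#point.y,
    %% One slot for the name plus one per declared field.
    3 = tuple_size(P),
    ok.
```

Calling record_demo:demo() returns ok if every match holds; any mismatch would crash with a badmatch error.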

And that's all. There is no way to define complex types as we are used to in traditional OO languages, nor is it possible to associate behavior with a data structure by means of a class (or a prototype in JavaScript). This is surprising for a long-time OO developer who spends a lot of time carefully crafting class diagrams. But I think this is actually part of Erlang's strength.

The lack of formally defined structure types is actually not much of a problem because of Erlang's powerful pattern matching features. Rather than indicating an expected type, you define an expected data layout by means of a pattern. I do miss native string and dictionary types, though.
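As a small sketch of what "a pattern instead of a type" looks like in practice, the function below accepts three data layouts. The module name shape_demo and the circle, rectangle and square tags are assumptions chosen for the example, not anything defined by Erlang itself.

```erlang
-module(shape_demo).
-export([area/1]).

%% Each clause describes the layout it accepts; no shape type is declared.
area({circle, Radius}) when is_number(Radius) ->
    math:pi() * Radius * Radius;
area({rectangle, Width, Height}) ->
    Width * Height;
area({square, Side}) ->
    Side * Side.
```

shape_demo:area({rectangle, 2, 3}) returns 6, while a term that fits none of the layouts, such as {triangle, 1, 2, 3}, fails with a function_clause error instead of being silently accepted.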

Also, despite being billed as a functional language, Erlang is also object-oriented. But the objects are not data structures plus code as in Java or C++: they are processes, Erlang's lightweight concurrency unit, and method calls are messages sent to processes. This is also reflected in modules, which are more than compilation units, since they can be required to implement a behaviour (a set of callback functions), which is roughly equivalent to a Java interface.

So we have a strong separation between behavior, which is described in processes; parameters, which are messages sent to processes; and behavioral state, which is the data held by a process. Behavior and state are "stuck" at a particular location in a process, and data moves around.
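The process-as-object idea can be sketched with a tiny counter. Everything here (the counter module, the increment and value messages) is invented for illustration; a real application would more likely build on the gen_server behaviour, but the bare version shows where the state lives.

```erlang
-module(counter).
-export([start/0, increment/1, value/1]).

%% "Constructor": spawn a process whose state is the current count.
start() ->
    spawn(fun() -> loop(0) end).

%% "Method" without a return value: an asynchronous message.
increment(Pid) ->
    Pid ! increment,
    ok.

%% "Method" with a return value: send a request, then wait for the reply.
%% The unique reference ties the reply to this particular request.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Count} -> Count
    end.

%% The state exists only as the argument of this loop; "changing" it
%% means recursing with a new value.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From, Ref} ->
            From ! {Ref, Count},
            loop(Count)
    end.
```

After C = counter:start() and counter:increment(C), a call to counter:value(C) returns 1. The caller never touches the count directly; it only exchanges messages with the process that holds it.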

Now there is something more: data is immutable. You can't change the contents of a tuple or an element of a list; you need to create a modified copy. What seems a waste of resources at first is really the key to building distributed applications: since data is immutable, you don't have to care whether the process you send it to is local or remote. The receiver can only consume the data, not modify it, and modification is exactly what would produce side effects that are very complex to handle transparently in a distributed setup.
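A short sketch of what "create a modified copy" means, using only the setelement/3 built-in and list construction. The module name immutable_demo is an assumption for the example.

```erlang
-module(immutable_demo).
-export([demo/0]).

demo() ->
    T1 = {a, b, c},
    %% setelement/3 returns a new tuple; T1 is left untouched.
    T2 = setelement(2, T1, z),
    {a, b, c} = T1,
    {a, z, c} = T2,
    L1 = [1, 2, 3],
    %% Prepending builds a new list that reuses L1 as its tail.
    L2 = [0 | L1],
    [1, 2, 3] = L1,
    [0, 1, 2, 3] = L2,
    ok.
```

immutable_demo:demo() returns ok, confirming that the original terms are still intact after the "updates".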

And the fact that there are no named data structures simplifies distribution a lot, since you don't have to care about the structure's definition being available and up to date on the remote machines. It's all primitive types.

And finally, there is no "null" in Erlang. An unspecified value must be handled explicitly with a special value rather than being implicitly allowed as in most languages. No more tests against null, no more NullPointerExceptions!
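One common way to express "no value" explicitly is a tagged result, sketched below. The module lookup_demo, its functions and the not_found atom are made up for this example; lists:keyfind/3 is the standard library call that returns either the matching tuple or false.

```erlang
-module(lookup_demo).
-export([find/2, describe/2]).

%% Hypothetical lookup over a list of {Key, Value} pairs.
%% Absence is an ordinary atom, not a null reference.
find(Key, Pairs) ->
    case lists:keyfind(Key, 1, Pairs) of
        {_, Value} -> {ok, Value};
        false -> not_found
    end.

%% The caller has to pattern match on both outcomes to get at the value.
describe(Key, Pairs) ->
    case find(Key, Pairs) of
        {ok, Value} -> {found, Value};
        not_found -> missing
    end.
```

lookup_demo:describe(b, [{a, 1}, {b, 2}]) returns {found, 2}, and lookup_demo:describe(c, [{a, 1}]) returns missing. Forgetting to handle not_found shows up as an immediate match failure at the call site rather than as a stray null much later.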

So in the end, Erlang's rudimentary typing system and immutable data, which are surprising at first, are actually key elements for what it was designed for: building large and robust distributed server applications that move around lots of data without modifying it much.

But Erlang is not a general-purpose language (I wouldn't want to build a number-crunching application with it), and this will IMO prevent its wider adoption, even in the massively distributed world that Web 2.0 has moved us into. Maybe a competitive advantage for the startups using it?


