Coderific

rss feed

blog - debugging Erlang-style concurrency

posted by witten on May 27, 2007

Although Erlang is not a widely used language today, you'll find that many developers heap praise upon its concurrency model. Instead of traditional and error-prone shared-everything locking semantics, Erlang employs a sort of lock-free asynchronous message passing between threads. If you haven't yet read about this approach or need a refresher on the subject, I recommend the excellent article "Erlang Style Concurrency" at http://www.defmacro.org/ramblings/concurrency.html

As cool as this style of asynchronous concurrency happens to be, it is not without its own set of problems. In a previous post (http://coderific.com/blog/post/489), I argued that:

A hugely important and oft-overlooked factor in selecting a code library is the relative ease of tracking down and exterminating bugs when using it. If a library you're using makes debugging unnecessarily difficult, then you're going to spend more time hunting down bugs and less time actually writing code.

As it turns out, one of the problems introduced by Erlang-style concurrency is a decrease in debuggability. Let's say that you're used to standard Java-style locks-all-over-the-place synchronous concurrency. When something goes wrong, such as a thrown exception or a deadlock, you have easy access to a complete call stack. With one glance you can see which thread initiated the call and each function that was subsequently synchronously invoked, all the way down to where the exception actually occurred. There's no question about the order of execution because the execution is ordered.

But in a program written in the style of Erlang's asynchronous message passing, when something goes wrong, there's no call stack to fall back on for debugging purposes. Each successive passing of a message from one thread to another is performed without any blocking whatsoever, so as soon as a thread sends a message to its destination, that caller immediately goes on to do other things without waiting to find out whether the message eventually results in an exception of some sort. If an exception does occur within the destination, there's no record of how you got there.

The result is that naively written asynchronous concurrency code can be more difficult to debug than its synchronous lock-based counterpart. Sure, lock-based concurrency has its own host of debugging problems such as deadlocks and race conditions during access to shared data, but despite these problems you still have access to a call stack for ease of debugging.

Given this disparity in debuggability, some programmers shake their heads and resign themselves to writing lock-based concurrency in Java for the rest of their lives. However, neither the baby nor the bathwater requires throwing out just because Erlang-style concurrency makes it difficult to get at call graph data. The simple solution is to take the little bit of extra effort required to make that call graph data available to the programmer, even if it's not available "for free" as it is with synchronous concurrency.

You can start by just adding logging everywhere. The logging can optionally be compiled out in production code for performance reasons, but the general idea is that by looking at the program's log, you can get a sort of poor man's call stack as each successive thread passes messages. When thread A sends a message to thread B, log it. When thread B then, as a result, sends a message to thread C, log that too. And so on. That way with some basic log analysis you can discern the series of events just as if you had an actual stack dump. For instance:

thread A sent message M to thread B
thread B sent message N to thread C
thread C sent message O to thread D

Now this logging approach isn't entirely fool-proof. If you have multiple messages firing all at once, which is a fairly realistic use case for any program written with asynchronous concurrency, then the messages will be interleaved in the logs and you won't have any idea which messages fired as a result of which other messages:

thread A sent message M to thread B
thread A sent message N to thread B
thread B sent message O to thread C
thread C sent message P to thread D
thread B sent message Q to thread C
thread C sent message R to thread D

In this example, unless the contents of the messages themselves somehow indicate a correlation between successively fired messages, you'll have no idea which message sent from thread A to B caused which message to be sent from B to C.

So logging has its limitations as far as debuggability goes. Another slightly more complex solution involves actually building up a faux call stack associated with each path of messages. When thread A passes a message to thread B, tack on a bit of additional information with that message that basically includes the sender, message, and receiver. Then when thread B dispatches the message to one of its handlers and decides as a result to send its own message to thread C, tack on additional info for the new sender, message, and receiver. Repeat this, tacking on call information to each new message that is passed. Include as much or as little information as you think will be useful during debugging.

In this manner you can build up a record of the call graph as with the logging above, but the list of previous calls will be associated with each message individually rather than just dumped to the log for all messages. Then when an exception occurs, simply print out the simulated call stack for the message you've just received, and you'll have a full list of all the messages passed that got you to the handler you're currently in.

One concern with this scheme that immediately jumps to mind is performance. If you're doing all this extra work just to simulate a call graph that you would otherwise get for free with traditional synchronous concurrency, then is it really worth it? Well, think of it this way. When you use standard lock-heavy concurrency, your call graph isn't really free. Every time you make a function call, you've got to push a fair amount of data onto the stack. So in order to be able to dump the stack within a deeply nested synchronously invoked function, you're relying on call data that was copied to the stack previously. With the simulated call stack approach, you can store that exact same call data, and if you like, you can store it in the same way on a stack data structure of some sort. That should hopefully allay any concerns about performance differences between the two approaches.

Another valid concern is that the various simulated call stacks will grow too large and therefore eat up undue amounts of memory. Since calls with this approach aren't synchronous and therefore never return such that call data is popped off the stack, a series of asynchronous messages will just cause the simulated call stack to increase in size without bound. Whether this actually turns out to be a problem in practice depends on your message passing usage. If your program really does have a need for very long chains of message passing, such as with bounded but deeply nested recursion, then you could always truncate the simulated call graph at some maximum size. This will save some memory at the cost of make debugging slightly more difficult.

However if a series of passed messages in your program tends to terminate after a certain point, then you shouldn't run into unbounded call stack problems. This could look like thread A messaging thread B, which as a result messages thread C, which then decides not to do any message passing in its own handler. The simulated call stack could then at that point be manually or automatically freed, and so won't grow any further.

Hopefully this post has given you some ideas for making your own use of Erlang-style concurrency easier to debug, whether that involves Erlang itself, Python's generator-based microthreads, or something else entirely.

3 comments

Write a comment!
  • Re: debugging Erlang-style concurrency posted on May 28, 2007 01:21 AM

    Hello,

    i have been developing an erlang based application for about six months now. While you are right that there is no "system wide" stack trace, i never felt the need for them.

    You have two aspects when programming in erlang. Most code is straigth forward functional code which is easy to debug. Out of these pieces of code you build your processes, where each process still has a pretty simple flow of control. (main loop, state machine, ...)

    If there is a fault in one process, the bug is either within the process or results from a faulty message to that process.

    If you have structured your application by the OTP principles (which you should), you get the last state of the process and the last message to the process (and the call stack within that process). This is all you need to decide if the bug is within the process or the process sending the message.

    If you need to dig deeper there are various powerful (multi node, ...) tracing functions and a debugger.

    reply | quote

  • Re: debugging Erlang-style concurrency posted by andrewcooke on May 28, 2007 03:41 AM

    I have been wondering about this too. On a slightly different tack I have started identifying some habits/techniques that help avoid problems, or make them easier to diagnose:

    - Match for unexpected messages (and call error/2 when you receive them).

    - Use tagged tuples for messages (standard Erlang)

    - Use {name, payload} pairs for messages, where payload is a separate tuple. This allows matching on the name without hard-coding the payload structure in each pattern (eg you can match the payload using "_").

    - Fail deeply. When failure occurs, fall back to as basic a state as possible in your FSM.

    - Avoid invariants across processes. This is a hard one, because invariants are so useful, but maintaining an invariant across several threads requires detailed negotiation that cannot always "fail deeply".

    - Use explicit, restricted protocols. "Fail deeply" and "avoid invariants" drive the system towards a "permissive" protocol. This should be resisted as much as possible, since it becomes difficult to detect errors (ie "Match for unexpected messages" becomes less useful).

    reply | quote

  • Re: debugging Erlang-style concurrency posted by witten on May 28, 2007 10:41 AM

    someone wrote:
    i have been developing an erlang based application for about six months now. While you are right that there is no "system wide" stack trace, i never felt the need for them.

    I actually wrote this post originally not so much because I personally find a constant need for stack traces across processes/threads, but because the lack of such stack traces is one argument I've heard people make against Erlang-style concurrency.

    You have two aspects when programming in erlang. Most code is straigth forward functional code which is easy to debug. Out of these pieces of code you build your processes, where each process still has a pretty simple flow of control. (main loop, state machine, ...)

    Definitely the case with languages like Erlang. However the call stack idea might be more useful to users of imperative languages like Python or Java, who don't have the benefit of nice easy-to-debug functional code.

    If there is a fault in one process, the bug is either within the process or results from a faulty message to that process.

    If you have structured your application by the OTP principles (which you should), you get the last state of the process and the last message to the process (and the call stack within that process). This is all you need to decide if the bug is within the process or the process sending the message.


    I agree that the majority of the time, this sort of information is sufficient. But sometimes it's nice to not only know where the last message came from, but where the message that caused it came from, and where the message before that was sent from, and so on. A basic stack dump can save you from having to delve into the program with a debugger to obtain this sort of info manually.

    reply | quote