Thursday, May 20, 2010

Scalable concurrent deterministic event processing via lazy topological sorting

Quite often in designing software you come across the case where you have a stream of events or messages and you have to correlate them to update a set of entities (distinctive state objects). In a financial system, you might have a stream of transaction events (not to be confused with transactions in the consistent/concurrent data sense) that are used to update multiple accounts (i.e. transfer money from one account to another) and in a video game you might have a set of things that happened (collisions, triggered events, timers, events from entities) that cause entities to be updated (rocket hits player, rocket explodes, player dies). Of course, in an MMORPG with a full economy system, you might have all of the above!

In general, processing events in a deterministic order per entity is also an important feature of these systems. In the case of a bank, it's usually illegal to process transactions out of order, because if I transferred money into an account and then paid a bill, I wouldn't want to end up in overdraw. For a game it may be important for reproducibility, which can be very important for debugging and keeping state in sync across a network for multi-player.

Doing this as a completely serial process is straightforward and easy. However, doing it concurrently can be quite a tricky problem, one that can be very hard to solve with fine grained locking. Interestingly enough, a simple traditional transactional system (as implemented in an RDBMS or Software Transactional Memory) alone isn't quite enough to handle the problem of deterministic ordering as transactions themselves will need to be processed in a deterministic order, which brings it back to a fully serial problem!

What we need is a lazily processed topological sort for events, where events are processed in dependent order and dependencies are implicit based on what entities must be consistent at the time each event is processed and the order the events arrive in. That sounds like quite a big problem to solve, especially with concurrency in the mix!

It turns out however that there is a solution, as long as you know what entities need to be processed for an event ahead of it being processed. Firstly, the system involves two processes running concurrently; the inherently serial event ordering (which may be split into two concurrent sub-processes) and the event processing, where multiple events can potentially be processed concurrently. I say "potentially" because in the worst case, every event involves the same entity, forcing serial processing.

Secondly, it turns out that once we have the system in place, we don't need a transaction system for consistency as with this solution each event is guaranteed to be processed consistently. This is good, because it means less overhead overall.

In the simplest and most naive terms, this ordering and consistency is accomplished by each entity having a FIFO queue. The event ordering process places events at the back of the queue for each entity the event processing for that event must touch. This is done in the order events arrive, to enforce the deterministic ordering of event processing.

The set of events for which this has been done are placed in a list (let's call it the "ordered" list). The ordering process will then poll through this list, checking which events have reached the front of all of the queues they have been placed in. These events may now be passed off for processing (on a work queue, or some such). This can be done concurrently to the event queuing done in the previous step.

The only thing that the processing must do (apart from the actual entity updates and event logic) is remove the events from the per-entity queues after the event has been processed. As such, queues must either be lock protected (although, only a single queue must be locked at a time, meaning no possibility of deadlocks) or using a lock-free algorithm (note this is a special case single provider/single consumer FIFO, as each entity may only be processed by on event processor at a time).

All these queues and such however impose an extra unnecessary overhead on processing. Luckily, we can virtualize the queues using a technique similar to the ticket spin-locks implemented in the Linux kernel. So in place of a queue, each entity has a ticket counter, one number indicating the ticket which is being processed, the other the back of the line. Instead of putting the event at the back of the queue, the event orderer takes a ticket from the back of the queue, incrementing the counter. The event processor will increment the "processing" counter on each entity after processing is completed (just like your number flashing up at the DMV) and during polling events which have all their tickets equal to the "processing" tickets of their entities will be dispatched for processing.

In fact, it turns out that if the counters are large enough they will never overflow in practice, only the *sum* of the tickets need to be stored in the event. It also turns out that there is only ever a single process that may increment each counter at any one time, so that only consistent atomic writes and reads are required, not even a full atomic increment. No locks, no composite atomic operations required (although your processing system may use them for dispatch and you may queue events that way)!

Another interesting thing about this algorithm is that it can be implemented so that event ordering is done on-demand when there are more events need to be processed or added. Given a queue of events to process for each "processing" thread, the ordering process can be invoked when the size of a processing queue falls below a certain threshold (with some sort of locking mechanism preventing two event ordering processes running at a time, although some finer grained locking mechanism can be used if only the polling part of the event ordering is done) and the ordering process can also be invoked when enough events have been added that need ordering. Doing it before the processing queue is completely empty means that other threads need not block waiting to get more events to process. In this way, all your event processing/queueing need only use limited size single producer/single consumer queues to work (for which there are some very fast lock-free implementations).

The main disadvantage of this method however is obvious; it's only useful if you know ahead of time what entities must be consistent for each event. Also, readers might block other readers, although in practice, this isn't a performance problem if you have lots of events that work on unrelated entities, because the other events can be processed as long as they're non-dependent. Finally, there is the potential for event ordering itself to become a bottleneck, although at this stage it is likely that some finer grained implementation of the ordering system and some micro-optimization can probably provide a reasonable boost.

Friday, February 5, 2010

The Very Real Economics of Climate Change

By now, most of us who aren't in denial understand why climate change is a problem. For those of you that don't I suggest coming here to an Australian summer and sitting in a car exposed to the sun. Then, everyone, read on.

The true economics of climate change show in the solutions proposed to it. The solutions that actually might be considered to work and are being used in common debate come at a high cost to the "consumer". That is, you and me, the people who buy energy, directly on indirectly.

Now, either by self interest or by ideology (and usually both) the powers that be that actually admit to the problem usually propose one of two effectual solutions; carbon taxes and emissions trading. These are respectively taxing carbon dioxide emissions directly and selling "permits" for people to emit a certain amount of carbon dioxide and allowing those permits to be traded. Both of these solutions put a direct cost impost on the consumer. In fact, I can guarantee that if either measure is passed, you'll see the cost on your power bill.

The unfortunate choice of the consumer, who right now can make very few decisions, is choosing between consuming less until the arrival of cheap efficient devices or lowering their standard of living. This isn't really a choice at all, it's a means test, relegating those who can pay to hanging on to their standard of living and those who can't to a worse one. The people who make decisions and expand market choices (such as choosing renewable energy) still make money, they aren't forced to change what they're doing or choose a lower emissions scheme. Electric generation is usually a monopoly in any location and other large carbon emitters are in the best case oligopolies.

Either way, the average person will be worse off, but those that made the decisions will still not be impacted apart from their consumption (just like the rest of us), which they can easily afford. What we really have to do is force the hand of the decision makers, those who can not only choose to give us lower emissions but who benefit from our consumption, to choose efficiency and clean energy over other forms.

In fact, with an emissions cap, carbon taxes and emissions trading don't even work to reduce emissions. Why? Because the cost is passed on to the consumer, who will likely consume the same amount of energy anyway.

If you want to reduce carbon emissions, the trick is to put the impost on the decision makers. Tax dividends, executive renumeration and capital gains in proportion to the carbon consumption of primary producers, heavy energy using industry and energy production. Make those with the power to make real decisions regarding carbon dioxide emissions pay for them directly.

This is the only recourse to real change.

Friday, January 8, 2010

A new hope... and by hope I mean blog.

I've decided to cancel my dreamhost subscription and switch to blogging here. All in all, it seems to be a positive move at a much more pleasant cost (free!).

Dreamhost had a bunch of goodies like shell access, subversion, unlimited MySQL database, oodles of space and a huge traffic capacity allowance. But really, in the end, it was all about the blog which I'm fairly sure can be done here just as well.