Spinlocks and You

Spinlocks are a building block of concurrent programs. As long as you have more than one actor in your system, you’re going to need to be able to control access. We use spinlocks to maintain mutual exclusion - if process 1 is changing something in memory, we want to prevent all other processes from doing so. The idea is so simple that it seems too easy.

A Basic (and Wrong) Spinlock

Here’s our first stab at a spinlock, badly written in C:

// lock_value is an integer that's shared between multiple threads
while (lock_value != 0) {
  // spin
}

lock_value = 1;

do_something();  // the critical section

lock_value = 0;

The general idea here is correct - we have some lock_value and we only want to allow a process into the critical section of code if another process doesn’t “hold the lock.” Holding the lock means that lock_value is set to a non-zero value. There are a few things that make this code bad. These things, coincidentally, are part of what make concurrent programming difficult.

The Operating System Scheduler

Operating systems have a scheduler, and the job of the scheduler is to schedule processes to run for some appropriate and arbitrary amount of time. The scheduler doesn’t care too much about what you’re doing - when it’s time for another process to run, your code will be paused, anything currently in the registers will be set aside, and that other process will start running. Eventually your process will get put back on the CPU and keep running. What happens if your process has made it through the while loop and is about to set lock_value to 1 when my code comes along, gets scheduled, and does the same thing? What if my code manages to finish do_something() before the scheduler runs again? At this point, I technically hold the lock. You think nobody holds the lock. If your code is scheduled again before mine, we could leave the system in an inconsistent state. This would be bad.

Multiple CPUs

When there’s more than one CPU, things get even worse - each CPU may have a separate cache. So if both of our processes are running at the same time, but on separate CPUs, then each process may be working with a separate copy of lock_value. No matter how carefully we construct our program, we can’t guarantee (with the above code) much. Two copies of our function could run on separate CPUs, believe that they both hold the lock, and then move forward, doing things that are potentially conflicting and dangerous to the system.

Atomic Operations

Hopefully you remember what we were talking about in the first paragraph - the goal is to make sure that we prevent other actors in our system from being in the critical section of the code. This is mutual exclusion. Instead of just testing the variable, we need some kind of atomic exchange. This can’t be accomplished without hardware support. Thankfully, modern computers provide this kind of support. On Intel hardware, we have an xchg assembly instruction that atomically exchanges the value in a register with the value stored in a lock variable. The value that was previously stored in the lock variable ends up back in the register, so xchg effectively returns the lock’s old value. Here’s what that means if we always exchange in a 1:

  • If nobody held the lock, lock_value now contains 1 and the xchg returns 0.
  • If the lock was already held, lock_value still contains 1 and the xchg returns 1.

The Compiler

There’s one other thing that can cause problems: the compiler. As part of optimization, compilers can rearrange code. That sounds terrible, right? To prevent the compiler from rearranging our code, we can use special primitives that force some ordering. On Linux-like systems, we can use CMM_LOAD_SHARED (see User-space RCU for more details on these primitives). We don’t need to worry about writes being re-ordered because we’re using our xchg function for the atomic write. The reason we use xchg instead of the equivalent store primitive is that we want to make sure the write is visible across all CPUs, not just our own CPU. Otherwise, we could have two processors each believing that their write succeeded - and that they hold the lock!

A Proper Spinlock

With those three things out of the way, we’re in a good place to make a working spinlock. Spinlocks are simple but, as you can see, there are subtleties to getting them working correctly. One assumption: we’re going to assume that we’ve written an xchg macro in C that will inline the right bits of assembly for us. If you don’t know what that means, it just means that we’ve written some code to call assembly language from within C. Here we go:

// lock_value is an integer shared across threads
while (xchg(&lock_value, LOCKED) != UNLOCKED) {
  // spin
}

// ... the critical section ...

// unlock, but discard the result
xchg(&lock_value, UNLOCKED);

That looks pretty simple, but we’ve had to take into account problems that can arise from hardware and compilers.

One More Thing…

There’s one more thing to keep in mind - this spinlock kind of sucks. The atomic exchange operation forces every other CPU to evict that variable from its cache, even if one of those CPUs already holds the lock. And so, the next time any other CPU goes to grab the lock, it’ll experience a cache miss and have to read that lock variable from memory. Most of us are used to thinking of memory as being incredibly fast, but compared to cache, memory is slow. Our spinlock will fire off xchg instructions rapidly until it manages to acquire the lock. On a busy multi-threaded system, this leads to a lot of cache misses caused by constantly evicting that lock variable from every CPU’s cache. Remember: there are ways to make spinlocks better and we’ll definitely want to use them, but for now, this is a good enough spinlock.

Log Gaol Padlock by Chris Fithall licensed under CC BY 2.0