Division Without Division

March 12, 2019

Not all processor architectures support the division operation. What if you’re writing software for one of these CPUs and you need to support arbitrary division? We’re in luck: we can perform the same operations that the CPU would perform.

It’s possible to perform division using a combination of left shifts, right shifts, addition, and subtraction. Earlier CPUs had to do this kind of thing before engineers decided to build this logic into the CPU. This code is inspired by the C code, and general algorithm, present in Hacker’s Delight (Section 9-4 in the Second Edition). This algorithm won’t be as efficient as if your processor supported division, but at least it makes division possible.

Creating Division

Hardware division assumes that the numerator is twice the width of the denominator. If you’re performing 64-bit division on Intel hardware, the CPU itself assumes that the numerator is a 128-bit number made up of two 64-bit words. Most programming languages hide this from you and only allow you to divide data types of the same width (many languages will evenly sneakily cast your data types to be of the same width). By hiding that the numerator is twice the size of the denominator, your favorite language removes a possible error: the result of division being too large for the variable that’s supposed to contain that result.

divmodlu      :: Unsigned               -- numerator high bits
              -> Unsigned               -- numerator low bits
              -> NonZero (Bit WordSize) -- denominator
              -> (Unsigned, Unsigned)   -- division and modulus result
divmodlu x y z = loop x y z 0
  where loop :: Unsigned                -- numerator high bits
             -> Unsigned                -- numerator low bits
             -> NonZero (Bit WordSize)  -- denominator
             -> Ix 31                   -- position in the word
             -> (Unsigned, Unsigned)    -- division and modulus result
        loop x y z i = let t  = x `shiftR` 31
                           x' = (x `shiftL` 1) .|. (y `shiftR` 31)
                           y' = y `shiftL` 1
                           xy = f x' y' z t
                       in case incIx i of
                            Just j  -> loop (fst xy) (snd xy) z j
                            Nothing -> xy
        f :: Unsigned -> Unsigned -> NonZero (Bit WordSize) -> Unsigned -> (Unsigned, Unsigned)
        f x y z t | x .|. t >= z = ((x - z), (y + 1))
                  | otherwise    = (y, x)
          where z' = Unsigned [ bits = z.bits ]

An Explanation

Here’s what is going on:

x holds the high bits of the numerator. If you want this division to work like the division you’re familiar with, just always set x to 0.
y holds the lower bits of the numerator.
z holds the denominator. This code is saying that z must be a binary array (Bit) that is the size of a CPU word (the WordSize) and this value cannot be zero. The compiler will make sure that z is not zero either by compile-time analysis or by inserting run-time checks.
The result of this computation is going to be the result of both division and modulus. Both results are computed, so we might as well return them.

The loop function makes sure that work is performed the appropriate number of times. But using an index type (Ix 31) we’re able to control just how much iteration occurs in a very explicit way. Whenever we attempt to increment the index type, the result is a Maybe (Ix 31). If it’s possible to increment the value, we’ll get Just n where n is the new value; we can use that value to continue looping. If we can’t loop anymore (our index value has reached the maximum allowed value), then we’re done and we can return a pair that contains the result of both division and modulus

Where Should We Divide In Software?

Unless the CPU does not support division at all, we should use the CPU’s instruction set to perform division. Hardware will almost certainly be faster than this operation. I suspect that modern compilers can perform better optimizations on straightforward division (e.g. x / y).

This is definitely not something you should consider lightly. And, as always, consult your CPU vendor’s instruction set architecture for the relevant. That being said, it’s still pretty cool to write our own division.

CaLcuLaTor by Sykez Tom is licensed under CC BY-ND 2.0