Tuesday 11 February 2014

Parallel Delaunay Triangulation

Resource handling post will happen in a bit, as I got distracted after reading a fascinating paper entitled Localizing the Delaunay Triangulation and its Parallel Implementation (Renjie Chen/Craig Gotsman). The paper itself is quite googleable, although in some cases it hides behind a paywall.
After a few false starts, it translated to Haskell relatively cleanly. As ever, this is not terribly idiomatic but I thought it might be interesting to some people.

First, some elided code: I have some very, very simple linear algebra types kicking around of the form FloatN. They're designed to mimic the ones found in shading languages or in SIMD intrinsics rather than behave correctly, so you can multiply them together and add a scalar to them if you like.

Further, I cobbled together a quick k-d tree implementation for doing spatial queries. In this case, the only important one is fetching all the objects within a circle. For my purposes I'm only interested in periodic triangulations and corresponding Voronoi cells, and will be restricting the point distribution to be dense with a known minimum distance between points. Anticipating a move to a more efficient spatial structure at a later date, here's the interface I'm broadly coding against:

class SpatialStructure2D a where
  type Query a
  positionOf :: a -> Query a -> Float2
  queryCircle :: a -> Float2 -> Float -> [Query a]

In particular, the Query a type and positionOf function exist so some future structure can use indices into a flat array or vector of positional data, which will be useful when constructing vertex buffers later. Might be thinking too far ahead there.

On to the implementation! A quick record type holds the local Delaunay one-ring (vertices directly connected to the point under consideration in the triangulaiton) and the circumcircles described by this fan of triangles. The circumcenters obviously form our nice little Voronoi cell at the end.

data Delaunay = Delaunay {
  dlyPoint :: !Float2,
  dlyRing :: [Float2],
  vorCell :: [(Float2,Float)] } deriving (Eq, Show)

initDly :: Float2 -> Delaunay
initDly c = Delaunay {
  dlyPoint = c,
  dlyRing = p0,
  vorCell = zipWith (circumcircle c) p0 (rotateList p0)
  } where
    p0 = [float2 1.5 0.5, float2 0.5 1.5, float2 (-0.5) 0.5, float2 0.5 (-0.5)]

The utility function initDly just makes a massive quad around the test point so we can start from a known (but uselessly huge) triangulation. Some more tangential stuff:

circumcircle :: Float2 -> Float2 -> Float2 -> (Float2,Float)
circumcircle a b c = (cen,rad)
  where
    cen = circumcenter a b c
    rad = circumradius a b c

rotateList :: [a] -> [a]
rotateList [] = []
rotateList (x:xs) = xs ++ [x]

rotateUntil :: (a -> Bool) -> [a] -> [a]
rotateUntil _ [] = []
rotateUntil _ [a] = [a]
rotateUntil p xs = go (take (length xs) (iterate rotateList xs))
  where
    go [] = xs
    go (ys@(y:_) : yss)
      | p y = ys
      | otherwise = go yss

The functions to calculate the circumcenter and circumradius of a triangle are omitted because I shamelessly translated the implementations from here, and they're quite long and uninteresting. The list rotations are ugly but made things easy.

data Halfplane = Halfplane Float2 Float2 Float
halfplane :: Float2 -> Float2 -> Halfplane
halfplane v o = Halfplane o (normalise d) (0.5 * vecLen d) where d = v - o

halfplaneTest :: Halfplane -> Float2 -> Float
halfplaneTest (Halfplane o n d) x = d - (n `dot` (x - o))

inHalfplane :: Halfplane -> Float2 -> Bool
inHalfplane h x = halfplaneTest h x >= 0

Testing whether or not points are inside the halfplane defined by the bisector of two points is key to the incremental bit of the algorithm, where we test a point against the current local triangulation:

updateDT :: Delaunay -> Float2 -> Delaunay
updateDT locDT v = if inHvQ == q0 then locDT else locDT { dlyRing = p1, vorCell = q1 }
  where
    c = dlyPoint locDT
    p0 = dlyRing locDT
    q0 = vorCell locDT
    hv = halfplane v (dlyPoint locDT)
    passHv = inHalfplane hv . fst . fst
    failHv = not . passHv
    --ensure the prefix of the one-ring/voronoi polygon is in halfplane
    qp0 = rotateUntil passHv (rotateUntil failHv (zip q0 p0))
    (inHv, outHv) = span passHv qp0
    (inHvQ, inHvP) = unzip inHv
    --avoid strictness, profit
    dp0 = snd (head outHv)
    dp1 = head inHvP
    newQ0 = circumcircle c dp0 v
    newQ1 = circumcircle c v dp1
    q1 = newQ0 : newQ1 : inHvQ
    p1 = dp0 : v : inHvP

Again, ugly. Apart from the noise of packing and unpacking tuples, this code is essentially rotating both the one-ring and Voronoi cell until the circumcenters outside the halfplane under consideration comprise the suffix of the list. If no points fall outside the halfplane, the triangulation remains unchanged and we just return it. Otherwise we introduce the new point in place of those we removed and stitch everything together. Note that although we furiously and pointlessly rotate both the one-ring and Voronoi polygon, they never change winding order.

Finally, to actually calculate the local triangulation from scratch:

localDT :: (SpatialStructure2D a, Query a ~ Float2) => a -> Float -> Float2 -> Delaunay
localDT tree r c = updateFrom dly0
  where
    dly0 = foldl' updateDT (initDly c) (filter (/= c) v0)
    v0 = queryCircle tree c r
    updateFrom dly = if null vs then dly else updateFrom (updateDT dly (head $ head vs))
      where
        fl = filter (`notElem` (c: (dlyRing dly)))
        vss :: [[Float2]]
        vss = map (fl . uncurry (queryCircle tree)) (vorCell dly)
        vs = dropWhile null vss

Technically I should rewrite this to use the positionOf and relax the restriction on Query a. Oops. Anyway!
Because we know the minimum distance between points and that the point set is relatively dense, passing in an initial query radius to quickly cut the initial working set down to size is a huge win. Then all that is left is to try each circumcircle in the current triangulation - if it might add a new point, update the local triangulation and recurse. When every circumcircle is empty (excepting those points already in the triangulation), we're done.

Now all that we need to do is something like map (localDT tree (r*2.0)) points `using` parListChunk 32 rseq, where tree is a spatial structure containing our random points, r is the minimum distance between those random points... and we get parallel computation of all the interesting things.

Now, there are lots and lots of bits of code that are ugly and somewhat hacky in here, and it could certainly be optimised hugely (profiling suggests most of the time is spent traversing my k-d tree), this still manages to spit the periodic Voronoi diagram for ~64k points to a .obj file on disc in around ten seconds (on an 8-core i7). Should absolutely be sufficient for my purpose as it stands.

And courtesy of Blender and the subdivision modifier, here are those little cells with a bit of smoothing applied:

No comments: