Brush Tasks

The backend for Blender’s sculpt system is a simple coarse bounding volume hierarchy, which we call PBVH. It’s a KD-tree that uniquely assigns vertices to leaf nodes. To access node vertices we have a rather hideous C macro, BKE_pbvh_vertex_iter_begin. All of this had lead to a code pattern that’s fairly verbose with lots of copy-and-pasting of code. For example, here is is the code for the smooth brush (scroll down for the improved version):

static void do_smooth_brush_task_cb_ex(void *__restrict userdata,
                                       const int n,
                                       const TaskParallelTLS *__restrict tls)
{
  SculptThreadedTaskData *data = static_cast<SculptThreadedTaskData *>(userdata);
  SculptSession *ss = data->ob->sculpt;
  Sculpt *sd = data->sd;
  const Brush *brush = data->brush;
  const bool smooth_mask = data->smooth_mask;
  float bstrength = data->strength;

  PBVHVertexIter vd;

  CLAMP(bstrength, 0.0f, 1.0f);

  //Copy and pasted code block!
  SculptBrushTest test;
  SculptBrushTestFn sculpt_brush_test_sq_fn = SCULPT_brush_test_init_with_falloff_shape(
      ss, &test, data->brush->falloff_shape);

  //Copy and pasted code block!
  const int thread_id = BLI_task_parallel_thread_id(tls);
  AutomaskingNodeData automask_data;
  SCULPT_automasking_node_begin(
      data->ob, ss, ss->cache->automasking, &automask_data, data->nodes[n]);

    BKE_pbvh_vertex_iter_begin (ss->pbvh, data->nodes[n], vd, PBVH_ITER_UNIQUE) {
      //Copy and pasted code!
      if (!sculpt_brush_test_sq_fn(&test, vd.co)) {
        continue;
      }

    //Copy and pasted code!
    SCULPT_automasking_node_update(ss, &automask_data, &vd);

    //Copy and pasted code.
    const float fade = bstrength * SCULPT_brush_strength_factor(
                                         ss,
                                         brush,
                                         vd.co,
                                         sqrtf(test.dist),
                                         vd.no,
                                       vd.fno,
                                         smooth_mask ? 0.0f : (vd.mask ? *vd.mask : 0.0f),
                                         vd.vertex,
                                         thread_id,
                                         &automask_data);
      if (smooth_mask) {
        float val = SCULPT_neighbor_mask_average(ss, vd.vertex) - *vd.mask;
        val *= fade * bstrength;
        *vd.mask += val;
        CLAMP(*vd.mask, 0.0f, 1.0f);
      }
      else {
        float avg[3], val[3];
        SCULPT_neighbor_coords_average_interior(ss, avg, vd.vertex);
        sub_v3_v3v3(val, avg, vd.co);
        madd_v3_v3v3fl(val, vd.co, val, fade);
        SCULPT_clip(sd, ss, vd.co, val);
        if (vd.is_mesh) {
          BKE_pbvh_vert_tag_update_normal(ss->pbvh, vd.vertex);
        }
      }
    }
    BKE_pbvh_vertex_iter_end;
  }

static void SCULPT_smooth(
    Sculpt * sd, Object * ob, Span<PBVHNode *> nodes, float bstrength, const bool smooth_mask)
{
  SculptSession *ss = ob->sculpt;
  Brush *brush = BKE_paint_brush(&sd->paint);

    SculptThreadedTaskData data{};
    data.sd = sd;
    data.ob = ob;
    data.brush = brush;
    data.nodes = nodes;
    data.smooth_mask = smooth_mask;
    data.strength = bstrength;

    TaskParallelSettings settings;
    BKE_pbvh_parallel_range_settings(&settings, true, nodes.size());
    BLI_task_parallel_range(0, nodes.size(), &data, do_smooth_brush_task_cb_ex, &settings);
}

What we want is something that’s more readable (and C++), something more like this:

void SCULPT_smooth_new(
    Sculpt * sd, Object * ob, Span<PBVHNode *> nodes, float bstrength, const bool smooth_mask)
{
  SculptSession *ss = ob->sculpt;
  Brush *brush = BKE_paint_brush(&sd->paint);

  blender::editors::sculpt::exec_brush(
      ob,
      nodes, nullptr,
      [&](auto &range) {
        for (auto &vd : range) {
          const float fade = bstrength * vd.falloff;

          if (smooth_mask) {
            float val = SCULPT_neighbor_mask_average(ss, vd.vertex) - *vd.mask;
            val *= fade * bstrength;
            *vd.mask += val;
            CLAMP(*vd.mask, 0.0f, 1.0f);
          }
          else {
            float avg[3], val[3];
            SCULPT_neighbor_coords_average_interior(ss, avg, vd.vertex);
            sub_v3_v3v3(val, avg, vd.co);
            madd_v3_v3v3fl(val, vd.co, val, fade);
            SCULPT_clip(sd, ss, vd.co, val);
            if (vd.is_mesh) {
              BKE_pbvh_vert_tag_update_normal(ss->pbvh, vd.vertex);
            }
          }
        }
      });
  //}
}

This is much better. But the challenge doesn’t stop with writing a cleaner task API.

Task Boundaries

We need to choose a good boundary for task splitting to efficiently use all CPU cores. Splitting tasks by PBVHNodes can be highly inefficient since the nodes themselves are too large to make good task boundaries. Oftentimes only one or two nodes will be under the user’s brush cursor.

The proper solution is to make leaf nodes smaller (this would also speed up ray casting quite a bit). Unfortunately GPU meshes live in leaf nodes, which limits how small we can make them.

The stopgap solution I implemented was to put all the verts in the nodes under the brush cursor into an array and then pre-filter it to throw out any that lie outside the brush; that way I can use any task grain size I want. Unfortunately while this is quite a bit faster in many cases if the user zooms out and applies the brush on the entire mesh (or is working with a particularly dense mesh) it becomes slower.

For this reason exec_brush is implemented with two back-ends, one that splits parallel tasks by nodes and another that builds a flat vertex list. This is why the main exec callback is a generic lambda, to facilitate the use of multiple iterator implementations.

Per-Node Custom Data

Often times when writing brushes you want to attach data to PBVHNodes. You might also want to do something to a list of nodes prior to the main execution task. This can be done via a special user-provided struct:

void SCULPT_smooth_new(
    Sculpt * sd, Object * ob, Span<PBVHNode *> nodes, float bstrength, const bool smooth_mask)
{
  SculptSession *ss = ob->sculpt;
  Brush *brush = BKE_paint_brush(&sd->paint);

  struct MyNodeData {
    bool modified = false;
  };

  blender::editors::sculpt::exec_brush<MyNodeData>(
      ob,
      nodes,
      /* Create per-node data. /
      [&](PBVHNode *node) {return MyNodeData()},
      [&](auto &range) { /* Main exec lambda. */
        for (auto &vd : range) {
          /* We have various options to thread-safely update
           * node_data->modified.  The safest is to make it an int
           * and use atomics. Note that x86 has implicit 
           * atomic stores, but of course we might be 
           * compiling on ARM. */
          vd.node_data->modified = true;
        }
      },
      [&](PBVHNode *node, MyNodeData *node_data) {
        if (node_data->modified) {
          /* Do something to node. */
        }
      })
}

PBVH Types and PBVHNode Vertex Iterators

Even though we have this nice high-level task interface, we still need an iterator for accessing PBVHNode vertices. This is not as easy as it looks since Blender’s sculpt mode actually has three different back-ends: one for simple meshes (PBVH_FACES), one for multires (PBVH_GRIDS) and one for dynamic topology (PBVH_BMESH). But thanks to the magic of templates it’s not too hard, either.

Clothoid Splines

A century ago engineers had very good and robust means of drafting 2D curves using specialized spline sets and curve templates (e.g. French curves). But these proved difficult to replicate in early computers; the need for fast, simple algorithms drove first Bezier, and then de Boor to embrace the idea of polynomial-based splines.

Creating a spline from independent polynomial functions along the x/y/z axes has a number of problems. Such splines cannot be represented in arc length form and, more importantly, controlling their curvature functions is not easy. Polynomial splines are computationally simple but mathematically complex.

Clothoids

FrenchCurve — Example of a polynomial clothoid

A better way to do splines is to start with a curvature function with the properties you want and integrate backwards. For example in civil engineering the three basic shapes are classical clothoids (whose curvature function is linear), circular arcs (constant curvature) and lines (zero curvature). Since these are all linear functions we can represent all three with a spline whose curvature function is simple linear interpolation:

$k = c_1(1.0-t) + c_2t$

Where $c_1$ , $c_2$ are constants and $t$ is a point on the curve in parameter space (It’s actually a bit more useful to use a cubic polynomial for curvature, but that’ll come later).

To integrate it in 2D, all we do is plug it into the following formula:

$th = \int_0^t k(t) dt$

$x = \int_0^t sin(th) dt$

$y = \int_0^t cos(th) dt$

Note that $th$ can be integrated analytically, but $x$ and $y$ must be integrated numerically. This isn’t terribly difficult on modern computers, especially since there’s a pretty easy to trick to integrating any arc-length-parameter plane curve.

So we now have a curve whose curvature starts at $c_1$ and ends at $c_2$ . But it isn’t very useful yet. We need to control the start and end points of the curve. For that, we just take a page from Raph Levien’s book and simply rotate, scale, and translate our curve to fit between whatever two points we want.

So now we have a function that takes two spatial points, $p_1$ and $p_2$ , and two curvature points, the point $t$ in parameter space. It outputs a point along the curve in space, along with the rotation and scaling necessary to fit the curve between its two endpoints.

But this isn’t good enough. We want to string several of these curves together, and we want to enforce constraints at shared vertices: tangential continuity, curvature continuity, whatever we want. For basic G2 continuity we have six constraints at the endpoints of each curve segment: the segment must span two points in space, have shared tangents with adjacent curves and shared curvatures. We’ve eliminated two of these constraints by how we’ve formulated the curve, leaving us with four. This leads us to Raph Levien’s idea of using a cubic polynomial for our curvature function. It doesn’t matter what form the polynomial takes; I like to use 1D Beziers myself, but you also use a four-segment piecewise-linear function if you want.

We can represent the cubic like this:

$lt(a, b, t) = a(1.0-t) + bt$

$cubic = lt(lt(lt(c_1, c_2, t), lt(c2, c3, t)), lt(lt(c_2, c_3, t), lt(c3, c4, t)), t)$

Which is the “intuitive” form of the Bezier polynomial. One can also use more traditional Bernstein basis functions to get the same equation:

$b(v, n) = binomial(n, v)x^v(1 - x)^(n-v)$

$cubic = b(0, 3)*x + b(1, 3)*x^2 + b(2, 3)*x^3$

Or we could dispense with interpolation formulas entirely and just use a raw cubic:

$cubic = c_1 + c_2*x + c_3*x^2 + c_4*x^3$

I like the Bezier (Bernstein) form as the curvature at the endpoints is equal to the first and last coefficients.

Numerical Integration

Integrating this curve isn’t terribly difficult. This is due to a nice property of arc-length parameter curves, where every point in $t$ space is equally spaced from the last.

$normal = conjugate(tangent) * k$

$dx = sin(th) dt$

$dy = cos(th) dt$

$d^2x = dy * k$

$d^2y = -dx * k$

Thus, the higher derivatives are all functions of the first derivative the curvature function and its derivatives. This makes it easy to construct a Taylor approximation.

$f(x) = dx * x - \frac{dy * k}{2}x^2$

$f(y) = dy * x + \frac{dx * k}{2}x^2$

Quadrature

However, a simple Taylor approximation will not work. We need to use it in a quadrature. For example, we might do something like this bit of python code:

def quadrature(t):
  steps = [some number of steps, e.g. 5]
  s = 0
  ds = t / steps
  x = 0
  y = 0
  
  for i in range(steps):
    k = curvature(s)
    th = [integral of k]
    
    dx = sin(th)
    dy = cos(th)

    x += dx*ds - 0.5*dy*ds*ds
    y += dy*ds + 0.5*dx*ds*ds

    s += ds
  return [x, y]

This might be wrapped in another function, that fits the clothoid between points p1 and p2:

def getpoint(t, p1, p2):
  start = quadrature(0.0)
  end = quadrature(1.0)
  vector = end - start
  
  angle = atan2(vector.y, vector.x)
  scale = length(p1 - p2) / length(vector)

  p = quadrature(t) - start
  p = rotate(p, angle)
  p = scale(p, scale)
  p = p + p1;

  return [p.x, p.y, scale, angle]

Practical Splines

Now it’s time to build a spline out of these curves. To do so we need some way of maintaining continuity of the vertices. The idea is to build a list of constraints, e.g. tangent constraint, curvature constraint, etc (see page 119 of Raph Levien’s phd thesis). Then use any non-linear constraint solver to solve them. The number of parameters in the curvature function should be equal to the maximum number of constraints.

Each constraint outputs an error value. For example, a tangent constraint would take two segments, and return the angle between the tangents at their shared vertex. A curvature constraint would return the difference between the curvatures at the vertex (made sure to put the curvature in global space by dividing by the scale that was returned in getpoint).

For a non-linear constraint solver, you can use pretty much anything. I like to use a modified version of the Geiss-Seidal solver from this game physics paper. It’s very flexible and doesn’t require that the number of constraints and parameters perfectly match. Raph Levien uses a simple Jacobian method of multiplying the parameters by the inverse of the constraint matrix Jacobian.

Results

What you get is a nice, pretty looking spline that’s incredibly versatile. It’s useful for everything from designing high-speed rail networks to vector graphics to even 2D animation.

Prologue: Why “Modern” Polynomial Splines Suck

It’s funny. A hundred years ago engineers had a perfectly good, mathematically robust means of drafting curves. Curvature was easy to control, with specialized spline sets and curve templates (e.g. French curves) that made smooth curved shapes easy to make.

And then Bezier came on the scene, followed by de Boor, and set engineering drafting back to the dark ages. Modern computers can replicate the robustness and ease of traditional drafting tools, but this wasn’t true at the dawn of CAD in the late 60s. The need for fast, simple algorithms drove first Bezier, and then de Boor to embrace the idea of polynomial-based splines.

Simple polynomial splines have an enormous number of problems. They can’t be represented in arc length form. It is extremely difficult to control their curvature, which is often “wavy.” But most of all it just doesn’t make sense mathematically to split curves into independent x, y, and z functions.

Traditional drafting understood this. The wooden splines of old guaranteed curvature continuity in a way that’s very difficult to do with simple polynomials today. Curve templates were based on the “ideal elastica”, which was alternatively called the clothoid, the Cornu spiral, the Euler spiro, and the French curve (in the art world).

Dither Masks With Controlled Noise Spectra

Sampling is the bane of computer graphics. Aliasing, accuracy, and noise must all be traded off against each other. A sampling method that works well for low sample counts might be inferior at high sample counts. As an example of the sort of problem sampling is meant to solve, take this simple grid:

grid1

None of the red points fall inside the red rectangle. This is known as aliasing. We can mitigate this by adding noise to the sampling points:

grid2

Now some of the red sampling points do fall within the red rectangle (I know, I know, I should have picked contrasting colors). But how do we “add noise” exactly? This is actually a fairly difficult problem in computer graphics. Let’s look at another problem, image stippling. Say we want to render an image entirely in black and white (no grey). To do so we compare every pixel in the image with a mask. If the pixel is brighter than the mask value we make it white, otherwise we make it black.

What should this mask look like? If we use uniform white noise, the results are terrible:

uniform

The stippling mask is in the top right corner (it’s tiled across the image). Luckily there is another, better type of noise: blue noise.

voidcluster

Much better! The mask in the upper right corner was generated using the “void and cluster” method, originally developed for the printing industry. Void and cluster generates so-called “blue noise”, with a 2D Fourier spectrum that looks like this:

voidcluster_dft

The line in the middle is a bug in my FFT code. There aren’t many good FFT libraries in JavaScript, my preferred language for prototyping research stuff. I could have used a DFT, but that would have been too slow for what I’m doing.

As you can see, the high frequencies in the middle are suppressed. This makes nice, visually pleasing noise patterns that aren’t too noticeable.

Methods for generating blue noise generally fall under two categories: ones that generate simple point sets and ones that generate progressive point sets suitable for image stippling, importance sampling in ray tracing, etc. Void and cluster falls under the latter category.

A lot of research has been done on generating simple point sets with blue noise properties, but virtually no research has been done for progressive sets since the 90s. An important exception is Wei’s Multi-Class Blue Noise Sampling, published in 2010. Playing around in this area is one of my hobbies. One of the challenges is simply to know what’s possible to achieve in terms of sampling properties. It’s relatively simple to make progressive blue noise sets in 2D; void-and-cluster is a pretty simple algorithm. As explained in this excellent blog post, however, things get more complicated in 3D. For higher dimensional noise it’s often desirable that the blue noise property hold in lower dimensions as well as higher ones. As explained in the blog post, this isn’t possible as a general matter. Trade-offs must be made.

Blue noise methods typically work by either enforcing a set of constraints or minimizing some function, but another method is to transform a uniform white noise distribution into blue noise by targeting the frequency spectra directly. This is actually how the first progressive blue noise algorithm, the predecessor to void-cluster, worked. It’s very computationally intensive and not very practical in higher dimensions, but it can help explore different Fourier spectra.

It’s pretty difficult to optimize the Fourier spectrum of point sets that aren’t aligned with a grid, mostly because there aren’t any FFT algorithms for this purpose (that I know off) and normal DFT is pretty slow. But if we restrict ourselves to void-cluster-style progressive point sets that generate points inside of grids then a really simple algorithm suggests itself:

Fill a grid (2D or otherwise) with white noise.
Compute the FFT.
Randomly swap some points.
Computer the FFT again. If the new FFT spectrum is closer to our target spectrum than the old one, keep the swapped points and the new FFT. Otherwise, unswap them and keep the old FFT.
Repeat from 3 until convergence.

Here’s an example of the output of this process, scaled up 4x:

fft_opt1

The target FFT is on the right, the generated point set is on the right and its FFT is in the middle. As you can see, the target FFT is a bit weird, and the point set isn’t terribly good, but that’s what this method is for: to see what sort of noise is generated by different FFT spectra and what sort of trade-offs we can make.

This is especially useful in 3D:

fftopt3d

Here we have four sets of slices of a 16×16 noise set and it’s associated FFTs. The top left is the 3D target FFT. The top right is the generated noise. The bottom left is the noise’s 3D FFT, while the bottom right are 2D FFTs for each slice of the noise along the X axis.

Versatile, eh? One must love brute-force stochastic optimization. So easy. Of course this method suffers from the “curse of dimensionality” like no other, but it’s not really meant for practical use, it’s just to see what’s possible.

Anyway, I plan on publishing the code for this soon, so watch out for that. Note that the patent on generating blue noise via FFT optimization recently expired, so no need to worry about that (I don’t even know if it would have covered my method or not, I’m not a patent lawyer).

First Post

I’ll be using this blog to post notes on various computer graphics research topics I’m interested in and/or have contributed to, e.g. clothoid splines, blue noise sampling, image stippling, ray tracing, CAD software, etc.