LARA

Barnes-Hut Simulation

In this assignment, you will implement the parallel Barnes-Hut algorithm for N-body simulation. N-body simulation is a simulation of a system of N particles that interact with physical forces, such as gravity or electrostatic force. Given initial positions and velocities of all the particles (or bodies), the N-body simulation computes the new positions and velocities of the particles as the time progresses. It does so by dividing time into discrete short intervals, and computing the positions of the particles after each interval.

Handout

Click here to obtain the handout for this assignment. Do not forget to create a branch for this assignment. Instructions on how to do so are available here.


Let us recall some basics of classical mechanics. What makes a particle, such as an atom, a comet or a star, move through space? First, the particle moves through space as a consequence of having some initial velocity compared to other particles. Second, the particle changes its velocity as a result of external forces. These forces are in direct correlation with the proximity of other particles. For example, Earth moves through space with some initial velocity, which is constantly being changed due to the gravitational force between the Earth and the Sun.

Before we study the Barnes-Hut algorithm for the N-body simulation problem, we will focus on a simpler algorithm – the direct sum N-body algorithm. The direct sum algorithm consists of multiple iterations, each of which performs the following steps for each particle:

Step 1

The particle position is updated according to its current velocity (`delta` is a short time period).

      x' = x + v_x * delta
      y' = y + v_y * delta
Step 2

The net force on the particle is computed by adding the individual forces from all the other particles.

      F_x = F_1x + F_2x + F_3x + ... + F_Nx
      F_y = F_1y + F_2y + F_3y + ... + F_Ny
Step 3

The particle velocity is updated according to the net force on that particle.

      v_x' = v_x + F_x / mass * delta
      v_y' = v_y + F_y / mass * delta

In this exercise, we will assume that the force between particles is the gravitational force from classical mechanics. Let's recall the formula for the gravitational force between two stellar bodies:

  F = G * (m1 * m2) / distance^2

Above, `F` is the absolute value of the gravitational force, `m1` and `m2` are the masses of the two bodies, and `r` is the distance between them. `G` is the gravitational constant.

The gravitational force vector always points towards the other body. The `F_x` and `F_y` components of the gravitational force on the body `m1` can be computed by observing the following triangle similarity:

  distance = math.sqrt((x2 - x1) ^ 2 + (y2 - y1) ^ 2)
  F_x / F = (x2 - x1) / distance
  F_y / F = (y2 - y1) / distance

This is shown in the following figure:

For each particle, the net force is computed by summing the components of individual forces from all other particles, as shown in the following figure:

The direct sum N-body algorithm is very simple, but also inefficient. Since we need to update `N` particles, and compute `N - 1` force contributions for each of those particles, the overall complexity of an iteration step of this algorithm is `O(N^2)`. As the number of particles grows larger, the direct sum N-body algorithm becomes prohibitively expensive.

The Barnes-Hut algorithm is an optimization of the direct sum N-body algorithm, and is based on the following observation:


If a cluster of bodies is sufficiently distant from a body A, the net force on A from that cluster can be approximated with one big body with the mass of all the bodies in the cluster, positioned at the center of mass of the cluster.


This is illustrated in the following figure:

To take advantage of this observation, the Barnes-Hut algorithm relies on a quadtree – a data structure that divides the space into cells, and answers queries such as 'What is the total mass and the center of mass of all the particles in this cell?'. The following figure shows an example of a quadtree for 6 bodies:

Above, the total force from the bodies B, C, D and E on the body A can be approximated by a single body with mass equal to the sum of masses B, C, D and E, positioned at the center of mass of bodies B, C, D and E. The center of mass `(massX, massY)` is computed as follows:

  mass = m_B + m_C + m_D + m_E
  massX = (m_B * x_B + m_C * x_C + m_D * x_D + m_E * x_E) / mass
  massY = (m_B * y_B + m_C * y_C + m_D * y_D + m_E * y_E) / mass

An iteration of the Barnes-Hut algorithm is composed of the following steps:

  1. Construct the quadtree for the current arrangement of the bodies.
    1. Determine the boundaries, i.e. the square into which all bodies fit.
    2. Construct a quadtree that covers the boundaries and contains all the bodies.
  2. Update the bodies – for each body:
    1. Update the body position according to its current velocity.
    2. Using the quadtree, compute the net force on the body by adding the individual forces from all the other bodies.
  3. Update the velocity according to the net force on that body.

It turns out that, for most spatial distribution of bodies, the expected number of cells that contribute to the net force on a body is `log n`, so the overall complexity of the Barnes-Hut algorithm is `O(n log n)`.


Now that we covered all the necessary theory, let's finally dig into the implementation! You will implement:

  1. a quadtree and its combiner data structure
  2. an operation that computes the total force on a body using the quadtree
  3. a simulation step of the Barnes-Hut algorithm

Since this assignment consists of multiple components, we will follow the principles of test-driven development and test each component separately, before moving on to the next component. That way, if anything goes wrong, we will more precisely know where the error is. It is always better to detect errors sooner, rather than later.

Data Structures

We will start by implementing the necessary data structures: the quadtree, the body data-type and the sector matrix. You will find the stubs in the `package.scala` file of the `barneshut` package.

Quadtree Data Structure

In this part of the assignment, we implement the quadtree data structure, denoted with the abstract data-type `Quad`. Every `Quad` represents a square cell of space, and can be one of the following node types:

  1. an `Empty` node, which represents an empty quadtree
  2. a `Leaf` node, which represents one or more bodies
  3. a `Fork` node, which divides a spatial cell into four quadrants

The definition of `Quad` is as follows:

  sealed abstract class Quad {
    def massX: Float
    def massY: Float
    def mass: Float
    def centerX: Float
    def centerY: Float
    def size: Float
    def total: Int
    def insert(b: Body): Quad
  }

Here, `massX` and `massY` represent the center of mass of the bodies in the respective cell, `mass` is the total mass of bodies in that cell, `centerX` and `centerY` are the coordinates of the center of the cell, `size` is the length of the side of the cell, and `total` is the total number of bodies in the cell.

Note that we consider the top left corner to be at coordinate (0, 0). We also consider the x axis to grow to the right and the y axis to the bottom.

The method `insert` creates a new quadtree which additionally contains the body `b`, and covers the same area in space as the original quadtree. Quadtree is an immutable data structure – `insert` does not modify the existing `Quad` object. Note that `Body` has the following signature:

  class Body(val mass: Float, val x: Float, val y: Float, val xspeed: Float, val yspeed: Float)

In this part of the exercise, you only need to know about body's position `x` and `y`.

Let's start by implementing the simplest `Quad` type – the empty quadtree:

  case class Empty(centerX: Float, centerY: Float, size: Float) extends Quad

The center and the size of the `Empty` quadtree are specified in its constructor. The `Empty` tree does not contain any bodies, so we specify that its center of mass is equal to its center.

Next, let's implement the `Fork` quadtree:

  case class Fork(nw: Quad, ne: Quad, sw: Quad, se: Quad) extends Quad

This node is specified by four child quadtrees `nw`, `ne`, `sw` and `se`, in the northwest, northeast, southwest and southeast quadrant, respectively. The constructor assumes that the children nodes that represent four adjacent cells of the same size and adjacent to each other, as in the earlier figure. The center of the `Fork` quadtree is then specified by, say, the lower right corner of the quadtree `nw`. If the `Fork` quadtree is empty, the center of mass coincides with the center.

Inserting into a `Fork` is recursive – it updates the respective child and creates a new `Fork`.

Finally, the `Leaf` quadtree represents one or more bodies:

  case class Leaf(centerX: Float, centerY: Float, size: Float, bodies: Seq[Body])
  extends Quad

If the `size` of a `Leaf` is greater than a predefined constant `minimumSize`, inserting an additonal body into that `Leaf` quadtree creates a `Fork` quadtree with empty children, and adds all the bodies into that `Fork` (including the new body). Otherwise, inserting creates another `Leaf` with all the existing bodies and the new one.

The Body Data-Type

Next, we can implement the `Body` data-type:

  class Body(val mass: Float, val x: Float, val y: Float, val xspeed: Float, val yspeed: Float) {
    def updated(quad: Quad): Body = ???
  }

Here, `xspeed` and `yspeed` represent the velocity of the body, `mass` is its mass, and `x` and `y` are the coordinates of the body.

The most interesting method on the `Body` is `updated` – it takes a quadtree and returns the updated version of the `Body`:

  def updated(quad: Quad): Body

This method is already half-completed for you – you only need to implement its nested method `traverse`, which goes through the quadtree and proceeds casewise:

  1. empty quadtree does not affect the net force
  2. each body in a leaf quadtree adds some net force
  3. a fork quadtree that is sufficiently far away acts as a single point of mass
  4. a fork quadtree that is not sufficiently far away must be recursively traversed

When are we allowed to approximate a cluster of bodies with a single point? The heuristic that is used is that the size of the cell divided by the distance `dist` between the center of mass and the particle is less than some constant `theta`:

  quad.size / dist < theta

Hint: make sure you use the `distance` to compute distance between points, the `theta` value for the condition, and `addForce` to add force contributions!

Before proceeding, make sure to run tests against your `Quad` and `Body` implementations.

The Sector Matrix

The last data structure that we will implement is the sector matrix. In this data structure, we will use the auxiliary class `Boundaries`, which contains the `minX`, `maxX`, `minY` and `maxY` fields for the boundaries of the scene:

  class Boundaries {
    var minX: Float
    var minY: Float
    var maxX: Float
    var maxY: Float
    def size = math.max(maxX - minX, maxY - minY)
  }

We will also rely on the `ConcBuffer` data structure, mentioned in the lecture:

  class ConcBuffer[T]

The `ConcBuffer` class comes with efficient `+=`, `combine` and `foreach` operations, which add elements into the buffer, combine two buffers and traverse the buffer, respectively. The sector matrix additionally has the `toQuad` method, which returns a quadtree that contains all the elements previously added with the `+=` method. Recall from the lectures that this combination of methods make the `ConcBuffer` a combiner.

The `SectorMatrix` is just a square matrix that covers a square region of space specified by the boundaries:

  class SectorMatrix(val boundaries: Boundaries, val sectorPrecision: Int) {
    val sectorSize = boundaries.size / sectorPrecision
    val matrix = new Array[ConcBuffer[Body]](sectorPrecision * sectorPrecision)
    for (i <- 0 until matrix.length) matrix(i) = new ConcBuffer
    def apply(x: Int, y: Int) = matrix(y * sectorPrecision + x)
  }

The `sectorPrecision` argument denotes the width and height of the matrix, and each entry contains a `ConcBuffer[Body]` object. Effectively, the `SectorMatrix` is a combiner – it partitions the square region of space into `sectorPrecision` times `sectorPrecision` buckets, called sectors.

Combiners such as the `SectorMatrix` are used in parallel programming to partition the results into some intermediate form that is more amenable to parallelization. Recall from the lecture that one of the ways to implement a combiner is by using a bucket data structure – we will do exactly that in this part of the exercise! We will add three methods on the `SectorMatrix` that will make it a combiner. We start with the `+=` method:

  def +=(b: Body): SectorMatrix = {
    ???
    this
  }

This method should use the body position, `boundaries` and `sectorPrecision` to determine the sector into which the body should go into, and add the body into the corresponding `ConcBuffer` object.

Next, we implement the `combine` method, which takes another `SectorMatrix`, and creates a `SectorMatrix` which contains the elements of both input `SectorMatrix` data structures:

  def combine(that: SectorMatrix): SectorMatrix

This method calls `combine` on the pair of `ConcBuffer`s in `this` and `that` matrices to produce the `ConcBuffer` for the resulting matrix.

The nice thing about the sector matrix is that a quadtree can be constructed in parallel for each sector. Those little quadtrees can then be linked together. The `toQuad` method on the `SectorMatrix` does this:

  def toQuad(parallelism: Int): Quad

This method is already implemented – you can examine it if you would like to know how it works.

Congratulations, you just implemented your first combiner (or second, if you've attended the live exercise sessions)! Before proceeding, make sure to run those unit tests.

Implementing Barnes-Hut

Now that we have all the right data structures ready and polished, implementing Barnes-Hut becomes a piece of cake.

Take a look at the file `Simulator.scala`, which contains the implementation of the Barnes-Hut simulator, and in particular the `step` method. The `step` method represents one step in the simulation:

  def step(bodies: Seq[Body]): (Seq[Body], Quad) = {
    // 1. compute boundaries
    val boundaries = computeBoundaries(bodies)
    
    // 2. compute sector matrix
    val sectorMatrix = computeSectorMatrix(bodies, boundaries)

    // 3. compute quadtree
    val quad = computeQuad(sectorMatrix)
    
    // 4. eliminate outliers
    val filteredBodies = eliminateOutliers(bodies, sectorMatrix, quad)

    // 5. update body velocities and positions
    val newBodies = updateBodies(filteredBodies, quad)

    (newBodies, quad)
  }

The pre-existing code in `step` nicely summarizes what this method does.

Computing the Scene Boundaries

First, we must compute the boundaries of all the bodies in the scene. Since bodies move and the boundaries dynamically change, we must do this in every iteration of the algorithm. The `computeBoundaries` method is already implemented – it uses the `aggregate` combinator on the sequence of bodies to compute the boundaries:

  def computeBoundaries(bodies: Seq[Body]): Boundaries = {
    val parBodies = bodies.par
    parBodies.tasksupport = taskSupport
    parBodies.aggregate(new Boundaries)(updateBoundaries, mergeBoundaries)
  }

How does this work? The `aggregate` method divides the input sequence into a number of chunks. For each of the chunks, it uses the `new Boundaries` expression to create the accumulation value, and then folds the values in that chunk calling `updateBoundaries` on each body, in the same way a `foldLeft` operation would. Finally, `aggregate` combines the results of different chunks using a reduction tree and `mergeBoundaries`.

So, we need the `updateBoundaries` method:

  def updateBoundaries(boundaries: Boundaries, body: Body): Boundaries

Given an existing `boundaries` object and a body, the `updateBoundaries` updates the `minX`, `minY`, `maxX` and `maxY` values so that the boundaries include the body.

Next, the `mergeBoundaries` method creates a new `Boundaries` object, which represents the smallest rectangle that contains both the input boundaries:

  def mergeBoundaries(a: Boundaries, b: Boundaries): Boundaries

Question: Is `mergeBoundaries` associative? Is it commutative? Does it need to be commutative?

Implement these two methods, and test that they work correctly!

Building the Quadtree

Next, we need to build a `Quad` tree from the sequence of bodies. We will first implement the `computeSectorMatrix` method to get the `SectorMatrix`:

  def computeSectorMatrix(bodies: Seq[Body], boundaries: Boundaries): SectorMatrix

Hint: aggregate the `SectorMatrix` from the sequence of bodies, the same way it was used for boundaries. Use the SECTOR_PRECISION constant when creating a new `SectorMatrix`.

Test that these methods work correctly before proceeding!

Eliminating Outliers

During the execution of the Barnes-Hut algorithm, some of the bodies tend to move far away from most of the other bodies. There are many ways to deal with such outliers, but to keep things simple, we will eliminate bodies that move too fast and too far away.

We will not go into details of how this works, but if you'd like to know more, you can try to understand how the `eliminateOutliers` method works.

Updating Bodies

The `updateBodies` method uses the quadtree to map each body from the previous iteration of the algorithm to a new iteration:

  def updateBodies(bodies: Seq[Body], quad: Quad): Seq[Body]

Recall that we already implemented the `updated` method which updates a single body.

Running Barnes-Hut

At last, the parallel Barnes-Hut algorithm is implemented. Note that, despite all the work, we kept our Barnes-Hut algorithm implementation simple and avoided the details that a more realistic implementation must address. In particular:

  1. we represented each body as a single point in space
  2. we restricted the simulation to two-dimensional space
  3. we ignored close encounter effects, such as body collision or tearing
  4. we ignored any relativistic effects, and assume classical mechanics
  5. we ignored errors induced by floating point computations

You can now run it as follows:

  > runMain barneshut.BarnesHut

To visualize the quadtree, press the Show quad button, and then hit the Start/Pause button.

Play with the parallelism level and the number of bodies, and observe the average speedups in the lower right corner. Then sit back, and enjoy the show!