Axel Paris - Graphics Researcher

Terrain Erosion on the GPU

April 15, 2018. Last updated on June 21, 2019.

I have been playing with different types of terrain erosion lately, and one thing I would like to do is implement everything I do on the GPU. Erosion algorithms take many iterations to converge and are very costly when run on the CPU. Most of these algorithms lend themselves to parallelism: many have been implemented on the GPU, but an open source implementation is not always available.

This is the first article of a series about terrain erosion and procedural generation. I will try to implement the things I find the most interesting, both on CPU and GPU to compare results. Let's start by taking a look at the state of the art on terrain erosion.

State of the Art

There are different types of erosion. Musgrave was the first to show results on both Thermal and Hydraulic erosion. These algorithms were ported to the GPU by Št’ava in 2008 and Jako in 2011. In this post, we will focus on Thermal Erosion and its implementation on the GPU. Hydraulic Erosion might be the subject of a future post.

Thermal Erosion

Thermal erosion is based on the repose or talus angle of the material. The idea is to transport a certain amount of material (mostly sediment) in the direction of the slope whenever the local slope angle exceeds the talus angle defined for the material. Repeating this process moves matter downhill until the terrain has no slope steeper than the talus angle. If you want an in-depth article explaining the concept of thermal erosion, I recommend reading Olsen 2004, which also presents a simple optimized version that gives almost identical visual results. Luckily, the algorithm is easily portable to the GPU: in fact, the core algorithm is almost identical to the CPU version. The difficulty lies in which buffers we use, how many we use, and how much we care about race conditions. I will first describe the CPU implementation and code structure.
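To make the stability test concrete, here is a minimal sketch of the talus check. The helper names TalusSlope and IsUnstable are my own for illustration and do not come from the code discussed in this post; the sketch only assumes that slope is measured as height difference over horizontal distance.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: converts a talus angle in degrees into the slope
// threshold (rise over run) used by the stability test.
inline float TalusSlope(float angleDegrees)
{
    assert(angleDegrees > 0.0f && angleDegrees < 90.0f);
    const float pi = 3.14159265358979f;
    return std::tan(angleDegrees * pi / 180.0f);
}

// Hypothetical helper: returns true when the slope between a point and one
// of its neighbours is steeper than the threshold.
// heightDiff: height of the current point minus height of the neighbour.
// distance:   horizontal distance between the two points.
inline bool IsUnstable(float heightDiff, float distance, float tanThreshold)
{
    assert(distance > 0.0f);
    return heightDiff / distance > tanThreshold;
}
```

With a threshold of 0.6 and a horizontal distance of 1 meter, a drop of 0.8 meter is unstable while a drop of 0.5 meter is not. A neighbour that is higher than the current point gives a negative slope and is never unstable.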

CPU Implementation

Most of the work done in Terrain Modeling uses a heightfield structure: a regular grid where every vertex stores a height. You can find many articles on the web describing such a structure (which is just an array and a bounding box in world space), so I will not go into details on the implementation. Just keep in mind that we use a one-dimensional array to store the heights and access it in 2D by combining the coordinates and the grid dimension: index1D = i * n + j for a grid vertex at coordinate (i, j) in a grid of n × n vertices.
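As a small illustration of this mapping, here is a sketch of the conversion in both directions. The GridIndex struct and the free function signatures are assumptions of mine; the arithmetic is the same as index1D = i * n + j, and the inverse matches the division and modulo used later in the compute shader.

```cpp
#include <cassert>

// Hypothetical pair of grid coordinates, introduced only for this example.
struct GridIndex
{
    int i;
    int j;
};

// Maps a 2D grid coordinate to its position in the 1D height array.
inline int ToIndex1D(int i, int j, int n)
{
    assert(i >= 0 && i < n && j >= 0 && j < n);
    return i * n + j;
}

// Inverse mapping: recovers (i, j) from a position in the 1D array.
inline GridIndex ToIndex2D(int index, int n)
{
    assert(index >= 0 && index < n * n);
    return { index / n, index % n };
}
```

For a 4 × 4 grid, the vertex (2, 3) is stored at index 2 * 4 + 3 = 11, and index 11 maps back to (2, 3).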

In research papers, you might find a different structure called layer field, which stores multiple heights for multiple materials using multiple heightfields. The layer field has the advantage of representing different materials, such as bedrock, sediment or vegetation, and the interactions between them.

Although a layer field is more appropriate (because of its dedicated sediment layer), for the sake of simplicity we will use a single heightfield structure to represent our terrain.
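For reference, a layer field could be sketched roughly as follows. This is not used in the rest of the post: the LayerField name, the choice of exactly two layers (bedrock and sediment) and the TakeSediment helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical two-layer field: bedrock with a sediment layer on top.
struct LayerField
{
    int n;                       // Grid size (n x n vertices)
    std::vector<float> bedrock;  // Bedrock height per vertex
    std::vector<float> sediment; // Sediment thickness per vertex

    explicit LayerField(int size)
        : n(size), bedrock(size * size, 0.0f), sediment(size * size, 0.0f)
    {
        assert(size > 0);
    }

    // The visible surface height is the sum of all layers.
    float GetHeight(int i, int j) const
    {
        const int index = i * n + j;
        return bedrock[index] + sediment[index];
    }

    // Removes up to `amount` of sediment at (i, j) and returns what was
    // actually taken, so the bedrock is never eroded by this call.
    float TakeSediment(int i, int j, float amount)
    {
        float& s = sediment[i * n + j];
        const float taken = amount < s ? amount : s;
        s -= taken;
        return taken;
    }
};
```

With a dedicated sediment layer, an erosion step can only move loose material, which is the main thing we give up by using a single heightfield.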

struct HeightField
{
    std::vector<float> heights; // Array of heights, stored row by row
    Box2D domain;               // A world space box
    int n;                      // Grid size (n x n vertices)

    float GetHeight(int i, int j) const
    {
        return heights[i * n + j];
    }
};
A simple implementation of the HeightField structure to represent terrains. As a side note, heightfields are also used to represent water surfaces, where height is also a function of time.

The core loop of Thermal Erosion is quite simple: loop through all terrain vertices, detect unstable points and stabilize them by moving matter in the slope direction. The version below is an implementation of the steepest method, where matter is only moved in the steepest direction. A slightly more complicated version exists where matter is distributed among all neighbours below the current grid point, in proportion to their relative slope.
void ThermalErosionStep()
{
    const float tanThresholdAngle = 0.6f; // ~31°
    const float cellSize = domain.Vertex(0)[0] - domain.Vertex(1)[0];
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            float zp = GetHeight(i, j);
            int iSteepest = -1, jSteepest = -1;
            float steepestSlope = -1;
            int steepestIndex = -1;
            for (int k = 0; k < 8; k++)
            {
                int in, jn;
                Next(i, j, k, in, jn);
                if (in < 0 || in >= n || jn < 0 || jn >= n) // Outside terrain
                    continue;
                float zDiff = zp - GetHeight(in, jn);
                if (zDiff > 0.0f && zDiff > steepestSlope)
                {
                    steepestIndex = k;
                    steepestSlope = zDiff;
                    iSteepest = in;
                    jSteepest = jn;
                }
            }
            // Skip points with no lower neighbour, then test the slope against the threshold
            if (steepestIndex >= 0 && steepestSlope / (cellSize * length8[steepestIndex]) > tanThresholdAngle)
            {
                Remove(i, j, 0.1f);
                Add(iSteepest, jSteepest, 0.1f);
            }
        }
    }
}
CPU Implementation of a Thermal Erosion step. Note that this is not strictly correct because our points are not sorted by decreasing height, therefore we could be adding matter to a point that will not be stabilized later. This function should be used in a loop to make sure all points are stabilized.
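The proportional variant mentioned earlier can be sketched in the same spirit. This is my own reading of "distributed based on the relative slope", not code from the repository: the function name, its parameters and the neighbour offset tables are assumptions, and it works on a plain row-major array rather than the HeightField structure so that it stays self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One proportional thermal erosion step on an n x n row-major grid.
// Returns the number of unstable vertices that moved material.
int ThermalErosionStepProportional(std::vector<float>& heights, int n,
                                   float cellSize, float tanThreshold,
                                   float amplitude)
{
    assert(static_cast<int>(heights.size()) == n * n && cellSize > 0.0f);
    // Assumed offsets of the 8 neighbours of a grid vertex.
    static const int di[8] = { -1, -1, -1, 0, 0, 1, 1, 1 };
    static const int dj[8] = { -1, 0, 1, -1, 1, -1, 0, 1 };
    int moved = 0;
    for (int i = 0; i < n; i++)
    {
        for (int j = 0; j < n; j++)
        {
            float slopes[8];
            float slopeSum = 0.0f;
            float maxSlope = 0.0f;
            for (int k = 0; k < 8; k++)
            {
                slopes[k] = 0.0f;
                const int in = i + di[k];
                const int jn = j + dj[k];
                if (in < 0 || in >= n || jn < 0 || jn >= n) // Outside terrain
                    continue;
                const float dist = cellSize * std::sqrt(float(di[k] * di[k] + dj[k] * dj[k]));
                const float slope = (heights[i * n + j] - heights[in * n + jn]) / dist;
                if (slope > 0.0f) // Only lower neighbours receive matter
                {
                    slopes[k] = slope;
                    slopeSum += slope;
                    maxSlope = std::fmax(maxSlope, slope);
                }
            }
            if (maxSlope <= tanThreshold)
                continue; // Stable point
            // Remove matter here and share it among the lower neighbours,
            // each one receiving a fraction proportional to its slope.
            heights[i * n + j] -= amplitude;
            for (int k = 0; k < 8; k++)
            {
                if (slopes[k] > 0.0f)
                    heights[(i + di[k]) * n + (j + dj[k])] += amplitude * slopes[k] / slopeSum;
            }
            moved++;
        }
    }
    return moved;
}
```

Like the steepest version, this sketch updates the grid in place and should be called in a loop; the return value can serve as a stopping criterion once it reaches zero.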

The erosion amplitude is usually very small (~0.05/0.1 meter) so that the stabilization process can eventually converge (without infinitely moving matter from one cell to another). The lower the amplitude, the more iterations will be required to get a stabilized terrain. The Next(i, j, k, in, jn) function above returns the index (in, jn) of the k-th neighbour in the 1-ring neighborhood of the point (i, j). The figure below shows the 1-ring neighborhood of a grid vertex.
The 1-ring neighborhood of the point (i, j) is computed with the Next() function in the example above, and k refers to the index of the neighbour as seen in the figure.
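Here is one possible sketch of Next() together with the length8 table used in the CPU loop. The neighbour ordering is an assumption: I start at (i + 1, j) and alternate between axis-aligned and diagonal neighbours, which may not match the ordering of the figure. The nextI and nextJ tables are hypothetical names.

```cpp
#include <cassert>

// Assumed offsets of the 8 neighbours, alternating axis-aligned and diagonal.
static const int nextI[8] = { 1, 1, 0, -1, -1, -1, 0, 1 };
static const int nextJ[8] = { 0, 1, 1, 1, 0, -1, -1, -1 };

// Distance to each neighbour in cell units, in the same order as above:
// 1 for axis-aligned neighbours, sqrt(2) for diagonal ones.
static const float length8[8] = {
    1.0f, 1.41421356f, 1.0f, 1.41421356f,
    1.0f, 1.41421356f, 1.0f, 1.41421356f
};

// Writes the coordinates of the k-th neighbour of (i, j) into (in, jn).
// The result can lie outside the grid; the caller has to check the bounds.
inline void Next(int i, int j, int k, int& in, int& jn)
{
    assert(k >= 0 && k < 8);
    in = i + nextI[k];
    jn = j + nextJ[k];
}
```

Whatever ordering is chosen, the only requirement is that length8[k] matches the offset of neighbour k, so that the slope is computed with the right distance.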

GPU Implementation

The race condition

GPUs are parallel by nature: hundreds of threads are working at the same time. Thermal erosion needs to move matter from one grid point to another, and we can't know which one in advance. Therefore, multiple threads can be adding or removing height on the same grid point at the same time. This is called a race condition, and it needs to be solved in most cases. Sometimes, however, we are lucky: after trying a few versions of the algorithm, I found that the best solution was to simply not care about the race condition.

The solution(s)

There are multiple ways to solve this problem. My first implementation used a single integer buffer to represent height data on the GPU. I had to use integers because the core GLSL atomicAdd function doesn't exist for floating point values. Heights were then converted to floating point data at the end on the CPU. This solution worked and was faster than the CPU version, but could only handle erosion at large scale (amplitude > 1 meter) because of the integers. This version is called "Single Integer" in the results graph below.
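The integer limitation can be illustrated on the CPU with std::atomic. This is only an analogy to the shader, not the shader itself: QuantizeAmplitude, MoveMatter and the unitsPerMeter parameter are names I made up for the example, and the idea of a finer unit than one meter is an assumption that I did not benchmark.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <vector>

// Converts an amplitude in meters to integer height units.
// unitsPerMeter = 1 stores whole meters, as in the integer version above.
inline int QuantizeAmplitude(float amplitudeMeters, float unitsPerMeter)
{
    assert(unitsPerMeter > 0.0f);
    return static_cast<int>(std::lround(amplitudeMeters * unitsPerMeter));
}

// Moves `units` of height from one cell to another. Each update is atomic,
// so concurrent calls on the same cells never lose an update.
inline void MoveMatter(std::vector<std::atomic<int>>& heights,
                       int from, int to, int units)
{
    heights[from].fetch_sub(units);
    heights[to].fetch_add(units);
}
```

With whole meters as the unit, an amplitude of 0.1 meter quantizes to 0 and nothing moves, which is why this version only works at large scale.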

In my next attempt I used two buffers: a floating point buffer to represent our height field data, and an integer buffer to allow the use of the GLSL atomicAdd function. The floating point values were handled with the intBitsToFloat and floatBitsToInt functions. You also have to use a barrier to make sure your return buffer is filled with the correct final height. This solution worked as intended and was also faster than the CPU version, but slower than my previous implementation because of the use of two buffers. The main advantage of this method is that we are no longer limited by the use of integers. This version is called "Double buffer" in the results graph below.
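For comparison, a common way to get an atomic floating point add out of integer atomics is a compare-and-swap loop on the bit pattern. This is a different technique from my two-buffer version, shown here on the CPU as a sketch; in GLSL the same loop would be built on atomicCompSwap with floatBitsToUint and uintBitsToFloat. The function names below are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Reinterprets the bits of a float as an unsigned integer, and back.
inline std::uint32_t FloatBitsToUint(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float UintBitsToFloat(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomically adds delta to a float stored as raw bits in an integer cell.
// Returns the value that was stored before the addition.
inline float AtomicAddFloat(std::atomic<std::uint32_t>& cell, float delta)
{
    assert(std::isfinite(delta));
    std::uint32_t expected = cell.load();
    while (true)
    {
        const float current = UintBitsToFloat(expected);
        const std::uint32_t desired = FloatBitsToUint(current + delta);
        // On failure, `expected` is refreshed with the latest stored bits
        // and the addition is retried on top of that value.
        if (cell.compare_exchange_weak(expected, desired))
            return current;
    }
}
```

The retry loop is what makes this kind of approach slower than a plain write when many threads target the same cell.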

My last idea was the one that I should have tried in the first place: simply ignore the race condition and use a single floating point buffer to represent height data. Of course, the result will not be deterministic and will contain errors because of race conditions, but in the end the algorithm converges to the same visual result after a few more iterations. The results are very similar to the other methods, and this is the fastest and simplest method that I implemented. Here is a code snippet of the last method:

#version 430

layout(binding = 0, std430) coherent buffer HeightfieldDataFloat
{
    float floatingHeightBuffer[];
};

uniform int gridSize;
uniform float amplitude;
uniform float cellSize;
uniform float tanThresholdAngle;

bool Inside(int i, int j)
{
    if (i < 0 || i >= gridSize || j < 0 || j >= gridSize)
        return false;
    return true;
}

int ToIndex1D(int i, int j)
{
    return i * gridSize + j;
}

layout(local_size_x = 1024) in;
void main()
{
    uint id = gl_GlobalInvocationID.x;
    if (id >= floatingHeightBuffer.length())
        return; // Extra invocations of the last work group

    float maxZDiff = 0;
    int neiIndex = -1;
    int i = int(id) / gridSize;
    int j = int(id) % gridSize;
    // k and l step by 2, so only the four diagonal neighbours are visited
    for (int k = -1; k <= 1; k += 2)
    {
        for (int l = -1; l <= 1; l += 2)
        {
            if (Inside(i + k, j + l) == false)
                continue;
            int index = ToIndex1D(i + k, j + l);
            float h = floatingHeightBuffer[index];
            float z = floatingHeightBuffer[id] - h;
            if (z > maxZDiff)
            {
                maxZDiff = z;
                neiIndex = index;
            }
        }
    }
    // Unsynchronized read-modify-write: this is where the race condition is ignored
    if (maxZDiff / cellSize > tanThresholdAngle)
    {
        floatingHeightBuffer[id] = floatingHeightBuffer[id] - amplitude;
        floatingHeightBuffer[neiIndex] = floatingHeightBuffer[neiIndex] + amplitude;
    }
}
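On the host side, the shader has to be dispatched with enough work groups to cover every vertex. Here is a minimal sketch of that computation, assuming one invocation per vertex and the local size of 1024 declared in the shader; WorkGroupCount is a hypothetical helper, not a function from the repository.

```cpp
#include <cassert>

// Number of work groups needed so that groups * localSize >= vertexCount.
// The result would typically be passed as the first argument of
// glDispatchCompute(groups, 1, 1).
inline unsigned int WorkGroupCount(unsigned int vertexCount, unsigned int localSize)
{
    assert(localSize > 0);
    return (vertexCount + localSize - 1) / localSize;
}
```

A 1024 × 1024 grid needs exactly 1024 groups of 1024 invocations. When the vertex count is not a multiple of the local size, the last group has spare invocations, which is why the shader starts by comparing its id against the buffer length.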

You can see some results in the following figures.
The base height fields on the left and the results of three hundred thermal erosion iterations on the right.


I ran a quick benchmark to compare all the methods I tried. Here are the results after 1000 iterations:
On the left, a comparison between all the methods on small grid resolutions. On the right, bigger resolutions without the CPU version. All times are in seconds. I didn't try to increase the grid resolution past 1024 on the CPU because it took too much time, hence the two separate graphs.

As expected, the single floating point buffer is the most efficient one: there is no conversion back and forth between integers and floats, and only one buffer to handle. This is an interesting solution because we compensate for our errors by increasing the iteration count, which is not the most elegant way but is the most efficient one according to my benchmark. Code is available here: C++ and glsl.


References

Interactive Erosion in Unity - Digital Dust

Realtime Procedural Terrain Generation - Jacob Olsen

Interactive Terrain Modeling Using Hydraulic Erosion - Ondrej Št’ava

Fast Hydraulic and Thermal Erosion on the GPU - Balazs Jako

Large Scale Terrain Generation from Tectonic Uplift and Fluvial Erosion - Guillaume Cordonnier et al.

The Synthesis and Rendering of Eroded Fractal Terrains - Kenton Musgrave et al.