Home Publications Resume Email Twitter

I have been playing with different type of terrain erosion lately and one thing I would like to do is implementing all the things I do on GPU. Erosion algorithms take many iterations to converge and are very costly when done on CPU. Most of these algorithms take advantage of parallelism: many have been implemented on GPU, but there is not always an open source implementation.

This is the first article of a series about terrain erosion and procedural generation. I will try to implement the things I find the most interesting, both on CPU and GPU to compare results. Let's start by taking a look at the state of the art on terrain erosion.

**Thermal Erosion**: is defined as "the erosion of ice-bearing permafrost by the combined thermal and mechanical action of moving water". It is the simplest one to implement (and the focus of this article) but does not give realistic results by itself.**Hydraulic Erosion**: simulates water flows over the terrain. There are many different implementation available on the internet and many are not explained properly or contains bugs. It is a difficult algorithm to implement.**Fluvial Erosion**: is the erosion of the bedrock material and its transportation downhill by streams. Usually modeled by the stream power equation as denoted by Guillaume Cordonnier in 2016.

In research papers, you might find a different structure called layer field, which stores multiple heights for multiple materials using multiple heightfields. The layer field has the advantage of representing different materials, such as bedrock, sediment or vegetation, and the interactions between them.

Despite the fact that a layer field is more appropriate (because of a dedicated layer for sediment) and for the sake of simplicity, we will use only a single height field structure to represent our terrain.

struct HeightField

{

public:

std::vector<float> heights; // Array of heights

Box2D domain; // A World space box

int n; // Grid size

float GetHeight(int i, int j) const

{

return heights[i * n + j];

}

};

{

public:

std::vector<float> heights; // Array of heights

Box2D domain; // A World space box

int n; // Grid size

float GetHeight(int i, int j) const

{

return heights[i * n + j];

}

};

The core loop of the Thermal Erosion is quite simple: loop through all terrain vertices, detect unstable points and stabilize them by moving matter in the slope direction. The version below is an implementation of the

void ThermalErosionStep()

{

const float tanThresholdAngle = 0.6f; // ~33°

const float cellSize = domain.Vertex(0)[0] - domain.Vertex(1)[0];

for (int i = 0; i < n; i++)

{

for (int j = 0; j < n; j++)

{

float zp = GetHeight(i, j);

int iSteepest - 1, jSteepest = -1;

float steepestSlope = -1;

int steepestIndex = -1;

for (int k = 0; k < 8; k++)

{

int in, jn,

Next(i, j, k, in, jn);

if (in < 0 || in >= n || jn < 0 || jn >= n) // Outside terrain

continue;

float zDiff = zp - GetHeight(in, jn);

if (zDiff > 0.0f && zDiff > steepestSlope)

{

steepest = b;

steepestSlope = zDiff;

iSteepest = in;

jSteepest = jn;

}

}

if (steepestSlope / (cellSize * length8[steepestIndex]) > tanThresholdAngle)

{

Remove(i, j, 0.1);

Add(iSteepest, jSteepest, 0.1);

}

}

}

}

{

const float tanThresholdAngle = 0.6f; // ~33°

const float cellSize = domain.Vertex(0)[0] - domain.Vertex(1)[0];

for (int i = 0; i < n; i++)

{

for (int j = 0; j < n; j++)

{

float zp = GetHeight(i, j);

int iSteepest - 1, jSteepest = -1;

float steepestSlope = -1;

int steepestIndex = -1;

for (int k = 0; k < 8; k++)

{

int in, jn,

Next(i, j, k, in, jn);

if (in < 0 || in >= n || jn < 0 || jn >= n) // Outside terrain

continue;

float zDiff = zp - GetHeight(in, jn);

if (zDiff > 0.0f && zDiff > steepestSlope)

{

steepest = b;

steepestSlope = zDiff;

iSteepest = in;

jSteepest = jn;

}

}

if (steepestSlope / (cellSize * length8[steepestIndex]) > tanThresholdAngle)

{

Remove(i, j, 0.1);

Add(iSteepest, jSteepest, 0.1);

}

}

}

}

The erosion amplitude is usually very small (~0.05/0.1 meter) so that the stabilization process can eventually converge (without infinitely moving matter from one cell to another). The lower the amplitude, the more iteration will be required to get a stabilized terrain. The Next(i, j, k, in, jn) function above returns the index (in, jn) of the k-th neighbour on the 1-ring neighborhood of the point (i, j). The figure below shows the 1-ring neighborhood of a grid vertex.

In my next attempt I used two buffers: a floating value buffer to represent our height field data, and an integer buffer to allow the use of the atomicAdd glsl function. The floating point values were handled with intBitsToFloat and floatBitsToInt functions. You also have to use a barrier to make sure your return buffer is filled properly with the correct final height. This solution worked as intended and was also faster than the CPU version but slower than my previous implementation because of the use of two buffers. The main advantage of this method is that we are no longer limited by the use of integers. This version is called "Double buffer" is the results graph below.

My last idea was the one that I should have tried in the first place: simply ignore the race condition and use a single floating point value buffer to represent height data. Of course, the result will not be deterministic and will contain errors because of race conditions but at the end, the algorithm will converge to the same visual result after a few more iterations. The results are very similar to the other methods and this is the fastest, simplest method that I implemented. Here is a code snippet of the last method:

layout(binding = 0, std430) coherent buffer HeightfieldDataFloat

{

float floatingHeightBuffer[];

};

uniform int gridSize;

uniform float amplitude;

uniform float cellSize;

uniform float tanThresholdAngle;

bool Inside(int i, int j)

{

if (i < 0 || i >= gridSize || j < 0 || j >= gridSize)

return false;

return true;

}

int ToIndex1D(int i, int j)

{

return i * gridSize + j;

}

layout(local_size_x = 1024) in;

void main()

{

uint id = gl_GlobalInvocationID.x;

if (id >= floatingHeightBuffer.length())

return;

float maxZDiff = 0;

int neiIndex = -1;

int i = int(id) / gridSize;

int j = int(id) % gridSize;

for (int k = -1; k <= 1; k += 2)

{

for (int l = -1; l <= 1; l += 2)

{

if (Inside(i + k, j + l) == false)

continue;

int index = ToIndex1D(i + k, j + l);

float h = floatingHeightBuffer[index];

float z = floatingHeightBuffer[id] - h;

if (z > maxZDiff)

{

maxZDiff = z;

neiIndex = index;

}

}

}

if (maxZDiff / cellSize > tanThresholdAngle)

{

floatingHeightBuffer[id] = floatingHeightBuffer[id] - amplitude;

floatingHeightBuffer[neiIndex] = floatingHeightBuffer[neiIndex] + amplitude;

}

}

{

float floatingHeightBuffer[];

};

uniform int gridSize;

uniform float amplitude;

uniform float cellSize;

uniform float tanThresholdAngle;

bool Inside(int i, int j)

{

if (i < 0 || i >= gridSize || j < 0 || j >= gridSize)

return false;

return true;

}

int ToIndex1D(int i, int j)

{

return i * gridSize + j;

}

layout(local_size_x = 1024) in;

void main()

{

uint id = gl_GlobalInvocationID.x;

if (id >= floatingHeightBuffer.length())

return;

float maxZDiff = 0;

int neiIndex = -1;

int i = int(id) / gridSize;

int j = int(id) % gridSize;

for (int k = -1; k <= 1; k += 2)

{

for (int l = -1; l <= 1; l += 2)

{

if (Inside(i + k, j + l) == false)

continue;

int index = ToIndex1D(i + k, j + l);

float h = floatingHeightBuffer[index];

float z = floatingHeightBuffer[id] - h;

if (z > maxZDiff)

{

maxZDiff = z;

neiIndex = index;

}

}

}

if (maxZDiff / cellSize > tanThresholdAngle)

{

floatingHeightBuffer[id] = floatingHeightBuffer[id] - amplitude;

floatingHeightBuffer[neiIndex] = floatingHeightBuffer[neiIndex] + amplitude;

}

}

You can see some results in the following figures.

As expected, the single floating point buffer is the most efficient one: there is no conversion back and forth between integers and floats, and only one buffer to handle. This is an interesting solution because we compensate our error by increasing iteration count, which is not the most elegant but the most efficient way according to my benchmark in this case. Code is available here: C++ and glsl.

Realtime Procedural Terrain Generation - Jacob Olsen

Interactive Terrain Modeling Using Hydraulic Erosion - Ondrej Št’ava

Fast Hydraulic and Thermal Erosion on the GPU - Balazs Jako

Large Scale Terrain Generation from Tectonic Uplift and Fluvial Erosion - Guillaume Cordonnier et al.

The Synthesis and Rendering of Eroded Fractal Terrains - Kenton Musgrave et al.