Axel Paris - Research Scientist


Terrain Erosion on the GPU #2

February 14, 2020.

In my previous post on thermal erosion, I implemented the naive, straightforward version of the algorithm, which can be summarized as follows: for every cell C, check its stability with respect to its neighbours. If C is unstable, distribute a certain amount of material to its lower neighbours.

The problem with this approach in a massively parallel context is the need to write to neighbouring cells, which requires slow atomic operations. For many algorithms, however, it helps to reverse the thinking: instead of writing to neighbouring cells, read from the neighbours and write only to the current cell, which avoids both costly atomic operations and race conditions.
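
To make the "read from neighbours" idea concrete, here is a minimal CPU-side sketch in C++ (not the shader itself). It uses a hypothetical 1D rule, chosen only for illustration: a cell that is strictly higher than its right neighbour gives a fixed `amount` to it. `StepScatter` writes to the neighbour, while `StepGather` only ever writes to the current cell. Both function names and the rule are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D rule, for illustration only: a cell that is strictly
// higher than its right neighbour gives `amount` to that neighbour.

// Scatter form: each cell writes to itself AND to its neighbour.
// Run in parallel on a GPU, the neighbour write would need an atomic.
std::vector<float> StepScatter(const std::vector<float>& in, float amount)
{
    std::vector<float> out = in;
    for (std::size_t i = 0; i + 1 < in.size(); i++)
    {
        if (in[i] > in[i + 1])
        {
            out[i] -= amount;
            out[i + 1] += amount; // write to a neighbouring cell
        }
    }
    return out;
}

// Gather form: each cell reads its neighbours and writes only to itself.
std::vector<float> StepGather(const std::vector<float>& in, float amount)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); i++)
    {
        float z = in[i];
        if (i > 0 && in[i - 1] > in[i])
            z += amount; // receive from the left neighbour
        if (i + 1 < in.size() && in[i] > in[i + 1])
            z -= amount; // give to the right neighbour
        out[i] = z;
    }
    return out;
}
```

Both functions should produce the same heights for the same input; the difference is that every iteration of the loop in `StepGather` touches a single output cell, so iterations can run in any order, or all at once, without conflicting writes.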

Fortunately, this method can be applied to thermal erosion. The idea for a single cell C is to look for higher neighbours that will give matter to C, and to perform the operation (here, adding matter) on C only. The same goes for the distribution to lower cells: we remove matter from C only, and let the neighbouring cells do the other part of the computation. This has to be done with two buffers: one for reading and one for writing. After each step n, the buffers must be swapped for step n + 1. Here is the revised version of the algorithm in a compute shader:

layout(binding = 0, std430) readonly buffer ElevationDataBufferIn
{
	float data[];
};
layout(binding = 1, std430) writeonly buffer ElevationDataBufferOut
{
	float out_data[];
};

uniform int nx;
uniform int ny;
uniform float amplitude;
uniform float cellSize;
uniform float tanThresholdAngle;

bool Inside(int i, int j)
{
	if (i < 0 || i >= nx || j < 0 || j >= ny)
		return false;
	return true;
}

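// Flattens 2D grid coordinates into a 1D buffer index.
// Note: with i in [0, nx) and j in [0, ny), this mapping is only
// valid for square grids (nx == ny).
int ToIndex1D(int i, int j)
{
	return i * nx + j;
}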

void NoRaceConditionVersion(int x, int y)
{
	// Sample a 3x3 grid around the cell, wrapping around at the borders
	float samples[9];
	for (int i = 0; i < 3; i++)
	{
		for (int j = 0; j < 3; j++)
		{
			ivec2 tapUV = (ivec2(x, y) + ivec2(i, j) - ivec2(1,1) + ivec2(nx, ny)) % ivec2(nx, ny);
			samples[i * 3 + j] = data[ToIndex1D(tapUV.x, tapUV.y)];
		}
	}
		
	// Check stability with all neighbours
	int id = ToIndex1D(x, y);
	float z = data[id];
	bool willReceiveMatter = false;
	bool willDistributeMatter = false;
	for (int i = 0; i < 9; i++)
	{
		float zd = samples[i] - z;
		if (zd / cellSize > tanThresholdAngle)
			willReceiveMatter = true;
		
		zd = z - samples[i];
		if (zd / cellSize > tanThresholdAngle)
			willDistributeMatter = true;
	}
	
	// Add/Remove matter if necessary
	float zOut = z + (willReceiveMatter ? amplitude : 0.0f) - (willDistributeMatter ? amplitude : 0.0f);
	out_data[id] = zOut;
}

layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
void main()
{
	int i = int(gl_GlobalInvocationID.x);
	int j = int(gl_GlobalInvocationID.y);
	if (i < 0) return;
	if (j < 0) return;
	if (i >= nx) return;
	if (j >= ny) return;
	
	NoRaceConditionVersion(i, j);
}
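
For readers who want to check the logic without a GPU, here is a hedged CPU reference of the same step in C++, together with the buffer swap between iterations. It is a sketch under stated assumptions, not the code used for the benchmark: the grid is assumed square (`n` x `n`), borders wrap around as in the shader, and the names `ThermalStep` and `Erode` are made up for this example.

```cpp
#include <utility>
#include <vector>

// CPU reference of the gather step from the shader above, for a square
// grid of size n x n (assumption: nx == ny, with wrap-around at borders).
void ThermalStep(const std::vector<float>& in, std::vector<float>& out,
                 int n, float amplitude, float cellSize, float tanThresholdAngle)
{
    for (int x = 0; x < n; x++)
    {
        for (int y = 0; y < n; y++)
        {
            const float z = in[x * n + y];
            bool willReceiveMatter = false;
            bool willDistributeMatter = false;
            for (int i = -1; i <= 1; i++)
            {
                for (int j = -1; j <= 1; j++)
                {
                    // Wrap around the borders, as the shader does
                    const int tx = (x + i + n) % n;
                    const int ty = (y + j + n) % n;
                    const float zn = in[tx * n + ty];
                    if ((zn - z) / cellSize > tanThresholdAngle)
                        willReceiveMatter = true;
                    if ((z - zn) / cellSize > tanThresholdAngle)
                        willDistributeMatter = true;
                }
            }
            // Only the current cell is written
            out[x * n + y] = z + (willReceiveMatter ? amplitude : 0.0f)
                               - (willDistributeMatter ? amplitude : 0.0f);
        }
    }
}

// Runs `steps` iterations, swapping the read and write buffers after each one.
std::vector<float> Erode(std::vector<float> heights, int n, int steps,
                         float amplitude, float cellSize, float tanThresholdAngle)
{
    std::vector<float> scratch(heights.size());
    for (int s = 0; s < steps; s++)
    {
        ThermalStep(heights, scratch, n, amplitude, cellSize, tanThresholdAngle);
        std::swap(heights, scratch); // output of step n is input of step n + 1
    }
    return heights;
}
```

`std::swap` on two vectors only exchanges their internal pointers, so the swap itself costs almost nothing here. On the GPU, the equivalent would presumably be to exchange which buffer is bound to binding 0 and which to binding 1 between dispatches, rather than copying any data.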

This method can generally be applied to other erosion algorithms, such as hydraulic or aeolian erosion. Compared to the previous post, where the race condition was ultimately ignored, this version is 4 to 5 times slower in my benchmark. This makes sense, since we are reading multiple values from the input buffer for every cell and swapping the input and output buffers after each step. The main advantage of this method is that its output is deterministic, which is quite important for erosion simulation.
If you want to compare both versions side by side, here they are: Race condition shader and Correct shader.