Physics Example - 2D Heat Transfer

In this case we simulate a heat transfer in 2D. This is just for a practice.

Physics and Math

Here we omits some complex numerical details. We will use forward Euler method- you don't need to care what that is, you only need to follow my equations.

The equation is,

$\frac{\partial T}{\partial t} = \alpha \nabla^2 T$

We take $\delta t$ as time step and $\delta l$ as length step. We suppose n as time steps, p as x-axis length steps, and q as y-axis length steps.

The forward euler method is only stable if,

$\alpha \frac{\delta t}{\delta l^2} < \frac{1}{2}$

When this equation doesn't hold, we use backward euler method. However, backward euler method requires more math to solve, and converges slower.

Let's expand the equation to discreet form under forward euler method,

$\frac{\delta T_{n+1, p, q} - T_{n, p, q}}{\delta t} = \alpha \left(\frac{T_{n, p+1, q} - 2 T_{n, p, q} + T_{n, p-1, q}}{\delta l^2} + \frac{T_{n, p, q+1} - 2 T_{n, p, q} + T_{n, p, q-1}}{\delta l^2}\right)$

Obviously, we can easily solve $T_{n + 1, p, q}$ given the previous time step. That is,

$T_{n + 1, p, q} = T_{n, p, q} + \alpha \left(\frac{T_{n, p+1, q} - 2 T_{n, p, q} + T_{n, p-1, q}}{\delta l^2} + \frac{T_{n, p, q+1} - 2 T_{n, p, q} + T_{n, p, q-1}}{\delta l^2}\right) \delta t$

We enforce the following boundary conditions, and also add a heat source and initial condition,

$T_{\cdot, \cdot, q_{max}} = T_{\cdot, p_{max}, \cdot} = 0$

$T_{\cdot, \cdot, 0} = T_{\cdot, 0, \cdot} = 100$

And initially,

$T_{0, \cdot, \cdot} = 0$

To make it more complex, we suppose the transfer coefficient $\alpha$ is a second order function of temperature, that is,

$\alpha(T) = \alpha_0 + \alpha_1 T + \alpha_2 T^2$

Implementation

use ocl::ProQue;
use plotly::{common::{ColorScale, ColorScalePalette}, HeatMap, Layout, Plot};
use indicatif::ProgressBar;

const KERNEL_SRC: &str = include_str!("kernel.cl");

fn main() -> ocl::Result<()> {
    // Simulation parameters
    let size_x = 512;       // Grid size in x-direction
    let size_y = 512;       // Grid size in y-direction
    let dt = 0.001f32;          // Time step
    let dl = 1f32;           // Spatial step
    let a0 = 10.0f32;           // α₀ coefficient
    let a1 = 0.01f32;         // α₁ coefficient
    let a2 = 0.0001f32;        // α₂ coefficient
    let num_steps = 1000000;    // Number of time steps

    // Initialize OpenCL
    let pro_que = ProQue::builder()
        .src(KERNEL_SRC)
        .dims((size_x, size_y))
        .build()?;

    // Create initial temperature field
    let temp_data = vec![0.0f32; size_x * size_y];

    // Create double buffers
    let buffers = vec![
        pro_que.create_buffer::<f32>()?,
        pro_que.create_buffer::<f32>()?,
    ];

    // Initialize buffers
    buffers[0].write(&temp_data).enq()?;
    buffers[1].write(&temp_data).enq()?;
    // Create kernel
    let kernel = pro_que.kernel_builder("next_step")
        .arg(&buffers[0])  // Initial T_current buffer
        .arg(&buffers[1])  // Initial T_next buffer
        .arg(dt)
        .arg(dl)
        .arg(a0)
        .arg(a1)
        .arg(a2) 
        .local_work_size([16, 16])
        .build()?;

    let pb = ProgressBar::new(num_steps as u64);
    let mut computation_time = std::time::Duration::new(0, 0);
    let mut start_time = std::time::Instant::now();
    // Simulation loop
    for step in 0..num_steps {
        pb.inc(1);
        let (current_idx, next_idx) = (step % 2, (step + 1) % 2);

        // Set kernel arguments
        kernel.set_arg(0, &buffers[current_idx])?;
        kernel.set_arg(1, &buffers[next_idx])?;
        // Execute kernel
        unsafe { kernel.enq()?; }

        // Read results
        let mut temp_result = vec![0.0f32; size_x * size_y];
        buffers[next_idx].read(&mut temp_result).enq()?;
        computation_time += start_time.elapsed();
        start_time = std::time::Instant::now();
        // Visualize
        if step % (num_steps / 10) == ((num_steps / 10) - 1){
            plot_temperature(&temp_result, size_x, size_y, step).unwrap();
        }
    }
    println!("Time elapsed: {:.2?}", computation_time);

    Ok(())
}

fn plot_temperature(
    temperature: &[f32],
    size_x: usize,
    size_y: usize,
    step: usize
) -> Result<(), Box<dyn std::error::Error>> {
    // Prepare data matrix (transposed for correct orientation)
    let z: Vec<Vec<f32>> = (0..size_y)
        .map(|y| (0..size_x).map(|x| temperature[x * size_y + y]).collect())
        .collect();

    // Create heatmap trace
    let trace = HeatMap::new(
        (0..size_x).map(|x| x as i32).collect(),  // X coordinates
        (0..size_y).map(|y| y as i32).collect(),  // Y coordinates
        z
    )
    .name("Temperature")
    .color_scale(ColorScale::Palette(ColorScalePalette::RdBu));

    // Create plot layout
    let layout = Layout::new()
        .title(format!("Heat Distribution - Step {}", step).as_str())
        .x_axis(plotly::layout::Axis::new().title("X Position"))
        .y_axis(plotly::layout::Axis::new().title("Y Position"))
        .width(800)
        .height(800);

    // Create and save plot
    let mut plot = Plot::new();
    plot.add_trace(trace);
    plot.set_layout(layout);
    plot.write_html(format!("output/heatmap_step_{}.html", step));

    Ok(())
}

__kernel void next_step(
    __constant float* T_current,
    __global float* T_next,
    const float dt,
    const float dl,
    const float a0,
    const float a1,
    const float a2
) {
    int p = get_global_id(0);
    int q = get_global_id(1);

    int size_x = get_global_size(0);
    int size_y = get_global_size(1);

    if (p == 0 || q == 0) {
        T_next[p * size_y + q] = 100.0f;
    } else if (p == size_x - 1 || q == size_y - 1) {
        T_next[p * size_y + q] = 0.0f;
    } else {
        float current_T = T_current[p * size_y + q];

        float T_p_plus1 = T_current[(p + 1) * size_y + q];
        float T_p_minus1 = T_current[(p - 1) * size_y + q];
        float T_q_plus1 = T_current[p * size_y + (q + 1)];
        float T_q_minus1 = T_current[p * size_y + (q - 1)];

        float divider = dl * dl;
        float laplacian_x = (T_p_plus1 - 2.0f * current_T + T_p_minus1);
        float laplacian_y = (T_q_plus1 - 2.0f * current_T + T_q_minus1);

        float alpha = a0 + a1 * current_T + a2 * current_T * current_T;
        float delta_T = alpha * (laplacian_x + laplacian_y) * dt / divider;

        T_next[p * size_y + q] = current_T + delta_T;
    }
}

Time taken 418s. The result is as below,

Divergence

Here we need to introduce the concept of diveregence- since it happens here.

For each CU, it has smaller units called either warp (Nvidia) or wavefront (AMD). We use the term wavefront in the future passage.

Each wavefront shares one PC (program counter) register. And within one wavefront, all PE uses the same clock. A typical value of a wavefront is 32.

Each wavefront shares a PC register, so let's consider the following code,

int x = 2;
if(mask[get_global_id(0)]) {
    x = 1;
}
else {
    x = 0;
}

If a wavefront meets if(mask[get_global_id(0)]), and they have the same mask value, then it is fine. But what if not? They share the same PC register that they can't take the same path. This is called a warefront divergence.

Encountered by divergence, the actual execution will be like,

int x = 2;
int x_branch_true = 1;
int x_branch_false = 0;
x = select(mask[get_global_id(0)], x_branch_true, x_branch_false);

Yes, both if and else branches will be executed. So it slows down the performance. Thus, we should avoid using if-else in our programs.

The above code is simple, but what if there are thousands of lines? It will be disastrous. Thus, put in as few code in the branch as possible.

This doesn't mean you shouldn't use branches- they are unavoidable. But you need to remember that instead of the cost of one branch, the actual cost is the sum of all the branches.

So, in our previous code, divergence happens, which is necessary consdering our case, and low cost because the divergence only cases two extra lines. However, divergence is a problem you should consider, and do put as few code in a;; the branches as possible.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search