Parallel Patterns

Parallel patterns are fundamental building blocks in parallel programming, similar to how design patterns work in object-oriented programming. These patterns provide proven solutions to common parallel programming challenges.

Data Parallelism

Concept

Data parallelism is a pattern where the same operation is performed on multiple data elements simultaneously. Think of it like having multiple workers processing different parts of a large dataset at the same time. This pattern works best when:

The data can be split into independent chunks
The same operation needs to be applied to each element
There are no dependencies between calculations

Example Implementation

#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    const size_t N = 1024;
    sycl::queue q;

    // Initialize input vectors
    std::vector<float> a(N, 1.0f);
    std::vector<float> b(N, 2.0f);
    std::vector<float> c(N);

    // Create SYCL buffers
    sycl::buffer<float> buf_a(a.data(), N);
    sycl::buffer<float> buf_b(b.data(), N);
    sycl::buffer<float> buf_c(c.data(), N);

    // Perform parallel vector addition
    q.submit([&](sycl::handler& h) {
        auto acc_a = buf_a.get_access<sycl::access::mode::read>(h);
        auto acc_b = buf_b.get_access<sycl::access::mode::read>(h);
        auto acc_c = buf_c.get_access<sycl::access::mode::write>(h);
        
        h.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> idx) {
            acc_c[idx] = acc_a[idx] + acc_b[idx];  // Each element processed in parallel
        });
    });

    q.wait();
    return 0;
}

Task Parallelism

Concept

Task parallelism focuses on running different functions or tasks simultaneously. Unlike data parallelism, which splits data, task parallelism splits the work itself. It's useful when:

You have multiple independent operations to perform
Tasks might be different from each other
Tasks can run independently

Example Implementation

#include <sycl/sycl.hpp>

int main() {
    sycl::queue q1, q2;  // Two separate queues for concurrent tasks
    const size_t N = 1024;
    std::vector<int> data1(N), data2(N);

    // Task 1: Double each element
    q1.submit([&](sycl::handler& cgh) {
        sycl::accessor acc(data1, cgh, sycl::write_only);
        cgh.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
            acc[i] = i[0] * 2;
        });
    });

    // Task 2: Calculate modulo 100
    q2.submit([&](sycl::handler& cgh) {
        sycl::accessor acc(data2, cgh, sycl::write_only);
        cgh.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
            acc[i] = i[0] % 100;
        });
    });

    q1.wait();
    q2.wait();
    return 0;
}

Map and Reduce

These are two fundamental parallel patterns often used together:

Map Pattern

Transforms each element in a dataset independently
Perfect for parallel execution since each operation is independent
Example: squaring each number in an array

#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    const size_t N = 1024;
    std::vector<int> data(N);

    // Map operation: square each element
    q.submit([&](sycl::handler& cgh) {
        cgh.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
            data[i] = i[0] * i[0];
        });
    }).wait();
}

Reduce Pattern

Combines all elements in a dataset into a single result
Uses divide-and-conquer to parallelize the reduction
Example: summing all numbers in an array

#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    const size_t N = 1024;
    std::vector<int> data(N, 1);  // Initialize with 1s
    sycl::buffer<int> buf(data.data(), sycl::range<1>(N));

    // Parallel reduction
    size_t size = N;
    while (size > 1) {
        size_t new_size = (size + 1) / 2;
        q.submit([&](sycl::handler& h) {
            auto acc = buf.get_access<sycl::access::mode::read_write>(h);
            
            h.parallel_for(sycl::range<1>(new_size), [=](sycl::id<1> idx) {
                size_t i = idx[0];
                size_t pair_idx = i + new_size;
                if (pair_idx < size) {
                    acc[i] += acc[pair_idx];  // Combine pairs of elements
                }
            });
        });
        size = new_size;
        q.wait();
    }
}

Data Parallelism​

Concept​

Example Implementation​

Task Parallelism​

Concept​

Example Implementation​

Map and Reduce​

Map Pattern​

Reduce Pattern​

Data Parallelism

Concept

Example Implementation

Task Parallelism

Concept

Example Implementation

Map and Reduce

Map Pattern

Reduce Pattern