Blogs: High performance computing - A self-scheduling algorithm

Setting up HPC for a data science and cloud project: a self-scheduling algorithm with parallel programming

Abstract

It has been a while since I last posted in this forum. Today, I prepared a schedule related to High-Performance Computing (HPC) and reviewed a recent project I completed for an HPC and Data Science course. The project focused on applying HPC tools to scheduling problems in cloud environments. As my project teammate pointed out at the time, similar scheduling issues also arise in battery pack management.

I now have a good understanding of how to manage scheduling in cloud projects, especially for workloads that depend on time series. The study analyzes such a spiky workload in runs launched with mpirun. We added a transformation step to the training dataset to normalize the spiky load. So that the self-scheduling algorithm could handle this load effectively, we monitored the MPI threads to ensure smooth resource usage.
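The post does not say which transformation was applied to the training data. As one illustrative possibility only, a log transform followed by min-max scaling compresses spikes into a bounded range. The function name `normalize_spiky` and the sample series are hypothetical.

```python
import math

def normalize_spiky(load):
    """Compress spikes with log1p, then rescale the series to [0, 1].

    Assumes non-negative values (e.g. request counts per interval).
    """
    logged = [math.log1p(v) for v in load]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # A flat series has no spread to rescale
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]

if __name__ == "__main__":
    # Made-up spiky series, not data from the project
    requests_per_minute = [3, 4, 2, 250, 5, 3, 180, 4]
    print([round(v, 3) for v in normalize_spiky(requests_per_minute)])
```

After the transform, the two spikes still stand out, but the quiet intervals are no longer squeezed near zero, which is the usual motivation for a log-style step before training.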

Introduction

The objective is to apply a self-scheduling algorithm to real-world applications. The inputs are a square matrix and a set of constraints, along with the system configuration for hybrid setups. The output is an optimized vector that demonstrates the self-scheduling options.

Literature review

The self-scheduling algorithm can be applied for real-time analysis in data science, specifically for:

Scheduling charge and discharge cycles of battery cells within a battery pack.
Managing resources such as memory, processors, and storage for cloud elasticity.
Introducing a novel programming technique determined during the project, utilizing the MPI (Message Passing Interface) package.

IBM torc_py Architecture

All remote operations are conducted asynchronously through a dedicated server thread. This thread is responsible for the following tasks:

Inserting incoming tasks into the local queue of the process
Receiving completed tasks along with their results
Handling task-stealing requests
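The three responsibilities above can be illustrated with a small single-process simulation. This is a sketch of generic work stealing, not torcpy's internals: the `Worker` class, the LIFO/FIFO choice, and the round-robin stepping that stands in for real concurrency are all assumptions made for illustration.

```python
from collections import deque

class Worker:
    def __init__(self, name):
        self.name = name
        self.queue = deque()   # local task queue
        self.done = []         # results of tasks this worker executed

    def insert(self, task):
        """Insert an incoming task into the local queue."""
        self.queue.append(task)

    def run_one(self, peers):
        """Run one task: take it locally if possible, otherwise steal."""
        if self.queue:
            task = self.queue.pop()            # owner takes its newest task
        else:
            victim = max(peers, key=lambda p: len(p.queue))
            if not victim.queue:
                return False                   # nothing left anywhere
            task = victim.queue.popleft()      # thief takes the victim's oldest task
        self.done.append(task())               # record the completed result
        return True

def simulate(num_tasks=30):
    a, b = Worker("node0"), Worker("node1")
    for x in range(1, num_tasks + 1):
        a.insert(lambda x=x: x ** 2)           # every task is created on node0
    progressing = True
    while progressing:
        ran_a = a.run_one([b])
        ran_b = b.run_one([a])
        progressing = ran_a or ran_b
    return a, b

if __name__ == "__main__":
    a, b = simulate(30)
    print(a.name, "executed", len(a.done))
    print(b.name, "executed", len(b.done))
```

With all 30 tasks created on one worker, stealing lets the idle worker take half of them, which mirrors the created/executed counters that torcpy prints at the end of a run.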

The internal architecture of torcpy is illustrated in the accompanying figure. Tasks (also known as futures) are created with the `submit` call and joined with the `wait` call. For more details, see https://github.com/IBM/torc_py
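The submit-then-wait pattern can be tried without an MPI installation by using Python's standard `concurrent.futures` module as a stand-in. This is not torcpy code: the thread pool replaces the MPI processes, and the helper name `run` is hypothetical. The `work` function squares its input, like the task in the logs shown in this post.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

def work(x):
    """The task body: square the input and report which thread ran it."""
    y = x ** 2
    print("work inp={:.3f}, out={:.3f} ...on thread {}".format(
        x, y, threading.get_ident()), flush=True)
    return y

def run(inputs, num_workers=2):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Spawn one task per input; each submit returns a future
        futures = [pool.submit(work, float(x)) for x in inputs]
        # Block until every task has finished
        wait(futures)
        # Results come back in submission order, not completion order
        return [f.result() for f in futures]

if __name__ == "__main__":
    for x, y in zip([8, 3, 4], run([8, 3, 4])):
        print("Received: {}^2={:.3f}".format(float(x), y))
```

Keeping the futures in a list preserves the link between each input and its result, even when the tasks finish out of order.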

Pseudo-code

Package steps based on torc

Init

# torcpy execution starts, and MPI initialization happens

Submit

# Switching to master-worker

>>Assign the calculations to workers via Submit

work inp=8.000, out=64.000 ...on node 0 worker 0 thread 139725221726080

Receive, create, execute

Received: 8.0^2=64.000
Elapsed time for col1 =47.07 s
Elapsed time for col2 =48.07 s
TORCPY: node[0]: created=94, executed=94

Self-Scheduler wrapper

Init
Submit

Work x^2
T x T Transpose

Receive, create, & Execute

node_id(): returns the rank of the calling MPI process
worker_id(): returns the global worker thread ID

Methodology

The methodology uses matrix-vector multiplication to improve vector scheduling, parallelized with MPI and OpenMP. We start with an initial MPI optimization and then apply a coarse-grained optimization with OpenMP.
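A minimal sketch of self-scheduled matrix-vector multiplication follows, assuming row-level granularity: each idle worker asks for the next row index, computes that row's dot product, and comes back for more. Python threads and a shared queue stand in for MPI ranks and message passing here, and the OpenMP level is not shown. The function name `self_scheduled_matvec` is hypothetical.

```python
import queue
import threading

def self_scheduled_matvec(A, x, num_workers=2):
    """Compute A @ x, with idle workers pulling row indices from a shared queue."""
    n = len(A)
    result = [0.0] * n
    rows = queue.Queue()
    for i in range(n):
        rows.put(i)

    def worker():
        while True:
            try:
                i = rows.get_nowait()   # ask for the next unit of work
            except queue.Empty:
                return                  # no rows left, so this worker retires
            # Each row index is handed out once, so writes never collide
            result[i] = sum(a * b for a, b in zip(A[i], x))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(self_scheduled_matvec(A, [1, 0, 2], num_workers=2))
```

Because work is handed out on demand rather than split up front, a worker that draws an expensive row simply takes fewer rows overall, which is the property that makes self-scheduling attractive for uneven loads.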

HTTP request dataset

No. of nodes	No. of workers	Elapsed time (s)	No. of records
1	2	15/0.0	30/7
2	2	8/0.0	30/7
3	2	5/0.0	30/7
2	1	15/1	30/7
6	1	5/1	30/7
1	1	30/1	30/7

Sample Output

TORCPY: main starts
work inp=11.000, out=121.000 ...on node 0 worker 0 thread 140230837819264
work inp=16.000, out=256.000 ...on node 0 worker 1 thread 140229995116288
work inp=3.000, out=9.000 ...on node 0 worker 3 thread 140229978330880
work inp=3.000, out=9.000 ...on node 1 worker 6 thread 140336443447040
work inp=1.000, out=1.000 ...on node 1 worker 5 thread 140337286150016
work inp=4.000, out=16.000 ...on node 1 worker 7 thread 140336435054336
work inp=5.000, out=25.000 ...on node 1 worker 8 thread 140336426661632
work inp=9.000, out=81.000 ...on node 1 worker 9 thread 140336418268928

Elapsed time=3.05 s

TORCPY: node[0]: created=30, executed=15

TORCPY: node[1]: created=0, executed=15
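Logs in this format are easy to check mechanically. The sketch below parses `work` lines, confirms that each output is the square of its input, and counts tasks per node. The regular expression is inferred from the lines shown in this post, the helper names are hypothetical, and the embedded excerpt covers only the eight lines printed above rather than the full run of 30 tasks.

```python
import re
from collections import Counter

# Pattern inferred from the log lines shown in the post
LINE = re.compile(
    r"work inp=([\d.]+), out=([\d.]+) \.\.\.on node (\d+) worker (\d+)")

SAMPLE_LOG = """\
work inp=11.000, out=121.000 ...on node 0 worker 0 thread 140230837819264
work inp=16.000, out=256.000 ...on node 0 worker 1 thread 140229995116288
work inp=3.000, out=9.000 ...on node 0 worker 3 thread 140229978330880
work inp=3.000, out=9.000 ...on node 1 worker 6 thread 140336443447040
work inp=1.000, out=1.000 ...on node 1 worker 5 thread 140337286150016
work inp=4.000, out=16.000 ...on node 1 worker 7 thread 140336435054336
work inp=5.000, out=25.000 ...on node 1 worker 8 thread 140336426661632
work inp=9.000, out=81.000 ...on node 1 worker 9 thread 140336418268928
"""

def parse(log_text):
    """Return (input, output, node, worker) tuples for every work line."""
    return [(float(i), float(o), int(n), int(w))
            for i, o, n, w in LINE.findall(log_text)]

def tasks_per_node(log_text):
    """Count how many work lines each node reported."""
    return Counter(node for _, _, node, _ in parse(log_text))

def all_squares_correct(log_text):
    """Check that every reported output equals the input squared."""
    return all(abs(out - inp ** 2) < 1e-6 for inp, out, _, _ in parse(log_text))

if __name__ == "__main__":
    print(dict(tasks_per_node(SAMPLE_LOG)))
    print(all_squares_correct(SAMPLE_LOG))
```

Run against a complete log, the per-node counts should match the `executed=` figures in the closing TORCPY summary lines.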

No. of nodes	No. of workers	Elapsed time (s)	No. of data records (rows)
1	1	47	94
1	2	15	30

Result discussion

The evaluation of our self-scheduling algorithm shows clear performance gains from hybrid parallel programming. Scaling from 1 to 6 nodes and varying the number of workers reduced processing time substantially. In the HTTP request dataset benchmark, for instance, the elapsed time for a batch of 30 records fell from 30 seconds on 1 node with 1 worker to 5 seconds on 6 nodes.

The torc_py framework handled the asynchronous master-worker task distribution. The node execution logs show that tasks were balanced across the available hardware threads: all 30 tasks were created on node[0], and node[0] and node[1] each executed 15 of them. This balancing mitigates the spiky resource usage typically associated with heavy time-series workloads. These results suggest that the self-scheduling wrapper balances computation against communication overhead. The framework is therefore a viable option for real-time analysis in data science applications, from maintaining cloud elasticity during unpredictable demand spikes to managing charge and discharge cycles within battery packs.
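The speedup implied by the HTTP request table can be worked out directly. The sketch below makes two assumptions: the first figure in each "elapsed time" cell is the time for the 30-record batch, and the "workers" column counts workers per node, so total workers = nodes x workers. Speedup is the 1-node, 1-worker time divided by the measured time, and efficiency is speedup divided by total workers. The names `RUNS` and `speedup_table` are hypothetical.

```python
# (nodes, workers per node, elapsed seconds for 30 records), read from the table
RUNS = [(1, 1, 30), (1, 2, 15), (2, 1, 15), (2, 2, 8), (3, 2, 5), (6, 1, 5)]

def speedup_table(runs, baseline_seconds):
    """Return (nodes, workers, total workers, speedup, efficiency) per run."""
    rows = []
    for nodes, workers, seconds in runs:
        total = nodes * workers
        speedup = baseline_seconds / seconds
        rows.append((nodes, workers, total, speedup, speedup / total))
    return rows

if __name__ == "__main__":
    print("nodes workers total speedup efficiency")
    for nodes, workers, total, s, e in speedup_table(RUNS, baseline_seconds=30):
        print("{:5d} {:7d} {:5d} {:7.2f} {:10.2f}".format(nodes, workers, total, s, e))
```

Under these assumptions, both 6-worker configurations reach a speedup of 6, and the 4-worker run reaches 3.75. The reported times are rounded to whole seconds, so these ratios are approximate.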

Future work

Sample output for the battery pack dataset (94/30 records), with the required training data, will be produced for tests with different thread counts.

Conclusion

In conclusion, a self-scheduling algorithm can address real-world applications by optimizing the processing of a square matrix under specified constraints, including the system configuration of hybrid setups. The result is an optimized vector that improves the efficiency and performance of the system.

Labels: Cloud, Learn

Sunday, July 5, 2026
