High performance computing - A self-scheduling algorithm
An HPC setup for a data science and cloud project: a self-scheduling algorithm with parallel programming
Introduction
The objective was to apply a self-scheduling algorithm to real-world applications. The inputs were a square matrix, a set of constraints, and the system configuration for hybrid setups. The output was an optimized vector that demonstrates the self-scheduling options.
Literature review
The self-scheduling algorithm can be applied for real-time analysis in data science, specifically for:
- Scheduling charge and discharge cycles of battery cells within a battery pack.
- Managing resources such as memory, processors, and storage for cloud elasticity.
- Introducing a novel programming technique, worked out during the project, that uses the MPI (Message Passing Interface) package.
All remote operations are conducted asynchronously through a dedicated server thread. This thread is responsible for the following tasks:
- Inserting incoming tasks into the local queue of the process
- Receiving completed tasks along with their results
- Handling task-stealing requests
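The queueing and task-stealing behaviour listed above can be illustrated with a small sketch. This is not torcpy code: it uses only Python's standard library, runs in a single process with threads, and leaves out MPI and the dedicated server thread entirely. The names `Worker`, `run_worker`, `square`, and `run_demo` are hypothetical and exist only for this illustration.

```python
import threading
from collections import deque


class Worker:
    """Hypothetical worker that owns a local task queue guarded by a lock."""

    def __init__(self, wid):
        self.wid = wid
        self.queue = deque()
        self.lock = threading.Lock()
        self.executed = 0

    def push(self, task):
        # Insert an incoming task into the local queue.
        with self.lock:
            self.queue.append(task)

    def pop_local(self):
        # The owner takes work from the front of its own queue.
        with self.lock:
            return self.queue.popleft() if self.queue else None

    def steal(self):
        # A thief takes work from the back of the victim's queue.
        with self.lock:
            return self.queue.pop() if self.queue else None


def run_worker(me, others, results, results_lock):
    while True:
        task = me.pop_local()
        if task is None:
            # Local queue is empty: send a "steal request" to each peer in turn.
            for victim in others:
                task = victim.steal()
                if task is not None:
                    break
        if task is None:
            return  # no work left anywhere
        func, arg = task
        value = func(arg)
        with results_lock:
            results[arg] = value
        me.executed += 1


def square(x):
    return x * x


def run_demo(ntasks=30, nworkers=3):
    workers = [Worker(i) for i in range(nworkers)]
    # All tasks start on worker 0; the others obtain work only by stealing.
    for x in range(ntasks):
        workers[0].push((square, x))
    results, results_lock = {}, threading.Lock()
    threads = [
        threading.Thread(
            target=run_worker,
            args=(w, [o for o in workers if o is not w], results, results_lock),
        )
        for w in workers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, [w.executed for w in workers]


if __name__ == "__main__":
    results, executed = run_demo()
    print("executed per worker:", executed, "total:", sum(executed))
```

The split of work between workers varies from run to run, but the total number of executed tasks always equals the number created, which is the same invariant the per-node `created`/`executed` counters report.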
The internal architecture of torcpy is illustrated in the accompanying figure. As a result, tasks (also known as futures) can be created and finalized using the `submit` and `wait` calls. For more details, see https://github.com/IBM/torc_py
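As a rough illustration of the submit-then-wait pattern, the sketch below uses Python's standard `concurrent.futures` module as a stand-in. It is not the torcpy API and it runs on threads in a single process. The function names `work` and `run_tasks` are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def work(x):
    # The per-task computation used throughout this post: x squared.
    return x ** 2


def run_tasks(inputs, max_workers=2):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # "Submit": create one task per input value.
        futures = [pool.submit(work, x) for x in inputs]
        # "Wait": block until every submitted task has finished.
        wait(futures)
        # Collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    inputs = [8.0, 3.0]
    for x, y in zip(inputs, run_tasks(inputs)):
        print("Received: {}^2={:.3f}".format(x, y))
```

Results are collected in submission order, so the output order does not depend on which worker finishes first.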
Pseudo-code
Package steps based on torc
Init
# torcpy execution starts, and MPI initialization happens
Submit
# Switching to master-worker
>>Assign the calculations to workers via Submit
work inp=8.000, out=64.000 ...on node 0 worker 0 thread 139725221726080
Receive, create, execute
Received: 8.0^2=64.000
Elapsed time for col1 =47.07 s
Elapsed time for col2 =48.07 s
TORCPY: node[0]: created=94, executed=94
Self-Scheduler wrapper
Init
Submit
Work x^2
T x T Transpose
Receive, create, & Execute
node_id(): return the rank of the calling MPI process
worker_id(): return the global worker thread ID
Methodology
No. of nodes | No. of workers | Elapsed time (s) | No. of records |
1 | 2 | 15/0.0 | 30/7 |
2 | 2 | 8/0.0 | 30/7 |
3 | 2 | 5/0.0 | 30/7 |
2 | 1 | 15/1 | 30/7 |
6 | 1 | 5/1 | 30/7 |
1 | 1 | 30/1 | 30/7 |
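The timings in the table can be turned into speedup and parallel efficiency figures. The sketch below rests on two assumptions that the table does not state: the first value in each "Elapsed time" cell is the wall time for the 30-record run, and the worker count is per node, so the total number of workers is nodes × workers. The names `RUNS` and `speedup_table` are hypothetical.

```python
# (nodes, workers per node, elapsed seconds), taken from the first value
# in each "Elapsed time" cell. Assumption: that value is the 30-record run.
RUNS = [
    (1, 2, 15.0),
    (2, 2, 8.0),
    (3, 2, 5.0),
    (2, 1, 15.0),
    (6, 1, 5.0),
    (1, 1, 30.0),
]


def speedup_table(runs):
    # Baseline: the run with a single worker in total.
    baseline = next(t for nodes, workers, t in runs if nodes * workers == 1)
    rows = []
    for nodes, workers, elapsed in runs:
        total = nodes * workers        # assumed total worker count
        speedup = baseline / elapsed   # T(1) / T(p)
        efficiency = speedup / total   # speedup per worker
        rows.append((nodes, workers, total, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for nodes, workers, total, s, e in speedup_table(RUNS):
        print(f"{nodes} x {workers} = {total} workers: "
              f"speedup {s:.2f}, efficiency {e:.2f}")
```

Under these assumptions the runs scale almost linearly: 6 workers in total give a speedup of 6, and the 2 × 2 run reaches 3.75 on 4 workers.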
Sample Output
work inp=11.000, out=121.000 ...on node 0 worker 0 thread 140230837819264
work inp=16.000, out=256.000 ...on node 0 worker 1 thread 140229995116288
work inp=3.000, out=9.000 ...on node 0 worker 3 thread 140229978330880
work inp=3.000, out=9.000 ...on node 1 worker 6 thread 140336443447040
work inp=1.000, out=1.000 ...on node 1 worker 5 thread 140337286150016
work inp=4.000, out=16.000 ...on node 1 worker 7 thread 140336435054336
work inp=5.000, out=25.000 ...on node 1 worker 8 thread 140336426661632
work inp=9.000, out=81.000 ...on node 1 worker 9 thread 140336418268928
Elapsed time=3.05 s
TORCPY: node[0]: created=30, executed=15
TORCPY: node[1]: created=0, executed=15
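The per-node statistics lines at the end of the output can be checked with a few lines of code. The sketch below assumes only the line format shown in the sample above. The names `STAT_RE` and `summarize` are hypothetical helpers, not part of torcpy.

```python
import re

# Matches lines such as "TORCPY: node[0]: created=30, executed=15".
STAT_RE = re.compile(r"TORCPY: node\[(\d+)\]: created=(\d+), executed=(\d+)")


def summarize(log_text):
    per_node = {}
    for match in STAT_RE.finditer(log_text):
        node, created, executed = map(int, match.groups())
        per_node[node] = (created, executed)
    total_created = sum(c for c, _ in per_node.values())
    total_executed = sum(e for _, e in per_node.values())
    return per_node, total_created, total_executed


if __name__ == "__main__":
    sample = (
        "Elapsed time=3.05 s\n"
        "TORCPY: node[0]: created=30, executed=15\n"
        "TORCPY: node[1]: created=0, executed=15\n"
    )
    per_node, created, executed = summarize(sample)
    print(per_node, created, executed)
```

In this sample, node 0 created all 30 tasks but executed only 15 of them, and node 1 executed the other 15, so the work was shared evenly between the two nodes.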
No. of nodes | No. of workers | Elapsed time (s) | Rows (no. of data records) |
1 | 1 | 47 | 94 |
1 | 2 | 15 | 30 |