BCS401 Operating System Semester IV · AY 2025-26 onward
Unit 3 · CPU Scheduling, Threads & Deadlocks

Lecture 8: Multiprocessor scheduling

Learning outcomes

  • Discuss the challenges and strategies for scheduling on multiprocessor systems.

Prerequisites

  • Process concept and threads
  • Basic CPU scheduling concepts
  • Scheduling criteria: CPU utilization, throughput, waiting time, turnaround time, response time
  • Context switching and ready queue
  • Basic idea of multicore processors

The topic fits naturally under Unit III: Scheduling & Deadlocks, where the syllabus explicitly includes multiprocessor scheduling, and it also aligns with CO2: Explain processes, threads and scheduling algorithms. The lecture plan places CPU scheduling in the scheduling block immediately after processes and threads, which makes this lecture a natural extension of basic scheduling. The standard textbook reference also treats Multi-Processor Scheduling as a dedicated section within the CPU scheduling chapter.

Lecture Notes

Main content

Multiprocessor Scheduling

Introduction

In a single-processor system, CPU scheduling mainly answers one question: which ready process should run next? In a multiprocessor system, the scheduler must answer two questions: which process should run next, and on which processor should it run? This makes scheduling more complex because several processors or cores may be available at the same time.

Multiprocessor scheduling is therefore concerned with assigning processes or threads to multiple CPUs in a way that improves CPU utilization, throughput, response time, and fairness while keeping scheduling overhead under control. In modern systems, this topic includes both traditional multiple-CPU machines and multicore processors.

Figure: Two common scheduling designs in multiprocessor systems: a single shared ready queue used by all processors versus separate per-processor ready queues.

Why multiprocessor scheduling is harder

A multiprocessor scheduler must do more than choose the next process. It must also decide how to distribute work across processors. If too many runnable tasks remain on one processor while others are idle, system performance suffers. On the other hand, moving tasks too frequently between processors can also reduce performance because useful cache contents may be lost.

Thus, multiprocessor scheduling is a balancing act between:

  • keeping all processors busy,
  • minimizing waiting time,
  • reducing migration overhead,
  • preserving cache locality, and
  • scaling well as the number of processors increases.

Basic approaches

Two broad approaches are commonly discussed: asymmetric multiprocessing and symmetric multiprocessing.

In asymmetric multiprocessing (AMP), one processor acts as a master. It handles scheduling and system control, while the other processors execute assigned tasks. This design is easier to manage, but the master processor may become a bottleneck.

In symmetric multiprocessing (SMP), all processors are treated as equals. Each processor may schedule its own work and run both user and kernel activities. This approach offers better performance and scalability, but it is harder to implement because processors may compete for shared scheduling data structures.

Key Idea

Modern operating systems generally prefer SMP because it avoids dependence on a single master CPU and uses processor resources more effectively.

Common queue and per-processor queues

A simple SMP design may place all ready processes in one shared ready queue. Any available processor can pick the next task from that queue. This approach supports natural work sharing, but as the number of processors grows, contention for the queue can become expensive.

Another design gives each processor its own local ready queue. This reduces contention and often improves scalability. However, it can create imbalance if one queue becomes overloaded and another becomes nearly empty. That is why a multiprocessor scheduler also needs a method for load balancing.
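The two designs can be contrasted in a toy sketch (Python; the queue contents and function names are illustrative only, not any real kernel API):

```python
from collections import deque

# Design 1: one shared ready queue that every CPU pops from.
shared_queue = deque(["T1", "T2", "T3", "T4"])

def next_task_shared(cpu_id):
    # In a real kernel this pop must be lock-protected; that lock
    # becomes a contention point as the number of CPUs grows.
    return shared_queue.popleft() if shared_queue else None

# Design 2: a private ready queue per CPU. Contention disappears,
# but the queues can drift out of balance.
per_cpu_queues = {0: deque(["T1", "T2"]), 1: deque(["T3", "T4"])}

def next_task_local(cpu_id):
    q = per_cpu_queues[cpu_id]
    return q.popleft() if q else None  # empty queue => needs load balancing
```

In the shared design any CPU may receive any task, so work sharing is automatic; in the per-CPU design a CPU whose queue runs empty must obtain work by migration.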

Load balancing

Load balancing means distributing runnable tasks so that processors are used more evenly. If one processor is overloaded while another is idle, the system should move work from the busy processor to the idle or less loaded one.

Two standard mechanisms are commonly described:

  • Push migration: the scheduler detects imbalance and pushes tasks from a busy CPU to a less busy CPU.
  • Pull migration: an idle CPU looks for work and pulls a task from another processor’s queue.

Both techniques aim to improve utilization, but they must be used carefully. Excessive migration can undo the benefits of cache reuse and increase scheduling overhead.
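Both mechanisms can be sketched on the per-CPU-queue model above (a simplified illustration, not how any particular kernel implements it):

```python
from collections import deque

queues = {0: deque(["T1", "T2", "T3", "T4"]), 1: deque()}

def pull_migration(idle_cpu):
    """Pull migration: an idle CPU steals one task from the busiest queue."""
    busiest = max((c for c in queues if c != idle_cpu),
                  key=lambda c: len(queues[c]))
    if queues[busiest]:
        queues[idle_cpu].append(queues[busiest].pop())

def push_migration(threshold=1):
    """Push migration: a periodic balancer moves tasks off the busiest CPU
    until the imbalance drops to the threshold."""
    busy = max(queues, key=lambda c: len(queues[c]))
    idle = min(queues, key=lambda c: len(queues[c]))
    while len(queues[busy]) - len(queues[idle]) > threshold:
        queues[idle].append(queues[busy].pop())
```

The `threshold` parameter captures the trade-off discussed above: a smaller threshold balances load more aggressively but migrates tasks (and loses cache contents) more often.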

Figure: Load balancing moves work to underused processors, while processor affinity tries to keep a task on the same CPU to preserve cache locality.

Processor affinity

Processor affinity means that a process is preferably scheduled on the processor where it ran before. This matters because data recently used by the process may still be available in that processor’s cache. If the process is moved to another CPU, that cache advantage may be lost.

Two forms are usually discussed:

  • Soft affinity: the scheduler tries to keep a process on the same CPU, but may move it if needed.
  • Hard affinity: the system binds a process to a specific CPU or a limited set of CPUs.

Affinity improves locality and often improves performance, but strict affinity may interfere with load balancing. So practical schedulers balance affinity and fair distribution of work.
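On Linux, hard affinity can even be requested from user space. Python exposes the underlying `sched_setaffinity` system call through the `os` module (Linux-only, which is why the sketch below guards for it):

```python
import os

if hasattr(os, "sched_setaffinity"):        # Linux-only API
    pid = 0                                 # 0 means "the calling process"
    allowed = os.sched_getaffinity(pid)     # current CPU set, e.g. {0, 1, 2, 3}

    # Hard-bind this process to a single CPU...
    os.sched_setaffinity(pid, {min(allowed)})
    assert os.sched_getaffinity(pid) == {min(allowed)}

    # ...then restore the original mask so the scheduler regains freedom.
    os.sched_setaffinity(pid, allowed)
```

Soft affinity, by contrast, needs no user-visible call at all: it is simply the scheduler's internal preference to reuse the previous CPU when possible.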

Multicore scheduling

Modern systems usually contain multicore processors, where several processing cores exist on the same chip. From the operating system’s perspective, each core may appear as a separate logical CPU. However, cores may share caches, memory paths, and other hardware resources.

Because of this, scheduling on multicore systems is not just a repetition of single-CPU scheduling. The operating system must consider:

  • shared cache behavior,
  • contention for memory bandwidth,
  • cost of task migration between cores, and
  • coordination between parallel threads of the same application.

Some processors also support simultaneous multithreading (SMT), where one physical core exposes more than one logical CPU. This improves utilization, but the logical CPUs still share hardware resources. Therefore, assigning two heavy threads to sibling logical CPUs may not yield ideal performance.

NUMA-aware scheduling

In some multiprocessor systems, especially larger servers, the architecture is NUMA (Non-Uniform Memory Access). In such systems, memory access time depends on where the memory is located relative to the processor.

A CPU can access its local memory faster than memory attached to another processor. Therefore, the scheduler should try to keep a process close to the memory it frequently uses. This is called NUMA-aware scheduling.

Why NUMA matters

A processor may be free, but that does not always mean it is the best place to run a task. If the task’s memory is far away, performance may still suffer.

Heterogeneous multiprocessing

Not all processors in a system are necessarily identical. In heterogeneous multiprocessing, some cores may be faster and more power-hungry, while others may be slower and more energy efficient. A common example is a big.LITTLE design.

In such systems, the scheduler may place:

  • interactive or urgent tasks on high-performance cores, and
  • background or low-priority tasks on energy-efficient cores.

This means the scheduler is no longer concerned only with speed and fairness. It must also consider power consumption and thermal limits.

Main challenges in practice

Multiprocessor scheduling must deal with several practical difficulties:

  • contention for scheduler data structures,
  • overhead of synchronization between processors,
  • cache loss after process migration,
  • maintaining fairness across processors,
  • scaling to large numbers of cores, and
  • balancing performance with power efficiency.

A good scheduler must therefore avoid two extremes: it should neither leave processors idle unnecessarily nor move tasks around so aggressively that overhead destroys the gain.

Conclusion

Multiprocessor scheduling extends basic CPU scheduling to systems with multiple processors or cores. Its core concern is not only choosing the next runnable task, but also selecting the most suitable processor for that task. Modern operating systems address this problem using SMP-based designs, load balancing, processor affinity, multicore awareness, and sometimes NUMA-aware and heterogeneous scheduling policies.

In simple terms, multiprocessor scheduling is about using parallel hardware intelligently: keep processors busy, keep overhead low, and keep work close to the hardware resources that help it run efficiently.

Worked Example

Worked Example: Load Balancing and Processor Affinity

Consider a system with two processors: P0 and P1. Each processor has its own ready queue.

Initial state:

  • P0 queue: T1, T2, T3, T4
  • P1 queue: empty

If the scheduler does nothing, P0 will remain busy and P1 will remain idle. This is poor load distribution.

Step 1: Detect imbalance

The operating system observes that one processor has several waiting tasks while the other has none.

Step 2: Perform load balancing

The scheduler migrates T3 and T4 from P0 to P1.

After migration:

  • P0 queue: T1, T2
  • P1 queue: T3, T4

Now both processors can execute tasks in parallel, improving CPU utilization and overall throughput.
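Steps 1 and 2 can be traced in a small sketch (a toy model of the two queues, not a real scheduler):

```python
from collections import deque

p0 = deque(["T1", "T2", "T3", "T4"])   # busy processor's ready queue
p1 = deque()                            # idle processor's ready queue

# Step 1: detect imbalance between the two queues.
if len(p0) - len(p1) > 1:
    # Step 2: migrate T3 and T4 from P0 to P1.
    moved = [p0.pop() for _ in range(2)]   # pops T4, then T3
    p1.extend(reversed(moved))             # P1 queue becomes: T3, T4
```

After the migration, P0 holds T1 and T2 while P1 holds T3 and T4, matching the state described above.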

Step 3: Consider processor affinity

Suppose T3 had previously run on P0 and its working data is still present in P0’s cache. Moving it to P1 may reduce cache benefit and slightly slow execution.

Therefore, the scheduler must decide:

  • Should T3 stay on P0 for better cache locality?
  • Or should T3 move to P1 for better load balance?

A practical scheduler usually tries to balance both goals: do enough migration to prevent idleness, but avoid unnecessary movement that destroys cache locality.

Conclusion

This example shows the core tension in multiprocessor scheduling: load balancing improves parallelism, while processor affinity improves locality. A good scheduler must manage both together.

One-Page Summary

One-Page Summary: Multiprocessor Scheduling

Multiprocessor scheduling is the process of assigning runnable processes or threads to multiple CPUs or cores in a computer system. In a single-processor system, the scheduler only decides which process should run next. In a multiprocessor system, it must decide both which process should run and on which processor it should run.

The main objectives of multiprocessor scheduling are to keep all processors busy, distribute workload fairly, reduce scheduling overhead, preserve cache locality, and improve system performance. The two broad design styles are asymmetric multiprocessing (AMP), where one master processor performs scheduling decisions, and symmetric multiprocessing (SMP), where all processors are treated equally and each can schedule work. Modern operating systems mostly use SMP.

A scheduler may use either a common ready queue shared by all processors or separate ready queues for individual processors. A common queue simplifies work sharing but may suffer from contention. Separate queues scale better but may lead to workload imbalance.

To solve imbalance, operating systems use load balancing. In push migration, busy processors push tasks to less loaded processors. In pull migration, idle processors pull work from busy ones. This improves utilization but may increase migration overhead.

Another major concept is processor affinity. A process tends to run better on the processor where it executed previously because useful data may still be available in that processor’s cache. Soft affinity is a preference to stay on the same CPU, while hard affinity is a strict binding to a specific CPU or set of CPUs.

In modern systems, multiprocessor scheduling also includes multicore processors, simultaneous multithreading (SMT), NUMA-aware scheduling, and heterogeneous multiprocessing. NUMA-aware scheduling tries to keep a process near the memory it uses, while heterogeneous scheduling places tasks on fast or power-efficient cores depending on system goals.

The central challenge of multiprocessor scheduling is balance: too much migration improves load distribution but hurts locality; too little migration preserves locality but may leave processors idle. Therefore, an effective multiprocessor scheduler tries to achieve both high parallel utilization and efficient resource usage.