I/O scheduling is a technique used by operating systems to determine the order in which input/output (I/O) operations—such as reading from or writing to storage devices—are processed. Its primary goals are to optimize system performance, reduce latency, increase throughput, and ensure fair access to I/O devices among multiple processes.
Reduces wait time for I/O requests.
Improves overall system performance by minimizing unnecessary disk head movements (especially important for hard drives).
Prevents starvation of any single process by ensuring fairness.
Maximizes device utilization by efficiently handling simultaneous requests.
Linux employs various schedulers to manage different types of tasks and resources, primarily for CPU and I/O. Here's a breakdown of the main ones:
1. CPU Schedulers (Process Schedulers): These are responsible for deciding which process gets to use the CPU and for how long.
Historical Schedulers:
O(n) Scheduler: An older scheduler whose scheduling time was proportional to the number of runnable processes. It kept runnable processes in a circular linked list and scanned the list to pick the highest-priority one, which became inefficient as the number of processes grew.
O(1) Scheduler: Introduced in Linux 2.6, it improved upon O(n) by achieving constant-time scheduling decisions regardless of the number of processes. It used separate run queues for different priority levels.
Current and Modern CPU Schedulers:
Completely Fair Scheduler (CFS):
Default Scheduler: CFS was the default scheduler for most general-purpose Linux systems from kernel 2.6.23 through 6.5.
Fairness: It aims to provide a "fair" share of CPU time to all runnable processes, trying to make each process feel as if it's running on its own dedicated CPU.
Red-Black Tree: CFS uses a red-black tree data structure to manage processes, ordering them by their "virtual runtime" (vruntime). The process with the smallest vruntime gets scheduled next.
Scheduling Policies handled by CFS:
SCHED_OTHER (or SCHED_NORMAL): The standard time-sharing policy for regular, non-real-time tasks. It considers "niceness" values (user-adjustable priorities) to determine how much CPU time a process gets.
SCHED_BATCH: Designed for CPU-intensive, non-interactive tasks that can run in the background with lower priority, optimizing for throughput over latency.
SCHED_IDLE: For tasks that should only run when the system is completely idle, with extremely low priority.
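The vruntime mechanism above can be sketched in a few lines of Python. This is a toy model, not the kernel's implementation: a binary heap stands in for the red-black tree, and the weights are illustrative values rather than the kernel's actual nice-to-weight table.

```python
import heapq

# Toy model of CFS-style fair scheduling: the runnable task with the
# smallest vruntime runs next, and its vruntime advances more slowly
# the higher its weight (i.e., the lower its nice value). A min-heap
# stands in for the kernel's red-black tree; the weights are
# illustrative, not the kernel's real nice-to-weight table.

def run_cfs(tasks, slice_ms, steps):
    """tasks: dict of name -> weight. Returns CPU time each task received."""
    heap = [(0.0, name) for name in tasks]   # (vruntime, name)
    heapq.heapify(heap)
    cpu_time = {name: 0 for name in tasks}
    for _ in range(steps):
        vruntime, name = heapq.heappop(heap)  # smallest vruntime runs next
        cpu_time[name] += slice_ms
        # vruntime advances inversely to weight: heavier tasks "age" slower
        heapq.heappush(heap, (vruntime + slice_ms / tasks[name], name))
    return cpu_time

# The weight-2 task ends up with roughly twice the CPU time of the
# weight-1 task, i.e., CPU time is shared in proportion to weight.
print(run_cfs({"nice0": 2.0, "nice5": 1.0}, slice_ms=1, steps=300))
```

Because the task with the smallest vruntime always runs next, vruntimes stay nearly equal over time, which is exactly the "fair share" property CFS aims for.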
Earliest Eligible Virtual Deadline First (EEVDF):
New Default: As of Linux kernel 6.6, EEVDF has replaced CFS as the default process scheduler.
Evolution of CFS: EEVDF is an evolution of the CFS concept, aiming for even better fairness and responsiveness, especially under high load, by using a "virtual deadline" concept.
Real-Time Schedulers: These are designed for applications that require strict timing guarantees and low latency. They have higher priority than normal processes.
SCHED_FIFO (First-In, First-Out): A real-time policy with no time-slicing: a running SCHED_FIFO task keeps the CPU until it blocks, yields explicitly, or is preempted by a higher-priority SCHED_FIFO or SCHED_RR task.
SCHED_RR (Round-Robin): A real-time policy similar to SCHED_FIFO but with time-slicing. Tasks run for a fixed time quantum. If a task doesn't finish within its quantum, it's moved to the end of the queue for its priority level, and the next task at that priority gets scheduled.
SCHED_DEADLINE:
Deadline-based: This is a real-time scheduler designed for tasks with strict deadlines. Tasks are scheduled based on their earliest deadline. It has the highest priority among all scheduling classes.
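The real-time policies use a separate static priority range from the normal ones, which can be queried through Python's `os` module. A small sketch (Linux-specific; actually switching a process to a real-time policy normally requires root or CAP_SYS_NICE, so the example only queries):

```python
import os

# Query the static priority range each policy accepts (Linux-specific).
# Real-time policies (SCHED_FIFO, SCHED_RR) use priorities 1..99;
# normal policies (SCHED_OTHER, SCHED_BATCH) only accept 0, since they
# are ordered by nice values rather than static priority.
for policy_name in ("SCHED_OTHER", "SCHED_BATCH", "SCHED_FIFO", "SCHED_RR"):
    policy = getattr(os, policy_name)
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    print(f"{policy_name}: priorities {lo}..{hi}")

# Switching the current process to SCHED_FIFO would look like this, but
# it normally requires CAP_SYS_NICE / root, so it is commented out:
# os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(10))
```

Note that SCHED_DEADLINE is not exposed through Python's `os` module; it is set via the `sched_setattr` system call.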
Specialized Scheduling Features (extensions of the fair scheduler rather than separate scheduling classes):
Capacity Aware Scheduler (CAS): Considers CPU capacity when scheduling tasks, especially on systems with heterogeneous CPUs.
Energy Aware Scheduler (EAS): Aims to optimize power consumption by scheduling tasks on the most energy-efficient CPU cores available.
2. I/O Schedulers (Disk Schedulers): These manage the order in which read/write requests are sent to storage devices (HDDs, SSDs, etc.) to optimize performance.
NOOP: A very simple "no operation" scheduler that maintains a single FIFO (First-In, First-Out) queue for I/O requests (its blk-mq counterpart is called none). It's often suitable for modern SSDs or storage arrays that handle their own I/O optimization.
Deadline: Prioritizes I/O requests based on their "deadline" (how long they've been waiting), aiming to prevent starvation of requests. It maintains separate read and write queues and generally offers good performance for various workloads.
CFQ (Completely Fair Queuing): An older, more complex scheduler that attempts to provide fair bandwidth allocation to all processes issuing I/O. It tries to group requests from the same process together and uses a round-robin approach among different processes.
Anticipatory: An extension of the Deadline scheduler that "anticipates" future read requests by pausing briefly after a read to see if another read request for a nearby block arrives. This can improve sequential read performance but may add latency for other workloads; it was removed in kernel 2.6.33.
MQ (Multi-queue) Schedulers: Modern Linux kernels use the blk-mq (block multi-queue) framework, which allows multiple queues to handle I/O requests, better utilizing multi-core CPUs and fast NVMe devices. Within this framework, the available I/O schedulers are mq-deadline, bfq, kyber, and none; the legacy single-queue schedulers (noop, deadline, cfq) were removed along with the old block layer in kernel 5.0.
How to check the active I/O scheduler: You can find the active I/O scheduler for a particular disk using a command like: cat /sys/block/sda/queue/scheduler (replace sda with your disk device, e.g., nvme0n1). The output lists every available scheduler for that device, with the active one enclosed in square brackets, e.g. [mq-deadline].
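Parsing that file programmatically just means finding the bracketed entry. A small helper sketch (the sample listing and the device path are examples; adjust them for your system):

```python
def active_io_scheduler(listing: str) -> str:
    """Extract the active scheduler from the contents of
    /sys/block/<dev>/queue/scheduler, where the kernel marks the
    active entry with square brackets."""
    for token in listing.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in listing")

# Typical blk-mq listing; the bracketed entry is the active scheduler.
print(active_io_scheduler("mq-deadline kyber [bfq] none"))  # -> bfq

# To read it for a real device (path is an example):
# with open("/sys/block/sda/queue/scheduler") as f:
#     print(active_io_scheduler(f.read()))
```

Writing a scheduler name back to the same file (as root) switches the scheduler for that device at runtime.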
It's important to note that the choice of scheduler can significantly impact system performance depending on the workload and hardware.
Advantages of Round Robin Scheduling
In RR, all processes have equal priority because of the fixed time quantum.
Starvation never occurs, because every process in each RR cycle is scheduled for a fixed time slice (quantum).
Disadvantages of Round Robin Scheduling
In RR, performance depends heavily on the time quantum. If the quantum is too large, RR degenerates toward FCFS and response time suffers.
If the quantum is too small, frequent context switches reduce CPU efficiency and lower throughput.
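The quantum tradeoff can be seen in a toy simulation. This sketch counts quantum-expiry preemptions for the same workload under two quanta (burst lengths are made-up example values, and the cost of a context switch itself is ignored):

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate Round Robin over a list of CPU burst lengths (ms).
    Returns (preemption count, completion order of task indices).
    The overhead of a context switch itself is ignored."""
    queue = deque(enumerate(bursts))   # (task id, remaining time)
    switches = 0
    order = []
    while queue:
        tid, remaining = queue.popleft()
        if remaining > quantum:
            queue.append((tid, remaining - quantum))
            switches += 1              # preempted at quantum expiry
        else:
            order.append(tid)          # finishes within its quantum
    return switches, order

bursts = [30, 10, 20]
# Small quantum: many preemptions (context-switch overhead), but short
# jobs finish early. Large quantum: zero preemptions, and the schedule
# degenerates to FCFS completion order 0, 1, 2.
print(round_robin(bursts, quantum=5))
print(round_robin(bursts, quantum=50))
```

With quantum 5 the short job (index 1) completes first at the cost of nine preemptions; with quantum 50 there are no preemptions and tasks simply complete in arrival order, exactly the FCFS behavior described above.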
The EEVDF scheduler is a sophisticated algorithm designed to manage CPU time efficiently, ensuring that all tasks receive a fair share while also prioritizing those with stricter latency requirements. It achieves this by using concepts of "virtual time," "lag," "eligibility," and "virtual deadlines."
Virtual Run Time (VRT): Each runnable task maintains its own virtual run time. This is a conceptual clock that advances based on the CPU time the task should have received, scaled by its priority (or "niceness" in Linux). Tasks with higher priority (lower niceness) have their VRT advance slower, meaning they are "owed" more CPU time relatively.
Lag: The "lag" of a task is the difference between the CPU time it should have received (based on its VRT) and the CPU time it has actually received.
A positive lag means the task is "owed" CPU time; it has received less than its fair share.
A negative lag means the task has "over-run" its fair share.
Eligibility: A task is considered "eligible" to run if its lag value is zero or greater. The scheduler only considers eligible tasks for execution. This prevents tasks that have significantly over-run their share from immediately running again, ensuring fairness.
Virtual Deadline (VD): For each eligible task, a virtual deadline is calculated. This deadline is derived from its current virtual run time and its requested time slice (or a default slice). Tasks with shorter requested time slices (often latency-sensitive tasks) will naturally have earlier virtual deadlines.
The EEVDF scheduler operates on a simple principle: among all eligible tasks, it always picks the one with the earliest virtual deadline to run next.
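That selection rule can be sketched directly. This is a simplified illustration, not the kernel's code: the field names are hypothetical, and the virtual deadline is computed as vruntime plus requested slice, a simplification of the kernel's actual formula.

```python
from dataclasses import dataclass

# Toy illustration of the EEVDF selection rule: only tasks with
# non-negative lag are eligible, and among those the task with the
# earliest virtual deadline runs next. Field names and the deadline
# formula (vruntime + requested slice) are simplifications.

@dataclass
class Task:
    name: str
    lag: float        # owed CPU time (positive) or over-run (negative)
    vruntime: float   # virtual runtime consumed so far
    slice: float      # requested slice; shorter -> earlier deadline

    @property
    def virtual_deadline(self):
        return self.vruntime + self.slice

def pick_next(tasks):
    eligible = [t for t in tasks if t.lag >= 0]   # eligibility check
    if not eligible:
        return None
    return min(eligible, key=lambda t: t.virtual_deadline)

tasks = [
    Task("batch",  lag=0.5,  vruntime=10.0, slice=6.0),  # deadline 16
    Task("audio",  lag=0.1,  vruntime=11.0, slice=1.0),  # deadline 12
    Task("greedy", lag=-2.0, vruntime=9.0,  slice=1.0),  # ineligible
]
print(pick_next(tasks).name)  # -> audio (earliest deadline among eligible)
```

Note how the latency-sensitive "audio" task wins by requesting a short slice, while "greedy" is excluded entirely until its negative lag recovers.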
Fairness through Lag: By tracking lag, EEVDF ensures that tasks that have fallen behind on their allocated CPU time (positive lag) are prioritized. The system aims to keep all tasks' lag values close to zero over time.
Responsiveness through Virtual Deadlines: The introduction of virtual deadlines allows latency-sensitive tasks (those needing CPU quickly, even if for a short burst) to express their urgency. Since they can request shorter time slices, they will compute earlier virtual deadlines, making them more likely to be scheduled sooner. This is a key improvement over previous schedulers like CFS, which primarily focused on proportional fairness but could struggle with low-latency requirements.
Preemption: If a newly awakened or eligible task has a virtual deadline earlier than the currently running task's virtual deadline, the EEVDF scheduler can preempt the current task to run the more urgent one.
Sleeping Tasks: When a task goes to sleep (e.g., waiting for I/O), its lag value is retained. However, the lag of long-sleeping tasks can "decay" over virtual run time, preventing them from accumulating a massive debt that would unfairly penalize them upon waking.