author Linus Torvalds <torvalds@linux-foundation.org> 2026-03-13 14:54:56 -0700
committer Linus Torvalds <torvalds@linux-foundation.org> 2026-03-13 14:54:56 -0700
commit 8369b2e97d806537dcdba1d6b3bb46fb1407dab0
tree 7cb3697c8da354bbea9e9c788fadb2e0960c74ba /Documentation
parent 8040dc41d272658ac22939ed9cb5ff24240ad851
parent 2fcfe5951eb2e8440fc5e1dd6ea977336ff83a1d
Merge tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:

 - Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE()
   annotations for lock-free accesses to module parameters and dsq->seq

 - Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and
   above) when passed through the int sched_class interface

 - Documentation updates: scheduling class precedence, task ownership
   state machine, example scheduler descriptions, config list cleanup

 - Selftest fix for format specifier and buffer length in
   file_write_long()

* tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
  sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags
  sched_ext: Documentation: Update sched-ext.rst
  sched_ext: Use READ_ONCE() for scx_slice_bypass_us in scx_bypass()
  sched_ext: Documentation: Mention scheduling class precedence
  sched_ext: Document task ownership state machine
  sched_ext: Use READ_ONCE() for lock-free reads of module param variables
  sched_ext/selftests: Fix format specifier and buffer length in file_write_long()
  sched_ext: Use WRITE_ONCE() for the write side of dsq->seq update
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/scheduler/sched-ext.rst 30
1 files changed, 27 insertions, 3 deletions
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 9e2882d937b4..d74c2c2b9ef3 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -43,7 +43,6 @@ options should be enabled to use sched_ext:
CONFIG_DEBUG_INFO_BTF=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
- CONFIG_PAHOLE_HAS_BTF_TAG=y
sched_ext is used only when the BPF scheduler is loaded and running.
@@ -58,7 +57,8 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler which has
+higher sched_class precedence than ``SCHED_EXT``.
Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
@@ -345,6 +345,8 @@ Where to Look
The functions prefixed with ``scx_bpf_`` can be called from the BPF
scheduler.
+* ``kernel/sched/ext_idle.c`` contains the built-in idle CPU selection policy.
+
* ``tools/sched_ext/`` hosts example BPF scheduler implementations.
* ``scx_simple[.bpf].c``: Minimal global FIFO scheduler example using a
@@ -353,13 +355,35 @@ Where to Look
* ``scx_qmap[.bpf].c``: A multi-level FIFO scheduler supporting five
levels of priority implemented with ``BPF_MAP_TYPE_QUEUE``.
+ * ``scx_central[.bpf].c``: A central FIFO scheduler where all scheduling
+ decisions are made on one CPU, demonstrating ``LOCAL_ON`` dispatching,
+ tickless operation, and kthread preemption.
+
+ * ``scx_cpu0[.bpf].c``: A scheduler that queues all tasks to a shared DSQ
+ and only dispatches them on CPU0 in FIFO order. Useful for testing bypass
+ behavior.
+
+ * ``scx_flatcg[.bpf].c``: A flattened cgroup hierarchy scheduler
+ implementing hierarchical weight-based cgroup CPU control by compounding
+ each cgroup's share at every level into a single flat scheduling layer.
+
+ * ``scx_pair[.bpf].c``: A core-scheduling example that always makes
+ sibling CPU pairs execute tasks from the same CPU cgroup.
+
+ * ``scx_sdt[.bpf].c``: A variation of ``scx_simple`` demonstrating BPF
+ arena memory management for per-task data.
+
+ * ``scx_userland[.bpf].c``: A minimal scheduler demonstrating user space
+ scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order;
+ all others are scheduled in user space by a simple vruntime scheduler.
+
ABI Instability
===============
The APIs provided by sched_ext to BPF schedulers programs have no stability
guarantees. This includes the ops table callbacks and constants defined in
``include/linux/sched/ext.h``, as well as the ``scx_bpf_`` kfuncs defined in
-``kernel/sched/ext.c``.
+``kernel/sched/ext.c`` and ``kernel/sched/ext_idle.c``.
While we will attempt to provide a relatively stable API surface when
possible, they are subject to change without warning between kernel