summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu/resctrl/monitor.c
AgeCommit message (Collapse)Author
2026-03-04x86/resctrl: Fix SNC detectionTony Luck
Now that the x86 topology code has a sensible nodes-per-package measure, that does not depend on the online status of CPUs, use this to divinate the SNC mode. Note that when Cluster on Die (CoD) is configured on older systems this will also show multiple NUMA nodes per package. Intel Resource Director Technology is incomaptible with CoD. Print a warning and do not use the fixup MSR_RMID_SNC_CONFIG. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Link: https://patch.msgid.link/aaCxbbgjL6OZ6VMd@agluck-desk3 Link: https://patch.msgid.link/20260303110100.367976706@infradead.org
2026-01-09x86/resctrl: Read telemetry eventsTony Luck
Introduce intel_aet_read_event() to read telemetry events for resource RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each package, so scan all of them and add up all counters. Aggregators may return an invalid data indication if they have received no records for a given RMID. The user will see "Unavailable" if none of the aggregators on a package provide valid counts. Resctrl now uses readq() so depends on X86_64. Update Kconfig. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86,fs/resctrl: Add architectural event pointerTony Luck
The resctrl file system layer passes the domain, RMID, and event id to the architecture to fetch an event counter. Fetching a telemetry event counter requires additional information that is private to the architecture, for example, the offset into MMIO space from where the counter should be read. Add mon_evt::arch_priv that architecture can use for any private data related to the event. The resctrl filesystem initializes mon_evt::arch_priv when the architecture enables the event and passes it back to architecture when needing to fetch an event counter. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Rename some L3 specific functionsTony Luck
With the arrival of monitor events tied to new domains associated with a different resource it would be clearer if the L3 resource specific functions are more accurately named. Rename three groups of functions: Functions that allocate/free architecture per-RMID MBM state information: arch_domain_mbm_alloc() -> l3_mon_domain_mbm_alloc() mon_domain_free() -> l3_mon_domain_free() Functions that allocate/free filesystem per-RMID MBM state information: domain_setup_mon_state() -> domain_setup_l3_mon_state() domain_destroy_mon_state() -> domain_destroy_l3_mon_state() Initialization/exit: rdt_get_mon_l3_config() -> rdt_get_l3_mon_config() resctrl_mon_resource_init() -> resctrl_l3_mon_resource_init() resctrl_mon_resource_exit() -> resctrl_l3_mon_resource_exit() Ensure kernel-doc descriptions of these functions' return values are present and correctly formatted. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domainTony Luck
The upcoming telemetry event monitoring is not tied to the L3 resource and will have a new domain structure. Rename the L3 resource specific domain data structures to include "l3_" in their names to avoid confusion between the different resource specific domain structures: rdt_mon_domain -> rdt_l3_mon_domain rdt_hw_mon_domain -> rdt_hw_l3_mon_domain No functional change. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-05x86,fs/resctrl: Use struct rdt_domain_hdr when reading countersTony Luck
Convert the whole call sequence from mon_event_read() to resctrl_arch_rmid_read() to pass resource independent struct rdt_domain_hdr instead of an L3 specific domain structure to prepare for monitoring events in other resources. This additional layer of indirection obscures which aspects of event counting depend on a valid domain. Event initialization, support for assignable counters, and normal event counting implicitly depend on a valid domain while summing of domains does not. Split summing domains from the core event counting handling to make their respective dependencies obvious. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2025-12-02Merge tag 'x86_cache_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for AMD's Smart Data Cache Injection feature which allows for direct insertion of data from I/O devices into the L3 cache, thus bypassing DRAM and saving its bandwidth; the resctrl side of the feature allows the size of the L3 used for data injection to be controlled - Add Intel Clearwater Forest to the list of CPUs which support Sub-NUMA clustering - Other fixes and cleanups * tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Update bit_usage to reflect io_alloc fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID fs/resctrl: Introduce interface to display io_alloc CBMs fs/resctrl: Add user interface to enable/disable io_alloc feature fs/resctrl: Introduce interface to display "io_alloc" support x86,fs/resctrl: Implement "io_alloc" enable/disable handlers x86,fs/resctrl: Detect io_alloc feature x86/resctrl: Add SDCIAE feature in the command line options x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement fs/resctrl: Consider sparse masks when initializing new group's allocation x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
2025-10-20x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in ↵Babu Moger
mbm_event mode The following NULL pointer dereference is encountered on mount of resctrl fs after booting a system that supports assignable counters with the "rdt=!mbmtotal,!mbmlocal" kernel parameters: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:mbm_cntr_get Call Trace: rdtgroup_assign_cntr_event rdtgroup_assign_cntrs rdt_get_tree Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features and the MBM events they represent. This results in the per-domain MBM event related data structures to not be allocated during early initialization. resctrl fs initialization follows by implicitly enabling both MBM total and local events on a system that supports assignable counters (mbm_event mode), but this enabling occurs after the per-domain data structures have been created. After booting, resctrl fs assumes that an enabled event can access all its state. This results in NULL pointer dereference when resctrl attempts to access the un-allocated structures of an enabled event. Remove the late MBM event enabling from resctrl fs. This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter (mbm_event) mode is enabled without any events to support. Switching between the "default" and "mbm_event" mode without any events is not practical. Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL or X86_FEATURE_CQM_MBM_LOCAL. This ensures all needed MBM related data structures are created before use and that it is only possible to switch between "default" and "mbm_event" mode when the same events are available in both modes. This dependency does not exist in the hardware but this usage of these feature settings work for known systems. [ bp: Massage commit message. ] Fixes: 13390861b426e ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details") Co-developed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-13x86/resctrl: Fix miscount of bandwidth event when reactivating previously ↵Babu Moger
unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-10-13x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater ForestChen Yu
Clearwater Forest supports SNC mode. Add it to the snc_cpu_ids[] table. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Tony Luck <tony.luck@intel.com>
2025-09-15x86/resctrl: Configure mbm_event mode if supportedBabu Moger
Configure mbm_event mode on AMD platforms. On AMD platforms, it is recommended to use the mbm_event mode, if supported, to prevent the hardware from resetting counters between reads. This can result in misleading values or display "Unavailable" if no counter is assigned to the event. Enable mbm_event mode, known as ABMC (Assignable Bandwidth Monitoring Counters) on AMD, by default if the system supports it. Update ABMC across all logical processors within the resctrl domain to ensure proper functionality. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()Babu Moger
System software reads resctrl event data for a particular resource by writing the RMID and Event Identifier (EvtID) to the QM_EVTSEL register and then reading the event data from the QM_CTR register. In ABMC mode, the event data of a specific counter ID is read by setting the following fields: QM_EVTSEL.ExtendedEvtID = 1, QM_EVTSEL.EvtID = L3CacheABMC (=1) and setting QM_EVTSEL.RMID to the desired counter ID. Reading the QM_CTR then returns the contents of the specified counter ID. RMID_VAL_ERROR bit is set if the counter configuration is invalid, or if an invalid counter ID is set in the QM_EVTSEL.RMID field. RMID_VAL_UNAVAIL bit is set if the counter data is unavailable. Introduce resctrl_arch_reset_cntr() and resctrl_arch_cntr_read() to reset and read event data for a specific counter. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Refactor resctrl_arch_rmid_read()Babu Moger
resctrl_arch_rmid_read() adjusts the value obtained from MSR_IA32_QM_CTR to account for the overflow for MBM events and apply counter scaling for all the events. This logic is common to both reading an RMID and reading a hardware counter directly. Refactor the hardware value adjustment logic into get_corrected_val() to prepare for support of reading a hardware counter. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter ↵Babu Moger
with ABMC The ABMC feature allows users to assign a hardware counter to an RMID, event pair and monitor bandwidth usage as long as it is assigned. The hardware continues to track the assigned counter until it is explicitly unassigned by the user. Implement an x86 architecture-specific handler to configure a counter. This architecture specific handler is called by resctrl fs when a counter is assigned or unassigned as well as when an already assigned counter's configuration should be updated. Configure counters by writing to the L3_QOS_ABMC_CFG MSR, specifying the counter ID, bandwidth source (RMID), and event configuration. The ABMC feature details are documented in APM [1] available from [2]. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth Monitoring (ABMC). Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86/resctrl: Add support to enable/disable AMD ABMC featureBabu Moger
Add the functionality to enable/disable the AMD ABMC feature. The AMD ABMC feature is enabled by setting enabled bit(0) in the L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs to be updated on all the logical processors in the QOS Domain. Hardware counters will reset when ABMC state is changed. [ bp: Massage commit message. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature detailsBabu Moger
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5. Bits Description 15:0 MAX_ABMC Maximum Supported Assignable Bandwidth Monitoring Counter ID + 1 The ABMC feature details are documented in APM [1] available from [2]. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth Monitoring (ABMC). Detect the feature and number of assignable counters supported. For backward compatibility, upon detecting the assignable counter feature, enable the mbm_total_bytes and mbm_local_bytes events that users are familiar with as part of original L3 MBM support. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
2025-09-15x86,fs/resctrl: Consolidate monitoring related data from rdt_resourceBabu Moger
The cache allocation and memory bandwidth allocation feature properties are consolidated into struct resctrl_cache and struct resctrl_membw respectively. In preparation for more monitoring properties that will clobber the existing resource struct more, re-organize the monitoring specific properties to also be in a separate structure. Also convert "bandwidth sources" terminology to "memory transactions" to have consistency within resctrl for related monitoring features. [ bp: Massage commit message. ] Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Prepare for more monitor eventsTony Luck
There's a rule in computer programming that objects appear zero, once, or many times. So code accordingly. There are two MBM events and resctrl is coded with a lot of if (local) do one thing if (total) do a different thing Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays of pointers to per event data instead of explicit fields for total and local bandwidth. Simplify by coding for many events using loops on which are enabled. Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more widely. Also provide a for_each_mbm_event_id() helper macro. Cleanup variable names in functions touched to consistently use "eventid" for those with type enum resctrl_event_id. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86/resctrl: Remove the rdt_mon_features global variableTony Luck
rdt_mon_features is used as a bitmask of enabled monitor events. A monitor event's status is now maintained in mon_evt::enabled with all monitor events' mon_evt structures found in the filesystem's mon_event_all[] array. Remove the remaining uses of rdt_mon_features. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-09-15x86,fs/resctrl: Replace architecture event enabled checksTony Luck
The resctrl file system now has complete knowledge of the status of every event. So there is no need for per-event function calls to check. Replace each of the resctrl_arch_is_{event}enabled() calls with resctrl_is_mon_event_enabled(QOS_{EVENT}). No functional change. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
2025-05-27Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-16x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrlJames Morse
Resctrl is a filesystem interface to hardware that provides cache allocation policy and bandwidth control for groups of tasks or CPUs. To support more than one architecture, resctrl needs to live in /fs/. Move the code that is concerned with the filesystem interface to /fs/resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
2025-05-16fs/resctrl: Add boiler plate for external resctrl codeJames Morse
Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL for the common parts of the resctrl interface and make X86_CPU_RESCTRL select this. Adding an include of asm/resctrl.h to linux/resctrl.h allows the /fs/resctrl files to switch over to using this header instead. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com
2025-05-16x86/resctrl: Split trace.hJames Morse
trace.h contains all the tracepoints. After the move to /fs/resctrl, some of these will be left behind. All the pseudo_lock tracepoints remain part of the architecture. The lone tracepoint in monitor.c moves to /fs/resctrl. Split trace.h so that each C file includes a different trace header file. This means the trace header files are not modified when they are moved. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-14-james.morse@arm.com
2025-05-16x86/resctrl: Add end-marker to the resctrl_event_id enumJames Morse
The resctrl_event_id enum gives names to the counter event numbers on x86. These are used directly by resctrl. To allow the MPAM driver to keep an array of these the size of the enum needs to be known. Add a 'num_events' enum entry which can be used to size an array. This is added to the enum to reduce conflicts with another series, which in turn requires get_arch_mbm_state() to have a default case. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-12-james.morse@arm.com
2025-05-15x86/resctrl: Drop __init/__exit on assorted symbolsJames Morse
Because ARM's MPAM controls are probed using MMIO, resctrl can't be initialised until enough CPUs are online to have determined the system-wide supported num_closid. Arm64 also supports 'late onlined secondaries', where only a subset of CPUs are online during boot. These two combine to mean the MPAM driver may not be able to initialise resctrl until user-space has brought 'enough' CPUs online. To allow MPAM to initialise resctrl after __init text has been free'd, remove all the __init markings from resctrl. The existing __exit markings cause these functions to be removed by the linker as it has never been possible to build resctrl as a module. MPAM has an error interrupt which causes the driver to reset and disable itself. Remove the __exit markings to allow the MPAM driver to tear down resctrl when an error occurs. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-10-james.morse@arm.com
2025-05-02x86/msr: Add explicit includes of <asm/msr.h>Xin Li (Intel)
For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-04-10x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-12x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callersJames Morse
Each of get_{mon,ctrl}_domain_from_cpu() only has one caller. Once the filesystem code is moved to /fs/, there is no equivalent to core.c. Move these functions to each live next to their caller. This allows them to be made static and the header file entries to be removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-31-james.morse@arm.com
2025-03-12x86/resctrl: Move mbm_cfg_mask to struct rdt_resourceJames Morse
The mbm_cfg_mask field lists the bits that user-space can set when configuring an event. This value is output via the last_cmd_status file. Once the filesystem parts of resctrl are moved to live in /fs/, the struct rdt_hw_resource is inaccessible to the filesystem code. Because this value is output to user-space, it has to be accessible to the filesystem code. Move it to struct rdt_resource. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-23-james.morse@arm.com
2025-03-12x86/resctrl: Move mba_mbps_default_event init to filesystem codeJames Morse
mba_mbps_default_event is initialised based on whether mbm_local or mbm_total is supported. In the case of both, it is initialised to mbm_local. mba_mbps_default_event is initialised in core.c's get_rdt_mon_resources(), while all the readers are in rdtgroup.c. After this code is split into architecture-specific and filesystem code, get_rdt_mon_resources() remains part of the architecture code, which would mean mba_mbps_default_event has to be exposed by the filesystem code. Move the initialisation to the filesystem's resctrl_mon_resource_init(). Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-22-james.morse@arm.com
2025-03-12x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMECJames Morse
When BMEC is supported the resctrl event can be configured in a number of ways. This depends on architecture support. rdt_get_mon_l3_config() modifies the struct mon_evt and calls resctrl_file_fflags_init() to create the files that allow the configuration. Splitting this into separate architecture and filesystem parts would require the struct mon_evt and resctrl_file_fflags_init() to be exposed. Instead, add resctrl_arch_is_evt_configurable(), and use this from resctrl_mon_resource_init() to initialise struct mon_evt and call resctrl_file_fflags_init(). resctrl_arch_is_evt_configurable() calls rdt_cpu_has() so it doesn't obviously benefit from being inlined. Putting it in core.c will allow rdt_cpu_has() to eventually become static. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-20-james.morse@arm.com
2025-03-12x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.hJames Morse
The architecture specific parts of resctrl provide helpers like is_mbm_total_enabled() and is_mbm_local_enabled() to hide accesses to the rdt_mon_features bitmap. Exposing a group of helpers between the architecture and filesystem code is preferable to a single unsigned-long like rdt_mon_features. Helpers can be more readable and have a well defined behaviour, while allowing architectures to hide more complex behaviour. Once the filesystem parts of resctrl are moved, these existing helpers can no longer live in internal.h. Move them to include/linux/resctrl.h Once these are exposed to the wider kernel, they should have a 'resctrl_arch_' prefix, to fit the rest of the arch<->fs interface. Move and rename the helpers that touch rdt_mon_features directly. is_mbm_event() and is_mbm_enabled() are only called from rdtgroup.c, so can be moved into that file. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-19-james.morse@arm.com
2025-03-12x86/resctrl: Move monitor init work to a resctrl init callJames Morse
rdt_get_mon_l3_config() is called from the arch's resctrl_arch_late_init(), and initialises both architecture specific fields, such as hw_res->mon_scale and resctrl filesystem fields by calling dom_data_init(). To separate the filesystem and architecture parts of resctrl, this function needs splitting up. Add resctrl_mon_resource_init() to do the filesystem specific work, and call it from resctrl_init(). This runs later, but is still before the filesystem is mounted and the rmid_ptrs[] array can be used. [ bp: Massage commit message. ] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-17-james.morse@arm.com
2025-03-12x86/resctrl: Move monitor exit work to a resctrl exit callJames Morse
rdt_put_mon_l3_config() is called via the architecture's resctrl_arch_exit() call, and appears to free the rmid_ptrs[] and closid_num_dirty_rmid[] arrays. In reality this code is marked __exit, and is removed by the linker as resctrl can't be built as a module. To separate the filesystem and architecture parts of resctrl, this free()ing work needs to be triggered by the filesystem, as these structures belong to the filesystem code. Rename rdt_put_mon_l3_config() to resctrl_mon_resource_exit() and call it from resctrl_exit(). The kfree() is currently dependent on r->mon_capable. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-16-james.morse@arm.com
2025-03-12x86/resctrl: Expose resctrl fs's init function to the rest of the kernelJames Morse
rdtgroup_init() needs exposing to the rest of the kernel so that arch code can call it once it lives in core code. As this is one of the few functions exposed, rename it to have "resctrl" in the name. The same goes for the exit call. Rename x86's arch code init functions for RDT to have an arch prefix to make it clear these are part of the architecture code. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-12-james.morse@arm.com
2025-03-12x86/resctrl: Add a helper to avoid reaching into the arch code resource listJames Morse
Resctrl occasionally wants to know something about a specific resource, in these cases it reaches into the arch code's rdt_resources_all[] array. Once the filesystem parts of resctrl are moved to /fs/, this means it will need visibility of the architecture specific struct rdt_hw_resource definition, and the array of all resources. All architectures would also need a r_resctrl member in this struct. Instead, abstract this via a helper to allow architectures to do different things here. Move the level enum to the resctrl header and add a helper to retrieve the struct rdt_resource by 'rid'. resctrl_arch_get_resource() should not return NULL for any value in the enum, it may instead return a dummy resource that is !alloc_enabled && !mon_enabled. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20250311183715.16445-3-james.morse@arm.com
2024-12-10x86/resctrl: Compute memory bandwidth for all supported eventsTony Luck
Switching between local and total memory bandwidth events as the input to the mba_sc feedback loop would be cumbersome and take effect slowly in the current implementation as the bandwidth is only known after two consecutive readings of the same event. Compute the bandwidth for all supported events. This doesn't add significant overhead and will make changing which event is used simple. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-5-tony.luck@intel.com
2024-12-10x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group eventTony Luck
update_mba_bw() hard codes use of the memory bandwidth local event which prevents more flexible options from being deployed. Change this function to use the event specified in the rdtgroup that is being processed. Mount time checks for the "mba_MBps" option ensure that local memory bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20241206163148.83828-4-tony.luck@intel.com
2024-12-09x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflagsBabu Moger
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize fflags for resctrl files. Adding new files will involve adding another function to initialize the fflags. This can be simplified by adding a new function resctrl_file_fflags_init() and passing the file name and flags to be initialized. Consolidate fflags initialization into resctrl_file_fflags_init() and remove thread_throttle_mode_init() and mbm_config_rftype_init(). [ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at run time. ] Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20241206163148.83828-2-tony.luck@intel.com
2024-11-06x86/resctrl: Support Sub-NUMA cluster mode SNC6Tony Luck
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some Intel platforms. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20241031220213.17991-1-tony.luck@intel.com
2024-07-16Merge tag 'x86_cache_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Enable Sub-NUMA clustering to work with resource control on Intel by teaching resctrl to handle scopes due to the clustering which partitions the L3 cache into sets. Modify and extend the subsystem to handle such scopes properly * tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Update documentation with Sub-NUMA cluster changes x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems x86/resctrl: Make __mon_event_count() handle sum domains x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files x86/resctrl: Allocate a new field in union mon_data_bits x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function x86/resctrl: Initialize on-stack struct rmid_read instances x86/resctrl: Add a new field to struct rmid_read for summation of domains x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems x86/resctrl: Introduce snc_nodes_per_l3_cache x86/resctrl: Add node-scope to the options for feature scope x86/resctrl: Split the rdt_domain and rdt_hw_domain structures x86/resctrl: Prepare for different scope for control/monitor operations x86/resctrl: Prepare to split rdt_domain structure x86/resctrl: Prepare for new domain scope
2024-07-02x86/resctrl: Detect Sub-NUMA Cluster (SNC) modeTony Luck
There isn't a simple hardware bit that indicates whether a CPU is running in Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the number of CPUs sharing the L3 cache with CPU0 to the number of CPUs in the same NUMA node as CPU0. Add the missing definition of pr_fmt() to monitor.c. This wasn't noticed before as there are only "can't happen" console messages from this file. [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-19-tony.luck@intel.com
2024-07-02x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systemsTony Luck
Hardware has two RMID configuration options for SNC systems. The default mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199 on node 1. This isn't compatible with Linux resctrl usage. On this example system a process using RMID 5 would only update monitor counters while running on SNC node 0. The other mode is "RMID Sharing Mode". This is enabled by clearing bit 0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode the number of logical RMIDs is the number of physical RMIDs (from CPUID leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A process can use the same RMID across different SNC nodes. See the "Intel Resource Director Technology Architecture Specification" for additional details. When SNC is enabled, update the MSR when a monitor domain is marked online. Technically this is overkill. It only needs to be done once per L3 cache instance rather than per SNC domain. But there is no harm in doing it more than once, and this is not in a critical path. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com
2024-07-02x86/resctrl: Make __mon_event_count() handle sum domainsTony Luck
Legacy resctrl monitor files must provide the sum of event values across all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance. There are now two cases: 1) A specific domain is provided in struct rmid_read This is either a non-SNC system, or the request is to read data from just one SNC node. 2) Domain pointer is NULL. In this case the cacheinfo field in struct rmid_read indicates that all SNC nodes that share that L3 cache instance should have the event read and return the sum of all values. Update the CPU sanity check. The existing check that an event is read from a CPU in the requested domain still applies when reading a single domain. But when summing across domains a more relaxed check that the current CPU is in the scope of the L3 cache instance is appropriate since the MSRs to read events are scoped at L3 cache level. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-17-tony.luck@intel.com
2024-07-02x86/resctrl: Initialize on-stack struct rmid_read instancesTony Luck
New semantics rely on some struct rmid_read members having NULL values to distinguish between the SNC and non-SNC scenarios. resctrl can thus no longer rely on this struct not being initialized properly. Initialize all on-stack declarations of struct rmid_read: rdtgroup_mondata_show() mbm_update() mkdir_mondata_subdir() to ensure that garbage values from the stack are not passed down to other functions. [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-11-tony.luck@intel.com
2024-07-02x86/resctrl: Introduce snc_nodes_per_l3_cacheTony Luck
Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores and memory controllers on a socket into two or more groups. These are presented to the operating system as NUMA nodes. This may enable some workloads to have slightly lower latency to memory as the memory controller(s) in an SNC node are electrically closer to the CPU cores on that SNC node. This cost may be offset by lower bandwidth since the memory accesses for each core can only be interleaved between the memory controllers on the same SNC node. Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks to track L3 cache occupancy and memory bandwidth. There is an MSR that controls how the RMIDs are shared between SNC nodes. The default mode divides them numerically. E.g. when there are two SNC nodes on a socket the lower number half of the RMIDs are given to the first node, the remainder to the second node. This would be difficult to use with the Linux resctrl interface as specific RMID values assigned to resctrl groups are not visible to users. RMID sharing mode divides the physical RMIDs evenly between SNC nodes but uses a logical RMID in the IA32_PQR_ASSOC MSR. For example a system with 200 physical RMIDs (as enumerated by CPUID leaf 0xF) that has two SNC nodes per L3 cache instance would have 100 logical RMIDs available for Linux to use. A task running on SNC node 0 with RMID 5 would accumulate LLC occupancy and MBM bandwidth data in physical RMID 5. Another task using RMID 5, but running on SNC node 1 would accumulate data in physical RMID 105. Even with this renumbering SNC mode requires several changes in resctrl behavior for correct operation. Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate how many SNC domains share an L3 cache instance. Initialize this to "1". Runtime detection of SNC mode will adjust this value. Update all places to take appropriate action when SNC mode is enabled: 1) The number of logical RMIDs per L3 cache available for use is the number of physical RMIDs divided by the number of SNC nodes. 2) Likewise the "mon_scale" value must be divided by the number of SNC nodes. 3) Add a function to convert from logical RMID values (assigned to tasks and loaded into the IA32_PQR_ASSOC MSR on context switch) to physical RMID values to load into IA32_QM_EVTSEL MSR when reading counters on each SNC node. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-7-tony.luck@intel.com
2024-07-02x86/resctrl: Split the rdt_domain and rdt_hw_domain structuresTony Luck
The same rdt_domain structure is used for both control and monitor functions. But this results in wasted memory as some of the fields are only used by control functions, while most are only used for monitor functions. Split into separate rdt_ctrl_domain and rdt_mon_domain structures with just the fields required for control and monitoring respectively. Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain and rdt_hw_mon_domain. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-5-tony.luck@intel.com
2024-07-02x86/resctrl: Prepare for different scope for control/monitor operationsTony Luck
Resctrl assumes that control and monitor operations on a resource are performed at the same scope. Prepare for systems that use different scope (specifically Intel needs to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control and NODE scope for cache occupancy and memory bandwidth monitoring). Create separate domain lists for control and monitor operations. Note that errors during initialization of either control or monitor functions on a domain would previously result in that domain being excluded from both control and monitor operations. Now the domains are allocated independently it is no longer required to disable both control and monitor operations if either fail. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-4-tony.luck@intel.com
2024-07-02x86/resctrl: Prepare to split rdt_domain structureTony Luck
The rdt_domain structure is used for both control and monitor features. It is about to be split into separate structures for these two usages because the scope for control and monitoring features for a resource will be different for future resources. To allow for common code that scans a list of domains looking for a specific domain id, move all the common fields ("list", "id", "cpu_mask") into their own structure within the rdt_domain structure. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/r/20240628215619.76401-3-tony.luck@intel.com