Commit 9e122cc1 authored by David Hildenbrand, committed by Linus Torvalds

memory-hotplug.rst: document the "auto-movable" online policy

Commit e83a437f ("mm/memory_hotplug: introduce "auto-movable" online
policy") introduced a new memory online policy to automatically select a
zone for memory blocks to be onlined.  It added a way to set the active
online policy and tunables for the auto-movable online policy.

Follow-up commits tweaked the "auto-movable" policy to also consider
memory device details when selecting zones for memory blocks to be
onlined.

Let's document the new toggles and how the two online policies we have
work.

[david@redhat.com: updates]
  Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com

Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com


Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent a8db400f
+121 −20
@@ -165,9 +165,8 @@ Or alternatively::

	% echo 1 > /sys/devices/system/memory/memoryXXX/online

The kernel will select the target zone automatically, usually defaulting to
``ZONE_NORMAL`` unless ``movable_node`` has been specified on the kernel
command line or if the memory block would intersect the ZONE_MOVABLE already.
The kernel will select the target zone automatically, depending on the
configured ``online_policy``.
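
As a minimal sketch of this workflow, the shell functions below write to the ``state`` file of a memory block, either letting the kernel pick the zone according to ``online_policy`` or explicitly requesting a zone with ``online_kernel`` or ``online_movable``. The function names and the ``SYSFS_ROOT`` override are hypothetical, added only so the snippet can be tried against a fake directory tree; on a real system it defaults to ``/sys`` and needs root.

```shell
# Hypothetical helpers for onlining a memory block by hand.
# SYSFS_ROOT (default /sys) is an override for trying this on a fake tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# $1: memory block directory name, e.g. memory32
# $2: optional zone request: auto (default), kernel or movable
online_block() {
    ob_state="$SYSFS_ROOT/devices/system/memory/$1/state"
    case "${2:-auto}" in
        auto)    ob_req=online ;;          # zone chosen by online_policy
        kernel)  ob_req=online_kernel ;;   # explicitly request a kernel zone
        movable) ob_req=online_movable ;;  # explicitly request ZONE_MOVABLE
        *) echo "unknown zone request: $2" >&2; return 2 ;;
    esac
    [ -e "$ob_state" ] || { echo "no such memory block: $1" >&2; return 1; }
    echo "$ob_req" > "$ob_state"
}

# Print the active policy: contig-zones or auto-movable.
show_online_policy() {
    cat "$SYSFS_ROOT/module/memory_hotplug/parameters/online_policy"
}
```

For example, ``online_block memory32`` leaves the zone choice to the configured policy, while ``online_block memory32 movable`` asks for ZONE_MOVABLE regardless of the policy.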

One can explicitly request to associate an offline memory block with
ZONE_MOVABLE by::
@@ -198,6 +197,9 @@ Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or

	% echo online > /sys/devices/system/memory/auto_online_blocks

Similarly to manual onlining, with ``online`` the kernel will select the
target zone automatically, depending on the configured ``online_policy``.

Modifying the auto-online behavior will only affect subsequently added
memory blocks.
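
A minimal sketch, assuming root and that no memory has been hotplugged yet, of enabling auto-onlining together with an explicit ``online_policy``. The function name and the ``SYSFS_ROOT`` override are hypothetical; ``online_policy`` itself is one of the ``memory_hotplug`` module parameters documented below.

```shell
# Hypothetical helper: choose the online policy, then enable auto-onlining
# with "online" so that the kernel selects zones according to that policy.
# SYSFS_ROOT (default /sys) is an override for trying this on a fake tree.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

enable_auto_online() {
    eao_policy=${1:-contig-zones}
    case "$eao_policy" in
        contig-zones|auto-movable) ;;
        *) echo "unknown online_policy: $eao_policy" >&2; return 2 ;;
    esac
    # Set the policy first: it should not change once memory is onlined.
    echo "$eao_policy" > "$SYSFS_ROOT/module/memory_hotplug/parameters/online_policy" ||
        return 1
    # Only memory blocks added after this write are affected.
    echo online > "$SYSFS_ROOT/devices/system/memory/auto_online_blocks"
}
```

Setting the policy before enabling auto-onlining avoids a window in which newly added memory could be onlined under the previous policy.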

@@ -393,11 +395,16 @@ command line parameters are relevant:
======================== =======================================================
``memhp_default_state``	 configure auto-onlining by essentially setting
                         ``/sys/devices/system/memory/auto_online_blocks``.
``movable_node``	 configure automatic zone selection in the kernel. When
			 set, the kernel will default to ZONE_MOVABLE, unless
			 other zones can be kept contiguous.
``movable_node``	 configure automatic zone selection in the kernel when
			 using the ``contig-zones`` online policy. When
			 set, the kernel will default to ZONE_MOVABLE when
			 onlining a memory block, unless other zones can be kept
			 contiguous.
======================== =======================================================

See Documentation/admin-guide/kernel-parameters.txt for a more generic
description of these command line parameters.
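
The zone choice described for ``movable_node`` can be modeled in a few lines. This is a simplified sketch, not kernel code: the function name and its 0/1 arguments are hypothetical, and it only captures the rule of keeping zones contiguous and falling back according to ``movable_node`` when a block intersects both or neither of the applicable kernel zone and ZONE_MOVABLE.

```shell
# Simplified model of zone selection with the contig-zones policy.
#   $1: 1 if the block intersects the applicable kernel zone (e.g. ZONE_NORMAL)
#   $2: 1 if the block intersects ZONE_MOVABLE
#   $3: 1 if movable_node was given on the kernel command line
# Prints "kernel" or "movable".
contig_zones_pick() {
    if [ "$1" != "$2" ]; then
        # Exactly one zone intersects: use it so zones stay contiguous.
        if [ "$1" = 1 ]; then echo kernel; else echo movable; fi
    elif [ "$3" = 1 ]; then
        # Both or no zones intersect: movable_node prefers ZONE_MOVABLE.
        echo movable
    else
        echo kernel
    fi
}
```

For example, ``contig_zones_pick 0 0 1`` prints ``movable``, while ``contig_zones_pick 0 0 0`` prints ``kernel``.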

Module Parameters
------------------

@@ -415,18 +422,112 @@ and they can be observed (and some even modified at runtime) via::
The following module parameters are currently defined:

================================ ===============================================
``memmap_on_memory``		 read-write: Allocate memory for the memmap from
				 the added memory block itself. Even if enabled,
				 actual support depends on various other system
				 properties and should only be regarded as a
				 hint whether the behavior would be desired.

				 While allocating the memmap from the memory
				 block itself makes memory hotplug less likely
				 to fail and keeps the memmap on the same NUMA
				 node in any case, it can fragment physical
				 memory in a way that huge pages in bigger
				 granularity cannot be formed on hotplugged
				 memory.
``online_policy``		 read-write: Set the basic policy used for
				 automatic zone selection when onlining memory
				 blocks without specifying a target zone.
				 ``contig-zones`` was the kernel default before
				 this parameter was added. Once an online policy
				 has been configured and memory has been
				 onlined, the policy should not be changed
				 anymore.

				 When set to ``contig-zones``, the kernel will
				 try to keep zones contiguous. If a memory block
				 intersects multiple zones or no zone, the
				 behavior depends on the ``movable_node`` kernel
				 command line parameter: default to ZONE_MOVABLE
				 if set, default to the applicable kernel zone
				 (usually ZONE_NORMAL) if not set.

				 When set to ``auto-movable``, the kernel will
				 try onlining memory blocks to ZONE_MOVABLE if
				 possible according to the configuration and
				 memory device details. With this policy, one
				 can avoid zone imbalances when eventually
				 hotplugging a lot of memory later while still
				 being able to hotunplug as much of it as
				 possible reliably, which is very desirable in
				 virtualized environments. This policy ignores
				 the ``movable_node`` kernel command line
				 parameter and isn't really applicable in
				 environments that require it (e.g., bare metal
				 with hotunpluggable nodes), where hotplugged
				 memory might be exposed to the system via the
				 firmware-provided memory map early during boot
				 instead of getting detected, added and onlined
				 later during boot (as done by virtio-mem or by
				 some hypervisors implementing emulated DIMMs).
				 As one example, a hotplugged DIMM will be
				 onlined either completely to ZONE_MOVABLE or
				 completely to ZONE_NORMAL, not a mixture. As
				 another example, as many memory blocks
				 belonging to a virtio-mem device as possible
				 will be onlined to ZONE_MOVABLE,
				 special-casing units of memory blocks that can
				 only get hotunplugged together. *This policy
				 does not protect against setups that are
				 problematic with ZONE_MOVABLE and does not
				 change the zone of memory blocks dynamically
				 after they were onlined.*
``auto_movable_ratio``		 read-write: Set the maximum MOVABLE:KERNEL
				 memory ratio in % for the ``auto-movable``
				 online policy. Whether the ratio applies only
				 to the system as a whole (across all NUMA
				 nodes) or also per NUMA node depends on the
				 ``auto_movable_numa_aware`` configuration.

				 All accounting is based on present memory pages
				 in the zones combined with accounting per
				 memory device. Memory dedicated to the CMA
				 allocator is accounted as MOVABLE, although
				 residing on one of the kernel zones. The
				 possible ratio depends on the actual workload.
				 The kernel default is "301" %, which, for
				 example, allows for hotplugging 24 GiB to an
				 8 GiB VM and automatically onlining all
				 hotplugged memory to ZONE_MOVABLE in many
				 setups. The additional 1% accounts for some
				 pages not being present, for example, because
				 of some firmware allocations.

				 Note that ZONE_NORMAL memory provided by one
				 memory device does not allow for more
				 ZONE_MOVABLE memory for a different memory
				 device. As one example, onlining memory of a
				 hotplugged DIMM to ZONE_NORMAL will not allow
				 for another hotplugged DIMM to get onlined to
				 ZONE_MOVABLE automatically. In contrast, memory
				 hotplugged by a virtio-mem device that got
				 onlined to ZONE_NORMAL will allow for more
				 ZONE_MOVABLE memory within *the same*
				 virtio-mem device.
``auto_movable_numa_aware``	 read-write: Configure whether the
				 ``auto_movable_ratio`` in the ``auto-movable``
				 online policy also applies per NUMA
				 node in addition to the whole system across all
				 NUMA nodes. The kernel default is "Y".

				 Disabling NUMA awareness can be helpful when
				 dealing with NUMA nodes that should be
				 completely hotunpluggable, onlining the memory
				 completely to ZONE_MOVABLE automatically if
				 possible.

				 Parameter availability depends on CONFIG_NUMA.
================================ ===============================================
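
The arithmetic behind ``auto_movable_ratio`` and ``auto_movable_numa_aware`` can be sketched as follows. This is a simplified model under stated assumptions, not the kernel implementation: sizes are in MiB instead of pages, and CMA accounting and the per-memory-device special cases described above are ignored. The function names are hypothetical.

```shell
# Simplified model (not kernel code) of the auto-movable ratio check.
# All sizes are in MiB; CMA and per-memory-device accounting are ignored.
#   $1: present KERNEL memory        $2: MOVABLE memory already onlined
#   $3: memory about to be onlined   $4: auto_movable_ratio in % (default 301)
# Succeeds if onlining $3 to ZONE_MOVABLE keeps MOVABLE:KERNEL <= ratio.
can_online_movable() {
    [ $(( ($2 + $3) * 100 )) -le $(( $1 * ${4:-301} )) ]
}

# With auto_movable_numa_aware=Y the ratio must hold both system-wide and
# on the target node; with N only the system-wide check applies.
#   $1/$2: system KERNEL/MOVABLE   $3/$4: node KERNEL/MOVABLE
#   $5: memory to online   $6: ratio in %   $7: Y or N
can_online_movable_numa() {
    can_online_movable "$1" "$2" "$5" "$6" || return 1
    [ "$7" = Y ] || return 0
    can_online_movable "$3" "$4" "$5" "$6"
}
```

With 8 GiB (8192 MiB) of kernel memory, onlining 24 GiB to ZONE_MOVABLE requires ``24576 * 100 <= 8192 * ratio``, which holds at 301 and, exactly, at 300. If firmware allocations leave only 8180 MiB present, a ratio of 300 would reject the request while 301 still accepts it, which illustrates the purpose of the additional 1%.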

ZONE_MOVABLE