- Nov 06, 2019
-
-
Alex Deucher authored
Clarify some areas, clean up formatting, add section for unrecoverable error handling. v2: fix grammatical errors Reviewed-by:
Yong Zhao <yong.zhao@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Tariq Toukan authored
Add TLS TX counter description for the handshake retransmitted packets that trigger the resync procedure and then skip it, going into the regular transmit flow. Fixes: 46a3ea98 ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by:
Tariq Toukan <tariqt@mellanox.com> Signed-off-by:
Saeed Mahameed <saeedm@mellanox.com> Acked-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Nov 01, 2019
-
-
Jonathan Neuschäfer authored
These asterisks were once references to a line that said: "* Other names and brands may be claimed as the property of others." But now, they serve no purpose; they can only irritate the reader. Fixes: de3edab4 ("e1000: update README for e1000") Fixes: a3fb6568 ("e100.txt: Cleanup license info in kernel doc") Fixes: da8c01c4 ("e1000e.txt: Add e1000e documentation") Fixes: f12a84a9 ("Documentation: fm10k: Add kernel documentation") Fixes: b55c52b1 ("igb.txt: Add igb documentation") Fixes: c4e9b56e ("igbvf.txt: Add igbvf Documentation") Fixes: d7064f4c ("Documentation/networking/: Update Intel wired LAN driver documentation") Fixes: c4b8c011 ("ixgbevf.txt: Update ixgbevf documentation") Fixes: 1e06edcc ("Documentation: i40e: Prepare documentation for RST conversion") Fixes: 105bf2fe ("i40evf: add driver to kernel build system") Fixes: 1fae869b ("Documentation: ice: Prepare documentation for RST conversion") Fixes: df69ba43 ("ionic: Add basic framework for IONIC Network device driver") Signed-off-by:
Jonathan Neuschäfer <j.neuschaefer@gmx.net> Tested-by:
Aaron Brown <aaron.f.brown@intel.com> Signed-off-by:
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Florian Fainelli authored
The Broadcom Brahma-B53 core is susceptible to the issue described by ARM64_ERRATUM_843419 so this commit enables the workaround to be applied when executing on that core. Since there are now multiple entries to match, we must convert the existing ARM64_ERRATUM_843419 into an erratum list and use cpucap_multi_entry_cap_matches to match our entries. Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
Will Deacon <will@kernel.org>
-
Doug Berger authored
The Broadcom Brahma-B53 core is susceptible to the issue described by ARM64_ERRATUM_845719 so this commit enables the workaround to be applied when executing on that core. Since there are now multiple entries to match, we must convert the existing ARM64_ERRATUM_845719 into an erratum list. Signed-off-by:
Doug Berger <opendmb@gmail.com> Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
Will Deacon <will@kernel.org>
-
- Oct 31, 2019
-
-
Eric Dumazet authored
tcp_max_syn_backlog default value depends on memory size and TCP ehash size. Before this patch, the max value was 2048 [1], which is considered too small nowadays. Increase it to 4096 to match the recent SOMAXCONN change. [1] This is with TCP ehash size being capped to 524288 buckets. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by:
David S. Miller <davem@davemloft.net>
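The arithmetic in the message above can be sketched as follows. This is a simplified model, not the kernel's code: the helper name `default_syn_backlog`, the explicit `divisor` parameter, and the floor of 128 are assumptions made for illustration. The divisors 256 and 128 are inferred from the figures quoted in the message (524288 buckets giving 2048 before and 4096 after).

```c
/* Hypothetical model of how the default scales with the TCP ehash size.
 * ehash_buckets: number of ehash buckets (capped at 524288 per the message).
 * divisor:       assumed scaling divisor (256 before the patch, 128 after).
 * The minimum of 128 is an assumption for illustration. */
unsigned int default_syn_backlog(unsigned int ehash_buckets, unsigned int divisor)
{
    unsigned int value = ehash_buckets / divisor;

    return value < 128 ? 128 : value;
}
```

With the capped ehash size of 524288 buckets, a divisor of 256 yields the old maximum of 2048 and a divisor of 128 yields the new maximum of 4096.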
-
Eric Dumazet authored
SOMAXCONN is the default value of /proc/sys/net/core/somaxconn. It was defined as 128 more than 20 years ago. Since it caps the listen() backlog values, the very small value has caused numerous problems over the years, and many people had to raise it on their hosts after being hit by problems. Google has been using 1024 for at least 15 years, and we increased this to 4096 after the TCP listener rework was completed, more than 4 years ago. We have received no complaints of this change breaking any legacy application. Many applications indeed set up a TCP listener with listen(fd, -1); meaning they let the system select the backlog. Raising SOMAXCONN lowers the chance of the port being unavailable under even a small SYNFLOOD attack, and reduces the possibility of side channel vulnerabilities. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by:
David S. Miller <davem@davemloft.net>
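A minimal sketch of why listen(fd, -1) ends up with the system-selected backlog, assuming the kernel compares the requested backlog as an unsigned value against somaxconn. The function name `effective_backlog` is hypothetical and only models the cap; it is not a kernel or libc API.

```c
/* Hypothetical model of the backlog cap described above.
 * requested: the backlog argument passed to listen().
 * somaxconn: the current value of /proc/sys/net/core/somaxconn.
 * A negative request becomes a very large unsigned number, so it is
 * capped to somaxconn just like any oversized request. */
unsigned int effective_backlog(int requested, unsigned int somaxconn)
{
    if ((unsigned int)requested > somaxconn)
        return somaxconn;

    return (unsigned int)requested;
}
```

Under this model, a request of -1 or 8192 with somaxconn at 4096 both yield 4096, while a request of 128 is left unchanged.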
-
Bjorn Andersson authored
The Kryo cores share errata 1009 with Falkor, so add their model definitions and enable it for them as well. Signed-off-by:
Bjorn Andersson <bjorn.andersson@linaro.org> [will: Update entry in silicon-errata.rst] Signed-off-by:
Will Deacon <will@kernel.org>
-
- Oct 28, 2019
-
-
Thomas Zimmermann authored
The TODO item is misleading and makes it seem as if fbdev emulation cannot be used with SHMEM. Rephrase the text to describe the current situation more correctly. Signed-off-by:
Thomas Zimmermann <tzimmermann@suse.de> Acked-by:
Noralf Trønnes <noralf@tronnes.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191025092759.13069-3-tzimmermann@suse.de
-
- Oct 25, 2019
-
-
Anna Karas authored
Update references to reservation.c and reservation.h since these files have been renamed to dma-resv.c and dma-resv.h respectively. Cc: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/323401/?series=65037&rev=1 Signed-off-by:
Anna Karas <anna.karas@intel.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190927111504.20136-1-anna.karas@intel.com
-
Anna Karas authored
Update header files containing i915_perf_stream, i915_perf_stream_ops and i915_oa_ops definitions since they have been moved from i915_drv.h to i915_perf_types.h. Cc: Robert Bragg <robert@sixbynine.org> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by:
Anna Karas <anna.karas@intel.com> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191022100906.16597-1-anna.karas@intel.com
-
- Oct 23, 2019
-
-
Rob Herring authored
Fix the errors in the RiscV CPU DT schema: Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: 'timebase-frequency' is a required property Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@1: 'timebase-frequency' is a required property Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: compatible:0: 'riscv' is not one of ['sifive,rocket0', 'sifive,e5', 'sifive,e51', 'sifive,u54-mc', 'sifive,u54', 'sifive,u5'] Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: compatible: ['riscv'] is too short Documentation/devicetree/bindings/riscv/cpus.example.dt.yaml: cpu@0: 'timebase-frequency' is a required property The DT spec allows for 'timebase-frequency' to be in 'cpu' or 'cpus' node and RiscV requires it in /cpus node, so make it disallowed in cpu nodes. Fixes: 4fd669a8 ("dt-bindings: riscv: convert cpu binding to json-schema") Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linux-riscv@lists.infradead.org Acked-by:
Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by:
Rob Herring <robh@kernel.org>
-
Daniel Vetter authored
Should help new people pick suitable tasks. Cc: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Sean Paul <sean@poorly.run> Reviewed-by:
Sean Paul <sean@poorly.run> Acked-by:
Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by:
Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022152530.22038-2-daniel.vetter@ffwll.ch
-
Daniel Vetter authored
Done with commit aef9f33b Author: Imre Deak <imre.deak@intel.com> Date: Tue Oct 23 17:43:10 2018 +0300 drm/i915: Ensure proper HDA suspend/resume ordering with a device link Cc: Imre Deak <imre.deak@intel.com> Reviewed-by:
Sean Paul <sean@poorly.run> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20191022152530.22038-1-daniel.vetter@ffwll.ch
-
- Oct 17, 2019
-
-
Daniele Ceraolo Spurio authored
Better explain the usage of the microcontroller and what i915 is responsible for. While at it, fix the documentation for the auth function, which doesn't do any pinning anymore. v2: add a comment on HuC being optional and describe how HuC accesses memory (Martin) v3: add extra newline for better text organization (Martin) Signed-off-by:
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Acked-by:
Anna Karas <anna.karas@intel.com> Reviewed-by:
Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014183602.3643-3-daniele.ceraolospurio@intel.com
-
Daniele Ceraolo Spurio authored
Add a short description of what we expect from GuC and some minor improvements to existing documentation. Also remove a comment about a difference between GuC and HuC that is not true anymore. v2: add that the GuC is not mandatory (Martin) v3: add extra newline for better text organization (Martin) Signed-off-by:
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Acked-by:
Anna Karas <anna.karas@intel.com> Reviewed-by:
Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014183602.3643-2-daniele.ceraolospurio@intel.com
-
Daniele Ceraolo Spurio authored
To better organize the information, add a microcontrollers section and move/link the GuC, HuC and DMC documentation under it. Also add a small intro. Signed-off-by:
Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by:
Anna Karas <anna.karas@intel.com> Reviewed-by:
Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014183602.3643-1-daniele.ceraolospurio@intel.com
-
Wen He authored
Add optional property node 'arm,malidp-arqos-value' for the Mali DP500. This property describes the ARQoS levels of DP500's QoS signaling. Signed-off-by:
Wen He <wen.he_1@nxp.com> Reviewed-by:
Rob Herring <robh@kernel.org> Signed-off-by:
Liviu Dudau <liviu.dudau@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190910075913.17650-1-wen.he_1@nxp.com
-
Thomas Zimmermann authored
The DRM TODO list now contains an entry for converting fbdev drivers over to DRM. Signed-off-by:
Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20191017074705.9140-2-tzimmermann@suse.de
-
- Oct 16, 2019
-
-
Andrew Jeffery authored
Rename SD3 functions and groups to EMMC to better reflect their intended use before the binding escapes too far into the wild. Also clean up the SD3 pin groups to eliminate some silliness that slipped through the cracks (SD3DAT[4-7]) by unifying them into three new groups: EMMCG1, EMMCG4 and EMMCG8 for 1, 4 and 8-bit data buses respectively. Signed-off-by:
Andrew Jeffery <andrew@aj.id.au> Link: https://lore.kernel.org/r/20191008044153.12734-2-andrew@aj.id.au Reviewed-by:
Rob Herring <robh@kernel.org> Reviewed-by:
Joel Stanley <joel@jms.id.au> Signed-off-by:
Linus Walleij <linus.walleij@linaro.org>
-
- Oct 15, 2019
-
-
Biju Das authored
Document RZ/G2N (R8A774B1) SoC bindings. Signed-off-by:
Biju Das <biju.das@bp.renesas.com> Reviewed-by:
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by:
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
-
Biju Das authored
Document the RZ/G2N (R8A774B1) LVDS bindings. Signed-off-by:
Biju Das <biju.das@bp.renesas.com> Reviewed-by:
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by:
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by:
Rob Herring <robh@kernel.org>
-
Biju Das authored
Document the RZ/G2N (R8A774B1) SoC in the R-Car DU bindings. Signed-off-by:
Biju Das <biju.das@bp.renesas.com> Reviewed-by:
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by:
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Acked-by:
Rob Herring <robh@kernel.org>
-
- Oct 14, 2019
-
-
Vlastimil Babka authored
Commit 8974558f ("mm, page_owner, debug_pagealloc: save and dump freeing stack trace") enhanced page_owner to also store the freeing stack trace when debug_pagealloc is also enabled. KASAN would also like to do this [1] to improve error reports to debug e.g. UAF issues. Kirill has suggested that it should also be possible to enable freeing stack trace saving separately from KASAN or debug_pagealloc, i.e. with an extra boot option. Qian argued that we have enough options already, and avoiding the extra overhead is not worth the complications in the case of a debugging option. Kirill noted that the extra stack handle in struct page_owner requires 0.1% of memory. This patch therefore enables free stack saving whenever page_owner is enabled, regardless of whether debug_pagealloc or KASAN is also enabled. KASAN kernels booted with page_owner=on will thus benefit from the improved error reports. [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967 [vbabka@suse.cz: v3] Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Qian Cai <cai@lca.pw> Suggested-by:
Dmitry Vyukov <dvyukov@google.com> Suggested-by:
Walter Wu <walter-zh.wu@mediatek.com> Suggested-by:
Andrey Ryabinin <aryabinin@virtuozzo.com> Suggested-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by:
Qian Cai <cai@lca.pw> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Oct 11, 2019
-
-
Joe Perches authored
Describe the fallthrough pseudo-keyword. Convert the coding-style.rst example to the keyword style. Add description and links to deprecated.rst. Miguel Ojeda comments on the eventual [[fallthrough]] syntax: "Note that C17/C18 does not have [[fallthrough]]. C++17 introduced it, as it is mentioned above. I would keep the __attribute__((fallthrough)) -> [[fallthrough]] change you did, though, since that is indeed the standard syntax (given the paragraph references C++17). I was told by Aaron Ballman (who is proposing them for C) that it is more or less likely that it becomes standardized in C2x. However, it is still not added to the draft (other attributes are already, though). See N2268 and N2269: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2268.pdf (fallthrough) http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2269.pdf (attributes in general)" Signed-off-by:
Joe Perches <joe@perches.com> Acked-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
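A minimal userspace sketch of the keyword style described above. The macro definition here is an assumption for illustration: it maps `fallthrough` to `__attribute__((__fallthrough__))` when the compiler reports support and to an empty statement otherwise. The function `count_steps` is hypothetical.

```c
/* Illustrative definition of the pseudo-keyword for a standalone program. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each level also performs the work of the
 * levels below it, so the cases deliberately fall through. */
int count_steps(int level)
{
    int steps = 0;

    switch (level) {
    case 3:
        steps++;
        fallthrough;
    case 2:
        steps++;
        fallthrough;
    case 1:
        steps++;
        break;
    default:
        break;
    }

    return steps;
}
```

Writing `fallthrough;` as a statement makes the intent visible to both the reader and the compiler, where a comment would only be seen by the reader.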
-
Ezequiel Garcia authored
Add the register specifier description for an optional gamma LUT address. Reviewed-by:
Douglas Anderson <dianders@chromium.org> Reviewed-by:
Rob Herring <robh@kernel.org> Tested-by:
Heiko Stuebner <heiko@sntech.de> Signed-off-by:
Ezequiel Garcia <ezequiel@collabora.com> Signed-off-by:
Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191010194351.17940-2-ezequiel@collabora.com
-
- Oct 10, 2019
-
-
Jacob Keller authored
Commit 8960b389 ("linux/dim: Rename externally used net_dim members") renamed the net_dim API, removing the "net_" prefix from the structures and functions. The patch didn't update the net_dim.txt documentation file. Fix the documentation so that its examples match the current code. Fixes: 8960b389 ("linux/dim: Rename externally used net_dim members", 2019-06-25) Fixes: c002bd52 ("linux/dim: Rename externally exposed macros", 2019-06-25) Fixes: 4f75da36 ("linux/dim: Move implementation to .c files") Cc: Tal Gilboa <talgi@mellanox.com> Signed-off-by:
Jacob Keller <jacob.e.keller@intel.com> Signed-off-by:
Jakub Kicinski <jakub.kicinski@netronome.com>
-
- Oct 09, 2019
-
-
Sean Paul authored
Fixes the following warning: ../include/drm/drm_atomic_state_helper.h:1: warning: no structured comments found Fixes: 9ef8a9dc ("drm: Extract drm_atomic_state_helper.[hc]") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Reviewed-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191007151921.27099-1-sean@poorly.run
-
Randy Dunlap authored
Fix documentation build warnings for Pensando ionic: Documentation/networking/device_drivers/pensando/ionic.rst:39: WARNING: Unexpected indentation. Documentation/networking/device_drivers/pensando/ionic.rst:43: WARNING: Unexpected indentation. Fixes: df69ba43 ("ionic: Add basic framework for IONIC Network device driver") Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Acked-by:
Shannon Nelson <snelson@pensando.io> Signed-off-by:
Jakub Kicinski <jakub.kicinski@netronome.com>
-
- Oct 08, 2019
-
-
Sam Ravnborg authored
There are finally no more users of drmP.h and drm_os_linux.h left in the kernel (drmP.h was the only remaining user of drm_os_linux.h). Delete the header files and delete the corresponding todo entry. When we started this quest there were more than 700 users of drmP.h, and drmP.h was a huge cover-it-all header file. Daniel Vetter is the one who followed the work from start to end, and in between many people have contributed to the removal process - thanks to everyone! Signed-off-by:
Sam Ravnborg <sam@ravnborg.org> Reviewed-by:
Sean Paul <sean@poorly.run> Reviewed-by:
Lyude Paul <lyude@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20191007171224.1581-3-sam@ravnborg.org
-
Masahiro Yamada authored
We discussed a better location for this file, and agreed that core-api/ is a good fit. Rename it to symbol-namespaces.rst for disambiguation, and also add it to index.rst and MAINTAINERS. Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by:
Matthias Maennich <maennich@google.com> Signed-off-by:
Jessica Yu <jeyu@kernel.org>
-
Marc Zyngier authored
Allow the user to select the workaround for TX2-219, and update the silicon-errata.rst file to reflect this. Cc: <stable@vger.kernel.org> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Will Deacon <will@kernel.org>
-
- Oct 07, 2019
-
-
Vlastimil Babka authored
In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes. That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workarounds such as their own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2]. The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocating a larger size and only giving back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size). Ideally we should provide to mm users what they need without difficult workarounds or own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What does this mean for the three available allocators?
* SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes.
* SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario.
* SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting a x86_64 kernel+userspace with virtme, around 16MB memory was consumed by slab pages both before and after the patch, with difference in the noise. [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/ [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/ [3] https://lwn.net/Articles/787740/ [akpm@linux-foundation.org: documentation fixlet, per Matthew] Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by:
Christoph Hellwig <hch@lst.de> Cc: David Sterba <dsterba@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: "Darrick J . Wong" <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
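The "natural alignment" property discussed above can be expressed as a small check. This is a userspace sketch under stated assumptions: the helper names `is_power_of_two` and `is_naturally_aligned` are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when n has exactly one bit set. */
bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* True when ptr is aligned to its own power-of-two block size, which is
 * the guarantee the message describes for power-of-two kmalloc() sizes.
 * Sizes that are not a power of two are outside that guarantee, so the
 * check reports false for them. */
bool is_naturally_aligned(const void *ptr, size_t size)
{
    if (!is_power_of_two(size))
        return false;

    return ((uintptr_t)ptr & (size - 1)) == 0;
}
```

For example, an address of 0x1000 is naturally aligned for a 4096-byte block, while 0x1008 is not naturally aligned for a 16-byte block.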
-
Chris Down authored
cgroup v2 introduces two memory protection thresholds: memory.low (best-effort) and memory.min (hard protection). While they generally do what they say on the tin, there is a limitation in their implementation that makes them difficult to use effectively: that cliff behaviour often manifests when they become eligible for reclaim. This patch implements more intuitive and usable behaviour, where we gradually mount more reclaim pressure as cgroups further and further exceed their protection thresholds. This cliff edge behaviour happens because we only choose whether or not to reclaim based on whether the memcg is within its protection limits (see the use of mem_cgroup_protected in shrink_node), but we don't vary our reclaim behaviour based on this information. Imagine the following timeline, with the numbers the lruvec size in this zone: 1. memory.low=1000000, memory.current=999999. 0 pages may be scanned. 2. memory.low=1000000, memory.current=1000000. 0 pages may be scanned. 3. memory.low=1000000, memory.current=1000001. 1000001* pages may be scanned. (?!) * Of course, we won't usually scan all available pages in the zone even without this patch because of scan control priority, over-reclaim protection, etc. However, as shown by the tests at the end, these techniques don't sufficiently throttle such an extreme change in input, so cliff-like behaviour isn't really averted by their existence alone. Here's an example of how this plays out in practice. At Facebook, we are trying to protect various workloads from "system" software, like configuration management tools, metric collectors, etc (see this[0] case study). In order to find a suitable memory.low value, we start by determining the expected memory range within which the workload will be comfortable operating. This isn't an exact science -- memory usage deemed "comfortable" will vary over time due to user behaviour, differences in composition of work, etc, etc. 
As such we need to ballpark memory.low, but doing this is currently problematic: 1. If we end up setting it too low for the workload, it won't have *any* effect (see discussion above). The group will receive the full weight of reclaim and won't have any priority while competing with the less important system software, as if we had no memory.low configured at all. 2. Because of this behaviour, we end up erring on the side of setting it too high, such that the comfort range is reliably covered. However, protected memory is completely unavailable to the rest of the system, so we might cause undue memory and IO pressure there when we *know* we have some elasticity in the workload. 3. Even if we get the value totally right, smack in the middle of the comfort zone, we get extreme jumps between no pressure and full pressure that cause unpredictable pressure spikes in the workload due to the current binary reclaim behaviour. With this patch, we can set it to our ballpark estimation without too much worry. Any undesirable behaviour, such as too much or too little reclaim pressure on the workload or system will be proportional to how far our estimation is off. This means we can set memory.low much more conservatively and thus waste less resources *without* the risk of the workload falling off a cliff if we overshoot. As a more abstract technical description, this unintuitive behaviour results in having to give high-priority workloads a large protection buffer on top of their expected usage to function reliably, as otherwise we have abrupt periods of dramatically increased memory pressure which hamper performance. Having to set these thresholds so high wastes resources and generally works against the principle of work conservation. In addition, having proportional memory reclaim behaviour has other benefits. 
Most notably, before this patch it's basically mandatory to set memory.low to a higher than desirable value because otherwise as soon as you exceed memory.low, all protection is lost, and all pages are eligible to scan again. By contrast, having a gradual ramp in reclaim pressure means that you now still get some protection when thresholds are exceeded, which means that one can now be more comfortable setting memory.low to lower values without worrying that all protection will be lost. This is important because workingset size is really hard to know exactly, especially with variable workloads, so at least getting *some* protection if your workingset size grows larger than you expect increases user confidence in setting memory.low without a huge buffer on top being needed. Thanks a lot to Johannes Weiner and Tejun Heo for their advice and assistance in thinking about how to make this work better. In testing these changes, I intended to verify that: 1. Changes in page scanning become gradual and proportional instead of binary. To test this, I experimented stepping further and further down memory.low protection on a workload that floats around 19G workingset when under memory.low protection, watching page scan rates for the workload cgroup:
+------------+-----------------+--------------------+--------------+
| memory.low | test (pgscan/s) | control (pgscan/s) | % of control |
+------------+-----------------+--------------------+--------------+
|        21G |               0 |                  0 |          N/A |
|        17G |             867 |               3799 |          23% |
|        12G |            1203 |               3543 |          34% |
|         8G |            2534 |               3979 |          64% |
|         4G |            3980 |               4147 |          96% |
|          0 |            3799 |               3980 |          95% |
+------------+-----------------+--------------------+--------------+
As you can see, the test kernel (with a kernel containing this patch) ramps up page scanning significantly more gradually than the control kernel (without this patch). 2. More gradual ramp up in reclaim aggression doesn't result in premature OOMs. 
To test this, I wrote a script that slowly increments the number of pages held by stress(1)'s --vm-keep mode until a production system entered severe overall memory contention. This script runs in a highly protected slice taking up the majority of available system memory. Watching vmstat revealed that page scanning continued essentially nominally between test and control, without causing forward reclaim progress to become arrested. [0]: https://facebookmicrosites.github.io/cgroup2/docs/overview.html#case-study-the-fbtax2-project [akpm@linux-foundation.org: reflow block comments to fit in 80 cols] [chris@chrisdown.name: handle cgroup_disable=memory when getting memcg protection] Link: http://lkml.kernel.org/r/20190201045711.GA18302@chrisdown.name Link: http://lkml.kernel.org/r/20190124014455.GA6396@chrisdown.name Signed-off-by:
Chris Down <chris@chrisdown.name> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Reviewed-by:
Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
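The difference between binary and proportional reclaim can be sketched with a simplified model. This is not the kernel's implementation: the function names, the parameters, and the exact scaling formula are assumptions chosen only to illustrate scan pressure growing with the distance above the protection threshold.

```c
/* Binary model (behaviour described as the "cliff"): nothing is scanned
 * while usage is within protection, everything is eligible once it is
 * exceeded, even by a single page. */
unsigned long scan_target_binary(unsigned long lruvec_size,
                                 unsigned long usage,
                                 unsigned long protection)
{
    return usage <= protection ? 0 : lruvec_size;
}

/* Proportional model (illustrative assumption): the eligible share of the
 * lruvec grows with the fraction of usage that lies above protection. */
unsigned long scan_target_proportional(unsigned long lruvec_size,
                                       unsigned long usage,
                                       unsigned long protection)
{
    unsigned long long protected_part;

    if (usage <= protection)
        return 0;

    /* Widen before multiplying to avoid overflow on 32-bit longs. */
    protected_part = (unsigned long long)lruvec_size * protection / usage;

    return lruvec_size - (unsigned long)protected_part;
}
```

In this model, with memory.low=1000000 and an lruvec of 1000 pages, exceeding the threshold by one page makes 1000 pages eligible under the binary rule but only 1 page under the proportional rule. Doubling usage to 2000000 makes 500 pages eligible.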
-
Boris Ostrovsky authored
Currently execution of panic() continues until Xen's panic notifier (xen_panic_event()) is called, at which point we make a hypercall that never returns. This means that any notifier that is supposed to be called later, as well as a significant part of the panic() code (such as pstore writes from kmsg_dump()), is never executed. There is no reason for xen_panic_event() to be this last point in execution since panic()'s emergency_restart() will call into xen_emergency_restart(), from where we can perform our hypercall. Nevertheless, we will provide the xen_legacy_crash boot option that will preserve the original behavior during a crash. This option could be used, for example, if running a kernel dumper (which happens after panic notifiers) is undesirable. Reported-by:
James Dingwall <james@dingwall.me.uk> Signed-off-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by:
Juergen Gross <jgross@suse.com>
-
Adam Zerella authored
Sphinx is generating a build warning as the title underline of this file is too short. Signed-off-by:
Adam Zerella <adam.zerella@gmail.com> Reviewed-by:
Jean Delvare <jdelvare@suse.de> Signed-off-by:
Guenter Roeck <linux@roeck-us.net>
-
Maxime Ripard authored
It turns out that what was thought to be the module clock was actually the clock meant to be used by the sensor, and isn't playing any role with the CSI controller itself. Let's drop that clock from our binding. Fixes: c5e8f4cc ("media: dt-bindings: media: Add Allwinner A10 CSI binding") Reported-by:
Chen-Yu Tsai <wens@csie.org> Signed-off-by:
Maxime Ripard <mripard@kernel.org>
-
Pragnesh Patel authored
$id doesn't match the actual filename, so update the $id Fixes: c5e8f4cc ("media: dt-bindings: media: Add Allwinner A10 CSI binding") Signed-off-by:
Pragnesh Patel <pragnesh.patel@sifive.com> Signed-off-by:
Maxime Ripard <mripard@kernel.org>
-
Yongqiang Niu authored
This patch adds a mutex description for the mt8183 display. Signed-off-by:
Yongqiang Niu <yongqiang.niu@mediatek.com> Acked-by:
Rob Herring <robh@kernel.org> Signed-off-by:
CK Hu <ck.hu@mediatek.com>
-
Yongqiang Niu authored
Update device tree binding documentation for the display subsystem for Mediatek MT8183 SoCs. Signed-off-by:
Yongqiang Niu <yongqiang.niu@mediatek.com> Reviewed-by: Rob Herring <robh at kernel.org> Signed-off-by:
CK Hu <ck.hu@mediatek.com>
-