- Jun 15, 2021
-
-
Dan Williams authored
LIBNVDIMM device objects register sysfs power attributes despite nothing requiring that support. Clean up sysfs remove the power/ attribute group. This requires a device_create() and a device_register() usage to be converted to the device_initialize() + device_add() pattern. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162379910795.2993820.10130417680551632288.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
CXL is a hotplug bus and arranges for nvdimm devices to be dynamically discovered and removed. The libnvdimm core manages shutdown of nvdimm security operations when the device is unregistered. That functionality is moved to nvdimm_delete() and invoked by the CXL-to-nvdimm glue code. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162379910271.2993820.2955889139842401250.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
Register an 'nvdimm-bridge' device to act as an anchor for a libnvdimm bus hierarchy. Also, flesh out the cxl_bus definition to allow a cxl_nvdimm_bridge_driver to attach to the bridge and trigger the nvdimm-bus registration. The creation of the bridge is gated on the detection of a PMEM capable address space registered to the root. The bridge indirection allows the libnvdimm module to remain unloaded on platforms without PMEM support. Given that the probing of ACPI0017 is asynchronous to CXL endpoint devices, and the expectation that CXL endpoint devices register other PMEM resources on the 'CXL' nvdimm bus, a workqueue is added. The workqueue is needed to run bus_rescan_devices() outside of the device_lock() of the nvdimm-bridge device to rendezvous nvdimm resources as they arrive. For now only the bus is taken online/offline in the workqueue. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162379909706.2993820.14051258608641140169.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
Enable devices on the 'cxl' bus to be attached to drivers. The initial user of this functionality is a driver for an 'nvdimm-bridge' device that anchors a libnvdimm hierarchy attached to CXL persistent memory resources. Other device types that will leverage this include: cxl_port: map and use component register functionality (HDM Decoders) cxl_nvdimm: translate CXL memory expander endpoints to libnvdimm 'nvdimm' objects cxl_region: translate CXL interleave sets to libnvdimm 'region' objects The pairing of devices to drivers is handled through the cxl_device_id() matching to cxl_driver.id values. A cxl_device_id() of '0' indicates no driver support. In addition to ->match(), ->probe(), and ->remove() support for the 'cxl' bus introduce MODULE_ALIAS_CXL() to autoload modules containing cxl-drivers. Drivers are added in follow-on changes. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162379909190.2993820.6134168109678004186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ben Widawsky authored
Some of the commands have already been defined for the support of RAW commands (to be blocked). Unlike their usage in the RAW interface, when used through the supported interface, they will be coordinated and marshalled along with other commands being issued by userspace and the driver itself. That coordination will be added later. The list of commands was determined based on the learnings from libnvdimm and this list is provided directly from Dan. Recommended-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Ben Widawsky <ben.widawsky@intel.com> Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20210413140907.534404-1-ben.widawsky@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- Jun 12, 2021
-
-
Ben Widawsky authored
The CXL.cache and CXL.mem registers begin after the CXL.io registers which occupy the first 0x1000 bytes. The current code wasn't setting this up properly for future users of the component registers. It was correct for the probing code however. Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Ira Weiny <ira.weiny@intel.com> Fixes: 08422378 ("cxl/pci: Add HDM decoder capabilities") Signed-off-by:
Ben Widawsky <ben.widawsky@intel.com> Acked-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20210611051113.224328-1-ben.widawsky@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ben Widawsky authored
The decoder count in the HDM decoder capability structure is an encoded field. As defined in the spec: Decoder Count: Reports the number of memory address decoders implemented by the component. 0 – 1 Decoder 1 – 2 Decoders 2 – 4 Decoders 3 – 6 Decoders 4 – 8 Decoders 5 – 10 Decoders All other values are reserved Nothing is actually fixed by this as nothing actually used this mapping yet. Cc: Ira Weiny <ira.weiny@intel.com> Fixes: 08422378 ("cxl/pci: Add HDM decoder capabilities") Acked-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/20210611190111.121295-1-ben.widawsky@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- Jun 10, 2021
-
-
Dan Williams authored
A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │ ├── decoder1.0 │ │ ├── devtype │ │ ├── locked │ │ ├── size │ │ ├── start │ │ ├── subsystem -> ../../../../../../bus/cxl │ │ ├── target_list │ │ ├── target_type │ │ └── uevent │ ├── devtype │ ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │ ├── subsystem -> ../../../../../bus/cxl │ ├── uevent │ └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
While the resources enumerated by the CEDT.CFMWS identify a cxl_port with host bridges as downstream ports, host bridges themselves are upstream ports that decode to downstream ports represented by PCIe Root Ports. Walk the PCIe Root Ports connected to a CXL Host Bridge, identified by the ACPI0016 _HID, and add each one as a cxl_dport of the host bridge cxl_port. For now, component registers are not enumerated, only the first order uport / dport relationships. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325451145.2293126.10149150938788969381.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
In preparation for infrastructure that enumerates and configures the CXL decode mechanism of an upstream port to its downstream ports, add a representation of a CXL downstream port. On ACPI systems the top-most logical downstream ports in the hierarchy are the host bridges (ACPI0016 devices) that decode the memory windows described by the CXL Early Discovery Table Fixed Memory Window Structures (CEDT.CFMWS). Reviewed-by:
Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/162325450624.2293126.3533006409920271718.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
CONFIG_CXL_BUS is default 'n' as expected for new functionality. When that is enabled do not make the end user hunt for all the expected sub-options to enable. For example CONFIG_CXL_BUS without CONFIG_CXL_MEM is an odd/expert configuration, so is CONFIG_CXL_MEM without CONFIG_CXL_ACPI (on ACPI capable platforms). Default CONFIG_CXL_MEM and CONFIG_CXL_ACPI to CONFIG_CXL_BUS. Acked-by:
Ben Widawsky <ben.widawsky@intel.com> Acked-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325450105.2293126.17046356425194082921.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Dan Williams authored
While CXL builds upon the PCI software model for enumeration and endpoint control, a static platform component is required to bootstrap the CXL memory layout. Similar to how ACPI identifies root-level PCI memory resources, ACPI data enumerates the address space and interleave configuration for CXL Memory. In addition to identifying host bridges, ACPI is responsible for enumerating the CXL memory space that can be addressed by downstream decoders. This is similar to the requirement for ACPI to publish resources via the _CRS method for PCI host bridges. Specifically, ACPI publishes a table, CXL Early Discovery Table (CEDT), which includes a list of CXL Memory resources, CXL Fixed Memory Window Structures (CFMWS). For now, introduce the core infrastructure for a cxl_port hierarchy starting with a root level anchor represented by the ACPI0017 device. Follow on changes model support for the configurable decode capabilities of cxl_port instances, i.e. CXL switch support. Co-developed-by:
Alison Schofield <alison.schofield@intel.com> Signed-off-by:
Alison Schofield <alison.schofield@intel.com> Acked-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325449515.2293126.15303270193010154608.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- Jun 07, 2021
-
-
Wei Ming Chen authored
ACPICA commit 2296edd39b4ce2d2dd691c1f309c4da00843ecc9 Replace /* FALLTHROUGH */ comment with ACPI_FALLTHROUGH Link: https://github.com/acpica/acpica/commit/2296edd3 Signed-off-by:
Wei Ming Chen <jj251510319013@gmail.com> Signed-off-by:
Bob Moore <robert.moore@intel.com> Signed-off-by:
Erik Kaneda <erik.kaneda@intel.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Erik Kaneda authored
ACPICA commit 180cb53963aa876c782a6f52cc155d951b26051a According to the ACPI spec, _CID returns a package containing hardware ID's. Each element of an ASL package contains a reference count from the parent package as well as the element itself. Name (TEST, Package() { "String object" // this package element has a reference count of 2 }) A memory leak was caused in the _CID repair function because it did not decrement the reference count created by the package. Fix the memory leak by calling acpi_ut_remove_reference on _CID package elements that represent a hardware ID (_HID). Link: https://github.com/acpica/acpica/commit/180cb539 Tested-by:
Shawn Guo <shawn.guo@linaro.org> Signed-off-by:
Erik Kaneda <erik.kaneda@intel.com> Signed-off-by:
Bob Moore <robert.moore@intel.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- Jun 06, 2021
-
-
Dan Williams authored
The expectation is that devm functions take 'struct device *' and pci functions take 'struct pci_dev *'. Swap out the @pdev argument for @dev and fixup related helpers. Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-by:
Ira Weiny <ira.weiny@intel.com> Acked-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162216592374.3833641.13281743585064451514.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ben Widawsky authored
An HDM decoder is defined in the CXL 2.0 specification as a mechanism that allow devices and upstream ports to claim memory address ranges and participate in interleave sets. HDM decoder registers are within the component register block defined in CXL 2.0 8.2.3 CXL 2.0 Component Registers as part of the CXL.cache and CXL.mem subregion. The Component Register Block is found via the Register Locator DVSEC in a similar fashion to how the CXL Device Register Block is found. The primary difference is the capability id size of the Component Register Block is a single DWORD instead of 4 DWORDS. It's now possible to configure a CXL type 3 device's HDM decoder. Such programming is expected for CXL devices with persistent memory, and hot plugged CXL devices that participate in CXL.mem with volatile memory. Add probe and mapping functions for the component register blocks. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by:
Ira Weiny <ira.weiny@intel.com> Signed-off-by:
Ira Weiny <ira.weiny@intel.com> Co-developed-by:
Vishal Verma <vishal.l.verma@intel.com> Signed-off-by:
Vishal Verma <vishal.l.verma@intel.com> Signed-off-by:
Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/20210528004922.3980613-6-ira.weiny@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ira Weiny authored
Some hardware implementations mix component and device registers into the same BAR and the driver stack is going to need independent mapping implementations for those 2 cases. Furthermore, it will be nice to have finer grained mappings should user space want to map some register blocks. Now that individual register blocks are mapped; those blocks regions should be reserved individually to fully separate the register blocks. Release the 'global' memory reservation and create individual register block region reservations through devm. NOTE: pci_release_mem_regions() is still compatible with pcim_enable_device() because it removes the automatic region release when called. So preserve the pcim_enable_device() so that the pcim interface can be called if needed. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210604005316.4187340-1-ira.weiny@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ira Weiny authored
The information required to map registers based on capabilities is contained within the bars themselves. This means the bar must be mapped to read the information needed and then unmapped to map the individual parts of the BAR based on capabilities. Change cxl_setup_device_regs() to return a new cxl_register_map, change the name to cxl_probe_device_regs(). Allocate and place cxl_register_maps on a list while processing all of the specified register blocks. After probing all the register blocks go back and map smaller registers blocks based on their capabilities and dispose of the cxl_register_maps. NOTE: pci_iomap() is not managed automatically via pcim_enable_device() so be careful to call pci_iounmap() correctly. Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210604005036.4187184-1-ira.weiny@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ira Weiny authored
In order to remap individual register sets each bar region must be reserved prior to mapping. Because the details of individual register sets are contained within the BARs themselves, the bar must be mapped 2 times, once to extract this information and a second time for each register set. Rather than attempt to reserve each BAR individually and track if that bar has been reserved. Open code pcim_iomap_regions() by first reserving all memory regions on the device and then mapping the bars individually as needed. NOTE pci_request_mem_regions() does not need a corresponding pci_release_mem_regions() because the pci device is managed via pcim_enable_device(). Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by:
Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210528004922.3980613-3-ira.weiny@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
Ira Weiny authored
Each register block located in the DVSEC needs to be decoded from 2 words, 'register offset high' and 'register offset low'. Create a function, cxl_decode_register_block() to perform this decode and return the bar, offset, and register type of the register block. Then use the values decoded in cxl_mem_map_regblock() instead of passing the raw registers. Signed-off-by:
Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210528004922.3980613-2-ira.weiny@intel.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- Jun 05, 2021
-
-
David Hildenbrand authored
offline_pages() properly checks for memory holes and bails out. However, we do a page_zone(pfn_to_page(start_pfn)) before calling offline_pages() when offlining a memory block. We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on aarch64 in offlining code, otherwise we can trigger a BUG when hitting a memory hole: kernel BUG at include/linux/mm.h:1383! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4 Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020 pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) pc : memory_subsys_offline+0x1f8/0x250 lr : memory_subsys_offline+0x1f8/0x250 Call trace: memory_subsys_offline+0x1f8/0x250 device_offline+0x154/0x1d8 online_store+0xa4/0x118 dev_attr_store+0x44/0x78 sysfs_kf_write+0xe8/0x138 kernfs_fop_write_iter+0x26c/0x3d0 new_sync_write+0x2bc/0x4f8 vfs_write+0x718/0xc88 ksys_write+0xf8/0x1e0 __arm64_sys_write+0x74/0xa8 invoke_syscall.constprop.0+0x78/0x1e8 do_el0_svc+0xe4/0x298 el0_svc+0x20/0x30 el0_sync_handler+0xb0/0xb8 el0_sync+0x178/0x180 Kernel panic - not syncing: Oops - BUG: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x00000251,20000846 Memory Limit: none If nr_vmemmap_pages is set, we know that we are dealing with hotplugged memory that doesn't have any holes. So call page_zone(pfn_to_page(start_pfn)) only when really necessary -- when nr_vmemmap_pages is set and we actually adjust the present pages. Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com Fixes: a08a2ae3 ("mm,memory_hotplug: allocate memmap from the added memory range") Signed-off-by:
David Hildenbrand <david@redhat.com> Reported-by:
Qian Cai (QUIC) <quic_qiancai@quicinc.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jun 04, 2021
-
-
Rahul Lakkireddy authored
When configuring TC-MQPRIO offload, only turn off netdev carrier and don't bring physical link down in hardware. Otherwise, when the physical link is brought up again after configuration, it gets re-trained and stalls ongoing traffic. Also, when firmware is no longer accessible or crashed, avoid sending FLOWC and waiting for reply that will never come. Fix following hung_task_timeout_secs trace seen in these cases. INFO: task tc:20807 blocked for more than 122 seconds. Tainted: G S 5.13.0-rc3+ #122 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:tc state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000 Call Trace: __schedule+0x27b/0x6a0 schedule+0x37/0xa0 schedule_preempt_disabled+0x5/0x10 __mutex_lock.isra.14+0x2a0/0x4a0 ? netlink_lookup+0x120/0x1a0 ? rtnl_fill_ifinfo+0x10f0/0x10f0 __netlink_dump_start+0x70/0x250 rtnetlink_rcv_msg+0x28b/0x380 ? rtnl_fill_ifinfo+0x10f0/0x10f0 ? rtnl_calcit.isra.42+0x120/0x120 netlink_rcv_skb+0x4b/0xf0 netlink_unicast+0x1a0/0x280 netlink_sendmsg+0x216/0x440 sock_sendmsg+0x56/0x60 __sys_sendto+0xe9/0x150 ? handle_mm_fault+0x6d/0x1b0 ? do_user_addr_fault+0x1c5/0x620 __x64_sys_sendto+0x1f/0x30 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7f73218321 RSP: 002b:00007ffd19626208 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055b7c0a8b240 RCX: 00007f7f73218321 RDX: 0000000000000028 RSI: 00007ffd19626210 RDI: 0000000000000003 RBP: 000055b7c08680ff R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000055b7c085f5f6 R13: 000055b7c085f60a R14: 00007ffd19636470 R15: 00007ffd196262a0 Fixes: b1396c2b ("cxgb4: parse and configure TC-MQPRIO offload") Signed-off-by:
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
When removing single nodes, it's possible that that node's parent is an empty intermediate node, in which case, it too should be removed. Otherwise the trie fills up and never is fully emptied, leading to gradual memory leaks over time for tries that are modified often. There was originally code to do this, but was removed during refactoring in 2016 and never reworked. Now that we have proper parent pointers from the previous commits, we can implement this properly. In order to reduce branching and expensive comparisons, we want to keep the double pointer for parent assignment (which lets us easily chain up to the root), but we still need to actually get the parent's base address. So encode the bit number into the last two bits of the pointer, and pack and unpack it as needed. This is a little bit clumsy but is the fastest and less memory wasteful of the compromises. Note that we align the root struct here to a minimum of 4, because it's embedded into a larger struct, and we're relying on having the bottom two bits for our flag, which would only be 16-bit aligned on m68k. The existing macro-based helpers were a bit unwieldy for adding the bit packing to, so this commit replaces them with safer and clearer ordinary functions. We add a test to the randomized/fuzzer part of the selftests, to free the randomized tries by-peer, refuzz it, and repeat, until it's supposed to be empty, and then then see if that actually resulted in the whole thing being emptied. That combined with kmemcheck should hopefully make sure this commit is doing what it should. Along the way this resulted in various other cleanups of the tests and fixes for recent graphviz. Fixes: e7096c13 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
The previous commit moved from O(n) to O(1) for removal, but in the process introduced an additional pointer member to a struct that increased the size from 60 to 68 bytes, putting nodes in the 128-byte slab. With deployed systems having as many as 2 million nodes, this represents a significant doubling in memory usage (128 MiB -> 256 MiB). Fix this by using our own kmem_cache, that's sized exactly right. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo. Fixes: e7096c13 ("net: WireGuard secure network tunnel") Suggested-by:
Arnd Bergmann <arnd@arndb.de> Suggested-by:
Matthew Wilcox <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
Previously, deleting peers would require traversing the entire trie in order to rebalance nodes and safely free them. This meant that removing 1000 peers from a trie with a half million nodes would take an extremely long time, during which we're holding the rtnl lock. Large-scale users were reporting 200ms latencies added to the networking stack as a whole every time their userspace software would queue up significant removals. That's a serious situation. This commit fixes that by maintaining a double pointer to the parent's bit pointer for each node, and then using the already existing node list belonging to each peer to go directly to the node, fix up its pointers, and free it with RCU. This means removal is O(1) instead of O(n), and we don't use gobs of stack. The removal algorithm has the same downside as the code that it fixes: it won't collapse needlessly long runs of fillers. We can enhance that in the future if it ever becomes a problem. This commit documents that limitation with a TODO comment in code, a small but meaningful improvement over the prior situation. Currently the biggest flaw, which the next commit addresses, is that because this increases the node size on 64-bit machines from 60 bytes to 68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up using twice as much memory per node, because of power-of-two allocations, which is a big bummer. We'll need to figure something out there. Fixes: e7096c13 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
The randomized trie tests weren't initializing the dummy peer list head, resulting in a NULL pointer dereference when used. Fix this by initializing it in the randomized trie test, just like we do for the static unit test. While we're at it, all of the other strings like this have the word "self-test", so add it to the missing place here. Fixes: e7096c13 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per-peer, or with 600k peers, that's a savings of 303 MiB. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo. Fixes: 8b5553ac ("wireguard: queueing: get rid of per-peer ring buffers") Suggested-by:
Arnd Bergmann <arnd@arndb.de> Suggested-by:
Matthew Wilcox <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
Many of the synchronization points are sometimes called under the rtnl lock, which means we should use synchronize_net rather than synchronize_rcu. Under the hood, this expands to using the expedited flavor of function in the event that rtnl is held, in order to not stall other concurrent changes. This fixes some very, very long delays when removing multiple peers at once, which would cause some operations to take several minutes. Fixes: e7096c13 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Jason A. Donenfeld authored
Apparently, various versions of gcc have O3-related miscompiles. Looking at the difference between -O2 and -O3 for gcc 11 doesn't indicate miscompiles, but the difference also doesn't seem so significant for performance that it's worth risking. Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/ Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Fixes: e7096c13 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Roja Rani Yarubandi authored
Mark bus as suspended during system suspend to block the future transfers. Implement geni_i2c_resume_noirq() to resume the bus. Fixes: 37692de5 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by:
Roja Rani Yarubandi <rojay@codeaurora.org> Reviewed-by:
Stephen Boyd <swboyd@chromium.org> Signed-off-by:
Wolfram Sang <wsa@kernel.org>
-
Roja Rani Yarubandi authored
If the hardware is still accessing memory after SMMU translation is disabled (as part of smmu shutdown callback), then the IOVAs (I/O virtual address) which it was using will go on the bus as the physical addresses which will result in unknown crashes like NoC/interconnect errors. So, implement shutdown callback for i2c driver to suspend the bus during system "reboot" or "shutdown". Fixes: 37692de5 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by:
Roja Rani Yarubandi <rojay@codeaurora.org> Reviewed-by:
Stephen Boyd <swboyd@chromium.org> Signed-off-by:
Wolfram Sang <wsa@kernel.org>
-
Dave Ertman authored
Currently in the ice driver, the check whether to allow a LLDP packet to egress the interface from the PF_VSI is being based on the SKB's priority field. It checks to see if the packets priority is equal to TC_PRIO_CONTROL. Injected LLDP packets do not always meet this condition. SCAPY defaults to a sk_buff->protocol value of ETH_P_ALL (0x0003) and does not set the priority field. There will be other injection methods (even ones used by end users) that will not correctly configure the socket so that SKB fields are correctly populated. Then ethernet header has to have to correct value for the protocol though. Add a check to also allow packets whose ethhdr->h_proto matches ETH_P_LLDP (0x88CC). Fixes: 0c3a6101 ("ice: Allow egress control packets from PF_VSI") Signed-off-by:
Dave Ertman <david.m.ertman@intel.com> Tested-by:
Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com>
-
Paul Greenwalt authored
Ethtool incorrectly reported supported and advertised auto-negotiation settings for a backplane PHY image which did not support auto-negotiation. This can occur when using media or PHY type for reporting ethtool supported and advertised auto-negotiation settings. Remove setting supported and advertised auto-negotiation settings based on PHY type in ice_phy_type_to_ethtool(), and MAC type in ice_get_link_ksettings(). Ethtool supported and advertised auto-negotiation settings should be based on the PHY image using the AQ command get PHY capabilities with media. Add setting supported and advertised auto-negotiation settings based get PHY capabilities with media in ice_get_link_ksettings(). Fixes: 48cb27f2 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by:
Paul Greenwalt <paul.greenwalt@intel.com> Tested-by:
Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com>
-
Haiyue Wang authored
VSI rebuild can be failed for LAN queue config, then the VF's VSI will be NULL, the VF reset should be stopped with the VF entering into the disable state. Fixes: 12bb018c ("ice: Refactor VF reset") Signed-off-by:
Haiyue Wang <haiyue.wang@intel.com> Tested-by:
Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com>
-
Brett Creeley authored
Some AVF drivers expect the VF_MBX_ATQLEN register to be cleared for any type of VFR/VFLR. Fix this by clearing the VF_MBX_ATQLEN register at the same time as VF_MBX_ARQLEN. Fixes: 82ba0128 ("ice: clear VF ARQLEN register on reset") Signed-off-by:
Brett Creeley <brett.creeley@intel.com> Tested-by:
Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com>
-
Brett Creeley authored
Commit 12bb018c ("ice: Refactor VF reset") caused a regression that removes the ability for a VF to request a different amount of queues via VIRTCHNL_OP_REQUEST_QUEUES. This prevents VF drivers to either increase or decrease the number of queue pairs they are allocated. Fix this by using the variable vf->num_req_qs when determining the vf->num_vf_qs during VF VSI creation. Fixes: 12bb018c ("ice: Refactor VF reset") Signed-off-by:
Brett Creeley <brett.creeley@intel.com> Tested-by:
Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by:
Tony Nguyen <anthony.l.nguyen@intel.com>
-
- Jun 03, 2021
-
-
Xuan Zhuo authored
In virtio-net's large packet mode, there is a hole in the space behind buf. hdr_padded_len - hdr_len We must take this into account when calculating tailroom. [ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1)) [ 44.544864] page_to_skb (drivers/net/virtio_net.c:485) [ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131) [ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714) [ 44.546628] ? dev_gro_receive (net/core/dev.c:6103) [ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565) [ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525) [ 44.548251] __napi_poll (net/core/dev.c:6985) [ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139) [ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560) [ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649) [ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13)) [ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) [ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by:
Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by:
Corentin Noël <corentin.noel@collabora.com> Tested-by:
Corentin Noël <corentin.noel@collabora.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Rahul Lakkireddy authored
commit db43b30c ("cxgb4: add ethtool n-tuple filter deletion") has moved searching for next highest priority HASH filter rule to cxgb4_flow_rule_destroy(), which searches the rhashtable before the the rule is removed from it and hence always finds at least 1 entry. Fix by removing the rule from rhashtable first before calling cxgb4_flow_rule_destroy() and hence avoid fetching stale info. Fixes: db43b30c ("cxgb4: add ethtool n-tuple filter deletion") Signed-off-by:
Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Luiz Augusto von Dentz authored
Some firmware when operation don't may have broken versions leading to error like the following: [ 6.176482] Bluetooth: hci0: Firmware revision 0.0 build 121 week 7 2021 [ 6.177906] bluetooth hci0: Direct firmware load for intel/ibt-20-0-0.sfi failed with error -2 [ 6.177910] Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-20-0-0.sfi (-2) Since we load the firmware file just to check if its version had changed comparing to the one already loaded we can just skip since the firmware is already operation. Fixes: ac056546 ("Bluetooth: btintel: Check firmware version before download") Signed-off-by:
Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org>
-
Lee Jones authored
Fixes the following W=1 kernel build warning(s): drivers/i2c/busses/i2c-tegra-bpmp.c:86: warning: Function parameter or member 'i2c' not described in 'tegra_bpmp_serialize_i2c_msg' drivers/i2c/busses/i2c-tegra-bpmp.c:86: warning: Function parameter or member 'request' not described in 'tegra_bpmp_serialize_i2c_msg' drivers/i2c/busses/i2c-tegra-bpmp.c:86: warning: Function parameter or member 'msgs' not described in 'tegra_bpmp_serialize_i2c_msg' drivers/i2c/busses/i2c-tegra-bpmp.c:86: warning: Function parameter or member 'num' not described in 'tegra_bpmp_serialize_i2c_msg' drivers/i2c/busses/i2c-tegra-bpmp.c:86: warning: expecting prototype for The serialized I2C format is simply the following(). Prototype was for tegra_bpmp_serialize_i2c_msg() instead drivers/i2c/busses/i2c-tegra-bpmp.c:130: warning: Function parameter or member 'i2c' not described in 'tegra_bpmp_i2c_deserialize' drivers/i2c/busses/i2c-tegra-bpmp.c:130: warning: Function parameter or member 'response' not described in 'tegra_bpmp_i2c_deserialize' drivers/i2c/busses/i2c-tegra-bpmp.c:130: warning: Function parameter or member 'msgs' not described in 'tegra_bpmp_i2c_deserialize' drivers/i2c/busses/i2c-tegra-bpmp.c:130: warning: Function parameter or member 'num' not described in 'tegra_bpmp_i2c_deserialize' drivers/i2c/busses/i2c-tegra-bpmp.c:130: warning: expecting prototype for The data in the BPMP(). Prototype was for tegra_bpmp_i2c_deserialize() instead Signed-off-by:
Lee Jones <lee.jones@linaro.org> Acked-by:
Thierry Reding <treding@nvidia.com> Signed-off-by:
Wolfram Sang <wsa@kernel.org>
-