- Feb 15, 2018
-
-
Matan Barak authored
32 bit processes running on a 64 bit kernel call compat_ioctl so that implementations can revise any structure layout issues. Point compat_ioctl at our normal ioctl because: - All our structures are designed to be the same on 32 and 64 bit, ie we use __aligned_u64 when required and are careful to manage padding. - Any pointers are stored in u64's and userspace is expected to prepare them properly. Signed-off-by:
Matan Barak <matanb@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Matan Barak authored
Fix a bug in uverbs_ioctl_merge that looked at the object's iterator number instead of the method's iterator number when merging methods. While we're at it, make the uverbs_ioctl_merge code a bit more clear and faster. Fixes: 118620d3 ('IB/core: Add uverbs merge trees functionality') Signed-off-by:
Matan Barak <matanb@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Jason Gunthorpe authored
The union approach will get the endianness wrong sometimes if the kernel's pointer size is 32 bits resulting in EFAULTs when trying to copy to/from user. Signed-off-by:
Leon Romanovsky <leon@kernel.org> Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Jason Gunthorpe authored
The rule for the API is pointers less than 8 bytes are inlined into the .data field of the attribute. Fix the creation of the driver udata struct to follow this rule and point to the .data itself when the size is less than 8 bytes. Otherwise if the UHW struct is less than 8 bytes the driver will get EFAULT during copy_from_user. Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Matan Barak authored
This fixes several bugs around the copy_to/from user path: - copy_to used the user provided size of the attribute and could copy data beyond the end of the kernel buffer into userspace. - copy_from didn't know the size of the kernel buffer and could have left kernel memory unexpectedly un-initialized. - copy_from did not use the user length to determine if the attribute data is inlined or not. Signed-off-by:
Matan Barak <matanb@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Leon Romanovsky authored
Resource tracking of XRCD objects is not implemented in current version of restrack and hence can be removed. Fixes: 02d8883f ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Alaa Hleihel authored
netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event multiple times until all refs are gone, which will result in calling ipoib_delete_debug_files multiple times and printing a warning. Remove the WARN_ONCE since checks of NULL pointers before calling debugfs_remove are not needed. Fixes: 771a5258 ("IB/IPoIB: ibX: failed to create mcg debug file") Signed-off-by:
Alaa Hleihel <alaa@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
- Feb 11, 2018
-
-
Linus Torvalds authored
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 05, 2018
-
-
oulijun authored
The hip06 and hip08 run on a little endian ARM, it needs to revise the annotations to indicate that the HW uses little endian data in the various DMA buffers, and flow the necessary swaps throughout. The imm_data use big endian mode. The cpu_to_le32/le32_to_cpu swaps are no-op for this, which makes the only substantive change the handling of imm_data which is now mandatory swapped. This also keep match with the userspace hns driver and resolve the warning by sparse. Signed-off-by:
Lijun Ou <oulijun@huawei.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- Feb 04, 2018
-
-
Jason Gunthorpe authored
We really don't want people turning this on just yet, make it very clear with capital letters. Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Jason Gunthorpe authored
These days the userspace comes from rdma-core, revise references in the kernel to point to the current repository. Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- Feb 01, 2018
-
-
Don Hiatt authored
Add trace_hfi1_rcvhdr support for bypass packets. While here, remove the etype argument as it is available in struct hfi1_packet. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Don Hiatt <don.hiatt@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kamenee Arumugam authored
Kzalloc_node API doesn't check for overflows in size multiplication. While kcalloc API check for overflows in size multiplication but these implementations are not NUMA-aware. This conversion allowed for correcting an allocation used in the hot path to be on the local NUMA and ensure us overflow free multiplication for the size of a memory allocation. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Michael J. Ruhl authored
The ev_file is an optional parameter for CQ creation. If the parameter is not passed, the ev_file pointer will be NULL. Using that pointer to set the cq_context will result in an OOPs. Verify that ev_file is not NULL before using. Cc: <stable@vger.kernel.org> # 4.14.x Fixes: 9ee79fce ("IB/core: Add completion queue (cq) object actions") Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by:
Ira Weiny <ira.weiny@intel.com> Signed-off-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Alex Estrin authored
On reboot SM can program port pkey table before ipoib registered its event handler, which could result in missing pkey event and leave root interface with initial pkey value from index 0. Since OPA port starts with invalid pkey in index 0, root interface will fail to initialize and stay down with no-carrier flag. For IB ipoib interface may end up with pkey different from value opensm put in pkey table idx 0, resulting in connectivity issues (different mcast groups, for example). Close the window by calling event handler after registration to make sure ipoib pkey is in sync with port pkey table. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by:
Ira Weiny <ira.weiny@intel.com> Signed-off-by:
Alex Estrin <alex.estrin@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mitko Haralanov authored
The routine which shows the fault stats checks the counters to determine whether to show any stats based on the number of transmitted pkts/bytes for a particular opcode. Unfortunately, it only checked the receive counters. As a result, if any packet faults have happened for packets egressing the HFI, those stats would not be shown. In order to fix this, the routine is amended to also check the TX counters. With this change the pkt/byte counts are the sum of both TX and RX counts for the opcode. Fixes: 1b311f89 ("IB/hfi1: Add tx_opcode_stats like the opcode_stats") Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Reviewed-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mike Marciniszyn authored
These values were introduced as part of the 16B code to account for the varying size of the LRH between the differing packet formats. Replace the blind constants with defines based on FIELD_SIZEOF() calls. Fixes: 5b6cabb0 ("IB/hfi1: Add 16B RC/UC support") Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Signed-off-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kamenee Arumugam authored
HFI's counters SendWaitCnt and SendWaitVlCnt are in units of TXE cycle time (at 805MHz). OPA counters PortXmitWait and PortVLXmtWait are in units of flit times. Convert the counter values to flit units using following conversion formula: PortXmitWait = SendWaitCnt * 2 * (4 /link_width) * (25 Gbps /link_speed) PortVLXmitWait = SendWaitVLCnt * 2 * (4 /link_width) * (25 Gbps /link_speed) At link up or downgrade events, the link width can change. To ensure accurate counter calculations, sample the counters after the events, during counter requests, and then aggregate the OPA counters. Reviewed-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Bartlomiej Dudek authored
During PCIe Gen 3 transistion, pcie_pset is read and might be overridden to a default value(i.e. 255) in do_pcie_gen3_transition() routine. If the pcie_pset value is overridden then this new value will be used during initialization of next adapter on a different card. Introducing a new local variable to avoid modification of pcie_pset Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Bartlomiej Dudek <bartlomiej.dudek@intel.com> Signed-off-by:
Patel Jay P <jay.p.patel@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Sebastian Sanchez authored
The arguments for trace_hfi1_rcvhdr() get computed every time in the hot path regardless of the whether the trace is on or off. This is seen to be costly with a profile. The handling of fault inject isolates the verbs device for all packets regardless of the presence of a RHF_DC_ERR error. Fix the first by computing trace_hfi1_rcvhdr() arguments within the trace itself, so that when the trace is off, the argument data isn't computed. Fix the second by moving the error check to handle_eflags() when an RHF error occurs and by testing for RHF_DC_ERR before executing the reset of handle_eflags(). Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Sebastian Sanchez authored
packet->fecn and packet->becn are calculated in the hot path and are never used. Remove these fields as they show to be costly in a profile. Also, remove initialization for becn and fecn in process_ecn() as they're unconditionally assigned in the function and ensure fecn and becn variables use a boolean type. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Sebastian Sanchez authored
In the receive path, hfi1_ibport is looked up by indexing into an array. A profile shows this to be expensive. The receive context data has a pointer to the ibport data, use that pointer instead. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Sebastian Sanchez authored
The packet type comparison used to find out if a packet is a bypass packet in the hot path is an expensive operation as seen in a profile. Determine packet's pkey and migration bit through the bypass and 9B code paths instead. Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Sebastian Sanchez authored
In hfi1_rc_rcv(), BTH is computed for all packets received. However, it's only used for packets received with opcodes RDMA_WRITE_LAST and SEND_LAST, and it is a costly operation. Compute BTH only in the RDMA_WRITE_LAST/SEND_LAST code path and let the compiler handle endianness conversion for bitwise operations. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mitko Haralanov authored
The s_hdrwords variable was used to indicate whether a packet was already built on a previous iteration of the send engine. This variable assumed the protection of the QP's RVT_S_BUSY flag, which was required since the the QP's s_lock was dropped just prior to the packet being queued on the one of the egress mechanisms. Support for multiple send engine instantiations require that the field not be used due to concurency issues. The ps.txreq signals the "already built" without the potential concurency issues. Fix by getting rid of all s_hdrword usage. A wrapper is added to test for the already built case that used to use s_hdrwords. What used to be stored in s_hdrwords is now in the txreq. The PBC is not counted, but is added in the pio/sdma code paths prior to posting the packet. Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Signed-off-by:
Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Alex Estrin authored
The dd refcount is speculatively incremented prior to allocating the fd memory with kzalloc(). If that kzalloc() failed the dd refcount leaks. Increment refcount on kzalloc success. Fixes: e11ffbd5 ("IB/hfi1: Do not free hfi1 cdev parent structure early") Reviewed-by:
Michael J Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Alex Estrin <alex.estrin@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Alex Estrin authored
With IRQF_SHARED flag set and CONFIG_DEBUG_SHIRQ enabled module removal may result in panic in sdma_interrupt() routine if associated sdma context was released before pci_free_irq(); [ 9198.939885] BUG: unable to handle kernel NULL pointer dereference at (null) [ 9198.940514] IP: sdma_make_progress+0xa5/0x450 [hfi1] [ 9198.941114] PGD 170bdc0067 P4D 170bdc0067 PUD 172063e067 PMD 0 [ 9198.941783] Oops: 0000 [#1] SMP ..... [ 9198.958877] CPU: 132 PID: 64173 Comm: rmmod Tainted: G OE 4.14.0-rc4+ #1 [ 9198.961032] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.02.0118.080620171935 08/06/2017 [ 9198.963323] task: ffff9681397f0000 task.stack: ffffae1647c40000 [ 9198.965695] RIP: 0010:sdma_make_progress+0xa5/0x450 [hfi1] [ 9198.968082] RSP: 0018:ffffae1647c43be8 EFLAGS: 00010046 [ 9198.970503] RAX: 0000000000000000 RBX: ffff9680ce8b5ca8 RCX: 0000000000000000 [ 9198.973006] RDX: 0000000000000000 RSI: 0000000001a00d28 RDI: ffff9680ce8b5ca0 [ 9198.975546] RBP: ffffae1647c43c40 R08: ffff96814325ec00 R09: 00000000ffffffff [ 9198.978142] R10: 000000004325e501 R11: ffff96814325ec00 R12: ffff9680ce8b5c44 [ 9198.980779] R13: ffff9680ce8b5ca0 R14: 0000000000000000 R15: ffff9680ce8b5b00 [ 9198.983462] FS: 00007f31196ba740(0000) GS:ffff96819df00000(0000) knlGS:0000000000000000 [ 9198.986231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9198.989036] CR2: 0000000000000000 CR3: 000000170833f000 CR4: 00000000001406e0 [ 9198.991911] Call Trace: [ 9198.994847] sdma_engine_interrupt+0x82/0x100 [hfi1] [ 9198.997852] sdma_interrupt+0x61/0xc0 [hfi1] [ 9199.000852] __free_irq+0x1b3/0x2d0 [ 9199.003873] free_irq+0x35/0x70 [ 9199.006909] pci_free_irq+0x1c/0x30 [ 9199.009999] clean_up_interrupts+0x53/0xf0 [hfi1] [ 9199.013137] hfi1_start_cleanup+0x117/0x190 [hfi1] [ 9199.016315] postinit_cleanup+0x1d/0x270 [hfi1] [ 9199.019529] remove_one+0x1f3/0x210 [hfi1] [ 9199.022738] pci_device_remove+0x39/0xc0 [ 9199.025974] device_release_driver_internal+0x141/0x210 [ 9199.029268] driver_detach+0x3f/0x80 [ 9199.032580] bus_remove_driver+0x55/0xd0 [ 9199.035931] driver_unregister+0x2c/0x50 [ 9199.039321] pci_unregister_driver+0x2a/0xa0 [ 9199.042755] hfi1_mod_cleanup+0x10/0xb50 [hfi1] [ 9199.046196] SyS_delete_module+0x171/0x250 ... Fix by exporting sdma_clean() and removing from sdma_exit(). sdma_exit() now just manipulates the engine state, leaving the memory free to sdma_clean() which is now called just before the dd is freed. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by:
Michael J Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Alex Estrin <alex.estrin@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Michael J. Ruhl authored
The pci_request_irq() interfaces always adds the IRQF_SHARED bit to all IRQ requests. When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the IRQF_SHARED bit is set, a call to the IRQ handler is made from the __free_irq() function. This is testing a race condition between the IRQ cleanup and an IRQ racing the cleanup. The HFI driver should be able to handle this race, but does not. This race can cause traces that start with this footprint: BUG: unable to handle kernel NULL pointer dereference at (null) Call Trace: <hfi1 irq handler> ... __free_irq+0x1b3/0x2d0 free_irq+0x35/0x70 pci_free_irq+0x1c/0x30 clean_up_interrupts+0x53/0xf0 [hfi1] hfi1_start_cleanup+0x122/0x190 [hfi1] postinit_cleanup+0x1d/0x280 [hfi1] remove_one+0x233/0x250 [hfi1] pci_device_remove+0x39/0xc0 Export IRQ cleanup function so it can be called from other modules. Using the exported cleanup function: Re-order the driver cleanup code to clean up IRQ resources before other resources, eliminating the race. Re-order error path for init so that the race does not occur. Reduce severity on spurious error message for SDMA IRQs to info. Reviewed-by:
Alex Estrin <alex.estrin@intel.com> Reviewed-by:
Patel Jay P <jay.p.patel@intel.com> Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Dan Carpenter authored
We should return -ENOMEM if the allocation fails. The current code accidentally returns success. Fixes: bf3c5a93 ("RDMA/nldev: Provide global resource utilization") Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
oulijun authored
The mtt_table is cleaned up during the err_unmap_cqe label, it is a mistake to duplicate the cleanup during the later unwind labels. Signed-off-by:
Lijun Ou <oulijun@huawei.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
oulijun authored
This patch mainly fix some style warings matched with the new checkpatch requirement. The warning as follows: WARNING: function definition argument 'struct hns_roce_cq *' should also have an identifier name Signed-off-by:
Lijun Ou <oulijun@huawei.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
oulijun authored
The double not-operator is unncessary when used in a boolean context. This patch removes them. Signed-off-by:
Lijun Ou <oulijun@huawei.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Markus Elfring authored
Add a jump target so that a bit of exception handling can be better reused at the end of this function. Signed-off-by:
Markus Elfring <elfring@users.sourceforge.net> Acked-by:
Devesh Sharma <devesh.sharma@broadcom.com> Acked-by:
Jonathan Toppins <jtoppins@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Markus Elfring authored
RDMA/bnxt_re: Delete two error messages for a failed memory allocation in bnxt_qplib_alloc_dpi_tbl() Omit extra messages for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by:
Markus Elfring <elfring@users.sourceforge.net> Acked-by:
Devesh Sharma <devesh.sharma@broadcom.com> Acked-by:
Jonathan Toppins <jtoppins@redhat.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
David Rientjes authored
Commit 4d4bbd85 ("mm, oom_reaper: skip mm structs with mmu notifiers") prevented the oom reaper from unmapping private anonymous memory with the oom reaper when the oom victim mm had mmu notifiers registered. The rationale is that doing mmu_notifier_invalidate_range_{start,end}() around the unmap_page_range(), which is needed, can block and the oom killer will stall forever waiting for the victim to exit, which may not be possible without reaping. That concern is real, but only true for mmu notifiers that have blockable invalidate_range_{start,end}() callbacks. This patch adds a "flags" field to mmu notifier ops that can set a bit to indicate that these callbacks do not block. The implementation is steered toward an expensive slowpath, such as after the oom reaper has grabbed mm->mmap_sem of a still alive oom victim. [rientjes@google.com: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801091339570.240101@chino.kir.corp.google.com [akpm@linux-foundation.org: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1712141329500.74052@chino.kir.corp.google.com Signed-off-by:
David Rientjes <rientjes@google.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Paolo Bonzini <pbonzini@redhat.com> Acked-by:
Christian König <christian.koenig@amd.com> Acked-by:
Dimitri Sivanich <sivanich@hpe.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Joerg Roedel <joro@8bytes.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
David Rientjes <rientjes@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 31, 2018
-
-
Zhu Yanjun authored
In the function rxe_av_fill_ip_info, the parameter rxe is not used. So it is removed. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Reviewed-by:
Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Zhu Yanjun authored
The function rxe_av_fill_ip_info always returns 0. So the function type is changed to void. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by:
Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Zhu Yanjun authored
Since the function rxe_av_to_attr always return 0, the function type is changed to void. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by:
Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Zhu Yanjun authored
In the function rxe_av_to_attr, the parameter rxe is not used. So it is removed. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Reviewed-by:
Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Zhu Yanjun authored
The function rxe_av_from_attr always return 0. So change the function to void. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by:
Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by:
Leon Romanovsky <leonro@mellanox.com> Reviewed-by:
Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-