- Jan 17, 2019
-
-
David Rheinsberg authored
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by:
Tom Gundersen <teg@jklm.no> Signed-off-by:
David Herrmann <dh.herrmann@gmail.com> Acked-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jan 06, 2019
-
-
Masahiro Yamada authored
Now that Kbuild automatically creates asm-generic wrappers for missing mandatory headers, it is redundant to list the same headers in generic-y and mandatory-y. Suggested-by:
Sam Ravnborg <sam@ravnborg.org> Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by:
Sam Ravnborg <sam@ravnborg.org>
-
Masahiro Yamada authored
These comments are leftovers of commit fcc8487d ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers must be explicitly added to header-y. Now, all headers under the uapi/ directories are exported. Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com>
-
- Dec 14, 2018
-
-
Firoz Khan authored
System call table generation script must be run to gener- ate unistd_(nr_)n64/n32/o32.h and syscall_table_32_o32/ 64_n64/64_n32/64-o32.h files. This patch will have changes which will invokes the script. This patch will generate unistd_(nr_)n64/n32/o32.h and syscall_table_32_o32/64_n64/64-n32/64-o32.h files by the syscall table generation script invoked by parisc/Make- file and the generated files against the removed files must be identical. The generated uapi header file will be included in uapi/- asm/unistd.h and generated system call table header file will be included by kernel/scall32-o32/64-n64/64-n32/- 64-o32.Sfile. Signed-off-by:
Firoz Khan <firoz.khan@linaro.org> Signed-off-by:
Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: y2038@lists.linaro.org Cc: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: arnd@arndb.de Cc: deepa.kernel@gmail.com Cc: marcin.juszkiewicz@linaro.org
-
Firoz Khan authored
All other architectures are hold a value for __NR_syscalls will be equal to the last system call number +1. But in mips architecture, __NR_syscalls hold the value equal to total number of system exits in the architecture. One of the patch in this patch series will genarate uapi header files. In order to make the implementation common across all architect- ures, add +1 to __NR_syscalls, which will be equal to the last system call number +1. Signed-off-by:
Firoz Khan <firoz.khan@linaro.org> Signed-off-by:
Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: y2038@lists.linaro.org Cc: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: arnd@arndb.de Cc: deepa.kernel@gmail.com Cc: marcin.juszkiewicz@linaro.org
-
Firoz Khan authored
Remove __NR_Linux_syscalls from uapi/asm/unistd.h as there is no users to use NR_syscalls macro in mips kernel. MAX_SYSCALL_NO can also remove as there is commit 2957c9e6 ("[MIPS] IRIX: Goodbye and thanks for all the fish"), eight years ago. Signed-off-by:
Firoz Khan <firoz.khan@linaro.org> [paul.burton@mips.com: - Drop the removal of NR_syscalls which is used by kernel/trace/trace.h.] Signed-off-by:
Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: y2038@lists.linaro.org Cc: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: arnd@arndb.de Cc: deepa.kernel@gmail.com Cc: marcin.juszkiewicz@linaro.org
-
- Dec 13, 2018
-
-
Firoz Khan authored
__NR_Linux_syscalls macro holds the number of system call exist in mips architecture. We have to change the value of __NR_Linux_syscalls, if we add or delete a system call. One of the patch in this patch series has a script which will generate a uapi header based on syscall.tbl file. The syscall.tbl file contains the total number of system calls information. So we have two option to update __NR- _Linux_syscalls value. 1. Update __NR_Linux_syscalls in asm/unistd.h manually by counting the no.of system calls. No need to update __NR_Linux_syscalls until we either add a new system call or delete existing system call. 2. We can keep this feature it above mentioned script, that will count the number of syscalls and keep it in a generated file. In this case we don't need to expli- citly update __NR_Linux_syscalls in asm/unistd.h file. The 2nd option will be the recommended one. For that, I added the __NR_syscalls macro in uapi/asm/unistd.h along with __NR_Linux_syscalls. The macro __NR_syscalls also added for making the name convention same across all architecture. While __NR_syscalls isn't strictly part of the uapi, having it as part of the generated header to simplifies the implementation. We also need to enclose this macro with #ifdef __KERNEL__ to avoid side effects. Signed-off-by:
Firoz Khan <firoz.khan@linaro.org> Signed-off-by:
Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: y2038@lists.linaro.org Cc: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: arnd@arndb.de Cc: deepa.kernel@gmail.com Cc: marcin.juszkiewicz@linaro.org
-
- Dec 07, 2018
-
-
Jiong Wang authored
Jitting of BPF_K is supported already, but not BPF_X. This patch complete the support for the latter on both MIPS and microMIPS. Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Acked-by:
Paul Burton <paul.burton@mips.com> Signed-off-by:
Jiong Wang <jiong.wang@netronome.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
Jiong Wang authored
For micro-mips, srlv inside POOL32A encoding space should use 0x50 sub-opcode, NOT 0x90. Some early version ISA doc describes the encoding as 0x90 for both srlv and srav, this looks to me was a typo. I checked Binutils libopcode implementation which is using 0x50 for srlv and 0x90 for srav. v1->v2: - Keep mm_srlv32_op sorted by value. Fixes: f31318fd ("MIPS: uasm: Add srlv uasm instruction") Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Acked-by:
Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Jiong Wang <jiong.wang@netronome.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
- Nov 19, 2018
-
-
Sean Young authored
When building BPF code using "clang -target bpf -c", clang does not define __linux__. To build BPF IR decoders the include linux/lirc.h is needed which includes linux/types.h. Currently this workaround is needed: https://git.linuxtv.org/v4l-utils.git/commit/?id=dd3ff81f58c4e1e6f33765dc61ad33c48ae6bb07 This check might otherwise be useful to stop users from using a non-linux compiler, but if you're doing that you are going to have a lot more trouble anyway. Signed-off-by:
Sean Young <sean@mess.org> Signed-off-by:
Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/21149/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
-
- Oct 03, 2018
-
-
Eric W. Biederman authored
Rework the defintion of struct siginfo so that the array padding struct siginfo to SI_MAX_SIZE can be placed in a union along side of the rest of the struct siginfo members. The result is that we no longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions. Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- Oct 02, 2018
-
-
Nicolas Ferre authored
Add the ISO7816 ioctl and associated accessors and data structure. Drivers can then use this common implementation to handle ISO7816 (smart cards). Signed-off-by:
Nicolas Ferre <nicolas.ferre@microchip.com> [ludovic.desroches@microchip.com: squash and rebase, removal of gpios, checkpatch fixes] Signed-off-by:
Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jul 04, 2018
-
-
Richard Cochran authored
This patch introduces SO_TXTIME. User space enables this option in order to pass a desired future transmit time in a CMSG when calling sendmsg(2). The argument to this socket option is a 8-bytes long struct provided by the uapi header net_tstamp.h defined as: struct sock_txtime { clockid_t clockid; u32 flags; }; Note that new fields were added to struct sock by filling a 2-bytes hole found in the struct. For that reason, neither the struct size or number of cachelines were altered. Signed-off-by:
Richard Cochran <rcochran@linutronix.de> Signed-off-by:
Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jun 20, 2018
-
-
Paul Burton authored
Wire up the io_pgetevents syscall that was introduced by commit 7a074e96 ("aio: implement io_pgetevents"). Signed-off-by:
Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/19593/ Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org
-
Paul Burton authored
Wire up the restartable sequences (rseq) syscall for MIPS. This was introduced by commit d7822b1e ("rseq: Introduce restartable sequences system call") & MIPS now supports the prerequisites. Signed-off-by:
Paul Burton <paul.burton@mips.com> Reviewed-by:
James Hogan <jhogan@kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/19525/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
-
- Apr 20, 2018
-
-
Arnd Bergmann authored
MIPS is the weirdest case for sysvipc, because each of the three data structures is done differently: * msqid64_ds has padding in the right place so we could in theory extend this one to just have 64-bit values instead of time_t. As this does not work for most of the other combinations, we just handle it in the common manner though. * semid64_ds has no padding for 64-bit time_t, but has two reserved 'long' fields, which are sufficient to extend the sem_otime and sem_ctime fields to 64 bit. In order to do this, the libc implementation will have to copy the data into another structure that has the fields in a different order. MIPS is the only architecture with this problem, so this is best done in MIPS specific libc code. * shmid64_ds is slightly worse than that, because it has three time_t fields but only two unused 32-bit words. As a workaround, we extend each field only by 16 bits, ending up with 48-bit timestamps that user space again has to work around by itself. The compat versions of the data structures are changed in the same way. Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Apr 11, 2018
-
-
Michal Hocko authored
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2. This has started as a follow up discussion [3][4] resulting in the runtime failure caused by hardening patch [5] which removes MAP_FIXED from the elf loader because MAP_FIXED is inherently dangerous as it might silently clobber an existing underlying mapping (e.g. stack). The reason for the failure is that some architectures enforce an alignment for the given address hint without MAP_FIXED used (e.g. for shared or file backed mappings). One way around this would be excluding those archs which do alignment tricks from the hardening [6]. The patch is really trivial but it has been objected, rightfully so, that this screams for a more generic solution. We basically want a non-destructive MAP_FIXED. The first patch introduced MAP_FIXED_NOREPLACE which enforces the given address but unlike MAP_FIXED it fails with EEXIST if the given range conflicts with an existing one. The flag is introduced as a completely new one rather than a MAP_FIXED extension because of the backward compatibility. We really want a never-clobber semantic even on older kernels which do not recognize the flag. Unfortunately mmap sucks wrt flags evaluation because we do not EINVAL on unknown flags. On those kernels we would simply use the traditional hint based semantic so the caller can still get a different address (which sucks) but at least not silently corrupt an existing mapping. I do not see a good way around that. Except we won't export expose the new semantic to the userspace at all. It seems there are users who would like to have something like that. Jemalloc has been mentioned by Michael Ellerman [7] Florian Weimer has mentioned the following: : glibc ld.so currently maps DSOs without hints. This means that the kernel : will map right next to each other, and the offsets between them a completely : predictable. We would like to change that and supply a random address in a : window of the address space. If there is a conflict, we do not want the : kernel to pick a non-random address. Instead, we would try again with a : random address. John Hubbard has mentioned CUDA example : a) Searches /proc/<pid>/maps for a "suitable" region of available : VA space. "Suitable" generally means it has to have a base address : within a certain limited range (a particular device model might : have odd limitations, for example), it has to be large enough, and : alignment has to be large enough (again, various devices may have : constraints that lead us to do this). : : This is of course subject to races with other threads in the process. : : Let's say it finds a region starting at va. : : b) Next it does: : p = mmap(va, ...) : : *without* setting MAP_FIXED, of course (so va is just a hint), to : attempt to safely reserve that region. If p != va, then in most cases, : this is a failure (almost certainly due to another thread getting a : mapping from that region before we did), and so this layer now has to : call munmap(), before returning a "failure: retry" to upper layers. : : IMPROVEMENT: --> if instead, we could call this: : : p = mmap(va, ... MAP_FIXED_NOREPLACE ...) : : , then we could skip the munmap() call upon failure. This : is a small thing, but it is useful here. (Thanks to Piotr : Jaroszynski and Mark Hairgrove for helping me get that detail : exactly right, btw.) : : c) After that, CUDA suballocates from p, via: : : q = mmap(sub_region_start, ... MAP_FIXED ...) : : Interestingly enough, "freeing" is also done via MAP_FIXED, and : setting PROT_NONE to the subregion. Anyway, I just included (c) for : general interest. Atomic address range probing in the multithreaded programs in general sounds like an interesting thing to me. The second patch simply replaces MAP_FIXED use in elf loader by MAP_FIXED_NOREPLACE. I believe other places which rely on MAP_FIXED should follow. Actually real MAP_FIXED usages should be docummented properly and they should be more of an exception. [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org [3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au [4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz [7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au This patch (of 2): MAP_FIXED is used quite often to enforce mapping at the particular range. The main problem of this flag is, however, that it is inherently dangerous because it unmaps existing mappings covered by the requested range. This can cause silent memory corruptions. Some of them even with serious security implications. While the current semantic might be really desiderable in many cases there are others which would want to enforce the given range but rather see a failure than a silent memory corruption on a clashing range. Please note that there is no guarantee that a given range is obeyed by the mmap even when it is free - e.g. arch specific code is allowed to apply an alignment. Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this behavior. It has the same semantic as MAP_FIXED wrt. the given address request with a single exception that it fails with EEXIST if the requested address is already covered by an existing mapping. We still do rely on get_unmaped_area to handle all the arch specific MAP_FIXED treatment and check for a conflicting vma after it returns. The flag is introduced as a completely new one rather than a MAP_FIXED extension because of the backward compatibility. We really want a never-clobber semantic even on older kernels which do not recognize the flag. Unfortunately mmap sucks wrt. flags evaluation because we do not EINVAL on unknown flags. On those kernels we would simply use the traditional hint based semantic so the caller can still get a different address (which sucks) but at least not silently corrupt an existing mapping. I do not see a good way around that. [mpe@ellerman.id.au: fix whitespace] [fail on clashing range with EEXIST as per Florian Weimer] [set MAP_FIXED before round_hint_to_min as per Khalid Aziz] Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org Reviewed-by:
Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Michael Ellerman <mpe@ellerman.id.au> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Russell King - ARM Linux <linux@armlinux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Evans <jasone@google.com> Cc: David Goldblatt <davidtgoldblatt@gmail.com> Cc: Edward Tomasz Napierała <trasz@FreeBSD.org> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 19, 2018
-
-
Marcin Nowakowski authored
Indicate that CRC32 and CRC32C instuctions are supported by the CPU through elf_hwcap flags. This will be used by a follow-up commit that introduces crc32(c) crypto acceleration modules and is required by GENERIC_CPU_AUTOPROBE feature. Signed-off-by:
Marcin Nowakowski <marcin.nowakowski@mips.com> Signed-off-by:
James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/18600/
-
- Feb 11, 2018
-
-
Al Viro authored
except, again, POLLFREE and POLL_BUSY_LOOP. With this, we finally get to the promised end result: - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any stray instances of ->poll() still using those will be caught by sparse. - eventpoll.c and select.c warning-free wrt __poll_t - no more kernel-side definitions of POLL... - userland ones are visible through the entire kernel (and used pretty much only for mangle/demangle) - same behavior as after the first series (i.e. sparc et.al. epoll(2) working correctly). Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 12, 2018
-
-
Al Viro authored
... having taught the latter that si_errno and si_code might be swapped. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Eric W. Biederman <ebiederm@xmission.com>
-
- Dec 05, 2017
-
-
Hendrik Brueckner authored
Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") introduced the bpf_perf_event_data structure which exports the pt_regs structure. This is OK for multiple architectures but fail for s390 and arm64 which do not export pt_regs. Programs using them, for example, the bpf selftest fail to compile on these architectures. For s390, exporting the pt_regs is not an option because s390 wants to allow changes to it. For arm64, there is a user_pt_regs structure that covers parts of the pt_regs structure for use by user space. To solve the broken uapi for s390 and arm64, introduce an abstract type for pt_regs and add an asm/bpf_perf_event.h file that concretes the type. An asm-generic header file covers the architectures that export pt_regs today. The arch-specific enablement for s390 and arm64 follows in separate commits. Reported-by:
Thomas Richter <tmricht@linux.vnet.ibm.com> Fixes: 0515e599 ("bpf: int...
-
- Nov 30, 2017
-
-
Al Viro authored
mangle/demangle on the way to/from userland Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Nov 27, 2017
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Nov 03, 2017
-
-
Dan Williams authored
The mmap(2) syscall suffers from the ABI anti-pattern of not validating unknown flags. However, proposals like MAP_SYNC need a mechanism to define new behavior that is known to fail on older kernels without the support. Define a new MAP_SHARED_VALIDATE flag pattern that is guaranteed to fail on all legacy mmap implementations. It is worth noting that the original proposal was for a standalone MAP_VALIDATE flag. However, when that could not be supported by all archs Linus observed: I see why you *think* you want a bitmap. You think you want a bitmap because you want to make MAP_VALIDATE be part of MAP_SYNC etc, so that people can do ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_SYNC, fd, 0); and "know" that MAP_SYNC actually takes. And I'm saying that whole wish is bogus. You're fundamentally depending on special semantics, just make it explicit. It's already not portable, so don't try to make it so. Rename that MAP_VALIDATE as MAP_SHARED_VALIDATE, make it have a value of 0x3, and make people do ret = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0); and then the kernel side is easier too (none of that random garbage playing games with looking at the "MAP_VALIDATE bit", but just another case statement in that map type thing. Boom. Done. Similar to ->fallocate() we also want the ability to validate the support for new flags on a per ->mmap() 'struct file_operations' instance basis. Towards that end arrange for flags to be generically validated against a mmap_supported_flags exported by 'struct file_operations'. By default all existing flags are implicitly supported, but new flags require MAP_SHARED_VALIDATE and per-instance-opt-in. Cc: Jan Kara <jack@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Suggested-by:
Christoph Hellwig <hch@lst.de> Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by:
Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- Nov 02, 2017
-
-
Greg Kroah-Hartman authored
Many user space API headers have licensing information, which is either incomplete, badly formatted or just a shorthand for referring to the license under which the file is supposed to be. This makes it hard for compliance tools to determine the correct license. Update these files with an SPDX license identifier. The identifier was chosen based on the license information in the file. GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license identifier with the added 'WITH Linux-syscall-note' exception, which is the officially assigned exception identifier for the kernel syscall exception: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". This exception makes it possible to include GPL headers into non GPL code, without confusing license compliance tools. Headers which have either explicit dual licensing or are just licensed under a non GPL license are updated with the corresponding SPDX identifier and the GPLv2 with syscall exception identifier. The format is: ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE) SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. The update does not remove existing license information as this has to be done on a case by case basis and the copyright holders might have to be consulted. This will happen in a separate step. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by:
Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by:
Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
Many user space API headers are missing licensing information, which makes it hard for compliance tools to determine the correct license. By default are files without license information under the default license of the kernel, which is GPLV2. Marking them GPLV2 would exclude them from being included in non GPLV2 code, which is obviously not intended. The user space API headers fall under the syscall exception which is in the kernels COPYING file: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". otherwise syscall usage would not be possible. Update the files which contain no license information with an SPDX license identifier. The chosen identifier is 'GPL-2.0 WITH Linux-syscall-note' which is the officially assigned identifier for the Linux syscall exception. SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by:
Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by:
Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Sep 07, 2017
-
-
Rik van Riel authored
Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty in the child process after fork. This differs from MADV_DONTFORK in one important way. If a child process accesses memory that was MADV_WIPEONFORK, it will get zeroes. The address ranges are still valid, they are just empty. If a child process accesses memory that was MADV_DONTFORK, it will get a segmentation fault, since those address ranges are no longer valid in the child after fork. Since MADV_DONTFORK also seems to be used to allow very large programs to fork in systems with strict memory overcommit restrictions, changing the semantics of MADV_DONTFORK might break existing programs. MADV_WIPEONFORK only works on private, anonymous VMAs. The use case is libraries that store or cache information, and want to know that they need to regenerate it in the child process after fork. Examples of this would be: - systemd/pulseaudio API checks (fail after fork) (replacing a getpid check, which is too slow without a PID cache) - PKCS#11 API reinitialization check (mandated by specification) - glibc's upcoming PRNG (reseed after fork) - OpenSSL PRNG (reseed after fork) The security benefits of a forking server having a re-inialized PRNG in every child process are pretty obvious. However, due to libraries having all kinds of internal state, and programs getting compiled with many different versions of each library, it is unreasonable to expect calling programs to re-initialize everything manually after fork. A further complication is the proliferation of clone flags, programs bypassing glibc's functions to call clone directly, and programs calling unshare, causing the glibc pthread_atfork hook to not get called. It would be better to have the kernel take care of this automatically. The patch also adds MADV_KEEPONFORK, to undo the effects of a prior MADV_WIPEONFORK. This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO: https://man.openbsd.org/minherit.2 [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines] Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com Signed-off-by:
Rik van Riel <riel@redhat.com> Reported-by:
Florian Weimer <fweimer@redhat.com> Reported-by:
Colm MacCártaigh <colm@allcosts.net> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Cc: <linux-api@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Kravetz authored
A non-default huge page size can be encoded in the flags argument of the mmap system call. The definitions for these encodings are in arch specific header files. However, all architectures use the same values. Consolidate all the definitions in the primary user header file (uapi/linux/mman.h). Include definitions for all known huge page sizes. Use the generic encoding definitions in hugetlb_encode.h as the basis for these definitions. Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com Signed-off-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 06, 2017
-
-
Matt Redfearn authored
When the immediate encoded in the instruction is accessed, it is sign extended due to being a signed value being assigned to a signed integer. The ISA specifies that this operation is an unsigned operation. The sign extension leads us to incorrectly decode: 801e9c8e: cbf1 sw ra,68(sp) As having an immediate of 1073741809. Since the instruction format does not specify signed/unsigned, and this is currently the only location to use this instuction format, change it to an unsigned immediate. Fixes: bb9bc468 ("MIPS: Calculate microMIPS ra properly when unwinding the stack") Suggested-by:
Paul Burton <paul.burton@imgtec.com> Signed-off-by:
Matt Redfearn <matt.redfearn@imgtec.com> Reviewed-by:
James Hogan <james.hogan@imgtec.com> Cc: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Miodrag Dinic <miodrag.dinic@imgtec.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/16957/ Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
- Aug 17, 2017
-
-
Eric W. Biederman authored
In a recent discussion Maciej Rozycki reported that this case is impossible. Handle the impossible case by just returning instead of trying to handle it. This makes static analysis simpler as it means nothing needs to consider the impossible case after the return statement. As the code no longer has to deal with this case remove FPE_FIXME from the mips siginfo.h Cc: "Maciej W. Rozycki" <macro@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Link: http://lkml.kernel.org/r/20170718140651.15973-4-ebiederm@xmission.com Ref: ea1b75cf ("signal/mips: Document a conflict with SI_USER with SIGFPE") Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- Aug 04, 2017
-
-
Willem de Bruijn authored
The send call ignores unknown flags. Legacy applications may already unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a socket opts in to zerocopy. Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing. Processes can also query this socket option to detect kernel support for the feature. Older kernels will return ENOPROTOOPT. Signed-off-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 24, 2017
-
-
Eric W. Biederman authored
struct siginfo is a union and the kernel since 2.4 has been hiding a union tag in the high 16bits of si_code using the values: __SI_KILL __SI_TIMER __SI_POLL __SI_FAULT __SI_CHLD __SI_RT __SI_MESGQ __SI_SYS While this looks plausible on the surface, in practice this situation has not worked well. - Injected positive signals are not copied to user space properly unless they have these magic high bits set. - Injected positive signals are not reported properly by signalfd unless they have these magic high bits set. - These kernel internal values leaked to userspace via ptrace_peek_siginfo - It was possible to inject these kernel internal values and cause the the kernel to misbehave. - Kernel developers got confused and expected these kernel internal values in userspace in kernel self tests. - Kernel developers got confused and set si_code to __SI_FAULT which is SI_USER in userspace which causes userspace to think an ordinary user sent the signal and that it was not kernel generated. - The values make it impossible to reorganize the code to transform siginfo_copy_to_user into a plain copy_to_user. As si_code must be massaged before being passed to userspace. So remove these kernel internal si codes and make the kernel code simpler and more maintainable. To replace these kernel internal magic si_codes introduce the helper function siginfo_layout, that takes a signal number and an si_code and computes which union member of siginfo is being used. Have siginfo_layout return an enumeration so that gcc will have enough information to warn if a switch statement does not handle all of union members. A couple of architectures have a messed up ABI that defines signal specific duplications of SI_USER which causes more special cases in siginfo_layout than I would like. The good news is only problem architectures pay the cost. Update all of the code that used the previous magic __SI_ values to use the new SIL_ values and to call siginfo_layout to get those values. Escept where not all of the cases are handled remove the defaults in the switch statements so that if a new case is missed in the future the lack will show up at compile time. Modify the code that copies siginfo si_code to userspace to just copy the value and not cast si_code to a short first. The high bits are no longer used to hold a magic union member. Fixup the siginfo header files to stop including the __SI_ values in their constants and for the headers that were missing it to properly update the number of si_codes for each signal type. The fixes to copy_siginfo_from_user32 implementations has the interesting property that several of them perviously should never have worked as the __SI_ values they depended up where kernel internal. With that dependency gone those implementations should work much better. The idea of not passing the __SI_ values out to userspace and then not reinserting them has been tested with criu and criu worked without changes. Ref: 2.4.0-test1 Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- Jul 20, 2017
-
-
Eric W. Biederman authored
Setting si_code to __SI_FAULT results in a userspace seeing an si_code of 0. This is the same si_code as SI_USER. Posix and common sense requires that SI_USER not be a signal specific si_code. As such this use of 0 for the si_code is a pretty horribly broken ABI. This use of of __SI_FAULT is only a decade old. Which compared to the other pieces of kernel code that has made this mistake is almost yesterday. This is probably worth fixing but I don't know mips well enough to know what si_code to would be the proper one to use. Cc: Ralf Baechle <ralf@linux-mips.org> Ref: 948a34cf ("[MIPS] Maintain si_code field properly for FP exceptions") Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
- Jul 17, 2017
-
-
Gleb Fotengauer-Malinovskiy authored
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag because it doesn't copy anything from/to userspace to access the argument. Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl") Signed-off-by:
Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org> Acked-by:
Aleksa Sarai <asarai@suse.de> Acked-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jun 29, 2017
-
-
Miodrag Dinic authored
Add handling of missaligned access for DSP load instructions lwx & lhx. Since DSP instructions share SPECIAL3 opcode with other non-DSP instructions, necessary logic was inserted for distinguishing between instructions with SPECIAL3 opcode. For that purpose, the instruction format for DSP instructions is added to arch/mips/include/uapi/asm/inst.h. Signed-off-by:
Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by:
Aleksandar Markovic <aleksandar.markovic@imgtech.com> Cc: James.Hogan@imgtec.com Cc: Paul.Burton@imgtec.com Cc: Raghu.Gandham@imgtec.com Cc: Leonid.Yegoshin@imgtec.com Cc: Douglas.Leung@imgtec.com Cc: Petar.Jovanovic@imgtec.com Cc: Goran.Ferenc@imgtec.com Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16511/ Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
- Jun 28, 2017
-
-
David Daney authored
DSHD was incorrectly classified as being BSHFL, and DSHD was missing altogether. Signed-off-by:
David Daney <david.daney@cavium.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Matt Redfearn <matt.redfearn@imgtec.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16366/ Signed-off-by:
Ralf Baechle <ralf@linux-mips.org>
-
- Jun 21, 2017
-
-
David Rheinsberg authored
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to retrieve the auxiliary groups of the remote peer. It is designed to naturally extend SO_PEERCRED. That is, the underlying data is from the same credentials. Regarding its syntax, it is based on SO_PEERSEC. That is, if the provided buffer is too small, ERANGE is returned and @optlen is updated. Otherwise, the information is copied, @optlen is set to the actual size, and 0 is returned. While SO_PEERCRED (and thus `struct ucred') already returns the primary group, it lacks the auxiliary group vector. However, nearly all access controls (including kernel side VFS and SYSVIPC, but also user-space polkit, DBus, ...) consider the entire set of groups, rather than just the primary group. But this is currently not possible with pure SO_PEERCRED. Instead, user-space has to work around this and query the system database for the auxiliary groups of a UID retrieved via SO_PEERCRED. Unfortunately, there is no race-free way to query the auxiliary groups of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space solution is to use getgrouplist(3p), which itself falls back to NSS and whatever is configured in nsswitch.conf(3). This effectively checks which groups we *would* assign to the user if it logged in *now*. On normal systems it is as easy as reading /etc/group, but with NSS it can resort to quering network databases (eg., LDAP), using IPC or network communication. Long story short: Whenever we want to use auxiliary groups for access checks on IPC, we need further IPC to talk to the user/group databases, rather than just relying on SO_PEERCRED and the incoming socket. This is unfortunate, and might even result in dead-locks if the database query uses the same IPC as the original request. So far, those recursions / dead-locks have been avoided by using primitive IPC for all crucial NSS modules. However, we want to avoid re-inventing the wheel for each NSS module that might be involved in user/group queries. Hence, we would preferably make DBus (and other IPC that supports access-management based on groups) work without resorting to the user/group database. This new SO_PEERGROUPS ioctl would allow us to make dbus-daemon work without ever calling into NSS. Cc: Michal Sekletar <msekleta@redhat.com> Cc: Simon McVittie <simon.mcvittie@collabora.co.uk> Reviewed-by:
Tom Gundersen <teg@jklm.no> Signed-off-by:
David Herrmann <dh.herrmann@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jun 09, 2017
-
-
Aleksa Sarai authored
When opening the slave end of a PTY, it is not possible for userspace to safely ensure that /dev/pts/$num is actually a slave (in cases where the mount namespace in which devpts was mounted is controlled by an untrusted process). In addition, there are several unresolvable race conditions if userspace were to attempt to detect attacks through stat(2) and other similar methods [in addition it is not clear how userspace could detect attacks involving FUSE]. Resolve this by providing an interface for userpace to safely open the "peer" end of a PTY file descriptor by using the dentry cached by devpts. Since it is not possible to have an open master PTY without having its slave exposed in /dev/pts this interface is safe. This interface currently does not provide a way to get the master pty (since it is not clear whether such an interface is safe or even useful). Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Valentin Rothberg <vrothberg@suse.com> Signed-off-by:
Aleksa Sarai <asarai@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- May 22, 2017
-
-
David S. Miller authored
A definition was only provided for asm-generic/socket.h using platforms, define it for the others as well Reported-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 10, 2017
-
-
Nicolas Dichtel authored
Regularly, when a new header is created in include/uapi/, the developer forgets to add it in the corresponding Kbuild file. This error is usually detected after the release is out. In fact, all headers under uapi directories should be exported, thus it's useless to have an exhaustive list. After this patch, the following files, which were not exported, are now exported (with make headers_install_all): asm-arc/kvm_para.h asm-arc/ucontext.h asm-blackfin/shmparam.h asm-blackfin/ucontext.h asm-c6x/shmparam.h asm-c6x/ucontext.h asm-cris/kvm_para.h asm-h8300/shmparam.h asm-h8300/ucontext.h asm-hexagon/shmparam.h asm-m32r/kvm_para.h asm-m68k/kvm_para.h asm-m68k/shmparam.h asm-metag/kvm_para.h asm-metag/shmparam.h asm-metag/ucontext.h asm-mips/hwcap.h asm-mips/reg.h asm-mips/ucontext.h asm-nios2/kvm_para.h asm-nios2/ucontext.h asm-openrisc/shmparam.h asm-parisc/kvm_para.h asm-powerpc/perf_regs.h asm-sh/kvm_para.h asm-sh/ucontext.h asm-tile/shmparam.h asm-unicore32/shmparam.h asm-unicore32/ucontext.h asm-x86/hwcap2.h asm-xtensa/kvm_para.h drm/armada_drm.h drm/etnaviv_drm.h drm/vgem_drm.h linux/aspeed-lpc-ctrl.h linux/auto_dev-ioctl.h linux/bcache.h linux/btrfs_tree.h linux/can/vxcan.h linux/cifs/cifs_mount.h linux/coresight-stm.h linux/cryptouser.h linux/fsmap.h linux/genwqe/genwqe_card.h linux/hash_info.h linux/kcm.h linux/kcov.h linux/kfd_ioctl.h linux/lightnvm.h linux/module.h linux/nbd-netlink.h linux/nilfs2_api.h linux/nilfs2_ondisk.h linux/nsfs.h linux/pr.h linux/qrtr.h linux/rpmsg.h linux/sched/types.h linux/sed-opal.h linux/smc.h linux/smc_diag.h linux/stm.h linux/switchtec_ioctl.h linux/vfio_ccw.h linux/wil6210_uapi.h rdma/bnxt_re-abi.h Note that I have removed from this list the files which are generated in every exported directories (like .install or .install.cmd). Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all subdirs with a pure makefile command. For the record, note that exported files for asm directories are a mix of files listed by: - include/uapi/asm-generic/Kbuild.asm; - arch/<arch>/include/uapi/asm/Kbuild; - arch/<arch>/include/asm/Kbuild. Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by:
Russell King <rmk+kernel@armlinux.org.uk> Acked-by:
Mark Salter <msalter@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by:
Masahiro Yamada <yamada.masahiro@socionext.com>
-