  1. Mar 06, 2017
    • powerpc/64: Fix L1D cache shape vector reporting L1I values · 9c7a0086
      Michael Ellerman authored
      
      
      It seems we didn't pay quite enough attention when testing the new cache
      shape vectors, which means we didn't notice the bug where the vector for
      the L1D was using the L1I values. Fix it, resulting in eg:
      
        L1I  cache size:     0x8000      32768B         32K
        L1I  line size:        0x80       8-way associative
        L1D  cache size:    0x10000      65536B         64K
        L1D  line size:        0x80       8-way associative
      
      Fixes: 98a5f361 ("powerpc: Add new cache geometry aux vectors")
      Cut-and-paste-bug-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Badly-reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info() · 6ba422c7
      Anton Blanchard authored
      
      
      I see a panic in early boot when building with a recent gcc toolchain.
      The issue is a divide by zero, which is undefined. Older toolchains
      let us get away with it:
      
      int foo(int a) { return a / 0; }
      
      foo:
      	li 9,0
      	divw 3,3,9
      	extsw 3,3
      	blr
      
      But newer ones catch it:
      
      foo:
      	trap
      
      Add a check to avoid the divide by zero.
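The shape of the guard is simple: bail out before dividing when the divisor can be zero. A minimal sketch in plain C, with illustrative names (not the actual init_cache_info() code):

```c
#define PAGE_SIZE 4096u

/* Hypothetical sketch of the fix: if the cache line size is zero
 * (e.g. a missing device tree property), return 0 instead of
 * performing an undefined division that newer gcc turns into a trap. */
unsigned int blocks_per_page(unsigned int line_size)
{
	if (line_size == 0)
		return 0;
	return PAGE_SIZE / line_size;
}
```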
      
      Fixes: e2827fe5 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Update to new option-vector-5 format for CAS · 014d02cb
      Suraj Jitindar Singh authored
      
      
      On POWER9 the ibm,client-architecture-support (CAS) negotiation process
      has been updated to change how the host to guest negotiation is done for
      the new hash/radix mmu as well as the nest mmu, process tables and guest
      translation shootdown (GTSE).
      
      This is documented in the unreleased PAPR ACR "CAS option vector
      additions for P9".
      
      The host tells the guest which options it supports in
      ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
      to request in the CAS call and these are agreed to in the
      ibm,architecture-vec-5 property of the chosen node.
      
      Thus we read ibm,arch-vec-5-platform-support and make our selection before
      calling CAS. We then parse the ibm,architecture-vec-5 property of the
      chosen node to check whether we should run as hash or radix.
      
      ibm,arch-vec-5-platform-support format:
      
      index value pairs: <index, val> ... <index, val>
      
      index: Option vector 5 byte number
      val:   Some representation of supported values
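Walking that index/value pair format can be sketched as below; the helper name is hypothetical, not the kernel's:

```c
#include <stddef.h>

/* Illustrative sketch of scanning an ibm,arch-vec-5-platform-support
 * property: a flat sequence of <index, val> byte pairs, where index
 * is the option vector 5 byte number and val encodes the supported
 * values. Returns 0 if the platform does not list the byte. */
unsigned char ov5_lookup(const unsigned char *prop, size_t len,
			 unsigned char byte_nr)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		if (prop[i] == byte_nr)
			return prop[i + 1];
	return 0;
}
```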
      
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      [mpe: Don't print about unknown options, be consistent with OV5_FEAT]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Parse the command line before calling CAS · 12cc9fd6
      Suraj Jitindar Singh authored
      
      
      On POWER9 the hypervisor requires the guest to decide whether it would
      like to use a hash or radix mmu model at the time it calls
      ibm,client-architecture-support (CAS) based on what the hypervisor has
      said it's allowed to do. It is possible to disable radix by passing
      "disable_radix" on the command line. The next patch will add support for
      the new CAS format, thus we need to parse the command line before calling
      CAS so we can correctly select which mmu we would like to use.
      
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/xics: Work around limitations of OPAL XICS priority handling · a69e2fb7
      Balbir Singh authored
      
      
      The CPPR (Current Processor Priority Register) of a XICS interrupt
      presentation controller contains a value N, such that only interrupts
      with a priority "more favoured" than N will be received by the CPU,
      where "more favoured" means "less than". So if the CPPR has the value 5
      then only interrupts with a priority of 0-4 inclusive will be received.
      
      In theory the CPPR can support a value of 0 to 255 inclusive.
      In practice Linux only uses values of 0, 4, 5 and 0xff. Setting the CPPR
      to 0 rejects all interrupts, setting it to 0xff allows all interrupts.
      The values 4 and 5 are used to differentiate IPIs from external
      interrupts. Setting the CPPR to 5 allows IPIs to be received but not
      external interrupts.
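The delivery rule above ("more favoured" means numerically less) reduces to a one-line predicate; this is an illustration, not a kernel API:

```c
#include <stdbool.h>

/* Sketch of the XICS rule described above: an interrupt of a given
 * priority is delivered only if it is "more favoured" (numerically
 * less) than the CPPR. Name is illustrative. */
bool xics_irq_delivered(unsigned int prio, unsigned int cppr)
{
	return prio < cppr;
}
```

With a CPPR of 5, a priority-4 IPI is delivered but a priority-5 external interrupt is not, matching the usage described above.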
      
      The CPPR emulation in the OPAL XICS implementation only directly
      supports priorities 0 and 0xff. All other priorities are considered
      equivalent, and mapped to a single priority value internally. This means
      that when using icp-opal we cannot allow IPIs while still blocking
      external interrupts.
      
      This breaks Linux's use of priority values when a CPU is hot unplugged.
      After migrating IRQs away from the CPU that is being offlined, we set
      the priority to 5, meaning we still want the offline CPU to receive
      IPIs. But the effect of the OPAL XICS emulation's use of a single
      priority value is that all interrupts are rejected by the CPU. With the
      CPU offline, and not receiving IPIs, we may not be able to wake it up to
      bring it back online.
      
      The first part of the fix is in icp_opal_set_cpu_priority(). CPPR values
      of 0 to 4 inclusive will correctly cause all interrupts to be rejected,
      so we pass those CPPR values through to OPAL. However if we are called
      with a CPPR of 5 or greater, the caller is expecting to be able to allow
      IPIs but not external interrupts. We know this doesn't work, so instead
      of rejecting all interrupts we choose the opposite which is to allow all
      interrupts. This is still not correct behaviour, but we know for the
      only existing caller (xics_migrate_irqs_away()), that it is the better
      option.
      
      The other part of the fix is in xics_migrate_irqs_away(). Instead of
      setting priority (CPPR) to 0, and then back to 5 before migrating IRQs,
      we migrate the IRQs before setting the priority back to 5. This should
      have no effect on an ICP backend with a working set_priority(), and on
      icp-opal it means we will keep all interrupts blocked until after we've
      finished doing the IRQ migration. Additionally we wait for 5ms after
      doing the migration to make sure there are no IRQs in flight.
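The first part of the fix amounts to a clamp on the requested CPPR. A hypothetical sketch of that policy (the real icp_opal_set_cpu_priority() does more):

```c
/* Illustrative mapping of a requested CPPR to what is handed to OPAL:
 * values 0-4 pass through (all interrupts rejected under OPAL's
 * emulation), while 5 and above become 0xff (all interrupts allowed),
 * since OPAL cannot express "IPIs but not externals". */
unsigned int icp_opal_effective_cppr(unsigned int cppr)
{
	return (cppr < 5) ? cppr : 0xff;
}
```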
      
      Fixes: d7436188 ("powerpc/xics: Add ICP OPAL backend")
      Cc: stable@vger.kernel.org # v4.8+
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Rewrote comments and change log, change delay to 5ms]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. Mar 03, 2017
    • powerpc/booke: Fix boot crash due to null hugepd · 3fb66a70
      Laurentiu Tudor authored
      
      
      On 32-bit book-e machines, hugepd_ok() no longer takes into account null
      hugepd values, causing this crash at boot:
      
        Unable to handle kernel paging request for data at address 0x80000000
        ...
        NIP [c0018378] follow_huge_addr+0x38/0xf0
        LR [c001836c] follow_huge_addr+0x2c/0xf0
        Call Trace:
         follow_huge_addr+0x2c/0xf0 (unreliable)
         follow_page_mask+0x40/0x3e0
         __get_user_pages+0xc8/0x450
         get_user_pages_remote+0x8c/0x250
         copy_strings+0x110/0x390
         copy_strings_kernel+0x2c/0x50
         do_execveat_common+0x478/0x630
         do_execve+0x2c/0x40
         try_to_run_init_process+0x18/0x60
         kernel_init+0xbc/0x110
         ret_from_kernel_thread+0x5c/0x64
      
      This impacts all nxp (ex-freescale) 32-bit booke platforms.
      
      This was caused by the change of hugepd_t.pd from signed to unsigned,
      and the update to the nohash version of hugepd_ok(). Previously
      hugepd_ok() could exclude all non-huge and NULL pgds using > 0, whereas
      now we need to explicitly check that the value is not zero and also that
      PD_HUGE is *clear*.
      
      This isn't protected by the pgd_none() check in __find_linux_pte_or_hugepte()
      because on 32-bit we use pgtable-nopud.h, which causes the pgd_none()
      check to be always false.
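The corrected check described above can be sketched as follows; the PD_HUGE value is illustrative of 32-bit booke, and the standalone signature is simplified from the kernel's hugepd_t:

```c
#include <stdbool.h>

/* Sketch of the fixed nohash hugepd_ok() logic: with pd now unsigned,
 * a null entry must be rejected explicitly, and a huge page directory
 * is signalled by the PD_HUGE bit being *clear*. */
#define PD_HUGE 0x80000000UL

bool hugepd_ok(unsigned long pd)
{
	return pd != 0 && (pd & PD_HUGE) == 0;
}
```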
      
      Fixes: 20717e1f ("powerpc/mm: Fix little-endian 4K hugetlb")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
      Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
      [mpe: Flesh out change log details.]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Fix compiling a BE kernel with a powerpc64le toolchain · 4dc831aa
      Nicholas Piggin authored
      
      
      GCC can compile with either endian, but the default ABI version is set
      based on the default endianness of the toolchain. Alan Modra says:
      
        you need both -mbig and -mabi=elfv1 to make a powerpc64le gcc
        generate powerpc64 code
      
      The opposite is true for powerpc64: when generating -mlittle code it
      requires -mabi=elfv2 to produce the v2 ABI, which we were already doing.
      
      This change adds ABI annotations together with endianness for all cases,
      LE and BE. This fixes the case of building a BE kernel with a toolchain
      that is LE by default.
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop · 424f8acd
      Gautham R. Shenoy authored
      
      
      Commit 09206b60 ("powernv: Pass PSSCR value and mask to
      power9_idle_stop") added additional code in power_enter_stop() to
      distinguish between stop requests whose PSSCR had ESL=EC=1 from those
      which did not. When ESL=EC=1, we do a forward-jump to a location
      labelled by "1", which had the code to handle the ESL=EC=1 case.
      
      Unfortunately just a couple of instructions before this label, is the
      macro IDLE_STATE_ENTER_SEQ() which also has a label "1" in its
      expansion.
      
      As a result, the current code can result in directly executing stop
      instruction for deep stop requests with PSSCR ESL=EC=1, without saving
      the hypervisor state.
      
      Fix this BUG by labeling the location that handles ESL=EC=1 case with
      a more descriptive label ".Lhandle_esl_ec_set" (local label suggestion
      a la .Lxx from Anton Blanchard).
      
      While at it, rename the label "2" labelling the location of the code
      handling entry into deep stop states with ".Lhandle_deep_stop".
      
      For good measure, change the label in the IDLE_STATE_ENTER_SEQ() macro
      to a less commonly used value in order to avoid similar mishaps in the
      future.
      
      Fixes: 09206b60 ("powernv: Pass PSSCR value and mask to power9_idle_stop")
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Invalidate process table caching after setting process table · 7a70d728
      Paul Mackerras authored
      
      
      The POWER9 MMU reads and caches entries from the process table.
      When we kexec from one kernel to another, the second kernel sets
      its process table pointer but doesn't currently do anything to
      make the CPU invalidate any cached entries from the old process table.
      This adds a tlbie (TLB invalidate entry) instruction with parameters
      to invalidate caching of the process table after the new process
      table is installed.
      
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: emulate_step() tests for load/store instructions · 4ceae137
      Ravi Bangoria authored
      
      
      Add a new selftest that tests emulate_step() for normal, floating
      point, vector and vector scalar load/store instructions. The tests run
      at boot time if CONFIG_KPROBES_SANITY_TEST and CONFIG_PPC64 are set.
      
      Sample log:
      
        emulate_step_test: ld             : PASS
        emulate_step_test: lwz            : PASS
        emulate_step_test: lwzx           : PASS
        emulate_step_test: std            : PASS
        emulate_step_test: ldarx / stdcx. : PASS
        emulate_step_test: lfsx           : PASS
        emulate_step_test: stfsx          : PASS
        emulate_step_test: lfdx           : PASS
        emulate_step_test: stfdx          : PASS
        emulate_step_test: lvx            : PASS
        emulate_step_test: stvx           : PASS
        emulate_step_test: lxvd2x         : PASS
        emulate_step_test: stxvd2x        : PASS
      
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      [mpe: Drop start/complete lines, make it all __init]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Emulation support for load/store instructions on LE · e148bd17
      Ravi Bangoria authored
      
      
      emulate_step() uses a number of underlying kernel functions that were
      initially not enabled for LE. This has been rectified since. So, fix
      emulate_step() for LE for the corresponding instructions.
      
      Cc: stable@vger.kernel.org # v3.18+
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. Feb 23, 2017
    • lib/show_mem.c: teach show_mem to work with the given nodemask · 9af744d7
      Michal Hocko authored
      show_mem() allows filtering out node-specific data which is irrelevant
      to the allocation request via SHOW_MEM_FILTER_NODES.  The filtering is
      done in skip_free_areas_node which skips all nodes which are not in the
      mems_allowed of the current process.  This works most of the time as
      expected because the nodemask shouldn't be outside of the allocating
      task but there are some exceptions.  E.g.  memory hotplug might want to
      request allocations from outside of the allowed nodes (see
      new_node_page).
      
      Get rid of this hardcoded behavior and push the allocation mask down the
      show_mem path and use it instead of cpuset_current_mems_allowed.  NULL
      nodemask is interpreted as cpuset_current_mems_allowed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170117091543.25850-5-mhocko@kernel.org
      
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc: do not make the entire heap executable · 16e72e9b
      Denys Vlasenko authored
      On 32-bit powerpc the ELF PLT sections of binaries (built with
      --bss-plt, or with a toolchain which defaults to it) look like this:
      
        [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
        [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
        [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4
      
      Which results in an ELF load header:
      
        Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
        LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000
      
      This is all correct, the load region containing the PLT is marked as
      executable.  Note that the PLT starts at 0002b00c but the file mapping
      ends at 0002aff8, so the PLT falls in the 0 fill section described by
      the load header, and after a page boundary.
      
      Unfortunately the generic ELF loader ignores the X bit in the load
      headers when it creates the 0 filled non-file backed mappings.  It
      assumes all of these mappings are RW BSS sections, which is not the case
      for PPC.
      
      gcc/ld has an option (--secure-plt) to not do this, this is said to
      incur a small performance penalty.
      
      Currently, to support 32-bit binaries with a PLT in BSS, the kernel
      maps the *entire brk area* with executable rights for all binaries,
      even --secure-plt ones.
      
      Stop doing that.
      
      Teach the ELF loader to check the X bit in the relevant load header and
      create 0 filled anonymous mappings that are executable if the load
      header requests that.
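The essence of that change is deriving the mmap protection for the anonymous part of the segment from the ELF flags rather than assuming RW. A sketch, with an illustrative helper name (PF_* values are as defined by the ELF specification):

```c
#include <sys/mman.h>

#ifndef PF_X
#define PF_X 0x1
#define PF_W 0x2
#define PF_R 0x4
#endif

/* Map a load segment's p_flags to mmap protection bits for the
 * zero-filled, non-file-backed portion of the mapping. */
int elf_prot_from_flags(unsigned int p_flags)
{
	int prot = 0;

	if (p_flags & PF_R)
		prot |= PROT_READ;
	if (p_flags & PF_W)
		prot |= PROT_WRITE;
	if (p_flags & PF_X)	/* previously ignored for the BSS mapping */
		prot |= PROT_EXEC;
	return prot;
}
```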
      
      Test program showing the difference in /proc/$PID/maps:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <unistd.h>
      
      int main() {
      	char buf[16*1024];
      	char *p = malloc(123); /* make "[heap]" mapping appear */
      	int fd = open("/proc/self/maps", O_RDONLY);
      	int len = read(fd, buf, sizeof(buf));
      	write(1, buf, len);
      	printf("%p\n", p);
      	return 0;
      }
      
      Compiled using: gcc -mbss-plt -m32 -Os test.c -otest
      
      Unpatched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
      f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
      0x10690008
      
      Patched ppc64 kernel:
      00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
      0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
      10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
      10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
      10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
      10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                        ^^^^ this has changed
      f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
      f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
      ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
      0x10180008
      
      The patch was originally posted in 2012 by Jason Gunthorpe
      and apparently ignored:
      
      https://lkml.org/lkml/2012/9/30/138
      
      Lightly run-tested.
      
      Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com
      
      
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. Feb 21, 2017
    • powerpc/pseries: Advertise Hot Plug Event support to firmware · 3dbbaf20
      Michael Roth authored
      
      
      With the inclusion of commit 333f7b76 ("powerpc/pseries: Implement
      indexed-count hotplug memory add") and commit 75384347
      ("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
      now have complete handling of the RTAS hotplug event format as described
      by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".
      
      This capability is indicated by byte 6, bit 2 (5 in IBM numbering) of
      architecture option vector 5, and allows for greater control over
      cpu/memory/pci hot plug/unplug operations.
      
      Existing pseries kernels will utilize this capability based on the
      existence of the /event-sources/hot-plug-events DT property, so we
      only need to advertise it via CAS and do not need a corresponding
      FW_FEATURE_* value to test for.
      
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/xmon: Dump memory in CPU endian format · 5e48dc0a
      Douglas Miller authored
      
      
      Extend the dump command to allow display of 2, 4, and 8 byte words in
      CPU endian format. Also adds dump command for "1 byte values" for the
      sake of symmetry. New commands are:
      
        d1	dump 1 byte values
        d2	dump 2 byte values
        d4	dump 4 byte values
        d8	dump 8 byte values
      
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
    • powerpc/pseries: Revert 'Auto-online hotplugged memory' · 943db62c
      Nathan Fontenot authored
      
      
      This reverts commit ec999072 ("powerpc/pseries: Auto-online
      hotplugged memory"), and 9dc51281 ("powerpc: Fix unused function
      warning 'lmb_to_memblock'").
      
      Using the auto-online capability does online the added memory, but does
      not update the associated device struct to indicate that the memory is
      online. This causes the pseries memory DLPAR code to fail when trying to
      remove an LMB that was previously removed and added back. This happens
      when validating that the LMB is removable.
      
      This patch reverts to the previous behavior of calling device_online()
      to online the LMB when it is DLPAR added and moves the lmb_to_memblock()
      routine out of CONFIG_MEMORY_HOTREMOVE now that we call it for add.
      
      Fixes: ec999072 ("powerpc/pseries: Auto-online hotplugged memory")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. Feb 18, 2017
    • powerpc/64: Implement clear_bit_unlock_is_negative_byte() · d11914b2
      Nicholas Piggin authored
      
      
      Commit b91e1302 ("mm: optimize PageWaiters bit use for
      unlock_page()") added a special bitop function to speed up
      unlock_page(). Implement this for 64-bit powerpc.
      
      This improves the unlock_page() core code from this:
      
      	li	9,1
      	lwsync
      1:	ldarx	10,0,3,0
      	andc	10,10,9
      	stdcx.	10,0,3
      	bne-	1b
      	ori	2,2,0
      	ld	9,0(3)
      	andi.	10,9,0x80
      	beqlr
      	li	4,0
      	b	wake_up_page_bit
      
      To this:
      
      	li	10,1
      	lwsync
      1:	ldarx	9,0,3,0
      	andc	9,9,10
      	stdcx.	9,0,3
      	bne-	1b
      	andi.	10,9,0x80
      	beqlr
      	li	4,0
      	b	wake_up_page_bit
      
      In a test of elapsed time for dd writing into 16GB of already-dirty
      pagecache on a POWER8 with 4K pages (one unlock_page() per 4kB), this
      patch reduced overhead by 1.1%:
      
          N           Min           Max        Median           Avg        Stddev
      x  19         2.578         2.619         2.594         2.595         0.011
      +  19         2.552         2.592         2.564         2.565         0.008
      Difference at 95.0% confidence
      	-0.030  +/- 0.006
      	-1.142% +/- 0.243%
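The semantics being implemented can be illustrated with C11 atomics in userspace; this is a sketch of the operation's contract, not the powerpc implementation, and the bit positions are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_BIT    0
#define WAITERS_BIT 7

/* Atomically clear the lock bit with release ordering and, from the
 * same atomic operation, report whether the "waiters" bit (bit 7, the
 * sign bit of the low byte) was set. */
bool clear_lock_is_negative_byte(_Atomic unsigned long *word)
{
	unsigned long old = atomic_fetch_and_explicit(word,
			~(1UL << LOCK_BIT), memory_order_release);
	return (old >> WAITERS_BIT) & 1;
}
```

Fusing the two steps is what removes the barrier-then-reload sequence visible in the "before" assembly above.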
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Made 64-bit only until I can test it properly on 32-bit]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now · bcd3bb63
      Paul Mackerras authored
      
      
      The new HPT resizing code added in commit b5baa687 ("KVM: PPC:
      Book3S HV: KVM-HV HPT resizing implementation", 2016-12-20) doesn't
      have code to handle the new HPTE format which POWER9 uses.  Thus it
      would be best not to advertise it to userspace on POWER9 systems
      until it works properly.
      
      Also, since resize_hpt_rehash_hpte() contains BUG_ON() calls that
      could be hit on POWER9, let's prevent it from being called on POWER9
      for now.
      
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  11. Feb 17, 2017
    • bpf: make jited programs visible in traces · 74451e66
      Daniel Borkmann authored
      
      
      Long standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in core
      kernel, modules and dynamically allocated ftrace trampolines. But
      what is still missing is BPF JITed programs (interpreted programs
      are not an issue as __bpf_prog_run() will be attributed to them),
      thus when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same for address correlation
      done from user space via reading /proc/kallsyms. This is read by
      tools like perf, but the latter is also useful for permanent live
      tracing with eBPF itself in combination with stack maps when other
      eBPF types are part of the callchain. See offwaketime example on
      dumping stack from a map.
      
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in fast-path that is also shared for symbol/size/offset lookup
      for a specific given address in kallsyms. The slow-path iteration
      through all symbols in the seq file done via RCU list, which holds
      a tiny fraction of all exported ksyms, usually below 0.1 percent.
      Function symbols are exported as bpf_prog_<tag>, in order to aid
      debugging and attribution. This facility is currently enabled for
      root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
      is active in any mode. The rationale behind this is that still a lot
      of systems ship with world read permissions on kallsyms thus addresses
      should not get suddenly exposed for them. If that situation gets
      much better in future, we always have the option to change the
      default on this. Likewise, unprivileged programs are not allowed
      to add entries there either, but that is less of a concern as most
      such program types relevant in this context are for root-only anyway.
      If enabled, call graphs and stack traces will then show a correct
      attribution; one example is illustrated below, where the trace is
      now visible in tooling such as perf script --kallsyms=/proc/kallsyms
      and friends.
      
      Before:
      
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
      
      After:
      
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        [...]
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74451e66
    • Daniel Borkmann's avatar
      bpf: remove stubs for cBPF from arch code · 9383191d
      Daniel Borkmann authored
      
      
      Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
      that a single __weak function in the core that can be overridden
      similarly to the eBPF one. Also remove stale pr_err() mentions
      of bpf_jit_compile.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9383191d
    • Paolo Bonzini's avatar
      KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1
      Paolo Bonzini authored
      
      
      The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
      a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
      to a dummy signal handler; by blocking the signal outside KVM_RUN and
      unblocking it inside, this possible race is closed:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
              check flag
                                                set flag
                                                raise signal
              (signal handler does nothing)
              KVM_RUN
      
      However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
      tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
      remote NUMA node, because it is on the node of a thread's creator.
      Taking this lock can be very expensive if there are many userspace
      exits (as is the case for SMP Windows VMs without Hyper-V reference
      time counter).
      
      As an alternative, we can put the flag directly in kvm_run so that
      KVM can see it:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
                                                raise signal
              signal handler
                set run->immediate_exit
              KVM_RUN
                check run->immediate_exit
      
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      460df4c1
    • Gavin Shan's avatar
      powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable() · 02983449
      Gavin Shan authored
      
      
      The local variable @iov isn't used, so remove it.
      
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      02983449
    • Gavin Shan's avatar
      powerpc/kernel: Remove error message in pcibios_setup_phb_resources() · 727597d1
      Gavin Shan authored
      
      
      The CAPI driver creates virtual PHB (vPHB) from the CAPI adapter.
      The vPHB's IO and memory windows aren't built from device-tree node
      as we do for normal PHBs. An error message is thrown in the path
      below when trying to probe the AFUs contained in the adapter. The
      error message is confusing and unnecessary.
      
          cxl_probe()
          pci_init_afu()
          cxl_pci_vphb_add()
          pcibios_scan_phb()
          pcibios_setup_phb_resources()
      
      This removes the error message. We might still have the case
      where the first memory window on a real PHB isn't populated
      properly because of an error in the "ranges" property in the
      device-tree node, but we can check the device-tree for that
      instead. This also removes one unnecessary blank line in the
      function.
      
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      727597d1