  1. Apr 28, 2014
    • Wei Yang's avatar
      powerpc/powernv: Release the refcount for pci_dev · 4966bfa1
      Wei Yang authored
      
      
      On the PowerNV platform, we hold an unnecessary refcount on a pci_dev,
      which prevents the pci_dev from being destroyed when hotplugging a PCI
      device.
      
      This patch releases the unnecessary refcount.
      
      Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      4966bfa1
    • Wei Yang's avatar
      powerpc/powernv: Reduce multi-hit of iommu_add_device() · 3f28c5af
      Wei Yang authored
      
      
      During an EEH hotplug event, iommu_add_device() is invoked three times,
      and two of those calls trigger a warning or an error.
      
      The three call paths that reach iommu_add_device() are:
      
          pci_device_add
             ...
             set_iommu_table_base_and_group   <- 1st time, fail
          device_add
             ...
             tce_iommu_bus_notifier           <- 2nd time, succeeds
          pcibios_add_pci_devices
             ...
             pcibios_setup_bus_devices        <- 3rd time, re-attach
      
      The first time fails, since the dev->kobj->sd is not initialized. The
      dev->kobj->sd is initialized in device_add().
      The third time's warning is triggered by the re-attach of the iommu_group.
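      
      The sequence above can be illustrated with a toy model. This is only a
      sketch: the toy_* names, the struct, and the TOY_ERR_REATTACH value are
      hypothetical and are not the kernel's types or functions. The only value
      taken from the log is the -14 return code of the first call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types or functions. */
enum {
    TOY_OK = 0,
    TOY_ERR_TOO_EARLY = -14,  /* matches "ret=-14" in the reported error */
    TOY_ERR_REATTACH = -1     /* arbitrary value chosen for this sketch */
};

struct toy_dev {
    int sysfs_ready;  /* models dev->kobj->sd having been initialized */
    int has_group;    /* models the device already sitting in an iommu_group */
};

/* Models device_add() initializing the sysfs node. */
void toy_device_add(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->sysfs_ready = 1;
}

/* Models a single attempt to add the device to its IOMMU group. */
int toy_iommu_add_device(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->has_group)
        return TOY_ERR_REATTACH;   /* third call: re-attach, warning case */
    if (!dev->sysfs_ready)
        return TOY_ERR_TOO_EARLY;  /* first call: sysfs node not there yet */
    dev->has_group = 1;            /* second call: succeeds */
    return TOY_OK;
}
```

      In this model, only the call made after toy_device_add() does useful
      work, which is why dropping the earliest call removes the error.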
      
      After applying this patch, the error
      
          iommu_tce: 0003:05:00.0 has not been added, ret=-14
      
      and the warning
      
          [  204.123609] ------------[ cut here ]------------
          [  204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
          [  204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
          [  204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
          [  204.124400] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
          [  204.124453] NIP: c00000000003cf80 LR: c00000000006c648 CTR: c00000000006c5c0
          [  204.124506] REGS: c0000027ed50f440 TRAP: 0700   Not tainted  (3.14.0-rc5yw+)
          [  204.124558] MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 88008084  XER: 20000000
          [  204.124682] CFAR: c00000000006c644 SOFTE: 1
          GPR00: c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
          GPR04: c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
          GPR08: c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
          GPR12: 0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
          GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
          GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
          GPR24: 000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
          GPR28: 0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
          [  204.125353] NIP [c00000000003cf80] .iommu_add_device+0x30/0x1f0
          [  204.125399] LR [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
          [  204.125443] Call Trace:
          [  204.125464] [c0000027ed50f6c0] [c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
          [  204.125526] [c0000027ed50f750] [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
          [  204.125588] [c0000027ed50f7d0] [c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
          [  204.125650] [c0000027ed50f870] [c000000000044408] .pcibios_setup_device+0x88/0x2f0
          [  204.125712] [c0000027ed50f940] [c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
          [  204.125774] [c0000027ed50f9c0] [c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
          [  204.125837] [c0000027ed50fa50] [c00000000086f970] .eeh_reset_device+0x36c/0x4f0
          [  204.125939] [c0000027ed50fb20] [c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
          [  204.126068] [c0000027ed50fbc0] [c00000000003a35c] .eeh_handle_event+0x4c/0x340
          [  204.126192] [c0000027ed50fc80] [c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
          [  204.126319] [c0000027ed50fd30] [c0000000000d1ea0] .kthread+0x110/0x130
          [  204.126430] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
          [  204.126556] Instruction dump:
          [  204.126610] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
          [  204.126787] 60000000 e87e0298 3143ffff 7d2a1910 <0b090000> 2fa90000 40de00c8 ebfe0218
          [  204.126966] ---[ end trace 6e7aefd80add2973 ]---
      
      are cleared.
      
      This patch removes the iommu_add_device() call from
      pnv_pci_ioda_dma_dev_setup(), which reverts part of the change in commit
      d905c5df (PPC: POWERNV: move iommu_add_device earlier).
      
      Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      3f28c5af
    • Anton Blanchard's avatar
      powerpc/powernv: Fix little endian issues in OPAL flash code · cc146d1d
      Anton Blanchard authored
      
      
      With this patch I was able to update firmware on an LE kernel.
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      cc146d1d
    • Benjamin Herrenschmidt's avatar
      powerpc/powernv: Fix kexec races going back to OPAL · 298b34d7
      Benjamin Herrenschmidt authored
      
      
      We have a subtle race when sending CPUs back to OPAL on kexec.
      
      We mark them as "in real mode" right before we send them down. Once
      we've booted the new kernel, it might try to call opal_reinit_cpus()
      to change endianness, and that requires all CPUs to be spinning inside
      OPAL.
      
      However there is no synchronization here and we've observed cases
      where the returning CPUs hadn't established their new state inside
      OPAL before opal_reinit_cpus() is called, causing it to fail.
      
      The proper fix is to actually wait for them to go down all the way
      from the kexec'ing kernel.
      
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      298b34d7
    • Joel Stanley's avatar
      powerpc/powernv: Check sysparam size before creation · 63aecfb2
      Joel Stanley authored
      
      
      The size of the sysparam sysfs files is determined from the device tree
      at boot. However, the buffer is hard coded to 64 bytes. If we encounter a
      parameter that is larger than 64 bytes, or misparse the device tree, the
      buffer will overflow when reading or writing to the parameter.
      
      Check it at discovery time, and if the parameter is too large, do not
      create a sysfs entry for it.
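      
      A minimal sketch of the discovery-time check described above, assuming
      the sizes have already been read into an array. PARAM_BUF_LEN and both
      function names are hypothetical and only illustrate the idea; they are
      not taken from the sysparam code.

```c
#include <assert.h>
#include <stddef.h>

#define PARAM_BUF_LEN 64u  /* the fixed 64-byte buffer named in the text */

/* Returns 1 if a parameter of this size may get an entry, 0 if skipped. */
int param_size_ok(unsigned int size)
{
    return size <= PARAM_BUF_LEN;
}

/* Counts how many discovered parameters would get an entry created. */
size_t count_created(const unsigned int *sizes, size_t n)
{
    size_t created = 0;

    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (param_size_ok(sizes[i]))
            created++;  /* oversized parameters are silently skipped */
    }
    return created;
}
```

      Rejecting oversized parameters once, at discovery, means the read and
      write paths never see a size that exceeds the buffer.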
      
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      63aecfb2
    • Joel Stanley's avatar
      powerpc/powernv: Check sysfs size before copying · 85390378
      Joel Stanley authored
      
      
      The sysparam code currently uses the userspace supplied number of
      bytes when memcpy()ing in to a local 64-byte buffer.
      
      Limit the maximum number of bytes by the size of the buffer.
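      
      The bounded copy can be sketched as follows. This is an illustration
      under assumptions, not the patched function: PARAM_BUF_LEN, param_buf
      and store_param() are hypothetical names, and the sketch clamps the
      count to the buffer size before copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PARAM_BUF_LEN 64  /* the 64-byte buffer named in the text */

static unsigned char param_buf[PARAM_BUF_LEN];

/* Copies at most PARAM_BUF_LEN bytes and returns how many were copied. */
size_t store_param(const void *src, size_t count)
{
    assert(src != NULL || count == 0);

    if (count > PARAM_BUF_LEN)
        count = PARAM_BUF_LEN;  /* never trust the caller-supplied length */
    if (count > 0)
        memcpy(param_buf, src, count);
    return count;
}
```

      Without the clamp, a caller passing 128 bytes would write 64 bytes past
      the end of the buffer.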
      
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      85390378
    • Joel Stanley's avatar
      powerpc/powernv: Use ssize_t for sysparam return values · b8569d23
      Joel Stanley authored
      
      
      The OPAL calls return int64_t values, which the sysparam code
      stores in an int, and the sysfs callback returns ssize_t. Make the code
      easier to read by consistently using ssize_t.
      
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b8569d23
    • Joel Stanley's avatar
      powerpc/powernv: Fix sysparam sysfs error handling · ba9a32b1
      Joel Stanley authored
      
      
      When a sysparam query in OPAL returned a negative value (error code),
      sysfs would spew out a decent chunk of memory; almost 64K more than
      expected. This was traced to a sign/unsigned mix up in the OPAL sysparam
      sysfs code at sys_param_show.
      
      The return value of sys_param_show is a ssize_t, calculated using
      
        return ret ? ret : attr->param_size;
      
      Alan Modra explains:
      
        "attr->param_size" is an unsigned int, "ret" an int, so the overall
        expression has type unsigned int.  Result is that ret is cast to
        unsigned int before being cast to ssize_t.
      
      Instead of using the ternary operator, set ret to the param_size if an
      error is not detected. The same bug exists in the sysfs write callback;
      this patch fixes it in the same way.
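      
      The conversion Alan Modra describes can be reproduced outside the
      kernel. In this sketch, struct toy_attr and both functions are
      hypothetical, and long long stands in for a 64-bit ssize_t so the
      effect does not depend on the platform's ssize_t width.

```c
#include <assert.h>
#include <stddef.h>

struct toy_attr {
    unsigned int param_size;
};

/*
 * Buggy shape: the two result operands are int and unsigned int, so the
 * whole conditional expression has type unsigned int. A negative ret is
 * converted to a large positive value before it is widened.
 */
long long show_buggy(int ret, const struct toy_attr *attr)
{
    assert(attr != NULL);
    return ret ? ret : attr->param_size;
}

/* Fixed shape: keep the error in a signed variable, assign size on success. */
long long show_fixed(int ret, const struct toy_attr *attr)
{
    long long out = ret;

    assert(attr != NULL);
    if (!out)
        out = attr->param_size;
    return out;
}
```

      With a 32-bit unsigned int, show_buggy(-5, ...) returns a positive
      number just under 4G instead of -5, so a caller that treats negative
      values as errors sees a huge length instead.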
      
      A note on debugging this next time: on my system gcc will warn about
      this if compiled with -Wsign-compare, which is not enabled by -Wall,
      only -Wextra.
      
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ba9a32b1
    • Li Zhong's avatar
      powerpc: Fix Oops in rtas_stop_self() · 4fb8d027
      Li Zhong authored
      
      
      Commit 41dd03a9 may cause an Oops in rtas_stop_self().
      
      The reason is that rtas_args was moved into stack space. On a box
      with more than 4GB of RAM, the stack can easily be outside the 32-bit
      range, but RTAS is 32-bit.
      
      So the patch moves rtas_args away from stack by adding static before
      it.
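      
      The constraint behind this fix is that a 32-bit consumer can only be
      handed a buffer that lies entirely below 4GB. A small sketch of that
      check follows; reachable_by_32bit() is a hypothetical helper for
      illustration and does not appear in the RTAS code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the byte range [addr, addr + len) lies entirely below 4GB. */
int reachable_by_32bit(uint64_t addr, uint64_t len)
{
    assert(len > 0);

    if (addr > UINT32_MAX)
        return 0;  /* start address already needs more than 32 bits */
    /* Compare without computing addr + len, which could overflow. */
    return len - 1 <= UINT32_MAX - addr;
}
```

      A stack address above 4GB fails this check regardless of the buffer
      size, which is the situation the commit describes for rtas_args.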
      
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      4fb8d027
    • Jeff Mahoney's avatar
      powerpc: Export flush_icache_range · 28ea3c75
      Jeff Mahoney authored
      
      
      Commit aac416fc (lkdtm: flush icache and report actions) calls
      flush_icache_range from a module. It's exported on most architectures
      that implement it, but not on powerpc. This patch exports it to fix
      the module link failure.
      
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      28ea3c75
  2. Apr 18, 2014
  3. Apr 17, 2014
    • Masami Hiramatsu's avatar
      kprobes/x86: Fix page-fault handling logic · 6381c24c
      Masami Hiramatsu authored
      
      
      The current kprobes in-kernel page fault handler doesn't
      expect that its single-stepping can be interrupted by
      an NMI handler which may cause a page fault (e.g. perf
      with callback tracing).
      
      In that case, the page fault is handled by kprobes, which
      mistakenly concludes that the fault was caused by the
      single-stepping code and tries to reset the IP address
      to the probed address.
      
      But in truth the page fault was caused by the NMI
      handler, and do_page_fault() then fails to handle the real
      page fault because the IP address has been modified,
      causing kernel BUGs like the one below.
      
       ----
       [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
       [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
      
      To handle this correctly, I fixed the kprobes fault
      handler to ensure the faulted ip address is its own
      single-step buffer instead of checking current kprobe
      state.
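      
      The difference between the two checks can be sketched with a toy model.
      Everything here is hypothetical (the toy_* types and both functions);
      it only contrasts deciding by recorded state with deciding by the
      faulting address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_state { TOY_IDLE, TOY_SINGLE_STEPPING };

struct toy_kprobe_ctx {
    enum toy_state state;  /* what the probe believes it is doing */
    uintptr_t ss_buf;      /* address of its single-step buffer */
};

/* Old check: any fault taken while single-stepping is claimed as ours. */
int fault_is_ours_by_state(const struct toy_kprobe_ctx *c, uintptr_t fault_ip)
{
    assert(c != NULL);
    (void)fault_ip;  /* the faulting address is ignored entirely */
    return c->state == TOY_SINGLE_STEPPING;
}

/* New check: claim the fault only if it happened in the single-step buffer. */
int fault_is_ours_by_ip(const struct toy_kprobe_ctx *c, uintptr_t fault_ip)
{
    assert(c != NULL);
    return fault_ip == c->ss_buf;
}
```

      A fault raised from an NMI handler at some unrelated address is claimed
      by the state-based check but rejected by the address-based one.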
      
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fche@redhat.com
      Cc: systemtap@sourceware.org
      Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6381c24c
    • Ingo Molnar's avatar
      x86/mce: Fix CMCI preemption bugs · ea431643
      Ingo Molnar authored
      
      
      The following commit:
      
        27f6c573 ("x86, CMCI: Add proper detection of end of CMCI storms")
      
      Added two preemption bugs:
      
       - machine_check_poll() does a get_cpu_var() without a matching
         put_cpu_var(), which causes preemption imbalance and crashes upon
         bootup.
      
       - it does percpu ops without disabling preemption. Preemption is not
         disabled due to the mistaken use of a raw spinlock.
      
      To fix these bugs fix the imbalance and change
      cmci_discover_lock to a regular spinlock.
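      
      The first bug is a pairing problem: get_cpu_var() disables preemption
      and put_cpu_var() re-enables it. The toy model below shows how a
      missing put leaves the depth counter unbalanced. All toy_* names are
      hypothetical stand-ins and are not the kernel macros.

```c
#include <assert.h>

static int toy_preempt_depth;  /* 0 means preemption is enabled */
static int toy_cpu_var;        /* stands in for a per-CPU variable */

/* Models get_cpu_var(): disable preemption, then hand out the variable. */
int *toy_get_cpu_var(void)
{
    toy_preempt_depth++;
    return &toy_cpu_var;
}

/* Models put_cpu_var(): re-enable preemption. */
void toy_put_cpu_var(void)
{
    assert(toy_preempt_depth > 0);
    toy_preempt_depth--;
}

/* Buggy shape: get without a matching put. */
void poll_unbalanced(void)
{
    int *v = toy_get_cpu_var();
    (*v)++;
}

/* Fixed shape: every get is paired with a put. */
void poll_balanced(void)
{
    int *v = toy_get_cpu_var();
    (*v)++;
    toy_put_cpu_var();
}

int toy_depth(void)
{
    return toy_preempt_depth;
}
```

      Each call to poll_unbalanced() raises the depth by one and never lowers
      it, which is the imbalance the commit describes.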
      
      Reported-by: Owen Kibel <qmewlo@gmail.com>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chen, Gong <gong.chen@linux.intel.com>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Todorov <atodorov@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      --
       arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
       arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
       2 files changed, 10 insertions(+), 12 deletions(-)
      ea431643
  4. Apr 16, 2014
    • Tony Luck's avatar
      [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237) · c0b5a64d
      Tony Luck authored
      April 2014 Itanium processor specification update:
      
      http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
      
      
      
      describes this erratum:
      
      =========================================================================
      237. Under a complex set of conditions, store to load forwarding for a
      sub 8-byte load may complete incorrectly
      
      Problem: A load instruction may complete incorrectly when a code sequence
      using 4-byte or smaller load and store operations to the same address
      is executed in combination with specific timing of all the following
      concurrent conditions: store to load forwarding, alignment checking
      enabled, a mis-predicted branch, and complex cache utilization activity.
      
      Implication: The affected sub 8-byte instruction may complete
      incorrectly resulting in unpredictable system behavior. There is an
      extremely low probability of exposure due to the significant number of
      complex microarchitectural concurrent conditions required to encounter
      the erratum.
      
      Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
      Hyper-Threading will significantly reduce exposure to the conditions
      that contribute to encountering the erratum.
      
      Status: See the Summary Table of Changes for the affected steppings.
      =========================================================================
      
      [Table of changes essentially lists all models from McKinley to Tukwila]
      
      The PSR.ac bit controls whether the processor will always generate
      an unaligned reference trap (0x5a00) for a misaligned data access
      (when PSR.ac=1) or if it will let the access succeed when running
      on a cpu that implements logic to handle some unaligned accesses.
      
      Way back in 2008 in commit b704882e
        [IA64] Rationalize kernel mode alignment checking
      we made the decision to always enable strict checking. We were
      already doing so in trap/interrupt context because the common
      preamble code set this bit - but the rest of supervisor code
      (and by inheritance user code) ran with PSR.ac=0.
      
      We now reverse that decision and set PSR.ac=0 everywhere in the
      kernel (also inherited by user processes). This will avoid the
      erratum using the method described in the Itanium specification
      update.  Net effect for users is that the processor will handle
      unaligned access when it can (typically with a tiny performance
      bubble in the pipeline ... but much less invasive than taking a
      trap and having the OS perform the access).
      
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      c0b5a64d
    • Ingo Molnar's avatar
      x86: Remove the PCI reboot method from the default chain · 5be44a6f
      Ingo Molnar authored
      
      
      Steve reported a reboot hang and bisected it back to this commit:
      
        a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list
      
      He heroically tested all reboot methods and found the following:
      
        reboot=t       # triple fault                  ok
        reboot=k       # keyboard ctrl                 FAIL
        reboot=b       # BIOS                          ok
        reboot=a       # ACPI                          FAIL
        reboot=e       # EFI                           FAIL   [system has no EFI]
        reboot=p       # PCI 0xcf9                     FAIL
      
      And I think it's pretty obvious that we should only try PCI 0xcf9 as a
      last resort - if at all.
      
      The other observation is that (on this box) we should never try
      the PCI reboot method, but close with either the 'triple fault'
      or the 'BIOS' (terminal!) reboot methods.
      
      Thirdly, CF9_COND is a total misnomer - it should be something like
      CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
      
      So this patch fixes the worst problems:
      
       - it orders the actual reboot logic to follow the reboot ordering
         pattern - it was in a pretty random order before for no good
         reason.
      
       - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
         BOOT_CF9_SAFE flags to make the code more obvious.
      
       - it tries the BIOS reboot method before the PCI reboot method.
         (Since 'BIOS' is a terminal reboot method resulting in a hang
          if it does not work, this is essentially equivalent to removing
          the PCI reboot method from the default reboot chain.)
      
       - just for the miraculous possibility of terminal (resulting
         in hang) reboot methods of triple fault or BIOS returning
         without having done their job, there's an ordering between
         them as well.
      
      Reported-and-bisected-and-tested-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Aubrey <aubrey.li@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5be44a6f
  5. Apr 15, 2014
  6. Apr 14, 2014
  7. Apr 13, 2014
  8. Apr 12, 2014