  1. Mar 31, 2023
    • iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplug · 16812c96
      Kan Liang authored
      
      
      A warning can be triggered when CPU 0 is taken offline:
      $ echo 0 > /sys/devices/system/cpu/cpu0/online
      
       ------------[ cut here ]------------
       Voluntary context switch within RCU read-side critical section!
       WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318
                rcu_note_context_switch+0x4f4/0x580
       RIP: 0010:rcu_note_context_switch+0x4f4/0x580
       Call Trace:
        <TASK>
        ? perf_event_update_userpage+0x104/0x150
        __schedule+0x8d/0x960
        ? perf_event_set_state.part.82+0x11/0x50
        schedule+0x44/0xb0
        schedule_timeout+0x226/0x310
        ? __perf_event_disable+0x64/0x1a0
        ? _raw_spin_unlock+0x14/0x30
        wait_for_completion+0x94/0x130
        __wait_rcu_gp+0x108/0x130
        synchronize_rcu+0x67/0x70
        ? invoke_rcu_core+0xb0/0xb0
        ? __bpf_trace_rcu_stall_warning+0x10/0x10
        perf_pmu_migrate_context+0x121/0x370
        iommu_pmu_cpu_offline+0x6a/0xa0
        ? iommu_pmu_del+0x1e0/0x1e0
        cpuhp_invoke_callback+0x129/0x510
        cpuhp_thread_fun+0x94/0x150
        smpboot_thread_fn+0x183/0x220
        ? sort_range+0x20/0x20
        kthread+0xe6/0x110
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
       ---[ end trace 0000000000000000 ]---
      
      perf_pmu_migrate_context() invokes synchronize_rcu() when migrating a
      PMU to a new CPU. However, the current for_each_iommu() loop runs
      within an RCU read-side critical section, where blocking is not allowed.
      
      Two methods were considered to fix the issue.
      - Use the dmar_global_lock to replace the RCU read lock when going
        through the drhd list. But it triggers a lockdep warning.
      - Use the cpuhp_setup_state_multi() to set up a dedicated state for each
        IOMMU PMU. The lock can be avoided.
      
      The latter method is implemented in this patch. Since each IOMMU PMU has
      a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track
      the state. The state can be dynamically allocated now. Remove the
      CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE.
      
      Fixes: 46284c6c ("iommu/vt-d: Support cpumask for IOMMU perfmon")
      Reported-by: Ammy Yi <ammy.yi@intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com

      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com

      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      16812c96
  2. Mar 17, 2023
  3. Mar 15, 2023
    • block: count 'ios' and 'sectors' when io is done for bio-based device · 5f275713
      Yu Kuai authored
      
      
      While using iostat for raid, I occasionally observed very strange
      'await' values. It turns out that 'ios' and 'sectors' are counted in
      bdev_start_io_acct(), while 'nsecs' is counted in bdev_end_io_acct().
      I'm not sure why they are counted like that, but this behaviour is
      clearly wrong because users will get incorrect disk stats.

      Fix the problem by counting 'ios' and 'sectors' when the io is done, as
      rq-based devices do.
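
The skew described above can be illustrated with a small userspace model. This is only a sketch: `struct disk_stats`, `start_io_old()`, `end_io_old()`, `end_io_new()` and `await_ns()` are hypothetical names, not the kernel's accounting helpers, and await is modelled simply as the change in 'nsecs' divided by the change in 'ios' between two samples.

```c
#include <stdint.h>

/* Hypothetical per-device counters, loosely modelled on disk stats. */
struct disk_stats {
	uint64_t ios;     /* number of I/Os counted */
	uint64_t sectors; /* sectors counted */
	uint64_t nsecs;   /* total time spent on I/O */
};

/* Old scheme: 'ios' and 'sectors' are counted when the I/O starts... */
static void start_io_old(struct disk_stats *s, uint64_t sectors)
{
	s->ios++;
	s->sectors += sectors;
}

/* ...but 'nsecs' is only counted when the I/O completes. */
static void end_io_old(struct disk_stats *s, uint64_t duration_ns)
{
	s->nsecs += duration_ns;
}

/* New scheme: all three counters are updated together at completion. */
static void end_io_new(struct disk_stats *s, uint64_t sectors,
		       uint64_t duration_ns)
{
	s->ios++;
	s->sectors += sectors;
	s->nsecs += duration_ns;
}

/* await between two samples: delta nsecs / delta ios (0 if no I/O counted). */
static uint64_t await_ns(const struct disk_stats *before,
			 const struct disk_stats *after)
{
	uint64_t ios = after->ios - before->ios;

	return ios ? (after->nsecs - before->nsecs) / ios : 0;
}
```

In this model, a sample taken while an I/O is still in flight sees the old scheme report one I/O with no time attached, and the following interval sees time with no I/O. With the new scheme both counters move in the same interval.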
      
      Fixes: 394ffa50 ("blk: introduce generic io stat accounting help function")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com

      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5f275713
    • nvme-trace: show more opcode names · 8e19b87c
      Minwoo Im authored
      
      
      We have more commands to show in the trace. Sync up.
      
      Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8e19b87c
    • drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc · 0d3c9333
      Liu Ying authored
      
      
      The returned array size for input formats is set through
      atomic_get_input_bus_fmts()'s 'num_input_fmts' argument, so use
      'num_input_fmts' to represent the array size in the function's kdoc,
      not 'num_output_fmts'.
      
      Fixes: 91ea8330 ("drm/bridge: Fix the bridge kernel doc")
      Fixes: f32df58a ("drm/bridge: Add the necessary bits to support bus format negotiation")
      Signed-off-by: Liu Ying <victor.liu@nxp.com>
      Reviewed-by: Robert Foss <rfoss@kernel.org>
      Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20230314055035.3731179-1-victor.liu@nxp.com
      0d3c9333
    • net: tunnels: annotate lockless accesses to dev->needed_headroom · 4b397c06
      Eric Dumazet authored
      
      
      IP tunnels can apparently update dev->needed_headroom
      in their xmit path.
      
      This patch takes care of three tunnels xmit, and also the
      core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA()
      helpers.
      
      More changes might be needed for completeness.
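
As an illustration of what annotating a lockless access looks like, here is a minimal userspace sketch. The `READ_ONCE()`/`WRITE_ONCE()` macros below are simplified approximations of the kernel's (which are more elaborate), and `struct fake_dev`, `update_headroom()` and `reserved_space()` are hypothetical names used only for this example, not the code changed by the patch.

```c
/*
 * Userspace approximations of the kernel's READ_ONCE()/WRITE_ONCE().
 * This sketch assumes GCC or Clang (for __typeof__) and a naturally
 * aligned scalar field.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the relevant part of struct net_device. */
struct fake_dev {
	unsigned short needed_headroom;
};

/* xmit-path style update: grow the headroom if this packet needs more. */
static void update_headroom(struct fake_dev *dev, unsigned int max_headroom)
{
	if (max_headroom > READ_ONCE(dev->needed_headroom))
		WRITE_ONCE(dev->needed_headroom, (unsigned short)max_headroom);
}

/* Reader side: take one snapshot of the field and use it consistently. */
static unsigned int reserved_space(struct fake_dev *dev,
				   unsigned int hard_header_len)
{
	unsigned int headroom = READ_ONCE(dev->needed_headroom);

	return hard_header_len + headroom;
}
```

The annotations do not add locking. They mark the accesses as intentionally concurrent, so the compiler performs each one as a single load or store and tools such as KCSAN can treat the race as deliberate.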
      
      BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit
      
      read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1:
      ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
      dst_output include/net/dst.h:444 [inline]
      ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
      iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
      ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      
      write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0:
      ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804
      __gre_xmit net/ipv4/ip_gre.c:469 [inline]
      ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
      __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
      netdev_start_xmit include/linux/netdevice.h:4895 [inline]
      xmit_one net/core/dev.c:3580 [inline]
      dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
      __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
      dev_queue_xmit include/linux/netdevice.h:3051 [inline]
      neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
      neigh_output include/net/neighbour.h:546 [inline]
      ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:444 [inline]
      NF_HOOK include/linux/netfilter.h:302 [inline]
      mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820
      mld_send_cr net/ipv6/mcast.c:2121 [inline]
      mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653
      process_one_work+0x3e6/0x750 kernel/workqueue.c:2390
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537
      kthread+0x1ac/0x1e0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x0dd4 -> 0x0e14
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
      Workqueue: mld mld_ifc_work
      
      Fixes: 8eb30be0 ("ipv6: Create ip6_tnl_xmit")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.com

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4b397c06
  4. Mar 14, 2023
  5. Mar 13, 2023
    • PCI: s390: Fix use-after-free of PCI resources with per-function hotplug · ab909509
      Niklas Schnelle authored
      
      
      On s390 PCI functions may be hotplugged individually even when they
      belong to a multi-function device. In particular on an SR-IOV device VFs
      may be removed and later re-added.
      
      However, commit a50297cf ("s390/pci: separate zbus creation from
      scanning") missed that the resource lists of struct pci_bus and struct
      zpci_bus retained references to the PCI function's MMIO resources even
      though those resources are released and freed on hot-unplug. These
      stale resources may subsequently be claimed when the PCI function
      re-appears, resulting in a use-after-free.
      
      One idea of fixing this use-after-free in s390 specific code that was
      investigated was to simply keep resources around from the moment a PCI
      function first appeared until the whole virtual PCI bus created for
      a multi-function device disappears. The problem with this, however, is
      that due to the requirement of artificial MMIO addresses (address
      cookies), extra logic is then needed to keep the address cookies
      compatible on re-plug. At the same time the MMIO resources semantically
      belong to the PCI function so tying their lifecycle to the function
      seems more logical.
      
      Instead a simpler approach is to remove the resources of an individually
      hot-unplugged PCI function from the PCI bus's resource list while
      keeping the resources of other PCI functions on the PCI bus untouched.
      
      This is done by introducing pci_bus_remove_resource() to remove an
      individual resource. Similarly the resource also needs to be removed
      from the struct zpci_bus's resource list. It turns out however, that
      there is really no need to add the MMIO resources to the struct
      zpci_bus's resource list at all and instead we can simply use the
      zpci_bar_struct's resource pointer directly.
      
      Fixes: a50297cf ("s390/pci: separate zbus creation from scanning")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Link: https://lore.kernel.org/r/20230306151014.60913-2-schnelle@linux.ibm.com

      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      ab909509
  6. Mar 12, 2023
  7. Mar 11, 2023
  8. Mar 10, 2023
  9. Mar 09, 2023
    • clk: Avoid invalid function names in CLK_OF_DECLARE() · 5cf9d015
      Nathan Chancellor authored
      
      
      After commit c28cd1f3 ("clk: Mark a fwnode as initialized when using
      CLK_OF_DECLARE() macro"), drivers/clk/mvebu/kirkwood.c fails to build:
      
       drivers/clk/mvebu/kirkwood.c:358:1: error: expected identifier or '('
       CLK_OF_DECLARE(98dx1135_clk, "marvell,mv98dx1135-core-clock",
       ^
       include/linux/clk-provider.h:1367:21: note: expanded from macro 'CLK_OF_DECLARE'
               static void __init name##_of_clk_init_declare(struct device_node *np) \
                                  ^
       <scratch space>:124:1: note: expanded from here
       98dx1135_clk_of_clk_init_declare
       ^
       drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
       include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
               OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
                                               ^
       <scratch space>:125:3: note: expanded from here
       98dx1135_clk_of_clk_init_declare
         ^
       drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
       include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
               OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
                                               ^
       <scratch space>:125:3: note: expanded from here
       98dx1135_clk_of_clk_init_declare
         ^
       drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
       include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
               OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
                                               ^
       <scratch space>:125:3: note: expanded from here
       98dx1135_clk_of_clk_init_declare
         ^
      
      C function names must start with either an alphabetic letter or an
      underscore. To avoid generating invalid function names from clock names,
      add two underscores to the beginning of the identifier.
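
The effect of the prefix can be shown with a simplified token-pasting sketch. `DECLARE_INIT_BROKEN()` and `DECLARE_INIT_FIXED()` are hypothetical stand-ins, not the real CLK_OF_DECLARE() macro: they only paste a function name and give it a trivial body.

```c
/* Broken shape: if 'name' starts with a digit, so does the pasted result. */
#define DECLARE_INIT_BROKEN(name) \
	static int name##_init_declare(void) { return 1; }

/* Fixed shape: the leading "__" makes the pasted token a valid identifier. */
#define DECLARE_INIT_FIXED(name) \
	static int __##name##_init_declare(void) { return 1; }

/*
 * DECLARE_INIT_BROKEN(98dx1135_clk) would not compile, because
 * 98dx1135_clk_init_declare is not a valid C identifier.
 *
 * Note: identifiers starting with "__" are reserved in ordinary userspace
 * C; the prefix is used here only to mirror the approach described above.
 */
DECLARE_INIT_FIXED(98dx1135_clk) /* defines __98dx1135_clk_init_declare() */
DECLARE_INIT_FIXED(gating_clk)   /* defines __gating_clk_init_declare() */
```

Names that already start with a letter are unaffected apart from the extra prefix, so one macro definition covers both cases.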
      
      Fixes: c28cd1f3 ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro")
      Suggested-by: Saravana Kannan <saravanak@google.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/r/20230308-clk_of_declare-fix-v1-1-317b741e2532@kernel.org

      Reviewed-by: Saravana Kannan <saravanak@google.com>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      5cf9d015
    • i2c: Switch .probe() to not take an id parameter · 03c835f4
      Uwe Kleine-König authored
      
      
      Commit b8a1a4cd ("i2c: Provide a temporary .probe_new() call-back
      type") introduced a new probe callback to convert i2c init routines to
      not take an i2c_device_id parameter. Now that all in-tree drivers are
      converted to the temporary .probe_new() callback, .probe() can be
      modified to match the desired prototype.
      
      Now that .probe() and .probe_new() have the same semantics, they can be
      defined as members of an anonymous union to save some memory and
      simplify the core code a bit.
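
A minimal sketch of the anonymous-union idea, using hypothetical cut-down types (`struct fake_client`, `struct fake_driver`) rather than the real i2c core structures:

```c
/* Hypothetical, cut-down stand-ins for the i2c core types. */
struct fake_client {
	int addr;
};

struct fake_driver {
	/*
	 * Both members share one prototype, so they can overlay each other
	 * in an anonymous union (C11): one pointer of storage, two names.
	 */
	union {
		int (*probe)(struct fake_client *client);
		int (*probe_new)(struct fake_client *client);
	};
};

/* Core-side call: only one member needs to be checked and called. */
static int call_probe(const struct fake_driver *drv,
		      struct fake_client *client)
{
	if (!drv->probe)
		return -1; /* no probe callback set */
	return drv->probe(client);
}

/* Example driver callback (hypothetical). */
static int demo_probe(struct fake_client *client)
{
	return client->addr;
}
```

A driver may initialise either `.probe` or `.probe_new`. Because both names refer to the same storage, the caller sees the same pointer whichever one was set.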
      
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      03c835f4
  10. Mar 07, 2023
    • ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause · 37d9df22
      Jakub Kicinski authored
      I was intending to make all the Netlink Spec code BSD-3-Clause
      to ease the adoption but it appears that:
       - I fumbled the uAPI and used "GPL WITH uAPI note" there
       - it gives people pause as they expect GPL in the kernel
      As suggested by Chuck, re-license under a dual license. This gives us
      the benefit of full BSD freedom while fulfilling the broad "kernel is
      under GPL" expectations.
      
      Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/
      Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      37d9df22
    • interconnect: fix provider registration API · eb59eca0
      Johan Hovold authored
      
      
      The current interconnect provider interface is inherently racy as
      providers are expected to be added before being fully initialised.
      
      Specifically, nodes are currently not added and the provider data is not
      initialised until after registering the provider which can cause racing
      DT lookups to fail.
      
      Add a new provider API which will be used to fix up the interconnect
      drivers.
      
      The old API is reimplemented using the new interface and will be removed
      once all drivers have been fixed.
      
      Fixes: 11f1ceca ("interconnect: Add generic on-chip interconnect API")
      Fixes: 87e3031b ("interconnect: Allow endpoints translation via DT")
      Cc: stable@vger.kernel.org      # 5.1
      Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
      Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
      Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> # i.MX8MP MSC SM2-MB-EP1 Board
      Link: https://lore.kernel.org/r/20230306075651.2449-4-johan+linaro@kernel.org

      Signed-off-by: Georgi Djakov <djakov@kernel.org>
      eb59eca0
    • cpumask: be more careful with 'cpumask_setall()' · 63355b98
      Linus Torvalds authored
      
      
      Commit 596ff4a0 ("cpumask: re-introduce constant-sized cpumask
      optimizations") changed cpumask_setall() to use "bitmap_set()" instead
      of "bitmap_fill()", because bitmap_fill() would explicitly set all the
      bits of a constant sized small bitmap, and that's exactly what we don't
      want: we want to only set bits up to 'nr_cpu_ids', which is what
      "bitmap_set()" does.
      
      However, Yury correctly points out that while "bitmap_set()" does indeed
      only set bits up to the required bitmap size, it doesn't _clear_ bits
      above that size, so the upper bits would still not have well-defined
      values.
      
      Now, none of this should really matter, since any bits set past
      'nr_cpu_ids' should always be ignored in the first place.  Yes, the bit
      scanning functions might return them as a result, but since users should
      always consider the ">= nr_cpu_ids" condition to mean "no more bits",
      that shouldn't have any actual effect (see previous commit 8ca09d5f
      "cpumask: fix incorrect cpumask scanning result checks").
      
      But let's just do it right, the way the code was _intended_ to work.  We
      have had enough lazy code that works but bites us in the *rse later
      (again, see previous commit) that there's no reason to not just do this
      properly.
      
      It turns out that "bitmap_fill()" gets this all right for the complex
      case, and really only fails for the inlined optimized case that just
      fills the whole word.  And while we could just fix bitmap_fill() to use
      the proper last word mask, there's two issues with that:
      
       - the cpumask case wants to do the _optimization_ based on "NR_CPUS is
         a small constant", but then wants to do the actual bit _fill_ based
         on "nr_cpu_ids" that isn't necessarily that same constant
      
       - we have lots of non-cpumask users of bitmap_fill(), and while they
         hopefully don't care, and probably would want the proper semantics
         anyway ("only set bits up to the limit"), I do not want the cpumask
         changes to impact other parts
      
      So this ends up just doing the single-word optimization by hand in the
      cpumask code.  If our cpumask is fundamentally limited to a single word,
      just do the proper "fill in that word" exactly.  And if it's the more
      complex multi-word case, then the generic bitmap_fill() will DTRT.
      
      This is all an example of how our bitmap function optimizations really
      are somewhat broken.  They conflate the "this is size of the bitmap"
      optimizations with the actual bit(s) we want to set.
      
      In many cases we really want to have the two be separate things:
      sometimes we base our optimizations on the size of the whole bitmap ("I
      know this whole bitmap fits in a single word, so I'll just use
      single-word accesses"), and sometimes we base them on the bit we are
      looking at ("this is just acting on bits that are in the first word, so
      I'll use single-word accesses").
      
      Notice how the end result of the two optimizations is the same, but the
      way we get to it is quite different.
      
      And all our cpumask optimization games are really about that fundamental
      distinction, and we'd often really want to pass in both the "this is the
      bit I'm working on" (which _can_ be a small constant but might be
      variable), and "I know it's in this range even if it's variable" (based
      on CONFIG_NR_CPUS).
      
      So this cpumask_setall() implementation just makes that explicit.  It
      checks the "I statically know the size is small" using the known static
      size of the cpumask (which is what that 'small_cpumask_bits' is all
      about), but then sets the actual bits using the exact number of cpus we
      have (ie 'nr_cpumask_bits').
      
      Of course, in a perfect world, the compiler would have done all the
      range analysis (possibly with help from us just telling it that
      "this value is always in this range"), and would do all of this for us.
      But that is not the world we live in.
      
      While we dream of that perfect world, this does that manual logic to
      make it all work out.  And this was a very long explanation for a small
      code change that shouldn't even matter.
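
The split between "decide on the static size" and "fill with the exact count" can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel code: `MAX_CPUS` stands in for NR_CPUS, the `nr_ids` parameter stands in for nr_cpu_ids, `struct mask` and `mask_setall()` are hypothetical names, and the multi-word branch is a plain hand-written fill rather than the generic bitmap_fill().

```c
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG      (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)   (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
/* Mask with the low 'n' bits set, valid for 1 <= n <= BITS_PER_LONG. */
#define FIRST_WORD_MASK(n) (~0UL >> (BITS_PER_LONG - (n)))

/* Hypothetical compile-time maximum, standing in for NR_CPUS. */
#define MAX_CPUS 64

struct mask {
	unsigned long bits[BITS_TO_LONGS(MAX_CPUS)];
};

/*
 * Set exactly the first 'nr_ids' bits and leave every bit above that clear.
 * 'nr_ids' stands in for the runtime nr_cpu_ids; 1 <= nr_ids <= MAX_CPUS.
 */
static void mask_setall(struct mask *m, unsigned int nr_ids)
{
	if (MAX_CPUS <= BITS_PER_LONG) {
		/* Statically known to be one word: fill that word exactly. */
		m->bits[0] = FIRST_WORD_MASK(nr_ids);
	} else {
		/* Multi-word: zero all, fill full words, mask the last one. */
		size_t full = nr_ids / BITS_PER_LONG;
		unsigned int rest = nr_ids % BITS_PER_LONG;

		memset(m->bits, 0, sizeof(m->bits));
		memset(m->bits, 0xff, full * sizeof(unsigned long));
		if (rest)
			m->bits[full] = FIRST_WORD_MASK(rest);
	}
}
```

The branch condition depends only on the compile-time constant, so the compiler can drop the unused path, while the bits that end up set depend only on the runtime count.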
      
      Reported-by: Yury Norov <yury.norov@gmail.com>
      Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/

      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      63355b98
    • ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helper · 5adc4093
      Hans de Goede authored
      
      
      x86 ACPI boards which ship with only Android as their factory image usually
      have pretty broken ACPI tables, relying on everything being hardcoded in
      the factory kernel image and often disabling parts of the ACPI enumeration
      kernel code to avoid the broken tables causing issues.
      
      Part of this broken ACPI code is that sometimes these boards have _AEI
      ACPI GPIO event handlers which are broken.
      
      So far this has been dealt with in the platform/x86/x86-android-tablets.c
      module, which contains various workarounds for these devices, by it calling
      acpi_gpiochip_free_interrupts() on gpiochip-s with troublesome handlers to
      disable the handlers.
      
      But in some cases this is too late: if the handlers are of the edge type,
      then gpiolib-acpi.c's code will already have run them at boot.
      This can cause issues such as GPIOs ending up as owned by "ACPI:OpRegion",
      making them unavailable for drivers which actually need them.
      
      Boards with these broken ACPI tables are already listed in
      drivers/acpi/x86/utils.c for e.g. acpi_quirk_skip_i2c_client_enumeration().
      Extend the quirks mechanism with a new acpi_quirk_skip_gpio_event_handlers()
      helper. This re-uses the DMI-ids rather than having to duplicate the same
      DMI table in gpiolib-acpi.c .
      
      Also add the new ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS quirk to existing
      boards with troublesome ACPI gpio event handlers, so that the current
      acpi_gpiochip_free_interrupts() hack can be removed from
      x86-android-tablets.c .
      
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
      5adc4093
    • new helper: put_and_unmap_page() · 849ad04c
      Al Viro authored
      
      
      kunmap_local() + put_page(), as done by e.g. ext2 directory handling.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      849ad04c
  11. Mar 06, 2023
  12. Mar 05, 2023
    • cpumask: re-introduce constant-sized cpumask optimizations · 596ff4a0
      Linus Torvalds authored
      
      
      Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
      in the cpumask operations potentially becoming hugely less efficient,
      because suddenly the cpumask was always considered to be variable-sized.
      
      The optimization was then later added back in a limited form by commit
      6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
      FORCE_NR_CPUS option is not useful in a generic kernel and more of a
      special case for embedded situations with fixed hardware.
      
      Instead, just re-introduce the optimization, with some changes.
      
      Instead of depending on CPUMASK_OFFSTACK being false, and then always
      using the full constant cpumask width, this introduces three different
      cpumask "sizes":
      
       - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
      
         This is used for situations where we should use the exact size.
      
       - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
         fits in a single word and the bitmap operations thus end up able
         to trigger the "small_const_nbits()" optimizations.
      
         This is used for the operations that have optimized single-word
         cases that get inlined, notably the bit find and scanning functions.
      
       - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
         is a sufficiently small constant that makes simple "copy" and
         "clear" operations more efficient.
      
         This is arbitrarily set at four words or less.
      
      As an example of this situation, without this fixed size optimization,
      cpumask_clear() will generate code like
      
              movl    nr_cpu_ids(%rip), %edx
              addq    $63, %rdx
              shrq    $3, %rdx
              andl    $-8, %edx
              callq   memset@PLT
      
      on x86-64, because it would calculate the "exact" number of longwords
      that need to be cleared.
      
      In contrast, with this patch, using an NR_CPUS of 64 (which is quite a
      reasonable value to use), the above becomes a single
      
              movq    $0, cpumask
      
      instruction instead, because rather than working out exactly how many
      CPUs the system has, it just knows that the cpumask will be a single
      word and can just clear it all.
      
      Note that this does end up tightening the rules a bit from the original
      version in another way: operations that set bits in the cpumask are now
      limited to the actual nr_cpu_ids limit, whereas we used to do the
      nr_cpumask_bits thing almost everywhere in the cpumask code.
      
      But if you just clear bits, or scan for bits, we can use the simpler
      compile-time constants.
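
The distinction between setting bits and clearing or scanning them can be illustrated with a single-word mask. This is a minimal sketch under stated assumptions: `struct mask` and the `mask_*` helpers are hypothetical stand-ins, not the kernel's cpumask API, and `NR_CPUS` is assumed to be 64 with 6 CPUs present at runtime.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64u                 /* hypothetical compile-time maximum */
static unsigned int nr_cpu_ids = 6; /* hypothetical runtime CPU count */

struct mask { uint64_t word; };     /* single-word stand-in for a cpumask */

/* Setting bits must respect the exact runtime limit. */
static void mask_setall(struct mask *m)
{
	assert(nr_cpu_ids >= 1 && nr_cpu_ids <= NR_CPUS);
	m->word = nr_cpu_ids == 64 ? ~UINT64_C(0)
				   : (UINT64_C(1) << nr_cpu_ids) - 1;
}

/* Clearing can cover the whole constant-sized word. */
static void mask_clear(struct mask *m)
{
	m->word = 0;
}

/*
 * Scanning can use the constant width too: bits at or above nr_cpu_ids
 * are never set, so looking at them is harmless. Returns NR_CPUS if
 * no bit is set.
 */
static unsigned int mask_first(const struct mask *m)
{
	for (unsigned int i = 0; i < NR_CPUS; i++)
		if (m->word & (UINT64_C(1) << i))
			return i;
	return NR_CPUS;
}
```

Because `mask_setall()` never sets a bit at or above `nr_cpu_ids`, the clear and scan helpers can safely ignore the runtime count.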
      
      In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
      which were not useful, and which fundamentally have to be limited to
      'nr_cpu_ids'.  Better remove them now than have somebody introduce use
      of them later.
      
      Of course, on x86-64 with MAXSMP there is no sane small compile-time
      constant for the cpumask sizes, and we end up using the actual CPU bits,
      and will generate the above kind of horrors regardless.  Please don't
      use MAXSMP unless you really expect to have machines with thousands of
      cores.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      596ff4a0
    • Remove Intel compiler support · 95207db8
      Masahiro Yamada authored
      include/linux/compiler-intel.h has had no updates in the past 3 years.
      
      We often forget about the third C compiler to build the kernel.
      
      For example, commit a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO")
      only mentioned GCC and Clang.
      
      init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
      and nobody has reported any issue.
      
      I guess the Intel Compiler support is broken, and nobody cares about
      it.
      
      Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is
      deprecated:
      
          $ icc -v
          icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
          deprecated and will be removed from product release in the second half
          of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
          compiler moving forward. Please transition to use this compiler. Use
          '-diag-disable=10441' to disable this message.
          icc version 2021.7.0 (gcc version 12.1.0 compatibility)
      
      Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
      complete adoption of LLVM".
      
      lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
      untouched for better sync with https://github.com/facebook/zstd
      
      Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
      
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      95207db8
  13. Mar 02, 2023
  14. Mar 01, 2023
    • capability: just use a 'u64' instead of a 'u32[2]' array · f122a08b
      Linus Torvalds authored
      
      
      Back in 2008 we extended the capability bits from 32 to 64, and we did
      it by extending the single 32-bit capability word from one word to an
      array of two words.  It was then obfuscated by hiding the "2" behind two
      macro expansions, with the reasoning being that maybe it gets extended
      further some day.
      
      That reasoning may have been valid at the time, but the last thing we
      want to do is to extend the capability set any more.  And the array of
      values not only causes source code oddities (with loops to deal with
      it), but also results in worse code generation.  It's a lose-lose
      situation.
      
      So just change the 'u32[2]' into a 'u64' and be done with it.
      
      We still have to deal with the fact that the user space interface is
      designed around an array of these 32-bit values, but that was the case
      before too, since the array layouts were different (ie user space
      doesn't use an array of 32-bit values for individual capability masks,
      but an array of 32-bit slices of multiple masks).
      
      So that marshalling of data is actually simplified too, even if it does
      remain somewhat obscure and odd.
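
The marshalling between the user-space layout and a single 64-bit value can be sketched roughly as below. The struct and function names (`user_cap_slice`, `cap_set_t`, `cap_from_slices()`, `cap_to_slices()`) are hypothetical and chosen for this example; only the idea of an array of 32-bit slices of multiple masks on one side and a plain `val` on the other comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the user-space layout: one entry per 32-bit
 * slice, each entry holding that slice of all three masks.
 */
struct user_cap_slice {
	uint32_t effective;
	uint32_t permitted;
	uint32_t inheritable;
};

/* Hypothetical kernel-side representation: one 64-bit value per mask. */
typedef struct { uint64_t val; } cap_set_t;

/* Combine the low and high 32-bit slices of one mask into a 64-bit value. */
static cap_set_t cap_from_slices(uint32_t low, uint32_t high)
{
	cap_set_t c = { .val = (uint64_t)low | ((uint64_t)high << 32) };
	return c;
}

/* Split a 64-bit mask back into its two 32-bit slices. */
static void cap_to_slices(cap_set_t c, uint32_t *low, uint32_t *high)
{
	assert(low && high);
	*low = (uint32_t)c.val;
	*high = (uint32_t)(c.val >> 32);
}

/* With a single value, comparing two sets is one comparison. */
static int cap_identical(cap_set_t a, cap_set_t b)
{
	return a.val == b.val;
}
```

For a two-entry array `u`, the effective mask would be built as `cap_from_slices(u[0].effective, u[1].effective)`, and likewise for the other two masks.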
      
      This was all triggered by my reaction to the new "cap_isidentical()"
      introduced recently.  By just using a saner data structure, it went from
      
      	unsigned __capi;
      	CAP_FOR_EACH_U32(__capi) {
      		if (a.cap[__capi] != b.cap[__capi])
      			return false;
      	}
      	return true;
      
      to just being
      
      	return a.val == b.val;
      
      instead.  Which is rather more obvious both to humans and to compilers.
      
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f122a08b
  15. Feb 28, 2023
  16. Feb 27, 2023
  17. Feb 26, 2023
    • net: dsa: seville: ignore mscc-miim read errors from Lynx PCS · 0322ef49
      Vladimir Oltean authored
      
      
      During the refactoring in the commit below, vsc9953_mdio_read() was
      replaced with mscc_miim_read(), which has one extra step: it checks for
      the MSCC_MIIM_DATA_ERROR bits before returning the result.
      
      On T1040RDB, there are 8 QSGMII PCSes belonging to the switch, and they
      are organized in 2 groups. First group responds to MDIO addresses 4-7
      because QSGMIIACR1[MDEV_PORT] is 1, and the second group responds to
      MDIO addresses 8-11 because QSGMIIBCR1[MDEV_PORT] is 2. I have double
      checked that these values are correctly set in the SERDES, and that
      PCCR1[QSGMA_CFG] and PCCR1[QSGMB_CFG] are both 0b01.
      
      mscc_miim_read: phyad 8 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 8 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 8 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 8 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 9 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 9 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 9 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 9 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 10 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 10 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 10 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 10 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 11 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 11 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 11 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 11 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 4 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 4 reg 0x5 MIIM_DATA 0x3da01, ERROR
      mscc_miim_read: phyad 5 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 5 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 5 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 5 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 6 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 6 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 6 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 6 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 7 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 7 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 7 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 7 reg 0x5 MIIM_DATA 0x35801, ERROR
      
      As can be seen, the data in MIIM_DATA is still valid despite having the
      MSCC_MIIM_DATA_ERROR bits set. The driver as introduced in commit
      84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953
      switch") was ignoring these bits, perhaps deliberately (although
      unbeknownst to me).
      
      This is an old IP and the hardware team does not seem to be able to help
      me track down a plausible reason for these failures. I'll keep
      investigating, but in the meantime, this is a direct regression which
      must be restored to a working state.
      
      The only thing I can do is keep ignoring the errors as before.
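
The effect of ignoring the error bits can be illustrated with a small decoder for the raw MIIM_DATA values shown in the log. This is a sketch, not the driver's code: `miim_decode()` and its `ignore_errors` parameter are hypothetical, and the error flags are assumed to be bits 16 and 17, which is what values such as 0x3002d in the log suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: error flags occupy bits 16 and 17 (e.g. 0x3002d in the log). */
#define MIIM_DATA_ERROR_BITS (UINT32_C(3) << 16)
/* MDIO registers are 16 bits wide, so the payload is the low half. */
#define MIIM_DATA_MASK       UINT32_C(0xffff)

static_assert((MIIM_DATA_ERROR_BITS & MIIM_DATA_MASK) == 0,
	      "error flags must not overlap the data field");

/*
 * Return the 16-bit register value, or -1 if the error flags are set
 * and the caller has not asked for them to be ignored.
 */
static int miim_decode(uint32_t raw, bool ignore_errors)
{
	if (!ignore_errors && (raw & MIIM_DATA_ERROR_BITS))
		return -1;
	return (int)(raw & MIIM_DATA_MASK);
}
```

With `ignore_errors` set, the reads from phyad 4-7 decode to the same values as those from phyad 8-11 (for example 0x3002d yields 0x2d), which matches the observation that the data is still valid.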
      
      Fixes: b9965845 ("net: dsa: ocelot: felix: utilize shared mscc-miim driver for indirect MDIO access")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0322ef49
  18. Feb 25, 2023
  19. Feb 24, 2023