  1. Feb 27, 2020
  2. Feb 14, 2020
    • Randy Dunlap's avatar
      netdevice.h: fix all kernel-doc and Sphinx warnings · a1fa83bd
      Randy Dunlap authored
      
      
      Eliminate all kernel-doc and Sphinx warnings in
      <linux/netdevice.h>.  Fixes these warnings:
      
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'xdp_bulkq' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'xps_cpus_map' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'xps_rxqs_map' not described in 'net_device'
      ../include/linux/netdevice.h:2100: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
      ../include/linux/netdevice.h:3552: WARNING: Inline emphasis start-string without end-string.
      ../include/linux/netdevice.h:3552: WARNING: Inline emphasis start-string without end-string.
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1fa83bd
  3. Feb 13, 2020
  4. Feb 12, 2020
  5. Feb 11, 2020
    • Rafael J. Wysocki's avatar
      ACPICA: Introduce acpi_any_gpe_status_set() · ea128834
      Rafael J. Wysocki authored
      
      
      Introduce a new helper function, acpi_any_gpe_status_set(), for
      checking the status bits of all enabled GPEs in one go.
      
      It is needed to distinguish spurious SCIs from genuine ones when
      deciding whether or not to wake up the system from suspend-to-idle.
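
As a rough illustration of "checking the status bits of all enabled GPEs in one go", the sketch below models each GPE register as a status byte and an enable byte and reports whether any enabled event is pending. It is a userspace model only: `struct gpe_reg_model` and `any_enabled_gpe_status_set()` are hypothetical names, not the ACPICA data structures or the helper added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one GPE register pair; not the ACPICA layout. */
struct gpe_reg_model {
    uint8_t status;   /* one bit per GPE: event is pending */
    uint8_t enable;   /* one bit per GPE: event is enabled */
};

/* Return true if any GPE that is enabled also has its status bit set. */
static bool any_enabled_gpe_status_set(const struct gpe_reg_model *regs,
                                       size_t n)
{
    assert(regs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (regs[i].status & regs[i].enable)
            return true;
    }
    return false;
}
```

A pending status bit on a GPE that is not enabled does not count, which is what lets a caller tell a spurious SCI from a genuine one in this model.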
      
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ea128834
    • Rafael J. Wysocki's avatar
      ACPI: PM: s2idle: Avoid possible race related to the EC GPE · e3728b50
      Rafael J. Wysocki authored
      
      
      It is theoretically possible for the ACPI EC GPE to be set after the
      s2idle_ops->wake() called from s2idle_loop() has returned and before
      the subsequent pm_wakeup_pending() check is carried out.  If that
      happens, the resulting wakeup event will cause the system to resume
      even though it may be a spurious one.
      
      To avoid that race, first make the ->wake() callback in struct
      platform_s2idle_ops return a bool value indicating whether or not
      to let the system resume and rearrange s2idle_loop() to use that
      value instead of the direct pm_wakeup_pending() call if ->wake() is
      present.
      
      Next, rework acpi_s2idle_wake() to process EC events and check
      pm_wakeup_pending() before re-arming the SCI for system wakeup
      to prevent it from triggering prematurely and add comments to
      that function to explain the rationale for the new code flow.
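
The control-flow change can be sketched in userspace as follows: when a platform ->wake() callback exists, its boolean verdict decides whether to resume, and the generic pending-wakeup check is used only when the callback is absent. Everything here is an assumption made for illustration: `struct s2idle_ops_model`, `should_resume()` and the stub callbacks are hypothetical and do not reproduce the kernel's s2idle_loop().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct platform_s2idle_ops; only ->wake(). */
struct s2idle_ops_model {
    bool (*wake)(void);   /* true: let the system resume */
};

/* Stub callbacks used only for illustration. */
static bool wake_spurious(void)     { return false; }
static bool wake_genuine(void)      { return true; }
static bool no_wakeup_pending(void) { return false; }

/*
 * Decide whether to leave the suspend-to-idle loop: prefer the platform
 * callback's verdict, fall back to the generic check when it is absent.
 */
static bool should_resume(const struct s2idle_ops_model *ops,
                          bool (*wakeup_pending)(void))
{
    assert(wakeup_pending != NULL);

    if (ops && ops->wake)
        return ops->wake();
    return wakeup_pending();
}
```

Routing the decision through a single return value means there is no window between the callback returning and a separate pending-wakeup check in which a late event could slip in.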
      
      Fixes: 56b99184 ("PM: sleep: Simplify suspend-to-idle control flow")
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e3728b50
    • Tom Zanussi's avatar
      tracing: Consolidate trace() functions · 7276531d
      Tom Zanussi authored
      Move the checking, buffer reserve and buffer commit code in
      synth_event_trace_start/end() into inline functions
      __synth_event_trace_start/end() so they can also be used by
      synth_event_trace() and synth_event_trace_array(), and then have all
      those functions use them.
      
      Also, change synth_event_trace_state.enabled to disabled so it only
      needs to be set if the event is disabled, which is not normally the
      case.
      
      Link: http://lkml.kernel.org/r/b1f3108d0f450e58192955a300e31d0405ab4149.1581374549.git.zanussi@kernel.org
      
      
      
      Signed-off-by: Tom Zanussi <zanussi@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      7276531d
  6. Feb 08, 2020
    • Linus Torvalds's avatar
      pipe: use exclusive waits when reading or writing · 0ddad21d
      Linus Torvalds authored
      
      
      This makes the pipe code use separate wait-queues and exclusive waiting
      for readers and writers, avoiding a nasty thundering herd problem when
      there are lots of readers waiting for data on a pipe (or, less commonly,
      lots of writers waiting for a pipe to have space).
      
      While this isn't a common occurrence in the traditional "use a pipe as a
      data transport" case, where you typically only have a single reader and
      a single writer process, there is one common special case: using a pipe
      as a source of "locking tokens" rather than for data communication.
      
      In particular, the GNU make jobserver code ends up using a pipe as a way
      to limit parallelism, where each job consumes a token by reading a byte
      from the jobserver pipe, and releases the token by writing a byte back
      to the pipe.
      
      This pattern is fairly traditional on Unix, and works very well, but
      will waste a lot of time waking up a lot of processes when only a single
      reader needs to be woken up when a writer releases a new token.
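
The token pattern described above can be sketched with plain pipe(2), read(2) and write(2). This is a minimal model, not GNU make's jobserver code: the function names and the token byte are made up for the example, and error handling (EINTR, short writes) is deliberately left out.

```c
#include <assert.h>
#include <unistd.h>

/* Create the pipe and preload it with ntokens one-byte tokens. */
static int token_pool_init(int fd[2], int ntokens)
{
    assert(ntokens >= 0);

    if (pipe(fd) < 0)
        return -1;
    for (int i = 0; i < ntokens; i++) {
        char token = '+';   /* arbitrary byte; only the count matters */
        if (write(fd[1], &token, 1) != 1)
            return -1;
    }
    return 0;
}

/* Take one token; blocks while the pool is empty. */
static int token_acquire(int rfd)
{
    char token;
    return read(rfd, &token, 1) == 1 ? 0 : -1;
}

/* Give one token back, waking a waiter blocked in token_acquire(). */
static int token_release(int wfd)
{
    char token = '+';
    return write(wfd, &token, 1) == 1 ? 0 : -1;
}
```

Each release makes exactly one byte available, so only one blocked reader can make progress; waking every reader for that single byte is the wasted work the commit removes.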
      
      A simplified test-case of just this pipe interaction is to create 64
      processes, and then pass a single token around between them (this
      test-case also intentionally passes another token that gets ignored to
      test the "wake up next" logic too, in case anybody wonders about it):
      
          #include <unistd.h>
      
          int main(int argc, char **argv)
          {
              int fd[2], counters[2];
      
              pipe(fd);
              counters[0] = 0;
              counters[1] = -1;
              write(fd[1], counters, sizeof(counters));
      
              /* 64 processes */
              fork(); fork(); fork(); fork(); fork(); fork();
      
              do {
                      int i;
                      read(fd[0], &i, sizeof(i));
                      if (i < 0)
                              continue;
                      counters[0] = i+1;
                      write(fd[1], counters, (1+(i & 1)) *sizeof(int));
              } while (counters[0] < 1000000);
              return 0;
          }
      
      and in a perfect world, passing that token around should only cause one
      context switch per transfer, when the writer of a token causes a
      directed wakeup of just a single reader.
      
      But with the "writer wakes all readers" model we traditionally had, on
      my test box the above case causes more than an order of magnitude more
      scheduling: instead of the expected ~1M context switches, "perf stat"
      shows
      
              231,852.37 msec task-clock                #   15.857 CPUs utilized
              11,250,961      context-switches          #    0.049 M/sec
                 616,304      cpu-migrations            #    0.003 M/sec
                   1,648      page-faults               #    0.007 K/sec
       1,097,903,998,514      cycles                    #    4.735 GHz
         120,781,778,352      instructions              #    0.11  insn per cycle
          27,997,056,043      branches                  #  120.754 M/sec
             283,581,233      branch-misses             #    1.01% of all branches
      
            14.621273891 seconds time elapsed
      
             0.018243000 seconds user
             3.611468000 seconds sys
      
      before this commit.
      
      After this commit, I get
      
                5,229.55 msec task-clock                #    3.072 CPUs utilized
               1,212,233      context-switches          #    0.232 M/sec
                 103,951      cpu-migrations            #    0.020 M/sec
                   1,328      page-faults               #    0.254 K/sec
          21,307,456,166      cycles                    #    4.074 GHz
          12,947,819,999      instructions              #    0.61  insn per cycle
           2,881,985,678      branches                  #  551.096 M/sec
              64,267,015      branch-misses             #    2.23% of all branches
      
             1.702148350 seconds time elapsed
      
             0.004868000 seconds user
             0.110786000 seconds sys
      
      instead. Much better.
      
      [ Note! This kernel improvement seems to be very good at triggering a
        race condition in the make jobserver (in GNU make 4.2.1) for me. It's
        a long known bug that was fixed back in June 2017 by GNU make commit
        b552b0525198 ("[SV 51159] Use a non-blocking read with pselect to
        avoid hangs.").
      
        But there wasn't a new release of GNU make until 4.3 on Jan 19 2020,
        so a number of distributions may still have the buggy version. Some
        have backported the fix to their 4.2.1 release, though, and even
        without the fix it's quite timing-dependent whether the bug actually
        is hit. ]
      
      Josh Triplett says:
       "I've been hammering on your pipe fix patch (switching to exclusive
        wait queues) for a month or so, on several different systems, and I've
        run into no issues with it. The patch *substantially* improves
        parallel build times on large (~100 CPU) systems, both with parallel
        make and with other things that use make's pipe-based jobserver.
      
        All current distributions (including stable and long-term stable
        distributions) have versions of GNU make that no longer have the
        jobserver bug"
      
      Tested-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ddad21d
    • Zenghui Yu's avatar
      irqchip/gic-v4.1: Set vpe_l1_base for all redistributors · 8b718d40
      Zenghui Yu authored
      
      
      Currently, we will not set vpe_l1_page for the current RD if we can
      inherit the vPE configuration table from another RD (or ITS), which
      results in an inconsistency between RDs within the same CommonLPIAff
      group.
      
      Let's rename it to vpe_l1_base to indicate the base address of the
      vPE configuration table of this RD, and set it properly for *all*
      v4.1 redistributors.
      
      Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200206075711.1275-3-yuzenghui@huawei.com
      8b718d40
  7. Feb 07, 2020
    • Al Viro's avatar
      prefix-handling analogues of errorf() and friends · a3ff937b
      Al Viro authored
      
      
      called errorfc/infofc/warnfc/invalfc
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a3ff937b
    • Al Viro's avatar
      turn fs_param_is_... into functions · 328de528
      Al Viro authored
      
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      328de528
    • Al Viro's avatar
      fs_parse: handle optional arguments sanely · 48ce73b1
      Al Viro authored
      
      
      Don't bother with "mixed" options that would allow both the
      form with and without argument (i.e. both -o foo and -o foo=bar).
      Rather than trying to shove both into a single fs_parameter_spec,
      allow having with-argument and no-argument specs with the same
      name and teach fs_parse to handle that.
      
      There are very few options of that sort, and they are actually
      easier to handle that way - callers end up with less postprocessing.
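
One way to picture "with-argument and no-argument specs with the same name" is a table that holds two entries for one option and a lookup that also matches on whether a value was supplied. The sketch is a userspace model under that assumption: `struct opt_spec_model`, `demo_specs` and `lookup_opt()` are hypothetical and are not the kernel's fs_parameter_spec or fs_parse().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { OPT_FOO_FLAG = 1, OPT_FOO_VALUE = 2 };

/* Hypothetical spec entry: same name may appear with and without argument. */
struct opt_spec_model {
    const char *name;
    bool takes_arg;
    int opt;
};

static const struct opt_spec_model demo_specs[] = {
    { "foo", false, OPT_FOO_FLAG },    /* -o foo     */
    { "foo", true,  OPT_FOO_VALUE },   /* -o foo=bar */
    { NULL,  false, 0 }                /* terminator */
};

/* Return the option id whose name and argument form both match, or -1. */
static int lookup_opt(const struct opt_spec_model *tbl, const char *name,
                      bool has_value)
{
    assert(tbl != NULL && name != NULL);

    for (; tbl->name; tbl++) {
        if (strcmp(tbl->name, name) == 0 && tbl->takes_arg == has_value)
            return tbl->opt;
    }
    return -1;
}
```

Because the two forms resolve to different ids, the caller can switch on the id directly instead of inspecting the value afterwards.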
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      48ce73b1
    • Al Viro's avatar
      fs_parse: fold fs_parameter_desc/fs_parameter_spec · d7167b14
      Al Viro authored
      
      
      The former contains nothing but a pointer to an array of the latter...
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d7167b14
    • Eric Sandeen's avatar
      96cafb9c
    • Al Viro's avatar
      add prefix to fs_context->log · cc3c0b53
      Al Viro authored
      
      
      ... turning it into struct p_log embedded into fs_context.  Initialize
      the prefix with fs_type->name, turning fs_parse() into a trivial
      inline wrapper for __fs_parse().
      
      This makes fs_parameter_description->name completely unused.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cc3c0b53
    • Al Viro's avatar
      ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log · c80c98f0
      Al Viro authored
      
      
      ... and now errorf() et al. are never called with NULL fs_context,
      so we can get rid of the conditional in those.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c80c98f0
    • Al Viro's avatar
      new primitive: __fs_parse() · 7f5d3814
      Al Viro authored
      
      
      fs_parse() analogue taking p_log instead of fs_context.
      fs_parse() turned into a wrapper, callers in ceph_common and rbd
      switched to __fs_parse().
      
      As a result, fs_parse() never gets a NULL fs_context and neither
      do the fs_context-based logging primitives.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7f5d3814
    • Al Viro's avatar
      struct p_log, variants of warnf() et.al. taking that one instead · 3fbb8d55
      Al Viro authored
      
      
      primitives for prefixed logging
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3fbb8d55
    • Al Viro's avatar
      get rid of fs_value_is_filename_empty · aa1918f9
      Al Viro authored
      
      
      Its behaviour is identical to that of fs_value_is_filename.
      It makes no sense, anyway - LOOKUP_EMPTY affects nothing
      whatsoever once the pathname has been imported from userland.
      And both fs_value_is_filename and fs_value_is_filename_empty
      carry an already imported pathname.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      aa1918f9
    • Al Viro's avatar
      don't bother with explicit length argument for __lookup_constant() · 34264ae3
      Al Viro authored
      
      
      Have the arrays of constant_table self-terminated (by NULL ->name
      in the final entry).  Simplifies lookup_constant() and allows
      reusing the search for enum params as well.
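
A self-terminated table of this kind needs no length argument: the search stops at the entry whose name is NULL. The sketch below shows the idea in userspace C; `struct const_entry_model` and `lookup_const_model()` are illustrative names, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair; a NULL name marks the end of the table. */
struct const_entry_model {
    const char *name;
    int value;
};

/* Walk the table until the NULL-name sentinel; no explicit length needed. */
static int lookup_const_model(const struct const_entry_model *tbl,
                              const char *name, int not_found)
{
    assert(tbl != NULL && name != NULL);

    for (; tbl->name; tbl++) {
        if (strcmp(tbl->name, name) == 0)
            return tbl->value;
    }
    return not_found;
}
```

The price of dropping the length is that every table must end with the sentinel entry; a missing terminator would run the search off the end of the array.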
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      34264ae3
    • Johannes Berg's avatar
      mac80211: use more bits for ack_frame_id · f2b18bac
      Johannes Berg authored
      
      
      It turns out that this wasn't a good idea, I hit a test failure in
      hwsim due to this. That particular failure was easily worked around,
      but it raised questions: if an AP needs to, for example, send action
      frames to each connected station, the current limit is nowhere near
      enough (especially if those stations are sleeping and the frames are
      queued for a while.)
      
      Shuffle around some bits to make more room for ack_frame_id to allow
      up to 8192 queued up frames, that's enough for queueing 4 frames to
      each connected station, even at the maximum of 2007 stations on a
      single AP.
      
      We take the bits from band (which currently only needs 2, but I leave
      3 in case we add another band) and from the hw_queue, which only needs
      4 since it has a limit of 16 queues.
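
The arithmetic behind these numbers can be checked with a small model. The field widths for band (3) and hw_queue (4) come from the text above; the 13-bit width for ack_frame_id is inferred from the stated limit of 8192 (2^13) and is an assumption, as are the struct and function names, which do not reproduce the mac80211 layout.

```c
#include <assert.h>
#include <stdbool.h>

enum { ACK_FRAME_ID_BITS = 13 };   /* assumed: 2^13 = 8192 */

/* Illustrative bit-field layout only; not the real mac80211 struct. */
struct tx_bits_model {
    unsigned int band:3;           /* 2 bits needed today, 1 spare */
    unsigned int hw_queue:4;       /* 16 queues fit in 4 bits */
    unsigned int ack_frame_id:13;
};

/* Number of distinct values the ID field can hold. */
static unsigned int ack_id_capacity(void)
{
    return 1u << ACK_FRAME_ID_BITS;
}

/* Store an ID, checking that it fits in the field. */
static void store_ack_id(struct tx_bits_model *b, unsigned int id)
{
    assert(id < ack_id_capacity());
    b->ack_frame_id = id;
}

/* Can every station have frames_each frames queued at once? */
static bool ack_ids_suffice(unsigned int stations, unsigned int frames_each)
{
    return (unsigned long long)stations * frames_each <= ack_id_capacity();
}
```

With 2007 stations and 4 frames each, 8028 IDs are needed, which fits in 8192; 5 frames each would need 10035 and would not.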
      
      Fixes: 6912daed ("mac80211: Shrink the size of ack_frame_id to make room for tx_time_est")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20200115122549.b9a4ef9f4980.Ied52ed90150220b83a280009c590b65d125d087c@changeid
      
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      f2b18bac
    • Damien Le Moal's avatar
      fs: New zonefs file system · 8dcc1a9d
      Damien Le Moal authored
      
      
      zonefs is a very simple file system exposing each zone of a zoned block
      device as a file. Unlike a regular file system with zoned block device
      support (e.g. f2fs), zonefs does not hide the sequential write
      constraint of zoned block devices from the user. Files representing
      sequential write zones of the device must be written sequentially
      starting from the end of the file (append only writes).
      
      As such, zonefs is in essence closer to a raw block device access
      interface than to a full featured POSIX file system. The goal of zonefs
      is to simplify the implementation of zoned block device support in
      applications by replacing raw block device file accesses with a richer
      file API, avoiding relying on direct block device file ioctls which may
      be more obscure to developers. One example of this approach is the
      implementation of LSM (log-structured merge) tree structures (such as
      used in RocksDB and LevelDB) on zoned block devices by allowing SSTables
      to be stored in a zone file similarly to a regular file system rather
      than as a range of sectors of a zoned device. The introduction of the
      higher level construct "one file is one zone" can help reduce the
      amount of changes needed in the application as well as introducing
      support for different application programming languages.
      
      Zonefs on-disk metadata is reduced to an immutable super block to
      persistently store a magic number and optional feature flags and
      values. On mount, zonefs uses blkdev_report_zones() to obtain the device
      zone configuration and populates the mount point with a static file tree
      solely based on this information. E.g. file sizes come from the device
      zone type and write pointer offset managed by the device itself.
      
      The zone files created on mount have the following characteristics.
      1) Files representing zones of the same type are grouped together
         under a common sub-directory:
           * For conventional zones, the sub-directory "cnv" is used.
           * For sequential write zones, the sub-directory "seq" is used.
         These two directories are the only directories that exist in zonefs.
         Users cannot create other directories and cannot rename or delete
         the "cnv" and "seq" sub-directories.
      2) The name of zone files is the number of the file within the zone
         type sub-directory, in order of increasing zone start sector.
      3) The size of conventional zone files is fixed to the device zone size.
         Conventional zone files cannot be truncated.
      4) The size of sequential zone files represents the file's zone write
         pointer position relative to the zone start sector. Truncating these
         files is allowed only down to 0, in which case, the zone is reset to
         rewind the zone write pointer position to the start of the zone, or
         up to the zone size, in which case the file's zone is transitioned
         to the FULL state (finish zone operation).
      5) Read and write operations are not allowed beyond the file zone
         size. Any access exceeding the zone size fails with the -EFBIG
         error.
      6) Creating, deleting, renaming or modifying any attribute of files and
         sub-directories is not allowed.
      7) There are no restrictions on the type of read and write operations
         that can be issued to conventional zone files. Buffered, direct and
         mmap read & write operations are accepted. For sequential zone files,
         there are no restrictions on read operations, but all write
         operations must be direct IO append writes. mmap write of sequential
         files is not allowed.
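
From an application's point of view, the append-only rule for sequential zone files means every write must start at the current file size, which per item 4 mirrors the zone write pointer. The sketch below shows that pattern with fstat(2) and pwrite(2). It is a minimal example under stated assumptions: both helper names are hypothetical, the caller is expected to pass O_DIRECT in extra_flags and to supply a suitably aligned buffer when targeting zonefs, and nothing here has been run against a zoned device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open an existing file for writing. On zonefs sequential files the
 * caller would pass O_DIRECT in extra_flags (assumption of this sketch).
 */
static int open_for_append(const char *path, int extra_flags)
{
    assert(path != NULL);
    return open(path, O_WRONLY | extra_flags);
}

/* Write len bytes starting exactly at the current end of the file. */
static int append_at_eof(int fd, const void *buf, size_t len)
{
    struct stat st;
    ssize_t n;

    if (fstat(fd, &st) < 0)
        return -1;
    n = pwrite(fd, buf, len, st.st_size);
    return n == (ssize_t)len ? 0 : -1;
}
```

The sketch assumes a single writer per file: two writers could read the same size and then issue writes at the same offset.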
      
      Several optional features of zonefs can be enabled at format time.
      * Conventional zone aggregation: ranges of contiguous conventional
        zones can be aggregated into a single larger file instead of the
        default one file per zone.
      * File ownership: The owner UID and GID of zone files is by default 0
        (root) but can be changed to any valid UID/GID.
      * File access permissions: the default 640 access permissions can be
        changed.
      
      The mkzonefs tool is used to format zoned block devices for use with
      zonefs. This tool is available on GitHub at:
      
      git@github.com:damien-lemoal/zonefs-tools.git.
      
      zonefs-tools also includes a test suite which can be run against any
      zoned block device, including null_blk block device created with zoned
      mode.
      
      Example: the following formats a 15TB host-managed SMR HDD with 256 MB
      zones with the conventional zones aggregation feature enabled.
      
      $ sudo mkzonefs -o aggr_cnv /dev/sdX
      $ sudo mount -t zonefs /dev/sdX /mnt
      $ ls -l /mnt/
      total 0
      dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
      dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
      
      The size of the zone files sub-directories indicates the number of files
      existing for each type of zones. In this example, there is only one
      conventional zone file (all conventional zones are aggregated under a
      single file).
      
      $ ls -l /mnt/cnv
      total 137101312
      -rw-r----- 1 root root 140391743488 Nov 25 13:23 0
      
      This aggregated conventional zone file can be used as a regular file.
      
      $ sudo mkfs.ext4 /mnt/cnv/0
      $ sudo mount -o loop /mnt/cnv/0 /data
      
      The "seq" sub-directory grouping files for sequential write zones has
      in this example 55356 zones.
      
      $ ls -lv /mnt/seq
      total 14511243264
      -rw-r----- 1 root root 0 Nov 25 13:23 0
      -rw-r----- 1 root root 0 Nov 25 13:23 1
      -rw-r----- 1 root root 0 Nov 25 13:23 2
      ...
      -rw-r----- 1 root root 0 Nov 25 13:23 55354
      -rw-r----- 1 root root 0 Nov 25 13:23 55355
      
      For sequential write zone files, the file size changes as data is
      appended at the end of the file, similarly to any regular file system.
      
      $ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
      1+0 records in
      1+0 records out
      4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s
      
      $ ls -l /mnt/seq/0
      -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
      
      The written file can be truncated to the zone size, preventing any
      further write operation.
      
      $ truncate -s 268435456 /mnt/seq/0
      $ ls -l /mnt/seq/0
      -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
      
      Truncation to 0 size allows freeing the file zone storage space and
      restarting append-writes to the file.
      
      $ truncate -s 0 /mnt/seq/0
      $ ls -l /mnt/seq/0
      -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
      
      Since files are statically mapped to zones on the disk, the number of
      blocks of a file as reported by stat() and fstat() indicates the size
      of the file zone.
      
      $ stat /mnt/seq/0
        File: /mnt/seq/0
        Size: 0       Blocks: 524288     IO Block: 4096   regular empty file
      Device: 870h/2160d      Inode: 50431       Links: 1
      Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/  root)
      Access: 2019-11-25 13:23:57.048971997 +0900
      Modify: 2019-11-25 13:52:25.553805765 +0900
      Change: 2019-11-25 13:52:25.553805765 +0900
       Birth: -
      
      The number of blocks of the file ("Blocks") in units of 512B blocks
      gives the maximum file size of 524288 * 512 B = 256 MB, corresponding
      to the device zone size in this example. Of note is that the "IO block"
      field always indicates the minimum IO size for writes and corresponds
      to the device physical sector size.
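
The conversion from the "Blocks" count to a byte size is a single multiplication, sketched here as a small helper. The 512-byte unit is taken from the text above; the macro and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* "Blocks" is counted in 512-byte units, per the text above. */
#define STAT_BLOCK_SIZE 512u

/* Convert a stat "Blocks" count into the maximum file size in bytes. */
static uint64_t blocks_to_bytes(uint64_t st_blocks)
{
    assert(st_blocks <= UINT64_MAX / STAT_BLOCK_SIZE);   /* no overflow */
    return st_blocks * STAT_BLOCK_SIZE;
}
```

For the 524288 blocks in the stat output above, this gives 268435456 bytes, the same value passed to truncate earlier to fill the zone.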
      
      This code contains contributions from:
      * Johannes Thumshirn <jthumshirn@suse.de>,
      * Darrick J. Wong <darrick.wong@oracle.com>,
      * Christoph Hellwig <hch@lst.de>,
      * Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
      * Ting Yao <tingyao@hust.edu.cn>.
      
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      8dcc1a9d
    • Al Viro's avatar
      fold struct fs_parameter_enum into struct constant_table · 5eede625
      Al Viro authored
      
      
      no real difference now
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5eede625
    • Al Viro's avatar
      fs_parse: get rid of ->enums · 2710c957
      Al Viro authored
      
      
      Don't do a single array; attach them to fsparam_enum() entry
      instead.  And don't bother trying to embed the names into those -
      it actually loses memory, with no real speedup worth mentioning.
      
      Simplifies validation as well.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2710c957
    • Al Viro's avatar
      Pass consistent param->type to fs_parse() · 0f89589a
      Al Viro authored
      
      
      As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable;
      both get fs_value_is_string for ->type and NULL for ->string.  To make
      it even more unpleasant, that combination is impossible to produce with
      fsconfig().
      
      Much saner rules would be
              "foo"           => fs_value_is_flag, NULL
              "foo="          => fs_value_is_string, ""
              "foo=bar"       => fs_value_is_string, "bar"
      All cases are distinguishable, all results are expressible by fsconfig(),
      ->has_value checks are much simpler that way (to the point of the field
      being useless) and quite a few regressions go away (gfs2 has no business
      accepting -o nodebug=, for example).
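
The three rules can be expressed as a tiny classifier that looks for the first '=' in the option string. This is a userspace sketch of the rules only: `enum value_type_model` and `classify_option()` are hypothetical names and do not correspond to vfs_parse_fs_string() or any other kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the two value types named in the rules. */
enum value_type_model { VALUE_IS_FLAG, VALUE_IS_STRING };

/*
 * Classify one mount option:
 *   "foo"     -> flag,   *value = NULL
 *   "foo="    -> string, *value = ""
 *   "foo=bar" -> string, *value = "bar"
 */
static enum value_type_model classify_option(const char *opt,
                                             const char **value)
{
    const char *eq;

    assert(opt != NULL && value != NULL);

    eq = strchr(opt, '=');
    if (!eq) {
        *value = NULL;
        return VALUE_IS_FLAG;
    }
    *value = eq + 1;   /* may point at the terminating NUL: empty string */
    return VALUE_IS_STRING;
}
```

A caller that wants to reject "nodebug=" only has to check that a flag-style option did not come back as a string.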
      
      Partially based upon patches from Miklos.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      0f89589a
  8. Feb 06, 2020
    • Tariq Toukan's avatar
      net/mlx5: Deprecate usage of generic TLS HW capability bit · 61c00cca
      Tariq Toukan authored
      
      
      Deprecate the generic TLS cap bit, use the new TX-specific
      TLS cap bit instead.
      
      Fixes: a12ff35e ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      61c00cca
    • Qian Cai's avatar
      skbuff: fix a data race in skb_queue_len() · 86b18aaa
      Qian Cai authored
      
      
      sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
      
       read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
        unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
      				 net/unix/af_unix.c:1761
        ____sys_sendmsg+0x33e/0x370
        ___sys_sendmsg+0xa6/0xf0
        __sys_sendmsg+0x69/0xf0
        __x64_sys_sendmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
        __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
        __skb_try_recv_datagram+0xbe/0x220
        unix_dgram_recvmsg+0xee/0x850
        ____sys_recvmsg+0x1fb/0x210
        ___sys_recvmsg+0xa2/0xf0
        __sys_recvmsg+0x66/0xf0
        __x64_sys_recvmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Since only the read is lockless, load tearing could introduce a logic
      bug in unix_recvq_full(). Fix it by adding
      a lockless variant of skb_queue_len() and unix_recvq_full() where
      READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
      the commit d7d16a89 ("net: add skb_queue_empty_lockless()").
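
The pairing of a marked read with a marked write can be modelled in userspace with volatile accesses. The macros below are simplified stand-ins written for this example (they rely on the GCC/Clang `__typeof__` extension) and are not the kernel's READ_ONCE()/WRITE_ONCE(); the queue type and function names are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins: force a single, untorn access via volatile. */
#define READ_ONCE_MODEL(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_MODEL(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical queue head; only the length counter is modelled. */
struct queue_model {
    unsigned int qlen;
};

/* Writer side: the caller holds the queue lock, but marks the store. */
static void queue_push_locked(struct queue_model *q)
{
    assert(q != NULL);
    WRITE_ONCE_MODEL(q->qlen, q->qlen + 1);
}

/* Reader side: no lock held, so the load is marked as well. */
static unsigned int queue_len_lockless(const struct queue_model *q)
{
    assert(q != NULL);
    return READ_ONCE_MODEL(q->qlen);
}

/* Lockless "is the receive queue over its limit" check. */
static bool recvq_full_lockless(const struct queue_model *q,
                                unsigned int max_backlog)
{
    return queue_len_lockless(q) > max_backlog;
}
```

The result of the lockless check can still be stale by the time it is used; the marking only guarantees that the value read was one that was actually stored.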
      
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86b18aaa
  9. Feb 05, 2020
    • Geert Uytterhoeven's avatar
      of: clk: Make <linux/of_clk.h> self-contained · 5df86714
      Geert Uytterhoeven authored
      
      
      Depending on include order:
      
          include/linux/of_clk.h:11:45: warning: ‘struct device_node’ declared inside parameter list will not be visible outside of this definition or declaration
           unsigned int of_clk_get_parent_count(struct device_node *np);
      						 ^~~~~~~~~~~
          include/linux/of_clk.h:12:43: warning: ‘struct device_node’ declared inside parameter list will not be visible outside of this definition or declaration
           const char *of_clk_get_parent_name(struct device_node *np, int index);
      					       ^~~~~~~~~~~
          include/linux/of_clk.h:13:31: warning: ‘struct of_device_id’ declared inside parameter list will not be visible outside of this definition or declaration
           void of_clk_init(const struct of_device_id *matches);
      				   ^~~~~~~~~~~~
      
      Fix this by adding forward declarations for struct device_node and
      struct of_device_id.
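
A forward declaration is enough for a header that only passes pointers around, because the compiler does not need the struct layout to declare a pointer parameter. The sketch below shows the pattern in one file; the struct definition in the second half and the `node_name_model()` helper are hypothetical and stand in for whatever header would normally provide the full type.

```c
#include <assert.h>
#include <stddef.h>

/* Header part: forward declarations make the prototype self-contained. */
struct device_node;
struct of_device_id;

const char *node_name_model(const struct device_node *np);

/*
 * Elsewhere: the full definition, as if pulled in from another header.
 * This layout is invented for the example.
 */
struct device_node {
    const char *name;
};

const char *node_name_model(const struct device_node *np)
{
    assert(np != NULL);
    return np->name;
}
```

Without the forward declaration, the struct named in the prototype would be scoped to the parameter list alone, which is exactly what the compiler warnings quoted above complain about.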
      
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Link: https://lkml.kernel.org/r/20200205194649.31309-1-geert+renesas@glider.be
      
      
      Signed-off-by: Stephen Boyd <sboyd@kernel.org>
      5df86714
    • Eric Dumazet's avatar
      bonding/alb: properly access headers in bond_alb_xmit() · 38f88c45
      Eric Dumazet authored
      
      
      syzbot managed to send an IPX packet through bond_alb_xmit()
      and af_packet and triggered a use-after-free.
      
      First, bond_alb_xmit() was using ipx_hdr() helper to reach
      the IPX header, but ipx_hdr() was using the transport offset
      instead of the network offset. In the particular syzbot
      report, the transport offset was 0xFFFF.
      
      This patch removes ipx_hdr() since it was only (mis)used from bonding.
      
      Then we need to make sure IPv4/IPv6/IPX headers are pulled
      in skb->head before dereferencing anything.
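
The underlying rule, "make sure the header bytes are really there before reading them", can be illustrated with a bounds check on a flat buffer. This is a userspace model only: `struct pkt_model` and `header_at()` are hypothetical, and the kernel relies on its own skb helpers rather than anything shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet view: linear data plus the network header offset. */
struct pkt_model {
    const uint8_t *data;
    size_t len;              /* bytes available in data */
    size_t network_offset;   /* where the network header starts */
};

/*
 * Return a pointer to the network header only if hdr_len bytes starting
 * at network_offset lie inside the buffer; otherwise return NULL.
 */
static const uint8_t *header_at(const struct pkt_model *p, size_t hdr_len)
{
    assert(p != NULL);

    if (p->network_offset > p->len)
        return NULL;
    if (hdr_len > p->len - p->network_offset)
        return NULL;
    return p->data + p->network_offset;
}
```

With an offset of 0xFFFF on a short packet, as in the syzbot report, the first check already fails and the caller never dereferences the header.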
      
      BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
      Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
       (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)
      
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53
       [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282
       [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline]
       [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline]
       [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
       [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
       [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
       [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
       [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
       [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
       [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
       [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline]
       [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
       [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
       [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
       [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline]
       [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
       [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684
       [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996
       [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline]
       [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38f88c45
    • Andy Shevchenko's avatar
      8b7a07c7
    • Andy Shevchenko's avatar
      net: dsa: b53: Platform data shan't include kernel.h · e22e0790
      Andy Shevchenko authored
      
      
      Replace with appropriate types.h.
      
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e22e0790
  10. Feb 04, 2020