  1. Oct 14, 2022
    • drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n · 2130b87b
      Nathan Chancellor authored
      
      After commit 8799c0be ("drm/amd/display: Fix vblank refcount in vrr
      transition"), a build with CONFIG_DEBUG_FS=n is broken due to a
      misplaced brace, along the lines of:
      
        In file included from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_trace.h:39,
                         from drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:41:
        drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: At top level:
        ./include/drm/drm_atomic.h:864:9: error: expected identifier or ‘(’ before ‘for’
          864 |         for ((__i) = 0;                                                 \
              |         ^~~
        drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8317:9: note: in expansion of macro ‘for_each_new_crtc_in_state’
         8317 |         for_each_new_crtc_in_state(state, crtc, new_crtc_state, j)
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Move the brace within the #ifdef so that the file can be built with or
      without CONFIG_DEBUG_FS.
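
      The failure mode can be reproduced in miniature.  The sketch below is
      not the amdgpu code: CONFIG_DEBUG_FS is reused only as a preprocessor
      symbol, and crc_enabled() and count_active() are hypothetical names.
      It shows an "if" block whose opening and closing braces both sit
      inside the same #ifdef, so the braces balance whether or not the
      symbol is defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef CONFIG_DEBUG_FS
/* Hypothetical stand-in for a check that only exists with debugfs. */
static bool crc_enabled(int crtc)
{
	return crtc % 2 == 0;
}
#endif

/*
 * Count the non-negative entries in 'crtcs'.  With CONFIG_DEBUG_FS
 * defined, entries accepted by crc_enabled() are counted twice.
 */
size_t count_active(const int *crtcs, size_t n)
{
	size_t i, active = 0;

	assert(crtcs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (crtcs[i] < 0)
			continue;
#ifdef CONFIG_DEBUG_FS
		if (crc_enabled(crtcs[i])) {
			active++;
		}	/* closing brace stays inside the #ifdef */
#endif
		active++;
	}
	return active;
}
```

      If the closing brace were moved below the #endif, a build without
      the symbol would see one "}" too many, end the function early, and
      leave the following statements at file scope, which matches the
      kind of "At top level" error quoted above.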
      
      Fixes: 8799c0be ("drm/amd/display: Fix vblank refcount in vrr transition")
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Oct 13, 2022
    • nouveau/dmem: evict device private memory during release · 24988123
      Alistair Popple authored
      When the module is unloaded or a GPU is unbound from the module it is
      possible for device private pages to still be mapped in currently running
      processes.  This can lead to hangs and RCU stall warnings when unbinding
      the device, as memunmap_pages() will wait in an uninterruptible state until
      all device pages have been freed, which may never happen.
      
      Fix this by migrating device mappings back to normal CPU memory prior to
      freeing the GPU memory chunks and associated device private pages.
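
      The ordering described above can be sketched as a userspace toy
      model.  Nothing here is a kernel or Nouveau API: toy_page, toy_chunk,
      toy_chunk_evict() and toy_chunk_release() are hypothetical names, and
      the page size is arbitrary.  The point is only that eviction copies
      every in-use page back to host memory before release is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE	64	/* arbitrary toy size, not the kernel's */
#define TOY_CHUNK_PAGES	4

/* Toy model of one device private page. */
struct toy_page {
	bool in_use;				/* still mapped by a process */
	unsigned char *host_copy;		/* set once migrated to host */
	unsigned char data[TOY_PAGE_SIZE];	/* contents in "device" memory */
};

struct toy_chunk {
	struct toy_page pages[TOY_CHUNK_PAGES];
};

/*
 * Migrate every in-use page back to host memory.
 * Returns the number of pages evicted, or -1 if allocation fails.
 */
int toy_chunk_evict(struct toy_chunk *chunk)
{
	int i, evicted = 0;

	assert(chunk != NULL);

	for (i = 0; i < TOY_CHUNK_PAGES; i++) {
		struct toy_page *p = &chunk->pages[i];

		if (!p->in_use)
			continue;
		p->host_copy = malloc(TOY_PAGE_SIZE);
		if (!p->host_copy)
			return -1;
		memcpy(p->host_copy, p->data, TOY_PAGE_SIZE);
		p->in_use = false;	/* the device page is now free */
		evicted++;
	}
	return evicted;
}

/*
 * Release succeeds only when no page is in use.  The toy returns false
 * where, per the text above, memunmap_pages() would keep waiting.
 */
bool toy_chunk_release(const struct toy_chunk *chunk)
{
	int i;

	for (i = 0; i < TOY_CHUNK_PAGES; i++)
		if (chunk->pages[i].in_use)
			return false;
	return true;
}
```

      Calling toy_chunk_release() before toy_chunk_evict() fails while any
      page is mapped; calling it afterwards succeeds, and the data remains
      reachable through host_copy.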
      
      Link: https://lkml.kernel.org/r/66277601fb8fda9af408b33da9887192bf895bda.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nouveau/dmem: refactor nouveau_dmem_fault_copy_one() · d9b71939
      Alistair Popple authored
      nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
      the migrate_to_ram() callback and is used to copy data from GPU to CPU
      memory.  It is currently specific to fault handling, however a future
      patch implementing eviction of data during teardown needs similar
      functionality.
      
      Refactor out the core functionality so that it is not specific to fault
      handling.
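
      The shape of such a refactor can be sketched in userspace C.  This is
      not the Nouveau code: toy_copy_to_host(), toy_fault_copy_one(),
      toy_evict_copy_one() and struct toy_fault are hypothetical names.
      The core copy takes only a source and a destination, and the
      fault-specific bookkeeping lives in a thin wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64	/* arbitrary toy size */

/* State that only the fault path needs. */
struct toy_fault {
	const unsigned char *src;	/* page in "device" memory */
	unsigned char *dst;		/* destination page in host memory */
	int handled;			/* set to 1 once the copy succeeded */
};

/* Core step shared by both paths: copy one page from device to host. */
static int toy_copy_to_host(const unsigned char *dev, unsigned char *host)
{
	if (!dev || !host)
		return -1;
	memcpy(host, dev, TOY_PAGE_SIZE);
	return 0;
}

/* Fault path: fault bookkeeping wrapped around the core copy. */
int toy_fault_copy_one(struct toy_fault *f)
{
	int ret;

	assert(f != NULL);
	ret = toy_copy_to_host(f->src, f->dst);
	f->handled = (ret == 0);
	return ret;
}

/* Teardown path: reuses the core copy without any fault state. */
int toy_evict_copy_one(const unsigned char *dev, unsigned char *host)
{
	return toy_copy_to_host(dev, host);
}
```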
      
      Link: https://lkml.kernel.org/r/20573d7b4e641a78fde9935f948e64e71c9e709e.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: free device private pages have zero refcount · ef233450
      Alistair Popple authored
      Since 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page
      refcount") device private pages have no longer had an extra reference
      count when the page is in use.  However before handing them back to the
      owning device driver we add an extra reference count such that free pages
      have a reference count of one.
      
      This makes it difficult to tell if a page is free or not because both free
      and in use pages will have a non-zero refcount.  Instead we should return
      pages to the driver's page allocator with a zero reference count.  Kernel
      code can then safely use kernel functions such as get_page_unless_zero().
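
      Why a zero count for free pages matters can be shown with a C11
      atomics sketch.  It only models the idea behind
      get_page_unless_zero(); toy_page, toy_get_unless_zero() and
      toy_put() are hypothetical names, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_page {
	atomic_int refcount;	/* 0 means the page is free */
};

/* Take a reference only if the page is not free. */
bool toy_get_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this put made the page free. */
bool toy_put(struct toy_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

      If free pages carried a count of one, toy_get_unless_zero() would
      succeed on them and a caller could not tell a free page from one
      that is in use.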
      
      Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memory.c: fix race when faulting a device private page · 16ce101d
      Alistair Popple authored
      Patch series "Fix several device private page reference counting issues",
      v2
      
      This series aims to fix a number of page reference counting issues in
      drivers dealing with device private ZONE_DEVICE pages.  These result in
      use-after-free type bugs, either from accessing a struct page which no
      longer exists because it has been removed or accessing fields within the
      struct page which are no longer valid because the page has been freed.
      
      During normal usage it is unlikely these will cause any problems.  However
      without these fixes it is possible to crash the kernel from userspace. 
      These crashes can be triggered either by unloading the kernel module or
      unbinding the device from the driver prior to a userspace task exiting. 
      In modules such as Nouveau it is also possible to trigger some of these
      issues by explicitly closing the device file-descriptor prior to the task
      exiting and then accessing device private memory.
      
      This involves some minor changes to both PowerPC and AMD GPU code. 
      Unfortunately I lack hardware to test either of those so any help there
      would be appreciated.  The changes mimic what is done for both Nouveau
      and hmm-tests, though, so I doubt they will cause problems.
      
      
      This patch (of 8):
      
      When the CPU tries to access a device private page the migrate_to_ram()
      callback associated with the pgmap for the page is called.  However no
      reference is taken on the faulting page.  Therefore a concurrent migration
      of the device private page can free the page and possibly the underlying
      pgmap.  This results in a race which can crash the kernel due to the
      migrate_to_ram() function pointer becoming invalid.  It also means drivers
      can't reliably read the zone_device_data field because the page may have
      been freed with memunmap_pages().
      
      Close the race by getting a reference on the page while holding the ptl to
      ensure it has not been freed.  Unfortunately the elevated reference count
      will cause the migration required to handle the fault to fail.  To avoid
      this failure pass the faulting page into the migrate_vma functions so that
      if an elevated reference count is found it can be checked to see if it's
      expected or not.
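
      The "expected or not" check can be sketched as a single-threaded toy
      model.  The names toy_page, toy_fault_pin(), toy_fault_unpin() and
      toy_can_migrate() are hypothetical, locking is left out, and the
      baseline count of one reference per page is an assumption of the
      model rather than the kernel's exact accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int refcount;
};

/* Fault side: pin the faulting page (done under the ptl in the kernel). */
void toy_fault_pin(struct toy_page *page)
{
	assert(page->refcount > 0);	/* the page must not already be free */
	page->refcount++;
}

void toy_fault_unpin(struct toy_page *page)
{
	assert(page->refcount > 1);
	page->refcount--;
}

/*
 * Migration side: allow migration only if every reference is accounted
 * for.  The pin on 'fault_page' is expected; any other extra reference
 * is not.
 */
bool toy_can_migrate(const struct toy_page *page,
		     const struct toy_page *fault_page)
{
	int expected = 1;	/* toy baseline: one reference per page */

	if (page == fault_page)
		expected++;	/* the pin taken by the fault handler */
	return page->refcount == expected;
}
```

      Without the fault_page argument the pinned page would always look
      busy and the migration needed to resolve the fault would fail.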
      
      [mpe@ellerman.id.au: fix build]
        Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
      Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
      Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Oct 12, 2022
  4. Oct 11, 2022
  5. Oct 10, 2022