- Dec 11, 2018
-
-
Jonathan Marek authored
derived from the a3xx driver and tested on the following hardware: imx51-zii-rdu1 (a200 with 128kb gmem) imx53-qsrb (a200) msm8060-tenderloin (a220) Signed-off-by:
Jonathan Marek <jonathan@marek.ca> Reviewed-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sharat Masetty authored
When the userspace tries to read the crashstate dump, the read side implementation in the driver currently ascii85 encodes all the binary buffers and it does this each time the read system call is called. A userspace tool like cat typically does a page by page read and the number of read calls depends on the size of the data captured by the driver. This is certainly not desirable and does not scale well with large captures. This patch encodes the buffer only once in the read path. With this there is an immediate >10X speed improvement in crashstate save time. Signed-off-by:
Sharat Masetty <smasetty@codeaurora.org> Reviewed-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
For reasons that I'm sure made perfect sense at the time we were opting to defer the iova alloc / pin on the ringbuffer until HW init time so when we moved to iova reference counting we ended up adding a reference count every time the hardware started. Not that it mattered (because the ring is always around) but it did make the debug output look odd. Allocate and pin the iova at create time instead. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
For debugging purposes it is useful to assign descriptions to buffers so that we know what they are used for. Add a field to the buffer object and use that to name the various kernel side allocations which ends up looking like like this in /d/dri/X/gem: flags id ref offset kaddr size madv name 00040000: I 0 ( 1) 00000000 0000000070b79eca 00004096 memptrs vmas: [gpu: 01000000,mapped,inuse=1] 00020000: I 0 ( 1) 00000000 0000000031ed4074 00032768 ring0 Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Add a reference count to track how many times a particular chunk of iova memory is pinned (mapped) in the iomu and add msm_gem_unpin_iova to give up references. It is important to note that msm_gem_unpin_iova replaces msm_gem_put_iova because the new implicit behavior that an assigned iova in a given vma is now valid for the life of the buffer and what we are really focusing on is the use of that iova. For now the unmappings are lazy; once the reference counts go to zero they *COULD* be unmapped dynamically but that will require an outside force such as a shrinker or mm_notifiers. For now, we're just focusing on getting the counting right and setting ourselves up to be ready for the future. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Add a new function to get and pin the iova memory in one step (basically renaming the old msm_gem_get_iova function) and switch msm_gem_get_iova() to only allocate an iova but not map it in the IOMMU. This is only currently used by msm_ioctl_gem_info() since all other users of of the iova expect that the memory be immediately available. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Buffer objects allocated with msm_gem_kernel_new() are mostly freed the same way so we can save a few lines of code with a common function. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
The a6xx GPU state allocates a LOT of memory. Add a bit of infrastructure to track the memory allocations in the GPU structure and delete them when the state is destroyed much the same way that devm works with the device model as a whole. This protects against the developer accidentally forgetting to add a kfree() to an ever growing list. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Add support for gathering and dumping the a6xx GPU state including registers, GMU registers, indexed registers, shader blocks, context clusters and debugbus. v2: Fix bugs discovered by Sharat Masetty Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
If the GPU target doesn't define a list of registers then gracefully skip capturing and/or printing them. This is used by more complex targets like 6xx that have other means of capturing register values. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
The gpu_poll_timeout() function can be useful to multiple targets so mvoe it into adreno_gpu.h from the a5xx code. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Add trace events to track the progress of a GPU submission msm_gpu_submit occurs at the beginning of the submissions, msm_gpu_submit_flush happens when the submission is put on the ringbuffer and msm_submit_flush_retired is sent when the operation is retired. To make it easier to track the operations a unique sequence number is assigned to each submission and displayed in each event output so a human or a script can easily associate the events related to a specific submission. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Add infrastructure to track statistics for GPU submissions by sampling certain perfcounters before and after a submission. To store the statistics, the per-ring memptrs region is expanded to include room for up to 64 entries - this should cover a reasonable amount of inflight submissions without worrying about losing data. The target specific code inserts PM4 commands to sample the counters before and after submission and store them in the data region. The CPU can access the data after the submission retires to make sense of the statistics and communicate them to the user. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Mamta Shukla authored
Use DRM_DEV_INFO/ERROR/WARN instead of dev_info/err/debug to generate drm-formatted specific log messages so that it will be easy to differentiate in case of multiple instances of driver. Signed-off-by:
Mamta Shukla <mamtashukla555@gmail.com> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
- Oct 23, 2018
-
-
Johan Hovold authored
Use the new of_get_compatible_child() helper to lookup the legacy pwrlevels child node instead of using of_find_compatible_node(), which searches the entire tree from a given start node and thus can return an unrelated (i.e. non-child) node. This also addresses a potential use-after-free (e.g. after probe deferral) as the tree-wide helper drops a reference to its first argument (i.e. the probed device's node). While at it, also fix the related child-node reference leak. Fixes: e2af8b6b ("drm/msm: gpu: Use OPP tables if we can") Cc: stable <stable@vger.kernel.org> # 4.12 Cc: Jordan Crouse <jcrouse@codeaurora.org> Cc: Rob Clark <robdclark@gmail.com> Cc: David Airlie <airlied@linux.ie> Signed-off-by:
Johan Hovold <johan@kernel.org> Signed-off-by:
Rob Herring <robh@kernel.org>
-
Mamta Shukla authored
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR because its better to have inlined function rather than code-opened implementation. Signed-off-by:
Mamta Shukla <mamtashukla555@gmail.com> Reviewed-by:
Sean Paul <sean@poorly.run> Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181018204815.GA23390@armorer
-
- Oct 09, 2018
-
-
Sean Paul authored
This patch uses the proper do_div() macro to perform u64 division and guards against overflow if the result is too large for the unsigned long return type Fixes: a2c3c0a5 drm/msm/a6xx: Add devfreq support for a6xx Cc: Sharat Masetty <smasetty@codeaurora.org> Signed-off-by:
Sean Paul <seanpaul@chromium.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sean Paul authored
A small fixup I posted with my v2 patch [1] that was dropped. [1]- https://lists.freedesktop.org/archives/freedreno/2018-October/003647.html Signed-off-by:
Sean Paul <seanpaul@chromium.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
- Oct 07, 2018
-
-
Jordan Crouse authored
The CP performance counter selects were accidentally marked as protected so they couldn't be written from PM4 streams. Remove the protection because user space does have an interest in setting up their own counters. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sean Paul authored
This patch uses the proper do_div() macro to perform u64 division and guards against overflow if the result is too large for the unsigned long return type Fixes: de0a3d09 drm/msm: re-factor devfreq code Cc: Sharat Masetty <smasetty@codeaurora.org> Signed-off-by:
Sean Paul <seanpaul@chromium.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
- Oct 04, 2018
-
-
Sharat Masetty authored
Implement routines to estimate GPU busy time and fetching the current frequency for the polling interval. This is required by the devfreq framework which recommends a frequency change if needed. The driver code then tries to set this new frequency on the GPU by sending an Out Of Band(OOB) request to the GMU. Signed-off-by:
Sharat Masetty <smasetty@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sharat Masetty authored
The devfreq framework requires the drivers to provide busy time estimations. The GPU driver relies on the hardware performance counteres for the busy time estimations, but different hardware revisions have counters which can be sourced from different clocks. So the busy time estimation will be target dependent. Additionally on targets where the clocks are completely controlled by the on chip microcontroller, fetching and setting the current GPU frequency will be different. This patch aims to embrace these differences by re-factoring the devfreq code a bit. Signed-off-by:
Sharat Masetty <smasetty@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sharat Masetty authored
Add a simple function to read 64 registers in the GMU domain Signed-off-by:
Sharat Masetty <smasetty@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
The target definition for a630 didn't set a reasonable value for inactive_period so it defaulted to zero and we were essentially powering down after every submission. Set it back to the default value to keep the GPU from bouncing too much during regular workloads. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Thomas Zimmermann authored
This patch unifies the naming of DRM functions for reference counting of struct drm_gem_object. The resulting code is more aligned with the rest of the Linux kernel interfaces. Signed-off-by:
Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
The only HFI communication with the GMU on sdm845 happens during initialization and all commands are synchronous. A fancy interrupt tasklet and associated infrastructure is entirely not eeded and puts us at the mercy of the scheduler. Instead poll for the message signal and handle the response immediately and go on our way. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
The point of the 'force_dma' parameter for of_dma_configure is to force the device to be set up even if DMA capability is not described by the firmware which is exactly the use case we have for GMU - we need SMMU to get set up but we have no other dma capabilities since memory is managed by the GPU driver. Currently we pass false so of_dma_configure() fails and subsequently GMU and GPU probe does as well. Fixes: 4b565ca5 ("drm/msm: Add A6XX device support") Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Tested-by:
Sibi Sankar <sibis@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sharat Masetty authored
In the case where preemption is not enabled, this patch simply skips preemption related initialization in hardware init sequence. Signed-off-by:
Sharat Masetty <smasetty@codeaurora.org> Reviewed-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
The current design greedily takes a big chunk of the PDC register space instead of just the GPU specific sections which conflicts with other drivers and generally makes a mess of things. Furthermore we only need to map the GPU PDC sections just once during init so map the memory inside the function that uses it and adjust the pointers and register offsets accordingly. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
>From the review for the DT bindings for the GPU/GMU it was suggested that the phandle for the GMU be 'qcom,gmu' instead of just 'gmu'. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Sharat Masetty authored
The index of the perf table was being set in the wrong bit position in the register. With this fix, the GPU clock can be seen running at desired frequency. Signed-off-by:
Sharat Masetty <smasetty@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
- Aug 10, 2018
-
-
Colin Ian King authored
Trivial fix to spelling mistake in dev_err message and comment Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Add support for the A6XX family of Adreno GPUs. The biggest addition is the GMU (Graphics Management Unit) which takes over most of the power management of the GPU itself but in a ironic twist of fate needs a goodly amount of management itself. Add support for the A6XX core code, the GMU and the HFI (hardware firmware interface) queue that the CPU uses to communicate with the GMU. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Rob Clark authored
Resync generated headers to pull in a6xx registers. Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
Failure to load firmware is the primary reason to fail adreno_load_gpu(). Try to load it first before going into the hardware initialization code and unwinding it. This is important for a6xx because the GMU gets loaded from the runtime power code and it is more costly to fail in that path because of missing firmware. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
- Aug 05, 2018
-
-
Kees Cook authored
In the quest to remove all stack VLA usage from the kernel[1], this switches to using a kasprintf()ed buffer. Return paths are updated to free the allocation. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
- Jul 30, 2018
-
-
Arnd Bergmann authored
All users of do_gettimeofday() have been removed, but this one recently crept in, along with an incorrect printing of the microseconds portion. This converts it to using ktime_get_real_timespec64() as a direct replacement, and adds the leading zeroes. I considered using monotonic times (ktime_get()) instead, but as this timestamp appears to only be used for humans rather than compared with other timestamps, the real time domain is probably good enough. Fixes: e43b045e2c82 ("drm/msm/gpu: Capture the state of the GPU") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-
Jordan Crouse authored
For hangs, dump copy out the contents of the buffer objects attached to the guilty submission and print them in the crash dump report. Signed-off-by:
Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by:
Rob Clark <robdclark@gmail.com>
-