Bug 97017 - [SKL GT3e guc bisected] ~10% performance drop in most benchmarks
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: highest blocker
Assignee: Sagar Kamble
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2016-07-21 11:07 UTC by Eero Tamminen
Modified: 2017-07-24 22:41 UTC

See Also:
i915 platform: SKL
i915 features: firmware/guc


Attachments
dmesg b7137e0cf1e55b5b0cb88fbd85425a1bc0d24c3a drm.debug=0xe (137.04 KB, text/plain)
2016-07-21 13:33 UTC, Tomi Sarvela
Enable RC6 immediately (9.68 KB, patch)
2016-07-21 17:31 UTC, Chris Wilson
Update ring freqs after runtime resume (4.20 KB, patch)
2016-07-21 18:05 UTC, Chris Wilson
dmesg 62e1baa128f98006261308182fe3006d66b1bf61 drm.debug=0xe (136.58 KB, text/plain)
2016-07-22 07:31 UTC, Tomi Sarvela

Description Eero Tamminen 2016-07-21 11:07:33 UTC
A change in drm-intel-nightly causes ~10% performance drop in benchmarks:
- 10-12% in GpuTest v0.7 "pixmark" (piano, volplosion, Julia32) tests
- 9-11% in GfxBench 4.x ALU, ALU2 and tessellation tests, both onscreen & offscreen
- 9-12% in SynMark batch, geometry, pixel, vertex and compute shader tests
- 11% in GLB 2.7 Fill tests, 8% in T-Rex & Egypt onscreen, 2% in Egypt & T-Rex offscreen
- 5-6% in Unigine Heaven & Valley

GpuTest and GLB 2.7 tests are run windowed; the rest are run fullscreen at the monitor's native resolution.

Basically everything except CPU bound 3D tests dropped, regardless of whether it's:
- mostly GPU ALU or memory bandwidth limited
- fullscreened or windowed
- onscreen or offscreen

The drop happens only on SKL GT3e; there's no drop on e.g. SKL GT2 or HSW GT3e.

The drop happened between the following drm-intel-nightly commits from the 13th and 15th of July:
- Good: 5cd2699dfcf12a399553c3186b718667523a19fc
- Bad:  30eabcaa6dcea5ca21f7e6c00da7ab4b1910396c

It's fully reproducible just by changing the kernel, but it cannot be bisected automatically by Jenkins because a large number of the commits in between are in a non-buildable state.
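
One possible way around the non-buildable commits, sketched here as an assumption and not as a description of how the Jenkins setup works, is to let `git bisect run` skip them. `git bisect run` treats exit code 0 as good, 125 as "cannot be tested, skip", and any other code from 1 to 127 as bad. The helper name `bisect_verdict`, its arguments and the threshold are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for a `git bisect run` step script.
# It maps one build + benchmark result to the exit codes git bisect understands:
#   0   -> commit is good
#   1   -> commit is bad
#   125 -> commit cannot be tested (e.g. does not build), skip it

# Usage: bisect_verdict BUILD_STATUS SCORE BASELINE THRESHOLD_PCT
# All arguments are integers; SCORE and BASELINE use the same unit.
bisect_verdict() {
    build_status=$1
    score=$2
    baseline=$3
    threshold_pct=$4

    # Non-buildable commit: ask git bisect to skip it.
    if [ "$build_status" -ne 0 ]; then
        return 125
    fi

    # Integer percentage drop relative to the known-good baseline.
    drop_pct=$(( (baseline - score) * 100 / baseline ))

    # A drop at or above the threshold marks the commit as bad.
    if [ "$drop_pct" -ge "$threshold_pct" ]; then
        return 1
    fi
    return 0
}
```

In a setup where each step can build, boot and benchmark the kernel, a wrapper script (call it `bisect-step.sh`, also hypothetical) would end with a call to `bisect_verdict`, and the bisect would be started with `git bisect start <bad> <good>` followed by `git bisect run ./bisect-step.sh`. With a ~10% regression, a threshold of around 5% leaves some margin for run-to-run noise.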
Comment 1 Tomi Sarvela 2016-07-21 11:16:26 UTC
Good: 5cd2699dfcf12a399553c3186b718667523a19fc
2016-07-13_14-43-55 drm-intel-nightly: 2016y-07m-13d-14h-43m-27s UTC integration manifest

Doesn't compile: 2d854c67e3af36b190e8499a3bfad7cdccde0f67
2016-07-14_14-26-01 drm-intel-nightly: 2016y-07m-14d-14h-25m-35s UTC integration manifest

Bad: 30eabcaa6dcea5ca21f7e6c00da7ab4b1910396c
2016-07-15_14-53-45 drm-intel-nightly: 2016y-07m-15d-14h-53m-25s UTC integration manifest
Comment 2 Chris Wilson 2016-07-21 11:35:41 UTC
(In reply to Tomi Sarvela from comment #1)
> Good: 5cd2699dfcf12a399553c3186b718667523a19fc
> 2016-07-13_14-43-55 drm-intel-nightly: 2016y-07m-13d-14h-43m-27s UTC
> integration manifest
> 
> Doesn't compile: 2d854c67e3af36b190e8499a3bfad7cdccde0f67
> 2016-07-14_14-26-01 drm-intel-nightly: 2016y-07m-14d-14h-25m-35s UTC
> integration manifest

Seriously? What's the compilation error?

> Bad: 30eabcaa6dcea5ca21f7e6c00da7ab4b1910396c
> 2016-07-15_14-53-45 drm-intel-nightly: 2016y-07m-15d-14h-53m-25s UTC
> integration manifest

I don't have that commit. What is the shortlog between good/bad?
Comment 3 Tomi Sarvela 2016-07-21 12:36:12 UTC
As requested:

$ git shortlog 5cd2699dfcf12a399553c3186b718667523a19fc..30eabcaa6dcea5ca21f7e6c00da7ab4b1910396c

Aaron Campbell (1):
      iommu/vt-d: Fix infinite loop in free_all_cpu_cached_iovas

Al Viro (2):
      Use the right predicate in ->atomic_open() instances
      nfs_atomic_open(): prevent parallel nfs_lookup() on a negative hashed

Alan Stern (1):
      SCSI: fix new bug in scsi_dev_info_list string matching

Alex Deucher (49):
      drm/amdgpu: load different smc firmware on some CI variants
      drm/radeon: load different smc firmware on some SI variants
      drm/radeon: load different smc firmware on some CI variants
      drm/amdgpu/gfx7: expand cp jt size to handle GDS as well
      drm/radeon/gfx7: expand cp jt size to handle GDS as well
      drm/amdgpu/gfx8: add state setup for CZ/ST GFX power gating
      drm/amdgpu/gfx8: rename some pg functions
      drm/amdgpu: add new GFX powergating types
      drm/amdgpu/gfx8: add powergating support for CZ/ST
      drm/amdgpu/gfx8: clean up polaris11 PG enable
      drm/amdgpu: disable power control on hybrid laptops
      drm/amdgpu: clean up atpx power control handling
      drm/amdgpu: add a delay after ATPX dGPU power off
      drm/amdgpu/atpx: add a query for ATPX dGPU power control
      drm/amdgpu: use PCI_D3hot for PX systems without dGPU power control
      drm/amdgpu/atpx: drop forcing of dGPU power control
      drm/radeon: disable power control on hybrid laptops
      drm/radeon: clean up atpx power control handling
      drm/radeon: add a delay after ATPX dGPU power off
      drm/radeon/atpx: add a query for ATPX dGPU power control
      drm/radeon: use PCI_D3hot for PX systems without dGPU power control
      drm/radeon/atpx: drop forcing of dGPU power control
      drm/amdgpu/atpx: track whether if this is a hybrid graphics platform
      drm/amdgpu/atpx: hybrid platforms use d3cold
      drm/amdgpu: drop explicit pci D3/D0 setting for ATPX power control
      drm/radeon/atpx: track whether if this is a hybrid graphics platform
      drm/radeon/atpx: hybrid platforms use d3cold
      drm/radeon: drop explicit pci D3/D0 setting for ATPX power control
      drm/amdgpu: work around lack of upstream ACPI support for D3cold
      drm/radeon: work around lack of upstream ACPI support for D3cold
      drm/amdgpu: properly clean up runtime pm
      drm/amdgpu/gfx8: fix CP jump table size
      drm/amdgpu/gfx7: fix CP jump table size
      drm/radeon/cik: fix CP jump table size
      drm/amdgpu: disable compute pipeline sync workaround when using fixed fw
      drm/amdgpu/gmc: make some functions static
      drm/amdgpu: drop wait_for_mc_idle asic callback
      drm/amdgpu: move get_gpu_clock_counter into the gfx struct
      drm/amdgpu: move select_se_sh into the gfx struct
      drm/amdgpu/gfx7: switch to using the existing rlc callbacks
      drm/amdgpu/gfx7: make gfx_v7_0_rlc_stop static
      drm/amdgpu/dce11: update async flip update time
      drm/amdgpu/powerplay/cz: add missing call to powergate VCE
      drm/amdgpu: add IP helpers for wait_for_idle and is_idle
      drm/amdgpu: add missing breaks
      drm/amdgpu: skip invalid ip blocks in ip helpers
      drm/amdgpu/gmc8: remove duplicate wait_for_idle functions
      drm/amdgpu/gmc7: remove duplicate wait_for_idle functions
      drm/amdgpu: remove more of the ring backup code

Alex Xie (3):
      drm/amdgpu: Change some variable names to make code easier understood
      drm/amdgpu: Add comment to describe the purpose of one difficult if statement
      drm/amdgpu: Initialize the variables in a straight-forward way

Alexandre Courbot (21):
      drm/nouveau/tegra: fetch gpu_speedo_id
      drm/nouveau/volt/gk20a: make unused public functions static
      drm/nouveau/volt/gk20a: constify and name v_scale
      drm/nouveau/volt/gk20a: rename constructor
      drm/nouveau/volt/gm20b: add support for vmin parameter
      drm/nouveau/clk/gk20a: properly protect macro argument
      drm/nouveau/clk/gk20a: setup slide once during init
      drm/nouveau/clk/gk20a: reorganize MNP calculation a bit
      drm/nouveau/clk/gk20a: use nvkm_ functions in slide()
      drm/nouveau/clk/gk20a: add and use MNP programming functions
      drm/nouveau/clk/gk20a: parameterize PLL settings
      drm/nouveau/clk/gk20a: factorize n_lo computation code
      drm/nouveau/clk/gk20a: improve MNP programming
      drm/nouveau/clk/gk20a: rename constructor
      drm/nouveau/clk/gm20b: add glitchless and DFS support
      drm/nouveau/secboot: fix kerneldoc for secure boot structures
      drm/nouveau/gr/gf100: handle secure boot errors
      drm/nouveau/secboot/gm200: make firmware loading re-callable
      drm/nouveau/secboot: lazy-load firmware and be more resilient
      drm/nouveau/ttm: remove special handling of coherent objects
      drm/nouveau/bus: remove cpu_coherent flag

Alexandre Demers (2):
      drm/amd/powerplay: fix trivial typo and tidy comment
      drm/amd/powerplay: fix typos in comment in polaris' hwmgr

Alexey Dobriyan (1):
      posix_cpu_timer: Exit early when process has been reaped

Arindam Nath (2):
      drm/amd/amdgpu: make sure VCE is disabled by default
      drm/amd/powerplay: make sure VCE is disabled by default

Arnd Bergmann (1):
      amdgpu: use NULL instead of 0 for pointer

Aviv Heller (1):
      bonding: fix enslavement slave link notifications

Axel Lin (1):
      regulator: qcom_smd: Remove list_voltage callback for rpm_smps_ldo_ops_fixed

Ben Skeggs (71):
      drm/nouveau/top: take nvkm_device as argument to public functions
      drm/nouveau/top: add function to lookup interrupt mask for a given device
      drm/nouveau/mc: allow construction of subclassed device
      drm/nouveau/mc: take nvkm_device as argument to public functions
      drm/nouveau/mc: expose device enable/disable separately, as well as reset
      drm/nouveau/mc: s/intr_mask/intr_stat/
      drm/nouveau/mc: support for temporarily masking interrupts from a specific device
      drm/nouveau/mc/gt215: support for masking interrupts
      drm/nouveau/mc/gf100-: support for masking interrupts
      drm/nouveau/mc/gk104-: add pmu reset mask
      drm/nouveau/secboot: use nvkm_mc_intr_mask/unmask()
      drm/nouveau/secboot: use nvkm_mc_enable/disable()
      drm/nouveau/ltc/gm107-: decode interrupt status to human-readable strings
      drm/nouveau/disp/nv50-: fix lookup of udisp table under certain circumstances
      drm/nouveau/fifo/gk104-: translate engidx into human-readable name in debug output
      drm/nouveau/bios: guard against out-of-bounds accesses to image
      drm/nouveau/bios: pointers beyond end of first image need special handling
      drm/nouveau/disp/g94: implement workaround for dvi issue on fx380
      drm/nouveau: prevent oops if no mmu subdev present
      drm/nouveau/fb/gf100-: allow selection of an alternate big page size
      drm/nouveau/core: increase maximum ce instances to 6
      drm/nouveau/core: increase maximum nvenc instances to 3
      drm/nouveau/core: recognise GP100 chipset
      drm/nouveau/top/gp100: initial support
      drm/nouveau/mc/gp100: initial support
      drm/nouveau/pci/gp100: initial support
      drm/nouveau/tmr/gp100: initial support
      drm/nouveau/bios/gp100: initial support
      drm/nouveau/bios/dp: initial support for 4.2
      drm/nouveau/bios/pll: initial support for BIT 'C' version 2
      drm/nouveau/bios/rammap: 32-bit bios pointers
      drm/nouveau/devinit/gp100: initial support
      drm/nouveau/imem/gp100: initial implementation
      drm/nouveau/fb/gp100: initial support
      drm/nouveau/mmu/gp100: initial support
      drm/nouveau/bar/gp100: initial support
      drm/nouveau/bus/gp100: initial support
      drm/nouveau/fuse/gp100: initial support
      drm/nouveau/gpio/gp100: initial support
      drm/nouveau/i2c/gm204: initial support
      drm/nouveau/ibus/gp100: initial support
      drm/nouveau/ltc/gp100: initial support
      drm/nouveau/secboot/gm200: initial support
      drm/nouveau/dma/gp100: initial implementation
      drm/nouveau/disp/gp100: initial support
      drm/nouveau/fifo/gp100: initial support
      drm/nouveau/ce/gp100: initial support
      drm/nouveau/gr/gp100: initial support
      drm/nouveau/sw/gp100: initial support
      drm/nouveau/core: recognise GP104 chipset
      drm/nouveau/top/gp104: initial support
      drm/nouveau/mc/gp104: initial support
      drm/nouveau/pci/gp104: initial support
      drm/nouveau/tmr/gp104: initial support
      drm/nouveau/bios/gp104: initial support
      drm/nouveau/devinit/gp104: initial support
      drm/nouveau/imem/gp104: initial support
      drm/nouveau/fb/gp104: initial support
      drm/nouveau/mmu/gp104: initial support
      drm/nouveau/bar/gp104: initial support
      drm/nouveau/bus/gp104: initial support
      drm/nouveau/fuse/gp104: initial support
      drm/nouveau/gpio/gp104: initial support
      drm/nouveau/i2c/gp104: initial support
      drm/nouveau/ibus/gp104: initial support
      drm/nouveau/ltc/gp104: initial support
      drm/nouveau/dma/gp104: initial support
      drm/nouveau/disp/gp104: initial support
      drm/nouveau/fifo/gp104: initial support
      drm/nouveau/ce/gp104: initial support
      drm/nouveau: check for supported chipset before booting fbdev off the hw

Bhaktipriya Shridhar (1):
      drm/amdkfd: Remove create_workqueue()

Bjørn Mork (1):
      cdc_ncm: workaround for EM7455 "silent" data interface

Bob Liu (1):
      xen-blkfront: save uncompleted reqs in blkfront_resume()

Borislav Petkov (1):
      x86/amd_nb: Fix boot crash on non-AMD systems

Brian King (1):
      ipr: Clear interrupt on croc/crocodile when running with LSI

Bruno Prémont (1):
      qla2xxx: Fix NULL pointer deref in QLA interrupt

Chris J Arges (1):
      ecryptfs: fix spelling mistakes

Chris Wilson (13):
      drm: Don't overwrite user ioctl arg unless requested
      drm/i915: Update ifdeffery for mutex->owner
      drm/i915/breadcrumbs: Queue hangcheck before sleeping
      drm/i915: Flush GT idle status upon reset
      drm/i915: Preserve current RPS frequency across init
      drm/i915: Perform static RPS frequency setup before userspace
      drm/i915: Move overclocking detection to alongside RPS frequency detection
      drm/i915: Define a separate variable and control for RPS waitboost frequency
      drm/i915: Remove superfluous powersave work flushing
      drm/i915: Defer enabling rc6 til after we submit the first batch/context
      drm/i915: Hide gen6_update_ring_freq()
      drm/i915/fbdev: Drain the suspend worker on retiring
      drm/i915/fbdev: Check for the framebuffer before use

Christian König (44):
      drm/amdgpu: fix coding style in the scheduler v2
      drm/amdgpu: remove begin_job/finish_job
      drm/amdgpu: remove duplicated timeout callback
      drm/amdgpu: fix coding style in amdgpu_job_free
      drm/amdgpu: remove use_shed hack in job cleanup
      drm/amdgpu: properly abstract scheduler timeout handling
      drm/amdgpu: move locking into the functions who need it
      drm/amdgpu: fix and cleanup job destruction
      drm/amdgpu: document amdgpu_sync_get_fence
      drm/amdgpu: generalize the scheduler fence
      drm/amdgpu: remove amdgpu_sync_wait
      drm/amdgpu: add optional ring to amdgpu_sync_is_idle
      drm/amdgpu: prefer VMIDs idle on the current ring
      drm/amdgpu: reuse VMIDs assigned to a VM only if there is also a free one
      drm/amdgpu: use a fence array for VMID management
      drm/amdgpu: remove now unnecessary checks
      drm/amdgpu: stop trying to schedule() with a spin held
      drm/ttm: cleanup ttm_tt_(unbind|destroy)
      drm/ttm: remove NULL checks when calling ttm_tt_destroy
      drm/ttm: remove dummy bo_move implementations
      drm/ttm: add wait for idle in all drivers bo_move functions
      drm/ttm: wait for BO idle in ttm_bo_move_memcpy
      drm/ttm: drop wait for idle in ttm_bo_move_buffer
      drm/ttm: drop waiting for idle in ttm_bo_evict.
      drm/ttm: wait for BO idle after the move in ttm_bo_swapout
      drm/amdgpu: sync to buffer moves before VM updates
      drm/amdgpu: remove pre move wait
      drm/ttm: remove no_gpu_wait param from ttm_bo_move_accel_cleanup
      drm/ttm: remove TTM_BO_PRIV_FLAG_MOVING
      drm/ttm: simplify ttm_bo_wait
      drm/ttm: add the infrastructure for pipelined evictions
      drm/amdgpu: save the PD addr before scheduling the job
      drm/amdgpu: pipeline evictions as well
      drm/amdgpu: add eviction counter
      drm/amdgpu: validate VM PTs only on eviction
      drm/amdgpu: implement HDP functions for UVD v2
      drm/amdgpu: don't update page tables for VM emulation
      drm/ttm: wait for eviction in ttm_bo_force_list_clean
      drm/ttm: fix stupid parameter inversion in the pipeline code
      drm/amdgpu: stop disabling irqs when it isn't neccessary
      drm/amdgpu: fix user fence handling once more
      drm/amdgpu: shorten amdgpu_job_free_resources
      drm/amdgpu: earlier free SA resources
      drm/amdgpu: remove fence parameter from amd_sched_job_init

Christophe Jaillet (1):
      fsl/fman: fix error handling

Chunming Zhou (22):
      drm/amdgpu: add gpu reset to timeout handler
      drm/amdgpu: add return value for pci config reset
      drm/amdgpu: enable BUS master after pci reset
      drm/amdgpu: block scheduler when gpu reset
      drm/amdgpu: evict vram when gpu reset
      drm/amdgpu: add amdgpu_irq_gpu_reset_resume_helper
      drm/amdgpu: must update page table after gpu reset
      drm/amdgpu: save/restore bios scratch when gpu reset
      drm/amdgpu: must update page table after gpu reset
      drm/amdgpu: stop/resume fb access when gpu reset V3
      drm/amdgpu: put old hw fence of job if gpu reset
      drm/amdgpu: remove evict vram
      drm/amd: add parent for sched fence
      drm/amd: add amd_sched_hw_job_reset
      drm/amdgpu: block ttm first before parking scheduler
      drm/amdgpu: force completion for gpu reset
      drm/amdgpu: add amd_sched_job_recovery
      drm/amdgpu: add a bool to specify if needing vm flush V2
      drm/amdgpu: abstract amdgpu_vm_is_gpu_reset
      drm/amdgpu: recovery hw jobs when gpu reset V3
      drm/amdgpu: ib test first after gpu reset
      drm/amdgpu: clean up ring_backup code, no need more

Colin Ian King (2):
      drm/vc4: clean up error exit path on failed dpi_connector allocation
      drm/vc4: remove redundant ret status check

Colin Pitrat (1):
      gpio: sch: Fix Oops on module load on Asus Eee PC 1201

Dan Carpenter (1):
      platform/chrome: cros_ec_dev - double fetch bug in ioctl

Daniel Borkmann (1):
      macsec: set actual real device for xmit when !protect_frames

Daniel Jurgens (5):
      net/mlx5: Fix incorrect page count when in internal error
      net/mlx5: Fix wait_vital for VFs and remove fixed sleep
      net/mlx5e: Timeout if SQ doesn't flush during close
      net/mlx5e: Implement ndo_tx_timeout callback
      net/mlx5e: Handle RQ flush in error cases

Daniel Vetter (9):
      Revert "drm: Resurrect atomic rmfb code"
      Merge remote-tracking branch 'origin/drm-intel-next-fixes' into drm-intel-nightly
      Merge remote-tracking branch 'origin/drm-intel-next-queued' into drm-intel-nightly
      Merge remote-tracking branch 'drm-upstream/drm-next' into drm-intel-nightly
      Merge remote-tracking branch 'sound-upstream/for-next' into drm-intel-nightly
      Merge remote-tracking branch 'sound-upstream/for-linus' into drm-intel-nightly
      Merge remote-tracking branch 'origin/topic/drm-misc' into drm-intel-nightly
      Merge remote-tracking branch 'origin/topic/core-for-CI' into drm-intel-nightly
      drm-intel-nightly: 2016y-07m-15d-14h-53m-25s UTC integration manifest

Dave Airlie (12):
      Merge tag 'drm-amdkfd-next-2016-07-03' of git://people.freedesktop.org/~gabbayo/linux into drm-next
      Merge branch 'drm-etnaviv-next' of git://git.pengutronix.de/git/lst/linux into drm-next
      Merge tag 'drm-hisilicon-next-2016-07-04' of github.com:xin3liang/linux into drm-next
      Merge branch 'drm-next-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-next
      Merge branch 'linux-4.8' of git://github.com/skeggsb/linux into drm-next
      Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2016-07-14' of git://anongit.freedesktop.org/drm-intel into drm-fixes
      Merge tag 'topic/drm-misc-2016-07-14' of git://anongit.freedesktop.org/drm-intel into drm-next
      Merge tag 'drm-intel-next-2016-07-11' of git://anongit.freedesktop.org/drm-intel into drm-next
      Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes
      Merge tag 'drm-vc4-next-2016-07-12' of https://github.com/anholt/linux into drm-next
      Merge branch 'exynos-drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next

Dave Gordon (1):
      drm/i915: unify first-stage engine struct setup

Dave Hansen (1):
      x86/cpu: Fix duplicated X86_BUG(9) macro

David Daney (1):
      MIPS: Fix page table corruption on THP permission changes.

David Mao (2):
      drm/amd/amdgpu : Refine tracepoints to track more information
      drm/amd/amdgpu : adding new tracepoints to track memory information.

David S. Miller (4):
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
      Merge branch 'mlx5-fixes'
      packet: Use symmetric hash for PACKET_FANOUT_HASH.
      Revert "fsl/fman: fix error handling"

Edmondo Tommasina (1):
      drm/radeon: allow PACKET3_PFP_SYNC_ME on evergreen

Eric Anholt (2):
      Merge tag 'drm-vc4-fixes-2016-06-06' into drm-vc4-next
      drm/vc4: Bind the HVS before we bind the individual CRTCs.

Eric Dumazet (1):
      bonding: prevent out of bound accesses

Eric Huang (24):
      drm/amdgpu: add powerplay sclk OD support through sysfs (v2)
      drm/amd/powerplay: add sclk OD support on Fiji
      drm/amd/powerplay: add sclk OD support on Tonga
      drm/amd/powerplay: add sclk OD support on Polaris10
      drm/amdgpu: add the new common pm code to select the clock levels
      drm/amdgpu: add the new common pm code to support sclk OD
      drm/amdgpu: add the CI code to enable clock level selection
      drm/amdgpu: add the CI code to enable sclk OD(OverDrive)
      drm/amdgpu: add the common code to support mclk OD
      drm/amdgpu: add mclk OD(overdrive) support for CI
      drm/amd/powerplay: add mclk OD(overdrive) support for Tonga
      drm/amd/powerplay: add mclk OD(overdrive) support for Fiji
      drm/amd/powerplay: add mclk OD(overdrive) support for Polaris10
      drm/amd/powerplay: set UVD clocks bypass mode for Polaris10
      drm/amd/powerplay: keep soft_pp_table pointer value for re-uploading
      drm/amd/powerplay: add event task of disable dynamic state management
      drm/amd/powerplay: add function disable_dpm_tasks for Fiji
      drm/amd/powerplay: add disable dpm tasks for Tonga
      drm/amd/powerplay: add disable dpm tasks for Polaris10
      drm/amd/powerplay: change backend allocation to backend init
      drm/amd/powerplay: add uploading pptable and resetting powerplay support
      drm/amd/powerplay: remove useless pp_table codes for Tonga/Fiji/Polaris10
      drm/amd/powerplay: remove useless soft pptable in Asic related backend
      drm/amdgpu: some improvement in parsing inputs

Florian Fainelli (1):
      net: bcmsysport: Device stats are unsigned long

Frank Binns (1):
      drm/amd/amdgpu: Set DRIVER_MODESET feature flag at build time

Ganapatrao Kulkarni (1):
      arm64: Enable workaround for Cavium erratum 27456 on thunderx-81xx

Ganesh Goudar (1):
      cxgb4: update latest firmware version supported

Haishuang Yan (1):
      geneve: fix max_mtu setting

Hans Verkuil (1):
      [media] v4l2-ioctl: fix stupid mistake in cropcap condition

Huang Rui (4):
      drm/amdgpu: add powercontainment module parameter
      drm/amdgpu: factor out the AMDGPU_INFO_FW_VERSION case branch into amdgpu_firmware_info
      drm/amdgpu: introduce a firmware debugfs to dump all current firmware versions
      drm/amdgpu: change pcie_gen_cap magic code to macro

Hugh Dickins (1):
      tmpfs: fix regression hang in fallocate undo

James Bottomley (1):
      Merge branch 'jejb-fixes' into fixes

James Morse (1):
      arm64: kernel: Save and restore UAO and addr_limit on exception entry

Jan Beulich (4):
      xenbus: don't BUG() on user mode induced condition
      xenbus: don't bail early from xenbus_dev_request_and_reply()
      xenbus: simplify xenbus_dev_request_and_reply()
      xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7

Jarod Wilson (1):
      e1000e: keep Rx/Tx HW_VLAN_CTAG in sync

Jeff Layton (1):
      posix_acl: de-union a_refcount and a_rcu

Jeff Mahoney (2):
      Revert "ecryptfs: forbid opening files without mmap handler"
      ecryptfs: don't allow mmap when the lower fs doesn't support it

Jens Axboe (1):
      Merge branch 'stable/for-jens-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus

Joerg Roedel (1):
      iommu/amd: Fix unity mapping initialization race

Johan Hovold (1):
      Revert "gpiolib: Split GPIO flags parsing and GPIO configuration"

Jon Mason (1):
      MAINTAINERS: Update the Calgary IOMMU entry

Josh Poimboeuf (2):
      perf/x86: Fix 32-bit perf user callgraph collection
      objtool: Fix STACK_FRAME_NON_STANDARD macro checking for function symbols

Julia Lawall (2):
      ecryptfs: drop null test before destroy functions
      drm/nouveau/gr/gk20a: delete unneeded second newline

Junwei Zhang (1):
      drm/amdgpu/dce8: fix flash with white screen on monitor

Karol Herbst (2):
      drm/nouveau/volt: save the voltage range we are able to set
      drm/nouveau/hwmon: add in_min and in_max

Ken Wang (3):
      drm/amdgpu: remove gfx8 registers that vary between asics
      drm/amdgpu: Add a missing register to Polaris golden setting
      drm/amdgpu: fix power distribution issue for Polaris10 XT

Laurent Pinchart (1):
      [media] adv7604: Don't ignore pad number in subdev DV timings pad operations

Linus Torvalds (28):
      Merge tag 'chrome-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
      Merge tag 'sound-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
      Merge tag 'configfs-for-4.7' of git://git.infradead.org/users/hch/configfs
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block
      Merge tag 'pm-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
      Merge tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
      Merge tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux
      Merge tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
      Merge tag 'for-linus-4.7b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
      Merge tag 'iommu-fixes-v4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
      Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
      Linux 4.7-rc7
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
      Merge tag 'qcom-smd-list-voltage' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      Merge tag 'acpi-urgent-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
      Merge tag 'media/v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      Merge branches 'perf-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Linus Walleij (1):
      Revert "gpio: gpiolib-of: Allow compile testing"

Lionel Landwerlin (1):
      drm/i915: add missing condition for committing planes on crtc

Lucas Stach (2):
      drm/etnaviv: improve error reporting in GPU init path
      drm/etnaviv: remove generic GPU init failure reporting

Lukas Wunner (3):
      x86/quirks: Apply nvidia_bugs quirk only on root bus
      x86/quirks: Reintroduce scanning of secondary buses
      x86/quirks: Add early quirk to reset Apple AirPort card

Lv Zheng (3):
      ACPICA: Namespace: Fix namespace/interpreter lock ordering
      ACPI / debugger: Fix regression introduced by IS_ERR_VALUE() removal
      ACPI / EC: Fix code ordering issue in ec_remove_handlers()

Lyude (6):
      drm/radeon: Poll for both connect/disconnect on analog connectors
      drm/amdgpu: Poll for both connect/disconnect on analog connectors
      drm/i915/vlv: Make intel_crt_reset() per-encoder
      drm/i915/vlv: Reset the ADPA in vlv_display_power_well_init()
      drm/i915/vlv: Disable HPD in valleyview_crt_detect_hotplug()
      drm/i915: Enable polling when we don't have hpd

Marek Szyprowski (5):
      drm/exynos: iommu: move dma_params configuration code to separate functions
      drm/exynos: iommu: add a check if all sub-devices have iommu controller
      drm/exynos: iommu: remove unused entries from exynos_drm_private strcuture
      drm/exynos: iommu: move ARM specific code to exynos_drm_iommu.h
      drm/exynos: iommu: add support for ARM64 specific code for IOMMU glue

Marek Vasut (1):
      configfs: Remove ppos increment in configfs_write_bin_file

Mario Kleiner (1):
      drm/vc4: Implement precise vblank timestamping.

Mark Rutland (1):
      perf/core: Fix pmu::filter_match for SW-led groups

Martin KaFai Lau (1):
      ipv6: Fix mem leak in rt6i_pcpu

Masanari Iida (1):
      x86/Documentation: Fix various typos in Documentation/x86/ files

Matt Corallo (1):
      net: stmmac: Fix null-function call in ISR on stmmac1000

Matthew Auld (1):
      drm/i915: remove superfluous i915_gem_object_free_mmap_offset call

Matthew Finlay (1):
      net/mlx5e: Copy all L2 headers into inline segment

Mauro Carvalho Chehab (1):
      Merge tag 'v4.7-rc2' into v4l_for_linus

Michel Dänzer (1):
      drm/amdgpu: Unpin BO if we can't get fences in amdgpu_crtc_page_flip

Mohamad Haj Yahia (4):
      net/mlx5: Fix teardown errors that happen in pci error handler
      net/mlx5: Avoid calling sleeping function by the health poll thread
      net/mlx5: Fix potential deadlock in command mode change
      net/mlx5: Add timeout handle to commands with callback

Monk Liu (2):
      drm/amdgpu: clear RB at ring init
      drm/amdgpu: fix ring debugfs bug

Nicolai Hähnle (5):
      drm/amdgpu: add amdgpu.cg_mask and amdgpu.pg_mask parameters
      drm/amdgpu: remove cgs_acpi_method_argument member method_length
      drm/amdgpu: add disable_cu parameter
      drm/amdgpu/gfx7: set USER_SHADER_ARRAY_CONFIG based on disable_cu parameter
      drm/amdgpu/gfx8: set USER_SHADER_ARRAY_CONFIG based on disable_cu parameter

Oded Gabbay (1):
      drm/amdkfd: destroy mutex if process creation fails

Omar Sandoval (1):
      block: fix use-after-free in sys_ioprio_get()

Or Gerlitz (1):
      net/mlx5: Avoid setting unused var when modifying vport node GUID

Paul Burton (2):
      irqchip/mips-gic: Map to VPs using HW VPNum
      irqchip/mips-gic: Match IPI IRQ domain by bus token only

Peter Chen (5):
      gpu: drm: vc4_hdmi: add missing of_node_put after calling of_parse_phandle
      gpu: drm: omapdrm: connector-dvi: add missing of_node_put after calling of_parse_phandle
      gpu: drm: omapdrm: dss-of: add missing of_node_put after calling of_parse_phandle
      gpu: drm: exynos_hdmi: add missing of_node_put after calling of_parse_phandle
      gpu: drm: arcpgu_drv: add missing of_node_put after calling of_parse_phandle

Peter Zijlstra (2):
      sched/fair: Fix effective_load() to consistently use smoothed load
      sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion

Rafael J. Wysocki (7):
      x86/power/64: Fix kernel text mapping corruption during image restoration
      Merge branches 'pm-cpuidle-fixes' and 'pm-sleep-fixes'
      Merge branches 'acpica-fixes', 'acpi-pci-fixes' and 'acpi-debug-fixes'
      Revert "ACPICA: Namespace: Fix namespace/interpreter lock ordering"
      Revert "ACPICA: Namespace: Fix deadlock triggered by MLC support in dynamic table loading"
      Revert "ACPI 2.0 / AML: Improve module level execution by moving the If/Else/While execution to per-table basis"
      Merge branches 'acpica-fixes' and 'acpi-ec-fixes'

Rana Shahout (2):
      net/mlx5e: Fix select queue callback
      net/mlx5e: Validate BW weight values of ETS

Randy Dunlap (1):
      init/Kconfig: keep Expert users menu together

Rex Zhu (8):
      drm/amd/powerplay: functions's return state was reversed
      drm/amd/powerplay: change condition judgment as function's return value changed.
      drm/amdgpu: get number of shade engine by cgs interface.
      drm/amd/powerplay: add mvdd dpm support.
      drm/amd/powerplay: add shared definitions for di/dt feature.
      drm/amd/powerplay: add definitions related to di/dt feature for fiji and polaris.
      drm/amdgpu: add read/write function for GC CAC programming
      drm/amd/powerplay: don't add invalid voltage.

Richard Alpe (1):
      tipc: fix nl compat regression for link statistics

Rob Herring (1):
      drm: vc4: enable XBGR8888 and ABGR8888 pixel formats

Roy Spliet (2):
      drm/nouveau/clk/gf100-: Clean up PLL locking test
      drm/nouveau/clk/gf100: Read secondary bypass postdiv when required

Russell King (1):
      drm/etnaviv: enable GPU module level clock gating support

Russell King - ARM Linux (1):
      net: mvneta: fix open() error cleanup

Sergio Valverde (1):
      enc28j60: Fix race condition in enc28j60 driver

Shaker Daibes (1):
      net/mlx5e: Log link state changes

Shmulik Ladkani (1):
      ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_output

Shreyas B. Prabhu (1):
      cpuidle: Fix last_residency division

Sinan Kaya (3):
      ACPI,PCI,IRQ: factor in PCI possible
      Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
      ACPI,PCI,IRQ: separate ISA penalty calculation

Sinclair Yeh (7):
      drm/vmwgfx: Add a check to handle host message failure
      drm/vmwgfx: Work around mode set failure in 2D VMs
      drm/vmwgfx: Add an option to change assumed FB bpp
      drm/ttm: Make ttm_bo_mem_compat available
      drm/vmwgfx: Check pin count before attempting to move a buffer
      drm/vmwgfx: Delay pinning fbdev framebuffer until after mode set
      drm/vmwgfx: Fix error paths when mapping framebuffer

Sony Chacko (1):
      qlcnic: add wmb() call in transmit data path.

Soohoon Lee (1):
      usbnet: Stop RX Q on MTU change

Stefan Hauser (1):
      net: phy: dp83867: Fix initialization of PHYCR register

Stephane Eranian (1):
      perf/x86/intel: Update event constraints when HT is off

Tahsin Erdogan (1):
      writeback: inode cgroup wb switch should not call ihold()

Thomas Gleixner (1):
      cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds scribble

Thomas Hellstrom (1):
      drm/vmwgfx: Fix corner case screen target management

Tobias Jakobi (4):
      drm/rockchip: make fbdev support really optional
      drm/rcar-du: make fbdev support really optional
      drm/atmel-hlcdc: make fbdev support really optional
      drm/nouveau: make fbdev support really optional

Tom St Denis (15):
      drm/amdgpu/gfx8: Enable GFX PG on CZ
      drm/amdgpu/gfx8: Add serdes wait for idle in CGCG en/disable
      drm/amd/amdgpu: Convert ring debugfs entries to binary
      drm/amd/amdgpu: ring debugfs is read in increments of 4 bytes
      drm/amdgpu/trace:  Add tracepoints to MMIO read/writes
      drm/amdgpu/gfx8: Switch Stoney to share CZ's RLC functions
      drm/amdgpu/gfx8: Enable CG on Stoney
      drm/amdgpu/gfx8: Enable PG on Stoney
      drm/amdgpu/gfx8: Tidy up various PG helpers
      drm/amdgpu/gfx80:  Add QUICK_PG bit to GFX header and use it.
      drm/amdgpu/uvd6: De-numberify startup
      drm/amd/gfx: add instance field to select_se_sh (v3)
      drm/amd/amdgpu: Add gca config debug entry (v4)
      drm/amd/amdgpu: Add bank selection for MMIO debugfs (v3)
      drm/amd/powerplay:  Unify family defines

Tvrtko Ursulin (6):
      drm/i915: Prepare for engine init unification
      drm/i915: Unify engine init loop
      drm/i915: Make more use of the shared engine irq setup
      drm/i915: Simplify intel_init_ring_buffer prototype
      drm/i915: Move common engine setup into intel_engine_cs.c
      drm/i915: Pull out some more common engine init code

Ursula Braun (1):
      qeth: delete napi struct when removing a qeth device

Vegard Nossum (4):
      RDS: fix rds_tcp_init() error path
      net: fix decnet rtnexthop parsing
      apparmor: fix oops, validate buffer size in apparmor_setprocattr()
      perf/x86: Fix bogus kernel printk, again

Ville Syrjälä (4):
      x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl
      drm/i915: Ignore panel type from OpRegion on SKL
      drm/i915: Unbreak interrupts on pre-gen6
      drm/i915: Ignore panel type from OpRegion on SKL

WANG Cong (1):
      net_sched: fix mirrored packets checksum

Wei Yongjun (1):
      drm/hisilicon: Fix return value check in ade_dts_parse()

Wei Yuan (1):
      eCryptfs: fix typos in comment

Xin Long (1):
      ixgbevf: ixgbevf_write/read_posted_mbx should use IXGBE_ERR_MBX to initialize ret_val

Xinliang Liu (1):
      drm/hisilicon: Fix ADE vblank on/off handling

Zoltan Kuscsik (1):
      drm/hisilicon: add select HISI_KIRIN_DW_DSI

hayeswang (2):
      r8152: clear LINK_OFF_WAKE_EN after autoresume
      r8152: fix runtime function for RTL8152

yanyang1 (1):
      drm/amdgpu: print smc fw info in CGS.
Comment 4 Tomi Sarvela 2016-07-21 13:21:26 UTC
Made the coarse autobisect-script even better with git bisect skip.

Bisected the problem to first bad commit:

BISECT_BEFORE 62e1baa128f98006261308182fe3006d66b1bf61
BISECT_AFTER b7137e0cf1e55b5b0cb88fbd85425a1bc0d24c3a

git://anongit.freedesktop.org/drm-intel

commit b7137e0cf1e55b5b0cb88fbd85425a1bc0d24c3a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jul 13 09:10:37 2016 +0100

    drm/i915: Defer enabling rc6 til after we submit the first batch/context

Manually tested good and bad commit, checks out.

Tested with i915.enable_rc6=0, bad is still bad.

Kernels compiled without debugging options for performance reasons.
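
A bisect like this can be driven by `git bisect run`, which reads the exit status of a test script: 0 means good, 125 means "cannot test this commit, skip it", and any other status from 1 to 127 means bad. The following is only a minimal sketch of such a script, not the autobisect script used here. The baseline score, the 5% tolerance and the `measure_fps` helper are hypothetical placeholders.

```python
"""Hypothetical test script for `git bisect run` (illustration only)."""

GOOD, BAD, SKIP = 0, 1, 125   # exit codes understood by `git bisect run`

BASELINE_FPS = 100.0          # assumed score measured on the known-good commit
TOLERANCE = 0.05              # assumed: more than 5% below baseline counts as bad


def classify(fps, baseline=BASELINE_FPS, tolerance=TOLERANCE):
    """Map a benchmark score to a bisect exit code.

    fps is None when the kernel did not build or boot, so the commit
    cannot be judged and should be skipped.
    """
    if fps is None:
        return SKIP
    return GOOD if fps >= baseline * (1.0 - tolerance) else BAD


def measure_fps():
    """Placeholder: build, boot and benchmark the checked-out commit.

    Returns the score, or None if the commit could not be tested.
    """
    return None


def main():
    """Return the exit code for the currently checked-out commit."""
    return classify(measure_fps())
```

To use it with `git bisect run`, the script would end with `sys.exit(main())` so that the classification becomes the process exit status.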
Comment 5 Eero Tamminen 2016-07-21 13:28:41 UTC
Is there something in i915 rc6 handling that differs between SKL GT2 and GT3e?

PS. Forgot to mention earlier that testing was done on Ubuntu 16.04 + Unity, with kernel/X/Mesa built from Git.  Gap is there both with CPU P-state powersave (Ubuntu default) and performance governors, and whether compiz is compositing doesn't affect the results.
Comment 6 Tomi Sarvela 2016-07-21 13:33:38 UTC
Created attachment 125227 [details]
dmesg b7137e0cf1e55b5b0cb88fbd85425a1bc0d24c3a drm.debug=0xe
Comment 7 Chris Wilson 2016-07-21 15:04:10 UTC
(In reply to Eero Tamminen from comment #5)
> Is there something in i915 rc6 handling that differs between SKL GT2 and
> GT3e?

Actually there is.

NEEDS_WaRsDisableCoarsePowerGating() is skl gt3/gt4.

Not sure how that relates to the patch, comment 6 shows that we are still enabling RC6 pretty early, so it should not be a complete failure...

Tomi, could you also post the dmesg from b7137e0cf1e55b5b0cb88fbd85425a1bc0d24c3a^ (the last good commit)?
Comment 8 Chris Wilson 2016-07-21 15:14:48 UTC
NEEDS_WaRsDisableCoarsePowerGating() also impacts the GuC, it seems - another variable to check (whether or not this bug is affected by enabling/disabling the GuC).
Comment 9 Chris Wilson 2016-07-21 17:31:46 UTC
Created attachment 125236 [details] [review]
Enable RC6 immediately

The complexity of that patch is no longer required, so let's try a simpler version.
Comment 10 Chris Wilson 2016-07-21 18:03:40 UTC
Hmm, that was overkill (Mika will complain again about not waiting for a context before enabling RC6). A better couple of patches would be
https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug97017
Comment 11 Chris Wilson 2016-07-21 18:05:17 UTC
Created attachment 125239 [details] [review]
Update ring freqs after runtime resume

This is the most likely candidate from https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug97017
Comment 12 Tomi Sarvela 2016-07-22 07:31:08 UTC
Created attachment 125252 [details]
dmesg 62e1baa128f98006261308182fe3006d66b1bf61 drm.debug=0xe

Dmesg from last good
Comment 13 Chris Wilson 2016-07-22 08:47:12 UTC
(In reply to Tomi Sarvela from comment #12)
> Created attachment 125252 [details]
> dmesg 62e1baa128f98006261308182fe3006d66b1bf61 drm.debug=0xe
> 
> Dmesg from last good

RC6 is turned on at the same time in both logs, so I think it's not the change in the deferral mechanism.
Comment 14 Chris Wilson 2016-07-25 08:15:41 UTC
Tomi, can you see if the attached https://bugs.freedesktop.org/attachment.cgi?id=125239 helps?
Comment 15 Tomi Sarvela 2016-07-26 10:26:33 UTC
Tested the following combinations
Nightly: Nightly_666, regression noticed, commit 30eabca
Patch: https://bugs.freedesktop.org/attachment.cgi?id=125239
GuC options: i915.enable_guc_submission= i915.enable_guc_loading=

Nightly + GuC enabled = bad
Nightly + GuC disabled = good
Nightly + Patch + GuC enabled = bad
Nightly + Patch + GuC disabled = good
First bad + GuC disabled = good
First bad + GuC default (enabled) = bad
Comment 16 Tomi Sarvela 2016-07-26 10:46:08 UTC
(In reply to Tomi Sarvela from comment #15)
> Tested the following combinations
> Nightly: Nightly_666, regression noticed, commit 30eabca
> Patch: https://bugs.freedesktop.org/attachment.cgi?id=125239
> GuC options: i915.enable_guc_submission= i915.enable_guc_loading=
> 
> Nightly + GuC enabled = bad
> Nightly + GuC disabled = good
> Nightly + Patch + GuC enabled = bad
> Nightly + Patch + GuC disabled = good
> First bad + GuC disabled = good
> First bad + GuC default (enabled) = bad

Last good + GuC enabled = good
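
The combinations above can be checked against a simple hypothesis: a run is slow only when GuC is enabled on a kernel that contains the first bad commit. The sketch below encodes the reported results and that hypothesis. The kernel labels are shorthand chosen for this example, and "last good + GuC disabled" is left out because no result was reported for it.

```python
# Results copied from comments 15 and 16: (kernel, GuC enabled) -> outcome.
results = {
    ("nightly", True): "bad",
    ("nightly", False): "good",
    ("nightly+patch", True): "bad",
    ("nightly+patch", False): "good",
    ("first bad", True): "bad",
    ("first bad", False): "good",
    ("last good", True): "good",
}


def predicted(kernel, guc_enabled):
    """Hypothesis: slow only with GuC enabled on a kernel that has the first bad commit."""
    has_first_bad_commit = kernel != "last good"
    return "bad" if (guc_enabled and has_first_bad_commit) else "good"


# Every reported combination that the hypothesis fails to explain.
mismatches = [key for key, outcome in results.items() if predicted(*key) != outcome]
print(mismatches)  # prints []
```

An empty list means the hypothesis is consistent with every reported combination. It does not prove the cause, and the attached patch makes no difference in either direction.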
Comment 17 Jari Tahvanainen 2016-09-21 11:53:22 UTC
Highest+Blocker due to Regression w/o workaround
Comment 18 Carlos Santa 2016-10-05 01:24:29 UTC
The drop in performance (5-10%) due to the presence of the GuC firmware is still seen when running the benchmarks. The numbers do improve when running it without the GuC.

This is on top of nightly (Oct 3rd) + Mesa 11.01 + GuC/HuC patch series and enabling both i915.enable_guc_loading=2 and i915.enable_guc_submission=2.

@Tomi, when you say "Last good + GuC enabled = good", what was the last good commit? I missed that during my investigation as I thought just introducing the GuC was the cause.
Comment 19 Tomi Sarvela 2016-10-07 07:35:25 UTC
Last good is BISECT_BEFORE 62e1baa128f98006261308182fe3006d66b1bf61
Comment 20 Chris Wilson 2016-10-07 07:44:05 UTC
Note that the WaRsDisableCoarsePowerGating is also meant to be improved by GuC 9.x
Comment 21 Carlos Santa 2016-10-11 02:48:39 UTC
The latest update on this front:

Next steps:

1. Continue experimenting w/ at least 4 more different builds to prove that the performance issue is somehow related to the interdependencies between the changes from the last bad commit (commit b7137e0cf1e55b5b0cb88fbd85425a1bc0d24c3a - "drm/i915: Defer enabling rc6 til after we submit the first batch/context") and having the system w/ GuC enabled for load/submission.

The 4 builds include:

a. Last good commit + no GuC
b. Last good commit + GuC
c. Last bad commit + no GuC
d. Last bad commit + GuC

2. Try with the latest (approved from VPG) GuC w/ version 9.x as suggested by Chris W to see if there is an improvement to WaRsDisableCoarsePowerGating() which is related to the RC6 changes affecting SKL. (Note, on SKL we are in fact using an older version of the GuC).
Comment 22 Carlos Santa 2016-10-18 02:19:25 UTC
Link to the data sheet showing results of GuC 6.1 vs GuC 9.13 using drm-nightly (Oct 17).

https://docs.google.com/a/intel.com/spreadsheets/d/1Y6VBlZsZ6NRRHUeyJtST4XLiEjIOc0SzGV1cjGRM1T0/edit?usp=sharing

The gap is still there (GuC 9.x vs without GuC), but it is especially large only on certain tests (~10%, but not across the board).

See the data above for reference.
Comment 23 Carlos Santa 2016-10-29 01:40:40 UTC
This issue will be investigated by the VPG team. We are able to bring the performance back to the level seen before the GuC by forcing the GPU to run at a lower frequency (i.e., 300MHz). So far the analysis points to an imbalance of the GPU/CPU frequencies when GuC submission is enabled, causing the GPU frequencies to run too high. On certain test loads the resulting lower CPU frequency may be causing the drop in fps.
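
The comment does not say how the frequency was forced. One common way on i915 is the `gt_max_freq_mhz` file that the driver exposes under `/sys/class/drm/card0/`. The sketch below assumes that mechanism, assumes the GPU is `card0`, and needs root on real hardware. The `card_dir` parameter exists only so the function can be tried against a scratch directory.

```python
from pathlib import Path


def cap_gpu_freq(mhz, card_dir="/sys/class/drm/card0"):
    """Write mhz to gt_max_freq_mhz and return the previous cap in MHz.

    Assumption: the i915 sysfs frequency interface is present under card_dir.
    """
    node = Path(card_dir) / "gt_max_freq_mhz"
    previous = int(node.read_text().strip())
    node.write_text(f"{mhz}\n")
    return previous
```

Keeping the returned value makes it easy to restore the original cap after the benchmark run, for example `old = cap_gpu_freq(300)` followed later by `cap_gpu_freq(old)`.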
Comment 24 dog 2016-12-13 14:52:58 UTC
Jeff, can you update this bug with your team's plan to investigate and fix?
Comment 25 dog 2016-12-13 14:54:01 UTC
Jeff, can you update this bug with your team's plan to investigate and fix?
Comment 26 Jeff McGee 2016-12-13 15:41:01 UTC
Assigning to Sagar from my team. He has been leading PnP efforts around GuC. Sagar - can you give a summary?
Comment 27 dog 2016-12-14 00:07:54 UTC
See the description at the top.  If you have new GuC FW that Eero can retest with, let him know.  He can likely retest faster than you can.
Comment 28 Sagar Kamble 2017-01-11 18:15:52 UTC
SLPC Turbo helps resolve these regressions. This was tested with v9 GuC firmware and patches at https://patchwork.freedesktop.org/series/17537/. Perf-meter output shows fluctuations in the frequency with Host RPS, whereas SLPC Turbo keeps running at high frequency. CPG is disabled for the GT3 SKU, so CPG forcewake latency may not be stalling the submission rate as in APL. More debug/analysis in progress.
Comment 29 Sagar Kamble 2017-01-13 10:50:34 UTC
The performance drop is happening due to intermittent lowering of the GT frequency by Host RPS. This lowering is caused by a burst of RP UP interrupts that makes the Host RPS adjustment go bad and overflow negative.

Kernel patch has been posted for trybot testing at https://patchwork.freedesktop.org/patch/133064/.
This fix will apply to all platforms.
Comment 30 yann 2017-01-20 13:35:48 UTC
Eero, as original reporter, can you try Sagar's patchset and confirm then the status?
thanks
Comment 31 Sagar Kamble 2017-01-25 11:17:28 UTC
Performance cycle is completed with fix (with v9 GuC firmware) and all these regressions are fixed.

Results for DRM-Tip 362e5eb + http://pixel.fi.intel.com/~tsa/non-slpc-ww3.5.mbox + v9 GuC firmware at benchsrv
Custom/SKL_6260U_nuci5/2017-01-24T13:45:02Z

Base results:  Custom/SKL_6260U_nuci5/2017-01-09T14:17:38Z
Comment 32 Eero Tamminen 2017-02-13 13:19:48 UTC
(In reply to yann from comment #30)
> Eero, as original reporter, can you try Sagar's patchset and confirm then
> the status?

Fix verification can only be done after the fix actually is in drm-tip (and AFAIK it's not there yet)...

Why is the bug in NEEDINFO state?
Comment 33 Sagar Kamble 2017-02-14 06:37:46 UTC
Although the fix from the GuC perspective is not merged, another patch that makes Host RPS handle erroneous adjustments properly has been merged from https://patchwork.freedesktop.org/series/18252/.

Specifically, the following patch, which is merged in drm-tip, helps resolve the issue:
7e79a68 drm/i915: Set adjustment to zero on Up/Down interrupts if freq is already max/min
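
The mechanism from comment 29 and the idea in this patch title can be illustrated with a toy model. It is not the driver code. It assumes that the frequency step is kept in a signed 32-bit integer and doubles on every consecutive "up" interrupt, and the 300/1000 limits are arbitrary illustration values. Down interrupts, boosts and thresholds are left out.

```python
def to_int32(value):
    """Wrap an integer the way a signed 32-bit C int wraps in practice."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def on_up_interrupt(freq, adj, min_freq, max_freq, reset_at_limit):
    """Process one 'up' interrupt and return the new (freq, adj)."""
    # Assumed policy: the step doubles on every consecutive 'up' interrupt.
    adj = to_int32(adj * 2) if adj > 0 else 1
    if reset_at_limit and freq >= max_freq:
        adj = 0  # idea taken from the patch title: nothing to adjust at the limit
    freq = max(min_freq, min(max_freq, freq + adj))
    return freq, adj


def lowest_freq_during_burst(n_interrupts, reset_at_limit,
                             min_freq=300, max_freq=1000):
    """Start at max_freq, feed a burst of 'up' interrupts, return the lowest freq seen."""
    freq, adj, lowest = max_freq, 0, max_freq
    for _ in range(n_interrupts):
        freq, adj = on_up_interrupt(freq, adj, min_freq, max_freq, reset_at_limit)
        lowest = min(lowest, freq)
    return lowest


print(lowest_freq_during_burst(40, reset_at_limit=False))  # 300: step wrapped negative
print(lowest_freq_during_burst(40, reset_at_limit=True))   # 1000: stays at the limit
```

In the unguarded case the step keeps doubling while the frequency is already clamped at the maximum, so the 32nd interrupt of the burst wraps it to a large negative number and the frequency falls to the minimum. Resetting the step at the limit removes that failure mode in this model.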

With the currently available GuC firmware 6.1 for SKL I could verify this fix for workloads. Scores for 3 runs:

default:
glb_egypt_fixedtime = 129.0 <= 130.0 <= 131.0
gfxbench3_alu_offscreen = 287.9 <= 288.0 <= 288.1

with guc enabled
glb_egypt_fixedtime = 130.0 <= 130.0 <= 130.0
gfxbench3_alu_offscreen = 287.9 <= 288.2 <= 289.5

with fix reverted
glb_egypt_fixedtime = 119.0 <= 119.0 <= 120.0
gfxbench3_alu_offscreen = 254.5 <= 254.5 <= 254.5
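
For comparison with the ~10% drop in the original report, the relative difference can be worked out from the scores above. This sketch takes the middle figure of each line for the "with guc enabled" and "with fix reverted" cases. Treating that middle figure as the representative score is an assumption of this example.

```python
# Middle figures copied from the score lines above.
with_fix = {"glb_egypt_fixedtime": 130.0, "gfxbench3_alu_offscreen": 288.2}
fix_reverted = {"glb_egypt_fixedtime": 119.0, "gfxbench3_alu_offscreen": 254.5}


def drop_percent(good, bad):
    """Relative drop from good to bad, in percent of good."""
    return (good - bad) / good * 100.0


for name in with_fix:
    drop = drop_percent(with_fix[name], fix_reverted[name])
    print(f"{name}: {drop:.1f}% lower with the fix reverted")
# glb_egypt_fixedtime: 8.5% lower with the fix reverted
# gfxbench3_alu_offscreen: 11.7% lower with the fix reverted
```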

If this bug is not gated by GuC submission enabling we can close this.
Hence I am marking this as resolved. Kindly correct if this needs to be changed.

