Bug 98915 - NULL pointer dereference on boot - amdgpu_debugfs_add_files
Summary: NULL pointer dereference on boot - amdgpu_debugfs_add_files
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Default DRI bug account
 
Reported: 2016-11-30 15:33 UTC by Rafael Ristovski
Modified: 2016-12-10 11:28 UTC



Attachments
Kernel log (4.27 KB, text/plain)
2016-11-30 15:33 UTC, Rafael Ristovski

Description Rafael Ristovski 2016-11-30 15:33:48 UTC
Created attachment 128289
Kernel log

When booting linux-next 20161129 or later (if I recall correctly, 20161128 worked fine), the kernel hits a NULL pointer dereference that can be traced to amdgpu_debugfs_add_files.

Kernel log attached.
Comment 1 Rafael Ristovski 2016-11-30 15:50:38 UTC
HW Details:

AMD Radeon HD 8850M (Mobile chipset)


03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Venus PRO [Radeon HD 8850M / R9 M265X] [1002:6823] (prog-if 00 [VGA controller])
	Subsystem: Dell Venus PRO [Radeon HD 8850M / R9 M265X] [1028:05eb]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 48
	Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at c0500000 (64-bit, non-prefetchable) [size=256K]
	Region 4: I/O ports at 3000 [size=256]
	Expansion ROM at c0540000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0f00c  Data: 4172
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [270 v1] #19
	Kernel driver in use: amdgpu
	Kernel modules: radeon, amdgpu
Comment 2 Alex Deucher 2016-12-03 23:37:35 UTC
Can you bisect?
Comment 3 Nicolai Stange 2016-12-05 11:43:16 UTC
(In reply to Alex Deucher from comment #2)
> Can you bisect?

No need: most likely, the offending commit is 8a357d10043c ("drm: Nerf DRM_CONTROL nodes").

Cf. the discussion at http://lkml.kernel.org/r/20161203144700.2307-1-nicstange@gmail.com

A patch for amdgpu is in the works as well.
Comment 4 Rafael Ristovski 2016-12-10 11:28:26 UTC
Fixed as of https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=58309befa82d81f6e9dc36a92d2a339ef2144535

drm/amdgpu: don't add files at control minor debugfs directory
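A minimal userspace sketch of the failure mode and the fix named in the commit title above. It assumes (hypothetically) that the driver used to create its debugfs files under both the primary and the control minor, and that after 8a357d10043c ("drm: Nerf DRM_CONTROL nodes") the control minor pointer is presumably NULL, so looking up its debugfs directory dereferences NULL. The types and function here (`hyp_minor`, `hyp_device`, `hyp_debugfs_add_files`) are hypothetical stand-ins, not the actual amdgpu or DRM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a DRM minor; not the real kernel type. */
struct hyp_minor {
    const char *debugfs_root; /* directory the files would be created under */
};

/* Hypothetical stand-in for a DRM device; not the real kernel type. */
struct hyp_device {
    struct hyp_minor *primary; /* always allocated */
    struct hyp_minor *control; /* assumed NULL once control nodes are gone */
};

/* Sketch of the fixed behavior: files are created under the primary minor
 * only, so dev->control is never dereferenced and may safely be NULL.
 * Returns the number of (pretend) files created. */
static int hyp_debugfs_add_files(const struct hyp_device *dev, int nfiles)
{
    assert(dev != NULL && dev->primary != NULL);

    int created = 0;
    for (int i = 0; i < nfiles; i++) {
        /* Stand-in for the real debugfs file creation. */
        printf("created %s/file%d\n", dev->primary->debugfs_root, i);
        created++;
    }
    return created;
}
```

With `dev.control == NULL` the call above is safe because the control minor is never touched. A NULL check before using `dev->control` would also avoid the crash, but the commit title suggests the driver simply stopped adding files at the control minor directory.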

