Bug 108980 - GF117: MMIO write of 0000001f FAULT at 6013d4 [ IBUS ]
Summary: GF117: MMIO write of 0000001f FAULT at 6013d4 [ IBUS ]
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-08 14:49 UTC by Johnny B. Goode
Modified: 2019-04-21 06:00 UTC
CC List: 3 users

See Also:
i915 platform:
i915 features:


Attachments
nouveau debug (24.05 KB, application/x-bzip)
2018-12-09 04:02 UTC, Johnny B. Goode
GF117 rom (35.77 KB, application/x-bzip)
2018-12-09 04:04 UTC, Johnny B. Goode
avoid touching falcon registers on fini (1.22 KB, patch)
2018-12-12 02:43 UTC, Ilia Mirkin
fix gf117 volt speedo fuse id (1.15 KB, patch)
2018-12-13 04:55 UTC, Ilia Mirkin

Description Johnny B. Goode 2018-12-08 14:49:13 UTC
Kernel 4.10 and later, up to the latest:

# dmesg | grep nouveau
[   15.161745] nouveau: detected PR support, will not use DSM
[   15.163013] nouveau 0000:08:00.0: enabling device (0006 -> 0007)
[   15.164408] nouveau 0000:08:00.0: NVIDIA GF117 (0d7000a2)
[   15.186628] nouveau 0000:08:00.0: bios: version 75.17.86.00.04
[   15.271378] nouveau 0000:08:00.0: fb: 2048 MiB DDR3
[   15.289248] nouveau 0000:08:00.0: volt: couldn't find speedo value, volting not possible
[   15.289262] nouveau 0000:08:00.0: bus: MMIO write of 0000001f FAULT at 6013d4 [ IBUS ]
[   15.935388] nouveau 0000:08:00.0: DRM: VRAM: 2048 MiB
[   15.936749] nouveau 0000:08:00.0: DRM: GART: 1048576 MiB
[   15.938436] nouveau 0000:08:00.0: DRM: Pointer to TMDS table invalid
[   15.939583] nouveau 0000:08:00.0: DRM: DCB version 4.0
[   15.942999] nouveau 0000:08:00.0: DRM: MM: using COPY0 for buffer copies
[   15.944156] [drm] Initialized nouveau 1.3.1 20120801 for 0000:08:00.0 on minor 1

Before kernel 4.10.x (in this example, 4.9.140):

# dmesg | grep nouveau
[   15.755065] nouveau: detected PR support, will not use DSM
[   15.756863] nouveau 0000:08:00.0: enabling device (0006 -> 0007)
[   15.758852] nouveau 0000:08:00.0: NVIDIA GF117 (0d7000a2)
[   15.785141] nouveau 0000:08:00.0: bios: version 75.17.86.00.04
[   15.868828] nouveau 0000:08:00.0: fb: 2048 MiB DDR3
[   16.510223] nouveau 0000:08:00.0: DRM: VRAM: 2048 MiB
[   16.510227] nouveau 0000:08:00.0: DRM: GART: 1048576 MiB
[   16.510232] nouveau 0000:08:00.0: DRM: Pointer to TMDS table invalid
[   16.510236] nouveau 0000:08:00.0: DRM: DCB version 4.0
[   16.510240] nouveau 0000:08:00.0: DRM: Pointer to flat panel table invalid
[   16.640514] nouveau 0000:08:00.0: DRM: MM: using COPY0 for buffer copies
[   16.640527] [drm] Initialized nouveau 1.3.1 20120801 for 0000:08:00.0 on minor 1
[   22.007459] nouveau 0000:08:00.0: DRM: evicting buffers...
[   22.008617] nouveau 0000:08:00.0: DRM: waiting for kernel channels to go idle...
[   22.009842] nouveau 0000:08:00.0: DRM: suspending client object trees...
[   22.011334] nouveau 0000:08:00.0: DRM: suspending kernel object tree...

Hardware:

Dell Inspiron 3542 (0652).

# lspci -vvv
08:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/810M/820M / GT 620M/625M/630M/720M] (rev a1)
	Subsystem: Dell GeForce 820M
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 49
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at d000 [size=128]
	Expansion ROM at f7000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee003b8  Data: 0000
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

Distribution: Fedora, Gentoo
Comment 1 Ilia Mirkin 2018-12-08 17:07:18 UTC
GF117 doesn't have a display unit. It looks like something is writing to the VGA registers anyway.

No real harm, but we should avoid it.
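The following sketch shows why a VGA register access would surface as a fault at 6013d4. It rests on a simplifying assumption: on this GPU generation, the legacy VGA I/O ports are mirrored into MMIO at 0x601000 + port. Under that assumption, the standard VGA CRTC index port 0x3d4 maps to 0x6013d4. The 0x1f in the log would then be the CRTC register index being selected, because VGA writes the index to 0x3d4 before writing the value to 0x3d5. The helper names are hypothetical and are not nouveau functions.

```c
#include <stdint.h>

/* Assumed base of the legacy VGA I/O-port mirror in MMIO (simplified model). */
#define VGA_MMIO_BASE 0x601000u

/* Standard VGA CRTC ports: index register, then data register. */
#define VGA_CRTC_INDEX 0x3d4u
#define VGA_CRTC_DATA  0x3d5u

/* Hypothetical helper: MMIO address that a port accessor would touch. */
uint32_t vga_port_to_mmio(uint16_t port)
{
    return VGA_MMIO_BASE + port;
}

/* Hypothetical model of one CRTC register write: the index goes to the
 * index port first, then the value to the data port. On a chip with no
 * display unit, both accesses land in an unbacked MMIO range. */
void crtc_write_addrs(uint32_t addr[2])
{
    addr[0] = vga_port_to_mmio(VGA_CRTC_INDEX);
    addr[1] = vga_port_to_mmio(VGA_CRTC_DATA);
}
```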
Comment 2 Ilia Mirkin 2018-12-08 18:19:34 UTC
Could you include your vbios? It's easiest to grab it from /sys/kernel/debug/dri/*/vbios.rom.

Also a boot with nouveau.debug=trace would help narrow down the source, even if it's not the bios init logic.
Comment 3 Johnny B. Goode 2018-12-09 04:02:58 UTC
Created attachment 142760
nouveau debug
Comment 4 Johnny B. Goode 2018-12-09 04:04:24 UTC
Created attachment 142761
GF117 rom
Comment 5 Ilia Mirkin 2018-12-09 04:59:24 UTC
This is quite confusing. It would appear it's happening in volt's oneinit function:

[   17.475047] kernel: nouveau 0000:08:00.0: volt: init running...
[   17.475749] kernel: nouveau 0000:08:00.0: volt: one-time init running...
[   17.476499] kernel: nouveau 0000:08:00.0: volt: couldn't find speedo value, volting not possible
[   17.476513] kernel: nouveau 0000:08:00.0: bus: MMIO write of ffff881f FAULT at 6013d4 [ IBUS ]
[   17.477150] kernel: nouveau 0000:08:00.0: volt: one-time init completed in 668us
[   17.478504] kernel: nouveau 0000:08:00.0: volt: current voltage unknown
[   17.479174] kernel: nouveau 0000:08:00.0: volt: init completed in 3426us

However, nothing in there could cause that. Of course, the notification only comes in later (via interrupt servicing), so it may not be directly correlated with these messages.

Auditing the other common suspects: nothing in the bios will try to use any of the IO/etc. commands that generally trigger these.

AHA! I have a theory.

devinit's preinit calls nvkm_lockvgac(). Can you try commenting that out?

drivers/gpu/drm/nouveau/nvkm/subdev/devinit/base.c:nvkm_devinit_preinit -- just comment out the call to "nvkm_lockvgac".

If that doesn't help, then I'm out of ideas, and we have to take more of a shotgun approach -- add a WARN() at the top of drivers/gpu/drm/nouveau/nvkm/engine/disp/vga.c:nvkm_wrport and rdport, and see who's calling them. No one should be on a GF117, I believe.
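The "see who's calling them" approach has a userspace analog. In this sketch, a macro passes `__func__` from every call site into a traced wrapper. In the kernel, a WARN() inside the accessor would dump the full call stack. This sketch records only the immediate caller. `traced_wrport`, `wrport` and `example_preinit` are hypothetical names and do not correspond to nouveau's API.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Name of the most recent caller (hypothetical debugging aid). */
static char last_vga_caller[64];

/* Hypothetical stand-in for a VGA port write: logs instead of touching MMIO. */
void traced_wrport(const char *caller, int head, uint16_t port, uint8_t data)
{
    /* A kernel WARN() here would print a full backtrace; this sketch only
     * remembers and prints the immediate caller. */
    snprintf(last_vga_caller, sizeof(last_vga_caller), "%s", caller);
    fprintf(stderr, "VGA write: head %d, port 0x%03x, data 0x%02x, from %s()\n",
            head, (unsigned)port, (unsigned)data, caller);
}

/* Every call site passes its own function name automatically. */
#define wrport(head, port, data) traced_wrport(__func__, (head), (port), (data))

/* Returns nonzero if the last traced VGA access came from `name`. */
int last_vga_caller_is(const char *name)
{
    return strcmp(last_vga_caller, name) == 0;
}

/* Hypothetical caller standing in for a preinit hook. */
void example_preinit(void)
{
    wrport(0, 0x3d4, 0x1f);
}
```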
Comment 6 Johnny B. Goode 2018-12-09 09:19:02 UTC
After commenting out "nvkm_lockvgac"

dmesg | grep nouveau
[   15.460433] nouveau: detected PR support, will not use DSM
[   15.460460] nouveau 0000:08:00.0: enabling device (0006 -> 0007)
[   15.542557] nouveau 0000:08:00.0: NVIDIA GF117 (0d7000a2)
[   15.572939] nouveau 0000:08:00.0: bios: version 75.17.86.00.04
[   15.656684] nouveau 0000:08:00.0: fb: 2048 MiB DDR3
[   15.674657] nouveau 0000:08:00.0: volt: couldn't find speedo value, volting not possible
[   15.674671] nouveau 0000:08:00.0: bus: MMIO read of 00000000 FAULT at 084048 [ IBUS ]
[   16.325269] nouveau 0000:08:00.0: DRM: VRAM: 2048 MiB
[   16.326428] nouveau 0000:08:00.0: DRM: GART: 1048576 MiB
[   16.327626] nouveau 0000:08:00.0: DRM: Pointer to TMDS table invalid
[   16.328896] nouveau 0000:08:00.0: DRM: DCB version 4.0
[   16.331925] nouveau 0000:08:00.0: DRM: MM: using COPY0 for buffer copies
[   16.333193] [drm] Initialized nouveau 1.3.1 20120801 for 0000:08:00.0 on minor 1

Now it is an MMIO read of 00000000 FAULT at 084048 [ IBUS ].
Comment 7 Ilia Mirkin 2018-12-09 15:35:06 UTC
OK, I see what's going on. This is going to be a game of whack-a-mole.

We make a bunch of MMIO accesses to "illegal" areas. The "error" buffer is only one entry deep, i.e. it keeps track of only one such error. When we enable reporting, it reports whichever one it holds, either the first or the last such error. With the VGA write fixed, now it's something else.
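A one-entry error latch explains the whack-a-mole effect. This model assumes nothing about the real bus-error registers. `fault_latch` and its helpers are hypothetical. The `keep_first` flag selects between the two possibilities mentioned: hold the first fault until it is cleared, or overwrite it with the latest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-entry fault latch. */
struct fault_latch {
    bool valid;       /* slot currently holds a fault */
    bool keep_first;  /* true: first fault wins; false: latest overwrites */
    uint32_t addr;    /* faulting MMIO address */
    bool write;       /* true for a write, false for a read */
};

/* Record a faulting access; with keep_first, later faults are silently lost. */
void latch_record(struct fault_latch *l, uint32_t addr, bool write)
{
    if (l->valid && l->keep_first)
        return;
    l->valid = true;
    l->addr = addr;
    l->write = write;
}

/* Called once reporting is enabled: hand back the single latched fault,
 * if there is one, and clear the slot. */
bool latch_report(struct fault_latch *l, uint32_t *addr, bool *write)
{
    if (!l->valid)
        return false;
    *addr = l->addr;
    *write = l->write;
    l->valid = false;
    return true;
}
```

The three bad accesses from this bug can be recorded in the order they were uncovered: 0x6013d4, 0x084048, 0x0212cc. With `keep_first` set, only 0x6013d4 is reported. Remove that access and 0x084048 surfaces next. With overwrite semantics, the model would report 0x0212cc instead.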

84048 is PVLD.ACCESS_EN. I guess we're accessing it before it's enabled? It looks like this is coming from nvkm_falcon_fini (which is, counter-intuitively, called as part of the initialization flow).

The logic in core/subdev.c is to first run ->fini() and then call nvkm_mc_reset(), which turns the bit that enables the engine off and then back on.
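This failure mode can be modeled in a few lines. The toy model assumes only that an access to an engine whose PMC.ENABLE bit is clear counts as a fault. `toy_dev` and the fini variants are hypothetical names. `TOY_ENGINE_BIT` is an arbitrary bit, not PVLD's real one. The model follows the order described: ->fini() first, then the off-then-on reset. It shows that a fini which checks the enable bit first avoids the fault. This is one possible approach, not necessarily what the attached patch does.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ENGINE_BIT (1u << 0)  /* arbitrary bit, not a real PMC.ENABLE bit */

/* Hypothetical device: PMC.ENABLE-style mask plus a count of bad accesses. */
struct toy_dev {
    uint32_t pmc_enable;
    unsigned faults;
};

/* Reading a register of a disabled engine counts as a fault. */
static uint32_t toy_rd32(struct toy_dev *d, uint32_t engine_bit)
{
    if (!(d->pmc_enable & engine_bit))
        d->faults++;
    return 0;
}

/* fini that touches engine registers unconditionally */
static void fini_unguarded(struct toy_dev *d, uint32_t bit)
{
    (void)toy_rd32(d, bit);
}

/* fini that only touches registers if the engine is enabled */
static void fini_guarded(struct toy_dev *d, uint32_t bit)
{
    if (d->pmc_enable & bit)
        (void)toy_rd32(d, bit);
}

/* Order described above: ->fini() first, then pulse the enable bit. */
void toy_subdev_init(struct toy_dev *d, uint32_t bit, bool guarded)
{
    if (guarded)
        fini_guarded(d, bit);
    else
        fini_unguarded(d, bit);
    d->pmc_enable &= ~bit;  /* reset: off ... */
    d->pmc_enable |= bit;   /* ... then back on */
}
```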

Ben - what's the proper way to deal with this? Tweak PMC.ENABLE in nvkm_device_fini? [We could always just clear out those errors before first enabling the bus intr reporting, but that would just be sticking our heads in the sand...]

In the meanwhile, I'll work on a proper patch for the vgalock thing.
Comment 8 Ilia Mirkin 2018-12-09 17:03:59 UTC
... and of course fixing that vga thing isn't as easy as it looks.

nvkm_device_preinit first calls all the preinits and THEN computes the disable mask. So when nvkm_devinit_preinit runs, it doesn't yet have the disable mask that tells it the display unit is fused off.

And unfortunately I don't know why that logic is there in the first place, so this is one for Ben as well. Perhaps it's a holdover from the time of the ancients, and can be restricted to family < NV_50 entirely?
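The ordering problem can be reduced to a toy model. It is not nouveau's code. `toy_device`, `TOY_DISP_BIT` and the helpers are hypothetical, and setting `touched_vga` stands in for the nvkm_lockvgac() call. The model shows that if the preinit hooks run before the disable mask is computed, the devinit hook cannot know that the display unit is fused off. Whether the real driver can compute the mask earlier is the open question above. Restricting the call to family < NV_50, the other option raised, would be a single guard instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DISP_BIT (1u << 0)  /* arbitrary bit meaning "display unit disabled" */

/* Hypothetical device state for the ordering question. */
struct toy_device {
    uint32_t disable_mask;   /* computed from fuses */
    bool fused_no_display;   /* what the hardware actually says */
    bool touched_vga;        /* set if the VGA lock/unlock would have run */
};

static void toy_devinit_preinit(struct toy_device *dev)
{
    /* Only touch VGA if the display unit is not known to be disabled. */
    if (!(dev->disable_mask & TOY_DISP_BIT))
        dev->touched_vga = true;  /* stands in for nvkm_lockvgac() */
}

static void toy_compute_disable_mask(struct toy_device *dev)
{
    if (dev->fused_no_display)
        dev->disable_mask |= TOY_DISP_BIT;
}

/* Current order: preinit hooks first, disable mask afterwards. */
void toy_preinit_current(struct toy_device *dev)
{
    toy_devinit_preinit(dev);
    toy_compute_disable_mask(dev);
}

/* Reordered: compute the disable mask before any preinit hook runs. */
void toy_preinit_reordered(struct toy_device *dev)
{
    toy_compute_disable_mask(dev);
    toy_devinit_preinit(dev);
}
```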
Comment 9 Ilia Mirkin 2018-12-12 02:43:08 UTC
Created attachment 142783
avoid touching falcon registers on fini

See what triggers this next. (Keep your lockvga patch in place too. Still working on that one.)
Comment 10 Johnny B. Goode 2018-12-12 18:12:34 UTC
It's like chasing rabbits.

dmesg | grep nouveau
[   14.869172] nouveau: detected PR support, will not use DSM
[   14.869201] nouveau 0000:08:00.0: enabling device (0006 -> 0007)
[   14.979493] nouveau 0000:08:00.0: NVIDIA GF117 (0d7000a2)
[   15.005578] nouveau 0000:08:00.0: bios: version 75.17.86.00.04
[   15.089370] nouveau 0000:08:00.0: fb: 2048 MiB DDR3
[   15.106637] nouveau 0000:08:00.0: volt: couldn't find speedo value, volting not possible
[   15.106651] nouveau 0000:08:00.0: bus: MMIO read of 00000000 FAULT at 0212cc [ IBUS ]
[   15.753323] nouveau 0000:08:00.0: DRM: VRAM: 2048 MiB
[   15.754432] nouveau 0000:08:00.0: DRM: GART: 1048576 MiB
[   15.755693] nouveau 0000:08:00.0: DRM: Pointer to TMDS table invalid
[   15.756797] nouveau 0000:08:00.0: DRM: DCB version 4.0
[   15.760007] nouveau 0000:08:00.0: DRM: MM: using COPY0 for buffer copies
[   15.761582] [drm] Initialized nouveau 1.3.1 20120801 for 0000:08:00.0 on minor 1
Comment 11 Ilia Mirkin 2018-12-13 04:08:20 UTC
(In reply to Johnny B. Goode from comment #10)
> [   15.106637] nouveau 0000:08:00.0: volt: couldn't find speedo value,
> volting not possible
> [   15.106651] nouveau 0000:08:00.0: bus: MMIO read of 00000000 FAULT at
> 0212cc [ IBUS ]

$ lookup -a d7 212cc
PFUSE.FUSES.SPEEDO => 0

Ben - do we need to enable something more? We're going through gf100_fuse_read, which is pretty careful about enabling / disabling fuses around the read. Is fuse 0x1cc just not available on GF117?
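The fault address and the fuse id differ by a fixed offset. This sketch assumes the fuse read targets base + id, with the base inferred from this bug rather than from documentation: 0x0212cc - 0x1cc = 0x021100. It does not model the enable/disable sequence that gf100_fuse_read wraps around the access. The helper names are hypothetical.

```c
#include <stdint.h>

/* Fuse window base inferred from this bug: 0x0212cc - 0x1cc = 0x021100. */
#define FUSE_WINDOW_BASE 0x021100u

/* Hypothetical helper: MMIO address read for fuse `id`. */
uint32_t fuse_id_to_mmio(uint32_t id)
{
    return FUSE_WINDOW_BASE + id;
}

/* Hypothetical helper: fuse id behind a faulting MMIO address. */
uint32_t mmio_to_fuse_id(uint32_t addr)
{
    return addr - FUSE_WINDOW_BASE;
}
```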
Comment 12 Ilia Mirkin 2018-12-13 04:55:52 UTC
Created attachment 142798
fix gf117 volt speedo fuse id

Legit bug here -- looks like the speedo fuse moved to the Kepler location on GF117. (Still in the old place on GF119, which logically came before GF117.)
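The per-chip split can be sketched as a lookup on the chipset byte. That byte is 0xd7 for GF117, as in the `lookup -a d7` call and the 0d7000a2 id in the logs, and 0xd9 for GF119. This is a hypothetical illustration, not the attached patch. `speedo_layout_for()` and the enum are invented names, and the Kepler fuse offset itself is deliberately not encoded. Treating every chipset from 0xe0 upward as using the Kepler layout is also a simplifying assumption.

```c
/* Hypothetical classification of where the speedo fuse lives. */
enum speedo_layout {
    SPEEDO_FERMI,   /* original location (fuse 0x1cc) */
    SPEEDO_KEPLER,  /* Kepler location (offset not modeled here) */
};

/* chipset: e.g. 0xd7 for GF117, 0xd9 for GF119. */
enum speedo_layout speedo_layout_for(unsigned chipset)
{
    if (chipset == 0xd7)    /* GF117: fuse moved to the Kepler location */
        return SPEEDO_KEPLER;
    if (chipset >= 0xe0)    /* assumed Kepler-layout range for this sketch */
        return SPEEDO_KEPLER;
    return SPEEDO_FERMI;    /* other Fermi parts, including GF119 (0xd9) */
}
```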
Comment 13 Johnny B. Goode 2018-12-13 18:09:03 UTC
Now it is clean:

dmesg | grep nouveau
[   14.413889] nouveau: detected PR support, will not use DSM
[   14.414961] nouveau 0000:08:00.0: enabling device (0006 -> 0007)
[   14.416192] nouveau 0000:08:00.0: NVIDIA GF117 (0d7000a2)
[   14.438204] nouveau 0000:08:00.0: bios: version 75.17.86.00.04
[   14.522882] nouveau 0000:08:00.0: fb: 2048 MiB DDR3
[   15.185221] nouveau 0000:08:00.0: DRM: VRAM: 2048 MiB
[   15.186151] nouveau 0000:08:00.0: DRM: GART: 1048576 MiB
[   15.187077] nouveau 0000:08:00.0: DRM: Pointer to TMDS table invalid
[   15.188001] nouveau 0000:08:00.0: DRM: DCB version 4.0
[   15.190738] nouveau 0000:08:00.0: DRM: MM: using COPY0 for buffer copies
[   15.191642] [drm] Initialized nouveau 1.3.1 20120801 for 0000:08:00.0 on minor 0
Comment 14 Ilia Mirkin 2018-12-13 18:14:49 UTC
Woohoo! I'll submit my 2 patches upstream, but we'll have to figure out what to do about the lockvga situation ... we do it REALLY early right now.
Comment 15 Johnny B. Goode 2018-12-13 18:19:26 UTC
Super. Thank you :)
Comment 16 Johnny B. Goode 2019-04-21 06:00:28 UTC
Kernel 5 generates more faults than earlier versions.

# uname -a
Linux 5.0.9-gentoo #1 SMP Sun Apr 21 04:31:13 CEST 2019 x86_64 Intel(R) Core(TM) i3-4030U CPU @ 1.90GHz GenuineIntel GNU/Linux

# journalctl -b -1 --no-hostname -o short-monotonic | grep nouveau
[    5.629856] kernel: nouveau: detected PR support, will not use DSM
[    5.629893] kernel: nouveau 0000:08:00.0: enabling device (0006 -> 0007)
[    5.630103] kernel: nouveau 0000:08:00.0: NVIDIA GF117 (0d7000a2)
[    5.785996] kernel: nouveau 0000:08:00.0: bios: version 75.17.86.00.04
[    5.870928] kernel: nouveau 0000:08:00.0: fb: 2048 MiB DDR3
[    6.527871] kernel: nouveau 0000:08:00.0: DRM: VRAM: 2048 MiB
[    6.527873] kernel: nouveau 0000:08:00.0: DRM: GART: 1048576 MiB
[    6.527876] kernel: nouveau 0000:08:00.0: DRM: Pointer to TMDS table invalid
[    6.527878] kernel: nouveau 0000:08:00.0: DRM: DCB version 4.0
[    6.529675] kernel: nouveau 0000:08:00.0: DRM: MM: using COPY0 for buffer copies
[    6.529850] kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:08:00.0 on minor 1
[    7.681975] kernel: nouveau 0000:08:00.0: bus: MMIO write of 0000001f FAULT at 6013d4 [ IBUS ]
[    7.682002] kernel: nouveau 0000:08:00.0: bus: MMIO write of badf1001 FAULT at 50405c [ IBUS ]
[   36.995439] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[   47.395823] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  186.745334] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  186.745358] kernel: nouveau 0000:08:00.0: bus: MMIO write of badf1001 FAULT at 50405c [ IBUS ]
[  308.263103] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  325.400452] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  345.742702] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  360.574654] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  374.383876] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  523.869265] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  554.220856] kernel: nouveau 0000:08:00.0: bus: MMIO write of 04048d1f FAULT at 6013d4 [ IBUS ]
[  655.092116] kernel: nouveau 0000:08:00.0: bus: MMIO write of 0000001f FAULT at 6013d4 [ IBUS ]
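A small parser can tally the IBUS fault lines by address, which puts a number on "more faults". It uses only the C standard library. It assumes each line contains the `MMIO <read|write> of <data> FAULT at <addr>` text exactly as printed above. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ADDRS 16

/* Per-address fault counts, in order of first appearance. */
struct fault_tally {
    unsigned addr[MAX_ADDRS];
    unsigned count[MAX_ADDRS];
    int n;
};

/* Parse one dmesg/journal line; returns 1 if it was an MMIO fault line. */
int tally_line(struct fault_tally *t, const char *line)
{
    const char *p = strstr(line, "MMIO ");
    unsigned addr;

    /* Skip the direction and data fields, capture only the address. */
    if (!p || sscanf(p, "MMIO %*s of %*x FAULT at %x", &addr) != 1)
        return 0;
    for (int i = 0; i < t->n; i++) {
        if (t->addr[i] == addr) {
            t->count[i]++;
            return 1;
        }
    }
    if (t->n < MAX_ADDRS) {  /* new address: start counting it */
        t->addr[t->n] = addr;
        t->count[t->n] = 1;
        t->n++;
    }
    return 1;
}
```

Fed the fourteen fault lines above, it would count 12 faults at 6013d4 and 2 at 50405c.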

