56461 – NV11 black screen & kernel hang on loading nouveaufb

Summary: NV11 black screen & kernel hang on loading nouveaufb

Status:	RESOLVED FIXED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/nouveau
Version:	git
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Nouveau Project
QA Contact:	Xorg Project Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-10-27 13:17 UTC by Chris Paulson-Ellis
Modified:	2013-04-18 18:29 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
dmesg output prior to modprobe nouveau (48.79 KB, text/plain) 2012-10-27 13:17 UTC, Chris Paulson-Ellis
serial console output after modprobe nouveau - until kernel hang (3.56 KB, text/plain) 2012-10-27 13:19 UTC, Chris Paulson-Ellis
dmesg output for working kernel (55.62 KB, text/plain) 2012-10-27 13:20 UTC, Chris Paulson-Ellis
Video BIOS as produced by vbtracetool (56.00 KB, application/octet-stream) 2012-11-12 22:09 UTC, Chris Paulson-Ellis
fix (554 bytes, patch) 2012-11-12 23:24 UTC, Marcin Slusarz
console output of hang with patch (2.91 KB, text/plain) 2012-11-13 01:09 UTC, Chris Paulson-Ellis
timer fix (830 bytes, patch) 2012-11-13 07:40 UTC, Marcin Slusarz
timer fix, take two (654 bytes, patch) 2012-12-09 23:31 UTC, Emil Velikov
Disable TV detection (583 bytes, patch) 2012-12-11 21:27 UTC, Emil Velikov
console output for last working commit (4.91 KB, text/plain) 2012-12-12 00:08 UTC, Chris Paulson-Ellis
console output for first hanging commit (3.65 KB, text/plain) 2012-12-12 00:09 UTC, Chris Paulson-Ellis
dpms fix (520 bytes, patch) 2012-12-12 19:35 UTC, Marcin Slusarz
fixes currently being applied to nouveau HEAD (1.32 KB, text/plain) 2012-12-13 22:05 UTC, Chris Paulson-Ellis
console output with current HEAD + fixes (3.47 KB, text/plain) 2012-12-13 22:06 UTC, Chris Paulson-Ellis
console output for working 70790f4f + dpms fix (7.27 KB, text/plain) 2012-12-18 21:54 UTC, Chris Paulson-Ellis
console output for 'hsync' problem 70790f4f + 3bb076af + dpms fix (6.81 KB, text/plain) 2012-12-18 21:55 UTC, Chris Paulson-Ellis
console output for fading to black cb75d97e + 3bb076af + dpms fix (6.14 KB, text/plain) 2012-12-18 21:56 UTC, Chris Paulson-Ellis
console output for plain modprobe nouveau (3.58 KB, text/plain) 2012-12-27 13:16 UTC, Chris Paulson-Ellis
console output for modprobe nouveau apgmode=0 (3.38 KB, text/plain) 2012-12-27 13:17 UTC, Chris Paulson-Ellis
console output for modprobe nouveau config=DEVINIT=NvForcePost=1 (4.26 KB, text/plain) 2012-12-27 13:19 UTC, Chris Paulson-Ellis
console output for modprobe nouveau apgmode=0 config=DEVINIT=NvForcePost=1 (3.52 KB, text/plain) 2012-12-27 13:20 UTC, Chris Paulson-Ellis
console output for fading to black cb75d97e + 3bb076af + 92441b22 + i2c hacks + gart size fix (5.10 KB, text/plain) 2013-02-03 20:46 UTC, Chris Paulson-Ellis

Description Chris Paulson-Ellis 2012-10-27 13:17:59 UTC

Created attachment 69149
dmesg output prior to modprobe nouveau

My NV11 [GeForce2 Go] system hangs with a black screen when loading the nouveau framebuffer.

After fixing #42384 (many thanks), my system was working. It still works if I boot kernel 3.2.7-1.fc16.i686. Later kernels, including one compiled from recent commit 214e40f of nouveau/master, fail.

I managed to capture some console messages by booting with a serial console into VGA text mode with nouveau blacklisted, then modprobing nouveau manually. I'm attaching the dmesg output from before the modprobe and the console output from after it.

I'm also attaching the dmesg output (including the post-modprobe output) from the working 3.2 kernel.

I'm guessing it will be a memory allocation problem as my ancient system has a very large screen for a laptop - 1600x1200 and only 64MiB of VRAM.

Comment 1 Chris Paulson-Ellis 2012-10-27 13:19:07 UTC

Created attachment 69150
serial console output after modprobe nouveau - until kernel hang

Comment 2 Chris Paulson-Ellis 2012-10-27 13:20:07 UTC

Created attachment 69151
dmesg output for working kernel

Comment 3 Marcin Slusarz 2012-11-11 20:09:28 UTC

Please attach the VBIOS image. See http://nouveau.freedesktop.org/wiki/DumpingVideoBios for a description of how to obtain it.

Comment 4 Chris Paulson-Ellis 2012-11-12 22:09:48 UTC

Created attachment 69965
Video BIOS as produced by vbtracetool

Video BIOS as requested - thank you for caring about my ancient hardware.
Chris.

Comment 5 Marcin Slusarz 2012-11-12 23:24:35 UTC

Created attachment 69969
fix

This should fix it.

(Condition checking DCB entry sanity was accidentally inverted during rewrite.)

Comment 6 Chris Paulson-Ellis 2012-11-13 01:09:56 UTC

Created attachment 69973
console output of hang with patch

With that patch it's still hanging the kernel with a black screen. I've attached the serial console output from modprobe nouveau.

Comment 7 Marcin Slusarz 2012-11-13 07:40:13 UTC

Created attachment 69982
timer fix

If this patch doesn't fix it, then I'm afraid you'll need to bisect this bug.

Comment 8 Chris Paulson-Ellis 2012-11-18 15:35:30 UTC

Okay, that was... interesting. I haven't quite managed to pin it down to a single commit, but here is what I've found out.

I did a manual bisect, looking only at commits that change drivers/gpu/drm/nouveau. The area of interest had unrelated problems: a NULL pointer dereference and a compile failure. I worked around them by cherry-picking the following 2 commits each time I tested a new commit:

dea7e0ac  ttm: fix agp since ttm tt rework
095f979a  drm/nouveau/pm: fix build with HWMON off

With these changes, the last nouveau commit that works is:

d2edab4a  drm/nouveau/pm: fix missing volt changes when boot voltage is undefined

The next 3 nouveau commits all fail, but with a NULL pointer dereference in nouveau_hw_load_state() rather than with a black-screen hang:

2a44e499  drm/nouveau/disp: introduce proper init/fini, separate from create/destroy
cf41d53b  drm/nouveau: re-jig fbcon suspend/resume process a little
1772fcc6  drm/nv50/disp: fix evo for create/init + destroy/fini split

In the next nouveau commit, the NULL pointer dereference has gone and it fails with the black-screen hang:

f62b27db  drm/nouveau: shutdown display on suspend/hibernate

The stack trace for the 3 commits that oops rather than hang is as follows:

[   51.360006] Call Trace:
[   51.360006]  [<f0c78c69>] ? nv_crtc_restore+0x52/0x10f [nouveau]
[   51.360006]  [<f0cd3215>] ? ch7006_write+0x1f/0x50 [ch7006]
[   51.360006]  [<f0c7a623>] ? nv04_display_init+0x48/0x58 [nouveau]
[   51.360006]  [<f0c3214a>] ? nouveau_display_create+0x211/0x3f2 [nouveau]
[   51.360006]  [<f0c12b16>] ? nouveau_card_init+0x138a/0x1459 [nouveau]
[   51.360006]  [<f0c11744>] ? nouveau_stub_init+0x3/0x3 [nouveau]
[   51.360006]  [<f0c13092>] ? nouveau_load+0x3ce/0x6ad [nouveau]
[   51.360006]  [<f0ade3e0>] ? drm_get_pci_dev+0x13e/0x249 [drm]
[   51.360006]  [<c104580d>] ? __blocking_notifier_call_chain+0x47/0x4f
[   51.360006]  [<c113eb56>] ? pci_device_probe+0x47/0x68
[   51.360006]  [<c11ab567>] ? driver_probe_device+0x4a/0x13a
[   51.360006]  [<c113ea91>] ? pci_match_device+0x8b/0x99
[   51.360006]  [<c11ab6b9>] ? __driver_attach+0x62/0x64
[   51.360006]  [<c11ab657>] ? driver_probe_device+0x13a/0x13a
[   51.360006]  [<c11aaa38>] ? bus_for_each_dev+0x3f/0x63
[   51.360006]  [<c113eac2>] ? pci_dev_put+0xd/0xd
[   51.360006]  [<c11ab329>] ? driver_attach+0x19/0x1e
[   51.360006]  [<c11ab657>] ? driver_probe_device+0x13a/0x13a
[   51.360006]  [<c11ab0a8>] ? bus_add_driver+0x17d/0x24d
[   51.360006]  [<c113eac2>] ? pci_dev_put+0xd/0xd
[   51.360006]  [<c11ab8a9>] ? driver_register+0x57/0xec
[   51.360006]  [<f0ade51c>] ? drm_pci_init+0x31/0xe6 [drm]
[   51.360006]  [<c113ef22>] ? __pci_register_driver+0x31/0x8f
[   51.360006]  [<c1001027>] ? do_one_initcall+0x27/0x150
[   51.360006]  [<c108b650>] ? __vunmap+0xa0/0xd1
[   51.360006]  [<f0cb6000>] ? 0xf0cb5fff
[   51.360006]  [<c1054f97>] ? sys_init_module+0xccc/0x18f2
[   51.360006]  [<c109be00>] ? sys_close+0x66/0xa7
[   51.360006]  [<c127fb10>] ? sysenter_do_call+0x12/0x26
[   51.360006] Code: 00 28 60 00 f6 05 10 83 ca f0 10 0f 85 1e 12 00 00 8b 83 00 02 00 00 8b 50 18 01 ea 89 c8 e8 45 fd 50 d0 8b 83 78 01 00 00 31 c9 <8b> 2c b8 85 ed 0f 95 c1 83 ff 01 19 ed 81 e5 00 e0 ff ff 81 c5
[   51.360006] EIP: [<f0c227fc>] nouveau_hw_load_state+0x9a5/0x26d9 [nouveau] SS:ESP 0068:ee665cc4

Comment 9 Chris Paulson-Ellis 2012-12-09 22:53:49 UTC

Hi,

I only just realised that you sent me a second patch (I thought you were referring to the first patch, so I went and did the bisect as per my previous message). I've just tried the second patch against nouveau HEAD. Unfortunately it causes a NULL pointer dereference. I think this is because when this line in nv04_timer_init...

n = nouveau_hw_get_clock(((struct nouveau_drm *)nouveau_client(priv))->dev, PLL_CORE);

...calls nouveau_client() it gets NULL, because the parent of nv04_timer_priv is a nouveau_device, not a nouveau_drm. I can't see any way to get a pointer to the drm_device from nv04_timer_init, so I don't think calling nouveau_hw_get_clock from here is possible.

Chris.

Comment 10 Emil Velikov 2012-12-09 23:31:14 UTC

Created attachment 71248
timer fix, take two

Sweet. Can you try the second version of the timer fix? Apply it on top of the original patch 69969.

Comment 11 Emil Velikov 2012-12-09 23:34:12 UTC

* if using the latest git, only the patch mentioned in comment 10 should be necessary

Comment 12 Emil Velikov 2012-12-11 12:10:10 UTC

Comment on attachment 71248
timer fix, take two

The appropriate solution here would be to use

n = nouveau_hw_get_clock(pci_get_drvdata(device->pdev), PLL_CORE);

Note that this will lead to a NULL dereference in nouveau_hw_get_pllvals(): that function uses nouveau_drm, which is created at a later stage.

Unless someone has another idea, consider bisecting the issue.

Comment 13 Chris Paulson-Ellis 2012-12-11 19:59:06 UTC

With that modification, the PTIMER warning goes away, but unfortunately I still have a hung machine with a black screen (serial console output below).

I did the bisect and got it down to one of 4 nouveau commits - 3 that NULL deref, then 1 that hangs - see comment 8 above.

Chris.

# modprobe nouveau
[   94.851054] wmi: Mapper loaded
[   95.027154] [drm] Initialized drm 1.1.0 20060810
[   95.543293] nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x011200b2
[   95.553280] nouveau  [  DEVICE][0000:01:00.0] Chipset: NV11 (NV11)
[   95.559722] nouveau  [  DEVICE][0000:01:00.0] Family : NV10
[   95.567178] nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
[   95.641819] nouveau  [   VBIOS][0000:01:00.0] ... checksum invalid
[   95.648038] nouveau  [   VBIOS][0000:01:00.0] checking PROM for image...
[   95.655178] nouveau  [   VBIOS][0000:01:00.0] ... signature not found
[   95.661697] nouveau  [   VBIOS][0000:01:00.0] checking ACPI for image...
[   95.668473] nouveau  [   VBIOS][0000:01:00.0] ... signature not found
[   95.674986] nouveau  [   VBIOS][0000:01:00.0] checking PCIROM for image...
[   95.682106] nouveau  [   VBIOS][0000:01:00.0] ... checksum invalid
[   95.688415] nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
[   95.694932] nouveau  [   VBIOS][0000:01:00.0] BMP version 5.14
[   95.701085] nouveau  [   VBIOS][0000:01:00.0] version 03.11.01.44
[   95.708967] nouveau  [     PFB][0000:01:00.0] RAM type: DDR1
[   95.720366] nouveau  [     PFB][0000:01:00.0] RAM size: 16 MiB
[   95.726454] nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
[   95.737623] agpgart-intel 0000:00:00.0: AGP 2.0 bridge
[   95.742868] agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode
[   95.749841] nouveau 0000:01:00.0: putting AGP V2 device into 4x mode
[   95.756394] [TTM] Zone  kernel: Available graphics memory: 386962 kiB
[   95.762940] [TTM] Initializing pool allocator
[   95.767609] nouveau  [     DRM] VRAM: 15 MiB
[   95.772083] nouveau  [     DRM] GART: 64 MiB
[   95.776432] nouveau  [     DRM] BMP version 5.20
[   95.781128] nouveau  [     DRM] DCB version 1.5
[   95.785740] nouveau  [     DRM] DCB outp 00: f0003f00 000088b8
[   95.791655] nouveau  [     DRM] DCB outp 01: f2045f14 0000ffff
[   95.797570] nouveau  [     DRM] DCB outp 02: f4204011 ffffffff
[   95.803705] nouveau  [     DRM] BIOS FP mode: 1600x1200 (162000kHz pixel clock)
[   95.811877] nouveau  [     DRM] Saving VGA fonts
[   95.899625] nouveau  [     I2C][0000:01:00.0] detected TV encoder: ch7006
[   95.995616] ch7006 1-0075: Detected version ID: 50
[   96.039234] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   96.045928] [drm] No driver support for vblank timestamp query.
[   96.077240] nouveau  [     DRM] 0 available performance level(s)
[   96.083444] nouveau  [     DRM] c: core 17MHz memory 332MHz
[   96.091238] nouveau  [     DRM] MM: using M2MF for buffer copies
[   96.172664] nouveau  [     DRM] allocated 1600x1200 fb: 0x8000, bo ede66400
[   96.179814] fbcon: nouveaufb (fb0) is primary device

Comment 14 Emil Velikov 2012-12-11 21:27:46 UTC

Created attachment 71361
Disable TV detection

This patch removes TV detection on your NV11.
That should work around the NULL derefs and let you complete the bisect.

Comment 15 Chris Paulson-Ellis 2012-12-12 00:07:18 UTC

It still hit a NULL deref with that patch, in the same place but with a different call stack. However, after looking at where it was happening, I found another commit to cherry-pick that fixes it:

afada5e0  drm/nv04/disp: disable vblank interrupts when disabling display

The first of the commits that was previously NULL derefing is now black screen hanging instead, so I think I've found the problem commit - 2a44e499.

To recap...

The last working commit is d2edab4a with dea7e0ac & 095f979a cherry-picked.

The first hanging commit is 2a44e499 with dea7e0ac, 095f979a & afada5e0 cherry-picked.

I'm attaching the console output for these 2 runs.

Chris.

Comment 16 Chris Paulson-Ellis 2012-12-12 00:08:39 UTC

Created attachment 71364
console output for last working commit

Comment 17 Chris Paulson-Ellis 2012-12-12 00:09:21 UTC

Created attachment 71365
console output for first hanging commit

Comment 18 Emil Velikov 2012-12-12 12:23:24 UTC

The patch cleans up the create -> init, fini -> destroy paths.

In the original code, nv04_display_init was only executed during resume
(disregard the fact that it's executed twice):
* eng[i]->init()
* engine->display.init()

Using 2a44e499 with dea7e0ac, 095f979a & afada5e0 cherry-picked, please:
* try to suspend/resume after nouveau is loaded
* sprinkle some nv_info() in the nv04_dfp_restore() codepath to establish what exactly has caused the issue. I would assume that some encoder/connector may not have yet been completely setup

Comment 19 Marcin Slusarz 2012-12-12 19:35:06 UTC

Created attachment 71403
dpms fix

This should fix it - now for real ;P

Comment 20 Chris Paulson-Ellis 2012-12-13 22:04:11 UTC

We're getting there!

Your fix worked - for a few minutes - when applied to the previously hanging 2a44e499 with dea7e0ac, 095f979a & afada5e0 cherry-picked.

When applied to the nouveau HEAD, Linux no longer hangs, but the display isn't driven properly - the LCD goes white, then fades to black.

I'll attach my current patch to nouveau HEAD and the corresponding console output.

Chris.

Comment 21 Chris Paulson-Ellis 2012-12-13 22:05:16 UTC

Created attachment 71471
fixes currently being applied to nouveau HEAD

Comment 22 Chris Paulson-Ellis 2012-12-13 22:06:01 UTC

Created attachment 71472
console output with current HEAD + fixes

Comment 23 Marcin Slusarz 2012-12-13 22:42:49 UTC

This is probably another bug (you are really lucky ;). I don't see anything obvious in your latest log, so you could bisect it again - now between 2a44e499 with nv04_dfp.c fix (and other cherry-picks when necessary) and nouveau HEAD.

Is timer/nv04.c change needed for 2a44e499 to light up your monitor?

Comment 24 Chris Paulson-Ellis 2012-12-15 14:55:55 UTC

(In reply to comment #23)
> This is probably another bug (you are really lucky ;).

I will bisect again. Don't hold your breath!

> Is timer/nv04.c change needed for 2a44e499 to light up your monitor?

No. I'm only using it when running the patched nouveau git HEAD, and even then I can leave it out without anything changing; it just suppresses this warning:
nouveau W[  PTIMER][0000:01:00.0] unknown input clock freq

When running the patched git HEAD... Although the LCD is not being driven, the driver isn't completely dead. I can start and stop X and the screen will go white, then fade to black each time I do it. This suggests that the backlight control & power management are okay, but the display timings are wrong. Looking at the working & broken console output, they both show the same pixel clock. Is there some way to get the driver to output more verbose display timing info (fbset -i shows all zeros in the timing line in both the working & non-working cases)?

Comment 25 Chris Paulson-Ellis 2012-12-16 17:50:34 UTC

I've been bisecting and it appears that there are at least 2 further regressions...

At one point in the bisection, the display goes from working to visible but with what look like hsync problems. I can see all the lines on the display, but each line shifts horizontally by about 16 pixels, giving a shifting, fuzzy picture.

I've pinned this problem down to the following commit:

486a45c2  drm/nouveau/i2c: do parsing of i2c-related vbios info in nouveau_i2c.c

There are no differences in the console output between this commit and its working parent commit.

Much later in the bisection, this hsync problem is replaced with the fade-to-black problem I described before. I haven't finished bisecting this yet, but will continue.

Chris.

Comment 26 Chris Paulson-Ellis 2012-12-16 21:38:12 UTC

I've completed the bisection. The good news is that the 'hsync' problem mentioned above, although present for a large part of the history, is fixed before the fade-to-black problem appears. The bad news is that the problem commit is rather a large one:

cb75d97e  drm/nouveau: implement devinit subdev, and new init table parser

Comment 27 Marcin Slusarz 2012-12-16 22:32:22 UTC

Are there any differences in dmesg before and after this commit?
If the answer is "no", please attach at least one of them (both if "yes").

Comment 28 Marcin Slusarz 2012-12-17 17:11:16 UTC

BTW, did you remember to cherry-pick 3bb076af2ae571a48465972d5747175cec3564cd (upstream version of the fix from comment 5) during bisection?

Comment 29 Chris Paulson-Ellis 2012-12-18 21:52:57 UTC

I didn't include 3bb076af during this bisect because it made no difference when running nouveau HEAD. I have now retested the problem commit cb75d97e and its parent 70790f4f, both with 3bb076af cherry-picked.

Things just get more complicated... Although the fade-to-black problem hasn't changed, including 3bb076af makes the previously working parent commit exhibit the 'hsync' problem I described before!

All tests include the dpms-fix patch from comment 19 to prevent the original hard-hang problem.

In summary:

1) 70790f4f + dpms-fix             -  works
2) 70790f4f + 3bb076af + dpms-fix  -  the 'hsync' problem reappears!
3) cb75d97e + dpms-fix             -  fades to black
4) cb75d97e + 3bb076af + dpms-fix  -  fades to black

There's quite a bit of difference in the console logs between 70790f4f & cb75d97e, but nothing obvious (to me). I'm attaching console logs for cases 1, 2 & 4.

Comment 30 Chris Paulson-Ellis 2012-12-18 21:54:11 UTC

Created attachment 71755
console output for working 70790f4f + dpms fix

Comment 31 Chris Paulson-Ellis 2012-12-18 21:55:34 UTC

Created attachment 71756
console output for 'hsync' problem 70790f4f + 3bb076af + dpms fix

Comment 32 Chris Paulson-Ellis 2012-12-18 21:56:33 UTC

Created attachment 71757
console output for fading to black cb75d97e + 3bb076af + dpms fix

Comment 33 Marcin Slusarz 2012-12-25 19:53:38 UTC

Does booting with nouveau.agpmode=0 and/or nouveau.config=DEVINIT=NvForcePost=1 change anything? (please test all 3 combinations)

Comment 34 Ondrej Zary 2012-12-26 16:02:57 UTC

I have the same problem on DELL Latitude C810 laptop. 3.2 is the latest working kernel, 3.3-rc1 does not work (LCD fades to white, then backlight turns off and machine hangs).

d2edab4acffb35a6e24259886d377774efd37e6e is the latest working version
(with dea7e0ac45fd28f90bbc38ff226d36a9f788efbf added to fix AGP crash)

2a44e4997c5fee8e1da1589ff57e0bd1c53f03ce is the first bad commit. It oopses in nv_load_state_ext (dev->vblank_enabled is NULL). With line 1024 in nouveau_hw.c commented out, the oops is gone and the fade&hang problem appears.

Comment 35 Ondrej Zary 2012-12-26 22:06:44 UTC

70790f4f (+dpms-fix) works for me too and cb75d97e (+dpms-fix) does not.

It doesn't work with any combination of nouveau.agpmode=0 and nouveau.config=DEVINIT=NvForcePost=1 either.

Comment 36 Chris Paulson-Ellis 2012-12-27 13:14:35 UTC

(In reply to comment #33)
> Does booting with nouveau.agpmode=0 and/or
> nouveau.config=DEVINIT=NvForcePost=1 change anything? (please test all 3
> combinations)

For these tests, I'm working with nouveau HEAD 73e5cf2d + dpms-fix.

Doing "modprobe nouveau" results in the fade-to-black problem as before - as do all the following tests.

Doing "modprobe nouveau agpmode=0" removes the console messages about AGP 4x mode and changes the GART aperture from 64 MiB to 128 MiB.

Doing "modprobe nouveau config=DEVINIT=NvForcePost=1" adds a console message about "running init tables", changes the VRAM size from 15 MiB to 31 MiB (!) and results in a never ending stream of PFIFO errors.

Doing "modprobe nouveau agpmode=0 config=DEVINIT=NvForcePost=1" produces the expected console changes from the 2 tests above, but with a GPU lockup error instead of the PFIFO errors.

I'll attach console logs for the above tests.

Chris.

Comment 37 Chris Paulson-Ellis 2012-12-27 13:16:59 UTC

Created attachment 72178
console output for plain modprobe nouveau

Comment 38 Chris Paulson-Ellis 2012-12-27 13:17:56 UTC

Created attachment 72179
console output for modprobe nouveau apgmode=0

Comment 39 Chris Paulson-Ellis 2012-12-27 13:19:37 UTC

Created attachment 72180
console output for modprobe nouveau config=DEVINIT=NvForcePost=1

Comment 40 Chris Paulson-Ellis 2012-12-27 13:20:22 UTC

Created attachment 72181
console output for modprobe nouveau apgmode=0 config=DEVINIT=NvForcePost=1

Comment 41 Chris Paulson-Ellis 2013-01-07 01:53:46 UTC

Hi,

I've been looking at the 'hsync' problem I first mentioned in comment 25. This problem seems to be caused by any interaction with i2c bus 2 and was uncovered by commit 486a45c2, because it fixed a previous bug...

My VBIOS lists 3 i2c buses. Bus 0 is referenced by DCB entry 0 (OUTPUT_ANALOGUE). Bus 1 is referenced by DCB entries 1 & 2 (OUTPUT_LVDS & OUTPUT_TV).

Prior to 486a45c2, bus 2 is never properly driven because nouveau_i2c_init() is called without there having been a call to read_dcb_i2c_entry() for index 2, so the bit-bang i2c adapter gets set up with 0 for the rd & wr variables (a later commit renames these to sense & drive). Surprisingly, the resulting incorrect CRTC register read/writes don't seem to trash the system and everything works okay.

Commit 486a45c2 fixes the bug by parsing all 3 i2c entries up-front, so bus 2 gets a bit-bang i2c adapter with sensible rd & wr values. However, any access to this bus seems to cause the display to show the 'hsync' problem. The only reference to the bus is from nouveau_temp_probe_i2c(), so I've worked around the problem with the following hack:

--- a/drivers/gpu/drm/nouveau/nouveau_temp.c
+++ b/drivers/gpu/drm/nouveau/nouveau_temp.c
@@ -287,11 +287,13 @@ static void
 nouveau_temp_probe_i2c(struct drm_device *dev)
 {
 	struct i2c_board_info info[] = {
+#if 0
 		{ I2C_BOARD_INFO("w83l785ts", 0x2d) },
 		{ I2C_BOARD_INFO("w83781d", 0x2d) },
 		{ I2C_BOARD_INFO("adt7473", 0x2e) },
 		{ I2C_BOARD_INFO("f75375", 0x2e) },
 		{ I2C_BOARD_INFO("lm99", 0x4c) },
+#endif
 		{ }
 	};

Re-adding any of the I2C_BOARD_INFO lines - I tried them 1 at a time - makes the problem come back. The simple 1 byte address test i2c transfer is enough to trigger the problem.

The above work around is enough to make my display work again for commit 486a45c2, but the problem comes back in a later commit when the i2c lines are reset in the bit-bang adapter init code. I've had to add the following hack to work around the problem again:

--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
@@ -327,9 +333,13 @@ nouveau_i2c_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		i2c_set_adapdata(&port->adapter, i2c);
 
 		if (port->adapter.algo != &nouveau_i2c_aux_algo) {
+            if(i==2) {
+			nv_warn(i2c, "I2C%d: type %d index %x/%x - supressing scl/sda init\n", i, port->type, port->drive, port->sense);
+            } else {
 			nouveau_i2c_drive_scl(port, 0);
 			nouveau_i2c_drive_sda(port, 1);
 			nouveau_i2c_drive_scl(port, 1);
+            }
 
So it appears that even this simple attempt to reset the i2c lines on bus 2 is enough to destabilise the display.

Obviously, these 2 hacks will need to be replaced with something better, but I've no idea what - perhaps a board specific i2c bus blacklist!

None of this makes any difference to the fade-to-black problem introduced by commit cb75d97e, but I can now get its parent commit 70790f4f working. In summary, this is where I am:

70790f4f + 3bb076af + dpms-fix + i2c-2-hacks  -  works
cb75d97e + 3bb076af + dpms-fix + i2c-2-hacks  -  fades to black

Looking at the console output from these 2 tests, the only significant change seems to be the AGP GART aperture going from a sensible 64 MiB to 3712 MiB, which doesn't look right. The GART also seems to be initialised later than before.

Regards,
Chris.

Comment 42 Chris Paulson-Ellis 2013-02-03 20:44:37 UTC

I found the bug in commit cb75d97e that results in the incorrect GART aperture and fixed it with this patch:

--- a/drivers/gpu/drm/nouveau/nouveau_compat.c
+++ b/drivers/gpu/drm/nouveau/nouveau_compat.c
@@ -17,7 +17,7 @@ nvdrm_gart_init(struct drm_device *dev, u64 *base, u64 *size)
 	struct nouveau_drm *drm = nouveau_newpriv(dev);
 	if (drm->agp.stat == ENABLED) {
 		*base = drm->agp.base;
-		*size = drm->agp.base;
+		*size = drm->agp.size;
 		return 0;
 	}
 	return -ENODEV;

However, the display still fades to black. I now get an error that I didn't get with the parent commit:

PFIFO_DMA_PUSHER - Ch 0 Get 0x04000000 Put 0x00001088 State 0xc0000000 (err: MEM_FAULT) Push 0x00000000

This message appears at the end of enabling the LVDS output, so it's probably related (I'll attach the console log).

I'm not sure how to debug further. I'm wondering in particular how to trace the effect of various changes in commit cb75d97e, such as those made to run_digital_op_script()? Perhaps I need to trace all register read/writes during the devinit phase and compare to the parent commit? How would I do this?

Comment 43 Chris Paulson-Ellis 2013-02-03 20:46:39 UTC

Created attachment 74152
console output for fading to black cb75d97e + 3bb076af + 92441b22 + i2c hacks + gart size fix

Comment 44 Marcin Slusarz 2013-02-03 21:13:07 UTC

You can use mmiotrace to trace all register reads and writes, and then parse the trace with demmio (from envytools) to attach names to the registers.

http://nouveau.freedesktop.org/wiki/Development

Comment 45 Emil Velikov 2013-04-04 19:57:42 UTC

Can you confirm whether this commit fixes the issue?

commit f6853faa85793bf23b46787e4039824d275453c2
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Tue Feb 26 02:33:12 2013 +0100

    drm/nouveau: Fix typo in init_idx_addr_latched().
    
    Fixes script-based modesetting on some LVDS panels.

Comment 46 Chris Paulson-Ellis 2013-04-16 22:27:08 UTC

Hi.

Sorry for the delay, but good news... commit f6853faa does indeed fix the problem. It works when cherry picked on top of the commits & patches described before. I've also checked the HEAD of nouveau/master as of today (557f8126) and it is working with no patches required.

Hooray! My brain was hurting trying to make sense of mmio traces (when I got time to look at it at all).

OpenGL doesn't work at all, but this is not a surprise. My user space is ancient and I'm not sure if it's ever worked anyway.

Once again, thanks to all involved for caring about my ancient hardware.

Regards,
Chris.

Comment 47 Emil Velikov 2013-04-18 18:29:43 UTC

Glad to hear that it's working :)
