Created attachment 69149 [details] dmesg output prior to modprobe nouveau My NV11 [GeForce2 Go] system hangs with a black screen when loading the nouveau framebuffer. After fixing #42384 - many thanks - my system was working. It still works if I load kernel 3.2.7-1.fc16.i686. Later kernels, including one compiled from recent commit 214e40f of nouveau/master, fail. I managed to capture some console messages by booting with a serial console into VGA text mode with nouveau blacklisted, then modprobing nouveau manually. I'm attaching the dmesg output from before the modprobe and the console output from after the modprobe. I'm also attaching the dmesg output (including the post-modprobe output) from the working 3.2 kernel. I'm guessing it will be a memory allocation problem as my ancient system has a very large screen for a laptop - 1600x1200 and only 64MiB of VRAM.
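The memory-allocation guess can be sanity-checked with simple arithmetic. The following is a minimal sketch, assuming a packed 32 bits-per-pixel framebuffer with no pitch alignment or tiling overhead (neither is stated in the report); `fb_bytes()` and `fb_fits()` are hypothetical helper names, not nouveau functions.

```c
#include <stdint.h>

/* Bytes for one packed scanout buffer. Assumes no pitch alignment or
 * tiling overhead, which real hardware may add. */
static uint64_t fb_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return (uint64_t)width * height * (bpp / 8);
}

/* Returns 1 if `count` such buffers fit in `vram_mib` MiB, else 0. */
static int fb_fits(uint32_t width, uint32_t height, uint32_t bpp,
                   uint32_t count, uint32_t vram_mib)
{
    uint64_t need = fb_bytes(width, height, bpp) * count;
    uint64_t have = (uint64_t)vram_mib << 20;

    return need <= have;
}
```

Under these assumptions a single 1600x1200 buffer needs 7,680,000 bytes (about 7.3 MiB), so on its own it would not exhaust 64 MiB of VRAM.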
Created attachment 69150 [details] serial console output after modprobe nouveau - until kernel hang
Created attachment 69151 [details] dmesg output for working kernel
Please attach a VBIOS image. See http://nouveau.freedesktop.org/wiki/DumpingVideoBios for a description of how to obtain it.
Created attachment 69965 [details] Video BIOS as produced by vbtracetool Video BIOS as requested - thank you for caring about my ancient hardware. Chris.
Created attachment 69969 [details] [review] fix This should fix it. (Condition checking DCB entry sanity was accidentally inverted during rewrite.)
Created attachment 69973 [details] console output of hang with patch With that patch it's still hanging the kernel with a black screen. I've attached the serial console output from modprobe nouveau.
Created attachment 69982 [details] [review] timer fix If this patch won't fix it, then I'm afraid you need to bisect this bug.
Okay, that was... interesting. I haven't quite managed to pin it down to a single commit, but here is what I've found out.

I did a manual bisect, only looking at commits that make changes in drivers/gpu/drm/nouveau. The area of interest had unrelated problems causing a NULL pointer dereference and a compile problem. I fixed this by cherry picking the following 2 commits each time I tested a new commit:

dea7e0ac ttm: fix agp since ttm tt rework
095f979a drm/nouveau/pm: fix build with HWMON off

With these changes, the last nouveau commit that works is:

d2edab4a drm/nouveau/pm: fix missing volt changes when boot voltage is undefined

The next 3 nouveau commits all fail, but with a NULL pointer dereference in nouveau_hw_load_state() rather than with a black-screen hang:

2a44e499 drm/nouveau/disp: introduce proper init/fini, separate from create/destroy
cf41d53b drm/nouveau: re-jig fbcon suspend/resume process a little
1772fcc6 drm/nv50/disp: fix evo for create/init + destroy/fini split

In the next nouveau commit, the NULL pointer dereference has gone and it fails with the black-screen hang:

f62b27db drm/nouveau: shutdown display on suspend/hibernate

The stack trace for the 3 commits that oops rather than hang is as follows:

[ 51.360006] Call Trace:
[ 51.360006] [<f0c78c69>] ? nv_crtc_restore+0x52/0x10f [nouveau]
[ 51.360006] [<f0cd3215>] ? ch7006_write+0x1f/0x50 [ch7006]
[ 51.360006] [<f0c7a623>] ? nv04_display_init+0x48/0x58 [nouveau]
[ 51.360006] [<f0c3214a>] ? nouveau_display_create+0x211/0x3f2 [nouveau]
[ 51.360006] [<f0c12b16>] ? nouveau_card_init+0x138a/0x1459 [nouveau]
[ 51.360006] [<f0c11744>] ? nouveau_stub_init+0x3/0x3 [nouveau]
[ 51.360006] [<f0c13092>] ? nouveau_load+0x3ce/0x6ad [nouveau]
[ 51.360006] [<f0ade3e0>] ? drm_get_pci_dev+0x13e/0x249 [drm]
[ 51.360006] [<c104580d>] ? __blocking_notifier_call_chain+0x47/0x4f
[ 51.360006] [<c113eb56>] ? pci_device_probe+0x47/0x68
[ 51.360006] [<c11ab567>] ? driver_probe_device+0x4a/0x13a
[ 51.360006] [<c113ea91>] ? pci_match_device+0x8b/0x99
[ 51.360006] [<c11ab6b9>] ? __driver_attach+0x62/0x64
[ 51.360006] [<c11ab657>] ? driver_probe_device+0x13a/0x13a
[ 51.360006] [<c11aaa38>] ? bus_for_each_dev+0x3f/0x63
[ 51.360006] [<c113eac2>] ? pci_dev_put+0xd/0xd
[ 51.360006] [<c11ab329>] ? driver_attach+0x19/0x1e
[ 51.360006] [<c11ab657>] ? driver_probe_device+0x13a/0x13a
[ 51.360006] [<c11ab0a8>] ? bus_add_driver+0x17d/0x24d
[ 51.360006] [<c113eac2>] ? pci_dev_put+0xd/0xd
[ 51.360006] [<c11ab8a9>] ? driver_register+0x57/0xec
[ 51.360006] [<f0ade51c>] ? drm_pci_init+0x31/0xe6 [drm]
[ 51.360006] [<c113ef22>] ? __pci_register_driver+0x31/0x8f
[ 51.360006] [<c1001027>] ? do_one_initcall+0x27/0x150
[ 51.360006] [<c108b650>] ? __vunmap+0xa0/0xd1
[ 51.360006] [<f0cb6000>] ? 0xf0cb5fff
[ 51.360006] [<c1054f97>] ? sys_init_module+0xccc/0x18f2
[ 51.360006] [<c109be00>] ? sys_close+0x66/0xa7
[ 51.360006] [<c127fb10>] ? sysenter_do_call+0x12/0x26
[ 51.360006] Code: 00 28 60 00 f6 05 10 83 ca f0 10 0f 85 1e 12 00 00 8b 83 00 02 00 00 8b 50 18 01 ea 89 c8 e8 45 fd 50 d0 8b 83 78 01 00 00 31 c9 <8b> 2c b8 85 ed 0f 95 c1 83 ff 01 19 ed 81 e5 00 e0 ff ff 81 c5
[ 51.360006] EIP: [<f0c227fc>] nouveau_hw_load_state+0x9a5/0x26d9 [nouveau] SS:ESP 0068:ee665cc4
Hi, I only just realised that you sent me a second patch (I thought you were referring to the first patch, so I went and did the bisect as per my previous message). I've just tried the second patch against nouveau HEAD. Unfortunately it causes a NULL pointer dereference. I think this is because when this line in nv04_timer_init... n = nouveau_hw_get_clock(((struct nouveau_drm *)nouveau_client(priv))->dev, PLL_CORE); ...calls nouveau_client() it gets NULL, because the parent of nv04_timer_priv is a nouveau_device, not a nouveau_drm. I can't see any way to get a pointer to the drm_device from nv04_timer_init, so I don't think calling nouveau_hw_get_clock from here is possible. Chris.
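The failure mode described in this comment (a lookup that returns NULL because the expected kind of object is not in the parent chain) can be illustrated with a simplified model. This is a minimal sketch and not the nouveau object code: `struct obj`, `enum obj_class` and `find_ancestor()` are hypothetical names chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical object classes, loosely mirroring the roles in the comment. */
enum obj_class { CLASS_DEVICE, CLASS_CLIENT, CLASS_SUBDEV };

struct obj {
    enum obj_class cls;
    struct obj *parent;
};

/* Walk up the parent chain and return the first ancestor of class `want`,
 * or NULL if the chain ends without finding one. */
static struct obj *find_ancestor(struct obj *o, enum obj_class want)
{
    for (o = o ? o->parent : NULL; o != NULL; o = o->parent) {
        if (o->cls == want)
            return o;
    }
    return NULL;
}
```

In this model a sub-device whose only ancestor is a device object yields NULL when asked for a client, so a caller that casts and dereferences the result without checking would fault.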
Created attachment 71248 [details] [review] timer fix, take two Sweet, can you try the second version of the timer fix? Apply it on top of the original patch 69969.
* if using latest git, only the patch mentioned in comment 10 should be necessary
Comment on attachment 71248 [details] [review] timer fix, take two An appropriate solution here would be to use n = nouveau_hw_get_clock(pci_get_drvdata(device->pdev), PLL_CORE); Note that this will lead to a NULL dereference in nouveau_hw_get_pllvals(). The function uses nouveau_drm, which is created at a later stage. Unless someone has another idea, consider bisecting the issue.
With that modification, the PTIMER warning goes away, but unfortunately I still have a hung machine with a black screen (serial console output below). I did the bisect and got it down to one of 4 nouveau commits - 3 that NULL deref, then 1 that hangs - see comment 8 above. Chris.

# modprobe nouveau
[ 94.851054] wmi: Mapper loaded
[ 95.027154] [drm] Initialized drm 1.1.0 20060810
[ 95.543293] nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x011200b2
[ 95.553280] nouveau [ DEVICE][0000:01:00.0] Chipset: NV11 (NV11)
[ 95.559722] nouveau [ DEVICE][0000:01:00.0] Family : NV10
[ 95.567178] nouveau [ VBIOS][0000:01:00.0] checking PRAMIN for image...
[ 95.641819] nouveau [ VBIOS][0000:01:00.0] ... checksum invalid
[ 95.648038] nouveau [ VBIOS][0000:01:00.0] checking PROM for image...
[ 95.655178] nouveau [ VBIOS][0000:01:00.0] ... signature not found
[ 95.661697] nouveau [ VBIOS][0000:01:00.0] checking ACPI for image...
[ 95.668473] nouveau [ VBIOS][0000:01:00.0] ... signature not found
[ 95.674986] nouveau [ VBIOS][0000:01:00.0] checking PCIROM for image...
[ 95.682106] nouveau [ VBIOS][0000:01:00.0] ... checksum invalid
[ 95.688415] nouveau [ VBIOS][0000:01:00.0] using image from PRAMIN
[ 95.694932] nouveau [ VBIOS][0000:01:00.0] BMP version 5.14
[ 95.701085] nouveau [ VBIOS][0000:01:00.0] version 03.11.01.44
[ 95.708967] nouveau [ PFB][0000:01:00.0] RAM type: DDR1
[ 95.720366] nouveau [ PFB][0000:01:00.0] RAM size: 16 MiB
[ 95.726454] nouveau [ PFB][0000:01:00.0] ZCOMP: 0 tags
[ 95.737623] agpgart-intel 0000:00:00.0: AGP 2.0 bridge
[ 95.742868] agpgart-intel 0000:00:00.0: putting AGP V2 device into 4x mode
[ 95.749841] nouveau 0000:01:00.0: putting AGP V2 device into 4x mode
[ 95.756394] [TTM] Zone kernel: Available graphics memory: 386962 kiB
[ 95.762940] [TTM] Initializing pool allocator
[ 95.767609] nouveau [ DRM] VRAM: 15 MiB
[ 95.772083] nouveau [ DRM] GART: 64 MiB
[ 95.776432] nouveau [ DRM] BMP version 5.20
[ 95.781128] nouveau [ DRM] DCB version 1.5
[ 95.785740] nouveau [ DRM] DCB outp 00: f0003f00 000088b8
[ 95.791655] nouveau [ DRM] DCB outp 01: f2045f14 0000ffff
[ 95.797570] nouveau [ DRM] DCB outp 02: f4204011 ffffffff
[ 95.803705] nouveau [ DRM] BIOS FP mode: 1600x1200 (162000kHz pixel clock)
[ 95.811877] nouveau [ DRM] Saving VGA fonts
[ 95.899625] nouveau [ I2C][0000:01:00.0] detected TV encoder: ch7006
[ 95.995616] ch7006 1-0075: Detected version ID: 50
[ 96.039234] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[ 96.045928] [drm] No driver support for vblank timestamp query.
[ 96.077240] nouveau [ DRM] 0 available performance level(s)
[ 96.083444] nouveau [ DRM] c: core 17MHz memory 332MHz
[ 96.091238] nouveau [ DRM] MM: using M2MF for buffer copies
[ 96.172664] nouveau [ DRM] allocated 1600x1200 fb: 0x8000, bo ede66400
[ 96.179814] fbcon: nouveaufb (fb0) is primary device
Created attachment 71361 [details] [review] Disable TV detection This patch will remove the TV detection on your nv11. Thus it will work around the NULL dereferences and let you complete the bisect.
It still NULL derefd with that patch - in the same place, but with a different call stack. However, after looking at where it was happening, I found another commit to cherry-pick that fixes it: afada5e0 drm/nv04/disp: disable vblank interrupts when disabling display The first of the commits that was previously NULL derefing is now black screen hanging instead, so I think I've found the problem commit - 2a44e499. To recap... The last working commit is d2edab4a with dea7e0ac & 095f979a cherry-picked. The first hanging commit is 2a44e499 with dea7e0ac, 095f979a & afada5e0 cherry-picked. I'm attaching the console output for these 2 runs. Chris.
Created attachment 71364 [details] console output for last working commit
Created attachment 71365 [details] console output for first hanging commit
The patch cleans up the create -> init || fini -> destroy paths.

In the original code, nv04_display_init was only executed during resume (disregard that it's executed twice):
* eng[i]->init()
* engine->display.init()

Using 2a44e499 with dea7e0ac, 095f979a & afada5e0 cherry-picked:
* try to suspend/resume after nouveau is loaded
* sprinkle some nv_info() in the nv04_dfp_restore() codepath to establish what exactly has caused the issue.

I would assume that some encoder/connector may not yet have been completely set up.
Created attachment 71403 [details] [review] dpms fix This should fix it - now for real ;P
We're getting there! Your fix worked - for a few minutes - when applied to the previously hanging 2a44e499 with dea7e0ac, 095f979a & afada5e0 cherry-picked. When applied to the nouveau HEAD, Linux no longer hangs, but the display isn't driven properly - the LCD goes white, then fades to black. I'll attach my current patch to nouveau HEAD and the corresponding console output. Chris.
Created attachment 71471 [details] fixes currently being applied to nouveau HEAD
Created attachment 71472 [details] console output with current HEAD + fixes
This is probably another bug (you are really lucky ;). I don't see anything obvious in your latest log, so you could bisect it again - now between 2a44e499 with nv04_dfp.c fix (and other cherry-picks when necessary) and nouveau HEAD. Is timer/nv04.c change needed for 2a44e499 to light up your monitor?
(In reply to comment #23)
> This is probably another bug (you are really lucky ;).

I will bisect again. Don't hold your breath!

> Is timer/nv04.c change needed for 2a44e499 to light up your monitor?

No. I'm only using it when running the patched nouveau git HEAD. Even then I can leave it out without anything changing - it just suppresses this warning:

nouveau W[ PTIMER][0000:01:00.0] unknown input clock freq

When running the patched git HEAD... Although the LCD is not being driven, the driver isn't completely dead. I can start and stop X and the screen will go white, then fade to black each time I do it. This suggests that the backlight control & power management are okay, but the display timings are wrong. Looking at the working & broken console output, they both show the same pixel clock. Is there some way to get the driver to output more verbose display timing info (fbset -i shows all zeros in the timing line in both the working & non-working cases)?
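As background to the timing question: a pixel clock alone does not pin down the mode, because the refresh rate also depends on the total (active plus blanking) line and frame sizes. The following is a minimal sketch of that relationship. The 2160x1250 totals used in the note below are the standard VESA values for 1600x1200 at 60 Hz and are an assumption, not something taken from this thread; `refresh_millihz()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Refresh rate in millihertz, given the pixel clock in kHz and the total
 * horizontal and vertical sizes (active area plus blanking). */
static uint32_t refresh_millihz(uint32_t clock_khz, uint32_t htotal,
                                uint32_t vtotal)
{
    uint64_t pixels_per_frame = (uint64_t)htotal * vtotal;

    /* kHz -> millihertz is a factor of 1,000,000. */
    return (uint32_t)(((uint64_t)clock_khz * 1000000u) / pixels_per_frame);
}
```

With a 162000 kHz pixel clock and the assumed 2160x1250 totals this gives 60000 mHz, i.e. 60 Hz. Two logs showing the same pixel clock can therefore still differ in sync or blanking values.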
I've been bisecting and it appears that there are at least 2 further regressions... At one point in the bisection, the display goes from working to visible, but with what look like hsync problems. I can see all the lines on the display, but each line is shifting horizontally by about 16 pixels or so - giving a shifting, fuzzy picture. I've pinned this problem down to the following commit: 486a45c2 drm/nouveau/i2c: do parsing of i2c-related vbios info in nouveau_i2c.c There are no differences in the console output between this commit and its working parent commit. Much later in the bisection, this hsync problem is replaced with the fade-to-black problem I described before. I haven't finished bisecting this yet, but will continue. Chris.
I've completed the bisection. The good news is that the 'hsync' problem mentioned above, although present for a large part of the history, is fixed before the fade-to-black problem occurs. The bad news is that the problem commit is rather a large one: cb75d97e drm/nouveau: implement devinit subdev, and new init table parser
Are there any differences in dmesg before and after this commit? If the answer is "no", please attach at least one of them (both if "yes").
BTW, did you remember to cherry-pick 3bb076af2ae571a48465972d5747175cec3564cd (upstream version of the fix from comment 5) during bisection?
I didn't include 3bb076af during this bisect on the basis that it made no difference when running nouveau HEAD. I have now retested the problem commit cb75d97e and its parent 70790f4f, both with 3bb076af cherry-picked. Things just get more complicated... Although the fade-to-black problem hasn't changed, including 3bb076af makes the previously working parent commit exhibit the 'hsync' problem I described before! All tests include the dpms-fix patch from comment 19 to prevent the original hard-hang problem.

In summary:
1) 70790f4f + dpms-fix - works
2) 70790f4f + 3bb076af + dpms-fix - the 'hsync' problem reappears!
3) cb75d97e + dpms-fix - fades to black
4) cb75d97e + 3bb076af + dpms-fix - fades to black

There's quite a bit of difference in the console logs between 70790f4f & cb75d97e, but nothing obvious (to me). I'm attaching console logs for cases 1, 2 & 4.
Created attachment 71755 [details] console output for working 70790f4f + dpms fix
Created attachment 71756 [details] console output for 'hsync' problem 70790f4f + 3bb076af + dpms fix
Created attachment 71757 [details] console output for fading to black cb75d97e + 3bb076af + dpms fix
Does booting with nouveau.agpmode=0 and/or nouveau.config=DEVINIT=NvForcePost=1 change anything? (please test all 3 combinations)
I have the same problem on DELL Latitude C810 laptop. 3.2 is the latest working kernel, 3.3-rc1 does not work (LCD fades to white, then backlight turns off and machine hangs). d2edab4acffb35a6e24259886d377774efd37e6e is the latest working version (with dea7e0ac45fd28f90bbc38ff226d36a9f788efbf added to fix AGP crash) 2a44e4997c5fee8e1da1589ff57e0bd1c53f03ce is the first bad commit. It oopses in nv_load_state_ext (dev->vblank_enabled is NULL). With line 1024 in nouveau_hw.c commented out, the oops is gone and the fade&hang problem appears.
70790f4f (+dpms-fix) works for me too and cb75d97e (+dpms-fix) does not. Does not work even with any combination of nouveau.agpmode=0 and nouveau.config=DEVINIT=NvForcePost=1
(In reply to comment #33)
> Does booting with nouveau.agpmode=0 and/or
> nouveau.config=DEVINIT=NvForcePost=1 change anything? (please test all 3
> combinations)

For these tests, I'm working with nouveau HEAD 73e5cf2d + dpms-fix.

Doing "modprobe nouveau" results in the fade-to-black problem as before - as do all the following tests.

Doing "modprobe nouveau agpmode=0" removes the console messages about AGP 4x mode and changes the GART aperture from 64 MiB to 128 MiB.

Doing "modprobe nouveau config=DEVINIT=NvForcePost=1" adds a console message about "running init tables", changes the VRAM size from 15 MiB to 31 MiB (!) and results in a never-ending stream of PFIFO errors.

Doing "modprobe nouveau agpmode=0 config=DEVINIT=NvForcePost=1" has the expected console changes from the 2 tests above, but with a GPU lockup error instead of the PFIFO errors.

I'll attach console logs for the above tests. Chris.
Created attachment 72178 [details] console output for plain modprobe nouveau
Created attachment 72179 [details] console output for modprobe nouveau agpmode=0
Created attachment 72180 [details] console output for modprobe nouveau config=DEVINIT=NvForcePost=1
Created attachment 72181 [details] console output for modprobe nouveau agpmode=0 config=DEVINIT=NvForcePost=1
Hi, I've been looking at the 'hsync' problem I first mentioned in comment 25. This problem seems to be caused by any interaction with i2c bus 2 and was uncovered by commit 486a45c2, because it fixed a previous bug...

My VBIOS lists 3 i2c buses. Bus 0 is referenced by DCB entry 0 - OUTPUT_ANALOGUE. Bus 1 is referenced by DCB entries 1 & 2 - OUTPUT_LVDS & OUTPUT_TV. Prior to 486a45c2, bus 2 is never properly driven because nouveau_i2c_init() is called without there having been a call to read_dcb_i2c_entry() for index 2, so the bit-bang i2c adapter gets set up with 0 for the rd & wr variables (a later commit renames these to sense & drive). Surprisingly, the resulting incorrect CRTC register read/writes don't seem to trash the system and everything works okay.

Commit 486a45c2 fixes the bug by parsing all 3 i2c entries up-front, so bus 2 gets a bit-bang i2c adapter with sensible rd & wr values. However, any access to this bus seems to cause the display to show the 'hsync' problem. The only reference to the bus is from nouveau_temp_probe_i2c(), so I've worked around the problem with the following hack:

--- a/drivers/gpu/drm/nouveau/nouveau_temp.c
+++ b/drivers/gpu/drm/nouveau/nouveau_temp.c
@@ -287,11 +287,13 @@ static void
 nouveau_temp_probe_i2c(struct drm_device *dev)
 {
 	struct i2c_board_info info[] = {
+#if 0
 		{ I2C_BOARD_INFO("w83l785ts", 0x2d) },
 		{ I2C_BOARD_INFO("w83781d", 0x2d) },
 		{ I2C_BOARD_INFO("adt7473", 0x2e) },
 		{ I2C_BOARD_INFO("f75375", 0x2e) },
 		{ I2C_BOARD_INFO("lm99", 0x4c) },
+#endif
 		{ }
 	};

Re-adding any of the I2C_BOARD_INFO lines - I tried them 1 at a time - makes the problem come back. The simple 1 byte address test i2c transfer is enough to trigger the problem.

The above workaround is enough to make my display work again for commit 486a45c2, but the problem comes back in a later commit when the i2c lines are reset in the bit-bang adapter init code. I've had to add the following hack to work around the problem again:

--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
@@ -327,9 +333,13 @@ nouveau_i2c_ctor(struct nouveau_object *parent, struct nouveau_object *engine,
 		i2c_set_adapdata(&port->adapter, i2c);
 		if (port->adapter.algo != &nouveau_i2c_aux_algo) {
+			if(i==2) {
+				nv_warn(i2c, "I2C%d: type %d index %x/%x - supressing scl/sda init\n", i, port->type, port->drive, port->sense);
+			} else {
 			nouveau_i2c_drive_scl(port, 0);
 			nouveau_i2c_drive_sda(port, 1);
 			nouveau_i2c_drive_scl(port, 1);
+			}

So it appears that even this simple attempt to reset the i2c lines on bus 2 is enough to destabilise the display. Obviously, these 2 hacks will need to be replaced with something better, but I've no idea what - perhaps a board specific i2c bus blacklist!

None of this makes any difference to the fade-to-black problem introduced by commit cb75d97e, but I can now get its parent commit 70790f4f working. In summary, this is where I am:

70790f4f + 3bb076af + dpms-fix + i2c-2-hacks - works
cb75d97e + 3bb076af + dpms-fix + i2c-2-hacks - fades to black

Looking at the console output from these 2 tests, the only significant change seems to be a change of the AGP GART aperture from a sensible 64 MiB to 3712 MiB, which doesn't look right. The GART seems to be initialised later than before too.

Regards, Chris.
I found the bug in commit cb75d97e that results in the incorrect GART aperture and fixed it with this patch:

--- a/drivers/gpu/drm/nouveau/nouveau_compat.c
+++ b/drivers/gpu/drm/nouveau/nouveau_compat.c
@@ -17,7 +17,7 @@ nvdrm_gart_init(struct drm_device *dev, u64 *base, u64 *size)
 	struct nouveau_drm *drm = nouveau_newpriv(dev);
 	if (drm->agp.stat == ENABLED) {
 		*base = drm->agp.base;
-		*size = drm->agp.base;
+		*size = drm->agp.size;
 		return 0;
 	}
 	return -ENODEV;

However, the display still fades to black. I now get an error that I didn't get with the parent commit:

PFIFO_DMA_PUSHER - Ch 0 Get 0x04000000 Put 0x00001088 State 0xc0000000 (err: MEM_FAULT) Push 0x00000000

This message appears at the end of enabling the LVDS output, so it's probably related (I'll attach the console log).

I'm not sure how to debug further. I'm wondering in particular how to trace the effect of various changes in commit cb75d97e, such as those made to run_digital_op_script()? Perhaps I need to trace all register read/writes during the devinit phase and compare to the parent commit? How would I do this?
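The effect of the one-word typo can be reproduced outside the kernel. This is a minimal standalone sketch, not the driver code: `struct agp_info`, `enum agp_stat`, `gart_init_buggy()`, `gart_init_fixed()` and `to_mib()` are hypothetical names. The values used in the note below (a base of 0xe8000000 and a 64 MiB aperture) are also assumptions; the base is not stated in this thread and was chosen only because it reproduces the 3712 MiB figure reported earlier.

```c
#include <stdint.h>

enum agp_stat { AGP_DISABLED, AGP_ENABLED };

struct agp_info {
    enum agp_stat stat;
    uint64_t base;   /* physical start of the aperture */
    uint64_t size;   /* aperture length in bytes */
};

/* Mirrors the typo: the base address is reported as the size. */
static int gart_init_buggy(const struct agp_info *agp,
                           uint64_t *base, uint64_t *size)
{
    if (agp->stat != AGP_ENABLED)
        return -1;
    *base = agp->base;
    *size = agp->base;   /* bug: should be agp->size */
    return 0;
}

/* Corrected version: base and size come from their own fields. */
static int gart_init_fixed(const struct agp_info *agp,
                           uint64_t *base, uint64_t *size)
{
    if (agp->stat != AGP_ENABLED)
        return -1;
    *base = agp->base;
    *size = agp->size;
    return 0;
}

/* Convert a byte count to whole MiB. */
static uint64_t to_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

With the assumed base of 0xe8000000, the buggy variant reports 0xe8000000 >> 20 = 3712 MiB, while the fixed variant reports the 64 MiB aperture.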
Created attachment 74152 [details] console output for fading to black cb75d97e + 3bb076af + 92441b22 + i2c hacks + gart size fix
You can use mmiotrace to trace all register reads and writes and then parse it with demmio (from envytools) to have names attached to registers. http://nouveau.freedesktop.org/wiki/Development
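Once two traces exist (one per commit), comparing them means lining up register accesses while ignoring timestamps. The following is a minimal sketch of that idea, assuming the text log format described in the kernel's mmiotrace documentation, where read and write events look like `R width timestamp map_id physical value PC PID`. `struct mmio_event`, `parse_mmio_line()` and `same_access()` are hypothetical names and are not part of mmiotrace or envytools.

```c
#include <stdio.h>

struct mmio_event {
    char op;                    /* 'R' for read, 'W' for write */
    unsigned width;             /* access width in bytes */
    double timestamp;           /* seconds */
    unsigned map_id;
    unsigned long long addr;    /* physical address */
    unsigned long long value;
};

/* Parse one trace line. Returns 1 for a read or write event, 0 for any
 * other line (MAP, UNMAP, MARK, ...). The trailing PC and PID fields
 * are not needed here and are left unparsed. */
static int parse_mmio_line(const char *line, struct mmio_event *ev)
{
    int n = sscanf(line, "%c %u %lf %u %llx %llx",
                   &ev->op, &ev->width, &ev->timestamp,
                   &ev->map_id, &ev->addr, &ev->value);

    return n == 6 && (ev->op == 'R' || ev->op == 'W');
}

/* Two events describe the same access if everything except the
 * timestamp and map id matches. */
static int same_access(const struct mmio_event *a,
                       const struct mmio_event *b)
{
    return a->op == b->op && a->width == b->width &&
           a->addr == b->addr && a->value == b->value;
}
```

Walking both traces in parallel and reporting the first pair of events for which `same_access()` is false would give a starting point for finding where the two commits diverge.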
Can you confirm whether this commit fixes the issue?

commit f6853faa85793bf23b46787e4039824d275453c2
Author: Francisco Jerez <currojerez@riseup.net>
Date: Tue Feb 26 02:33:12 2013 +0100

    drm/nouveau: Fix typo in init_idx_addr_latched().

    Fixes script-based modesetting on some LVDS panels.
Hi. Sorry for the delay, but good news... commit f6853faa does indeed fix the problem. It works when cherry picked on top of the commits & patches described before. I've also checked the HEAD of nouveau/master as of today (557f8126) and it is working with no patches required. Hooray! My brain was hurting trying to make sense of mmio traces (when I got time to look at it at all). OpenGL doesn't work at all, but this is not a surprise. My user space is ancient and I'm not sure if it's ever worked anyway. Once again, thanks to all involved for caring about my ancient hardware. Regards, Chris.
Glad to hear that it's working :)