Summary: | [NV34] [v3.14-rc1] nouveau: get 0x10000000 put 0x0000ed30 state 0xc0000000 (err: MEM_FAULT) push 0x00000000 | ||||||
---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Ronald <ronald645> | ||||
Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> | ||||
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> | ||||
Severity: | normal | ||||||
Priority: | medium | CC: | math.parent | ||||
Version: | git | ||||||
Hardware: | Other | ||||||
OS: | All | ||||||
Whiteboard: | |||||||
i915 platform: | i915 features: | ||||||
Attachments: |
|
Btw, is http://cgit.freedesktop.org/nouveau/linux-2.6 not used anymore? I kind of liked it. It makes bisecting easier on aged machines, and it makes new patches easier to test. Now it's all hidden in a huge changeset between v3.13 final and v3.14-rc1, sometimes even spread out over several periods in time, causing full rebuilds.

Please notice the new CC (is that a +1?).

Made a small typo:
> X starts, but not without problems. I saw rectangular parts of Opera alternate between correct display and black triangles.
triangles->squares.

(In reply to comment #1)
> Btw, is http://cgit.freedesktop.org/nouveau/linux-2.6 not used anymore?

It is... e.g. look at the drm-nouveau-next branch. I don't think the 'master' branch is really maintained anymore. That work is happening in http://cgit.freedesktop.org/~darktama/nouveau/, which you may also be able to use to bisect without the rest of the kernel, if your range is sufficiently small. (Although that repo is a little different...)

> 
> I kind of liked it. Makes bisecting easier on aged machines. And it allows
> to test new patches more easily.
> 
> Now it's all hidden between a huge changeset between v3.13 final and
> v3.14-rc1. And sometimes even spread out over several periods in time
> causing full rebuilds.

I'm confused by what you're trying to say here. Both v3.14-rc1 and v3.13 are available in Linus's repo. Use that to bisect:

git bisect start v3.14-rc1 v3.13 -- drivers/gpu/drm/nouveau

That should only look at changes in nouveau between those 2 revs...

Hmm, this seems like a duplicate of the old bug you reported, bug 71116. Can you update libdrm/libdrm-nouveau to 2.4.48 before doing any bisection? A fresh dmesg may be useful as well.

@Ilia Mirkin: Ah, yes, the next repo seems promising. Thank you. About the 'old' master repo: say, for example, the stable release was v3.8 and this machine would happily use it. Every week I would pull from this master branch the patches for v3.9, based on the v3.8 tree.
These patches are localized in the nouveau driver, so I have quick rebuilds and thus quick bisects. Problems are narrowed down easily and, most importantly, fast. Using Linus's tree sometimes means doing a lot more bisects, since a lot of changes come in at once after two months. Furthermore, it's not just the nouveau driver that changes but the entire tree, so bisecting would also imply doing full rebuilds. However, drm-nouveau-next seems to fill the gap that master left behind.

@Emil Velikov: It does not seem like it:

[gebruiker@delta linux]$ pacman -Qi libdrm
Name         : libdrm
Version      : 2.4.52-1
Description  : Userspace interface to kernel DRM services
Architecture : i686

ArchLinux is quick with its updates, which is really nice. Oh well, on to the bisect then =)

It's "drm/nv50-: map TTM_PL_SYSTEM through a BAR for CPU access". Despite its name, it touches generic code. I'm running v3.14-rc2 with this patch reverted -> no issues yet.

commit a554090664728384c94b027ba15bc7df87f9ac09
Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date:   Tue Nov 12 13:34:09 2013 +0100

    drm/nv50-: map TTM_PL_SYSTEM through a BAR for CPU access

    Moves bo's to TTM_PL_TT for BAR mapping, to hide tiling from user.

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e4623e9..39ca36c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1241,6 +1241,7 @@ nouveau_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem)
 {
 	struct ttm_mem_type_manager *man = &bdev->man[mem->mem_type];
 	struct nouveau_drm *drm = nouveau_bdev(bdev);
+	struct nouveau_mem *node = mem->mm_node;
 	struct drm_device *dev = drm->dev;
 	int ret;
 
@@ -1263,14 +1264,16 @@ nouveau_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem)
 			mem->bus.is_iomem = !dev->agp->cant_use_aperture;
 		}
 #endif
-		break;
+		if (!node->memtype)
+			/* untiled */
+			break;
+		/* fallthrough, tiled memory */
 	case TTM_PL_VRAM:
 		mem->bus.offset = mem->start << PAGE_SHIFT;
 		mem->bus.base = pci_resource_start(dev->pdev, 1);
 		mem->bus.is_iomem = true;
 		if (nv_device(drm->device)->card_type >= NV_50) {
 			struct nouveau_bar *bar = nouveau_bar(drm->device);
-			struct nouveau_mem *node = mem->mm_node;
 
 			ret = bar->umap(bar, node, NV_MEM_ACCESS_RW,
 					&node->bar_vma);
@@ -1306,6 +1309,7 @@ nouveau_ttm_fault_reserve_notify(struct ttm_buffer_object *bo)
 	struct nouveau_bo *nvbo = nouveau_bo(bo);
 	struct nouveau_device *device = nv_device(drm->device);
 	u32 mappable = pci_resource_len(device->pdev, 1) >> PAGE_SHIFT;
+	int ret;
 
 	/* as long as the bo isn't in vram, and isn't tiled, we've got
 	 * nothing to do here.
@@ -1314,10 +1318,20 @@ nouveau_ttm_fault_reserve_notify(struct ttm_buffer_object *bo)
 		if (nv_device(drm->device)->card_type < NV_50 ||
 		    !nouveau_bo_tile_layout(nvbo))
 			return 0;
+
+		if (bo->mem.mem_type == TTM_PL_SYSTEM) {
+			nouveau_bo_placement_set(nvbo, TTM_PL_TT, 0);
+
+			ret = nouveau_bo_validate(nvbo, false, false);
+			if (ret)
+				return ret;
+		}
+		return 0;
 	}
 
 	/* make sure bo is in mappable vram */
-	if (bo->mem.start + bo->mem.num_pages < mappable)
+	if (nv_device(drm->device)->card_type >= NV_50 ||
+	    bo->mem.start + bo->mem.num_pages < mappable)
 		return 0;

That code _really_ shouldn't affect anything pre-nv50... The only candidate is the second hunk. node->memtype is only ever set to != 0 for nv50+, but... who knows. Can you add a WARN_ON_ONCE(node->memtype) in there? The third hunk only affects code for card_type >= NV_50... unless I'm reading something very wrong.

Aha, I think I know what's going on. You're using AGP, which means you get to use the ttm_bo_manager_func. This in turn allocates nodes as drm_mm_node, which is *totally* different from nouveau_mem. (And that's what gets stored in mem->mm_node.) Can you change

	if (!node->memtype)
		/* untiled */
		break;

to

	if (nv_device(drm->device)->card_type < NV_50 ||
	    !node->memtype)
		break;

and see if that helps? I actually semi-suspect that this is a giant issue with the way we're using the TTM stuff, though perhaps all the other uses are guarded behind card < NV_50 logic as well; that's not obvious to me.

Yes, that fixes it.

(In reply to comment #9)
> Yes, that fixes it.

Awesome! I sent a patch to the ML and cc'd you (I assume you got it) a few hours ago. Feel free to respond with a Tested-by. I'll close this bug when the patch hits mainline. Thanks for bisecting, and sorry that you're running into so many problems!

No need to apologize. I'm learning a lot *and* I get to contribute. I'll get back to you about the other 2 bugs.
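A minimal, self-contained C sketch of the problem described above, under loudly stated assumptions: the types and the function (`nouveau_mem_sketch`, `drm_mm_node_sketch`, `mem_reg_sketch`, `tt_needs_bar_mapping`) are hypothetical, simplified stand-ins, not the real kernel structures. It only illustrates the reasoning in the thread. `mem->mm_node` is an untyped pointer whose real type depends on which allocator created it, so it may only be read as a `nouveau_mem` once the code knows the card is nv50+. Testing `card_type < NV_50` first, as in the proposed fix, means the pre-nv50 AGP path never dereferences `->memtype` at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL stand-ins for the kernel types discussed in this bug.
 * The enum names mirror nouveau's card_type naming, but the structs are
 * simplified and their layouts do not match the real ones. */
enum card_type { NV_30 = 0x30, NV_40 = 0x40, NV_50 = 0x50 };
enum placement { PL_SYSTEM, PL_TT, PL_VRAM };

/* Sketch of what the nv50+ memory managers store in mem->mm_node. */
struct nouveau_mem_sketch {
	uint32_t memtype;	/* non-zero means tiled */
};

/* Sketch of what the generic range manager (the AGP TT case) stores. */
struct drm_mm_node_sketch {
	uint64_t start;
	uint64_t size;
};

struct mem_reg_sketch {
	enum placement mem_type;
	void *mm_node;		/* real type depends on the allocator */
};

/* Returns 1 only for a TT buffer that should fall through to the
 * BAR (VRAM) path, mirroring the guarded condition proposed above. */
static int tt_needs_bar_mapping(enum card_type card,
				const struct mem_reg_sketch *mem)
{
	assert(mem != NULL);
	if (mem->mem_type != PL_TT)
		return 0;
	/* Check the card type first: before nv50, mm_node may point at a
	 * drm_mm_node, and reading ->memtype from it would reinterpret
	 * unrelated bytes (the bug in the unguarded version). */
	if (card < NV_50)
		return 0;
	const struct nouveau_mem_sketch *node = mem->mm_node;
	return node->memtype != 0;
}
```

In this sketch, an unguarded `!node->memtype` check on the NV34/AGP case would read unrelated bytes of the `drm_mm_node_sketch` as `memtype`. That is undefined behaviour, so whether the buffer gets treated as tiled depends on whatever the generic node happens to contain.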
The fix should now be upstream and will be included in the next 3.14-rc:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34d595081812da62b5357579267c4ab5eae64ac1

Thanks, I changed my gitconfig and I'm already pulling from drm-nouveau-next :)
Created attachment 93524: Full dmesg of crash

Tried to boot a v3.14-rc1 kernel. The previous kernel was v3.13-rc8. I'm attaching the full dmesg; the only relevant lines were these:

[    2.357486] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 0 [DRM] get 0x10000000 put 0x0000ed30 state 0xc0000000 (err: MEM_FAULT) push 0x00000000
[    2.357764] Console: switching to colour frame buffer device 160x64
[    2.606574] nouveau E[   PFIFO][0000:01:00.0] DMA_PUSHER - ch 0 [DRM] get 0x10000000 put 0x000091cc state 0xc0000000 (err: MEM_FAULT) push 0x00000000
[   17.350008] nouveau E[     DRM] failed to idle channel 0xcccc0000 [DRM]
[   17.350011] nouveau E[     DRM] GPU lockup - switching to software fbcon

It booted (nice), but the KMS console was initially corrupted in the top section. The penguin was there, but the area next to it was pinkish and garbled. Once it switched from /dev/console to /dev/tty1 (I think that is what happens), the corruption was gone.

X starts, but not without problems. I saw rectangular parts of Opera alternate between correct display and black triangles. The weird thing is that no new errors popped up.
The following list shows the suspects:

git log --topo-order --oneline v3.13-rc8^...HEAD --no-merges -- drivers/gpu/drm/nouveau Makefile
38dbfb5 Linus 3.14-rc1
f3980dc drm/nouveau: resume display if any later suspend bits fail
09c3de1 drm/nouveau: fix lock unbalance in nouveau_crtc_page_flip
d83ef85 drm/nouveau: implement hooks for needed for drm vblank timestamping support
d2fa7d3 drm/nouveau/disp: add a method to fetch info needed by drm vblank timestamping
eb2e968 drm/nv50: fill in crtc mode struct members from crtc_mode_fixup
1139ffb drm/nouveau: call drm_vblank_cleanup() earlier
2332b31 drm/nouveau: create base display from common code
ea7dce9 drm/nv50/gr: print mpc trap name when it's not an mp trap
f750ecc drm/nv50/gr: update list of mp errors, make it a bitfield
e2dd003 drm/nv50/gr: add more trap names to print on error
f87cd8b drm/nouveau/devinit: lock/unlock crtc regs for all devices, not just pre-nv50
d5c1e84 drm/nouveau: hold mutex while syncing to kernel channel
4019aaa drm/nv50-/devinit: prevent use of engines marked as disabled by hw/vbios
f0d13e3 drm/nouveau/device: provide a way for devinit to mark engines as disabled
cf33601 drm/nouveau/devinit: tidy up the subdev class definition
5222555 drm/nouveau/bar: tidy up the subdev and object class definitions
ab60619 drm/nouveau/instmem: tidy up the object class definition
24a4ae8 drm/nouveau/instmem: tidy up the subdev class definition
64c672a drm/nouveau/pwr: implement a simple i2c stack
2e9dfe2 drm/nouveau/pwr: have rd/wr32 routines clobber data instead of addr
7321623 drm/nve0/fb: turn off some bits in 10f584 at init
cb54dd2 drm/nve0/fb/gddr5: merge a fix from ddr3 for one of the timing settings
b13d0e4 drm/nve0/fb/gddr5: yet another random 10f200 bit
c814a60 drm/nvc0-/fb: hook up skeleton interrupt handler
7f39e59 drm/nve0/fb/gddr5: more 10f200 stuff
12642e3 drm/nve0/clk: report ddr memory frequency
1a894c0 drm/nouveau/fb/gddr5: make sure we update mr7 when we're supposed to
a8ccbb7 drm/nve0/fb/gddr5: 10f698/69c
cfe1760 drm/nve0/fb: it's now safe to obey the memory voltage setting properly
46bf1c3 drm/nve0/fb: multi-stage reclock is required for certain transitions
1789cab drm/nouveau/clk: allow fb to signal it needs to do a multi-stage reclock
b655f2b drm/nve0/fb/gddr5: parse bios data into struct rather than using directly
ea8b4a3 drm/nve0/fb/gddr5: found LP3 setting
971372e drm/nve0/fb: note the memory voltage toggle, not using it yet
db6735c drm/nve0/fb/gddr5: somewhat better attempt at 100770/10f604/610/614
f4aa2c6 drm/nve0/fb/gddr5: fixup delays a bit
1522eca drm/nouveau/bios: timing 2.0 entries can have subentries
09692e5 drm/nve0/fb/gddr5: note another semi-unknown
1e1d6b4 drm/nouveau/fb/gddr5: modify mr8 with high bits of CL/WR
e7084c6 drm/nve0/fb/gddr5: fix calculation of RDQS setting
334565a drm/nve0/fb/gddr5: switch off some other random bit at some point
0189169 drm/nve0/fb/gddr5: punt all 10f910/914 accesses through ram_train
d394fb1 drm/nve0/fb/gddr5: not all memory partitions are created equal
dd95c8f drm/nve0/fb: typo in register name
0a0dc8f drm/nouveau/bios: make common code to handle ramcfg strap etc
5905439 drm/nve0/fb/gddr5: fix an assumption of sane memory controller layout
2daaf5b drm/nve0/fb/gddr5: fix behaviour of lp3 setting
cb1567c drm/nve0/fifo: recover from mmu faults on bar1/bar3
649ec92 drm/nve0/fifo: keep mmu fault interrupts enabled at all times
e1b6b14 drm/nve0/fifo: update human-readable mmu fault descriptions
e9fb980 drm/nve0/fifo: document more intr status bits
9f8459c drm/nve0/fifo: populate PBDMA status bitfield with more definitions
39b0554 drm/nve0/fifo: s/subfifo/PBDMA/
f82c44a drm/nve0/fifo: s/playlist/runlist/
f76dd80 drm/nvf0/gr: enable acceleration with our chsw ucode
aa97cd3 drm/nv108/gr: enable acceleration with our chsw ucode
5d91e19 drm/nvc0-/gr: handle fwmthd interrupts in ucode
e1b22bc drm/nvc0-/gr: fiddle some magic around strand init
96616b4 drm/nv108/gr: initial support (need external fuc)
daa9ab5 drm/nv108/ce: enable copy engines
a763951 drm/nv108/fifo: initial support
a0f95f1 drm/nvf0/gr: remove a copy+pasto in ctx reglist
67af60f drm/nvc0-/gr: bring in some macros to abstract falcon isa differences
90d6db1 drm/nouveau/falcon: use vmalloc to create firwmare copies
d96bf43 drm/nouveau/gem: remove (now) unneeded pre-validate fence sync
cef9e99 drm/nouveau/ttm: explicitly wait for bo idle before memcpy buffer move
35b8141 drm/nouveau/ttm: explicity sync with kernel channel before moving buffer
3c57d85 drm/nouveau/ttm: tidy up creation of temporary buffer move vmas
ab9b18a drm/nv04/plane: add support for nv04/nv05 video overlay
7ffb078 drm/nv10/plane: add YUYV support
a554090 drm/nv50-: map TTM_PL_SYSTEM through a BAR for CPU access
ce8f769 drm/nouveau: fix m2mf copy to tiled gart
2e2cfbe drm/nouveau/vm: reduce number of entry-points to vm_map()
d0ce7b856 drm/nouveau: make vga_switcheroo code depend on VGA_SWITCHEROO
85b2331 drm: Kill DRM_*MEMORYBARRIER
1d6ac18 drm: Kill DRM_COPY_(TO|FROM)_USER
bfd8303 drm: Kill DRM_HZ
b072e53 ACPI / nouveau: replace open-coded _DSM code with helper functions
4988d0a nouveau / ACPI: fix memory leak in ACPI _DSM related code
8b48463 ACPI: Clean up inclusions of ACPI header files
d8ec26d Linux 3.13
72de182 drm/nouveau/mxm: fix null deref on load
fdd239a drm/nouveau: fix null ptr dereferences on some boards
7e22e91 Linux 3.13-rc8