When I stress the machine a bit (scrolling through large numbers of icons or running gtkperf usually does the trick), and also after some time of normal use, X inevitably resets because of a hang in the DMA queue.
Jan 27 01:23:23 legolas kernel: [ 205.234985] [drm] PFIFO_DMA_PUSHER - Ch 1 put: 32, get:948 mthd 0x0a2c status 0x00001000
Jan 27 01:23:23 legolas kernel: [ 205.235011] [drm] PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Jan 27 01:23:23 legolas kernel: [ 205.235021] [drm] PGRAPH_ERROR - Ch 1/6 Class 0x009f Mthd 0x0a00 Data 0x00030303:0x00011229
Jan 27 01:23:23 legolas kernel: [ 205.235032] [drm] PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Jan 27 01:23:23 legolas kernel: [ 205.235040] [drm] PGRAPH_ERROR - Ch 1/6 Class 0x009f Mthd 0x0a04 Data 0x00030303
This continues up to method 0x0a2c. The PGRAPH_ERROR messages do not always appear, but the pusher interrupt always occurs. Sometimes I also get:
[drm] PFIFO_CACHE_ERROR - Ch 1/0 Mthd 0x0000 Data 0x6f000000 status 0x00000001
Xorg log attached.
Created attachment 13998
FIFO dump of the crash. I have many similar dumps but could not find a pattern connecting them; please ask if you need more.
Created attachment 14002
List of methods submitted before the DMA lockup
This is a list of the methods the X driver submitted before it locked up. For this particular crash I did not log whether the BEGIN_RING and OUT_RING counts matched, but I did log this for a previous crash and the counts seemed fine (the only cases with one extra OUT_RING were those where we hit nouveau_dma_wait).
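To make the BEGIN_RING/OUT_RING check concrete, here is a minimal sketch in C of the kind of bookkeeping it implies, assuming BEGIN_RING announces a dword count and each OUT_RING writes exactly one dword. The names ring_check, check_begin, check_out and check_finish are hypothetical and are not part of the nouveau DDX; in a debug build such calls would be hooked into the existing macros.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical debug bookkeeping; not the real nouveau DDX macros. */
struct ring_check {
    uint32_t expected;   /* dwords announced by the last begin */
    uint32_t emitted;    /* dwords actually written since that begin */
    uint32_t mismatches; /* packets whose emitted count differed */
};

/* Compare the announced and emitted counts of the packet in progress. */
static void verify_last_packet(struct ring_check *rc)
{
    if (rc->emitted != rc->expected) {
        fprintf(stderr, "ring mismatch: declared %u, emitted %u\n",
                (unsigned)rc->expected, (unsigned)rc->emitted);
        rc->mismatches++;
    }
}

/* Called where BEGIN_RING would be: close the old packet, open a new one. */
static void check_begin(struct ring_check *rc, uint32_t size)
{
    verify_last_packet(rc);
    rc->expected = size;
    rc->emitted = 0;
}

/* Called where OUT_RING would be: one dword written. */
static void check_out(struct ring_check *rc)
{
    rc->emitted++;
}

/* Called before kicking the ring: verify the final packet and reset. */
static void check_finish(struct ring_check *rc)
{
    verify_last_packet(rc);
    rc->expected = 0;
    rc->emitted = 0;
}
```

For example, announcing two packets of 2 and 1 dwords but writing two dwords into each would count one mismatch, which is how a single surplus OUT_RING like the nouveau_dma_wait case would surface.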
Bug #14287 looks identical to this one.
Created attachment 14214
Patch enabling PCI GART and making sure the command buffer is placed there
DMA queue hangs only happen when the command buffer is in video memory
(FB). Turning on AGP on PowerPC leads to lockups with DFS, even when
using a PCI GART, so the current DRM has agp_init disabled. This patch
enables the PCI GART in the DDX and makes sure the command buffer is
placed there.
Patch was applied; fixed.