Summary: | Kernel unaligned access at TPC[105d9fb4] nvkm_instobj_wr32+0x14/0x20 [nouveau]
---|---
Product: | xorg
Component: | Driver/nouveau
Status: | RESOLVED MOVED
Severity: | normal
Priority: | medium
Version: | unspecified
Hardware: | SPARC
OS: | Linux (All)
Reporter: | Kieron Gillespie <ciaran.gillespie>
Assignee: | Nouveau Project <nouveau>
QA Contact: | Xorg Project Team <xorg-team>
Description
Kieron Gillespie
2016-07-07 02:23:25 UTC
What GPU is this? Is Sparc64 a BE system? Are you using 4K pages? (if not, use 4K pages)
So Sparc64 is a big-endian architecture. The request to use 4K pages is not possible on Sparc64, as the smallest page size it supports is 8K (which is what my system is currently using).
The card is a GeForce FX5200 128MB DDR PCI.
The GPU I am currently testing is a bit of a fossil, but I had great success on this SPARC system in the past: I managed to get full hardware acceleration working sometime in early 2015.
Hmmm... maybe with nv3x the 4K pages aren't such a hard requirement. Definitely people on PPC64 with 64K pages had trouble with nv4x though.
But, if it worked before, it can work again. Since this isn't exactly *the* most common setup, you're going to have to do a bit more of the work. Try more kernels. Nouveau got a huge rewrite in kernel 4.3, try 4.2 maybe? That rewrite ended up breaking BE briefly, but I fixed it up again and it was working semi-recently on my FX5200 in a G5 (PPC64, also BE).
iowrite32_native is used all over the place to write to the card's MMIO space in one of the BARs (can never remember which).
The specific error seems to indicate that we did a wr32 on an instobj to a non-32-bit-aligned address. This would be very surprising. Please boot with nouveau.debug=trace and attach a full log of the result. (It should be large.) Also, please try several kernels, including both pre- and post-4.3 ones.
Created attachment 124978
Message log while nouveau.debug=trace
So I enabled the debug trace and let it run for some time, roughly 15 minutes, though it looks like it didn't get terribly far. I am going to try to collect more information by having it run for several hours, but I figured I'd upload this in case it's at all useful.
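The diagnosis earlier in this thread is that a wr32 was issued to a non-32-bit-aligned address, which SPARC reports as an unaligned access. As a rough user-space illustration only (this is not nouveau code, and `is_aligned32` and `buf_wr32` are hypothetical names invented for this sketch), a 32-bit write helper can reject misaligned or out-of-range offsets before touching memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when a byte offset is a multiple of 4. */
static int is_aligned32(size_t offset)
{
    return (offset & 0x3u) == 0;
}

/*
 * Hypothetical checked 32-bit write into a plain byte buffer.
 * Returns 0 on success, -1 if the offset is misaligned or out of range.
 * The value is stored in the host's native byte order.
 */
static int buf_wr32(uint8_t *buf, size_t len, size_t offset, uint32_t value)
{
    assert(buf != NULL);
    if (!is_aligned32(offset) || len < sizeof(value) ||
        offset > len - sizeof(value))
        return -1;
    /* memcpy avoids assuming anything about the alignment of buf itself. */
    memcpy(buf + offset, &value, sizeof(value));
    return 0;
}
```

A check like this only shows where a bad offset would be caught; it says nothing about which caller in the driver produced the offset in the first place.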
(In reply to Kieron Gillespie from comment #4)
Hm, something bad is going on. It's supposed to work much more gracefully. First off ... where are all the init messages from nouveau loading? Do you have a digital screen you can connect? It looks like something keeps trying to get the scanout position but can't (see the error returned by nv04_disp_scanoutpos), which in turn floods the logs.
(In reply to Kieron Gillespie from comment #4)
Also, looks like your Xorg is in a restart loop; perhaps logs from that could be interesting.
So it is actually connected to a display, and I can get a console with nouveau. I'll try to get better Xorg output; I think the driver is still having trouble auto-detecting the device. Also, I am going to connect one of my serial cables so I can get cleaner output. I think the messages log is missing some of the very early boot messages. Not sure, but I would like to rule it out.
The constant restarting of Xorg is coming from the lightdm service. It's constantly trying over and over again.
Created attachment 124979
messages forcing BusID in Xorg config
So this time I logged into the box remotely and stopped the lightdm service. I then ran "Xorg -config xorg.conf.broke -verbose 6". I'll also attach the Xorg.0.log and the config file.
The Xorg log almost makes it look like it is working, though all I am left with is a blank screen with a single non-blinking cursor in the top left corner of the monitor, almost as if it switched to the virtual terminal but didn't actually clear the screen and didn't start to draw anything.
Created attachment 124980
Xorg.0.log using xorg.conf.broke
Created attachment 124981
xorg.conf.broke
Created attachment 124982
lspci output from the system in question
So I tried unplugging the monitor just to see what would happen, welp...
Message from syslogd@celestia at Jul 9 22:55:21 ...
kernel:[ 2211.347894] Kernel panic - not syncing: Irrecoverable deferred error trap.
Message from syslogd@celestia at Jul 9 22:55:21 ...
kernel:[ 2211.347894]
Message from syslogd@celestia at Jul 9 22:55:21 ...
kernel:[ 2213.461991] Press Stop-A (L1-A) to return to the boot prom
Message from syslogd@celestia at Jul 9 22:55:21 ...
kernel:[ 2213.534081] ---[ end Kernel panic - not syncing: Irrecoverable deferred error trap.
Message from syslogd@celestia at Jul 9 22:55:21 ...
kernel:[ 2213.534081]
Message from syslogd@celestia at Jul 9 22:55:54 ...
kernel:[ 2246.560543] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [Xorg:3774]
Well now I know what happens! :P
OK, well let's start small. One source of problems is that we have
drivers/gpu/drm/nouveau/nouveau_bios.h:#define ROM16(x) le16_to_cpu(*(u16 *)&(x))
which can only work on aligned pointers x, but it gets called with unaligned offsets in nouveau_bios.c.
Can you try changing that to
#define ROM16(x) get_unaligned_le16(&(x))
I'm guessing that will help with the first group of unaligned traps.
(In reply to Ilia Mirkin from comment #14)
Oh, and same treatment for ROM32 of course (and ROM64 while you're at it, but that never gets called from what I can tell).
Created attachment 124983
panic_console.out
So now that I am logging directly from the serial terminal I appear to be getting more information.
There are times when, as the system boots, the nouveau driver itself crashes. I was able to catch it this time.
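The ROM16/ROM32 change suggested above swaps a cast-and-dereference for an unaligned-safe little-endian read. As a minimal user-space sketch of that idea (not the kernel's implementation of get_unaligned_le16; `rom16_read` and `rom32_read` are hypothetical names used only here), the value can be assembled one byte at a time, which works at any offset and on both big- and little-endian hosts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a little-endian 16-bit value from an
 * arbitrary (possibly odd) offset in a ROM image. Only single-byte
 * loads are issued, so no alignment trap can occur.
 */
static uint16_t rom16_read(const uint8_t *rom, size_t off)
{
    assert(rom != NULL);
    return (uint16_t)(rom[off] | ((uint16_t)rom[off + 1] << 8));
}

/* Hypothetical helper: same approach for a little-endian 32-bit value. */
static uint32_t rom32_read(const uint8_t *rom, size_t off)
{
    assert(rom != NULL);
    return (uint32_t)rom[off]
         | ((uint32_t)rom[off + 1] << 8)
         | ((uint32_t)rom[off + 2] << 16)
         | ((uint32_t)rom[off + 3] << 24);
}
```

By contrast, an expression such as `*(u16 *)&(x)` asks the CPU for a single 16-bit load, which is what traps on SPARC when the address is odd. The callers are assumed to guarantee that the offset plus the access width stays inside the image.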
(In reply to Ilia Mirkin from comment #15)
Alright, I'll give that a shot and let you know. Thanks for the help!
I believe these two patches should be relevant to your situation:
https://lists.freedesktop.org/archives/nouveau/2016-July/025683.html
https://lists.freedesktop.org/archives/nouveau/2016-July/025688.html
Whether it resolves anything ... who knows. Should at least get rid of all the unaligned access errors.
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/273.