Summary: | can't start GDM when building mesa master branch with LTO | ||
---|---|---|---|
Product: | Mesa | Reporter: | Marcos Simental <mrkzmrkz> |
Component: | Drivers/Gallium/llvmpipe | Assignee: | Caio Marcelo de Oliveira Filho <caio.oliveira> |
Status: | RESOLVED FIXED | QA Contact: | mesa-dev |
Severity: | normal | ||
Priority: | medium | CC: | caio.oliveira, mrkzmrkz |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | systemd journalctl output showing core-dumps of gnome-session-check-accelerated-gles-helper and gnome-shell |
Description
Marcos Simental
2019-06-10 22:38:51 UTC
As the backtrace shows, scene=0x0, which shouldn't happen. The scene pointer is obtained in thread_function(), in: lp_rast_begin( rast, lp_scene_dequeue( rast->full_scenes, TRUE ) ); The lp_scene_dequeue function is not expected to return NULL, but does: struct lp_scene * lp_scene_dequeue(struct lp_scene_queue *queue, boolean wait) { struct scene_packet packet; enum pipe_error ret; packet.scene = NULL; ret = util_ringbuffer_dequeue(queue->ring, &packet.header, sizeof packet / 4, wait ); if (ret != PIPE_OK) return NULL; return packet.scene; } Possibility 1: there was an error. But if that were the case, we'd expect this to happen with and without LTO. Possibility 2: Undefined Behaviour in the code. Note how the assigns packet.scene = NULL, never passes it to util_ringbuffer_dequeue, then returns it. It passes packet.header and the size of packet: struct scene_packet { struct util_packet header; struct lp_scene *scene; }; And expects the callee fill it: for (i = 0; i < ring_packet->dwords; i++) { packet[i] = ring->buf[ring->tail]; ring->tail++; ring->tail &= ring->mask; } I'll look at the disassembly to confirm possibility 2. Disassembly: 0x00007ffff24cc389 <+489>: xor %edi,%edi 0x00007ffff24cc38b <+491>: movq $0x0,0x10(%r12) 0x00007ffff24cc394 <+500>: callq 0x7ffff24a5b00 <lp_scene_begin_rasterization(lp_scene*)> That's a constant null pointer being passed to lp_scene_begin_rasterization(). I believe this is a violation of strict aliasing. The compiler is correct. The code in llvmpipe is buggy. This patch appears to fix the problem: From 49607f0524539cb836065b626bb3d3946061c486 Mon Sep 17 00:00:00 2001 From: Thiago Macieira <thiago.macieira@intel.com> Date: Mon, 10 Jun 2019 19:13:12 -0700 Subject: [PATCH] Attempt at fixing strict aliasing violation in dequeueing packets lp_scene_dequeue() calls util_ringbuffer_dequeue() with a payload that isn't an array of struct util_packets. When util_ringbuffer_dequeue() made the direct copy, the compiler discarded because it wasn't writing the right type. Signed-off-by: Thiago Macieira <thiago.macieira@intel.com> --- src/gallium/auxiliary/util/u_ringbuffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/util/u_ringbuffer.c b/src/gallium/auxiliary/util/u_ringbuffer.c index f6bb910671e..cebb908410c 100644 --- a/src/gallium/auxiliary/util/u_ringbuffer.c +++ b/src/gallium/auxiliary/util/u_ringbuffer.c @@ -146,7 +146,7 @@ enum pipe_error util_ringbuffer_dequeue( struct util_ringbuffer *ring, /* Copy data from ring: */ for (i = 0; i < ring_packet->dwords; i++) { - packet[i] = ring->buf[ring->tail]; + memcpy(packet + i, ring->buf + ring->tail, sizeof(*packet)); ring->tail++; ring->tail &= ring->mask; } -- 2.22.0 (In reply to Thiago Macieira from comment #3) > This patch appears to fix the problem: > > From 49607f0524539cb836065b626bb3d3946061c486 Mon Sep 17 00:00:00 2001 > From: Thiago Macieira <thiago.macieira@intel.com> > Date: Mon, 10 Jun 2019 19:13:12 -0700 > Subject: [PATCH] Attempt at fixing strict aliasing violation in dequeueing > packets > > lp_scene_dequeue() calls util_ringbuffer_dequeue() with a payload that > isn't an array of struct util_packets. When util_ringbuffer_dequeue() > made the direct copy, the compiler discarded because it wasn't writing > the right type. > > Signed-off-by: Thiago Macieira <thiago.macieira@intel.com> > --- > src/gallium/auxiliary/util/u_ringbuffer.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/auxiliary/util/u_ringbuffer.c > b/src/gallium/auxiliary/util/u_ringbuffer.c > index f6bb910671e..cebb908410c 100644 > --- a/src/gallium/auxiliary/util/u_ringbuffer.c > +++ b/src/gallium/auxiliary/util/u_ringbuffer.c > @@ -146,7 +146,7 @@ enum pipe_error util_ringbuffer_dequeue( struct > util_ringbuffer *ring, > /* Copy data from ring: > */ > for (i = 0; i < ring_packet->dwords; i++) { > - packet[i] = ring->buf[ring->tail]; > + memcpy(packet + i, ring->buf + ring->tail, sizeof(*packet)); > ring->tail++; > ring->tail &= ring->mask; > } > -- > 2.22.0 This patch works for me I had a conversation with Thiago, and we need the patch needs an update. I'll create an MR in GitLab with those. (In reply to Caio Marcelo de Oliveira Filho from comment #5) > I had a conversation with Thiago, and we need the patch needs an update. > I'll create an MR in GitLab with those. I meant: "We think the patch needs an update" That MR landed. Patch that fix the issue here is commit 397d1a18ef78ddf46efda44d6783105f9fd87f7e Author: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Date: Wed Jun 12 15:32:30 2019 -0700 llvmpipe: Don't use u_ringbuffer for lp_scene_queue Inline the ring buffer and signal logic into lp_scene_queue instead of using a u_ringbuffer. The code ends up simpler since there's no need to handle serializing data from / to packets. This fixes a crash when compiling Mesa with LTO, that happened because of util_ringbuffer_dequeue() was writing data after the "header packet", as shown below struct scene_packet { struct util_packet header; struct lp_scene *scene; }; /* Snippet of old lp_scene_deque(). */ packet.scene = NULL; ret = util_ringbuffer_dequeue(queue->ring, &packet.header, sizeof packet / 4, return packet.scene; but due to the way aliasing analysis work the compiler didn't considered the "&packet->header" to alias with "packet->scene". With the aggressive inlining done by LTO, this would end up always returning NULL instead of the content read by util_ringbuffer_dequeue(). Issue found by Marcos Simental and Thiago Macieira. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110884 Reviewed-by: Roland Scheidegger <sroland@vmware.com> Marcos, could you test master again? (In reply to Caio Marcelo de Oliveira Filho from comment #9) > Marcos, could you test master again? Yep, your patch also works for me. We can close this issue now I guess. Thanks! |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.