Summary:        Cogs is slow
Product:        Mesa
Component:      Mesa core
Status:         RESOLVED FIXED
Severity:       normal
Priority:       medium
Version:        git
Hardware:       Other
OS:             All
Reporter:       Pavel Ondračka <pavel.ondracka>
Assignee:       mesa-dev
QA Contact:
CC:             niels_ole, ptpzz, sa, thejoe
Keywords:       patch
Whiteboard:
i915 platform:
i915 features:
Description
Pavel Ondračka
2011-08-26 11:07:54 UTC
The game Cogs is very slow with the r300g (and r600g) driver. This was originally mentioned in bug 39572; however, that bug was about a different problem and is already closed. There are two experimental patches for r300g and r600g in that bug, so I'm opening a new bug to keep track of them. Setting the component to Mesa core because the problem seems to be in u_vbuf_manager.

Comment 1

(In reply to comment #0)
> The game Cogs is very slow with the r300g (and r600g) driver. [...]

I think it's not a Mesa problem; it's a problem with the game, which uses OpenGL in a way that the GL spec clearly describes as a way to get reduced performance. This patch is a workaround for the game's problem, and it was written just to check that I had located the problem correctly. All other (properly optimized) apps won't benefit from this patch; they'll probably become a bit slower due to the additional overhead of this workaround. That's why I'm not sure I want to submit such patches to Mesa.

The best solution is to fix the game itself, so I think it makes sense to ask the game developers about it (or to open the sources).

Comment 2

(In reply to comment #1)
> The best solution is to fix the game itself, so I think it makes sense to ask
> the game developers about it (or to open the sources).

OK, I've posted about this on the Cogs forums: http://forum.lazy8studios.com/index.php?topic=532.0. We shall see what the Cogs developers have to say about this.

Comment 3

According to a Cogs developer: "The Cogs vertex data is already very heavily optimized into triangle strips using nVidia's algorithms to greatly reduce the amount of data that needs to be sent to the GPU. If modifying a driver can result in a 10x performance improvement, the problem is the original driver."

I don't know what to think about this.

Comment 4

Same here on Intel GMA 4500 and Mesa 7.12-devel.

Comment 5

(In reply to comment #3)
> According to a Cogs developer: "[...] If modifying a driver can result in a
> 10x performance improvement, the problem is the original driver."
>
> I don't know what to think about this.

When I was fixing the original bug 39572, I first downloaded and tried the demo from their site (with wine). Running the build of this game for another OS through an additional layer (wine) results in excellent performance with the *original* driver. If we can get a 10x performance improvement without modifying the driver, I think the problem is not in the driver.

In the hope that the game developers are reading this: IIRR, when running through wine, the vertex data is in GL buffer objects instead of client arrays, so it doesn't need to be uploaded with every draw call (and the driver doesn't need to filter this data the way these patches do). I'm not sure whether that is wine's optimization or the game itself does it this way through D3D. Why not do the same with OpenGL in the native Linux build? That would make the game fast for *all* drivers, without the need for any patches or workarounds. IIRR the draw call which is so slow in the native Linux build comes from Mesh::draw().
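[Editorial illustration of the difference comment 5 points at: client-side arrays that the driver must re-send on every draw call versus vertex data stored once in a GL buffer object. This is a minimal sketch in plain C against OpenGL 1.5 (assuming the buffer-object entry points are available); it is not code from Cogs, wine, or the Mesa patches, and the function names and the single position-only attribute are made up for the example.]

#include <GL/gl.h>

/* Path A: client-side arrays.  The pointer refers to application memory,
 * so the driver has to copy/upload the data again on every draw call. */
static void draw_with_client_arrays(const float *verts, int num_verts)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, verts);      /* data stays in user memory */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, num_verts);
    glDisableClientState(GL_VERTEX_ARRAY);
}

/* Path B: a buffer object.  The data is uploaded once; later draws only
 * reference the GPU-resident copy. */
static GLuint upload_once(const float *verts, int num_verts)
{
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, num_verts * 3 * sizeof(float),
                 verts, GL_STATIC_DRAW);
    return vbo;
}

static void draw_with_vbo(GLuint vbo, int num_verts)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const void *)0); /* offset into the VBO */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, num_verts);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}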
Comment 6

We usually just try to optimize drivers even for totally inefficient OpenGL usage and get the most performance out of it no matter what.

Some comments regarding the patches:

1) Can u_vbuf_mgr_draw_optimize be called inside u_vbuf_mgr_draw_begin instead of explicitly by a driver?

2) Can you write the new vertices directly into a buffer object instead of user memory, which must be uploaded to a buffer object at a later point anyway?

3) Could you please comment the code a bit more, e.g. how it interacts with the other parts of u_vbuf_mgr?

4) We could use the translate module to convert indexed vertices in several separate buffers into one non-indexed interleaved buffer. I think nv50 and nvc0 always do that for user buffers.

I think it's an optimization we definitely want.
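[Editorial illustration of the conversion point 4 of comment 6 describes. This is not the gallium util/translate API, just a stand-alone C sketch of the transformation: indexed vertices that live in several separate per-attribute arrays are expanded, following the index buffer, into one interleaved, non-indexed vertex buffer that can be drawn with a plain non-indexed call. All names are invented for the example.]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct attrib_src {
    const uint8_t *data;   /* base pointer of the source array           */
    unsigned stride;       /* byte distance between consecutive vertices */
    unsigned size;         /* byte size of one element of this attribute */
};

/* Returns a malloc'ed buffer of num_indices interleaved vertices;
 * the caller owns and frees it. */
static void *
deindex_and_interleave(const struct attrib_src *attribs, unsigned num_attribs,
                       const unsigned *indices, unsigned num_indices)
{
    unsigned out_stride = 0, i, a;
    uint8_t *out, *dst;

    for (a = 0; a < num_attribs; a++)
        out_stride += attribs[a].size;

    out = malloc((size_t)out_stride * num_indices);
    if (!out)
        return NULL;

    dst = out;
    for (i = 0; i < num_indices; i++) {
        unsigned idx = indices[i];
        for (a = 0; a < num_attribs; a++) {
            memcpy(dst, attribs[a].data + (size_t)idx * attribs[a].stride,
                   attribs[a].size);
            dst += attribs[a].size;
        }
    }
    return out;  /* draw this non-indexed, num_indices vertices */
}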
Comment 7

(In reply to comment #6)
> We usually just try to optimize drivers even for totally inefficient OpenGL
> usage and get the most performance out of it no matter what.

OK, I understand that Mesa would benefit from such an optimization, but taking into consideration the time I'd need to spend on analyzing different ways of doing it, implementing and testing it, and the potential benefits for applications, it wasn't a top priority for me.

These patches were intended as a quick test, handling just this single case. But if we want a more universal solution, then I'm not sure that vbuf_mgr is the best place for this optimization. Maybe it makes sense to do it somewhere else in Mesa, not at the driver level?

> 1) Can u_vbuf_mgr_draw_optimize be called inside u_vbuf_mgr_draw_begin instead
> of explicitly by a driver?

IIRR I called it explicitly from r600g because I wasn't sure initially whether it's needed for (or will work with) r300g. It seems this call can be moved into u_vbuf_mgr_draw_begin.

> 2) Can you write the new vertices directly into a buffer object instead of user
> memory, which must be uploaded to a buffer object at a later point anyway?

I was thinking about it, but it seemed more complicated, and I thought that once we avoid the copy for 4000 vertices, avoiding another copy for 4 vertices doesn't change much for a quick test. But it seems possible to avoid this copy and memory allocation too.

> 3) Could you please comment the code a bit more, e.g. how it interacts with the
> other parts of u_vbuf_mgr?

It saves the vertex buffer before replacing it with the newly created buffer, and sets a boolean flag indicating that the vertex buffer needs to be restored and the allocated memory freed in vbuf_mgr_draw_end; IIRR that's all the interaction. (I don't remember now whether it's really needed to save it, or whether even more data needs to be saved/restored, e.g. the index buffer too. I need to check the code to give an exact answer.)

> 4) We could use the translate module to convert indexed vertices in several
> separate buffers into one non-indexed interleaved buffer. I think nv50 and
> nvc0 always do that for user buffers.

I was looking into it, but it seemed to be overkill for the simple test. I just wanted to copy the vertex data from the single interleaved buffer as-is, without any format conversions and/or other modifications. But maybe it makes sense to use it if we want a universal solution.

> I think it's an optimization we definitely want.

Should I try to improve these patches for vbuf_mgr according to your comments? Or maybe it's better to reimplement this in some other way and in some other place?

Comment 8

(In reply to comment #7)
> These patches were intended as a quick test, handling just this single case.
> But if we want a more universal solution, then I'm not sure that vbuf_mgr is
> the best place for this optimization. Maybe it makes sense to do it somewhere
> else in Mesa, not at the driver level?

I have been thinking about moving u_vbuf_mgr from the drivers to st/mesa or somewhere between the drivers and the state trackers, along with user buffer uploads. u_vbuf_mgr can be used by state trackers too; it's not limited to "just drivers".

However, we should consider that certain drivers can fetch vertices from user memory just fine. Those are softpipe, llvmpipe, software vertex processing on r300 swtcl chipsets (e.g. ATI Xpress and AMD 690 chipsets), i915, and pretty much anything that makes use of draw_context. Doing vertex uploads on those already-slow drivers would make them even slower, obviously.

> IIRR I called it explicitly from r600g because I wasn't sure initially whether
> it's needed for (or will work with) r300g.

Vertex uploads and fallbacks work exactly the same in both r300g and r600g.

> It saves the vertex buffer before replacing it with the newly created buffer,
> and sets a boolean flag indicating that the vertex buffer needs to be restored
> and the allocated memory freed in vbuf_mgr_draw_end [...]

The index buffer isn't changed, so it's OK.

> Should I try to improve these patches for vbuf_mgr according to your comments?
> Or maybe it's better to reimplement this in some other way and in some other
> place?

Well, it's a completely different approach to uploading vertices. The translation of vertices into one non-indexed interleaved vertex buffer (as I was suggesting) could replace the rest of u_vbuf_mgr completely, i.e. uploads and format fallbacks. So there would have to be some code determining whether the rest of u_vbuf_mgr should be used (fine-grained uploads based on the min/max_index data) or the pure translate way (complete translation of all vertices into a non-indexed draw operation). I am for using the translate module and doing a complete translation in the cases where it's worth it, and using the rest of u_vbuf_mgr otherwise.
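[Editorial sketch of the decision comment 8 calls for: choosing per draw call between the existing fine-grained upload path (upload only the [min_index, max_index] range of each user vertex buffer) and a complete translation into one non-indexed interleaved buffer. The structure, field names, and the threshold are invented for illustration; this is not the actual u_vbuf_mgr code.]

#include <stdbool.h>

struct draw_info_sketch {
    bool     indexed;          /* is there an index buffer?              */
    unsigned count;            /* number of indices (or vertices) drawn  */
    unsigned min_index;        /* smallest index referenced by this draw */
    unsigned max_index;        /* largest index referenced by this draw  */
    bool     needs_format_fallback; /* vertex format unsupported by HW   */
};

enum upload_path {
    PATH_FINE_GRAINED_UPLOAD,  /* upload the [min_index, max_index] range */
    PATH_FULL_TRANSLATE        /* de-index + interleave into one buffer
                                  and emit a non-indexed draw             */
};

static enum upload_path
choose_upload_path(const struct draw_info_sketch *info)
{
    unsigned range = info->max_index - info->min_index + 1;

    /* A format fallback already requires rewriting the vertices, so a
     * complete translation costs little extra. */
    if (info->needs_format_fallback)
        return PATH_FULL_TRANSLATE;

    /* If the draw references only a small fraction of the index range
     * (the Cogs case: a handful of vertices out of thousands), de-indexing
     * just the referenced vertices beats uploading the whole range.
     * The factor of 4 is an arbitrary illustrative threshold. */
    if (info->indexed && info->count * 4 < range)
        return PATH_FULL_TRANSLATE;

    return PATH_FINE_GRAINED_UPLOAD;
}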
Comment 9

(In reply to comment #4)
> Same here on Intel GMA 4500 and Mesa 7.12-devel.

BTW I'm not a developer, but I think only gallium drivers use u_vbuf_mgr, so even if those patches made it in, they probably wouldn't solve this problem for you. Maybe you should open a new bug against the i915 or i965 driver? Or is a universal solution possible?

Comment 10

(In reply to comment #9)
> Maybe you should open a new bug against the i915 or i965 driver?

But maybe the problem is in the same component on the Intel driver side. There are no graphical glitches, just a few frames per second. The problem with the Linux Mesa Intel drivers is that they support more features (for example, more OpenGL extensions) than the Windows Intel drivers, but the Windows drivers are solid when it comes to bugs and performance.

Comment 11

(In reply to comment #10)
> But maybe the problem is in the same component on the Intel driver side.

i965 is not using gallium, so no.

Comment 12

(In reply to comment #11)
> i965 is not using gallium, so no.

Sorry, I meant the u_vbuf_manager-equivalent component in the i965 driver, not the one in gallium.

Comment 13

Fixed by ce44bae366ade59fb2dbdfbfe5a1ab8d24518a57 for radeons. You'll probably need separate bug reports for the other drivers if they have this issue. Closing.