Bug 55825

Summary: [Bisected i965]Oglc max_values(advanced.fragmentProgram.GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB) causes OOM-killer
Product: Mesa Reporter: lu hua <huax.lu>
Component: Drivers/DRI/i965 Assignee: Eric Anholt <eric>
Status: VERIFIED FIXED QA Contact:
Severity: major    
Priority: medium CC: idr, pavel.ondracka, xunx.fang
Version: git   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: dmesg

Description lu hua 2012-10-10 07:43:36 UTC
Created attachment 68387: dmesg

System Environment:
--------------------------
Arch:             x86_64
Platform:         Ironlake
Libdrm:(master)libdrm-2.4.39-16-g14db948127e549ea9234e02d8e112de3871f8f9f
Xserver:(master)xorg-server-1.13.0-49-g0a75bd640b3dc26b89d9e342999a7f4b7e98edbf
Xf86_video_intel:(master)2.20.9-43-gfb5205a86da09b344dbc20598655e917c263125c
Libva:(staging)86484495155e65fd8ac33ed3ede43fb42defd966
Libva_intel_driver:(staging)f557dd6ad06c31bcf787468e804c948ecc4cf39b
Kernel:	(drm-intel-nightly) 7735f5cccb4a57c5da5554f3e005bbd5f7325f40
Mesa: (master)87a34131c427b40a561cfef1513b446a0eeabc39

Bug detailed description:
-------------------------
It happens on Ironlake, Sandy Bridge and Ivy Bridge with the Mesa master branch. It doesn't happen on the 9.0 branch.
Case 'max_values(advanced.fragmentProgram.GL_MAX_PROGRAM_INSTRUCTIONS_ARB)' also has this issue on Ironlake.

Bisect shows: 97615b2d8c7c3cea6fd3a43bcb1739a96e2046c4 is the first bad commit.
commit 97615b2d8c7c3cea6fd3a43bcb1739a96e2046c4
Author:     Eric Anholt <eric@anholt.net>
AuthorDate: Mon Aug 27 14:35:01 2012 -0700
Commit:     Eric Anholt <eric@anholt.net>
CommitDate: Mon Oct 8 08:50:27 2012 -0700

    i965: Replace brw_wm_* with dumping code into the fs_visitor.

    This makes a giant pile of code newly dead.  It also fixes TXB on newer
    chipsets, which has been totally broken (I now have a piglit test for that).
    It passes the same set of Ian's ARB_fragment_program tests.  It also improves
    high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better
    optimization and having 8-wide along with 16-wide shaders.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>


Visual Report:
ID      |ACCELERA|DB      |REND_T  |SURF_T  |C_BUF_T |BUF_S   |RED_S   |
     131|       1|       1|      gl|  wipbpx|    rgba|      32|       8|

GREEN_S |BLUE_S  |ALPHA_S |DEPTH_S |STENC_S |ACCUM_S |SPL_BUF |SAMPLES |
       8|       8|       8|      24|       8|      64|       0|       0|

SRGB    |TEX_RGB |TEX_RGBA|CAVEAT  |SWAP    |M_PBUF_W|M_PBUF_H|M_PBUF_P
      -1|       0|       0|    slow|   undef|       0|       0|       0

OpenGL Report.
    Vendor - 'Intel Open Source Technology Center'
    Renderer - 'Mesa DRI Intel(R) Ironlake Desktop '
    Version - '2.1 Mesa 9.1-devel (git-fd32199)'
    GLSL Version - '1.20'

>> MaxValues (max_values)  test:
--> 2.1.1 - advanced.fragmentProgram.GL_MAX_PROGRAM_INSTRUCTIONS_ARB subcase:
Killed

Reproduce steps:
----------------------------
1. xinit
2. ./oglconform -z -suite all -v 2 -D 131 -test max_values advanced.fragmentProgram.GL_MAX_PROGRAM_INSTRUCTIONS_ARB
Comment 1 Eric Anholt 2012-10-16 00:33:29 UTC
This will probably be WONTFIX, but I'll leave it open for a while in case I find something easy to trim.
Comment 2 Ian Romanick 2012-10-17 01:00:01 UTC
Was this fixed by 014aaa9?

commit 014aaa97d3d7f78629e6e030953be0e9fb7f33dd
Author: Eric Anholt <eric@anholt.net>
Date:   Fri Sep 21 16:04:52 2012 +0200

    i965: Reduce maximum GL_ARB_fragment_program instruction count to 1024.
    
    I don't know of any programs that would need more than this.  The larger
    programs I've seen have neared 100 instructions.  This prevent excessive
    runtimes of automatic tests that attempt to test up to the exposed maximums
    (like fp-long-alu).
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 3 lu hua 2012-10-18 05:35:16 UTC
It still happens on commit: 017c6fb324194ba1c2e15fbee2f85a2fd8f140c4.
Comment 4 Eric Anholt 2012-11-12 22:25:01 UTC
massif output:

 35 126,618,636,882    5,441,331,072    5,438,065,485     3,265,587            0
99.94% (5,438,065,485B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->99.52% (5,415,162,158B) 0x8AD7BE6: ralloc_size (ralloc.c:109)
| ->79.01% (4,299,424,756B) 0x8A5A03C: ra_alloc_interference_graph (register_allocate.c:329)
| | ->79.01% (4,299,424,756B) 0x865EC78: fs_visitor::assign_regs() (brw_fs_reg_allocate.cpp:410)
| | | ->79.01% (4,299,424,756B) 0x8647796: fs_visitor::run() (brw_fs.cpp:1951)
Comment 5 Pavel Ondračka 2013-02-16 16:25:48 UTC
Commit 97615b2d8c7c3cea6fd3a43bcb1739a96e2046c4 also causes the following piglit failures on my GM45:
./bin/shader_runner tests/spec/arb_fragment_program_shadow/tex-shadow2drect.shader_test -auto
Output:
Probe at (0,25)
  Expected: 1.000000 1.000000 1.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 1.000000
Probe at (225,249)
  Expected: 1.000000 1.000000 1.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 1.000000

./bin/shader_runner tests/spec/arb_fragment_program_shadow/txp-shadow2drect.shader_test -auto
Output:
Probe at (0,25)
  Expected: 1.000000 1.000000 1.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 1.000000
Probe at (225,249)
  Expected: 1.000000 1.000000 1.000000 1.000000
  Observed: 0.000000 0.000000 0.000000 1.000000

Should this have its own bug?
Comment 6 Gordon Jin 2013-02-18 05:07:49 UTC
(In reply to comment #1)
> This will probably be WONTFIX, but I'll leave it open for a while in case I
> find something easy to trim.

Can you justify WONTFIX? Shall we keep it as high priority bug?
Comment 7 fangxun 2013-02-21 08:04:23 UTC
It still happens on latest mesa master and 9.1 branch.
Comment 8 Eric Anholt 2013-02-22 19:57:00 UTC
Patch series is out on the mailing list.  Memory usage of this testcase goes from 4GB to 255MB.
Comment 9 Eric Anholt 2013-03-11 19:32:03 UTC
commit 11b8df0c0141c5759025985ba99e782a2dfd720c
Author: Eric Anholt <eric@anholt.net>
Date:   Tue Feb 19 17:01:41 2013 -0800

    mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).
    
    After the previous fix that almost removes an allocation of 4*n^2
    bytes, we can use a bitset to reduce another allocation from n^2 bytes
    to n^2/8 bytes.
    
    Between the previous commit and this one, the peak heap size for an
    oglconform ARB_fragment_program max instructions test on i965 goes from
    4GB to 255MB.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Before you ask, I'm not planning on pulling this to stable.
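
For readers following the commit message above, here is a minimal sketch of the bitset idea it describes: storing one bit per (node, node) pair instead of one byte, which takes an n^2-byte adjacency matrix down to roughly n^2/8 bytes. This is not Mesa's register allocator code. The class name BitsetAdjacency, the function footprint, and the dictionary keys are hypothetical names chosen for illustration, and the "4 bytes per entry" row simply mirrors the 4*n^2 figure quoted in the commit message.

```python
class BitsetAdjacency:
    """Symmetric n x n interference matrix stored as one bit per entry.

    Illustrative only: names and layout are assumptions, not Mesa's code.
    """

    def __init__(self, n):
        self.n = n
        # Each row is padded up to a whole number of bytes.
        self.row_bytes = (n + 7) // 8
        self.bits = bytearray(n * self.row_bytes)

    def add(self, a, b):
        # Interference is symmetric, so set both (a, b) and (b, a).
        for i, j in ((a, b), (b, a)):
            self.bits[i * self.row_bytes + j // 8] |= 1 << (j % 8)

    def test(self, a, b):
        # True if nodes a and b have been marked as interfering.
        byte = self.bits[a * self.row_bytes + b // 8]
        return bool(byte & (1 << (b % 8)))


def footprint(n):
    """Bytes needed for n nodes under the three layouts in the commit text."""
    return {
        "four_bytes_per_entry": 4 * n * n,       # the 4*n^2 allocation
        "one_byte_per_entry": n * n,             # the n^2 allocation
        "one_bit_per_entry": n * ((n + 7) // 8), # about n^2/8 with a bitset
    }


if __name__ == "__main__":
    g = BitsetAdjacency(10)
    g.add(2, 7)
    print(g.test(7, 2), g.test(2, 3))
    print(footprint(1024))
```

The quadratic terms are what matter here: with tens of thousands of graph nodes, the per-entry size decides whether the allocation is measured in gigabytes or in megabytes.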
Comment 10 lu hua 2013-03-18 02:33:17 UTC
It is fixed on the Mesa master branch, but still exists on the 9.1 branch.
Comment 11 lu hua 2013-04-07 07:55:28 UTC
Verified. Fixed.
