Bug 93374

Summary: [radeonsi] Tonga (Radeon R9 380) hangs on running hello world OpenCL program
Product: Mesa
Reporter: Vedran Miletić <vedran>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: blocker
Priority: medium
CC: EoD
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)

Description Vedran Miletić 2015-12-14 18:36:32 UTC
I'm running the hello world PyOpenCL program from http://documen.tician.de/pyopencl/ (the same issue occurs with a different OpenCL program, e.g. the one from bug 93370).

The message shown in the console is the same as in bug 93264, but without the VM line:

amdgpu 0000:01:00.0: GPU fault detected: 147 0x04588402
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROECTION_FAULT_ADDR   0x0004008B
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROECTION_FAULT_STATUS 0x08084002

The system becomes unresponsive and requires a hard reset.

I tried reverting to the commit just before c0a189c3792865257c1383f176e5401373ed2270 (mentioned in bug 93264), that is, 26ddca196954ccfa697102b46118956ad616073a. However, this did not fix the problem.
Comment 1 Michel Dänzer 2015-12-15 07:21:57 UTC
Basically, OpenCL support for VI isn't implemented yet.
Comment 2 Vedran Miletić 2015-12-15 07:46:56 UTC
What is missing? Can it at least not crash?
Comment 3 EoD 2016-01-03 00:10:30 UTC
Is it possible to work around the lock?

I ran into the same issue when running clpeak (https://github.com/krrishnarraj/clpeak) on kernel 4.4.0-rc7 with current mesa-git and an R9 380X.

kernel: amdgpu 0000:01:00.0: IH ring buffer overflow (0x000C01D0, 0x00000B40, 0x000001E0)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x05f88802
kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0004009C
kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088002
kernel: VM fault (0x02, vmid 5) at page 262300, read from 'TC9' (0x54433900) (136)
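
For anyone comparing fault logs across reports, here is a minimal Python sketch of how the raw register values above appear to map onto the decoded "VM fault" line. The bit layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS used here (protections in bits 0-7, memory client ID in bits 12-20, read/write flag in bit 24, VMID in bits 25-28) is an assumption, not something stated in this report; it is only cross-checked against the decoded line in the log above. The function names are hypothetical.

```python
import re

def decode_fault_status(status: int) -> dict:
    """Split a fault status word into fields (bit layout is an assumption)."""
    return {
        "protections": status & 0xFF,                # assumed bits 0-7
        "memory_client_id": (status >> 12) & 0x1FF,  # assumed bits 12-20
        "is_write": bool((status >> 24) & 0x1),      # assumed bit 24
        "vmid": (status >> 25) & 0xF,                # assumed bits 25-28
    }

def decode_client_tag(tag: int) -> str:
    """Interpret the 32-bit client tag as big-endian ASCII, dropping NUL padding."""
    return tag.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")

def parse_fault(log: str) -> tuple:
    """Pull the fault address and decoded status out of a kernel log excerpt."""
    addr = int(re.search(r"FAULT_ADDR\s+0x([0-9A-Fa-f]+)", log).group(1), 16)
    status = int(re.search(r"FAULT_STATUS\s+0x([0-9A-Fa-f]+)", log).group(1), 16)
    return addr, decode_fault_status(status)

# Log lines copied from the report above.
LOG = """\
kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0004009C
kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088002
"""

addr, fields = parse_fault(LOG)
print(addr)                           # 262300, the page number in the VM fault line
print(fields)                         # protections 2, client ID 136, read, vmid 5
print(decode_client_tag(0x54433900))  # TC9
```

Under these assumptions the decoded values agree with the kernel's own summary line: page 262300, protection 0x02, vmid 5, a read access, and client 'TC9' with ID 136.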
Comment 4 EoD 2016-01-04 10:41:29 UTC
After upgrading to current llvm 3.8-git (2921ff9ffcfd09db1c), the program ran fine:

$ ./clpeak 

Platform: Clover
  Device: AMD TONGA (DRM 3.1.0, LLVM 3.8.0)
    Driver version  : 11.2.0-devel (Linux x64)
    Compute units   : 32
    Clock frequency : 0 MHz

    Global memory bandwidth (GBPS)
      float   : 100.53
      float2  : 112.67
      float4  : 105.17
      float8  : 104.72
      float16 : 69.72

    Single-precision compute (GFLOPS)
      float   : 4006.34
      float2  : 4039.47
      float4  : 4030.48
      float8  : 4007.96
      float16 : 3973.85

    Double-precision compute (GFLOPS)
      double   : 265.21
      double2  : 265.09
      double4  : 264.79
      double8  : 264.05
      double16 : 263.11

    Integer compute (GIOPS)
      int   : 841.70
      int2  : 841.59
      int4  : 841.39
      int8  : 841.17
      int16 : 840.63

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 2.57
      enqueueReadBuffer          : 1.92
      enqueueMapBuffer(for read) : 1813.75
        memcpy from mapped ptr   : 1.92
      enqueueUnmap(after write)  : 1545.40
        memcpy to mapped ptr     : 1.92

    Kernel launch latency : 267.44 us
Comment 5 Michel Dänzer 2016-01-06 00:43:59 UTC
Fixed in LLVM 3.8 SVN.
