Bug 93374 - [radeonsi] Tonga (Radeon R9 380) hangs on running hello world OpenCL program
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2015-12-14 18:36 UTC by Vedran Miletić
Modified: 2016-01-06 00:43 UTC

Description Vedran Miletić 2015-12-14 18:36:32 UTC
I'm running the hello world PyOpenCL program from here: http://documen.tician.de/pyopencl/. I get the same issue with a different OpenCL program, e.g. the one from bug 93370.

The message shown in the console is the same as in bug 93264, but without the VM line:

amdgpu 0000:01:00.0: GPU fault detected: 147 0x04588402
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROECTION_FAULT_ADDR   0x0004008B
amdgpu 0000:01:00.0:   VM_CONTEXT1_PROECTION_FAULT_STATUS 0x08084002

The system becomes unresponsive and has to be hard reset.

I tried reverting to the commit just before c0a189c3792865257c1383f176e5401373ed2270 (the one mentioned in bug 93264), that is, 26ddca196954ccfa697102b46118956ad616073a. However, this did not fix the problem.
Comment 1 Michel Dänzer 2015-12-15 07:21:57 UTC
Basically, OpenCL support for VI isn't implemented yet.
Comment 2 Vedran Miletić 2015-12-15 07:46:56 UTC
What is missing? Can it at least not crash?
Comment 3 EoD 2016-01-03 00:10:30 UTC
Is it possible to work around the lock?

I ran into the same issue when I tried running clpeak (https://github.com/krrishnarraj/clpeak) on kernel 4.4.0-rc7 with current mesa-git and an R9 380X.

kernel: amdgpu 0000:01:00.0: IH ring buffer overflow (0x000C01D0, 0x00000B40, 0x000001E0)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x05f88802
kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0004009C
kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088002
kernel: VM fault (0x02, vmid 5) at page 262300, read from 'TC9' (0x54433900) (136)
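
For anyone comparing fault logs between reports: the last kernel line in the log above appears to be a decoded view of the two register dumps before it. Page 262300 is simply 0x0004009C in decimal, and 0x54433900 is the ASCII tag 'TC9' followed by a NUL byte. Below is a minimal Python sketch of that decoding. The function name `decode_vm_fault` is hypothetical, and the bit positions used for the status fields are an assumption, chosen only because they reproduce the values printed on that line (0x02, vmid 5, read, 136). They are not taken from register documentation.

```python
def decode_vm_fault(addr: int, status: int, client: int) -> dict:
    """Decode the values from an amdgpu VM fault report (sketch, see assumptions)."""
    # The 32-bit client value holds up to four ASCII characters, NUL padded.
    tag = client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    return {
        # The fault address register is printed as a page number in decimal.
        "page": addr,
        # ASSUMPTION: field layout inferred from the one decoded log line.
        "protections": status & 0xFF,
        "memory_client_id": (status >> 12) & 0x1FF,
        "access": "write" if (status >> 24) & 1 else "read",
        "vmid": (status >> 25) & 0xF,
        "client_tag": tag,
    }


if __name__ == "__main__":
    # Values copied from the log in this comment.
    print(decode_vm_fault(0x0004009C, 0x0A088002, 0x54433900))
```

The original report (address 0x0004008B, status 0x08084002) has no decoded line to check against, so any result for those values depends entirely on the layout assumption.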
Comment 4 EoD 2016-01-04 10:41:29 UTC
After upgrading to current llvm 3.8-git (2921ff9ffcfd09db1c), the program ran fine:

$ ./clpeak 

Platform: Clover
  Device: AMD TONGA (DRM 3.1.0, LLVM 3.8.0)
    Driver version  : 11.2.0-devel (Linux x64)
    Compute units   : 32
    Clock frequency : 0 MHz

    Global memory bandwidth (GBPS)
      float   : 100.53
      float2  : 112.67
      float4  : 105.17
      float8  : 104.72
      float16 : 69.72

    Single-precision compute (GFLOPS)
      float   : 4006.34
      float2  : 4039.47
      float4  : 4030.48
      float8  : 4007.96
      float16 : 3973.85

    Double-precision compute (GFLOPS)
      double   : 265.21
      double2  : 265.09
      double4  : 264.79
      double8  : 264.05
      double16 : 263.11

    Integer compute (GIOPS)
      int   : 841.70
      int2  : 841.59
      int4  : 841.39
      int8  : 841.17
      int16 : 840.63

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 2.57
      enqueueReadBuffer          : 1.92
      enqueueMapBuffer(for read) : 1813.75
        memcpy from mapped ptr   : 1.92
      enqueueUnmap(after write)  : 1545.40
        memcpy to mapped ptr     : 1.92

    Kernel launch latency : 267.44 us
Comment 5 Michel Dänzer 2016-01-06 00:43:59 UTC
Fixed in LLVM 3.8 SVN.

