Bug 81896 - [clover/sumo] GPU reset when running some "John the Ripper" (+ jumbo patch, from Git) OpenCL tests
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 99553
Reported: 2014-07-30 04:36 UTC by Jose P.
Modified: 2019-09-18 19:16 UTC (History)

Attachments
mixed outputs (172.06 KB, application/gzip)
2014-07-30 04:36 UTC, Jose P.
output from R600_DEBUG=cs only (168.71 KB, application/gzip)
2014-07-30 04:38 UTC, Jose P.
all tests, mixed outputs (14.18 KB, application/gzip)
2014-07-30 04:48 UTC, Jose P.

Description Jose P. 2014-07-30 04:36:38 UTC
Created attachment 103669 [details]
mixed outputs

I get a GPU reset when running some OpenCL tests from this program from a TTY. If a desktop environment is running when the lockup happens, the system stops responding.
Attached are:
- the mixed output from /var/log/syslog and this command:
$ R600_DEBUG=cs LWS=128 GWS=9216 ./john --test --format=PBKDF2-HMAC-SHA256-opencl

- a cleaned-up version of the same output, keeping only what "R600_DEBUG=cs" printed

Also note the random garbage after "Build log:". That text is what clGetProgramBuildInfo returns ( https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/src/common-opencl.c#L754 ; "Build log:" is printed at line 768).

My system:
Hardware: HP dv6, CPU: AMD A6-3400M, GPUs: Radeon HD 6520G (Evergreen) + 6750M (Northern Islands)
OS: Kubuntu 14.04
Kernel: 3.15.7 and 3.13, both amd64
Mesa 10.3~git1407290730.9a53f9 (from oibaf's PPA, https://launchpad.net/~oibaf/+archive/graphics-drivers ), compiled with LLVM 3.5-rc1.

________________
To reproduce, here's a mini "how to run" for "John The Ripper" + "jumbo" patches:
$ git clone https://github.com/magnumripper/JohnTheRipper
$ cd JohnTheRipper/src
$ ./configure && make
$ cd ../run
$ ./john --help # prints usage help
$ # The following command prints the hash formats that have OpenCL tests, separated by spaces. Remove the awk part to get one per line:
$ ./john --help | grep -Eo '[a-zA-Z0-9-]*-opencl' | awk '{printf "%s ", $0}'; echo

To run a test:
$ ./john --test --format=$format
where $format is one of the formats listed by the previous command.

The program verifies the output itself. When a test fails, it should print something like "FAILED (get_hash[0](0))".
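
The listing and test commands above can be chained into one loop. This is only a minimal sketch: it assumes it is run from the "run" directory, and that a failing self-test either prints "FAILED" or exits with a non-zero status (the exit status behaviour is an assumption, not something checked here). The helper name list_opencl_formats and the "logs" directory are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pull the OpenCL format names out of the
# `./john --help` text read on stdin (same grep pattern as above).
list_opencl_formats() {
    grep -Eo '[a-zA-Z0-9-]*-opencl' | sort -u
}

# Run each OpenCL self-test, keeping one log per format.
# Skipped entirely unless ./john exists in the current directory.
if [ -x ./john ]; then
    mkdir -p logs
    for format in $(./john --help | list_opencl_formats); do
        echo "== $format"
        ./john --test --format="$format" > "logs/$format.log" 2>&1
        status=$?
        # Assumption: a failed self-test prints "FAILED" or exits non-zero.
        if [ "$status" -ne 0 ] || grep -q 'FAILED' "logs/$format.log"; then
            echo "$format: check logs/$format.log (exit status $status)"
        fi
    done
fi
```

Since some of these tests lock up the GPU, it is probably safer to run the loop from a TTY rather than from inside a desktop session.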
Comment 1 Jose P. 2014-07-30 04:38:42 UTC
Created attachment 103670 [details]
output from R600_DEBUG=cs only
Comment 2 Jose P. 2014-07-30 04:48:24 UTC
Created attachment 103671 [details]
all tests, mixed outputs

This file contains the mixed output of all the OpenCL tests (plus /var/log/syslog and messages from the driver and/or LLVM). It may be useful as a record of which tests are failing:

$ zgrep 'Benchmarking\|GPU lockup' john-tests.txt.gz 
Benchmarking: ODF-AES-opencl [SHA256 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: ODF-opencl [SHA1 Blowfish OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: PBKDF2-HMAC-SHA256-opencl, OpenCL [PBKDF2-HMAC-SHA256]... Device 0: AMD SUMO
Jul 29 21:37:47 hostname kernel: [ 2826.076735] radeon 0000:00:01.0: GPU lockup CP stall for more than 10096msec
Jul 29 21:37:47 hostname kernel: [ 2826.080595] radeon 0000:00:01.0: GPU lockup (waiting for 0x00000000000794df last fence id 0x00000000000794de on ring 0)
Jul 29 21:38:00 hostname kernel: [ 2839.581180] radeon 0000:00:01.0: GPU lockup CP stall for more than 10472msec
Jul 29 21:38:00 hostname kernel: [ 2839.581995] radeon 0000:00:01.0: GPU lockup (waiting for 0x00000000000794fd last fence id 0x00000000000794fc on ring 0)
Benchmarking: RAKP-opencl [IPMI 2.0 RAKP (RMCP+) OpenCL]... Device 0: AMD SUMO
Benchmarking: Raw-MD4-opencl [MD4 OpenCL (inefficient, development use only)]... Device 0: AMD SUMO
Benchmarking: Raw-MD5-opencl [MD5 OpenCL (inefficient, development use only)]... Device 0: AMD SUMO
Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL (inefficient, development use only)]... Device 0: AMD SUMO
Benchmarking: Raw-SHA256-opencl [SHA256 OpenCL (inefficient, development use mostly)]... Device 0: AMD SUMO
Benchmarking: Raw-SHA512-opencl [SHA512 OpenCL (inefficient, development use mostly)]... Device 0: AMD SUMO
Benchmarking: XSHA512-opencl, Mac OS X 10.7 salted [SHA512 OpenCL (inefficient, development use mostly)]... Device 0: AMD SUMO
Benchmarking: agilekeychain-opencl, 1Password Agile Keychain [PBKDF2-SHA1 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: bcrypt-opencl ("$2a$05", 32 iterations) [Blowfish OpenCL]... Device 0: AMD SUMO
Jul 29 21:38:25 hostname kernel: [ 2864.185290] radeon 0000:00:01.0: GPU lockup CP stall for more than 10464msec
Jul 29 21:38:25 hostname kernel: [ 2864.190068] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079525 last fence id 0x0000000000079524 on ring 0)
Jul 29 21:38:36 hostname kernel: [ 2874.688711] radeon 0000:00:01.0: GPU lockup CP stall for more than 10172msec
Jul 29 21:38:36 hostname kernel: [ 2874.688719] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000007952c last fence id 0x000000000007952b on ring 0)
Jul 29 21:38:46 hostname kernel: [ 2885.192241] radeon 0000:00:01.0: GPU lockup CP stall for more than 10176msec
Jul 29 21:38:46 hostname kernel: [ 2885.193841] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079533 last fence id 0x0000000000079532 on ring 0)
Jul 29 21:38:57 hostname kernel: [ 2895.695613] radeon 0000:00:01.0: GPU lockup CP stall for more than 10160msec
Jul 29 21:38:57 hostname kernel: [ 2895.697871] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000007953a last fence id 0x0000000000079539 on ring 0)
Jul 29 21:39:07 hostname kernel: [ 2906.199178] radeon 0000:00:01.0: GPU lockup CP stall for more than 10168msec
Jul 29 21:39:07 hostname kernel: [ 2906.203821] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079541 last fence id 0x0000000000079540 on ring 0)
Jul 29 21:39:18 hostname kernel: [ 2916.702548] radeon 0000:00:01.0: GPU lockup CP stall for more than 10172msec
Jul 29 21:39:18 hostname kernel: [ 2916.704807] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079548 last fence id 0x0000000000079547 on ring 0)
Jul 29 21:39:28 hostname kernel: [ 2927.206123] radeon 0000:00:01.0: GPU lockup CP stall for more than 10176msec
Jul 29 21:39:28 hostname kernel: [ 2927.210750] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000007954f last fence id 0x000000000007954e on ring 0)
Jul 29 21:39:39 hostname kernel: [ 2937.709487] radeon 0000:00:01.0: GPU lockup CP stall for more than 10156msec
Jul 29 21:39:39 hostname kernel: [ 2937.711741] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079556 last fence id 0x0000000000079555 on ring 0)
Jul 29 21:39:49 hostname kernel: [ 2948.213050] radeon 0000:00:01.0: GPU lockup CP stall for more than 10140msec
Jul 29 21:39:49 hostname kernel: [ 2948.215273] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000007955d last fence id 0x000000000007955c on ring 0)
Jul 29 21:40:00 hostname kernel: [ 2958.716427] radeon 0000:00:01.0: GPU lockup CP stall for more than 10192msec
Jul 29 21:40:00 hostname kernel: [ 2958.720969] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079564 last fence id 0x0000000000079563 on ring 0)
Jul 29 21:40:10 hostname kernel: [ 2969.219996] radeon 0000:00:01.0: GPU lockup CP stall for more than 10156msec
Jul 29 21:40:10 hostname kernel: [ 2969.224451] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000007956b last fence id 0x000000000007956a on ring 0)
Benchmarking: blockchain-opencl, blockchain My Wallet [PBKDF2-SHA1 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: descrypt-opencl, traditional crypt(3) [DES OpenCL]... Device 0: AMD SUMO
Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: encfs-opencl, EncFS [PBKDF2-SHA1 AES/Blowfish OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: gpg-opencl, OpenPGP / GnuPG Secret Key [OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: grub-opencl, grub-opencl [PBKDF2-SHA512]... Device 0: AMD SUMO
Benchmarking: keychain-opencl, Mac OS X Keychain [PBKDF2-SHA1 3DES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: keyring-opencl, GNOME Keyring [SHA256 AES OPENCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4 HMAC-MD5 RC4 OpenCL]... Device 0: AMD SUMO
Jul 29 21:40:40 hostname kernel: [ 2999.009813] radeon 0000:00:01.0: GPU lockup CP stall for more than 10264msec
Jul 29 21:40:40 hostname kernel: [ 2999.012964] radeon 0000:00:01.0: GPU lockup (waiting for 0x00000000000009ff last fence id 0x00000000000009fe on ring 3)
Benchmarking: krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18 [OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: lotus5-opencl, Lotus Notes/Domino 5 [8/64]... (4xOMP) Device 0: AMD SUMO
Benchmarking: md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL]... Device 0: AMD SUMO
Benchmarking: mscash2-opencl, MS Cache Hash 2 (DCC2) [PBKDF2-SHA1 OpenCL]... Device 0: AMD SUMO
Benchmarking: mysql-sha1-opencl, MySQL 4.1+ [SHA1 OpenCL (inefficient, development use only)]... Device 0: AMD SUMO
Benchmarking: nt-opencl, NT [MD4 OpenCL (inefficient, development use only)]... Device 0: AMD SUMO
Benchmarking: ntlmv2-opencl, NTLMv2 C/R [MD4 HMAC-MD5 OpenCL]... Device 0: AMD SUMO
Benchmarking: o5logon-opencl, Oracle O5LOGON protocol [OpenCL-SHA1 AES 32/64]... Device 0: AMD SUMO
Benchmarking: office2007-opencl, MS Office 2007 (50,000 iterations) [SHA1 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: office2010-opencl, MS Office 2010 (100,000 iterations) [SHA1 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: office2013-opencl, MS Office 2013 (100,000 iterations) [SHA512 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: phpass-opencl ($P$9 lengths 0 to 15) [MD5 OpenCL]... Device 0: AMD SUMO
Benchmarking: pwsafe-opencl, Password Safe [SHA256 OpenCL]... Device 0: AMD SUMO
Jul 29 21:41:20 hostname kernel: [ 3038.842771] radeon 0000:00:01.0: GPU lockup CP stall for more than 10472msec
Jul 29 21:41:20 hostname kernel: [ 3038.842912] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000079596 last fence id 0x0000000000079595 on ring 0)
Benchmarking: rar-opencl, RAR3 (length 4) [SHA1 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]... Device 0: AMD SUMO
Benchmarking: sha512crypt-opencl, crypt(3) $6$ (rounds=5000) [SHA512 OpenCL]... Device 0: AMD SUMO
Benchmarking: ssha-opencl, Netscape LDAP {SSHA} [SHA1 OpenCL (inefficient, development use mostly)]... Device 0: AMD SUMO
Benchmarking: strip-opencl, STRIP Password Manager [PBKDF2-SHA1 OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: sxc-opencl, StarOffice .sxc [PBKDF2-SHA1 Blowfish OpenCL]... (4xOMP) Device 0: AMD SUMO
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL]... Device 0: AMD SUMO
Benchmarking: zip-opencl, ZIP [PBKDF2-SHA1 AES OpenCL]... (4xOMP) Device 0: AMD SUMO
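
To turn a listing like the one above into a per-test summary, the lockup messages can be counted against the benchmark that precedes them. A rough sketch, assuming the kernel lines really are interleaved in the order they occurred, so that each "GPU lockup CP stall" line belongs to the most recent "Benchmarking:" line above it (count_lockups is a made-up name):

```shell
#!/bin/sh
# Hypothetical helper: read the mixed log on stdin and print
# "<number of CP stall messages> <format>" for every format with a lockup.
count_lockups() {
    awk '
        /^Benchmarking: / {
            name = $2
            sub(/,$/, "", name)   # drop the trailing comma some names carry
            next
        }
        /GPU lockup CP stall/ && name != "" {
            if (!(name in hits)) order[++n] = name   # remember first-seen order
            hits[name]++
        }
        END {
            for (i = 1; i <= n; i++) print hits[order[i]], order[i]
        }
    '
}

# Usage with the attachment name from the zgrep command above.
if [ -f john-tests.txt.gz ]; then
    zcat john-tests.txt.gz | count_lockups
fi
```

Applied to the excerpt above, this would attribute the lockups to PBKDF2-HMAC-SHA256-opencl, bcrypt-opencl, krb5pa-md5-opencl and pwsafe-opencl.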
Comment 3 GitLab Migration User 2019-09-18 19:16:48 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/518.

