Created attachment 144321 [details]
Excerpt of `/var/log/syslog`.
My computer sometimes turns off when I max out* all its CPUs for more than a few minutes. I would estimate that it turns off roughly 1 out of 10 times when the CPUs are maxed out for more than five minutes, with the probability increasing the longer the load lasts. It happens pretty reliably (maybe 2 out of 3 times, possibly more often) when they're under load for more than an hour. Normally when it turns off I just rerun my job with fewer CPUs, but this time I opened the system log and found a report of a GPU hang. I'm attaching part of the system log, the GPU crash dump, and some system information. Please let me know if there's more information I can provide.
* By this I mean that `top` reports CPU usage at or near 100%. I doubt that much of what I'm doing is CPU-bound.
Created attachment 144322 [details]
Compressed GPU crash dump (`/sys/class/drm/card0/error`).
Created attachment 144323 [details]