Running GROMACS [1] with the following patch

diff --git a/src/gromacs/gpu_utils/gpu_utils_ocl.cpp b/src/gromacs/gpu_utils/gpu_utils_ocl.cpp
index 2084d8c..8928582 100644
--- a/src/gromacs/gpu_utils/gpu_utils_ocl.cpp
+++ b/src/gromacs/gpu_utils/gpu_utils_ocl.cpp
@@ -131,6 +131,8 @@ static int is_gmx_supported_gpu_id(struct gmx_device_info_t *ocl_gpu_device)
             return egpuCompatible;
         case OCL_VENDOR_AMD:
             return runningOnCompatibleOSForAmd() ? egpuCompatible : egpuIncompatible;
+        case OCL_VENDOR_INTEL:
+            return egpuCompatible;
         default:
             return egpuIncompatible;
     }

results in:

$ gmx mdrun -deffnm em
:-) GROMACS - gmx mdrun, 2016-dev-20160222-29943fe-dirty-unknown (-:

GROMACS is written by:
Emile Apol, Rossen Apostolov, Herman J.C. Berendsen, Par Bjelkmar, Aldert van Buuren, Rudi van Drunen, Anton Feenstra, Gerrit Groenhof, Christoph Junghans, Anca Hamuraru, Vincent Hindriksen, Dimitrios Karkoulis, Peter Kasson, Jiri Kraus, Carsten Kutzner, Per Larsson, Justin A. Lemkul, Magnus Lundborg, Pieter Meulenhoff, Erik Marklund, Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, Alexey Shvetsov, Michael Shirts, Alfons Sijbers, Peter Tieleman, Teemu Virolainen, Christian Wennberg, Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at Uppsala University, Stockholm University and the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

GROMACS: gmx mdrun, version 2016-dev-20160222-29943fe-dirty-unknown
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Command line:
  gmx mdrun -deffnm em

Back Off! I just backed up em.log to ./#em.log.39#
X server found. dri2 connection failed!
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
[the six driver warning lines above are repeated several more times here]

Running on 1 node with total 4 cores, 4 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand: Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz
    SIMD instructions most likely to fit this hardware: AVX_256
    SIMD instructions selected at GROMACS compile time: AVX_256
  GPU info:
    Number of GPUs detected: 2
    #0: name: AMD TONGA (DRM 3.1.0, LLVM 3.9.0), vendor: AMD, device version: OpenCL 1.1 MESA 11.2.0-devel, stat: compatible
    #1: name: Intel(R) HD Graphics IvyBridge GT1, vendor: Intel, device version: OpenCL 1.2 beignet 1.1.1 (git-9043d32), stat: compatible

Reading file em.tpr, VERSION 5.1-dev-20150219-7c30fcf-unknown (single precision)
Note: file tpx version 100, software tpx version 109
Using 2 MPI threads
Using 2 OpenMP threads per tMPI thread

2 compatible GPUs are present, with IDs 0,1
2 GPUs auto-selected for this run.
Mapping of GPU IDs to the 2 PP ranks in this node: 0,1

[the same six driver warning lines, again repeated several times]
Selecting generic kernel
ASSERTION FAILED: Not supported
  at file /builddir/build/BUILD/Beignet-1.1.1-Source/backend/src/./ir/context.hpp, function gbe::ir::ImmediateIndex gbe::ir::Context::newIntegerImmediate(int64_t, gbe::ir::Type), line 95
Trace/breakpoint trap (core dumped)

Regardless of "DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument", this assert failure should not happen, right?

[1] http://www.gromacs.org/
Another machine, without the ioctl failures, shows the same issue:

$ gmx mdrun -deffnm em -v
:-) GROMACS - gmx mdrun, 2016-dev-20160222-29943fe-dirty-unknown (-:
[author list, copyright and license text identical to the first log]

GROMACS: gmx mdrun, version 2016-dev-20160222-29943fe-dirty-unknown
Executable: /home/vedranm/software/bin/gmx
Data prefix: /home/vedranm/software
Command line:
  gmx mdrun -deffnm em -v

Back Off! I just backed up em.log to ./#em.log.20#

Running on 1 node with total 4 cores, 4 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  GPU info:
    Number of GPUs detected: 1
    #0: name: Intel(R) HD Graphics Haswell GT2 Mobile, vendor: Intel, device version: OpenCL 1.2 beignet 1.1.1 (git-2eea2c9), stat: compatible

Reading file em.tpr, VERSION 5.1-dev-20150219-7c30fcf-unknown (single precision)
Note: file tpx version 100, software tpx version 109
Using 1 MPI thread
Using 4 OpenMP threads

1 compatible GPU is present, with ID 0
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Selecting generic kernel
ASSERTION FAILED: Not supported
  at file /builddir/build/BUILD/Beignet-1.1.1-Source/backend/src/./ir/context.hpp, function gbe::ir::ImmediateIndex gbe::ir::Context::newIntegerImmediate(int64_t, gbe::ir::Type), line 95
Trace/breakpoint trap (core dumped)
Hi Vedran,

Is the problem still present with GROMACS?
I tried GROMACS with our master branch and the patch https://gerrit.gromacs.org/#/c/5752/
GROMACS runs, but I do not know whether the result is correct.

Thanks
Xiuli
(In reply to Xiuli Pan from comment #2)
> Hi Vedran,
>
> Is the problem still present with GROMACS?
> I tried GROMACS with our master branch and the patch
> https://gerrit.gromacs.org/#/c/5752/
> GROMACS runs, but I do not know whether the result is correct.
>
> Thanks
> Xiuli

I tried, but got blocked by bug 95239.
Managed to get a Fedora 23 x86_64 machine with GCC 5.3.1 to test. This machine has Haswell rather than Ivy Bridge:

vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
stepping : 3
microcode : 0x1e
cpu MHz : 2560.156
cache size : 3072 KB

00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06)

With custom-compiled LLVM/Clang 3.8, the latest Beignet from git, and GROMACS with the patch you linked and the patch from comment 1, I get

Running on 1 node with total 4 cores, 4 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand: Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  Hardware topology: Basic
  GPU info:
    Number of GPUs detected: 1
    #0: name: Intel(R) HD Graphics Haswell GT2 Mobile, vendor: Intel, device version: OpenCL 1.2 beignet 1.2 (git-8dfec54), stat: compatible

Reading file em.tpr, VERSION 5.1-dev-20150219-7c30fcf-unknown (single precision)
Note: file tpx version 100, software tpx version 110
Using 1 MPI thread
Using 4 OpenMP threads

1 compatible GPU is present, with ID 0
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Selecting generic kernel

Back Off! I just backed up em.trr to ./#em.trr.14#
Back Off! I just backed up em.edr to ./#em.edr.14#

Steepest Descents:
  Tolerance (Fmax) = 1.00000e+03
  Number of steps = 50000
ClERROR! -49
Step= 0, Dmax= 1.0e-02 nm, Epot= 3.54278e+04 Fmax= 4.65376e+03, atom= 1309
ClERROR! -49
Step= 1, Dmax= 1.0e-02 nm, Epot= 2.97022e+04 Fmax= 7.33237e+03, atom= 1309
ClERROR! -49

According to [1], error code -49 is CL_INVALID_ARG_INDEX.

So, at least on Haswell, the original assertion failure no longer occurs, but the run still does not work. I have updated the summary accordingly.

[1] https://streamcomputing.eu/blog/2013-04-28/opencl-error-codes/
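For anyone decoding further ClERROR values from such a log: the numeric codes below are the ones defined in the Khronos CL/cl.h header for the kernel and kernel-argument error group. This is only a small lookup sketch; the dictionary and the helper name describe_cl_error are illustrative and not part of GROMACS or Beignet.

```python
# Subset of OpenCL error codes, with the values defined in the Khronos CL/cl.h header.
CL_ERRORS = {
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -46: "CL_INVALID_KERNEL_NAME",
    -47: "CL_INVALID_KERNEL_DEFINITION",
    -48: "CL_INVALID_KERNEL",
    -49: "CL_INVALID_ARG_INDEX",
    -50: "CL_INVALID_ARG_VALUE",
    -51: "CL_INVALID_ARG_SIZE",
    -52: "CL_INVALID_KERNEL_ARGS",
}


def describe_cl_error(code):
    """Return the symbolic name for an OpenCL error code (hypothetical helper)."""
    return CL_ERRORS.get(code, "unknown OpenCL error %d" % code)


if __name__ == "__main__":
    print(describe_cl_error(-49))
```

CL_INVALID_ARG_INDEX is what clSetKernelArg returns when the argument index is not valid for the kernel, so an error -49 on every step suggests that the host code and the compiled kernel disagree about the kernel's argument list.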
I built the GROMACS master branch on Haswell, but when I run the same command I get the following output and then it just hangs.
What is gmx mdrun -deffnm em used for? What does em stand for? Which files do I need to run this test?

./gmx mdrun -deffnm em
:-) GROMACS - gmx mdrun, 2016-dev-20160405-2c62aed-dirty (-:
[author list, copyright and license text identical to the first log]

GROMACS: gmx mdrun, version 2016-dev-20160405-2c62aed-dirty
Executable: /home/pxl/gromacs/build/bin/./gmx
Data prefix: /home/pxl/gromacs (source tree)
Command line:
  gmx mdrun -deffnm em
(In reply to Xiuli Pan from comment #5)
> I had a build of gromacs of the master branch on HASWELL, but I have the
> some thing run with following result and it just stuck then.
> What is this gmx mdrun -deffnm em used for? what is this em stand for? What
> file I need to run this test?

It's a molecular dynamics run, and EM stands for energy minimization. The reason you have to produce these files is that the only part of GROMACS that actually calls OpenCL is mdrun.

These files are very easy to produce; for example, you can use the instructions at [1].

[1] http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/index.html
I followed the guide up to the EM part, and here is the log:
---------------------------------------------------------------------------
./gmx mdrun -v -deffnm em
:-) GROMACS - gmx mdrun, 2016-dev-20160405-2c62aed-dirty (-:
[author list, copyright and license text identical to the first log]

GROMACS: gmx mdrun, version 2016-dev-20160405-2c62aed-dirty
Executable: /home/pxl/gromacs/build/bin/./gmx
Data prefix: /home/pxl/gromacs (source tree)
Command line:
  gmx mdrun -v -deffnm em

Back Off! I just backed up em.log to ./#em.log.1#

Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand: Intel(R) Xeon(R) CPU E3-1286 v3 @ 3.70GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  Hardware topology: Full, with devices
  GPU info:
    Number of GPUs detected: 1
    #0: name: Intel(R) HD Graphics Haswell GT2 Server, vendor: Intel, device version: OpenCL 1.2 beignet 1.2 (git-b39d875), stat: compatible

Reading file em.tpr, VERSION 2016-dev-20160405-2c62aed-dirty (single precision)
Using 1 MPI thread
Using 8 OpenMP threads

1 compatible GPU is present, with ID 0
1 GPU auto-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0

Setting up defines for kernel types for FastGen -DGMX_OCL_FASTGEN_ADD_TWINCUT -DEL_EWALD_ANA -DEELNAME=_ElecEw -DLJ_COMB_GEOM -DVDWNAME=_VdwLJCombGeom
Selecting kernel source automatically
Selecting generic kernel
Setting up kernel vendor spec definitions: -D_WARPLESS_SOURCE_
The OpenCL compilation log has been saved in "nbnxn_ocl_kernels.cl.SUCCEEDED"

Back Off! I just backed up em.trr to ./#em.trr.1#
Back Off! I just backed up em.edr to ./#em.edr.1#

Steepest Descents:
  Tolerance (Fmax) = 1.00000e+03
  Number of steps = 50000
Step= 0, Dmax= 1.0e-02 nm, Epot= -3.31075e+06 Fmax= 6.48321e+04, atom= 2732
Step= 1, Dmax= 1.0e-02 nm, Epot= -3.31357e+06 Fmax= 6.36930e+04, atom= 2732
Step= 4, Dmax= 3.0e-03 nm, Epot= -3.31544e+06 Fmax= 6.18313e+04, atom= 2732
Step= 16, Dmax= 1.8e-06 nm, Epot= -3.30329e+06 Fmax= 6.18301e+04, atom= 2732

Energy minimization has stopped, but the forces have not converged to the requested precision Fmax < 1000 (which may not be possible for your system). It stopped because the algorithm tried to make a new step whose size was too small, or there was no change in the energy since last step. Either way, we regard the minimization as converged to within the available machine precision, given your starting configuration and EM parameters.

Double precision normally gives you higher accuracy, but this is often not needed for preparing to run molecular dynamics.
You might need to increase your constraint accuracy, or turn off constraints altogether (set constraints = none in mdp file)

writing lowest energy coordinates.

Back Off! I just backed up em.gro to ./#em.gro.1#

Steepest Descents converged to machine precision in 17 steps, but did not reach the requested Fmax < 1000.
Potential Energy = -3.3154425e+06
Maximum force = 6.1831297e+04 on atom 2732
Norm of force = 7.5550275e+02

NOTE: The GPU has >20% more load than the CPU. This imbalance causes performance loss, consider using a shorter cut-off and a finer PME grid.

NOTE: 16 % of the run time was spent in pair search, you might want to increase nstlist (this has no effect on accuracy)
---------------------------------------------------------------------------
I actually don't know if this is right; could you have a look?

Thanks
Xiuli
(In reply to Xiuli Pan from comment #7)
> I actually don't know if this is right, could you have a look here?
> Thanks
> Xiuli

I have previously seen the "Energy minimization has stopped, but the forces have not converged to the requested precision Fmax < 1000 (which may not be possible for your system)." message on Radeon as well, and I am not sure whether it is correct.

Can you try the latest git of both GROMACS and Beignet? Then we would have comparable system software and hardware, and hopefully you will also hit ERROR -49.
Hi Vedran,

I am using the master branch of Beignet and a GROMACS commit from 2016-dev-20160405. If you pick a commit that shows that error, I can try to reproduce the problem with it.

Thanks
Xiuli
I tried the same version on Fedora 24 x86_64, and I believe I am getting an incorrect result. It makes no difference whether I use the version of GROMACS you use or the latest git. I get

Steepest Descents:
  Tolerance (Fmax) = 1.00000e+03
  Number of steps = 50000

Energy minimization has stopped, but the forces have not converged to the requested precision Fmax < 1000 (which may not be possible for your system). It stopped because the algorithm tried to make a new step whose size was too small, or there was no change in the energy since last step. Either way, we regard the minimization as converged to within the available machine precision, given your starting configuration and EM parameters.

Double precision normally gives you higher accuracy, but this is often not needed for preparing to run molecular dynamics.
You might need to increase your constraint accuracy, or turn off constraints altogether (set constraints = none in mdp file)

writing lowest energy coordinates.

Back Off! I just backed up em.gro to ./#em.gro.4#

Steepest Descents converged to machine precision in 32 steps, but did not reach the requested Fmax < 1000.
Potential Energy = -5.6638625e+05
Maximum force = 1.0924077e+04 on atom 1515
Norm of force = 3.5640625e+02

Another simulation, which works on a machine with Radeon/NVIDIA GPUs, gives me

starting mdrun 'UBIQUITIN in water'
500000 steps, 1000.0 ps.

step 14: One or more water molecules can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.
Wrote pdb files with previous and current coordinates
Segmentation fault (core dumped)

Is there any info I can provide that would help debugging?
Hi Vedran,

Could you tell us exactly which GROMACS commit you are using?
I know how to run it with the guide you gave, but I cannot tell from the result whether there are any bugs. Is there some unit test for GROMACS that can check the result, or something like that?
Sometimes the result may be very wrong.

And hi Szilárd, there is also some bug here with GROMACS; what do you think it is about? Should we use the patch you sent?

Thanks
Xiuli
(In reply to Xiuli Pan from comment #11)
> Hi Vedran, Could you provide us with what exact commit gromacs you are using?
> I know how to run with guide you give but I could not know if there is any
> bugs from the result. If there is some unit test for gromacs that can check
> the result or something like that.
> Sometimes the result may got very wrong.
>
> And hi Szilárd, here is also some bug with gromacs, what do you think it is
> about? Should we use the patch you sent?
>
> Thanks
> Xiuli

I'm using Beignet git 4585453a41bd1f88e0225785201927b69591d570 with LLVM/Clang 3.8 on Fedora 24 on

00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06).

As for GROMACS, I'm using the release-2016 branch; its latest commit right now is 0722e465c5bc31e493c41359917821545a4f9423. The following patch is necessary to allow using OpenCL on Intel iGPUs:

diff --git a/src/gromacs/gpu_utils/gpu_utils_ocl.cpp b/src/gromacs/gpu_utils/gpu_utils_ocl.cpp
index 2084d8c..8928582 100644
--- a/src/gromacs/gpu_utils/gpu_utils_ocl.cpp
+++ b/src/gromacs/gpu_utils/gpu_utils_ocl.cpp
@@ -131,6 +131,8 @@ static int is_gmx_supported_gpu_id(struct gmx_device_info_t *ocl_gpu_device)
             return egpuCompatible;
         case OCL_VENDOR_AMD:
             return runningOnCompatibleOSForAmd() ? egpuCompatible : egpuIncompatible;
+        case OCL_VENDOR_INTEL:
+            return egpuCompatible;
         default:
             return egpuIncompatible;
     }

I tried both without and with the patch https://gerrit.gromacs.org/#/c/5752/ and saw no change.

For testing, I use ftp://ftp.gromacs.org/pub/benchmarks/water_GMX50_bare.tar.gz by going to one of the directories (say 0003) and issuing

$ gmx grompp -f pme.mdp
$ gmx mdrun -s topol.tpr -notunepme -v

The simulation either segfaults or warns about water being unable to settle.
Hi all,

Sorry for the delay. Let me add some clarifications.

I'd like to note that it has not yet been verified whether the GROMACS kernels in question execute correctly on Intel iGPU hardware. I had planned to try the proprietary OpenCL stack, but haven't had the time. Hence, the incorrect results observed (recently reproduced on Haswell and Skylake) could also be due to a bug on the GROMACS side. If somebody has a setup with the Intel proprietary OpenCL stack and could help out with running a test or two, it would be greatly appreciated.

Secondly, the repro steps that Vedran provided are sufficient to show the issue, but they are neither necessary nor convenient for verifying correctness. For that, it is more convenient to run the unit and regression tests (configure with -DGMX_BUILD_UNITTESTS=ON for the former, -DREGRESSIONTEST_DOWNLOAD=ON for the latter) and run "make check".

Alternatively, a more dev-oriented, quick-and-dirty (and limited) check is to run a single test case (e.g. the one Vedran provided) once on the CPU and once on the GPU:

gmx mdrun -nb cpu -nsteps 0 -g ref-cpu.log
gmx mdrun -nb gpu -nsteps 0 -g gpu.log

and compare the first-step energy values, i.e. the outputs of:

grep -m 1 -A 4 ' Energies (kJ/mol)' ref-cpu.log
grep -m 1 -A 4 ' Energies (kJ/mol)' gpu.log

Cheers,
Sz.
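The grep-based comparison can also be scripted. The sketch below is a rough Python equivalent: it takes the four lines after the first "Energies (kJ/mol)" header (mirroring grep -m 1 -A 4), pulls out the numbers in order, and reports the largest relative difference. The helper names first_energies and max_relative_diff are made up for illustration; the SAMPLE text is the ref-cpu block posted later in this thread.

```python
import re

# Matches numbers in the scientific notation mdrun uses, e.g. -8.96047e+05.
_FLOAT = re.compile(r"[-+]?\d+\.\d+e[-+]\d+")


def first_energies(log_text):
    """Return the numbers from the first 'Energies (kJ/mol)' block of a log."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if "Energies (kJ/mol)" in line:
            # Two header rows and two value rows follow; header rows hold no numbers.
            block = " ".join(lines[i + 1:i + 5])
            return [float(x) for x in _FLOAT.findall(block)]
    return []


def max_relative_diff(ref, other):
    """Largest |other - ref| / |ref| over all non-zero reference values."""
    return max((abs(o - r) / abs(r) for r, o in zip(ref, other) if r != 0.0),
               default=0.0)


SAMPLE = """\
   Energies (kJ/mol)
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.47860e+05   -8.96047e+05    3.62010e+03   -7.44566e+05    0.00000e+00
   Total Energy  Conserved En.    Temperature Pressure (bar)
   -7.44566e+05   -7.44566e+05    0.00000e+00    2.65552e+04
"""

if __name__ == "__main__":
    print(first_energies(SAMPLE))
```

To use it on real output, read ref-cpu.log and gpu.log into strings, pass each to first_energies, and feed the two lists to max_relative_diff. What counts as an acceptable difference is not something this sketch can decide.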
A few more things to add after a bit of testing:

- I suspect the reduction might be the culprit; I need to think a bit about it. Is there any way to change, or at least query, the SIMD execution width? Setting it to 32 would provide helpful feedback.

- The previously noted issue with quoted includes still seems not to be fixed.
Hi Szilárd,

Thanks for the method to check the result. I will try it and see whether anything is wrong with the kernel or the result.

Thanks
Xiuli
(In reply to Szilárd Páll from comment #14)
> A few more things to add after a bit of testing:
> - I suspect the reduction might be the culprit, need to think a bit about
> it. Is there any way to change or at least know the SIMD execution width?
> Setting it to 32 would provide helpful feedback.
>
> - The previously noted issue with quoted include seems to still not be fixed.

If you want to get the SIMD width, you can use code like this:

size_t workgroupSize_used;
err = clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, sizeof(size_t), &workgroupSize_used, NULL);

Beignet supports a SIMD width of 16 or 8.
(In reply to Szilárd Páll from comment #14)
> - The previously noted issue with quoted include seems to still not be fixed.

A related patch for Gallium is here: https://lists.freedesktop.org/archives/mesa-dev/2016-July/122864.html
(In reply to Xiuli Pan from comment #15)
> Thanks for the method to check the result, I will try to check the result
> and find if anything wrong with the kernel or result.

Thanks. Would you be able to compare with the proprietary Intel OpenCL stack?
(In reply to Szilárd Páll from comment #14)
> A few more things to add after a bit of testing:
> - I suspect the reduction might be the culprit, need to think a bit about
> it. Is there any way to change or at least know the SIMD execution width?
> Setting it to 32 would provide helpful feedback.
>
> - The previously noted issue with quoted include seems to still not be fixed.

Hi Szilárd,

I have tried the patch:
https://cgit.freedesktop.org/beignet/commit/?h=Release_v1.1&id=8e9ef20
However, this patch is only in the release branch and not in the master branch. I will ask the maintainer to merge it into master as well.

Thanks
Xiuli
Hi,

I tried the quick-and-dirty dev check to get some data. It seems that both Beignet and the proprietary Intel OpenCL driver give wrong results.

Env: LLVM 3.6.2/3.8; Beignet master (dff184) with the quote patch; GROMACS release-2016 (ebf2a5) with the workaround patch.

It seems something is also wrong with the LLVM version, but the GPU results are all wrong.
Is the kernel logic related to the SIMD size?

ref-cpu:
   Energies (kJ/mol)
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.47860e+05   -8.96047e+05    3.62010e+03   -7.44566e+05    0.00000e+00
   Total Energy  Conserved En.    Temperature Pressure (bar)
   -7.44566e+05   -7.44566e+05    0.00000e+00    2.65552e+04

gpu-beg 3.6.2:
   Energies (kJ/mol)
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.48497e+05   -1.55098e+06    3.62010e+03   -1.39887e+06    0.00000e+00
   Total Energy  Conserved En.    Temperature Pressure (bar)
   -1.39887e+06   -1.39887e+06    0.00000e+00    4.48803e+04

gpu-beg 3.8:
   Energies (kJ/mol)
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.49634e+05   -1.60296e+06    3.62010e+03   -1.44971e+06    0.00000e+00
   Total Energy  Conserved En.    Temperature Pressure (bar)
   -1.44971e+06   -1.44971e+06    0.00000e+00    4.09773e+04

gpu-intel:
   Energies (kJ/mol)
        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.
    1.52678e+05   -1.94930e+06    3.62010e+03   -1.79300e+06    0.00000e+00
   Total Energy  Conserved En.    Temperature Pressure (bar)
   -1.79300e+06   -1.79300e+06    0.00000e+00    3.06926e+04

Thanks
Xiuli
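To see at a glance which terms are off, the reported values can be put side by side as relative deviations from the CPU reference. The numbers below are copied from the four energy blocks in this comment; the helper name relative_deviation and the table layout are only illustrative.

```python
# Energy terms (kJ/mol) as reported in this comment.
TERMS = ["LJ (SR)", "Coulomb (SR)", "Coul. recip.", "Potential"]
REF_CPU = [1.47860e+05, -8.96047e+05, 3.62010e+03, -7.44566e+05]
GPU_RUNS = {
    "gpu-beg 3.6.2": [1.48497e+05, -1.55098e+06, 3.62010e+03, -1.39887e+06],
    "gpu-beg 3.8": [1.49634e+05, -1.60296e+06, 3.62010e+03, -1.44971e+06],
    "gpu-intel": [1.52678e+05, -1.94930e+06, 3.62010e+03, -1.79300e+06],
}


def relative_deviation(ref, val):
    """Signed deviation of val from ref, as a fraction of |ref|."""
    return (val - ref) / abs(ref)


if __name__ == "__main__":
    for name, values in GPU_RUNS.items():
        parts = ["%s %+.1f%%" % (term, 100.0 * relative_deviation(r, v))
                 for term, r, v in zip(TERMS, REF_CPU, values)]
        print(name + ": " + ", ".join(parts))
```

With these numbers the reciprocal-space Coulomb term is identical in all runs, the LJ term differs by a few percent at most, and the short-range Coulomb term is off by roughly 70% or more in every GPU run, so the short-range Coulomb contribution is where the totals go wrong.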
Xiuli, thanks for the feedback and the testing!

> It seems something also wrong with LLVM version, but the GPU results are all wrong.

Do you mean that, in addition to the result being off, something else is also wrong with the LLVM version?

> Is the kernels logic related to SIMD size?

It is. Although the algorithm is designed to be quite general (allowing tuning of parameters like arithmetic intensity, data reuse, register use, etc.), the kernel was originally implemented and tuned for NVIDIA (in CUDA) and later ported to OpenCL for AMD. The current port was my "simple" attempt to remove the execution width >= 32 assumptions. I thought the current version had already accomplished that, but that seems not to be the case, given the incorrect results with both OpenCL stacks.

I'll give it some thought and will try to find the source of the issue; I think I'll start with eliminating some conditionals to be able to use mem fencing...

By the way, is it possible to do cross-lane operations on Intel with Beignet (e.g. using intrinsics)?
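To make the execution-width concern concrete, here is a toy Python model. It is not the GROMACS kernel and does not claim to describe how Beignet or any other driver schedules work. It assumes a 32-element tree reduction with no barriers between steps, lanes inside one hardware SIMD group running in lockstep, and groups running one after another. The function name tree_reduce_no_barrier and the schedule itself are hypothetical.

```python
def tree_reduce_no_barrier(values, hw_width):
    """Sum `values` with a barrier-free tree reduction under a toy schedule.

    Assumptions (illustrative only): len(values) is a power of two, lanes
    within a hardware group of hw_width execute each step in lockstep, and
    the groups execute the whole loop one after another.
    """
    n = len(values)
    buf = list(values)
    for start in range(0, n, hw_width):
        lanes = range(start, min(start + hw_width, n))
        offset = n // 2
        while offset >= 1:
            # Lockstep within the group: every active lane reads first...
            reads = {i: buf[i + offset] for i in lanes if i < offset}
            # ...and only then do the active lanes write.
            for i, v in reads.items():
                buf[i] += v
            offset //= 2
    return buf[0]


if __name__ == "__main__":
    data = list(range(32))
    for width in (32, 16, 8):
        print(width, tree_reduce_no_barrier(data, width), sum(data))
```

In this model widths 32 and 16 return the full sum, while width 8 silently drops the contribution of lanes 24..31, because lanes 0..7 consume the partial sums of lanes 8..15 before those lanes have run. A barrier between steps removes the dependence on the hardware width, which is why being able to place fences or barriers outside of divergent conditionals matters.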
(In reply to Szilárd Páll from comment #21)
> I'll give it a thought and will try to find the source of the issue; I think
> I'll start with eliminating some conditionals to be able to use mem
> fencing...

Szilárd, any news here?
Xiuli, any news from Intel? Any interesting changes since August? Would it be worth testing again?
(In reply to Vedran Miletić from comment #23)
> Xiuli, any news from Intel? Any interesting changes since August? Would it
> be worth testing again?

The program runs here when I test it, but the result does not seem correct.
I do not know whether the kernel has a problem or something is wrong with the driver.
If there is anything that can help me figure out which part is broken, I may be able to come up with a solution or patches.

Thanks
Xiuli
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/beignet/beignet/issues/66.