Created attachment 140880: dmesg with drm.debug=0xff
OS: Ubuntu 16.04.4 LTS
CPU: Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
Motherboard: SuperMicro X11SAT-F
The i915 kernel driver cannot initialize the Intel GPU unless the aperture size in the BIOS is set to 128MB or less.
However, limiting it that way causes significant problems for our use case.
I have attached logs (dmesg with drm.debug=0xff) and other information about the system.
Created attachments 140881 to 140887.
[ 1.265582] [drm:i915_ggtt_probe_hw [i915]] GMADR size = 0M
And with the aperture set to 128MiB?
Created attachment 140888: dmesg with aperture set to 128MB
As expected, we then get:
[ 1.444006] [drm:i915_ggtt_probe_hw [i915]] GMADR size = 128M
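To compare the two BIOS configurations without reading the full logs, one could pull the GMADR line out of each dmesg capture. This is a minimal sketch that assumes the debug line format quoted above (`[drm:i915_ggtt_probe_hw [i915]] GMADR size = <N>M`); the helper name `gmadr_sizes_mib` is hypothetical and not part of any i915 tooling.

```python
import re

# Matches the i915 debug line quoted above, e.g.
# "[ 1.265582] [drm:i915_ggtt_probe_hw [i915]] GMADR size = 0M"
GMADR_RE = re.compile(r"i915_ggtt_probe_hw \[i915\]\] GMADR size = (\d+)M")


def gmadr_sizes_mib(dmesg_text):
    """Return every GMADR (mappable aperture) size, in MiB, found in a dmesg capture."""
    return [int(m.group(1)) for m in GMADR_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    logs = {
        "256MB BIOS setting": "[ 1.265582] [drm:i915_ggtt_probe_hw [i915]] GMADR size = 0M\n",
        "128MB BIOS setting": "[ 1.444006] [drm:i915_ggtt_probe_hw [i915]] GMADR size = 128M\n",
    }
    for name, text in logs.items():
        sizes = gmadr_sizes_mib(text)
        # A reported size of 0 means the driver found no usable aperture.
        status = "aperture missing" if 0 in sizes else "ok"
        print(name, sizes, status)
```

A size of 0M, as in the 256MB capture, is the symptom to look for.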
As an aside, how does having the mappable aperture limited to 128MiB affect you?
And can you please attach lspci -vvv -s 0:0:2 for both configs?
Created attachment 140890: lspci -vvv -s 0:0:2 with 128MB aperture
Created attachment 140891: lspci -vvv -s 0:0:2 with 256MB aperture
The mappable limit of 128MB affects us because we are trying to run 4 different HD video encoders using the Intel Media SDK.
To run all these encoders, we need numerous VA surface buffers, totalling more than 128MB.
(In reply to Krist from comment #14)
> Created attachment 140891
> 256MB: lspci -vvv -s 0:0:2
OK, so we are not going completely mad: your BIOS is buggy.
(In reply to Krist from comment #15)
> The mappable limit of 128MB is affecting us, because we are trying to run 4
> different HD video encoders using the Intel Media SDK.
> For running all these video encoders, we are required to have numerous VA
> Surface buffers and we need more than 128MB of them.
The mappable aperture has no impact on the amount of usable memory. It only limits the number of surfaces that can be mapped directly through the very slow indirect GTT path. If userspace is using that as its preferred means of access, it needs to be shouted at. Even then, mappings are handled in 2MiB chunks, so the size and number of surfaces have little impact; the size of the aperture basically determines how many concurrent accesses you can handle before thrashing forces eviction.
The biggest consumer of mappable space is the display engine, but even that can use unmappable memory for the most part (some features, such as FBC, notwithstanding).
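The point that aperture size bounds concurrent mappings, not total memory, can be illustrated with a toy model. This sketch assumes uniform 2MiB chunks and plain LRU eviction, and ignores space reserved for the display engine; the real i915 eviction logic is more involved, and `count_evictions` is a hypothetical name.

```python
from collections import OrderedDict

CHUNK_MIB = 2  # GTT mappings are handled in 2MiB chunks, per the comment above


def count_evictions(aperture_mib, accesses):
    """Toy model of aperture thrashing.

    Replays a sequence of chunk IDs through an aperture that holds
    aperture_mib // CHUNK_MIB chunks, evicting the least recently used
    chunk when it is full. Assumption: LRU eviction and uniform chunk
    size; this is not the actual i915 algorithm.
    """
    slots = aperture_mib // CHUNK_MIB
    resident = OrderedDict()
    evictions = 0
    for chunk in accesses:
        if chunk in resident:
            resident.move_to_end(chunk)  # hit: mark as most recently used
            continue
        if len(resident) >= slots:
            resident.popitem(last=False)  # full: evict least recently used chunk
            evictions += 1
        resident[chunk] = True
    return evictions


if __name__ == "__main__":
    # 100 distinct chunks mapped concurrently, touched round-robin 10 times.
    pattern = list(range(100)) * 10
    for aperture in (128, 256):
        print(f"{aperture} MiB: {aperture // CHUNK_MIB} slots,",
              count_evictions(aperture, pattern), "evictions")
```

With 64 slots (128MiB), round-robin access to 100 chunks misses on every access, giving 936 evictions; with 128 slots (256MiB) the working set fits and there are none. The amount of memory the surfaces occupy never enters the calculation, only how many chunks are mapped at once.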
OK, I see. I thought it was related because of VA-API call failures when allocating buffers.
But perhaps this is also caused by the buggy BIOS?
Could you explain why the BIOS is buggy, so I can contact the manufacturer with that information?
The BIOS is responsible for setting up the I/O regions of the graphics PCI device. In your 256MB lspci output,
Region 2: Memory at <unassigned> (64-bit, prefetchable)
is the mappable aperture PCI BAR, and the BIOS has left it unassigned.
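On Linux the same BAR information is exposed in sysfs, where `/sys/bus/pci/devices/0000:00:02.0/resource` lists one resource per line as hexadecimal start, end and flags, indexed like the lspci "Region" numbers. This is a minimal sketch for checking region 2 without lspci. It assumes that a BAR the firmware left unassigned shows a start address of 0 there; lspci remains the authoritative view, and `describe_bar` is a hypothetical helper.

```python
from pathlib import Path

IGPU_RESOURCE = Path("/sys/bus/pci/devices/0000:00:02.0/resource")
APERTURE_BAR = 2  # "Region 2" in the lspci output above


def parse_resources(text):
    """Parse sysfs 'resource' text into (start, end, flags) integer tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end, flags = (int(field, 16) for field in line.split())
        entries.append((start, end, flags))
    return entries


def describe_bar(text, bar=APERTURE_BAR):
    """Summarize one BAR as 'unassigned' or '<start>, <size> MiB'."""
    start, end, _flags = parse_resources(text)[bar]
    if start == 0:
        # Assumption: an unassigned BAR reports start address 0 here.
        return "unassigned"
    size_mib = (end - start + 1) >> 20
    return f"0x{start:x}, {size_mib} MiB"


if __name__ == "__main__":
    if IGPU_RESOURCE.exists():
        print("aperture BAR:", describe_bar(IGPU_RESOURCE.read_text()))
```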
If you can get some specifics on how userspace fails with only 128MiB, we should follow that up as well.
I'm not sure how to provide more information, but I can say that when I try to initialize the Intel encoder with the API call:
it returns:
It does seem to get past the VA surface allocation, though, so perhaps I misremembered where the userspace issue is.
Sometimes I can initialize one encoder successfully, but then the second one fails, and so on.
I will attach the dmesg log captured while all this is happening and failing.
Created attachment 140892: dmesg when the encoder is being initialized
Created attachment 140893: dmesg when the encoder eventually initializes successfully
When we use only one encoder, it eventually initializes successfully, sometimes only after hitting the failure above some 500 times.
I have attached the dmesg log where at the end, the single encoder has been successfully initialized and is running.
SDK functions return MFX_ERR_DEVICE_LOST or MFX_ERR_DEVICE_FAILED to indicate that there is a complete failure in hardware acceleration. The application must close and reinitialize the SDK function class. If the application has provided a hardware acceleration device handle to the SDK, the application must reset the device.
Yet the kernel didn't report any failures. I think it's all in the imagination of the media-driver.
Created attachment 140894: Encoder debug logging and backtrace when init fails
Not sure if this is applicable to the i915 driver anymore.
I will also continue contact with the Media SDK support team.
(In reply to Krist from comment #26)
> Not sure if this is applicable to the i915 driver anymore.
> I will also continue contact with the Media SDK support team.
Did you get any response from MSDK?
We continued debugging in userspace with additional MSDK debug output. Unfortunately, this did not turn up anything further.
While in contact with the manufacturer, we also found out that this motherboard (X11SAT-F) has built-in firmware that apparently acts as some kind of PCIe multiplexer, able to provide more PCIe lanes than the CPU itself is supposed to support.
This is now the most likely cause of all kinds of instabilities and issues with the onboard graphics.
We have therefore sent the motherboard back to the factory for them to take another look at it.
I don't think this is an i915/DRM issue, so I will close the bug.