Summary: | [BAT][BRW] NULL pointer dereference in snd_hda_codec_generic | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> | ||||
Component: | DRM/Intel | Assignee: | Martin Peres <martin.peres> | ||||
Status: | CLOSED NOTOURBUG | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Severity: | critical | ||||||
Priority: | high | CC: | intel-gfx-bugs, jeeja.kp, libin.yang, ricardo.vega | ||||
Version: | DRI git | ||||||
Hardware: | Other | ||||||
OS: | All | ||||||
Whiteboard: | ReadyForDev ; Upstream bug=https://bugzilla.kernel.org/show_bug.cgi?id=196045 | ||||||
i915 platform: | I965G | i915 features: | display/audio | ||||
Attachments: |
|
Description
Martin Peres
2017-05-31 10:24:29 UTC
Cc Libin, Jeeja. To whom should we assign audio bugs we find in our CI? Should we create an audio component in bugzilla? i965G is not our team support platform. We didn't any development on it. Jeeja, will your team support this platform. Hi Martin, I notice it is in the generic audio driver. It's interesting. How can we reproduce this bug? Could you send out the `alsa-info --stdout`? Created attachment 131709 [details]
alsa-info --stdout output
Here is the output of alsa-info.
We managed to hit this issue twice in 5 days / 20 runs by running the IGT test igt@drv_module_reload@basic-reload-inject.
This is using Linux 4.12-rc3 as a base for our driver.
From the info, it seems the hd audio module is loaded, and nodes are created. But aplay -l can't find the devices. This is pretty strange. Please try to reload the hda modules, and try playback again. And send the dmesg. (In reply to Libin Yang from comment #4) > From the info, it seems the hd audio module is loaded, and nodes are > created. But aplay -l can't find the devices. This is pretty strange. Please > try to reload the hda modules, and try playback again. And send the dmesg. Sorry, but this is quite irrelevant to this bug, isn't it? Misconfiguration should never cause a NULL pointer dereference, so can we concentrate on that? Here is the function: static void call_hp_automute(struct hda_codec *codec, struct hda_jack_callback *jack) { struct hda_gen_spec *spec = codec->spec; if (spec->hp_automute_hook) spec->hp_automute_hook(codec, jack); else snd_hda_gen_hp_automute(codec, jack); } Given that there is a NULL pointer dereference at the offset 0xa20, it means we are trying to read a value at the offset 0xa20 of a structure to which we have a NULL pointer. In this case, there are only two de-references, one on codec and one on spec. struct hda_codec_driver is a pretty small structure while spec is pretty gigantic and the field hp_automute_hook is down at the bottom. To I would go for that theory first. Now, in what cases could codec->spec be NULL? Is this something you make sure about or are there rare cases where this could happen? I see the following places where this happens: - snd_hda_codec_cleanup_for_unbind - snd_hda_gen_free - Too many other places that do not directly assign NULL, but a parameter (unlikely to be NULL by mistake). So, based on this, the fact that it happened twice in 26 runs and the fact that it happened during i915's reload test, it looks quite likely to be a use-after-free in snd_hda_codec_cleanup_for_unbind. Especially since the call came from a workitem and there is no such thing as reference counting in the codec structure, as far as I can tell. Could this be fixed? I can hide the bug for our CI by adding a NULL pointer check, but please fix this for real by making sure that codec is valid until all users are done with their work. My suggestion test is to try to find why it is using generic driver. However, if you can fix it in generic driver, please make a patch and send it to community. (In reply to Libin Yang from comment #6) > My suggestion test is to try to find why it is using generic driver. Hmm, I get your point, but whatever the reason why, it should still not make any null pointer de-reference. However, did you see that this is for the i965G platform, which is 10 years old? Is the non-generic driver supporting this platform? > However, if you can fix it in generic driver, please make a patch and send > it to community. Well, I already wrote a patch that papers around the issue and will still report it in the logs[1], but this is not a proper patch. I do not have any background in the sound subsystem, so I do not know what are the correct assumptions to have, especially around the binding and unbinding of the driver. I am merely doing some QA/CI job here, and reporting the issue to you, who own this code. I am however willing to help, but not by configuring the system differently in order to get rid of the problem. This would be a workaround, not a fix. [1] https://patchwork.freedesktop.org/patch/160525/ (In reply to Martin Peres from comment #7) > (In reply to Libin Yang from comment #6) > > My suggestion test is to try to find why it is using generic driver. > > Hmm, I get your point, but whatever the reason why, it should still not make > any null pointer de-reference. Yes, there seems 2 issues: 1) why using generic; 2) the null pointer. For the second issue, it seems you've already find a solution. I recommend you submit a patch. > > However, did you see that this is for the i965G platform, which is 10 years > old? Is the non-generic driver supporting this platform? We did not do any test on i965G platform. It's not our scope. > > > However, if you can fix it in generic driver, please make a patch and send > > it to community. > > Well, I already wrote a patch that papers around the issue and will still > report it in the logs[1], but this is not a proper patch. > > I do not have any background in the sound subsystem, so I do not know what > are the correct assumptions to have, especially around the binding and > unbinding of the driver. I am merely doing some QA/CI job here, and > reporting the issue to you, who own this code. > > I am however willing to help, but not by configuring the system differently > in order to get rid of the problem. This would be a workaround, not a fix. > > [1] https://patchwork.freedesktop.org/patch/160525/ Please feel free to make the patch and submit it. Takashi is very professional and nice. I did think we need check the null pointer. But maybe it is not as your patch. Maybe it is in another place. (In reply to Libin Yang from comment #8) > (In reply to Martin Peres from comment #7) > > (In reply to Libin Yang from comment #6) > > > My suggestion test is to try to find why it is using generic driver. > > > > Hmm, I get your point, but whatever the reason why, it should still not make > > any null pointer de-reference. > > Yes, there seems 2 issues: 1) why using generic; 2) the null pointer. > > For the second issue, it seems you've already find a solution. I recommend > you submit a patch. I will wait to actually catch the bug red-handed. I pushed the patch in our CI, we should have a clear culprit in the coming days. > > > > > However, did you see that this is for the i965G platform, which is 10 years > > old? Is the non-generic driver supporting this platform? > > We did not do any test on i965G platform. It's not our scope. That's why automated CI is good ;) > > > > > > However, if you can fix it in generic driver, please make a patch and send > > > it to community. > > > > Well, I already wrote a patch that papers around the issue and will still > > report it in the logs[1], but this is not a proper patch. > > > > I do not have any background in the sound subsystem, so I do not know what > > are the correct assumptions to have, especially around the binding and > > unbinding of the driver. I am merely doing some QA/CI job here, and > > reporting the issue to you, who own this code. > > > > I am however willing to help, but not by configuring the system differently > > in order to get rid of the problem. This would be a workaround, not a fix. > > > > [1] https://patchwork.freedesktop.org/patch/160525/ > > Please feel free to make the patch and submit it. Takashi is very > professional and nice. I did think we need check the null pointer. But maybe > it is not as your patch. Maybe it is in another place. What mailing list should I start the conversation in? I really think that the unbind in hda-generic.c is very sketchy, but I may be missing out on something. Thanks for your feedback! The community mail list is: alsa-devel@alsa-project.org. Takashi's email is: tiwai@suse.de. There is a document 'submitting-patches' in kenrel documentation folder telling you how to submit patches to kernel. Please read it before submitting patch. There are Intel internal mail lists: linux-audio-intel@eclists.intel.com and linux-drivers-review@eclists.intel.com. You can submit the patch to these mail lists for internal review before submitting patches to kernel. (In reply to Libin Yang from comment #10) > The community mail list is: alsa-devel@alsa-project.org. Takashi's email is: > tiwai@suse.de. > > There is a document 'submitting-patches' in kenrel documentation folder > telling you how to submit patches to kernel. Please read it before > submitting patch. > > There are Intel internal mail lists: linux-audio-intel@eclists.intel.com and > linux-drivers-review@eclists.intel.com. You can submit the patch to these > mail lists for internal review before submitting patches to kernel. Thanks, but I really don't think the patch is useful. I instead filed a bug on the official bugtracker of Alsa, and added all the details I could. Here it is: https://bugzilla.kernel.org/show_bug.cgi?id=196045 Martin will you leave this bug open until the problem from https://bugzilla.kernel.org/show_bug.cgi?id=196045 is solved? should this bug be closed? (In reply to Ricardo from comment #12) > Martin will you leave this bug open until the problem from > https://bugzilla.kernel.org/show_bug.cgi?id=196045 is solved? should this > bug be closed? I will leave it open as we can't track bugs from bugzilla.kernel.org in CI-bug-log. Maybe I should change the config file there, so as we could. Let me see. (In reply to Martin Peres from comment #13) > (In reply to Ricardo from comment #12) > > Martin will you leave this bug open until the problem from > > https://bugzilla.kernel.org/show_bug.cgi?id=196045 is solved? should this > > bug be closed? > > I will leave it open as we can't track bugs from bugzilla.kernel.org in > CI-bug-log. Maybe I should change the config file there, so as we could. > > Let me see. OK, done. Let's close this bug. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.