Bug 94009

Summary: libreoffice broken by harfbuzz 1.1.3: commit 6f2e6de1fae0ab2269b472a750788817de6c2a6e at fault
Product: HarfBuzz Reporter: Nix <nix>
Component: srcAssignee: Behdad Esfahbod <freedesktop>
Status: RESOLVED NOTABUG QA Contact:
Severity: normal    
Priority: medium CC: dr.khaled.hosny, freedesktop, mst.fdo
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description Nix 2016-02-05 17:28:46 UTC
(Tested with a system with freetype 2.6.2, fontconfig from git HEAD with the not-yet-applied patch I submitted a few weeks ago to hugely speed up fontconfig clean-cache startup with NFS-mounted font dirs: libreoffice still started up when that patch was applied, so it cannot be the problem.)

All versions of libreoffice I have tested (4.x and 5.0.x both) dump core with this release, with a backtrace like

0x00007fffe43f2ee3 in hb_ft_get_font_h_extents(hb_font_t*, void*, hb_font_extents_t*, void*) () from /usr/lib/libharfbuzz.so.0
#0  0x00007fffe43f2ee3 in hb_ft_get_font_h_extents(hb_font_t*, void*, hb_font_extents_t*, void*) () from /usr/lib/libharfbuzz.so.0
#1  0x00007ffff26a0c93 in hb_font_funcs_set_glyph_v_origin_func () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#2  0x00007fffe43f350f in _hb_ft_font_set_funcs(hb_font_t*, FT_FaceRec_*, bool) () from /usr/lib/libharfbuzz.so.0
#3  0x00007fffe43f376f in hb_ft_font_create () from /usr/lib/libharfbuzz.so.0
#4  0x00007fffeaa493e3 in af_face_globals_new () from /usr/lib/libfreetype.so.6
#5  0x00007fffeaa49be5 in af_autofitter_load_glyph () from /usr/lib/libfreetype.so.6
#6  0x00007fffeaa0691e in FT_Load_Glyph () from /usr/lib/libfreetype.so.6
#7  0x00007ffff262e79a in ServerFont::InitGlyphData(unsigned int, GlyphData&) const () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#8  0x00007ffff2630525 in ServerFont::FetchFontMetric(ImplFontMetricData&, long&) const () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#9  0x00007ffff265a9ae in CairoTextRender::GetFontMetric(ImplFontMetricData*, int) () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#10 0x00007ffff24375d2 in OutputDevice::ImplNewFont() const () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#11 0x00007ffff243a300 in OutputDevice::ImplLayout(rtl::OUString const&, int, int, Point const&, long, long const*, SalLayoutFlags, vcl::TextLayoutCache const*) const () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#12 0x00007ffff24414b8 in OutputDevice::DrawText(Point const&, rtl::OUString const&, int, int, std::vector<Rectangle, std::allocator<Rectangle> >*, rtl::OUString*) () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#13 0x00007fffd2fcedbb in (anonymous namespace)::SplashScreenWindow::Paint(OutputDevice&, Rectangle const&) () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/../program/libspllo.so
#14 0x00007fffd2fcf1b4 in (anonymous namespace)::SplashScreenWindow::Redraw() () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/../program/libspllo.so
#15 0x00007fffd2fcf350 in (anonymous namespace)::SplashScreen::setValue(int) () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/../program/libspllo.so
#16 0x00007ffff7916d24 in desktop::Desktop::Main() () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libsofficeapp.so
#17 0x00007ffff259dd91 in ImplSVMain() () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#18 0x00007ffff259dde2 in SVMain() () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libvcllo.so
#19 0x00007ffff793cfb2 in soffice_main () from /pkg/libreoffice/5.0.3.2.0-4/lib/libreoffice/program/libsofficeapp.so
#20 0x000000000040071b in main ()

Commit 6f2e6de1fae0ab2269b472a750788817de6c2a6e jumps out of the commit log as probably responsible, and bisection confirms it. The problem is that LibreOffice manipulates the hb_font_funcs_t itself, using its own reimplementation of _hb_ft_font_set_funcs (which it calls vcl/generic/glyphs/gcach_layout.cxx:getFontFuncs()). This more or less means that LibreOffice will crash when *anything* is added to the font funcs :( I can't see an obvious way to fix this: you can't just make corresponding additions to LibreOffice, since that means you have to update LibreOffice and HarfBuzz in parallel.
Comment 1 Nix 2016-02-05 17:34:19 UTC
This change may also break Pango, XeTeX, WebKit (GTK and Qt), Qt 5, libass, chromium, mplayer, and the Enlightenment Foundation Libraries, all of which also call hb_font_funcs_create() themselves.
Comment 2 Behdad Esfahbod 2016-02-05 20:02:05 UTC
> The problem is that LibreOffice manipulates the hb_font_funcs_t itself

That sounds utterly wrong.  The declaration for hb_font_funcs_t is in a private header for a reason.

CC'ing Khaled who can definitely figure out what's going on.
Comment 3 Nix 2016-02-05 20:29:42 UTC
I wondered if it was wrong to do that, but when codesearch.debian.net came up with so many other packages that were doing the same thing, I came to the conclusion that it must be kosher. (Or, if not, a whole bunch of packages need fixing.)
Comment 4 Behdad Esfahbod 2016-02-05 20:38:38 UTC
(In reply to Nix from comment #3)
> I wondered if it was wrong to do that, but when codesearch.debian.net came
> up with so many other packages that were doing the same thing, I came to the
> conclusion that it must be kosher. (Or, if not, a whole bunch of packages
> need fixing.)

Ok, I'm not sure we are talking about the same thing.  Which function call are you talking about again?  If it's in a hb*.h file, it's kosher.  If it's not, I'll go hunt them down.
Comment 5 Khaled Hosny 2016-02-05 20:38:53 UTC
LibreOffice calls hb_font_funcs_create() and hb_font_funcs_set_*_func() functions, no direct manipulation of font functions or anything:
https://gerrit.libreoffice.org/gitweb?p=core.git;a=blob;f=vcl/unx/generic/glyphs/gcach_layout.cxx;hb=HEAD#l283


I have HarfBuzz 1.1.3 and LibreOffice 5.0.4 here and it starts normally, no crashes.
Comment 6 Behdad Esfahbod 2016-02-05 20:41:39 UTC
(In reply to Khaled Hosny from comment #5)
> LibreOffice calls hb_font_funcs_create() and hb_font_funcs_set_*_func()
> functions, no direct manipulation of font functions or anything:
> https://gerrit.libreoffice.org/gitweb?p=core.git;a=blob;f=vcl/unx/generic/
> glyphs/gcach_layout.cxx;hb=HEAD#l283
> 
> 
> I have HarfBuzz 1.1.3 and LibreOffice 5.0.4 here and it starts normally, no
> crashes.

Thanks Khaled.  That's what I have had expected.  So, Nix is seeing something unusual.  Needs debugging on his side.
Comment 7 Khaled Hosny 2016-02-05 20:46:52 UTC
The crash in the traceback above is traced to FreeType’s use of HarfBuzz not LibreOffice’s, af_face_globals_new() specifically.
Comment 8 Nix 2016-02-05 20:48:07 UTC
Ah, maybe we're not. The culprit is here:

<https://github.com/LibreOffice/core/blob/libreoffice-5-0-5/vcl/generic/glyphs/gcach_layout.cxx#L273>

LibreOffice hasn't actually *reimplemented* _hb_ft_font_set_funcs() -- it is not relying on ELF interposition or anything: that would clearly be wrong. Rather, it has its own function which does the same thing as that function does, calling hb_font_funcs_create() and then filling out the font funcs itself. The API tests do this, so it looks like it should work, to me. But with Simon's change, it crashes.
Comment 9 Nix 2016-02-05 20:48:53 UTC
Possibly it's the FreeType version? Khaled, are you using FreeType 2.6.2?
Comment 10 Nix 2016-02-05 20:52:15 UTC
btw, I am not the only person this has happened to. Random distro bug report of clearly the same crash: <https://crux.nu/bugs/index.php?do=details&task_id=1286>
Comment 11 Khaled Hosny 2016-02-05 20:53:22 UTC
(In reply to Nix from comment #9)
> Possibly it's the FreeType version? Khaled, are you using FreeType 2.6.2?

FreeType 2.6.2 here as well.
Comment 12 Behdad Esfahbod 2016-02-05 21:06:29 UTC
Humm.  Is it possible that two versions of HarfBuzz are being pulled in, one by LibreOffice, perhaps included, and one by FreeType, from the system?
Comment 13 Khaled Hosny 2016-02-05 21:07:41 UTC
Just speculating here, but there is something fishy in this backtrace, why the called hb_font_funcs_set_glyph_v_origin_func() is commming from a LibreOffice library? Looks like LibreOffice is statically linked with HarfBuzz and the dynamic loader is resolving some symbols from the LibreOffice version and a version mismatch is causing the crash here. If this is true, it strikes me as a bad idea to statically link HarfBuzz in this case given FreeType’s dependency on it.
Comment 14 Nix 2016-02-05 21:10:05 UTC
Ugh. If it was any other software, I would guess not.

But this is LibreOffice. And it does indeed bundle its own copy of HarfBuzz, and does not visibly allow you to stop it doing that. And it does indeed build it in.

I bet that's it. This also explains why hb_font_funcs_set_glyph_v_origin_func() is showing up as inside LO itself.

I'll try forcibly upgrading its bundled tarball, and do a LO rebuild, just to verify (this is clearly not an acceptable fix, though!). This may take some time...

What a horror show :(
Comment 15 Behdad Esfahbod 2016-02-05 21:15:52 UTC
Sorry for your pain... Have felt it with Chrome before.
Comment 16 Khaled Hosny 2016-02-05 21:35:06 UTC
Is this a distro package or TDF build? The later indeed statically links HarfBuzz for the obvious reason but not FreeType, so I think this should be reported against LibreOffice. If it is a distro package they can (and should) build with system HarfBuzz.
Comment 17 Nix 2016-02-06 00:18:56 UTC
OK, confirmed fixed by passing --with-system-harfbuzz to LibreOffice configure, which is not the default even if a system freetype is in use, and judging by that distro bug report I am not the first person to fail to notice this, not even the first distro developer. It doesn't help that this wrong default happened to work for years, until the first (blameless) hb_font_funcs_t change hit HarfBuzz!

Normally, I'd spot something as obvious as building in a system library and then linking to it dynamically too, and I've certainly never made the mistake of reporting the consequences as a bug before, but LO is so huge that overlooking it was horrifyingly easy. I have scripting which should have detected this case, along with other cases of unintended symbol interposition, and it didn't warn, also causing me to mistakenly consider this a real bug: in fact, an elfutils upgrade subtly changed the output of eu-readelf and broke said scripting...

Set to NOTABUG accordingly. Sorry for wasting your time.
Comment 18 Khaled Hosny 2016-02-06 09:52:05 UTC
Distros usually build with --with-system-libs which does not use any of the bundled libraries. I still think a bug should be reported against LibreOffice to fix the static linking case.
Comment 19 Michael Stahl 2016-02-18 13:28:34 UTC
the TDF binary releases of LO have to run on basically any GNU/Linux system
that's newer than some baseline; at the time when harfbuzz was added as a
dependency that was RHEL5 and since LO 5.1 it's RHEL6.

i was going to say that if harfbuzz ABI stability is good and the version in
RHEL6 is "new enough" to support all the features LO needs then TDF builds
could use the system harfbuzz like is done for freetype/fontconfig ...
but alas, RHEL6 doesn't have a harfbuzz package.

so what we often do in such cases is to link the dependency statically
and hide its symbols, so that the LO bundled one does not collide
with a system one in the awesome glorious ELF global namespace
(btw, other OS don't have this problem...).

(this is also the only way you can even ship ABI unstable garbage
like OpenSSL, liked statically several times into all its dependents...
but i digress)

it looks like for harfbuzz we got the static linking part but missed
the hiding symbols part, and for years it worked by accident. sigh.

... i've tried to fix this with this commit
https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=549130ab5d9616f7eb5504db31546b386737ccb2

now at least this is empty:
nm --defined-only --dynamic instdir/program/libvcllo.so | grep hb_

i'm not sure if there are conventions with autotools regarding
symbol visibility, perhaps --enable-static --disable-shared
should use hidden visibility by default?  i believe some other
libraries that we bundle work that way.
Comment 20 aceman 2016-05-24 19:23:36 UTC
Hi, in which LO version was this fixed?
Comment 21 Michael Stahl 2016-05-24 19:43:22 UTC
fixed in LO 5.2
backported fix in LO 5.1.2.1 (commit 37f7d9ee1d9399c5aa5e9975579f319b1fa045dc)
backported fix in LO 5.0.6.1 (commit 395995f03dd640aee28767f6d920901d91dd3bee)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.