|Summary:||UTF-8 Italian text recognized as ISO-8859-1 Portuguese|
|Status:||RESOLVED MOVED|
|Bug Depends on:||101218|
Description Jehan 2017-08-18 12:11:28 UTC
Created attachment 133604: UTF-8 text.

See: https://github.com/BYVoid/uchardet/issues/36#issuecomment-323316171

The attached text is UTF-8 Italian, but since commit e138839f0753e223f7aa2733e8ed829b47a67cac (Portuguese support for ISO-8859-1), it is recognized as ISO-8859-1.

I am not sure there is a proper solution other than removing Portuguese support in the short term and adding actual language detection for UTF-8 in the longer term (see bug 101218). Also, the file holds only 2 words, which obviously makes it a difficult guess for a statistics-based system.
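To illustrate why such a short input is ambiguous, here is a minimal Python sketch. It does not use uchardet and is not how uchardet scores candidates. The sample string `città più` is a hypothetical stand-in, not the content of the attachment, and `decodes_as` is a helper invented for this example. It only shows that any UTF-8 byte sequence is also a valid ISO-8859-1 byte sequence, so validity alone cannot rule ISO-8859-1 out and the decision falls back on statistics.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a structurally valid byte sequence in `encoding`."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical two-word Italian sample (NOT the attached file).
utf8_bytes = "città più".encode("utf-8")

# Both candidates accept the bytes: ISO-8859-1 maps every byte to a character.
utf8_ok = decodes_as(utf8_bytes, "utf-8")
latin1_ok = decodes_as(utf8_bytes, "iso-8859-1")

# What an ISO-8859-1 reading of the same bytes looks like (mojibake).
latin1_view = utf8_bytes.decode("iso-8859-1")

# The reverse is not symmetric: the same words encoded as ISO-8859-1
# are not valid UTF-8, so UTF-8 validity is a fairly strong signal.
latin1_bytes = "città più".encode("iso-8859-1")
reverse_ok = decodes_as(latin1_bytes, "utf-8")

print(utf8_ok, latin1_ok, reverse_ok)
print(repr(latin1_view))
```

With only two non-ASCII characters in the input, a statistical model has very few byte pairs to weigh against this structural evidence, which is consistent with the misdetection described above.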
Comment 1 GitLab Migration User 2018-10-12 21:35:15 UTC
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed to further activity. You can subscribe to and participate in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/uchardet/uchardet/issues/6.