Bug 10445 - UTF-8 support is next to nonexistent
Status: NEW
Alias: None
Product: Sisyphus
Classification: Development
Component: coreutils
Version: unstable
Hardware: all Linux
Importance: P3 normal
Assignee: placeholder@altlinux.org
QA Contact: qa-sisyphus
URL: http://lists.altlinux.org/pipermail/d...
Keywords:
Duplicates: 10520
Depends on:
Blocks: 10446
Reported: 2006-12-18 12:58 MSK by Michael Shigorin
Modified: 2019-08-01 10:54 MSK
CC List: 5 users

See Also:


Description Michael Shigorin 2006-12-18 12:58:02 MSK
GNU coreutils are known not to really work with Unicode at this time (the same
goes for basic utilities like tr and sed); there are limited attempts to remedy
this at Red Hat:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=120933

and there's a corresponding bug in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861

Something could probably be clean-room reimplemented from the Heirloom Toolchest,
which claims to support UTF-8; at least its tr and sed do:
http://heirloom.sourceforge.net/tools.html

Regarding tr:
http://mail.nl.linux.org/linux-utf8/2003-08/msg00224.html

Some more (aging) patches are available here:
http://www.openi18n.org/subgroups/utildev/dli18npatch2.html
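
To illustrate the kind of breakage meant here, below is a minimal Python sketch
of what a byte-oriented tr does to UTF-8 text. It is a model of byte-wise
translation, not the coreutils code; tr_bytes and tr_chars are hypothetical
helper names introduced only for this example.

```python
# Sketch only: models byte-wise vs. character-wise translation.
# tr_bytes / tr_chars are hypothetical helpers, not coreutils functions.

def tr_bytes(text: str, set1: str, set2: str) -> bytes:
    """Byte-oriented translation: each byte of set1 maps to the byte
    at the same index in set2 (both sets must encode to equal lengths)."""
    src = set1.encode("utf-8")
    dst = set2.encode("utf-8")
    table = bytes.maketrans(src, dst)
    return text.encode("utf-8").translate(table)


def tr_chars(text: str, set1: str, set2: str) -> str:
    """Character-oriented translation: each code point of set1 maps to
    the code point at the same index in set2."""
    return text.translate(str.maketrans(set1, set2))


word = "рис"

# Character-aware: only the requested letter changes.
print(tr_chars(word, "р", "Р"))  # Рис

# Byte-wise: "р" is D1 80 and "Р" is D0 A0, so every D1 byte becomes D0.
# The unrelated "с" (D1 81) turns into D0 81, which decodes as "Ё".
print(tr_bytes(word, "р", "Р").decode("utf-8", errors="replace"))  # РиЁ
```

The byte-wise result is still valid UTF-8 in this case, which makes the damage
easy to miss; with other byte combinations the output may not decode at all.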
Comment 1 Michael Shigorin 2006-12-26 11:25:33 MSK
*** Bug 10520 has been marked as a duplicate of this bug. ***
Comment 2 Dmitry V. Levin 2008-04-24 15:24:48 MSD
current state of things:
http://lists.gnu.org/archive/html/bug-coreutils/2008-04/msg00231.html

BTW, multibyte support in grep is awkward (grep runs several orders of
_magnitude_ slower in UTF-8 locales), so I have to disable this "support" in
scripts:
http://git.altlinux.org/people/ldv/packages/?p=hasher.git;a=commit;h=1.2.5-alt1-15-gd1764b6
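
One common way to disable multibyte handling for a single call is to run grep
under the C locale. The Python sketch below assumes that approach; whether the
commit linked above does exactly this is an assumption, and c_locale_env and
grep_bytes are hypothetical names used only for illustration.

```python
import os
import subprocess

# Sketch only: c_locale_env / grep_bytes are hypothetical helpers.

def c_locale_env(base=None):
    """Return a copy of the environment with LC_ALL=C, which overrides
    LANG and the other LC_* variables for the child process."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    return env


def grep_bytes(pattern, path):
    """Run grep in the C locale and return matching lines as raw bytes.
    Input is then treated as single-byte text, so the pattern should be
    ASCII or otherwise be meant to match byte sequences."""
    result = subprocess.run(
        ["grep", "-e", pattern, "--", path],
        env=c_locale_env(),
        stdout=subprocess.PIPE,
        check=False,  # grep exits with 1 when nothing matches
    )
    return result.stdout.splitlines()
```

The trade-off is that character classes and case-insensitive matching no longer
apply to non-ASCII characters, so this only suits scripts that search for plain
ASCII patterns.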
Comment 3 Sergey V Turchin 2019-08-01 10:53:18 MSK
To LDV: redmine#6375, 6581

P.S.
This one can be deleted too.
Comment 4 Sergey V Turchin 2019-08-01 10:54:50 MSK
Sorry. I didn't notice that things had already started moving there.