It's known that GNU coreutils (tr, for example) and related tools such as sed don't really work with Unicode at this time. There are limited attempts to remedy this at Red Hat: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=120933 and there's a corresponding bug in Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861 Something could probably be reimplemented, clean-room style, using the Heirloom Toolchest as a reference; it claims to support UTF-8, and at least its tr and sed do: http://heirloom.sourceforge.net/tools.html Regarding tr: http://mail.nl.linux.org/linux-utf8/2003-08/msg00224.html Some more (aging) patches are available here: http://www.openi18n.org/subgroups/utildev/dli18npatch2.html
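To illustrate why a byte-oriented tr breaks on UTF-8 text, here is a minimal Python model. The helpers tr_bytes and tr_chars are hypothetical and are not coreutils code. The model assumes tr maps single bytes and, like GNU tr without -t, pads SET2 by repeating its last element. Because "ё" is two bytes in UTF-8 (0xD1 0x91), translating it to "e" also rewrites the shared lead byte 0xD1 inside "я" (0xD1 0x8F), leaving invalid UTF-8 behind.

```python
# Hypothetical model of "tr SET1 SET2": byte-wise vs character-wise.
# Assumes GNU-style padding of SET2; this is not the coreutils implementation.


def _pad(set1, set2):
    """Extend SET2 by repeating its last element until it matches SET1's length."""
    if not set2:
        raise ValueError("SET2 must not be empty when translating")
    # A negative repeat count yields an empty sequence, so a longer SET2 is kept as-is.
    return set2 + set2[-1:] * (len(set1) - len(set2))


def tr_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Translate byte by byte, the way a locale-unaware tr does."""
    table = dict(zip(set1, _pad(set1, set2)))  # iterating bytes yields ints
    return bytes(table.get(b, b) for b in data)


def tr_chars(text: str, set1: str, set2: str) -> str:
    """Translate character by character, as a UTF-8-aware tr would."""
    table = {ord(a): ord(b) for a, b in zip(set1, _pad(set1, set2))}
    return text.translate(table)


if __name__ == "__main__":
    src = "ёя"
    broken = tr_bytes(src.encode("utf-8"), "ё".encode("utf-8"), b"e")
    print(broken)                   # b'eee\x8f' (not valid UTF-8)
    print(tr_chars(src, "ё", "e"))  # eя
```

The byte-wise version turns one two-byte character into two "e" bytes and corrupts an unrelated character that happens to share a lead byte; the character-wise version changes only "ё".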
*** Bug 10520 has been marked as a duplicate of this bug. ***
Current state of things: http://lists.gnu.org/archive/html/bug-coreutils/2008-04/msg00231.html By the way, multibyte support in grep is a problem: in UTF-8 locales grep runs several _orders of magnitude_ slower, so I have to disable this "support" in my scripts: http://git.altlinux.org/people/ldv/packages/?p=hasher.git;a=commit;h=1.2.5-alt1-15-gd1764b6
To LDV: redmine#6375, 6581. P.S. This can be deleted too.
Sorry, I didn't notice that things were already under way there.