Bug 36065

Summary: segfault during IO cleanup when glibc 2.0 compatibility is used
Product: Sisyphus Reporter: Ivan A. Melnikov <iv>
Component: glibc-core Assignee: placeholder <placeholder>
Status: CLOSED FIXED QA Contact: qa-sisyphus
Severity: normal    
Priority: P3 CC: aen, glebfm, jqt4, ldv, placeholder, sin
Version: unstable   
Hardware: all   
OS: Linux   
Bug Depends on:    
Bug Blocks: 26799    
Attachments:
Description Flags
/bin/cat from Debian 2.0 mentioned in the description none
The mipsel binary it all started with (for me) none

Description Ivan A. Melnikov 2019-02-08 15:59:27 MSK
I was building fpc for sisyphus_mipsel. During the build, one of the
intermediate tools (built with fpc) crashed with a segmentation fault:

$ ./gdbver
GDB version is 8.2.50.20180917-alt2.0.mips1 (ALT Sisyphus_mipsel)
Runtime error 216 at $77CC06B0
  $77CC06B0

This executable prints the version of GDB (it's linked with libgdb to get it) and is supposed to exit with a zero exit code.

Running the executable under gdb with glibc-core-debuginfo installed, I discovered that the crash happens in one of glibc's exit handlers. Here is the backtrace:

(gdb) run
Starting program: /home/iv/fpc/gdbver
GDB version is 8.2.50.20180917-alt2.0.mips1 (ALT Sisyphus_mipsel)

Program received signal SIGSEGV, Segmentation fault.
0x77e846b0 in __GI__IO_wsetb (f=0x77fa3dd4 <_IO_stdout_>, b=0x0, eb=0x0, a=0) at wgenops.c:96

(gdb) bt
#0  0x77e846b0 in __GI__IO_wsetb (f=0x77fa3dd4 <_IO_stdout_>, b=0x0, eb=0x0, a=0) at wgenops.c:96
#1  0x77e92e30 in _IO_unbuffer_all () at genops.c:883
#2  _IO_cleanup () at genops.c:930
#3  0x77e4beac in __run_exit_handlers (status=255, listp=0x77fa33dc <__exit_funcs>, run_list_atexit=<optimized out>, run_dtors=<optimized out>) at exit.c:130
#4  0x77e4bf2c in __GI_exit (status=<optimized out>) at exit.c:139
#5  0x77e321a8 in __libc_start_main (main=0x402ee8, argc=1, argv=0x7fbf74a4, init=<optimized out>, fini=0x40dd64 <__libc_csu_fini>, rtld_fini=0x77fd1900 <_dl_fini>, stack_end=0x7fbf7480) at libc-start.c:342
#6  0x00402ee8 in ?? ()

Further investigation revealed that fpc generates binaries that don't provide the _IO_stdin_used symbol, so on platforms where glibc strives to be binary compatible with glibc 2.0, special compatibility code is used for such binaries. Notably, mipsel and i586 are among such platforms, but x86_64 is not.

On such platforms, the _IO_stdin, _IO_stdout, _IO_stderr and _IO_list_all global variables are set to point to special variables that are defined using the old _IO_FILE definition compatible with glibc 2.0. This variant of the _IO_FILE structure is smaller (80 bytes versus 160 bytes for the structure used since glibc 2.1, on mipsel) and does not contain any of the fields required for wide I/O. Those variables are updated during glibc initialization in the _IO_check_libio function:

http://git.altlinux.org/people/iv/packages/glibc.git?p=glibc.git;a=blob;f=libio/oldstdfiles.c;h=92d0f4a0d3a58d9f193cd3ba1ec1eaa767120019;hb=b687516b1f437cfb2302989739fd27e95bd4fe63#l78
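
To illustrate the size mismatch described above, here is a minimal sketch in C. The two structures are hypothetical stand-ins, not the real glibc definitions: `old_file` plays the role of the glibc 2.0 style object, and `new_file` shares its leading fields but appends a `wide_data` pointer. The helper name is also made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the old (glibc 2.0 style) stream object:
   it has no wide-I/O fields at all. */
struct old_file {
    int flags;
    char *buf_base;
    char *buf_end;
};

/* Hypothetical stand-in for the newer stream object: same leading
   fields, plus a tail that only exists in the newer layout. */
struct new_file {
    int flags;
    char *buf_base;
    char *buf_end;
    void *wide_data;   /* only present in the newer layout */
    int mode;
};

/* Returns 1 when the wide_data field of the newer layout starts at or
   beyond the end of an old-layout object, i.e. when reading it through
   a pointer to an old object would touch memory that is not part of it. */
int wide_field_outside_old_layout(void)
{
    /* Sanity check of the sketch itself: the newer layout is the larger one. */
    assert(sizeof(struct new_file) > sizeof(struct old_file));
    return offsetof(struct new_file, wide_data) >= sizeof(struct old_file);
}
```

With these stand-in types the helper reports that `wide_data` lies outside the old object, which is the same kind of mismatch as the 80 vs 160 bytes case on mipsel.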

But _IO_unbuffer_all, called by _IO_cleanup, does not check whether it is dealing with old-style structures; so, if _IO_list_all contains pointers to old-style _IO_FILEs, it just reads some memory beyond the end of the structure (in the case of the gdbver binary, that happens to be some GOT entries), which leads to the crash here:

http://git.altlinux.org/people/iv/packages/glibc.git?p=glibc.git;a=blob;f=libio/genops.c;h=d6f80506690a6a921686b7bd6a606e4060a040a4;hb=b687516b1f437cfb2302989739fd27e95bd4fe63#l883
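
The misread can be modelled without undefined behaviour by working on a raw byte image instead of a real out-of-bounds access. This is only a sketch under assumptions: `old_file`, `new_file`, `read_wide_data_slot` and `simulate_misread` are hypothetical names, and the "neighbor" value stands in for whatever happens to follow the old object in memory (GOT entries in the gdbver case).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the old and new stream layouts. */
struct old_file { int flags; char *buf_base; char *buf_end; };
struct new_file { int flags; char *buf_base; char *buf_end; void *wide_data; };

/* Reads the pointer stored at the offset where the newer layout
   keeps wide_data, from a raw memory image. */
void *read_wide_data_slot(const unsigned char *image)
{
    void *p;
    memcpy(&p, image + offsetof(struct new_file, wide_data), sizeof p);
    return p;
}

/* Builds an image in which an old-layout object is immediately followed
   by unrelated data, then returns what code written for the newer layout
   would pick up as wide_data. */
void *simulate_misread(void *neighbor_value)
{
    unsigned char image[sizeof(struct old_file) + sizeof(void *)];
    struct old_file old = { 0, NULL, NULL };

    /* Assumption of this sketch: wide_data starts right where the old
       layout ends, as it does on common ABIs for these stand-in types. */
    assert(offsetof(struct new_file, wide_data) == sizeof(struct old_file));

    memcpy(image, &old, sizeof old);
    memcpy(image + sizeof old, &neighbor_value, sizeof neighbor_value);
    return read_wide_data_slot(image);
}
```

Whatever pointer-sized value sits after the old object comes back as `wide_data`, so later code ends up dereferencing unrelated data.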

I tried to reproduce the problem on i586. For that, I found a pretty old cat binary from Debian archives:

http://archive.debian.org/debian/dists/Debian-2.0/main/binary-i386/base/textutils_1.22-2.4.deb

In i586 hasher it runs normally, but under valgrind it crashes:

[builder@localhost ~]$ valgrind ./cat /dev/null
==26578== Memcheck, a memory error detector
==26578== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==26578== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==26578== Command: ./cat /dev/null
==26578==
==26578== Invalid read of size 4
==26578==    at 0x40B0AFA: _IO_wsetb (wgenops.c:96)
==26578==    by 0x40BB681: _IO_unbuffer_all (genops.c:883)
==26578==    by 0x40BB681: _IO_cleanup (genops.c:930)
==26578==    by 0x4076E01: __run_exit_handlers (exit.c:130)
==26578==    by 0x4076E40: exit (exit.c:139)
==26578==    by 0x804991F: ??? (in /usr/src/cat)
==26578==    by 0x8048EB9: ??? (in /usr/src/cat)
==26578==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==26578==
==26578==
==26578== Process terminating with default action of signal 11 (SIGSEGV)
==26578==  Access not within mapped region at address 0x18
==26578==    at 0x40B0AFA: _IO_wsetb (wgenops.c:96)
==26578==    by 0x40BB681: _IO_unbuffer_all (genops.c:883)
==26578==    by 0x40BB681: _IO_cleanup (genops.c:930)
==26578==    by 0x4076E01: __run_exit_handlers (exit.c:130)
==26578==    by 0x4076E40: exit (exit.c:139)
==26578==    by 0x804991F: ??? (in /usr/src/cat)
==26578==    by 0x8048EB9: ??? (in /usr/src/cat)
==26578==  If you believe this happened as a result of a stack
==26578==  overflow in your program's main thread (unlikely but
==26578==  possible), you can try to increase the size of the
==26578==  main thread stack using the --main-stacksize= flag.
==26578==  The main thread stack size used in this run was 8388608.
==26578== Invalid free() / delete / delete[] / realloc()
==26578==    at 0x403561D: free (vg_replace_malloc.c:530)
==26578==    by 0x4177B39: free_mem (in /lib/libc-2.27.so)
==26578==    by 0x4177EC9: __libc_freeres (in /lib/libc-2.27.so)
==26578==    by 0x402A1FA: _vgnU_freeres (vg_preloaded.c:77)
==26578==    by 0x41E2FFF: ??? (in /lib/libc-2.27.so)
==26578==  Address 0x804bd3d is 0 bytes inside data symbol "_nl_default_dirname"
==26578==
==26578== Invalid free() / delete / delete[] / realloc()
==26578==    at 0x403561D: free (vg_replace_malloc.c:530)
==26578==    by 0x4177EFC: __libc_freeres (in /lib/libc-2.27.so)
==26578==    by 0x402A1FA: _vgnU_freeres (vg_preloaded.c:77)
==26578==    by 0x41E2FFF: ??? (in /lib/libc-2.27.so)
==26578==  Address 0xffffffff is not stack'd, malloc'd or (recently) free'd
==26578==
==26578==
==26578== HEAP SUMMARY:
==26578==     in use at exit: 10 bytes in 1 blocks
==26578==   total heap usage: 4 allocs, 5 frees, 4,128 bytes allocated
==26578==
==26578== LEAK SUMMARY:
==26578==    definitely lost: 0 bytes in 0 blocks
==26578==    indirectly lost: 0 bytes in 0 blocks
==26578==      possibly lost: 0 bytes in 0 blocks
==26578==    still reachable: 10 bytes in 1 blocks
==26578==         suppressed: 0 bytes in 0 blocks
==26578== Rerun with --leak-check=full to see details of leaked memory
==26578==
==26578== For counts of detected and suppressed errors, rerun with: -v
==26578== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Apparently, under normal circumstances such binaries are just lucky, but valgrind reveals that the error is in fact present.

This may be related to https://bugzilla.altlinux.org/26799; I have not checked yet.

From a quick look, it seems that the _IO_unbuffer_all function does not do anything that would be useful in compatibility mode, so it should just be skipped when _IO_stdin_used is not present:

http://git.altlinux.org/people/iv/packages/glibc.git?p=glibc.git;a=commitdiff;h=9df7edc364b0a7dd993a70a9c7f434d1ccbe4711

When glibc is built with this change for sisyphus_mipsel, the gdbver binary runs normally and the fpc build completes successfully, but I'm not 100% sure that this is the correct solution.
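
The idea of the proposed change can be sketched as follows. This is not the actual patch and not glibc code: `struct stream`, `unbuffer_all` and the `maybe_incompatible` parameter are hypothetical, with the flag standing in for the "old-style objects may be in use" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream object for this sketch only. */
struct stream {
    bool has_wide_fields;  /* false for an old-layout object */
    int wide_resets;       /* how many times the wide path touched it */
    struct stream *next;
};

/* Walks the stream list and resets wide buffers, unless old-layout
   objects may be on the list, in which case the whole pass is skipped.
   Returns the number of streams that were touched. */
int unbuffer_all(struct stream *head, bool maybe_incompatible)
{
    if (maybe_incompatible)
        return 0;  /* compatibility mode: do not touch any stream */

    int touched = 0;
    for (struct stream *s = head; s != NULL; s = s->next) {
        /* The wide path is only valid for objects that have wide fields. */
        assert(s->has_wide_fields);
        s->wide_resets++;
        touched++;
    }
    return touched;
}
```

Skipping the whole pass is the coarse variant described above; a finer-grained check per stream would also be possible, but that is outside what I tested.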
Comment 1 Ivan A. Melnikov 2019-02-08 16:04:18 MSK
Created attachment 8005 [details]
/bin/cat from Debian 2.0 mentioned in the description
Comment 2 Ivan A. Melnikov 2019-02-08 16:06:24 MSK
Created attachment 8006 [details]
The mipsel binary it all started with (for me)
Comment 3 Dmitry V. Levin 2019-02-08 17:54:12 MSK
Could you forward this to upstream bugzilla, please?

wrt the fix, I think you can use

if (_IO_fwide_maybe_incompatible)
  {
    ...
  }

instead of

#if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)
if (&_IO_stdin_used == NULL)
  {
    ...
  }
#endif
Comment 4 Ivan A. Melnikov 2019-02-08 19:52:11 MSK
(In reply to comment #3)
> Could you forward this to upstream bugzilla, please?

Will do, early next week.


> wrt the fix, I think you can use
> 
> if (_IO_fwide_maybe_incompatible)
>   {
>     ...
>   }
> 
> instead of
> 
> #if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)
> if (&_IO_stdin_used == NULL)
>   {
>     ...
>   }
> #endif

Seems nice.

Also, it turns out I pushed the wrong branch (oops...), so the fix I linked in the description is broken. I've updated the branch, and the fix now uses _IO_fwide_maybe_incompatible:

http://git.altlinux.org/people/iv/packages/glibc.git?p=glibc.git;a=commitdiff;h=a290371042c61d33dfef7272c9462b3359731e06
Comment 5 Dmitry V. Levin 2019-02-09 01:42:51 MSK
See also https://sourceware.org/bugzilla/show_bug.cgi?id=17908
Comment 6 Dmitry V. Levin 2019-02-11 02:31:16 MSK
(In reply to comment #5)
> See also https://sourceware.org/bugzilla/show_bug.cgi?id=17908

The issue with executables generated by fpc looks exactly the same as in that upstream bug report.

Given that a random old x86 binary also misbehaves, I wonder whether the libio compatibility scheme still works properly in glibc on x86 nowadays.
Comment 7 Dmitry V. Levin 2019-02-13 02:38:06 MSK
I've just checked our glibc builds starting with glibc-2.5 from 4.0 against this ancient cat executable.
All demonstrate the same "Invalid free()" issue inside data symbol "_nl_default_dirname".
The "Invalid read of size 4" issue in _IO_wsetb first appears in glibc-2.23.
I think it was introduced by commit glibc-2.22.90-33-ga601b74d31 (aka glibc-2.23~693) which is indeed problematic as it adds unconditional use of fields not available in old struct _IO_FILE.
Comment 9 Repository Robot 2019-06-21 05:27:43 MSK
glibc-6:2.27-alt9 -> sisyphus:

Thu Jun 20 2019 Dmitry V. Levin <ldv@altlinux> 6:2.27-alt9
- Updated to glibc-2.27-119-gf056ac8363 from 2.27 branch
  (closes: #36065).