Thursday, May 27, 2010

Debugging Memory Errors in C/C++




This page describes a few key techniques I've learned about how
to debug programs that are suspected of containing memory errors.
Principally, this includes using memory after it has been freed,
and writing beyond the end of an array. Memory leaks are
considered briefly at the end.


It's of course rather presumptuous to even write these up, since so
much has already been written. I'm not intending to write the be-all
and end-all article, just to write up a few of the techniques I use
since I recently had the opportunity to help a friend debug such an
error. There are also some links at the end to other resources.


Note that I'm only interested here in memory errors that trash
part of the heap. Overwriting the stack may be a cracker's favorite
technique, but when it happens in front of the programmer it's
usually very easy to track down.

Why are memory errors hard to debug?




The first thing to understand about memory errors is why they're
different from other bugs. I claim the main reason they are harder
to debug is that they are fragile. By fragile, I mean the
bug will often only show up under certain conditions, and that
attempts to isolate the bug by changing the program or its input
often mask its effects. Since the programmer is then forced to
find the needle in the haystack, and cannot use techniques to cut
down on the size of the haystack, locating the cause of the problem
is very difficult.


Consequently, the first priority when tracking down suspected memory
errors is to make the bug more robust. There is a
bug in your code, but you need to do something so that the bug's
effects cannot be masked by other actions of the program.

Making the bug more robust




I know of two main techniques for reducing the fragility of a memory
bug:

  • Don't re-use memory.
  • Put empty space between memory blocks.



Why do these techniques help? First, by not re-using memory, we can
eliminate temporal dependencies between the bug and the
surrounding program. That is, if memory is not re-used, then it no
longer matters in what order the relevant blocks are allocated and
deallocated.


Second, by putting empty space between blocks, writing past the end
of one block (or before its start) won't corrupt another. Thus,
we break spatial dependencies involving the bug. The space
between the blocks should be filled with a known value, and the space
should be periodically checked (at least when free is called on
that block) to see if the known value has been changed.


With temporal and spatial dependencies reduced, it's less likely that
a change to the program or its input will disturb the evidence of the
bug's presence.


Of course, your machine must have enough spare memory to run the
experiment. But, by making the bug more robust, we can now cut down
on the input size! Thus in the end using more space in the short term
can lead to using less space in the final, minimized input test case.


The above two techniques are easily implemented in any debug heap
implementation. I've modified Doug Lea's malloc to implement
the features; my modified version is here: malloc.c, ckheap.h.
To compile with the debug features
described, set the preprocessor variables DEBUG and
DEBUG_HEAP. But of course you can use any implementation,
and the debug versions can simply be wrappers around the real malloc.
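
To make the wrapper idea concrete, here is a minimal sketch of the two techniques layered on top of the real malloc. It is not the code from my malloc.c; the names dbg_malloc, dbg_check and dbg_free, the 128-byte zone size, and the 0xAA fill value are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ZONE_SIZE 128   /* guard bytes on each side (assumed size) */
#define ZONE_FILL 0xAA  /* known value written into the guard zones */

/* Bookkeeping stored at the start of each underlying allocation.
 * The union pads it so the user pointer stays suitably aligned. */
typedef union {
    size_t user_size;
    max_align_t align;
} dbg_header;

/* Layout: [header][left zone][user bytes][right zone] */
void *dbg_malloc(size_t n)
{
    unsigned char *raw = malloc(sizeof(dbg_header) + 2 * ZONE_SIZE + n);
    if (raw == NULL) return NULL;
    ((dbg_header *)raw)->user_size = n;
    unsigned char *left = raw + sizeof(dbg_header);
    memset(left, ZONE_FILL, ZONE_SIZE);                  /* left zone  */
    memset(left + ZONE_SIZE + n, ZONE_FILL, ZONE_SIZE);  /* right zone */
    return left + ZONE_SIZE;
}

/* Returns how many guard bytes around p no longer hold ZONE_FILL. */
size_t dbg_check(const void *p)
{
    const unsigned char *user = p;
    const unsigned char *left = user - ZONE_SIZE;
    const dbg_header *h = (const dbg_header *)(left - sizeof(dbg_header));
    const unsigned char *right = user + h->user_size;
    size_t trashed = 0;
    for (size_t i = 0; i < ZONE_SIZE; i++) {
        if (left[i] != ZONE_FILL) trashed++;
        if (right[i] != ZONE_FILL) trashed++;
    }
    return trashed;
}

/* Checks the zones, then deliberately leaks the block: memory that is
 * never returned to the real heap can never be re-used. */
void dbg_free(void *p)
{
    if (p == NULL) return;
    size_t trashed = dbg_check(p);
    assert(trashed == 0 && "allocated zone trashed");
    (void)trashed;  /* silences the unused warning under NDEBUG */
}
```

With this sketch, writing p[10] on a block obtained from dbg_malloc(10) lands in the right zone, so dbg_check(p) reports one trashed byte and dbg_free(p) aborts on the assertion. A real debug heap would also keep a list of live blocks so that all zones can be checked periodically, not just at free time.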

Using hardware watchpoints




Intel-compatible x86 processors include debug registers capable of
watching up to four addresses. Whenever a read or write to any of
the watched addresses happens, the program traps, and the debugger
gets control. The debug registers offer a powerful way to find out
what line of code is overwriting a given byte, once you know which
byte is being overwritten.


In gdb, the notation for using hardware watchpoints is a little
odd, because gdb likes to think of its input as a C expression.
If you want to stop when address 0xABCDEF is written (gdb's rwatch
and awatch commands stop on reads and on any access, respectively),
then at the gdb prompt type

  (gdb) watch *((int*)0xABCDEF)



One difficulty is that you can't begin watching an address until
the memory it refers to has been mapped (requested from the operating
system for use by the program). The usual solution is to step through
the program at a rather coarse granularity (skipping over most function
calls) until you find a point in time where the address is mapped but
has not yet been trashed. Add the watchpoint, then let the program run
until the address is accessed.

An example




Suppose I have a program with a suspected memory error. I compile
it with the debug malloc.c, and when I run it I see:

  $ ./tmalloc
  trashed 1 bytes
  tmalloc: malloc.c:1591: checkZones: Assertion `!"right allocated zone trashed"' failed.
  Aborted



I first run the program in the debugger to find the offending address:

  (gdb) run
  Starting program: /home/scott/wrk/cplr/smbase/tmalloc
  trashed 1 bytes
  tmalloc: malloc.c:1591: checkZones: Assertion `!"right allocated zone trashed"' failed.

  Program received signal SIGABRT, Aborted.
  0x400539f1 in __kill () from /lib/libc.so.6
  (gdb) up
  #1  0x400536d4 in raise (sig=6) at ../sysdeps/posix/raise.c:27
  27      ../sysdeps/posix/raise.c: No such file or directory.
  (gdb) up
  #2  0x40054e31 in abort () at ../sysdeps/generic/abort.c:88
  88      ../sysdeps/generic/abort.c: No such file or directory.
  (gdb) up
  #3  0x4004dfd2 in __assert_fail () at assert.c:60
  60      assert.c: No such file or directory.
  (gdb) up
  #4  0x8048d55 in checkZones (p=0x8050838 "\016\001", bytes=270)
      at malloc.c:1591
  (gdb) print p[bytes-1-i]
  $1 = 7 '\a'                 <----- trashed! should be 0xAA
  (gdb) print p+bytes-1-i
  $2 = (unsigned char *) 0x80508c6 "\a", '\252' <repeats 127 times>
  (gdb)                  ^^^^^^^^^
                         this is the trashed address



Now I restart the program and attempt to set a hardware watchpoint:

  (gdb) break main
  Breakpoint 1 at 0x8048b91: file tmalloc.c, line 81.
  (gdb) run
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y

  Starting program: /home/scott/wrk/cplr/smbase/tmalloc

  Breakpoint 1, main () at tmalloc.c:81
  (gdb) watch *((int*)0x80508c6)
  Cannot access memory at address 0x80508c6
  (gdb)



Ok, the memory isn't mapped yet. Single-stepping through main a
few times, I find a place where I can insert the watchpoint but
the memory in question hasn't yet been trashed. When I then continue
the program, the debugger next stops at the bug.

  (gdb) watch *((int*)0x80508c6)
  Hardware watchpoint 3: *(int *) 134547654
  (gdb) c
  Continuing.
  Hardware watchpoint 3: *(int *) 134547654

  Old value = -1431655766
  New value = -1431655929
  offEnd () at tmalloc.c:33
  (gdb) print /x -1431655766
  $1 = 0xaaaaaaaa              <--- what it should be
  (gdb) print /x -1431655929
  $2 = 0xaaaaaa07              <--- what it became after trashing
  (gdb) list
  28
  29      void offEnd()
  30      {
  31        char *p = malloc(10);
  32        p[10] = 7;    // oops       <--- the bug
  33        free(p);
  34      }
  35
  36      void offEndCheck()
  37      {
  (gdb)



In this small program the bug would have been obvious upon inspection,
but the technique of course generalizes to cases that are much more
complicated.

Dangling references




As mentioned above, a debug heap shouldn't re-use memory. Going one
step further, my debug malloc.c overwrites free()'d memory with
another known pattern (but does not actually free it). Then, if the
program continues to use the memory the mistake will become clear,
especially if it tries to interpret the values it finds as pointers
(dereferencing them will most likely segfault). Double-deallocation
is also easy to identify with this scheme.
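
The scheme can be sketched in a few lines. Again, this is an illustration rather than the code in my malloc.c: the names q_malloc, q_free and q_free_checked, the header layout, and the 0xDD fill pattern are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FREED_FILL 0xDD  /* pattern written over freed memory (assumed) */

/* Per-block bookkeeping, padded so user data stays suitably aligned. */
typedef union {
    struct { size_t size; int freed; } info;
    max_align_t align;
} q_header;

void *q_malloc(size_t n)
{
    q_header *h = malloc(sizeof(q_header) + n);
    if (h == NULL) return NULL;
    h->info.size = n;
    h->info.freed = 0;
    return h + 1;  /* user data starts right after the header */
}

/* Returns 0 on a first free and -1 on a double free. The block is
 * overwritten with FREED_FILL but never handed back to the real heap,
 * so a dangling pointer keeps seeing the pattern instead of new data. */
int q_free(void *p)
{
    if (p == NULL) return 0;
    q_header *h = (q_header *)p - 1;
    if (h->info.freed) return -1;
    memset(p, FREED_FILL, h->info.size);
    h->info.freed = 1;
    return 0;
}

/* Variant that stops the program at the offending call. */
void q_free_checked(void *p)
{
    int rc = q_free(p);
    assert(rc == 0 && "double free");
    (void)rc;  /* silences the unused warning under NDEBUG */
}
```

Because the header survives the first q_free, the second call on the same pointer can see the freed flag and report the double deallocation. A pointer read from the freed block comes back as 0xDDDD..., which is very unlikely to be a mapped address.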

Memory leaks




I usually debug memory leaks by printing statistics about calls to
malloc and free before and after certain sections of code. If there
are more calls to malloc, but the code isn't supposed to be creating
long-lived data, then that points to a potential problem. This
doesn't easily generalize to long-running programs, but if the program
can be broken into units and the leak properties of each unit checked
in isolation, most leaks can be found relatively easily.
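
A minimal sketch of that kind of bookkeeping follows. The wrapper names counted_malloc and counted_free, and the helpers live_blocks and report_heap_stats, are made up for this example; in practice the counters would live inside whatever debug heap you are using.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static unsigned long n_malloc = 0;  /* successful allocations so far */
static unsigned long n_free = 0;    /* non-NULL deallocations so far */

void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) n_malloc++;
    return p;
}

void counted_free(void *p)
{
    if (p == NULL) return;
    /* More frees than mallocs means a double free, or a block that
     * did not come from counted_malloc. */
    assert(n_free < n_malloc);
    n_free++;
    free(p);
}

/* Number of blocks allocated but not yet freed. */
long live_blocks(void)
{
    return (long)n_malloc - (long)n_free;
}

/* Print the counters; call before and after the code under suspicion. */
void report_heap_stats(const char *label)
{
    fprintf(stderr, "%s: mallocs=%lu frees=%lu live=%ld\n",
            label, n_malloc, n_free, live_blocks());
}
```

To check one unit, record live_blocks() before it runs and compare afterwards: if the unit isn't supposed to create long-lived data, any positive difference is a candidate leak.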

Conclusion




The C and C++ languages are much-maligned for lack of memory safety,
but too often this is seen as a greater problem than it really is
(setting security issues aside for the moment). Debugging memory
errors requires a different approach than debugging other kinds of
errors, but with a little practice they can actually be easier and
faster to find, simply because the same techniques (and tools!) can
be used over and over.
