GC FAQ -- language interface

Implicit vs explicit heap allocation

Some languages perform heap allocation implicitly, meaning that many operations may cause storage to be allocated. For example, there are cases in Scheme where activation records and numbers will be allocated on the heap as the result of simply calling a subroutine or adding two numbers.

Other languages make the request for memory allocation explicit. Examples here are Modula-3, Java, and use of the C and C++ compatible collectors.

Invocation and control of garbage collection

Garbage collectors are run in various ways:

Weak pointers

Weak pointers are like pointers, except that references from weak pointers do not prevent garbage collection, and weak pointers must have their validity checked before they are used.

Weak pointers interact with the garbage collector because the memory to which they refer may in fact still be valid, but containing a different object than it did when the weak pointer was created. Thus, whenever a garbage collector recycles memory, it must check to see if there are any weak pointers referring to it, and mark them as invalid (this need not be implemented in such a naive way).

Finalization and Destruction

Finalization, or destruction, is problematic. It is generally useful, but people disagree on how certain and timely it should be. The difficulty here is that in practice, no garbage collector provides an absolute guarantee that it will detect every single instance of unreachable storage in a bounded amount of time. Some garbage collector designs defer collection of regions of memory likely to contain mostly live memory, some can be "fooled" by bit patterns stored in integers, and in some cases artifacts of compilation (such as register use and calling conventions) will keep memory allocated long after it "should be" recycled. Reference counting is not immune to this problem; overflowed reference counts and cycles can both prevent collection, and some reference counting algorithms defer collection.

Yet another problem with finalization is the difficulty of defining a proper order for finalization. There are numerous problems, none with a clean solution.

For instance, it makes a certain amount of sense to finalize objects with zero direct references first, discard those, and continue finalizing the new set of objects with zero direct references. That is, finalize in topological order. This has two problems, one in theory, one in practice. In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common).

The practical problem is that finalizers can "revive" dead objects. Dead objects can certainly refer to live objects, and it is also entirely possible for the code in a finalizer to store a pointer to a currently-dead object into a live object, thus reviving the "dead" object, perhaps even the object on whose behalf the finalizer was being run. This means either that a write barrier must be enforced so that this can be detected, or that after each garbage collection, only those objects that have zero references can be finalized; the rest (those that ought to have zero references after the first batch is recycled) must wait for a subsequent garbage collection. Write barriers are not always an option, and deferring finalization more or less guarantees that more memory will be consumed and that resources will be slowly reclaimed.

The other approach is to declare that finalizers can be run in any order whatsoever over unreachable storage. This has the unfortunate side-effect of making it difficult to write finalizers, because the other objects to which an object Obj refers (and upon which its state may depend) may have already been finalized by the time Obj's finalizer is run. This is especially difficult when the compiler and collector are not cooperating, because what looks like separate memory objects to the collector may in fact be part of what is logically a single object at the source language level (and hence bugs may appear that the programmer has no way of preventing).

Yet another approach is to declare that finalizers shall not revive objects. With a stop-and-copy collector, this is not too hard to detect for debugging purposes; garbage is collected, finalizers are run, and the live objects are scanned for any references to just-finalized objects. Those found can be reported to the programmer. (This can fail in a conservative collector, if finalizers write pointer-like bit patterns into non-pointer data.)

In the case of Java, the approach taken was to declare that finalizers are never run more than once per object; if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.)

One partial solution to this problem is to encourage people to use "the right tools for the job". Languages with garbage collection often include control constructs for running finalizers at certain points in a program's execution. These can be used where timely, certain, finalization is required. Finalization associated with garbage collection can be used for those resources that are either abundant-but-not-infinite (like memory) or as a statistical backstop to reduce the loss of resources that are managed by hand (similar to the way in which garbage collection itself can be used as a backstop for manual deallocation).

GC FAQ table of contents
Techniques and algorithms
Language interfaces
Difficult topics
Silly stuff