Object resurrection

Short description: Phenomenon in object-oriented programming

In object-oriented programming languages with garbage collection, object resurrection is when an object comes back to life during the process of object destruction, as a side effect of a finalizer being executed.

Object resurrection causes a number of problems. In particular, the mere possibility of resurrection, even if it never occurs, makes garbage collection significantly more complicated and slower, and is a major reason that finalizers are discouraged. Languages deal with object resurrection in various ways. In rare circumstances, object resurrection is used to implement certain design patterns, notably an object pool;[1] in other circumstances it is an unintended bug caused by an error in a finalizer. In general, resurrection is discouraged.[2]

Process

Object resurrection occurs via the following process. First, an object becomes garbage when it is no longer reachable from the program, and may be collected (destroyed and deallocated). Then, during object destruction, before the garbage collector deallocates the object, a finalizer method may be run. Because a finalizer may contain arbitrary code, it may make that object, or another garbage object reachable from the object with the finalizer, reachable again by creating references to it. If this happens, the referenced object (which is not necessarily the finalized object) is no longer garbage and cannot be deallocated, as otherwise the references to it would become dangling references and cause errors when used, generally a program crash or unpredictable behavior. Instead, in order to maintain memory safety, the object is returned to life, or resurrected.

In order to detect this, a garbage collector will generally do two-phase collection in the presence of finalizers: first finalize any garbage that has a finalizer, and then re-check all garbage (or all garbage reachable from the objects with finalizers), in case the finalizers have resurrected some garbage. This adds overhead and delays memory reclamation.
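The two-phase scheme can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not how any real collector is implemented: the names `Obj`, `reachable` and `collect` are hypothetical, the "heap" is a plain list, and objects that are still unreachable after the second check are treated as freed immediately.

```python
class Obj:
    """Toy heap object; `finalizer` is an optional callable taking the object."""
    def __init__(self, name, finalizer=None):
        self.name = name
        self.refs = []            # outgoing references to other Obj instances
        self.finalizer = finalizer
        self.finalized = False    # "already finalized" flag

def reachable(roots):
    """Return the ids of every object reachable from the roots."""
    seen, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) not in seen:
            seen.add(id(obj))
            stack.extend(obj.refs)
    return seen

def collect(heap, roots):
    """Run one two-phase collection; return (surviving heap, freed names)."""
    # Phase 1: find garbage and run each pending finalizer at most once.
    live = reachable(roots)
    for obj in heap:
        if id(obj) not in live and obj.finalizer and not obj.finalized:
            obj.finalized = True
            obj.finalizer(obj)
    # Phase 2: re-check, because a finalizer may have resurrected garbage.
    live = reachable(roots)
    survivors = [obj for obj in heap if id(obj) in live]
    freed = [obj.name for obj in heap if id(obj) not in live]
    return survivors, freed

root = Obj("root")
plain = Obj("plain")                                    # garbage, no finalizer
quiet = Obj("quiet", finalizer=lambda o: None)          # finalizer does nothing
zombie = Obj("zombie", finalizer=lambda o: root.refs.append(o))  # resurrects

heap, freed_first = collect([root, plain, quiet, zombie], [root])
root.refs.clear()                                       # zombie is garbage again
heap, freed_second = collect(heap, [root])
```

In this sketch the first collection frees `plain` and `quiet` but keeps `zombie`, because the re-check finds it reachable from `root`; the second collection frees `zombie` without running its finalizer again.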

Resurrected objects

A resurrected object may be treated the same as other objects, or may be treated specially. In many languages, notably C#, Java, and Python (from Python 3.4), objects are only finalized once, to avoid the possibility of an object being repeatedly resurrected or even being indestructible; in C# objects with finalizers by default are only finalized once, but can be re-registered for finalization. In other cases resurrected objects are considered errors, notably in Objective-C; or treated identically to other objects, notably in Python prior to Python 3.4.

A resurrected object is sometimes called a zombie object or zombie, but this term is used for various object states related to object destruction, with usage depending on language and author. A "zombie object" has a specialized meaning in Objective-C, however, which is detailed below. Zombie objects are somewhat analogous to zombie processes, in that they have undergone a termination state change and are close to deallocation, but the details are significantly different.

Variants

In the .NET Framework, notably C# and VB.NET, "object resurrection" instead refers to the state of an object during finalization: the object is brought back to life (from being inaccessible), the finalizer is run, and the object is then returned to being inaccessible (and is no longer registered for future finalization). In .NET, which objects need finalization is not tracked object-by-object, but instead is stored in a finalization "queue",[lower-alpha 1] so rather than a notion of resurrected objects in the sense of this article, one speaks of objects "queued for finalization". Further, objects can be re-enqueued for finalization via GC.ReRegisterForFinalize, taking care not to enqueue the same object multiple times.[2]

Mechanism

There are two main ways that an object can resurrect itself or another object: by creating a reference to itself in an object that it can reach (garbage is not reachable, but garbage can reference non-garbage objects), or by creating a reference in the environment (global variables, or in some cases static variables or variables in a closure). Python examples of both follow, for an object resurrecting itself. It is also possible for an object to resurrect other objects if both are being collected in a given garbage collection cycle, by the same mechanisms.

Resurrects itself by creating a reference in an object it can reach:

class Clingy:
    def __init__(self, ref=None) -> None:
        self.ref = ref

    def __del__(self):
        # Finalizer: store a reference to self in a reachable neighbour
        if self.ref:
            self.ref.ref = self
        print("Don't leave me!")

a = Clingy(Clingy())  # Create a 2-element linked list,
                      # referenced by |a|
a.ref.ref = a  # Create a cycle
a.ref = None  # Clearing the reference from the first node
              # to the second makes the second garbage;
              # its finalizer re-links it, resurrecting it
a.ref = None  # The second node becomes garbage a second time

Resurrects itself by creating a reference in the global environment:

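c = None
class Immortal:
    def __del__(self):
        # Finalizer: store a reference to self in a global variable
        global c
        c = self
        print("I'm not dead yet.")

c = Immortal()
c = None  # Clearing |c| makes the object garbage;
          # its finalizer rebinds |c|, resurrecting it
c = None  # The object becomes garbage a second time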

In CPython prior to 3.4, the examples above run the finalizers repeatedly, and the objects are never garbage-collected. In CPython 3.4 and later, each finalizer is called only once, and the object is garbage-collected the second time it becomes unreachable.
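The once-only behavior can be checked by counting finalizer calls. The following is a minimal sketch that assumes CPython 3.4 or later (it relies on reference counting destroying the object immediately); the names `Phoenix`, `calls` and `keeper` are made up for illustration.

```python
calls = []    # records every finalizer invocation
keeper = []   # reachable list used to resurrect the object

class Phoenix:
    def __del__(self):
        calls.append("finalized")
        keeper.append(self)   # resurrect: self is reachable again

obj = Phoenix()
del obj                       # first death: finalizer runs and resurrects
resurrected = len(keeper) == 1

keeper.clear()                # second death: finalizer is not run again
finalizer_calls = len(calls)
```

Under these assumptions `finalizer_calls` ends up as 1: the object is resurrected once, then destroyed silently when the last reference is cleared.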

Problems

Object resurrection causes several problems.

Complicates garbage collection
The possibility of object resurrection means that the garbage collector must check for resurrected objects after finalization, even if resurrection never actually occurs, which complicates and slows down garbage collection.
Indestructible objects
In some circumstances an object may be indestructible: if an object is resurrected in its own finalizer (or a group of objects resurrect each other as a result of their finalizers), and the finalizer is always called when destroying the object, then the object cannot be destroyed and its memory cannot be reclaimed.
Accidental resurrection and leaks
Object resurrection may be unintentional, and the resulting object may be semantic garbage (reachable but never used again), hence never actually collected, causing a logical memory leak.
Inconsistent state and reinitialization
A resurrected object may be in an inconsistent state, or violate class invariants, due to the finalizer having been executed and causing an irregular state. Thus resurrected objects generally need to be manually reinitialized.[1]
Unique finalization or re-finalization
In some languages (such as Java and Python 3.4+) finalization is guaranteed to happen at most once per object, so resurrected objects will not have their finalizers called again; therefore, resurrected objects must execute any necessary cleanup code outside of the finalizer. In some other languages, the programmer can force finalization to be done repeatedly; notably, C# has GC.ReRegisterForFinalize.[1]
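The inconsistent-state problem can be sketched in Python as follows. The `Connection` class and the `graveyard` list are hypothetical; the example assumes CPython, where the finalizer runs as soon as the last reference is dropped.

```python
graveyard = []   # reachable list that accidentally keeps finalized objects

class Connection:
    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def send(self, data):
        if not self.is_open:
            raise RuntimeError("connection already closed")
        return len(data)

    def __del__(self):
        self.close()             # normal cleanup performed by the finalizer
        graveyard.append(self)   # accidental resurrection (e.g. a registry)

conn = Connection()
del conn                         # finalizer closes, then resurrects, the object
zombie = graveyard.pop()

try:
    zombie.send(b"ping")
    outcome = "sent"
except RuntimeError as exc:
    outcome = str(exc)           # the resurrected object is no longer usable
```

The resurrected object is reachable again, but its finalizer has already released the resource it depends on, so it would have to be reinitialized before further use.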

Solutions

Languages have adopted several different methods for coping with object resurrection, most commonly by having two-phase garbage collection in the presence of finalizers, to prevent dangling references; and by only finalizing objects once, particularly by marking objects as having been finalized (via a flag), to ensure that objects can be destroyed.

Java will not free the object until it has proven that the object is once again unreachable, but will not run the finalizer more than once.[3]

In Python, prior to Python 3.4, the standard CPython implementation would treat resurrected objects identically to other objects (which had never been finalized), making indestructible objects possible.[4] Further, it would not garbage collect cycles that contained an object with a finalizer, to avoid possible problems with object resurrection. Starting in Python 3.4, behavior is largely the same as Java:[lower-alpha 2] objects are only finalized once (being marked as "already finalized"), and garbage collection of cycles is in two phases, with the second phase checking for resurrected objects.[5][6]
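The Python 3.4+ handling of cycles can be observed with the standard `gc` module. This is a sketch assuming CPython 3.4 or later; `Node`, `survivors` and `log` are illustrative names, and the weak references are deliberately created only after the resurrection so that they reflect the second collection.

```python
import gc
import weakref

survivors = []   # reachable list used by the finalizer to resurrect nodes
log = []         # records finalizer invocations

class Node:
    def __init__(self):
        self.other = None

    def __del__(self):
        log.append("finalized")
        survivors.append(self)   # resurrect this node

gc.disable()                     # collect only when asked, for determinism
a, b = Node(), Node()
a.other, b.other = b, a          # reference cycle: refcounts never reach zero
del a, b

gc.collect()                     # finalizers run, then resurrection is detected
after_first = (len(log), len(survivors), len(gc.garbage))

probes = [weakref.ref(node) for node in survivors]
survivors.clear()                # the cycle is unreachable again
gc.collect()                     # already finalized: freed without __del__
after_second = (len(log), [probe() is None for probe in probes])
gc.enable()
```

Both finalizers run exactly once during the first collection, the cycle survives because it was resurrected, and the second collection reclaims it without calling the finalizers again.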

Objective-C 2.0 will put resurrected objects into a "zombie" state, where they log all messages sent to them, but do nothing else.[7] See also Automatic Reference Counting: Zeroing Weak References for handling of weak references.

In the .NET Framework, notably C# and VB.NET, object finalization is determined by a finalization "queue",[lower-alpha 1] which is checked during object destruction. Objects with a finalizer are placed in this queue on creation, and dequeued when the finalizer is called, but can be manually dequeued (prior to finalization) with SuppressFinalize or re-enqueued with ReRegisterForFinalize. Thus by default objects with finalizers are finalized at most once, but this finalization can be suppressed, or objects can be finalized multiple times if they are resurrected (made accessible again) and then re-enqueued for finalization. Further, weak references by default do not track resurrection, meaning a weak reference is not updated if an object is resurrected; these are called short weak references, and weak references that track resurrection are called long weak references.[8]

Applications

Object resurrection is useful for managing an object pool of commonly used objects, but it obscures code and makes it more confusing.[3] It should be used only for objects that are used frequently and whose construction or destruction is time-consuming. An example is a class wrapping an array of random numbers, where many instances are created and destroyed in a short time but only a few are in use at any one time. With object resurrection, a pooling technique avoids the unnecessary overhead of creation and destruction. Here, when an object is about to be destroyed, its finalizer stores a reference to the object in the pool manager's stack of unused objects, and the pool manager keeps the object for later reuse.[9]
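A resurrection-based pool might look roughly like the following in Python. This is a sketch only, assuming CPython 3.4 or later: the class name `RandomArray` is borrowed from the cited example, while `pool` and `new_random_array` are hypothetical. Because CPython runs `__del__` at most once per object, each instance returns to the pool only once here, which illustrates why this pattern depends on re-registration for finalization in .NET.

```python
import random

class RandomArray:
    """Expensive-to-build object that parks itself in a pool when finalized."""
    pool = []   # unused instances, kept alive only by this list

    def __init__(self, size=1000):
        self.values = [random.random() for _ in range(size)]

    def __del__(self):
        # Resurrection: storing self in a reachable list makes it live again.
        RandomArray.pool.append(self)

def new_random_array():
    """Return a pooled instance if one exists, otherwise build a new one."""
    if RandomArray.pool:
        return RandomArray.pool.pop()
    return RandomArray()

first = new_random_array()
first_id = id(first)
del first                          # finalizer parks the object in the pool
pooled_after_first_drop = len(RandomArray.pool)

second = new_random_array()        # reuses the resurrected instance
reused = id(second) == first_id
del second                         # finalizer is not run a second time
pooled_after_second_drop = len(RandomArray.pool)
```

In ordinary Python code an explicit `release()` method or a context manager is usually a clearer way to return objects to a pool than relying on finalizers.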

Notes

  1. This is not strictly a queue, as elements can be removed from the middle by GC.SuppressFinalize.
  2. CPython uses reference counts for non-cyclic garbage, with a separate cycle detector, while most implementations of Java use a tracing garbage collector.

References

  1. Goldshtein, Zurbalev & Flatow 2012, p. 129.
  2. Richter 2000.
  3. "What is resurrection (in garbage collection)?". XYZWS. http://www.xyzws.com/Javafaq/what-is-resurrection-in-garbage-collection/47. "An object that has been eligible for garbage collection may stop being eligible and return to normal life. Within a finalize() method, you can assign this to a reference variable and prevent that object's collection, an act many developers call resurrection. The finalize() method is never called more than once by the JVM for any given object. The JVM will not invoke finalize() method again after resurrection (as the finalize() method already ran for that object)."
  4. Tim Peters's answer to "How many times can `__del__` be called per object in Python?"
  5. What's New In Python 3.4, PEP 442: Safe Object Finalization
  6. Pitrou, Antoine (2013). "PEP 442 -- Safe object finalization". http://legacy.python.org/dev/peps/pep-0442/. 
  7. Implementing a finalize Method
  8. Goldshtein, Zurbalev & Flatow 2012, p. 131.
  9. "Object resurrection". Hesab.net. http://www.hesab.net/book/asp.net/Additional%20Documents/Object%20Resurrection.pdf. "Object resurrection is an advanced technique that's likely to be useful only in unusual scenarios, such as when you're implementing a pool of objects whose creation and destruction is time-consuming. ... The ObjectPool demo application shows that an object pool manager can improve performance when many objects are frequently created and destroyed. Assume that you have a RandomArray class, which encapsulates an array of random numbers. The main program creates and destroys thousands of RandomArray objects, even though only a few objects are alive at a given moment. Because the class creates the random array in its constructor method (a time-consuming operation), this situation is ideal for a pooling technique. ... The crucial point in the pooling technique is that the PoolManager class contains a reference to unused objects in the pool (in the PooledObjects Stack object), but not to objects being used by the main program. In fact, the latter objects are kept alive only by references in the main program. When the main program sets a RandomArray object to Nothing (or lets it go out of scope) and a garbage collection occurs, the garbage collector invokes the object's Finalize method. The code inside the RandomArray's Finalize method has therefore an occasion to resurrect itself by storing a reference to itself in the PoolManager's PooledObjects structure. So when the NewRandomArray function is called again, the PoolManager object can return a pooled object to the client without going through the time-consuming process of creating a new one."