Don't Use Pickle For Object Serialization

Information collected during dynamic analysis shouldn't include verbatim application objects. Instead, only the relevant aspects of live objects should be recorded and stored in Pythoscope descriptors. This avoids future pickling problems and may lower the memory footprint of dynamic analysis.
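To make the idea of a descriptor concrete, here is a minimal sketch. The names `ObjectDescriptor` and `describe` are hypothetical and are not part of Pythoscope's actual API; the sketch also assumes the object keeps its state in an instance `__dict__`.

```python
from collections import namedtuple

# Hypothetical descriptor type: plain strings only, no reference to the live object.
ObjectDescriptor = namedtuple("ObjectDescriptor", "module class_name attributes")


def describe(obj):
    """Record only the aspects of obj that test generation might need later."""
    cls = type(obj)
    # repr() each attribute so the descriptor holds no live application objects.
    attributes = dict((name, repr(value)) for name, value in vars(obj).items())
    return ObjectDescriptor(cls.__module__, cls.__name__, attributes)


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


descriptor = describe(Point(1, 2))
```

Because the descriptor contains only strings, it stays valid no matter what happens to `Point` or its instances afterwards.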

Moreover, values captured by the dynamic analyzer should be serialized at the time of capture. At that moment the whole dynamic environment related to the object is still intact and available for inspection, which makes correct serialization possible. Serialization done any later would see an environment potentially clobbered by code executed after the capture.
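The difference between capture-time and late serialization can be illustrated with a small sketch. The `capture` function and the two lists are made up for this example, and `repr()` merely stands in for whatever serialization the analyzer would really perform.

```python
captured_refs = []       # late approach: keep references to live objects
captured_snapshots = []  # capture-time approach: serialize immediately


def capture(value):
    """Record a value both ways so the results can be compared."""
    captured_refs.append(value)
    captured_snapshots.append(repr(value))


items = [1, 2]
capture(items)
items.append(3)  # code executed after the capture mutates the object

# Serializing the stored references only now sees the clobbered state.
late = [repr(value) for value in captured_refs]
```

Here `captured_snapshots` still describes the list as it was when captured, while `late` describes the list as it is after the mutation.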

We can't rely on pickle here, since we're building structures out of live objects, and the ability to serialize them is as important as capturing their dynamic (thus changing) relationships. Pickle works well for capturing a static snapshot of a Python object space, but it is not well suited to mirroring dynamic aspects of execution, such as describing the whole life of an object, from its creation and initialization up to its deletion and garbage collection. Pythoscope already has some facilities for capturing the state of an object (see the ObjectWrapper and LiveObject classes), but that should be pushed further. The dependency on pickle is the first thing to go and a driver for the refactoring.

We're not interested in general-purpose object serialization. We have a clear usage scenario: saving values for later unit test generation. That should narrow the scope enough to significantly lower the complexity of writing our own serialization mechanism.

The limitations of the pickle module for serializing dynamic objects were first uncovered by the following code:

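import pickle
import some_module  # any importable module that defines SomeClass

o = some_module.SomeClass()
reload(some_module)  # Python 2 builtin; rebinds some_module.SomeClass to a new class object

pickle.dumps(o)  # o still refers to the old class object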

This code raises:

pickle.PicklingError: Can't pickle <class 'some_module.SomeClass'>: it's not the same object as some_module.SomeClass

Pickle is not happy that the state of the world (class identity in this case) changed between the time the object was created and the time it was pickled. We *could* change when we pickle things, and in some sense that's what we're planning to do: get rid of late serialization in favor of just-in-time serialization. The thing is that at this point there's little sense in keeping pickle around. IMHO it's better to figure out everything we need to know about an object and its environment ourselves than to rely on the pickle module. That's going to pay off once we start working on taming side effects. Pickle captures little information about the dynamic environment, which makes it especially unsuitable for the type of serialization we need.
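The class identity change can be observed directly with a self-contained script. This is a sketch that assumes Python 3, where `reload` lives in `importlib`; it writes a throwaway `some_module.py` to a temporary directory, and the names `some_module` and `SomeClass` are placeholders.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Write a throwaway module to disk so that it can be imported and reloaded.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "some_module.py"), "w") as source:
    source.write("class SomeClass(object):\n    pass\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import some_module

o = some_module.SomeClass()
old_class = some_module.SomeClass
importlib.reload(some_module)  # re-executes the module, creating a new class object

# The module attribute now points at a different class than type(o).
identity_changed = some_module.SomeClass is not old_class

try:
    pickle.dumps(o)
    error = None
except pickle.PicklingError as exc:
    error = exc
```

The instance `o` is untouched by the reload; what changed is the object that the name `some_module.SomeClass` resolves to, and pickle refuses to store a class reference that would not resolve back to the same class.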

This is not to say we'll abandon pickle completely. We may still use it to store inspection information on disk. But dynamic objects will be described using Pythoscope classes. We'll regain full control over the serialization mechanism and use pickle for storage only, a task for which it may also be replaced in the future (see the "Improve information storage performance" blueprint).
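As a rough illustration of using pickle for storage only, the sketch below pickles descriptions made of plain built-in types. The dictionary layout is hypothetical and is not Pythoscope's actual storage format; an in-memory buffer stands in for a file on disk.

```python
import io
import pickle

# Hypothetical descriptions: plain data only, no references to application classes.
descriptions = [
    {"module": "some_module", "class_name": "SomeClass", "attributes": {"x": "1"}},
]

storage = io.BytesIO()  # stands in for a file opened in binary mode
pickle.dump(descriptions, storage)

storage.seek(0)
restored = pickle.load(storage)
```

Since pickle only ever sees lists, dictionaries, and strings here, the identity of application classes never enters the picture, and swapping pickle for another storage backend later would not affect how objects are described.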
