Serializing Java and the corner cases
Every time you think of Java, you may think of its types as classes and primitive types. Not many, right? Well, there’s a major difference between knowing a thing and handling it. The latter brings a lot of corner cases.
This is Serializing Java - a series about writing your own serializer in case you didn’t like other serializers.
Not only types
In an earlier article I listed the types and lots of basic cases of type inspection:
→ Serializing Java: inspecting data structure
so if you haven’t read that, I suggest you go there first. This time we’ll look deeper into all aspects of serialization instead of just the inspection phase.
Non-serializable things
Yes! You can’t serialize everything! There are some non-serializable things out there. I can list two kinds:
- fields/objects of certain classes like Thread, Socket, Stream and probably many others that depend on external resources (external to the JVM)
- transient fields
There are two ways to handle both of them:
- ignore them totally by not mentioning them at all
- ignore them, but possibly record whether the value is null or not
However, I’d prefer a mix of those approaches:
- totally ignore transient fields
- for an Object referencing an external resource, record whether it’s null or not null and nothing else
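The mixed approach above could be sketched roughly like this. `FieldFilter`, `Sample`, the `"null"`/`"not-null"` markers and the list of external types are all made up for illustration; this is a minimal sketch under those assumptions, not the actual serializer code.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any real serializer API.
final class FieldFilter {
    // Assumed list of types backed by resources outside the JVM; extend as needed.
    private static final Class<?>[] EXTERNAL = {
        Thread.class, Socket.class, InputStream.class, OutputStream.class
    };

    static boolean isExternal(Class<?> type) {
        for (Class<?> c : EXTERNAL) {
            if (c.isAssignableFrom(type)) return true;
        }
        return false;
    }

    /** Maps field name to the value that would be written for it. */
    static Map<String, Object> describe(Object obj) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            // Transient (and static) fields are skipped without leaving a trace.
            if (Modifier.isTransient(mods) || Modifier.isStatic(mods)) continue;
            Object value;
            try {
                f.setAccessible(true);
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            // Check the declared type and, for fields typed as Object, the runtime type.
            boolean external = isExternal(f.getType())
                    || (value != null && isExternal(value.getClass()));
            if (external) {
                // External resources are reduced to a null / not-null marker.
                out.put(f.getName(), value == null ? "null" : "not-null");
            } else {
                out.put(f.getName(), value);
            }
        }
        return out;
    }
}

// Hypothetical class to inspect.
class Sample {
    int id = 7;
    transient int cache = 99;
    Thread worker = new Thread(() -> {});
    Socket socket = null;
}
```

Calling `FieldFilter.describe(new Sample())` keeps `id` as is, drops `cache` entirely, and reduces `worker` and `socket` to their markers.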
Array of arrays, array of arrays of arrays, …
As usual, there are two cases to cover:
- an explicit definition of a multi-dimensional array
- an implicit instance of a multi-dimensional array
The two classes below contain the same data:
public class DeepArray {
    public int[][][][] arr = new int[][][][] {
        new int[][][] {
            new int[][] {
                new int[] {
                    123
                }
            }
        },
        new int[][][] {
            new int[][] {
                new int[] {
                    124
                }
            }
        }
    };
}
public class DeepArrayImplicit {
    public Object arr = new int[][][][] {
        new int[][][] {
            new int[][] {
                new int[] {
                    123
                }
            }
        },
        new int[][][] {
            new int[][] {
                new int[] {
                    124
                }
            }
        }
    };
}
but have a different structure prior to serialization.
An explicit int[][][][] in the field type produces a much larger structure than a plain Object. Why? A serialized type is information about the type and its subtype. The type of int[][][][] is Array and its subtype is… Array! Its children are of type int[][][], whose subtype is int[][], whose subtype in turn is int[]. Only that last level has the type Array with a primitive Integer subtype. In the implicit case, Object is just an Object, so during the inspection phase we can’t determine anything more than that.
Of course, this is just my way of defining the data structure in my serializer. You could special-case multi-dimensional arrays, e.g. by writing a flat structure where type=Array, subType=primitiveInteger and dimensions=4. One thing is certain: the explicit case can be optimized during serialization, both in speed and in output size.
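To make the flat alternative concrete, here is a small sketch that derives the element type and the number of dimensions from a `Class` using `isArray()` and `getComponentType()`. `ArrayShape` is a hypothetical name; it is not how my serializer stores this information.

```java
// Hypothetical descriptor for the flat layout: element type plus number of dimensions.
final class ArrayShape {
    final Class<?> elementType;
    final int dimensions;

    private ArrayShape(Class<?> elementType, int dimensions) {
        this.elementType = elementType;
        this.dimensions = dimensions;
    }

    /** Peels array levels off a type until a non-array component remains. */
    static ArrayShape of(Class<?> type) {
        int dims = 0;
        Class<?> current = type;
        while (current.isArray()) {
            current = current.getComponentType();
            dims++;
        }
        return new ArrayShape(current, dims);
    }

    @Override
    public String toString() {
        return "elementType=" + elementType.getSimpleName() + ", dimensions=" + dimensions;
    }

    public static void main(String[] args) {
        // Explicit case: the declared field type already tells us everything.
        System.out.println(of(int[][][][].class));  // elementType=int, dimensions=4
        // Implicit case: the declared type is Object, so inspection learns nothing...
        System.out.println(of(Object.class));       // elementType=Object, dimensions=0
        // ...until we look at the runtime class of the actual value.
        Object value = new int[][][][] {{{{123}}}, {{{124}}}};
        System.out.println(of(value.getClass()));   // elementType=int, dimensions=4
    }
}
```

The implicit case only becomes known at serialization time, from the value's runtime class, which is why it can't be prepared during the inspection phase.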
Object references
Referencing objects is perfectly normal in Java. References tend to build **graphs**, which brings the need for a way to handle cyclic references. How to do that is fairly obvious during serialization, but it is NOT during deserialization!
It’s a chicken-or-egg problem. When we deserialize an object and some of its fields reference an object that hasn’t been deserialized yet, we have a problem. To avoid it we need some container that can be referenced right away and filled in later. It’s still not obvious how to fill those containers, though: how do we identify which container belongs to which object? Well, every single object should have an ID. That brings up the topic of pointing at objects, which I have also covered:
→ Serializing Java: point at without pointers
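As a rough illustration of the container-plus-ID idea, the sketch below rebuilds a cyclic graph in two passes: first it creates an empty object for every ID, then it fills in the fields and resolves references through the ID map. `Node`, `NodeRecord` and `GraphReader` are hypothetical names and the record layout is an assumption; the linked article may solve this differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical object type with a reference that may form a cycle.
final class Node {
    String name;
    Node next;
}

// Hypothetical wire record: the object's own ID plus the ID it points at (-1 means null).
final class NodeRecord {
    final int id;
    final String name;
    final int nextId;

    NodeRecord(int id, String name, int nextId) {
        this.id = id;
        this.name = name;
        this.nextId = nextId;
    }
}

final class GraphReader {
    static Map<Integer, Node> read(List<NodeRecord> records) {
        Map<Integer, Node> byId = new HashMap<>();
        // Pass 1: create an empty container for every ID, so anything can be referenced.
        for (NodeRecord r : records) {
            byId.put(r.id, new Node());
        }
        // Pass 2: fill the containers; every referenced ID already has an object.
        for (NodeRecord r : records) {
            Node node = byId.get(r.id);
            node.name = r.name;
            node.next = r.nextId < 0 ? null : byId.get(r.nextId);
        }
        return byId;
    }
}
```

With two records that point at each other, e.g. `new NodeRecord(1, "a", 2)` and `new NodeRecord(2, "b", 1)`, the result is two `Node` instances whose `next` fields form the original cycle.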
Non-static inner class
I’ve already mentioned that an inner object contains a reference to its parent object. Read “Serializing Java: treat all your fields” to learn more. But what if we actually need a reference to the parent’s parent?
Let’s look at this sophisticated example of inner reality:
public class OuterClass {
    InnerClass a = new InnerClass();
    public OuterClass() {
        a.b.c.test();
    }
    class InnerClass {
        MoreInnerClass b = new MoreInnerClass();
        class MoreInnerClass {
            EvenMoreInnerClass c = new EvenMoreInnerClass();
            class EvenMoreInnerClass {
                boolean d = false;
                EvenMoreInnerClass() {
                    // at this point `a`, `b` and `c` are still null because they haven't been assigned yet
                }
                void test() {
                    a.b.c.d = true;
                }
            }
        }
    }
}
EvenMoreInnerClass can access the great-grandfather OuterClass with the ease of an innocent baby’s touch.
Now observe the instantiation of this beauty:
There’s a pattern of this$<number>. Normally you would think it’s transitive: if object c is created in the context of b and b is created in the context of a, then c is in the context of a. If so, then why doesn’t c have a direct pointer to a? In Java it’s easy to just write a.b.c.d = true from the deepest object (as in the snippet above), but during serialization we can’t make this direct connection. It’s even more interesting because the variables named this$<number> are numbered as if reaching any ancestor context had already been considered. You may think that’s only shown by the debugger, but hell no: Class.getDeclaredFields() actually returns the same fields.
This case wasn’t especially important for my application or serialization algorithm, but the quirk is worth noting.
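The reflection side of this can be checked with a short sketch. `Outer`, `EnclosingRefs` and their members are hypothetical names for illustration. Every inner level deliberately uses a member of its enclosing instance in a method, on the assumption that some compilers may drop an enclosing reference that is never used.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class Outer {
    int marker = 1;
    class Inner {
        int innerMarker = 2;
        int up() { return marker; }               // uses the enclosing Outer
        class More {
            int moreMarker = 3;
            int up() { return innerMarker; }      // uses the enclosing Inner
            class Deepest {
                // Plain Java reaches every ancestor directly.
                int sum() { return marker + innerMarker + moreMarker; }
            }
        }
    }
}

final class EnclosingRefs {
    /** Lists compiler-generated fields of a class as "name : type". */
    static List<String> syntheticFields(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                out.add(f.getName() + " : " + f.getType().getSimpleName());
            }
        }
        return out;
    }

    /** Walks this$<number> references one hop at a time up to the outermost object. */
    static List<String> enclosingChain(Object obj) {
        List<String> chain = new ArrayList<>();
        Object current = obj;
        while (current != null) {
            chain.add(current.getClass().getSimpleName());
            Object next = null;
            for (Field f : current.getClass().getDeclaredFields()) {
                if (f.isSynthetic() && f.getName().startsWith("this$")) {
                    try {
                        f.setAccessible(true);
                        next = f.get(current);
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
            current = next;
        }
        return chain;
    }

    public static void main(String[] args) {
        Outer.Inner.More.Deepest d = new Outer().new Inner().new More().new Deepest();
        // Expect a single this$<number> entry typed as More, not as Outer.
        System.out.println(syntheticFields(d.getClass()));
        System.out.println(enclosingChain(d));
    }
}
```

The deepest object only holds a reference to its immediate parent, so a serializer that wants the great-grandparent has to follow the chain hop by hop, as `enclosingChain` does.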
Summary
This was probably the last post about the basics of Serializing Java. Here are all the previous ones:
- Artemis Entity Tracker – inspecting your game state through network
- Serializing Java: why I work on new serializer?
- Serializing Java: treat all your fields
- Serializing Java: inspecting data structure
- Serializing Java: point at without pointers
- Serializing Java: cyclic dependencies
There’s much more to cover in real life (advanced) situations:
- updating source objects based on deserialized value tree object
- diffing observed objects
- dealing with various collections in an efficient manner
- JIT optimization