This document answers frequently asked questions concerning the design of the
Java collections framework. It is derived from the large volume of
traffic on the collections-comments alias. It serves as a design rationale for
the collections framework.
This is the most controversial design decision in the whole API.
Clearly, static (compile time) type checking is highly desirable, and is the
norm in Java. We would have supported it if we believed it were feasible.
Unfortunately, attempts to achieve this goal cause an explosion in the
size of the interface hierarchy, and do not succeed in eliminating
the need for runtime exceptions (though they reduce it substantially).
Doug Lea, who wrote a popular Java collections package that did reflect
mutability distinctions in its interface hierarchy, no longer believes it is
a viable approach, based on user experience with his collections package. In
his words (from personal correspondence) "Much as it pains me to say it, strong
static typing does not work for collection interfaces in Java."
To illustrate the problem in gory detail, suppose you want to add the
notion of modifiability to the Hierarchy. You need four new interfaces:
ModifiableCollection, ModifiableSet, ModifiableList, and ModifiableMap.
What was previously a simple hierarchy is now a messy heterarchy. Also, you
need a new Iterator interface for use with unmodifiable Collections, that
does not contain the remove operation. Now can you do away with
UnsupportedOperationException? Unfortunately not.
Consider arrays. They implement most of the List operations, but not remove
and add. They are "fixed-size" Lists. If you want to capture this notion in
the hierarchy, you have to add two new interfaces: VariableSizeList and
VariableSizeMap. You don't have to add VariableSizeCollection and
VariableSizeSet, because they'd be identical to ModifiableCollection and
ModifiableSet, but you might choose to add them anyway for consistency's sake.
Also, you need a new variety of ListIterator that doesn't support the add and
remove operations, to go along with unmodifiable List. Now we're up to ten or
twelve interfaces, plus two new Iterator interfaces, instead of our original
four. Are we done? No.
Consider logs (such as error logs, audit logs and journals for recoverable
data objects). They are natural append-only sequences, that support all of
the List operations except for remove and set (replace). They require a new
core interface, and a new iterator.
And what about immutable Collections, as opposed to unmodifiable ones?
(i.e., Collections that cannot be changed by the client AND will never change
for any other reason). Many argue that this is the most important
distinction of all, because it allows multiple threads to access a collection
concurrently without the need for synchronization. Adding this support
to the type hierarchy requires four more interfaces.
Now we're up to twenty or so interfaces and five iterators, and it's
almost certain that there are still collections arising in practice that
don't fit cleanly into any of the interfaces. For example, the
collection-views returned by Map are natural delete-only collections.
Also, there are collections that will reject certain elements on the basis
of their value, so we still haven't done away with runtime exceptions.
When all was said and done, we felt that it was a sound engineering
compromise to sidestep the whole issue by providing a very small set of core
interfaces that can throw a runtime exception.
It was never our intention that programs should catch these exceptions: that's
why they're unchecked (runtime) exceptions. They should only arise
as a result of programming errors, in which case, your program will halt due
to the uncaught exception.
The Collection interface provides this functionality. We
are not providing any public implementations of this interface, as
we think that it wouldn't be used frequently enough to "pull its weight."
We occasionally return such Collections, which are implemented easily atop
AbstractCollection (for example, the Collection returned by Map.values).
We are extremely sympathetic to the desire for type-safe collections.
Rather than adding a "band-aid" to the framework that enforces type-safety
in an ad hoc fashion, the framework has been designed to mesh with all of
the parameterized-types proposals currently being discussed. In the event
that parameterized types are added to the language, the entire collections
framework will support compile-time type-safe usage, with no need
for explicit casts. Unfortunately, this won't happen in the the 1.2 release.
In the meantime, people who desire runtime type safety can implement their own
gating functions in "wrapper" collections surrounding JDK collections.
While the names of the new collections methods do not adhere to the "Beans
naming conventions", we believe that they are reasonable, consistent and
appropriate to their purpose. It should be remembered that the Beans naming
conventions do not apply to the JDK as a whole; the AWT did adopt these
conventions, but that decision was somewhat controversial. We suspect that
the collections APIs will be used quite pervasively, often with multiple
method calls on a single line of code, so it is important that the names be
short. Consider, for example, the Iterator methods. Currently, a loop over a
collection looks like this:
for (Iterator i = c.iterator(); i.hasNext(); )
System.out.println(i.next());
Everything fits neatly on one line, even if the Collection name is a long
expression. If we named the methods "getIterator", "hasNextElement" and
"getNextElement", this would no longer be the case. Thus, we adopted the
"traditional" JDK style rather than the Beans style.
Many Collection implementations (including all of the ones provided by the
JDK) will have a public clone method, but it would be mistake to require it
of all Collections. For example, what does it mean to clone a Collection
that's backed by a terabyte SQL database? Should the method call cause the
company to requisition a new disk farm? Similar arguments hold for
serializable.
If the client doesn't know the actual type of a Collection, it's much
more flexible and less error prone to have the client decide what type of
Collection is desired, create an empty Collection of this type, and use the
addAll method to copy the elements of the original collection into the new
one.
This is what is referred to as an "Internal Iterator" in the "Design
Patterns" book (Gamma et al.). We considered providing it, but decided
not to as it seems somewhat redundant to support internal and external
iterators, and Java already has a precedent for external iterators (with
Enumerations). The "throw weight" of this functionality is increased by the
fact that it requires a public interface to describe upcalls.
It's easy to implement this functionality atop Iterators, and the resulting
code may actually look cleaner as the user can inline the predicate.
Thus, it's not clear whether this facility pulls its weight. It could be
added to the Collections class at a later date (implemented atop Iterator),
if it's deemed useful.
Because we don't believe in using Enumerations (or Iterators) as
"poor man's collections." This was occasionally done in prior releases,
but now that we have the Collection interface, it is the preferred way to
pass around abstract collections of objects.
Again, this is an instance of an Enumeration serving as a "poor man's
collection" and we're trying to discourage that. Note however, that we
strongly suggest that all concrete implementations should have constructors
that take a Collection (and create a new Collection with the same elements).
The semantics are unclear, given that the contract for Iterator makes no guarantees about the order of iteration. Note, however, that ListIterator does provide an add operation, as it does guarantee the order of the iteration.
People were evenly divided as to whether List suggests linked lists.
Given the implementation naming convention,
<Implementation><Interface>,
there was a strong desire to keep the core interface names short. Also,
several existing names (AbstractSequentialList, LinkedList) would have been
decidedly worse if we changed List to Sequence.
The naming conflict can be dealt with by the following incantation:
This was by design. We feel that mappings are not collections and
collections are not mappings. Thus, it makes little sense for Map to
extend the Collection interface (or vice versa).
If a Map is a Collection, what are the elements? The only reasonable
answer is "Key-value pairs", but this provides a very limited (and not
particularly useful) Map abstraction. You can't ask what value a given
key maps to, nor can you delete the entry for a given key without knowing
what value it maps to.
Collection could be made to extend Map, but this raises the question:
what are the keys? There's no really satisfactory answer, and forcing one
leads to an unnatural interface.
Maps can be viewed as Collections (of keys, values, or pairs),
and this fact is reflected in the three "Collection view operations" on
Maps (keySet, entrySet, and values). While it is, in principle, possible to
view a List as a Map mapping indices to elements, this has the nasty
property that deleting an element from the List changes the Key associated
with every element before the deleted element. That's why we don't have a
map view operation on Lists.
We view the method names for Enumeration as unfortunate. They're very
long, and very frequently used. Given that we were adding a method
and creating a whole new framework, we felt that it would be foolish not to
take advantage of the opportunity to improve the names. Of course we could
support the new and old names in Iterator, but it doesn't seem worthwhile.
It can be implemented atop the current Iterators (a similar pattern
to java.io.PushbackInputStream). We believe that its use would be rare
enough that it isn't worth including in the interface that everyone has
to implement.
If you examine the goals for our Collections framework (in the Overview),
you'll see that we are not really "playing in the same space" as JGL.
Quoting from the "Design Goals" Section of the Java Collections Overview:
"Our main design goal was to produce an API that was reasonably small, both
in size, and (more importantly) in 'conceptual weight.'"
JGL consists of approximately 130 classes and interfaces; its main
goal was consistency with the C++ Standard Template Library (STL). This was
not one of our goals. Java has traditionally stayed away from C++'s
more complex features (e.g., multiple inheritance, operator overloading).
Our entire framework, including all infrastructure, contains approximately 25
classes and interfaces.
While this may cause some discomfort for some C++ programmers, we feel
that it will be good for Java in the long run. As the Java libraries mature,
they inevitably grow, but we are trying as hard as we can to keep them small
and manageable, so that Java continues to be an easy, fun language to learn
and to use.
Given that we provide core collection interfaces behind which programmers
can "hide" their own implementations, there will be aliased collections
whether the JDK provides them or not. Eliminating all views from the JDK
would greatly increase the cost of common operations like making a
Collection out of an array, and would do away with many useful facilities
(like synchronizing wrappers). One view that we see as being particularly
useful is List.subList.
The existence of this method means that people who write methods taking
List on input do not have to write secondary forms taking an offset and a
length (as they do for arrays).
Primarily, resource constraints. If we're going to commit to such an API, it
has to be something that works for everyone, that we can live with for the
long haul. We may provide such a facility some day. In the meantime, it's
not difficult to implement such a facility on top of the public APIs.