Using Stored Collections

Stored Collection and Access Methods
Stored Collections Versus Standard Java Collections
Other Stored Collection Characteristics
Why Java Collections for Berkeley DB

When a stored collection is created it is based on either a Database or a SecondaryDatabase. When a database is used, the primary key of the database is used as the collection key. When a secondary database is used, the index key is used as the collection key. Indexed collections can be used for reading elements and removing elements but not for adding or updating elements.

Stored Collection and Access Methods

The use of stored collections is constrained in certain respects as described below. Most of these restrictions have to do with List interfaces; for Map interfaces, most all access modes are fully supported since the Berkeley DB model is map-like.

Stored Collections Versus Standard Java Collections

Stored collections have the following differences with the standard Java collection interfaces. Some of these are interface contract violations.

The Java collections interface does not support duplicate keys (multi-maps or multi-sets). When the access method allows duplicate keys, the collection interfaces are defined as follows.

  • Map.entrySet() may contain multiple Map.Entry objects with the same key.

  • Map.keySet() always contains unique keys, it does not contain duplicates.

  • Map.values() contains all values including the values associated with duplicate keys.

  • Map.put() appends a duplicate if the key already exists rather than replacing the existing value, and always returns null.

  • Map.remove() removes all duplicates for the specified key.

  • Map.get() returns the first duplicate for the specified key.

  • StoredMap.duplicates() is an additional method for returning the values for a given key as a Collection.

Other differences are:

  • Collection.size() and Map.size() always throws UnsupportedOperationException. This is because the number of records in a database cannot be determined reliably or cheaply.

  • Because the size() method cannot be used, the bulk operation methods of standard Java collections cannot be passed stored collections as parameters, since the implementations rely on size(). However, the bulk operation methods of stored collections can be passed standard Java collections as parameters. storedCollection.addAll(standardCollection) is allowed while standardCollection.addAll(storedCollection) is not allowed. This restriction applies to the standard collection constructors that take a Collection parameter (copy constructors), the Map.putAll() method, and the following Collection methods: addAll(), containsAll(), removeAll() and retainAll().

  • The ListIterator.nextIndex() method returns Integer.MAX_VALUE for stored lists when positioned at the end of the list, rather than returning the list size as specified by the ListIterator interface. Again, this is because the database size is not available.

  • Comparator objects cannot be used and the SortedMap.comparator() and SortedSet.comparator() methods always return null. The Comparable interface is not supported. However, Comparators that operate on byte arrays may be specified using DatabaseConfig.setBtreeComparator.

  • The Object.equals() method is not used to determine whether a key or value is contained in a collection, to locate a value by key, etc. Instead the byte array representation of the keys and values are used. However, the equals() method is called for each key and value when comparing two collections for equality. It is the responsibility of the application to make sure that the equals() method returns true if and only if the byte array representations of the two objects are equal. Normally this occurs naturally since the byte array representation is derived from the object's fields.

Other Stored Collection Characteristics

The following characteristics of stored collections are extensions of the definitions in the java.util package. These differences do not violate the Java collections interface contract.

  • All stored collections are thread safe (can be used by multiple threads concurrently) whenever the Berkeley DB Concurrent Data Store or Transactional Data Store environment is used. Locking is handled by the Berkeley DB environment. To access a collection from multiple threads, creation of synchronized collections using the Collections class is not necessary except when using the Data Store environment. Iterators, however, should always be used only by a single thread.

  • All stored collections may be read-only if desired by passing false for the writeAllowed parameter of their constructor. Creation of immutable collections using the Collections class is not necessary.

  • A stored collection is partially read-only if a secondary index is used. Specifically, values may be removed but may not be added or updated. The following methods will throw UnsupportedOperationException when an index is used: Collection.add(), List.set(), ListIterator.set() and Map.Entry.setValue().

  • SortedMap.entrySet() and SortedMap.keySet() return a SortedSet, not just a Set as specified in Java collections interface. This allows using the SortedSet methods on the returned collection.

  • SortedMap.values() returns a SortedSet, not just a Collection, whenever the keys of the map can be derived from the values using an entity binding. Note that the sorted set returned is not really a set if duplicates are allowed, since it is technically a collection; however, the SortedSet methods (for example, subSet()), can still be used.

  • For SortedSet and SortedMap views, additional subSet() and subMap() methods are provided that allow control over whether keys are treated as inclusive or exclusive values in the key range.

  • Keys and values are stored by value, not by reference. This is because objects that are added to collections are converted to byte arrays (by bindings) and stored in the database. When they are retrieved from the collection they are read from the database and converted from byte arrays to objects. Therefore, the object reference added to a collection will not be the same as the reference later retrieved from the collection.

  • A runtime exception, RuntimeExceptionWrapper, is thrown whenever database exceptions occur which are not runtime exceptions. The RuntimeExceptionWrapper.getCause() method can be called to get the underlying exception.

  • All iterators for stored collections implement the ListIterator interface as well as the Iterator interface. This is to allow use of the ListIterator.hasPrevious() and ListIterator.previous() methods, which work for all collections since Berkeley DB provides bidirectional cursors.

  • All stored collections have a StoredCollection.iterator(boolean) method that allows creating a read-only iterator for a writable collection. For the standard Collection.iterator() method, the iterator is read-only only when the collection is read-only. Read-only iterators are important for using the Berkeley DB Concurrent Data Store environment, since only one write cursors may be open at one time.

  • Iterator stability for stored collections is greater than the iterator stability defined by the Java collections interfaces. Stored iterator stability is the same as the cursor stability defined by Berkeley DB.

  • When an entity binding is used, updating (setting) a value is not allowed if the key in the entity is not equal to the original key. For example, calling Map.put() is not allowed when the key parameter is not equal to the key of the entity parameter. Map.put(), List.set(), ListIterator.set(), and Map.Entry.setValue() will throw IllegalArgumentException in this situation.

  • Adding and removing items from stored lists is not allowed for sublists. This is simply an unimplemented feature and may be changed in the future. Currently for sublists the following methods throw UnsupportedOperationException: List.add()List.add(), List.remove()List.remove(), ListIterator.add() and ListIterator.remove()ListIterator.remove().

  • The StoredList.append(java.lang.Object) and StoredMap.append(java.lang.Object) extension methods allows adding a new record with an automatically assigned key. Record number assignment by the database itself is supported for QUEUE, RECNO and RECNO-RENUMBER databases. An application-defined PrimaryKeyAssigner is used to assign the key value.

Why Java Collections for Berkeley DB

The Java collections interface was chosen as the best Java API for DB given these requirements:

  1. Provide the Java developer with an API that is as familiar and easy to use as possible.

  2. Provide access to all, or a large majority, of the features of the underlying Berkeley DB storage system.

  3. Compared to the DB API, provide a higher-level API that is oriented toward Java developers.

  4. For ease of use, support object-to-data bindings, per-thread transactions, and some traditional database features such as foreign keys.

  5. Provide a thin layer that can be thoroughly tested and which does not significantly impact the reliability and performance of DB.

Admittedly there are several things about the Java Collections API that don't quite fit with DB or with any transactional database, and therefore there are some new rules for applying the Java Collections API. However, these disadvantages are considered to be smaller than the disadvantages of the alternatives:

  • A new API not based on the Java Collections API could have been designed that maps well to DB but is higher-level. However, this would require designing an entirely new model. The exceptions for using the Java Collections API are considered easier to learn than a whole new model. A new model would also require a long design stabilization period before being as complete and understandable as either the Java Collections API or the DB API.

  • The ODMG API or another object persistence API could have been implemented on top of DB. However, an object persistence implementation would add much code and require a long stabilization period. And while it may work well for applications that require object persistence, it would probably never perform well enough for many other applications.