Dbstl persistence

Direct database get
Change persistence
Object life time and persistence

The following sections provide information on how to achieve persistence using dbstl.

Direct database get

Each container has a begin() method which produces an iterator. These begin methods take a boolean parameter, directdb_get, which controls the caching behavior of the iterator. The default value of this parameter is true.

If directdb_get is true, the persistent object is fetched anew from the database each time the iterator is dereferenced, whether with the star operator (*iterator) or the arrow operator (iterator->member). If directdb_get is false, the first dereference of the iterator fetches the object from the database, but later dereferences can return cached data.

With directdb_get set to true, if you write:

(*iterator).datamember1 = new_value1;
(*iterator).datamember2 = new_value2;

then the assignment to datamember1 is lost: the second dereference of the iterator reloads the iterator's cached copy of the object from the database, overwriting the unsaved value of datamember1.
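The lost update can be reproduced with a small self-contained model. This sketch is not dbstl code. ToyDb, ToyIterator, Elem and store() are hypothetical stand-ins, and store() plays the role of _DB_STL_StoreElement(). The model relies only on the rule stated above: an iterator created with directdb_get reloads its cached copy on every dereference.

```cpp
#include <cassert>
#include <map>

// Hypothetical element type standing in for container::value_type.
struct Elem {
    int datamember1 = 0;
    int datamember2 = 0;
};

// Hypothetical toy "database": committed values keyed by position.
using ToyDb = std::map<int, Elem>;

// Hypothetical iterator modelling the directdb_get flag. It sketches the
// caching rule described in the text, not dbstl's implementation.
class ToyIterator {
public:
    ToyIterator(ToyDb& db, int key, bool direct_get)
        : db_(db), key_(key), direct_get_(direct_get) {}

    // With direct_get, every dereference reloads the cache from the
    // database, discarding unsaved edits; otherwise only the first one does.
    Elem& operator*() {
        if (direct_get_ || !loaded_) {
            cache_ = db_.at(key_);
            loaded_ = true;
        }
        return cache_;
    }

    // Plays the role of _DB_STL_StoreElement(): writes the cache back.
    void store() { db_[key_] = cache_; }

private:
    ToyDb& db_;
    int key_;
    bool direct_get_;
    bool loaded_ = false;
    Elem cache_;
};

// Performs the two assignments from the text, stores, and returns what the
// toy database ends up holding.
Elem assign_twice(bool direct_get) {
    ToyDb db{{0, Elem{}}};
    ToyIterator iterator(db, 0, direct_get);
    (*iterator).datamember1 = 1;  // edits the cached copy
    (*iterator).datamember2 = 2;  // with direct_get, reloads the cache first
    iterator.store();
    return db.at(0);
}
```

With direct_get set to true, assign_twice() returns datamember1 == 0 and datamember2 == 2; with false, both assignments survive.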

You also can use the arrow operator like this:

iterator->datamember1 = new_value1;
iterator->datamember2 = new_value2;

The arrow operator works exactly the same way as iterator::operator*, so the same caching rules apply to it.

One way to avoid this problem is to dereference the iterator once, bind a reference to the result, and use that reference for every access:

container::value_type &ref = *iterator;
ref.datamember1 = new_value1;
ref.datamember2 = new_value2;
// ... more member function calls and data member assignments
ref._DB_STL_StoreElement();  

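This code does not lose the new value of ref.datamember1 as the previous example did, because the iterator is dereferenced only once. The final _DB_STL_StoreElement() call then writes the modified object to the database.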
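The same effect can be shown in a toy model. ToyDb, ToyIterator and store() are illustrative names, not dbstl API. The iterator below always reloads on dereference, like directdb_get == true, but because the reference is bound once, nothing reloads the cache between the two assignments, and store() writes both of them back.

```cpp
#include <cassert>
#include <map>

// Hypothetical element type standing in for container::value_type.
struct Elem {
    int datamember1 = 0;
    int datamember2 = 0;
};

using ToyDb = std::map<int, Elem>;  // hypothetical toy "database"

// Hypothetical iterator that behaves like directdb_get == true: every
// dereference reloads its cached copy from the toy database.
class ToyIterator {
public:
    ToyIterator(ToyDb& db, int key) : db_(db), key_(key) {}

    Elem& operator*() {
        cache_ = db_.at(key_);
        return cache_;
    }

    void store() { db_[key_] = cache_; }  // role of _DB_STL_StoreElement()

private:
    ToyDb& db_;
    int key_;
    Elem cache_;
};

// Edits through a single reference, optionally storing the result.
Elem assign_through_reference(bool store) {
    ToyDb db{{0, Elem{}}};
    ToyIterator iterator(db, 0);
    Elem& ref = *iterator;  // the only dereference
    ref.datamember1 = 1;
    ref.datamember2 = 2;
    if (store) iterator.store();
    return db.at(0);
}
```

Skipping store() leaves the toy database untouched. This is why the snippet above ends with the _DB_STL_StoreElement() call.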

To avoid these complications altogether, you can assign another object of the same type to the object referenced by the iterator:

container::value_type obj2;
obj2.datamember1 = new_value1;
obj2.datamember2 = new_value2;
*iterator = obj2;

This code snippet causes the new values in obj2 to be stored into the underlying database.
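One way such write-through assignment can work is for a dereferenced iterator to return a proxy whose assignment operator writes to the database. The sketch below assumes that design to illustrate the behaviour. ToyElemRef, ToyIterator and ToyDb are hypothetical names, and the code does not describe dbstl's internals.

```cpp
#include <cassert>
#include <map>

// Hypothetical element type standing in for container::value_type.
struct Elem {
    int datamember1 = 0;
    int datamember2 = 0;
};

using ToyDb = std::map<int, Elem>;  // hypothetical toy "database"

// Hypothetical proxy returned by dereferencing the toy iterator. Assigning
// a whole Elem to it writes straight through to the toy database.
class ToyElemRef {
public:
    ToyElemRef(ToyDb& db, int key) : db_(db), key_(key) {}
    ToyElemRef& operator=(const Elem& value) {
        db_[key_] = value;  // stored immediately; no explicit store call
        return *this;
    }

private:
    ToyDb& db_;
    int key_;
};

class ToyIterator {
public:
    ToyIterator(ToyDb& db, int key) : db_(db), key_(key) {}
    ToyElemRef operator*() { return ToyElemRef(db_, key_); }

private:
    ToyDb& db_;
    int key_;
};

// Replays the obj2 example from the text against the toy database.
Elem assign_whole_object() {
    ToyDb db{{0, Elem{}}};
    ToyIterator iterator(db, 0);
    Elem obj2;
    obj2.datamember1 = 1;
    obj2.datamember2 = 2;
    *iterator = obj2;  // ToyElemRef::operator= persists obj2
    return db.at(0);
}
```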

If you have two iterators going through the same container like this:

for (iterator1 = v.begin(), iterator2 = v.begin();
     iterator1 != v.end();
     ++iterator1, ++iterator2) {
        *iterator1 = new_value;
        print(*iterator2);
}  

then the printed value depends on the directdb_get value with which the iterators were created. Each iterator has its own cached copy of the persistent object. If directdb_get is false, iterator2 returns its cached copy, so the original persistent value is printed. If directdb_get is true, dereferencing iterator2 refreshes iterator2's copy from the database, retrieving the value stored by the assignment to *iterator1, so the newly assigned value is printed.

Alternatively, you can set directdb_get to false and call iterator2->refresh() immediately before dereferencing iterator2, so that iterator2's cached value is refreshed from the database.
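The two-iterator loop and the refresh alternative can be replayed in a toy model. As a modelling choice, not a documented dbstl detail, the sketch assumes that positioning an iterator (at begin or on ++) loads its cached copy. The names ToyIterator, assign(), deref(), advance() and refresh() are hypothetical stand-ins for the real operators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ToyDb = std::vector<int>;  // hypothetical toy "database"

// Hypothetical iterator with its own cached copy of the current element.
class ToyIterator {
public:
    ToyIterator(ToyDb& db, bool direct_get) : db_(db), direct_get_(direct_get) {
        load();  // positioning at "begin" loads the cache
    }
    bool at_end() const { return pos_ == db_.size(); }
    void advance() { ++pos_; load(); }                 // like ++iterator
    void assign(int v) { db_[pos_] = v; cache_ = v; }  // like *iterator = v
    void refresh() { cache_ = db_[pos_]; }             // reload from the db
    int deref() {                                      // like reading *iterator
        if (direct_get_) refresh();
        return cache_;
    }

private:
    void load() { if (!at_end()) cache_ = db_[pos_]; }
    ToyDb& db_;
    bool direct_get_;
    std::size_t pos_ = 0;
    int cache_ = 0;
};

// Replays the loop from the text and collects what print(*iterator2) sees.
std::vector<int> two_iterator_loop(bool direct_get, bool refresh_first) {
    ToyDb v{1, 2, 3};
    std::vector<int> printed;
    ToyIterator iterator1(v, direct_get), iterator2(v, direct_get);
    for (; !iterator1.at_end(); iterator1.advance(), iterator2.advance()) {
        iterator1.assign(100);  // *iterator1 = new_value
        if (refresh_first) iterator2.refresh();
        printed.push_back(iterator2.deref());
    }
    return printed;
}
```

In this model, two_iterator_loop(false, false) collects the original values. Both two_iterator_loop(true, false) and two_iterator_loop(false, true) collect the newly assigned value.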

If directdb_get is false, a few of the tests in dbstl's test kit fail, because the contrived case above appears in several C++ STL tests. For this reason, the default value of the directdb_get parameter of the container::begin() methods is true. If your code never reads and writes one element through two iterators in this way, you can set it to false, which makes iterator read operations faster.

Change persistence

If you modify the object to which an iterator refers by using one of the following:

(*iterator).member_function_call()

or

(*iterator).data_member = new_value

then you should call iterator->_DB_STL_StoreElement() to store the change. Otherwise the change is lost after the iterator moves on to other elements.

If you are storing a sequence and modify some part of it, you should also call iterator->_DB_STL_StoreElement() before moving the iterator.

In both cases, if directdb_get is true (the default), you must call _DB_STL_StoreElement() after the change and before the next iterator movement or the next dereference of the iterator with the star or arrow operator (iterator::operator* or iterator::operator->). Otherwise, the change is lost.

If you update the element by assigning to a dereferenced iterator like this:

*iterator = new_element;

then you never have to call _DB_STL_StoreElement() because the change is stored in the database automatically.
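The rule that an in-place modification must be stored before the iterator moves on can be illustrated with another toy model. ToyIterator, store() and advance() are hypothetical stand-ins for a dbstl iterator, _DB_STL_StoreElement() and ++iterator. The sketch assumes that moving the iterator replaces its cached copy with the next element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy "database" whose elements are strings.
using ToyDb = std::vector<std::string>;

// Hypothetical iterator holding a cached copy of the current element.
class ToyIterator {
public:
    explicit ToyIterator(ToyDb& db) : db_(db) { load(); }

    // Member function calls through the iterator edit the cache only.
    std::string* operator->() { return &cache_; }

    void store() { db_[pos_] = cache_; }  // role of _DB_STL_StoreElement()
    void advance() {                      // role of ++iterator
        ++pos_;
        load();  // the old cache, and any unsaved edit, is replaced
    }

private:
    void load() {
        if (pos_ < db_.size()) cache_ = db_[pos_];
    }
    ToyDb& db_;
    std::size_t pos_ = 0;
    std::string cache_;
};

// Modifies the first element through a member function call, then moves on.
ToyDb append_then_move(bool store_before_moving) {
    ToyDb db{"abc", "def"};
    ToyIterator iterator(db);
    iterator->append("!");  // like (*iterator).member_function_call()
    if (store_before_moving) iterator.store();
    iterator.advance();
    return db;
}
```

Without the store() call, the appended "!" disappears once the iterator advances.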

Object life time and persistence

Dbstl is an interface to Berkeley DB, so its purpose is to store data persistently, which differs from the purpose of the standard C++ STL. This difference has implications for expected object lifetime. In the standard STL, when you store an object A of type ID into a C++ STL vector V using V.push_back(A), and A's class provides a proper copy constructor, then the copy of A (call it B) and everything in B, such as another object C pointed to by B's data member B.c_ptr, is stored in V and lives as long as B remains in V and V is alive. B is destroyed when V is destroyed or when B is erased from V.

This is not true for dbstl, which copies A's data and stores it in the underlying database. By default the copy is a shallow copy, but you can register your own marshalling and unmarshalling functions for a type using the DbstlElemTraits class template. If A is passed to a db_vector container dv by calling dv.push_back(A), dbstl copies A's data using the registered functions and stores that data in the underlying database. Consequently, the stored copy of A remains valid even after the container object is destroyed, because it lives in the database.

If the copy is a simple shallow copy and A is later destroyed, the pointer value A.c_ptr stored in the database becomes invalid. The next time you use the retrieved object, you will be using a dangling pointer, which will probably cause errors. To avoid this, register marshalling and unmarshalling functions with DbstlElemTraits that store the object C that A.c_ptr refers to, rather than the pointer itself.

For example, consider the following class declaration:

#include <string>

class ID
{
public:
    std::string Name;
    int Score;
};

Here, class ID has a data member Name, which refers to the memory address of the string's actual characters. If we simply make a shallow copy of an object id of class ID to store it, the stored data, idd, becomes invalid when id is destroyed. This is because idd and id refer to the same memory address, the base address of the memory that holds the string's characters, and that memory is released when id is destroyed. So idd refers to an invalid address, and the next time we retrieve and use idd, memory corruption will probably occur.
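You can confirm that an ID object does not contain the characters of its Name by comparing addresses. The hypothetical helper chars_outside_object() below checks that a long Name's characters lie outside the sizeof(ID) bytes of the object, which is exactly the memory a shallow byte copy would miss. Some std::string implementations store very short strings inline, so the check should use a long name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

class ID
{
public:
    std::string Name;
    int Score;
};

// Hypothetical helper: returns true if the characters of id.Name are stored
// outside the sizeof(ID) bytes that make up the object itself. std::less
// gives a total order on pointers, so comparing unrelated addresses is safe.
bool chars_outside_object(const ID& id)
{
    const char* begin = reinterpret_cast<const char*>(&id);
    const char* end = begin + sizeof(ID);
    const char* chars = id.Name.data();
    std::less<const char*> before;
    return before(chars, begin) || !before(chars, end);
}
```

A 1000-character Name cannot fit in sizeof(ID) bytes, so its characters must live in separately allocated memory, and a shallow copy of the object records only their address.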

To store id correctly, write a marshalling/unmarshalling function pair, together with a function that reports the marshalled size, like this:

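#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy, std::strcpy

// Marshal: write Score, then Name's characters plus the terminating '\0'.
void copy_id(void *dest, const ID &elem)
{
	std::memcpy(dest, &elem.Score, sizeof(elem.Score));
	char *p = static_cast<char *>(dest) + sizeof(elem.Score);
	std::strcpy(p, elem.Name.c_str());
}

// Unmarshal: read Score back, then rebuild Name from the stored C string.
void restore_id(ID &dest, const void *srcdata)
{
	std::memcpy(&dest.Score, srcdata, sizeof(dest.Score));
	const char *p = static_cast<const char *>(srcdata) + sizeof(dest.Score);
	dest.Name = p;
}

// Size of the marshalled form: Score, the characters, and the '\0'.
std::size_t size_id(const ID &elem)
{
	return sizeof(elem.Score) + elem.Name.size() +
	    1; // store the '\0' char.
}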
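To check that the three functions agree on a single layout, you can marshal an ID into a plain byte buffer, let the original object go out of scope, and restore it from the buffer. The helpers marshal() and unmarshal() below are hypothetical, and the std::vector<char> buffer stands in for the bytes that would be written to the database.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

class ID
{
public:
    std::string Name;
    int Score;
};

// Same layout as above: Score, then Name's characters and a '\0'.
void copy_id(void *dest, const ID &elem)
{
    std::memcpy(dest, &elem.Score, sizeof(elem.Score));
    char *p = static_cast<char *>(dest) + sizeof(elem.Score);
    std::strcpy(p, elem.Name.c_str());
}

void restore_id(ID &dest, const void *srcdata)
{
    std::memcpy(&dest.Score, srcdata, sizeof(dest.Score));
    const char *p = static_cast<const char *>(srcdata) + sizeof(dest.Score);
    dest.Name = p;
}

std::size_t size_id(const ID &elem)
{
    return sizeof(elem.Score) + elem.Name.size() + 1;  // + 1 for '\0'
}

// Hypothetical helpers: the vector plays the role of the stored record.
std::vector<char> marshal(const ID &elem)
{
    std::vector<char> bytes(size_id(elem));
    copy_id(bytes.data(), elem);
    return bytes;
}

ID unmarshal(const std::vector<char> &bytes)
{
    ID out;
    restore_id(out, bytes.data());
    return out;
}
```

Because the buffer holds the characters themselves rather than a pointer to them, it stays usable after the original ID is destroyed.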

Then register the above functions before storing any instance of ID:

DbstlElemTraits<ID>::instance()->set_copy_function(copy_id);
DbstlElemTraits<ID>::instance()->set_size_function(size_id);
DbstlElemTraits<ID>::instance()->set_restore_function(restore_id);  

This way, the actual data of each ID instance is stored, so the data persists even after the container itself is destroyed.
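The claim that the data outlives the container can be illustrated with a toy container that, like the registration above, looks up user-supplied functions for its element type. Everything in this sketch (ToyElemTraits, ToyContainer, ToyDatabase, register_id_functions) is hypothetical and only mirrors the idea behind DbstlElemTraits. It is not dbstl's API or implementation. The ToyDatabase map stands in for Berkeley DB storage that exists independently of any container object. The code assumes C++17 for the inline static members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

class ID { public: std::string Name; int Score; };

void copy_id(void *dest, const ID &elem) {
    std::memcpy(dest, &elem.Score, sizeof(elem.Score));
    std::strcpy(static_cast<char *>(dest) + sizeof(elem.Score), elem.Name.c_str());
}
void restore_id(ID &dest, const void *srcdata) {
    std::memcpy(&dest.Score, srcdata, sizeof(dest.Score));
    dest.Name = static_cast<const char *>(srcdata) + sizeof(dest.Score);
}
std::size_t size_id(const ID &elem) { return sizeof(elem.Score) + elem.Name.size() + 1; }

// Hypothetical per-type registry of marshalling functions.
template <typename T>
struct ToyElemTraits {
    using CopyFn = void (*)(void *, const T &);
    using SizeFn = std::size_t (*)(const T &);
    using RestoreFn = void (*)(T &, const void *);
    static inline CopyFn copy = nullptr;
    static inline SizeFn size = nullptr;
    static inline RestoreFn restore = nullptr;
};

// Hypothetical storage that outlives every container object.
using ToyDatabase = std::map<std::size_t, std::vector<char>>;

// Hypothetical container: marshals elements into the database on insert
// and unmarshals them on read, using the registered functions.
template <typename T>
class ToyContainer {
public:
    explicit ToyContainer(ToyDatabase &db) : db_(db) {}

    void push_back(const T &elem) {
        std::vector<char> bytes(ToyElemTraits<T>::size(elem));
        ToyElemTraits<T>::copy(bytes.data(), elem);
        const std::size_t key = db_.size();
        db_[key] = std::move(bytes);
    }

    T get(std::size_t index) const {
        T out{};
        ToyElemTraits<T>::restore(out, db_.at(index).data());
        return out;
    }

private:
    ToyDatabase &db_;
};

// Registers the ID functions, analogous to the DbstlElemTraits calls above.
void register_id_functions() {
    ToyElemTraits<ID>::copy = copy_id;
    ToyElemTraits<ID>::size = size_id;
    ToyElemTraits<ID>::restore = restore_id;
}
```

A container created over the same ToyDatabase after the first container has been destroyed still reads back the stored ID, because only the marshalled bytes, not the container or the original object, were kept.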