Mird, low-level database library



Introduction

Mird is a database library, for operating on simple disk-based databases with the following features:

Installation

Unpack the archive using tar:
      % tar xvzf mird-1.0.tar.gz

Configure for your system:

      % ./configure
      creating cache ./config.cache
      configuring in /users/mirar/mird/build/your_machine/
      creating cache ./config.cache
      checking for gcc... gcc
      [...]
      creating Makefile
      %

Make and install the library:

      % make install
      building in /users/mirar/mird/build/mistel.idonex.se/
      make[1]: Entering directory `/export/home/mirar/mird/build/mistel.idonex.se'
      gcc -g -O2 -g -Wall -Wno-parentheses -I. -I../../src -c ../../src/blocks.c -o blocks.o 
      [...]
      installing mird.h in /usr/local/include
      ln -sf libmird.so.1.0 /usr/local/lib/libmird.so 
      % 

Now the library is installed.

If you want it installed in some other path, run the configure with
   % ./configure --prefix=path
where path is the base path of installation (the prefix to .../lib and .../include). You can also run
   % ./configure --help
to read about the other options to configure.

Supported platforms

The library is configured by using autoconf. Most Unix- or Posix-like platforms should be supported, since the library itself does very little system-specific operations. It has been tested to compile and run on the following systems: If you succeed to run it in any other environment, please report the success to the author.

The database file generated does not have any machine dependent data, which means that the database itself can be transferred between systems with different endian. Everything is stored in network byte order.

License

The Mird database library may be freely distributed and used with these limits¹: In legalese:

The authors make NO WARRANTY or representation, either express or implied, with respect to this software, its quality, accuracy, merchantability, or fitness for a particular purpose. This software is provided "AS IS", and you, its user, assume the entire risk as to its quality and accuracy.

This software is copyright (C) 1999-2000, Mirar Hagland All Rights Reserved except as specified below.

Permission is hereby granted to use, copy, modify, and distribute this software (or portions thereof) for any purpose, without fee, subject to these conditions:
(1) If any part of the source code for this software is distributed, then this license must be included, with this copyright and no-warranty notice unaltered; and any additions, deletions, or changes to the original files must be clearly indicated in accompanying documentation.
(2) If only executable code is distributed, then the accompanying documentation must state that "this software is utilizing the Mird database".
(3) Permission for use of this software is granted only if the user accepts full responsibility for any undesirable consequences; the authors accept NO LIABILITY for damages of any kind.

These conditions apply to any software derived from or based on the Mird database code, not just to the unmodified library. If you use this work, you ought to acknowledge it.

I permit and encourage the use of this software as the basis of commercial products, provided that all warranty or liability claims are assumed by the product vendor.

The Unix configuration script "configure" was produced with GNU Autoconf. It is copyright by the Free Software Foundation but is freely distributable. The same holds for its supporting scripts (config.guess, config.sub, ltconfig, ltmain.sh). Another support script, install-sh, is copyright by M.I.T. but is also freely distributable.

¹ this legalese is stolen from the JPEG license

Workflow

Opening, syncing and closing the database

Opening the database consists of two steps, first to create the database object structure, then to create, open or restore
1 the database file.

Example:

   
      MIRD_RES res;
      struct mird *my_database;
   
      /* initialize the structure */
      if ( (res=mird_initialize("my_database_file.mird",&my_database)) )
   	 [...do something with the error...]
   
      /* change the default parameters if necessary */
      my_database->flags|=MIRD_NOCREATE;
      my_database->cache_size=1024;
   
      /* open the database file */
      if ( (res=mird_open(my_database)) )
   	 [...do something with the error...]
      
      [...]
   
      /* upon need,                                   */
      /* sync the database and flush the journal file */
      if ( (res=mird_sync(my_database)) )
   	 [...do something with the error...]
   
      [...]
   
      /* close the database */
      if ( (res=mird_close(my_database)) )
   	 [...do something with the error...]
Please note that you neither need to change or setup the database parameters, or sync the database. On a long-lived application (a server, for instance) the mird_sync operation may help to keep down the size of the files, though, since both the database and journal file only may grow up to the point where the database is synced or closed. At that point, the database starts to reuse any free space generated from the previous operations and the journal file is restarted.

Also note that the close or sync operations shouldn't be too often. The operations to find free space are summarily quicker if there is more data to work on at the same time.

1) A database file can be in three states, non-existant, clean or dirty. It is dirty when a previous process has not closed the database nicely, but rather died in some way. In this case, the journal file is still there, and the database is restored to the lastest state possible.

   

Transactions

Any change to the database must be in the scope of a transaction. To change anything in the database, you must therefore first create a transaction, in which you do the change to the database, and then close it:

The result of a transaction close can be a conflict. In this case, the transaction must be cancelled.

Example:

      struct mird_transaction *mtr;
   
      /* begin a new transaction */
      if ( (res=mird_transaction_new(my_database,&mtr)) )
   	 [...do something with the error...]
   
      [...operate on the transacton ...]
   
      /* close the transaction */
      if ( (res=mird_transaction_close(mtr)) )
      {
   	 if (!MIRD_IS_CONFLICT(res))
   	      [...do something with the error...]
   	 [...do something with the conflict...]
         mird_free_error(res);
   	 if ( (res=mird_transaction_cancel(mtr)) )
   	    [...do something with the error...]
      }
Several transactions can be going on at the same time. If you doesn't have more then one transactions at the same time, you can never get conflicts.

The transaction scope can be used to read the database, too. It keeps the scope the database had until it's closed (or synced against the current state in any other way).

Note that the database cannot be synced when there is ongoing transactions.

Writing to the database

Within a transaction, changes to the database is possible.

The database is table-oriented, so the first thing you will have to do with a new database, is to create the tables you need.
A table may either be a mapping from an integer (32-bit) key to an any-length string, or from a string to a string.
To create a table, call mird_key_new_table or mird_s_key_new_table:

       /* create table 17, integer key to string value table */
       if ( (res=mird_key_new_table(mtr,17)) )
   	  [...error...]
   
       /* create table 18, string key to string value table */
       if ( (res=mird_s_key_new_table(mtr,18)) )
   	  [...error...]
Note that if the table pre-exists, this call fails and the resulting exception will be MIRD_TABLE_EXISTS.

When the tables exists, you will want to write down key:value pairs in them. This is done by calling mird_key_store, or for string key tables, mird_s_key_store:

       /* store an integer key : string value in the table 17 */
       mird_key_t     int_key;
       unsigned char *value;
       mird_size_t    value_len;
   
       if ( (res=mird_key_store(mtr,17,int_key,value,value_len)) )
   	  [...error...]
   
       /* store a string key : string value in the table 18 */
       unsigned char *string_key;
       mird_size_t    key_len;
       unsigned char *value;
       mird_size_t    value_len;
       if ( (res=mird_s_key_store(mtr,18,string_key,key_len,value,value_len)) )
   	  [...error...]
Any key can be written any number of times in a transaction. Conflicts appear at first when the transaction is closed.

Reading from the database

Reading from the database is done in a similar way as writing to the database, by calling
mird_key_lookup or mird_s_key_lookup, for integer and string keys, respectively.
When the data read is no longer used, it must be freed by calling mird_free.
       unsigned char *value;
       mird_size_t    value_len;
   
       /* read from an integer key table (table 17) */
       if ( (res=mird_key_lookup(my_database,17,int_key,&value,&value_len)) )
   	  [...error...]
   
       if (!value)  
   	  [...action if there was no value from that key...]
   
       /* free the resulting value */    
       if (value) mird_free(value);
   
       [...]
   
       /* read from a string key table (table 18) */
       if ( (res=mird_s_key_lookup(my_database,18,string_key,key_len,&value,&value_len)) )
   	  [...error...]
   
       if (!value)  
   	  [...action if there was no value from that key...]
   
       /* free the resulting value */    
       if (value) mird_free(value);
Note that if the key doesn't exist in the table, the resulting value pointer will be NULL. If the table does not exist, or is the wrong type, this will result in the MIRDE_NO_TABLE respectively MIRDE_WRONG_TABLE exceptions.

Note also that mird_free may not fail.

If you need to read from the database in one and the same state, you can create a transaction to read in. This will keep the same state as the database were when the transaction was created, and may be read from by using mird_transaction_key_lookup and mird_transaction_s_key_lookup.

Reading a whole table

Errors and exceptions

On some occations, the database functions will return an exception. If the return value is NULL, everything went fine.

Otherwise, you may wish to handle this error in one way of other. To do this correctly, you may look at the elements in the strucure, and/or use the functions and macros available to examine an error; mird_describe_error, mird_perror and the convinience macro MIRD_IS_CONFLICT.

The returned type MIRD_RES is a pointer to a structure containing the following elements:

    int error_no;
    char *s;
    unsigned x,y,z;

error_no contains the error number the exception had, see the error list in the reference manual. s, x, y and z contains different information depending on the errors nature (from filenames and block numbers to table identifier and keys in conflicts). This is also described in the reference manual.

mird_describe_error and mird_perror converts the error to something a little bit more human-readable.

An error handling situation can look like this:

      /* do something */
      if ( (res=mird_operation([...])) )
      {
         /* check if it is something we expected */
         if (res->error_no==MIRD_NO_TABLE)
         {
            [ ... do something about that ... ]
            mird_free_error(res);
         }
	 else
         {
 	    /* it isn't something we expected,         */
	    /* print the error and quit immediately    */

            mird_perror("my_program (doing some operation)",res);
	    exit(1);
         }
      }

If you get an exception, but continues to run the application, you must always free the exception. This is done by calling mird_free_error (which may not fail).

Threads

The database library is not thread safe. Almost any operation on the database must lock the whole database, so this is the solution I recommend to use in a threaded application. The library is thread safe if you avoid working on the same database structure, so you can have more then one database open at the same time and open on both of them. (There is no global variables in the library itself.)


Implementation

This chapter is for you out there that really wants to know how this library works, and is by all means neccesary to use the library - although it could be enlightening. The database engine is divided up in several distinct areas, from low-level storage to handling transactions and tables:

   +---------------------------------------------------+
   |             low-level block storage               |
   |------------------------------------------|--------|
   |           block cache system             |        |
   |                             |------------|        |
   |                             | free block | super  |
   |-------------------------| |-|----| list  | blocks |
   |  frag block system      | |      |       |        |
   |------------------|------| |------|--| |--|        |
   |     hashtries    | data cells       | |  |        | |--------------|
   |------------------|------------------| |--|        | | journal file |
   |            tables                   | |  |        | |              |
   |  |----------------------------------| |----| |------|--------|     |
   |  |       transactions                      | |               |     |
   |--|-----------------------------------------|-|---------------|-----|
   |            database                                                |
   |--------------------------------------------------------------------|

Low-level blocks, superblocks and the free block list

The database is based on blocks. A block can be in four states, it can be All blocks are of constant size, per default 2048 bytes. This makes the database less sensitive to fragmentation.

The superblocks contain information of the database, like where to find information about the tables, what the block size are, and other various neccesary knowledge to read a database. Every 4^n-1 block is a superblock (block 0, block 3, block 15, ...). They also contain information to tell if the database is clean or in dirty mode - the later means that the journal file must be read back at database open.

The free list blocks contains lists of free blocks, that can be reused. The superblock has information on how big the file is, so if the datpabase runs out of blocks to reuse, it makes the file bigger instead.

The free list is never appended, until the database is closed or synced. No block can be reused until the database is synchronized with the disk. Unfortunately, this makes the database grow up to this point - but beyond this point, the unused space is reused.

Block cache system

Directly below the low-level block storage, there is a block cache system; various other levels use this system as both read- and write cache.

The cache system is a closed hash table of blocks, that has been read to memory. When a block is needed, it first checks the cache for the block, and if it isn't in the cache, it disposes of a block in the cache, and reuses it's space. What block is disposed of is random.

When a block is written to, it is also read to the cache, or allocated in the cache if it's a new block, the cache data pointer is returned, and the block is marked as dirty. A dirty block that has to be reused in the cache is saved to disk before reuse.

Later, when a transaction is completed, the transactions blocks still marked dirty in the cache is flushed to disk.

Fragblocks

For small segments of data, a complete block isn't used. More then one small piece of data can be stored in a special block, called fragblock. The fragblock has a table of contents at the start.

Any piece of data has an address to either a fragblock or a block of it's own. Some bits are used to address the block itself, and a few bits are used to address the data in a a fragblock, per default 27 respective 5 bits. If the later bits are zero, the address is a block of its own, but any other value is pointing to an address in a frag block. This gives 2^n-1 possible data pieces in a frag block, per default 2^5-1 = 31 pieces.

Data cells

A piece of data can either fit in a fragblock or a big block, or not fit at all. In this case, the data cell takes up more then one block (or one or some block and a piece of a fragblock). These blocks is ordered in a linked list. The allocation system is supposed to make these blocks allocated in the right order, if possible, so that readback will be forward, allowing read-cache optimization in the operative system to work more efficient.

The hashtrie

The integer-key lookup table is a heavily branching tree - I think the data structures is called hashtrie. The first node branches off on the lowest bits of the hash key, the next node branches off on the second lowest bits, etc. In the end, the leaf is always a data cell. The cell itself contains the hash key it's supposed to map from, so a full tree is never neccesary.
   			 top node
   			    |
   		  +----+----+-- ... --+ 
     bits value   0    1    2        n-1
   		  |    |    |         |
   		 node node node      node
   		       |
   	     +----+----+-- ... --+
   	     0    1    2        n-1
   	     |    |    |         |
   	    node node node      node
   		       |
   	       leaf with data cell 
   	      (in this example the key has
   	       2*n+1 as the lowest bits)

This structure makes it possible to have a huge hashtable (or integer key table, as it is in reality), without having to rewrite the whole table for any minor change. In a write to the table, the database only writes down the nodes that has change, and the new nodes neccesary to branch the tree.

It also makes a rather fast way of reading back the keys, which will be in logarithmic time. Per default there is 32 branches per node, which gives a write or readback time of approximately c + d·log32 n, where n is the number of keys in the table, and c and d is constant. This is efficient enough in about any case.

Example: We want to find the key 4711 in a database that is using 5 bits per hashtrie node (32 branches).

Writing the key is a bit more obstacled, since we can't use any other transactions already existing nodes if we have to write in them. (See
below.)
For each node we have to change, we have to allocate some data space and copy the old node there, before writing to it - if we didn't create or write in the node already. This of course makes the node directly above this node affected to change, so the whole chain back to the top node has to be rewritten.

This again makes it possible to control the whole database by knowing just the top node, and that makes it possible to cancel a transaction, which is neccesary for atomicity and isolation between transactions, which is quite useful.

String key tables

String key tables, which can be quite more useful then the normal integer key or hash key table, is handled through a special case of this hashtrie. The string key is converted to a hash key, which is used in a normal hashtrie table, but the key is stored in the data cell, and it is allowed to keep more then one key:value pair in the data cell. This is neccesary; of course more then one string can be converted to the same hash key (there are more then 2^32 strings possible, the string can be any length), but not very often - that is the main function of the string-to-hash-key conversion, so the database is still very efficient.

Master table

The database can contain more then one of these tables, and that is controlled by another special case of the above described hashtrie, called the master table. The only difference in this table is that the leafs of this table are not data, but table descriptors. The table descriptor tells us where the root of the table are, and what kind of table it is.

Journal file and transactions

Conflicts

Freeing blocks

Cleaning dirty databases



Author

The author is Mirar.

I can be contacted either on <mirar@mirar.org> or at <mirar@lysator.liu.se>.
I'm a nice guy, please praise me for my work and report bugs so the database can be even more stable.

Mirds homepage is http://www.mirar.org/mird/, where any new release can be downloaded.