The SimpleDB Database System

Edward Sciore
Computer Science Department, Boston College

[Note: The SimpleDB database system is completely unrelated to the "Amazon SimpleDB Web Service" distributed by Amazon.com.]

SimpleDB is a multi-user transactional database server written in Java, that interacts with Java client programs via JDBC.  The system is intended for pedagogical use only.  The code is clean and compact.  The APIs are straightforward.  The learning curve is relatively small.  Everything about it is geared towards improving the experience of a database system internals course.

Consequently, the system is intentionally bare-bones.  It implements only a small fraction of SQL and JDBC, and does little or no error checking.  Although it is a great teaching tool, I can't imagine that anyone would want to use it for anything else.

This page gives instructions for installing and configuring the software.  I am also in the process of writing a textbook that teaches database internals using SimpleDB.  The text is scheduled to be published by John Wiley in 2008.  If you are interested in examining the manuscript and/or teaching a course from it, please email me.  If you are interested in understanding how (and why) the software works, the best bet is to take a look at the text.  You are also invited to browse through the website for the course I am currently teaching on database system internals; that course dissects SimpleDB thoroughly.  I have also published a paper in the 2007 SIGCSE conference describing my experience with the course.  The complete website associated with a previous of the course is still available; contact me for the URL.

If you find bugs in the code or have suggestions for improving it, please email me.

Downloading the Code

Feel free to download the SimpleDB distribution file, in WinZip format. The current version is 2.6, which was uploaded on September 5, 2007.  The downloaded distribution file should unpack into a folder containing three subfolders:
SimpleDB has a client-server architecture.  You run the server code on a host machine, where it will sit and wait for connections from clients. It is able to handle multiple simultaneous requests from clients, each on possibly different machines. You can then run a client program from any machine that is able to connect to the host machine.


Installing the SimpleDB Server

1)  Install the Java 1.6 SDK.  SimpleDB will work with Java 1.5 if you make some minor changes:
2)  Decide where you want the server-side software to go.  Let’s assume that the code will go in the folder C:\javalib in Windows, or the directory ~/javalib in UNIX.

3)  Copy the simpledb folder from the distribution file to the C:\javalib\ directory. Within that folder should be subfolders containing all of the code for SimpleDB.

4)  Add simpledb’s parent folder to your classpath. In other words, the javalib folder must be mentioned in your CLASSPATH environment variable.  In UNIX, your home directory has an initialization file, typically called .bashrc.  If the file does not set CLASSPATH, add the following line to the file:  
CLASSPATH =.:~/javalib        
Here, the ‘:’ character separates folder names.  The command therefore says that the folder “.” (i.e., your current diretory) and “~/javalib” are to be searched whenever Java needs to find a class.  If the file already contains a CLASSPATH setting, modify it to include the javalib directory.
 
In Windows, you must set the CLASSPATH variable via the System control panel.  From that control panel, choose the advanced tab and click on the environment variables button.  You want to have a user variable named CLASSPATH that looks like this:
.;C:\javalib
Here, the ‘;’ character separates the two folder names.

Running the SimpleDB Server

To run the SimpleDB server, you need to execute two programs:
1)  Run the rmiregistry program (which is part of your Java SDK distribution) as a separate process.  In UNIX, this means executing the command:
> rmiregistry & 
In Windows, this means executing the command:
> start rmiregistry

2)  Run Java on the simpledb.server.Startup class.  You must pass in the name of a folder that SimpleDB will use to hold the database.  For example in Windows, if you execute the following from the command prompt:
> start java simpledb.server.Startup studentdb
then the server will run in a new window, using studentdb as the database folder.  You can be in any directory when you run the Startup class; the server will use the studentdb folder that exists in your home directory.  If a folder with that name does not exist, then one will be created automatically.
 
If everything is working correctly, when you run the server with a new database folder the following will be printed in the server’s window:
creating new database
new transaction: 1
transaction 1 committed
database server ready
If you run the server with an existing database folder, the following will be printed instead:
recovering existing database
database server ready
In either case, the server will then sit awaiting connections from clients.  As connections arrive, the server will print additional messages in its window.


Running SimpleDB Clients Locally


The studentClient folder contains two subfolders, named simpledb and derby.  The simpledb subfolder contains client code that can run with the SimpleDB database server.  (The derby subfolder contains clients for the Derby server.  This server can be downloaded from db.apache.org, but is not needed for using SimpleDB.  Its purpose is to illustrate examples used in the aforementioned text that I am writing.)  These clients connect to the server using localhost, and can be run as-is from the server machine.  Unlike the server classes, the client classes are not part of an explicit package, and thus they need to be run from the directory that they are stored in.  For example, suppose we copy the studentClient folder from the distribution file to our home directory.  In Windows we could execute the client programs as follows:

> cd C:\studentClient\simpledb
> java CreateStudentDB

The following list briefly describes the SimpleDB clients.
Running SimpleDB Clients Remotely


It is certainly convenient to run SimpleDB clients on the same machine as the server, but also unrealistic and somewhat unsatisfying.  To run a client program from a different machine, two things have to occur:
A client machine does not need every SimpleDB package installed, because it will not be running any of the server code.  Instead, it needs only the packages simpledb.sql and simpledb.remote.  The classpath of the client machine must also be configured, the same as with the server machine.
 
Each client program provided with SimpleDB connects to the database server located at “localhost”.  If the client is to be run from a different machine than the server, then its source code must be modified so that localhost is replaced by the domain name (or IP address) of the server machine.


SimpleDB SQL

SimpleDB implements a tiny subset of standard SQL, and (for simplicity) imposes restrictions not present in the SQL standard.  Here we briefly indicate what these restrictions are.
 
A query in SimpleDB consists only of select-from-where clauses in which the select clause contains a list of fieldnames (without the AS keyword), and the from clause contains a list of tablenames (without range variables).
 
The where clause is optional.  The only Boolean operator is and.  The only comparison operator is equality.  Unlike standard SQL, there are no other comparison operators, no other Boolean operators, no arithmetic operators or built-in functions, and no parentheses.  Consequently, nested queries, aggregation, and computed values are not supported.

Views can be created, but a view definition can be at most 100 characters.
 
Because there are no range variables and no renaming, all field names in a query must be disjoint.  And because there are no group by or order by clauses, grouping and sorting are not supported.  Other restrictions: Of course, the whole point of SimpleDB is to be as restrictive as possible; that way, there are that many more opportunities for you to make changes!

Examining the Code

The basic packages in SimpleDB are structured hierarchically, in the following order:
The basic server is exceptionally inefficient.  The following packages enable more efficient query processing.
More details can be found in my text.