The SimpleDB
Database System
[Note:
The SimpleDB database system is completely unrelated to the "Amazon
SimpleDB Web Service" distributed by Amazon.com.]
SimpleDB
is a multi-user transactional database server
written in Java,
that interacts with Java client programs via JDBC. The system
is
intended for pedagogical use only. The code is clean and
compact.
The APIs are straightforward. The learning curve is
relatively small. Everything about it is geared towards
improving
the experience of a database system internals course.
Consequently,
the system is intentionally bare-bones. It implements only a
small
fraction of SQL and JDBC, and does little or no error
checking. Although it is a great teaching tool, I
can't imagine that anyone would want to use it for anything else.
This page gives instructions for installing and
configuring the software. I am also in the process of writing
a textbook that teaches
database
internals using SimpleDB.
The
text is scheduled to be published by John Wiley in 2008. If
you
are interested in examining the manuscript and/or teaching a course
from it, please email me.
If you are interested in
understanding how
(and why) the software works, the best bet is to take a look at the
text. You are also invited to browse through the website
for the course I am currently teaching on database system
internals; that
course dissects SimpleDB thoroughly. I have also published a paper
in the 2007 SIGCSE conference describing my experience with the course.
The complete website associated with a previous of the course
is
still available; contact me for the URL.
If you find
bugs in the code or have suggestions for improving it, please email me.
Downloading
the Code
Feel free to download the
SimpleDB distribution file, in WinZip format. The current version is
2.6, which was uploaded on September 5, 2007. The downloaded
distribution file should unpack into a folder containing three
subfolders:
- The folder simpledb,
which contains the server-side Java code.
- The
folder javadoc,
which contains the JavaDoc documentation of the above code.
- The
folder studentClient,
which contains some client-side code for an example student-course
database.
SimpleDB
has a client-server architecture.
You run the server code on a host machine, where it will
sit and wait for connections from clients. It
is able
to handle multiple simultaneous requests from clients, each on possibly
different machines. You can then run a client program from any machine
that is able to connect to the host machine.
Installing
the
SimpleDB Server
1) Install
the Java 1.6 SDK. SimpleDB
will work with Java 1.5 if you make some minor
changes:
- In the package simpledb.remote,
the classes named xxxAdapter
provide default implementations of the interfaces in java.sql.
Java version 1.6 added several extra methods to these
interfaces.
If you are using an earlier version, just comment out those
methods. (You can tell which ones they are because you'll get
an
error when you try to compile them.)
- In the package
simpledb.remote,
the classes named SimpleXXX
call the SQLException
constructor with a Throwable argument.
This constructor is new to version 1.6. To use in
1.5,
rewrite the code "throw new SQLException(e)" to be "throw new
SQLException(e.getMessage())".
2) Decide
where you want the server-side software to go.
Let’s assume that the code will go in the folder C:\javalib
in Windows, or the directory ~/javalib in UNIX.
3) Copy the simpledb folder
from the distribution file to the C:\javalib\
directory. Within that folder should be subfolders containing
all of the code for
SimpleDB.
4) Add
simpledb’s
parent
folder to your classpath.
In other words, the javalib folder
must
be mentioned in your CLASSPATH environment variable.
In UNIX, your home directory has an
initialization file, typically called .bashrc. If the file does not set
CLASSPATH, add the
following line to the file:
CLASSPATH
=.:~/javalib
Here, the ‘:’ character separates
folder names. The
command therefore says
that the folder “.” (i.e., your current diretory) and “~/javalib” are
to be
searched whenever Java needs to find a class.
If the file already contains a CLASSPATH setting, modify
it to
include
the javalib directory.
In Windows, you must set the
CLASSPATH variable via the System
control panel. From
that control panel,
choose the advanced tab and click
on
the environment variables
button. You want to
have a user variable
named CLASSPATH that looks like this:
.;C:\javalib
Here,
the
‘;’ character separates
the two folder names.
Running the SimpleDB Server
To run the SimpleDB
server, you need to
execute two
programs:
1)
Run the rmiregistry program (which is part
of your Java SDK distribution) as a separate process.
In UNIX, this means executing the command:
>
rmiregistry &
In Windows, this means executing
the command:
> start rmiregistry
2)
Run Java on the simpledb.server.Startup
class.
You must pass in the name of a folder that SimpleDB will
use to
hold the
database. For
example in Windows, if you
execute the following from the command prompt:
>
start java simpledb.server.Startup studentdb
then
the server will run in a new
window, using studentdb as the database folder. You
can be in any directory when you run the Startup
class; the server will use the studentdb
folder
that exists in your home directory. If
a folder with that name does not exist,
then one will be created automatically.
If everything is working correctly, when you run the server
with a new database folder the following will be printed in the
server’s
window:
creating
new database
new transaction: 1
transaction 1
committed
database server ready
If
you run the server with an existing database folder, the
following will be printed instead:
recovering
existing database
database server ready
In
either case, the server will then sit awaiting
connections from clients. As
connections
arrive, the server will print additional messages in its window.
Running
SimpleDB Clients
Locally
The studentClient folder
contains two subfolders, named simpledb
and derby.
The simpledb
subfolder contains client code that can run with the SimpleDB database
server. (The derby subfolder contains clients for the Derby
server. This server can be downloaded from db.apache.org,
but is not needed for using SimpleDB. Its purpose is to
illustrate examples used in the aforementioned text that I am writing.)
These clients
connect to the
server using localhost, and can be run as-is from
the server
machine. Unlike the
server classes, the
client classes are not part of an explicit package, and thus they need
to be
run from the directory that they are stored in.
For example, suppose we copy the studentClient
folder from the
distribution file to our home directory. In Windows we could
execute the client programs as follows:
>
cd C:\studentClient\simpledb
> java
CreateStudentDB
The following list
briefly describes the SimpleDB clients.
- CreateStudentDB creates and
populates the
student database used by the other clients.
It therefore must be the first client run on a new
database.
- StudentMajors prints
a table listing the
names of students and their majors.
- FindMajors requires a command-line argument
denoting the name of a
department. The
program then prints the
name and graduation year of all students having that major.
- SQLInterpreter
repeatedly prints a prompt asking you to enter a single line of text
containing
an SQL statement. The
program then
executes that statement. If
the
statement is a query, the output table is displayed.
If the statement is an update command, then
the number of affected records is printed.
If the statement is ill formed, and error message will be
printed. SimpleDB
understands only a
limited subset of
SQL, which is described below.
- ChangeMajor changes Amy to be a drama major.
It is the only client that updates the database (although
you can use SQLInterpreter
to run update commands).
Running SimpleDB Clients Remotely
It is certainly convenient to run
SimpleDB clients on the
same machine as the server, but also unrealistic and somewhat
unsatisfying. To
run a client program
from a different machine, two things have to occur:
- The client machine must be properly
configured.
- The client program must be
modified.
A client machine does not need
every SimpleDB package
installed, because it will not be running any of the server code. Instead, it needs only the
packages simpledb.sql
and simpledb.remote. The
classpath of the client machine must also
be configured, the same as with the server machine.
Each client program provided with SimpleDB connects to the
database server located at “localhost”.
If
the client is to be run from a different machine than the server, then
its
source code must be modified so that localhost is
replaced by
the domain
name (or IP address) of the server machine.
SimpleDB SQL
SimpleDB implements a tiny subset of
standard SQL, and (for
simplicity) imposes restrictions not present in the SQL standard. Here we briefly indicate
what
these restrictions are.
A query in SimpleDB consists only of select-from-where
clauses in which the select clause
contains a list of
fieldnames (without the AS
keyword), and the from clause
contains a list of tablenames (without range variables).
The where clause
is optional. The only Boolean operator is and.
The only comparison operator is equality.
Unlike standard SQL, there are no other
comparison operators, no other Boolean operators, no arithmetic
operators or
built-in functions, and no parentheses.
Consequently, nested queries, aggregation, and computed
values
are not
supported.
Views can be created, but a view definition can be
at most 100
characters.
Because there are no range variables and no renaming, all
field names in a query must be disjoint.
And because there are no group by
or order by clauses, grouping and
sorting are not supported. Other
restrictions:
- The “*” abbreviation in
the select clause is not supported.
- There are no null
values.
- There
are no explicit joins or outer
joins in the from clause.
- The union
and except keywords are not
supported.
- Insert statements take
explicit values only, not queries.
- Update
statements can
have only one assignment in the set
clause.
Of course, the whole point of SimpleDB is to be as
restrictive as
possible; that way, there are that many more opportunities for you to
make changes!
Examining
the Code
The basic packages in SimpleDB are structured hierarchically,
in the
following order:
- file (Manages OS files as a virtual disk.)
- log (Manages the log.)
- buffer (Manages
a buffer pool of pages in memory that acts as a cache of disk
blocks.)
- tx (Implements transactions at the page
level. Does
locking and logging.)
- record (Implements
fixed-length records inside of pages.)
- metadata
(Maintains metadata in the system catalog.)
- query
(Implements relational algebra operations. Each
operation has a plan
class, used by the planner, and a scan
class, used
at runtime.)
- parse (Implements the parser.)
- planner (Implements a naive planner for SQL
statements.)
- remote (Implements the server using
RMI. This is where the JDBC driver code is.)
- server
(The place where the startup and initialization code live. The class Startup contains
the main
method.)
- sql (Not part of the
server. Implements client-side
functionality. A subset of the java.sql pckage.)
The basic server is exceptionally inefficient. The
following packages enable more efficient query processing.
- opt (Implements a heuristic query
optimizer)
- index (Implements static
hash and btree indexes, as well as
extensions to the parser and planner to take advantage of them.)
- materialize (Implements implementations of
the relational operators materialize,
sort, groupby, and mergejoin.)
- multibuffer (Implements modifications to the sort and product operators,
in order to make optimum use of available buffers.)
More details can be found in my text.