ECS 165B Spring 2010 - Database System Implementation

DavisDB Part 3: System Manager Component (due 5/11 at 11:59pm)

Introduction

The third component of DavisDB that you will implement is the Indexing component. This component provides a number of capabilities: These capabilities are exposed via a front-end command line shell interface that we have provided for you in a class called SystemParser. Your task in this part of the project is to implement the backing functionality, which is contained in a single class called SystemManager.

The Command-Line Shell

The capabilities you will implement in SystemManager are exposed via command-line shell, implemented in the provided SystemParser Once you have added the new files described in the logistics section to your CMakeLists.txt and run cmake . and make, the DavisDB executable will invoke the main method of the system parser:
quatchi:DavisDB green$ ./DavisDB
Welcome to the DavisDB command-line shell.
Type "help;" for help.
Typing help; yields the following:
help;

DavisDB command-line shell commands:

create database <dbName> ;
drop database <dbName> ;
open database <dbName> ;
close database ;
create table <tableName> (<attrName> <attrType>, ..., <attrName> <attrType>) ;
     where <attrType> is either float, int, or char(<length>)
drop table <tableName> ;
create index <tableName> (<attrName>) ;
drop index <tableName (<attrName>) ;
load <tableName> <fileName> ;
info ;
info <tableName> ;
print <tableName> ;
io print ;
io reset ;
help ;
quit ;

 => RC_OK
Note that semicolons are used to mark the end of a command. When a command is entered, a corresponding method on SystemManager is invoked to actually carry out the command:

Shell command SystemManager method Description
create database createDb Creates a new database in subdirectory <dbName>
drop database dropDb Deletes subdirectory <dbName> and its contents
open database openDb Opens the specified database
close database closeDb Closes the database
create table createTable Creates a table
drop table dropTable Drops a table
create index createIndex Creates an index on a table
drop index dropIndex Drops an index on a table
load load Bulk-loads the contents of a text file into a table
info info Lists the tables in the database
info <tableName> info(tableName) Lists the attribute names and types of a table
print print Prints the contents of a table
io print (none) Displays current page I/O statistics
io reset (none) Resets page I/O statistics
help (none) Displays the list of shell commands
quit (none) Exits the shell

For a fuller description of each of the SystemManager methods listed above, please read the online documentation carefully. As in Parts 1 and 2, you must not change any of the definitions of the public methods we have provided, as this will break our automated tests. However, you are free to add new methods to the classes (public or private) as you see fit.

The System Catalog

The internal schema information that describes the database relations, the format of their attributes, and which attributes have indices, is stored in special system-managed tables called catalogs. Like all relations, files for the catalog tables can and should be manipulated by the methods you implemented in Parts 1 and 2.

For the basic project, you will need only two catalogs: relations and attributes. Please use the pre-defined strings SystemManager::RELATION_CATALOG and SystemManager::ATTRIBUTE_CATALOG when referencing these, and do not change the names. These two catalog relations should be created when the "create database <dbName>" command is executed, and they should be stored in a subdirectory called <dbName> (each database has its own catalogs). You will create and manipulate these relations using the RecordManager and (perhaps) IndexManager component methods.

relations. The relations table is used to keep track of all relations in the database. There is one tuple in relations for each table. It is up to you to define the contents of relations, but examples of the kind of information you might choose to record in each tuple of the relations table include:

relName The name of the relation
recordSize The length of a record in bytes
nAttributes The number of attributes

You may wish to include additional information you find useful. For example, you might opt to maintain cardinality statistics in order to support more sophisticated optimization strategies in Part 4 of the project.

attributes. The attributes table is used to record the attributes of each table in the database, as well as information about which attributes are indexed. There is one tuple in attributes for each attribute of each table. Examples of the kind of information you might choose to include in each tuple of the attributes table are:

relName The name of the relation for the attribute
attrName The name of the attribute
attrNo The number of the attribute
attrType The type of the attribute
attrLength The length of the attribute in bytes
offset The offset in bytes from the beginning of the record
hasIndex Whether or not this attribute has a B+ tree index

As for relations, the actual information in attributes may vary from design to design, and you should include all information that you find useful.

You should enable the user to access the catalog tables using standard commands, as if the catalogs were regular database tables. To do so, you will need to insert descriptions for relation and attribute themselves into the tables when you create them. Users should not, however, be permitted to load tuples into or drop either of the system catalog tables.

Because the system catalogs are accessed very frequently, we suggest that you open the catalogs for a database when the database is opened, then keep the catalogs open until the database is closed. One effect of doing so is that updates to the catalogs may not be reflected onto disk immediately. Thus, if you open a catalog a second time (to implement the info command, for example, or to print the contents of a catalog), then you may not see the most current version of the catalog. One solution to this problem is to call RecordFileHandle::forceAllPages() each time a catalog is changed.

Guidelines and Suggestions

You will almost certainly find it convenient for this component (and for Part 4) to implement methods that obtain metadata information from relations and attributes for a specific relation or attribute. These methods are not included in the interface we have provided, as they depend on your organization of the metadata. However, it is recommended that you define and implement such methods now, both for clean coding in the SystemManager component and to save yourself extra work when you start Part 4 of the project.

Logistics and Testing

You will continue working on the same codebase used in Parts 1 and 2 of the project, using the same tools (cmake, suberversion, submit.sh, etc) described in the Record Manager logistics section. The new files needed for this part of the project - SystemParser.cpp, SystemParser.h, and SystemManager.h - will be placed in your team's subversion repository, along with a skeleton implementation SystemManager.cpp. You will need to update your CMakeLists.txt to reflect the addition of these new files.

In addition to the source files, you will be provided with a sample DavisDB "script", Boathouse.ddb, that exercises the basic functionality of the system manager. This can be "run" using

./DavisDB < Boathouse.ddb
You will also be provided with several sample data files for bulk loading: Sailors.csv, Boats.csv, and Reserves.csv. These files are used by Boathouse.ddb. Note that you will likely find using script files like Boathouse.ddb useful for debugging, as many debuggers do not support interactive console input very well. When using gdb, for example, you will need to use the command run params < Boathouse.ddb to run the program with the contents of the file read as input.

As usual, you will be expected to also create additional tests yourself, empahsizing corner cases and error handling.

Documentation

Again as in the earlier assignments, you will produce a short (1-2 page maximum), high-level description of your design and testing strategy in writeup.txt. Recall that the template file also asks you to answer two other questions.

Before you begin coding, please answer the first question in the text file, which asks you to estimate the total number of person-hours you think will be needed to implement your solution (including testing).

Before submitting your code using submit.sh 3, please answer the the second question, which asks how many person-hours of effort were actually required.

Submission and Grading

As stated on the main project page, the standard formula for grading project components is 80% for correctness (as verified manually and by automated testing), and 20% for the writeup and for code design and readability. For this part of the project, testing will necessarily be only semi-automated, as the output of some commands (like info;) is only loosely defined by the specification. For uniformity, please make sure you use SystemPrinter to print the contents of tables, as specified in the documentation for SystemManager.

Also, please remove or disable any printf debugging statements in your code before submitting, as these interfere with interpretation of the shell output. (At this point, you should be using the debugger nearly all of the time anyway!)

Acknowledgments

Major aspects of the DavisDB project are derived from the RedBase project developed by Jennifer Widom for use in CS 346 at Stanford, and used here with her permission.