ECS 165B Spring 2011 - Database System Implementation
DavisDB Part 3: System Manager Component (due Sunday, 5/15 at 11:59pm)
Introduction
The third component of DavisDB that you will implement is the
Indexing component. This component provides a number of
capabilities:
- Create, destroy, open, and close databases;
- Create, destroy, bulk-load, and print tables
- Create and destroy B+ tree indices on tables
- Maintain metadata about the tables and indices in the database
in system catalog tables
These capabilities are exposed via a front-end command line shell
interface that we have provided for you in a class
called SystemParser.
Your task in this part of the project is to implement the backing
functionality, which is contained in a single class
called SystemManager.
The Command-Line Shell
The capabilities you will implement
in SystemManager are
exposed via command-line shell, implemented in the
provided SystemParser
Once you have added the new files described in
the logistics section to
your CMakeLists.txt and run cmake .
and make, the DavisDB executable will invoke
the main method of the system parser:
quatchi:DavisDB green$ ./DavisDB
Welcome to the DavisDB command-line shell.
Type "help;" for help.
Typing help; yields the following:
help;
DavisDB command-line shell commands:
create database <dbName> ;
drop database <dbName> ;
open database <dbName> ;
close database ;
create table <tableName> (<attrName> <attrType>, ..., <attrName> <attrType>) ;
where <attrType> is either float, int, or char(<length>)
drop table <tableName> ;
create index <tableName> (<attrName>) ;
drop index <tableName (<attrName>) ;
load <tableName> <fileName> ;
info ;
info <tableName> ;
print <tableName> ;
io print ;
io reset ;
help ;
quit ;
=> RC_OK
Note that semicolons are used to mark the end of a command. When a
command is entered, a corresponding method on SystemManager
is invoked to actually carry out the command:
Shell command |
SystemManager method |
Description |
create database |
createDb |
Creates a
new database in subdirectory <dbName> |
drop database |
dropDb |
Deletes subdirectory <dbName> and its
contents |
open database |
openDb |
Opens the specified database |
close database |
closeDb |
Closes the database |
create table |
createTable |
Creates a table |
drop table |
dropTable |
Drops a table |
create index |
createIndex |
Creates an index on a table |
drop index |
dropIndex |
Drops an index on a table |
load |
load |
Bulk-loads the contents of a text file into a table |
info |
info |
Lists the tables in the database |
info <tableName> |
info(tableName) |
Lists the attribute names and types of a table |
print |
print |
Prints the contents of a table |
io print |
(none) |
Displays current page I/O statistics |
io reset |
(none) |
Resets page I/O statistics |
help |
(none) |
Displays the list of shell commands |
quit |
(none) |
Exits the shell |
For a fuller description of each of
the SystemManager
methods listed above, please read the online
documentation carefully. As in Parts 1 and 2, you must not change
any of the definitions of the public methods we have provided, as this
will break our automated tests. However, you are free to add new
methods to the classes (public or private) as you see fit.
The System Catalog
The internal schema information that describes the database relations,
the format of their attributes, and which attributes have indices, is
stored in special system-managed tables called catalogs.
Like all relations, files for the catalog tables can and should be
manipulated by the methods you implemented in Parts 1 and 2.
For the basic project, you will need only two
catalogs: relations and attributes. Please
use the pre-defined strings SystemManager::RELATION_CATALOG
and SystemManager::ATTRIBUTE_CATALOG when referencing these,
and do not change the names. These two catalog relations should be
created when the "create database <dbName>" command is
executed, and they should be stored in a subdirectory
called <dbName> (each database has its own catalogs).
You will create and manipulate these relations using
the RecordManager and (perhaps) IndexManager component methods.
relations. The relations table is used
to keep track of all relations in the database. There is one tuple
in relations for each table. It is up to you to define the
contents of relations, but examples of the kind of
information you might choose to record in each tuple of
the relations table include:
relName |
The name of the relation |
recordSize |
The length of a record in bytes |
nAttributes |
The number of attributes |
You may wish to include additional information you find useful.
For example, you might opt to maintain cardinality statistics in order
to support more sophisticated optimization strategies in Part 4 of the
project.
attributes. The attributes table is used
to record the attributes of each table in the database, as well as
information about which attributes are indexed. There is one tuple
in attributes for each attribute of each table. Examples of
the kind of information you might choose to include in each tuple of
the attributes table are:
relName |
The name of the relation for the attribute |
attrName |
The name of the attribute |
attrNo |
The number of the attribute |
attrType |
The type of the attribute |
attrLength |
The length of the attribute in bytes |
offset |
The offset in bytes from the beginning of the record |
hasIndex |
Whether or not this attribute has a B+ tree index |
As for relations, the actual information
in attributes may vary from design to design, and you should
include all information that you find useful.
You should enable the user to access the catalog tables using
standard commands, as if the catalogs were regular database tables.
To do so, you will need to insert descriptions for relation
and attribute themselves into the tables when you create
them. Users should not, however, be permitted to load tuples into or
drop either of the system catalog tables.
Because the system catalogs are accessed very frequently, we
suggest that you open the catalogs for a database when the database is
opened, then keep the catalogs open until the database is closed. One
effect of doing so is that updates to the catalogs may not be
reflected onto disk immediately. Thus, if you open a catalog a second
time (to implement the info command, for example, or
to print the contents of a catalog), then you may not see the
most current version of the catalog. One solution to this problem is
to call RecordFileHandle::forceAllPages() each time a catalog
is changed.
Guidelines and Suggestions
You will almost certainly find it convenient for this component (and
for Part 4) to implement methods that obtain metadata information
from relations and attributes for a specific
relation or attribute. These methods are not included in the interface
we have provided, as they depend on your organization of the
metadata. However, it is recommended that you define and implement
such methods now, both for clean coding in the SystemManager
component and to save yourself extra work when you start Part 4 of the
project.
Logistics and Testing
You will continue working on the same codebase used in Parts 1 and 2
of the project, using the same tools (cmake, suberversion,
submit.sh, etc) described in the Record Manager logistics
section. The new files needed for this part of the project -
DavisDB.cpp, SystemParser.cpp,
SystemParser.h, SystemPrinter.cpp,
SystemPrinter.h, and SystemManager.h - will be
placed in your team's subversion repository, along with a skeleton
implementation SystemManager.cpp. You will need to update
your CMakeLists.txt to reflect the addition of these new
files by adding
SystemManager.cpp
SystemManager.h
SystemParser.cpp
SystemParser.h
SystemPrinter.cpp
SystemPrinter.h
to the list of ddblib files, and by modifying the line
set (PROGRAMS TestPF TestRM TestIX)
to build the front-end shell
set (PROGRAMS TestPF TestRM TestIX DavisDB)
In addition to the source files, you will be provided with a
sample DavisDB "script", Boathouse.ddb, that exercises the
basic functionality of the system manager. This can be "run"
using ./DavisDB < Boathouse.ddb
You will also be
provided with several sample data files for bulk
loading: Sailors.csv, Boats.csv,
and Reserves.csv. These files are used
by Boathouse.ddb. Note that you will likely find using script files
like Boathouse.ddb useful for debugging, as many debuggers
do not support interactive console input very well. When
using gdb, for example, you will need to use the command run
params < Boathouse.ddb to run the program with the contents of
the file read as input.
As usual, you will be expected to also create additional tests
yourself, empahsizing corner cases and error handling.
Documentation
Again as in the earlier assignments, you will produce a short (1-2
page maximum), high-level description of your design and testing
strategy in writeup.txt. Recall that the template file also
asks you to answer two other questions.
Before you begin coding, please answer the first question in
the text file, which asks you to estimate the total number of
person-hours you think will be needed to implement your solution
(including testing).
Before submitting your code using submit.sh 3,
please answer the the second question, which asks how many
person-hours of effort were actually required.
Submission and Grading
As stated on the main project page, the
standard formula for grading project components is 80% for correctness
(as verified manually and by automated testing), and 20% for the
writeup and for code design and readability. For this part of the
project, testing will necessarily be only semi-automated, as the
output of some commands (like info;) is only loosely defined
by the specification. For uniformity, please make sure you
use SystemPrinter to
print the contents of tables, as specified in the documentation
for SystemManager.
Also, please remove or disable any printf debugging
statements in your code before submitting, as these interfere
with interpretation of the shell output. (At this point, you should
be using the debugger nearly all of the time anyway!)
Acknowledgments
Major aspects of the DavisDB project are derived from
the RedBase
project developed
by Jennifer Widom for
use in CS 346
at Stanford, and used here with her permission.