ECS 165B Spring 2010 - Database System Implementation

DavisDB Part 1: Record Management Component (due 4/18 at 11:59pm)

Introduction

The first component of DavisDB that you will implement is the Record Management component. This component provides classes and methods for managing files of unordered records (also known as heap files). In the architecture of the DavisDB system, it sits atop the Page File Component that we have provided for you. Your Record Management component will store records in paged files provided by the PageFileManager. To manage the file contents, you will probably want to use the first page of each file as a special header page. This page should contain free space information, as well as whatever other metadata you find useful in your implementation. You must decide exactly how records will be laid out on pages. Your design task is simplified by the fact that each file will contain only fixed-sized records (although record size may differ across files). More detailed implementation suggestions are given below.

Logistics

Code distribution. The code for the Page File component, and the interfaces and skeleton files for the Record Manager component, are given to you in your team's subversion repository. See the main project page for details on how to access your repository. Files whose names begin with "Record" form a skeleton version of the Record Manager component implementation that you must flesh out (see interfaces below). The other source and header files form the Page File component. We have used a Java-style convention for file names, where the file name is the same as the class name, so it should be easy for you to guess which file does what. A few of the files, such as AllocationPage.cpp, correspond to classes used internally by the Page File component, and can be ignored. The distribution also includes a file Test.cpp that contains some sample tests.

cmake. We will use cmake to generate a Linux makefile to compile your code. cmake is installed on the CSIF lab machines (and is a free download for use at home). The code distribution includes a cmake configuration file called CMakeLists.txt. When adding new source files and headers to your project, please make sure to keep this file updated. To generate a Linux makefile, simply run "cmake ." from the command line. cmake can also generate a number of other makefile and project formats, such as Eclipse projects; see the online documentation for details.

Subversion. Team members will coordinate their efforts, and submit their code, via a dedicated subversion repository for that team. See the main project page for details.

Directory structure. Some teams may feel moved to introduce a nested directory structure to their codebase, e.g., to keep the source files for separate components in their own sub-directories. However, please do not do this, even if you feel good engineering practice dictates otherwise :) Our automated tests assume that all source files and headers are in the same directory.

Submission and building. When submitting your code, you need to make sure that it builds from the command line on the CSIF machines. You will not receive any credit if your submission does not build! We recommend that you perform your project submission from the CSIF machines; as a sanity check, the submission script, submit.sh, will automatically launch a test build after submitting your code and will warn you if the build fails.

Record Manager C++ Interface

The Record Manager component consists of four main classes: RecordManager, RecordFileHandle, RecordFileScan, and Record. The online documentation describes the public methods and roles of each class in detail; please read it carefully! The documentation describes what you need to implement for each class. You must not change any of the definitions of the public methods we have provided (nor the definitions of their accompanying data types such as RecordID), as this will break our automated tests. However, you are free to add new methods to the classes (public or private) as you see fit.

Documentation

As part of your submission, you will produce a short (1-2 page), high-level description of your design in a plain text file writeup.txt. A template file is provided as part of the code distribution you will be given. Note that the template file also asks you to answer several other questions.

Before you begin coding, please answer the first question in the text file, which asks you to estimate the total number of person-hours you think will be needed to implement your solution (including testing).

Before submitting your code using submit.sh, please answer the second question, which asks how many person-hours of effort were actually required.

The purpose of these questions is to give you some practice at the difficult task of software project estimation, to force you to think a little bit about your design and testing strategies before you plunge into implementation, and to gather useful information about the effort required for this class that we will use to tune the difficulty of assignments in subsequent years.

Guidelines and Suggestions

As mentioned already, you must not modify the public methods of the classes that we have provided (although you may add new public methods). Beyond this basic requirement, there is a fair bit of leeway in how to actually carry out the implementation. Below are some suggestions.

File and page layout. Each record file will likely need one or more header pages, on which you store information about the file as a whole, followed by a collection of data pages. Each data page will likely also contain some header information in addition to records.
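To make the layout discussion concrete, here is a minimal sketch of one possible design. It is not part of the provided code: the names FileHeader, DataPageHeader, and slotsPerPage, and the 4096-byte page size used in the comments, are all assumptions made for illustration. It assumes each data page holds a small header, a bitmap with one bit per record slot, and then the fixed-size slots themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical file header kept on the first page of the file.
struct FileHeader {
    std::int32_t recordSize;      // bytes per record (fixed within one file)
    std::int32_t recordsPerPage;  // number of slots on each data page
    std::int32_t numPages;        // pages in the file, header page included
    std::int32_t firstFreePage;   // first data page with a free slot, -1 if none
};

// Hypothetical header kept at the start of every data page.
struct DataPageHeader {
    std::int32_t nextFreePage;  // next page with free space, -1 if none
    std::int32_t numRecords;    // number of occupied slots on this page
};

// How many fixed-size slots fit on a page that also stores a
// DataPageHeader and an occupancy bitmap (one bit per slot)?
// We need n * recordSize + ceil(n / 8) <= pageSize - sizeof(DataPageHeader).
inline std::size_t slotsPerPage(std::size_t pageSize, std::size_t recordSize) {
    if (recordSize == 0 || pageSize <= sizeof(DataPageHeader)) {
        return 0;
    }
    const std::size_t avail = pageSize - sizeof(DataPageHeader);
    // Each slot costs recordSize bytes plus one bit, so start from this bound.
    std::size_t n = (avail * 8) / (recordSize * 8 + 1);
    // Rounding the bitmap up to whole bytes can cost at most one more slot.
    while (n > 0 && n * recordSize + (n + 7) / 8 > avail) {
        --n;
    }
    return n;
}
```

With an assumed 4096-byte page, 100-byte records give 40 slots per page under this layout. Because record size is fixed per file, this number can be computed once when the file is created and stored in the file header.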

File header management. Your implementation of RecordManager::openFile will call PageFileManager::openFile to actually open the file. You will probably also want to read the file's header information at that time, such as the record size and the number of pages in the file, and copy it into your RecordFileHandle object as class data. Once you have read the header information, you can unpin the header page to free its block in the buffer pool. If the header information changes later, it must eventually be written back to the file, at the latest when the file is closed.
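The copy-then-write-back pattern described above might look roughly like the following sketch. It deliberately avoids the Page File component's API: HeaderFields and CachedHeader are hypothetical names, and the char buffer stands in for the contents of the pinned header page, however your code obtains it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical header contents; store whatever your design needs.
struct HeaderFields {
    std::int32_t recordSize;
    std::int32_t numPages;
    std::int32_t firstFreePage;
};

// In-memory copy of the file header, e.g. a member of your file handle.
class CachedHeader {
public:
    // Copy the fields out of the pinned header page. After this call the
    // page itself is no longer needed and can be unpinned.
    void load(const char* pageData) {
        std::memcpy(&fields_, pageData, sizeof(fields_));
        dirty_ = false;
    }

    const HeaderFields& fields() const { return fields_; }
    bool isDirty() const { return dirty_; }

    // Every mutator records that the cached copy differs from the file.
    void setNumPages(std::int32_t n) { fields_.numPages = n; dirty_ = true; }
    void setFirstFreePage(std::int32_t p) { fields_.firstFreePage = p; dirty_ = true; }

    // Copy the fields back into the (re-pinned) header page, but only if
    // they changed. Returns true when the page contents were modified, so
    // the caller knows the page must be written back to the file.
    bool store(char* pageData) {
        if (!dirty_) {
            return false;
        }
        std::memcpy(pageData, &fields_, sizeof(fields_));
        dirty_ = false;
        return true;
    }

private:
    HeaderFields fields_{};
    bool dirty_ = false;
};
```

Tracking a dirty flag means that closing a file that was only read does not touch the header page at all.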

Record identifiers. The RecordID class defines unique identifiers for records within a given file. Record identifiers will serve as tuple identifiers for higher-level DavisDB components. Thus, the identifier for a given record should be permanent, and should not change if the record is updated, or when other records are inserted or deleted.
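One way to keep identifiers permanent is to never move a record once it has been placed in a slot. The sketch below illustrates the idea on a single in-memory page; SlottedPage and its methods are hypothetical and are not the provided RecordID or RecordFileHandle classes. The assumption is that an identifier is essentially a (page number, slot number) pair, so only the slot number matters within one page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory model of one data page with fixed-size slots.
class SlottedPage {
public:
    SlottedPage(std::size_t recordSize, std::size_t numSlots)
        : recordSize_(recordSize),
          used_(numSlots, false),
          data_(recordSize * numSlots) {}

    // Place the record in the first free slot; returns the slot number,
    // or -1 if the page is full.
    int insert(const char* record) {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) {
                std::memcpy(&data_[i * recordSize_], record, recordSize_);
                used_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Deleting only clears the occupancy flag; no other record moves,
    // so every other slot number stays valid.
    bool remove(int slot) {
        if (!isUsed(slot)) {
            return false;
        }
        used_[static_cast<std::size_t>(slot)] = false;
        return true;
    }

    // Returns a pointer to the record bytes, or nullptr for an empty or
    // out-of-range slot. The byte offset is simply slot * recordSize.
    const char* get(int slot) const {
        if (!isUsed(slot)) {
            return nullptr;
        }
        return &data_[static_cast<std::size_t>(slot) * recordSize_];
    }

private:
    bool isUsed(int slot) const {
        return slot >= 0 && static_cast<std::size_t>(slot) < used_.size() &&
               used_[static_cast<std::size_t>(slot)];
    }

    std::size_t recordSize_;
    std::vector<bool> used_;   // stands in for an on-page bitmap
    std::vector<char> data_;   // stands in for the slot area of the page
};
```

Note that a freed slot may be reused by a later insert, so an identifier stays valid only for as long as its record exists.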

Keeping track of free space. When inserting records, you are strongly discouraged from performing a linear search through pages in order to find a page with free space. Rather, you should adopt an approach like one of those described in lecture. You should also take care to avoid placing a limit on the total number of records that can be stored in a file: each file should be allowed to grow arbitrarily large. (Of course, there is an implicit limit given by the range of the int data type used for record and page numbers, but it should be possible to change this type - say, to a long - without otherwise changing your code.)
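As an illustration only, here is a sketch of one common technique, a linked list of pages that still have free slots. It may or may not match the approaches presented in lecture. The file header would store the head of the list and each data page header the next pointer; here both are simulated in memory. FreePageList, PageNum, and kNoPage are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Using an alias keeps the page-number type easy to widen later.
using PageNum = int;
constexpr PageNum kNoPage = -1;

// Hypothetical in-memory model of a free-page list.
class FreePageList {
public:
    explicit FreePageList(int slotsPerPage) : slotsPerPage_(slotsPerPage) {
        assert(slotsPerPage_ > 0);
    }

    // Pick a page for the next insert in constant time: use the head of
    // the free list, or grow the file by one page if the list is empty.
    PageNum allocateSlot() {
        if (firstFree_ == kNoPage) {
            pages_.push_back(Page{0, kNoPage});
            firstFree_ = static_cast<PageNum>(pages_.size()) - 1;
        }
        const PageNum p = firstFree_;
        Page& page = pages_[static_cast<std::size_t>(p)];
        if (++page.used == slotsPerPage_) {
            // The page just became full: unlink it from the list.
            firstFree_ = page.nextFree;
            page.nextFree = kNoPage;
        }
        return p;
    }

    // Record that one slot on page p was freed by a delete.
    void releaseSlot(PageNum p) {
        Page& page = pages_[static_cast<std::size_t>(p)];
        assert(page.used > 0);
        if (page.used-- == slotsPerPage_) {
            // The page was full and now has space: push it on the list.
            page.nextFree = firstFree_;
            firstFree_ = p;
        }
    }

    std::size_t pageCount() const { return pages_.size(); }

private:
    struct Page {
        int used;          // occupied slots on this page
        PageNum nextFree;  // next page with free space, or kNoPage
    };

    int slotsPerPage_;
    PageNum firstFree_ = kNoPage;  // would live in the file header
    std::vector<Page> pages_;      // stands in for the data pages
};
```

Both operations touch only the list head and one page, so the cost of finding space does not grow with the size of the file, and nothing in the structure caps the number of pages.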

Grading

As stated on the main project page, the standard formula for grading project components is 80% for correctness (as verified by automated testing), and 20% for the writeup and for code design and readability.

Testing and Submission

We have provided a basic set of tests in the file Test.cpp. These are just meant to get you started, and an important part of the project involves developing your own thorough tests. The grading will be done using a more comprehensive version of these basic tests. Beware, we will try hard to break your code!

Details about obtaining the code distribution, using your team's subversion repository, and submitting your code are on the main project page.

Acknowledgments

Major aspects of the DavisDB project are derived from the RedBase project developed by Jennifer Widom for use in CS 346 at Stanford, and used here with her permission.