Header page. As in Part 1, your index will be constructed using an underlying page files, and as was the case with record files, each index file will likely need a header page, on which you store information about the file as a whole and information needed to find the other pages of the index.
Nodes and pages. The order of the B+ tree is not specified a priori and is not exposed through the IndexManager interface. However, for efficiency, you will probably want to store one node per page, with the order of the tree determined by the size of the attribute being indexed and PF_PAGE_SIZE.
Store record ids, not records. You should not store the data of the record itself in the B+ tree leaves, only pointers to the records (i.e., record ids). How exactly this is done is up to you. Recall from lecture that the two basic alternatives are to store either (a) a (key, recordId) pair for each entry in the tree, or (b) a (key, recordIdList) pair for each key in the tree. The first option is more straightforward to implement, but the second is more space-efficient when duplicate keys are common.
Duplicates. You need to consider the possibility of many entries in the index having the same key. Your implementation should be able to handle these cases gracefully, even if there are so many entries with a given key that the record ids do not even fit on the same page. This has ramifications for both schemes (a) and (b) discussed above: for scheme (a), when a split is performed it may result in two leaf pages that do not have a "separating key", so you will need to use, e.g., a special key value indicating this; and for scheme (b), you will need to use overflow pages when the list of record ids gets too long. One way to avoid these headaches is to use attribute value, record id pairs as keys in the tree, rather than attribute value alone: by doing this, all entries in the tree will be guaranteed to have distinct keys, and schemes (a) and (b) become the same scheme. This also makes deleting entries from the tree faster (as lists of duplicates never need to be scanned). However, it also carries significant space overhead, hence will probably hurt your I/O performance in the bigger picture.
Deletions. Since implementing the textbook B+ tree deletion algorithm is rather tricky, you may implement with no loss of credit a simpler scheme involving tombstones. In this approach, when an entry is deleted it is replaced by a special marker indicating an empty slot (which may be reused later). Internal tree nodes are never deleted or merged. Extra credit: implement the textbook B+ tree deletion algorithm, using merging and redistribution. (Warning: we recommend you get things working with tombstones first even if you plan to tackle the textbook deletion algorithm.)
Updates during scans. Index scans will be used by higher-level components when executing selection, join, and delete operations, as well as update operations on attributes other than the indexed attribute. Thus, each index entry scanned will either be used to fetch (and possibly update) a record, or to delete a record. While making an index scan work correctly for selection, join, and non-index-key update operations is relatively straightforward, deletion operations are more complicated even when using tombstones. You must ensure that it is possible to use an index scan to find and then delete all index entries satisfying a condition.
You may assume that during a retrieval scan (for selection, join, and update operations) no index entries will be inserted or deleted. You also may assume that during a deletion scan, no other index records will be inserted or deleted, and no other scans will be underway.
Order of records returned by scans. Depending on your design, it may require some extra effort for scans to always return record ids such that the corresponding attribute values are in (nondecreasing) order. This property is not required in the project; however, you may find it useful later on in Part 4, e.g., if you decide to use index scans to produce a sorted relation.Before you begin coding, please answer the first question in the text file, which asks you to estimate the total number of person-hours you think will be needed to implement your solution (including testing).
Before submitting your code using submit.sh, please answer the the second question, which asks how many person-hours of effort were actually required.Details about obtaining the code distribution, using your team's subversion repository, and submitting your code are on the main project page.