ECS 165B Spring 2010 - Database System Implementation

Page File Component

Overview

The distribution you will receive for the first assignment includes code for the lowest-level component of DavisDB, the Page File Component. This component provides facilities for higher-level client components to perform file I/O at the granularity of pages, and manages the in-memory buffer pool of pages. Methods of the Page File classes can be used to create, open, close, and delete page files, to add and delete pages of a given file, to scan pages of a given file, and to obtain and release buffer pool pages for scratch use.

The C++ interface for this component is described in the online documentation; see in particular the entries for PageFileManager and FileHandle and references therein. The roles of these classes and of the overall component are described in more detail below.

The Buffer Pool

Accessing data on a page of a file requires first reading the page into a block of main memory taken from the buffer pool. While a page is in memory and its data is available for manipulation, the page is said to be pinned in the buffer pool. A pinned page remains in the buffer pool until it is explicitly unpinned. Unpinning a page does not necessarily cause the page to be removed from the buffer pool; an unpinned page is kept in memory as long as its space in the buffer pool is not needed. Since a given page may be accessed by several higher-level components at once, a pin count is maintained for each block in the buffer pool. Pinning a block increments its pin count, unpinning decrements the count, and a block is considered pinned iff its pin count is greater than zero.
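
The pin-count rule can be illustrated with a minimal sketch. The Block type and the helper function below are hypothetical stand-ins written for this example; they are not the DavisDB buffer pool types.

```cpp
#include <cassert>

// Hypothetical stand-in for one buffer pool block; not the DavisDB type.
struct Block {
    int pinCount = 0;  // number of outstanding pins on this block

    void pin() { ++pinCount; }

    void unpin() {
        assert(pinCount > 0 && "unpin without a matching pin");
        --pinCount;
    }

    // A block is pinned iff its pin count is greater than zero.
    bool isPinned() const { return pinCount > 0; }
};

// Two clients pin the same block; it stays pinned until both have unpinned.
inline bool pinnedAfterOneOfTwoUnpins() {
    Block b;
    b.pin();    // client A
    b.pin();    // client B
    b.unpin();  // client A is done
    return b.isPinned();  // client B still holds a pin
}
```

In this sketch an unbalanced unpin trips an assertion; how the real component reports that condition is described in the online documentation.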

If a new page is to be read from disk into memory and there are no free blocks left in the buffer pool, then the Page File component will choose an unpinned page to evict from the buffer pool in order to reuse its block. The Page File component uses a least-recently-used (LRU) replacement policy. When a page is evicted from the buffer pool, it is written back to disk iff the page is marked as dirty. Clients can also send explicit requests to force the contents of a particular page to be written to disk, or to force all dirty pages to disk.
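
The following is a rough sketch of LRU eviction with write-back of dirty pages, under simplifying assumptions: recency is updated only when a page is fetched, and "writing to disk" is modeled by recording the page number. The names LruPool, Frame, fetch, unpin, contains, and writtenBack are invented for this example and do not correspond to the DavisDB classes.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical bookkeeping for one buffered page; not the DavisDB layout.
struct Frame {
    int pageNo;
    int pinCount;
    bool dirty;
};

// Frames are ordered from least recently used (front) to most recently used (back).
class LruPool {
public:
    explicit LruPool(std::size_t capacity) : capacity_(capacity) {}

    // Brings pageNo into the pool and pins it once.
    // Returns false if the pool is full and every page in it is pinned.
    bool fetch(int pageNo) {
        for (auto it = frames_.begin(); it != frames_.end(); ++it) {
            if (it->pageNo == pageNo) {
                ++it->pinCount;
                frames_.splice(frames_.end(), frames_, it);  // now most recently used
                return true;
            }
        }
        if (frames_.size() == capacity_ && !evictOne()) return false;
        frames_.push_back(Frame{pageNo, 1, false});
        return true;
    }

    // Drops one pin on pageNo; the page stays dirty once it has been marked dirty.
    void unpin(int pageNo, bool dirty) {
        for (Frame& f : frames_) {
            if (f.pageNo == pageNo && f.pinCount > 0) {
                --f.pinCount;
                f.dirty = f.dirty || dirty;
                return;
            }
        }
    }

    bool contains(int pageNo) const {
        for (const Frame& f : frames_) {
            if (f.pageNo == pageNo) return true;
        }
        return false;
    }

    std::vector<int> writtenBack;  // page numbers "written to disk" on eviction

private:
    // Evicts the least recently used unpinned page, writing it back iff dirty.
    bool evictOne() {
        for (auto it = frames_.begin(); it != frames_.end(); ++it) {
            if (it->pinCount == 0) {
                if (it->dirty) writtenBack.push_back(it->pageNo);
                frames_.erase(it);
                return true;
            }
        }
        return false;  // every page is pinned
    }

    std::size_t capacity_;
    std::list<Frame> frames_;
};
```

With a capacity of two, fetching pages 1 and 2 and then requesting page 3 fails while both are pinned. After both are unpinned, the request succeeds by evicting page 1, which is written back only if it was marked dirty.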

It is crucial to remember to unpin pages as soon as they are no longer needed. Otherwise, the buffer pool will quickly run out of free blocks, at which point you will not be able to fetch any more pages at all. Note that a page should be unpinned even if the higher-level component thinks that it will be needed again in the near future. (If the page is used again soon, it will probably still be in the buffer pool anyway.)
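
One way to avoid forgetting an unpin, especially on error paths, is a small scope guard that runs the unpin action when it goes out of scope. This is only a suggested pattern, not part of the provided code. PinGuard and usePage are hypothetical names, and the integer counter stands in for a real pin and unpin call.

```cpp
#include <functional>
#include <utility>

// Hypothetical scope guard: runs the supplied unpin action once, on destruction.
class PinGuard {
public:
    explicit PinGuard(std::function<void()> unpin) : unpin_(std::move(unpin)) {}
    ~PinGuard() {
        if (unpin_) unpin_();
    }

    // Non-copyable, so the action cannot run twice.
    PinGuard(const PinGuard&) = delete;
    PinGuard& operator=(const PinGuard&) = delete;

private:
    std::function<void()> unpin_;
};

// Simulates work on a pinned page; the counter stands in for the page's pin count.
inline bool usePage(int& pinCount, bool fail) {
    ++pinCount;                                   // stand-in for pinning the page
    PinGuard guard([&pinCount] { --pinCount; });  // stand-in for unpinning it
    if (fail) return false;                       // the early exit still unpins
    return true;
}
```

Whether usePage succeeds or returns early, the counter is back to its starting value when the function returns.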

The standard page size in DavisDB is given by the constant PF_PAGE_SIZE=4096 and the standard buffer pool size in DavisDB is given by the constant PF_BUFFER_SIZE=40, both defined in Common.h. Please do not change the value of either constant.

Page Numbers

Pages in a file are identified by page numbers, which correspond to their physical offsets within the file on disk. When you initially create a file and allocate pages, page numbering will be sequential. Once some pages have been deleted, there may be gaps in the numbering. Subsequent page allocation will attempt to reuse the lowest-numbered free page. When you scan through a file by calling FileHandle::getFirstPage and FileHandle::getNextPage (see FileHandle), you will obtain pages in numeric order, skipping those pages that were allocated and then deallocated. Since numeric scan order is guaranteed, and because initial page ordering is sequential, it is possible for clients to implement a policy where the first one or more pages of a file are used for header information.
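
The numbering behavior described above can be modeled with a short sketch. PageNumbers is a hypothetical class written for illustration, not the FileHandle implementation, and it assumes that page numbers start at 0.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model of page-number bookkeeping for one file.
// Assumption: page numbers start at 0.
class PageNumbers {
public:
    // Allocates a page number, preferring the lowest-numbered free page.
    int allocate() {
        if (!free_.empty()) {
            int pageNo = *free_.begin();  // std::set keeps its elements sorted
            free_.erase(free_.begin());
            return pageNo;
        }
        return next_++;
    }

    // Marks a currently allocated page as free, leaving a gap in the numbering.
    void dispose(int pageNo) {
        assert(pageNo >= 0 && pageNo < next_ && free_.count(pageNo) == 0);
        free_.insert(pageNo);
    }

    // Returns the allocated page numbers in ascending order, skipping freed pages.
    std::vector<int> scan() const {
        std::vector<int> pages;
        for (int p = 0; p < next_; ++p) {
            if (free_.count(p) == 0) pages.push_back(p);
        }
        return pages;
    }

private:
    int next_ = 0;        // one past the highest page number ever allocated
    std::set<int> free_;  // freed page numbers
};
```

For example, allocating four pages yields 0, 1, 2, 3. After disposing of pages 1 and 2, a scan visits 0 and 3, and the next allocation returns 1. Because the first allocations are sequential, a client that allocates its header page first can rely on it being the first page returned by a scan.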

Scratch Pages

Your implementation will probably store and manipulate all of its data on pages associated with files. However, if you wish to implement more advanced algorithms that require storing and manipulating pages in temporary "scratch" memory, the buffer pool manager includes methods for doing so (see PageFileManager::allocateBlock and PageFileManager::disposeBlock). Implementing such advanced algorithms is not required, but may be attempted by teams aiming to win the DavisDB I/O Efficiency Contest.

Page File C++ Interface

The Page File component comprises two main classes, PageFileManager and FileHandle.

PageFileManager

This class handles the creation, deletion, opening, and closing of page files, the management of the buffer pool, and the allocation and disposal of scratch pages. Your program should create exactly one instance of this class, to be shared globally. The methods of the class are described in detail in the PageFileManager documentation. Note the use of ReturnCode for error codes. A code of RC_OK = 0 indicates success, while other ReturnCode values indicate various errors and exceptions.
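
A common way to work with such codes is to check each call and return the first failure to the caller. The sketch below declares its own stand-in ReturnCode so that it compiles alone; in your code the real definition comes from the DavisDB headers and may be declared differently. RC_HYPOTHETICAL_ERROR, stepThatSucceeds, stepThatFails, and runSteps are invented for this example.

```cpp
// Stand-in declaration so the sketch compiles alone. Only RC_OK = 0 is taken
// from the text; the error code below is hypothetical.
enum ReturnCode { RC_OK = 0, RC_HYPOTHETICAL_ERROR = 1 };

// Hypothetical operations standing in for Page File calls.
inline ReturnCode stepThatSucceeds() { return RC_OK; }
inline ReturnCode stepThatFails() { return RC_HYPOTHETICAL_ERROR; }

// Runs two steps in order and returns the first non-OK code, if any.
// stepsRun counts how many steps completed successfully.
inline ReturnCode runSteps(bool secondFails, int& stepsRun) {
    ReturnCode rc = stepThatSucceeds();
    if (rc != RC_OK) return rc;
    ++stepsRun;

    rc = secondFails ? stepThatFails() : stepThatSucceeds();
    if (rc != RC_OK) return rc;  // propagate the error to the caller
    ++stepsRun;

    return RC_OK;
}
```

When a call fails partway through an operation, remember to unpin any pages that were pinned earlier in that operation before returning the error.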

FileHandle

This class handles allocation, disposal, and modification of pages within a page file. The methods of the class are described in detail in the FileHandle documentation.

Acknowledgments

Major aspects of the DavisDB project are derived from the RedBase project developed by Jennifer Widom for use in CS 346 at Stanford, and used here with her permission.