ECS 165B Spring 2010 - Database System Implementation
Page File Component
Overview
The distribution you will receive for the first assignment includes
code for the lowest-level component of DavisDB, the Page File
Component. This component provides facilities for higher-level
client components to perform file I/O at the granularity of pages, and
manages the in-memory buffer pool of pages. Methods of the Page File
classes can be used to create, open, close, and delete page files,
to add and delete pages of a given file, to scan pages of a given
file, and to obtain and release buffer pool pages for scratch use.
The C++ interface for this component is described in the online
documentation; see in particular the entries for PageFileManager
and FileHandle and the references therein. The roles of these classes
and of the overall component are described in more detail below.
The Buffer Pool
Accessing data on a page of a file requires first reading the page
into a block of main memory from the buffer pool. While a page
is in memory and its data is available for manipulation, the page is
said to be pinned in the buffer pool. A pinned page remains in
the buffer pool until it is explicitly unpinned. Unpinning a
page does not necessarily cause the page to be removed from the
buffer; an unpinned page is kept in memory as long as its space in the
buffer pool is not needed. Since a given page may be accessed by
several higher-level components at once, a pin count is
maintained for each block in the buffer pool. Thus pinning a
block increments its pin count, unpinning decrements the count,
and a block is considered pinned iff its pin count is greater than zero.
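The pin-count rule can be illustrated with a minimal sketch. The Block struct and its member names are hypothetical and are not part of the DavisDB distribution; the sketch models only the counting behavior described above.

```cpp
// Hypothetical model of one buffer pool block (not the DavisDB class).
struct Block {
    int pinCount = 0;  // number of outstanding pins on this block

    // A block is pinned iff its pin count is greater than zero.
    bool isPinned() const { return pinCount > 0; }

    // Each pin request increments the count.
    void pin() { ++pinCount; }

    // Each unpin request decrements the count.
    // Returns false (and changes nothing) if the block was not pinned.
    bool unpin() {
        if (pinCount == 0) return false;
        --pinCount;
        return true;
    }
};
```

With two clients pinning the same block, the block stays pinned until both have unpinned it.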
If a new page is to be read from disk into memory and there are
no free blocks left in the buffer pool, then the Page File component
will choose an unpinned page to evict from the buffer pool in order to
reuse its block. The Page File component uses a least-recently-used
(LRU) replacement policy. When a page is evicted
from the buffer pool, it is written back to disk iff the page is
marked as dirty. Clients can also send explicit requests to
force the contents of a particular page to be written to disk, or to
force all dirty pages to disk.
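The eviction behavior described above might be modeled roughly as follows. This is a toy sketch, not the DavisDB implementation: ToyPool, Frame, and all method names are hypothetical, no real disk I/O is performed (write-backs are only recorded), and the sketch assumes that recency is updated when a page is fetched, which the real component may do differently.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <vector>

// Toy frame: which page it holds, its pin count, and its dirty flag.
struct Frame {
    int pageNum;
    int pinCount;
    bool dirty;
};

class ToyPool {
public:
    explicit ToyPool(std::size_t capacity) : capacity_(capacity) {}

    // Pins pageNum, loading it if necessary. Returns false when the pool
    // is full and every resident page is pinned.
    bool fetch(int pageNum) {
        auto it = index_.find(pageNum);
        if (it != index_.end()) {
            // Already resident: move to the most-recently-used end and pin.
            lru_.splice(lru_.end(), lru_, it->second);
            ++it->second->pinCount;
            return true;
        }
        if (lru_.size() == capacity_ && !evictOne()) return false;
        lru_.push_back(Frame{pageNum, 1, false});
        index_[pageNum] = std::prev(lru_.end());
        return true;
    }

    void markDirty(int pageNum) { index_.at(pageNum)->dirty = true; }

    void unpin(int pageNum) {
        Frame& f = *index_.at(pageNum);
        if (f.pinCount > 0) --f.pinCount;
    }

    bool contains(int pageNum) const { return index_.count(pageNum) != 0; }

    // Page numbers that were "written to disk" on eviction, in order.
    const std::vector<int>& writtenBack() const { return written_; }

private:
    // Evicts the least recently used unpinned page, if there is one.
    bool evictOne() {
        for (auto it = lru_.begin(); it != lru_.end(); ++it) {
            if (it->pinCount == 0) {
                if (it->dirty) written_.push_back(it->pageNum);  // write back iff dirty
                index_.erase(it->pageNum);
                lru_.erase(it);
                return true;
            }
        }
        return false;
    }

    std::size_t capacity_;
    std::list<Frame> lru_;  // front = least recently used
    std::unordered_map<int, std::list<Frame>::iterator> index_;
    std::vector<int> written_;
};
```

In this model, a fetch fails only when every frame is pinned, which is the situation the next paragraph warns about.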
It is crucial to remember to unpin pages as soon as they are
no longer needed. Otherwise, the buffer pool will quickly run
out of free pages, at which point you will not be able to fetch any
more pages at all. Note that a page should be unpinned even if the
higher-level component thinks that it will be needed again in the near
future. (If the page is used again soon then it will probably still
be in the buffer pool anyway.)
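One way to avoid forgetting an unpin is to tie it to scope exit. The PinGuard class below is a hypothetical helper, not part of the DavisDB distribution; it takes the unpin action as a callback because the exact unpin call of the Page File interface is not reproduced here.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: runs the supplied "unpin" action when the guard
// goes out of scope, including on early returns.
class PinGuard {
public:
    explicit PinGuard(std::function<void()> unpin) : unpin_(std::move(unpin)) {}
    ~PinGuard() {
        if (unpin_) unpin_();
    }

    // Non-copyable, so the unpin action runs exactly once.
    PinGuard(const PinGuard&) = delete;
    PinGuard& operator=(const PinGuard&) = delete;

private:
    std::function<void()> unpin_;
};
```

A client would construct the guard immediately after a successful pin, passing a lambda that performs the matching unpin.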
The standard page size in DavisDB is given by the
constant PF_PAGE_SIZE=4096 and the standard buffer pool
size in DavisDB is given by the constant PF_BUFFER_SIZE=40,
both defined in Common.h. Please
do not change the value of either constant.
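As a quick check of what these values imply, the buffer pool holds 40 pages of 4096 bytes each, or 160 KiB in total, so at most 40 pages can be pinned at any one time. The declarations below are local copies for illustration only; their types are an assumption and may differ from those in Common.h.

```cpp
#include <cstddef>

// Local copies for illustration; in DavisDB these come from Common.h
// (the types used here are an assumption).
constexpr std::size_t PF_PAGE_SIZE = 4096;   // bytes per page
constexpr std::size_t PF_BUFFER_SIZE = 40;   // pages in the buffer pool

// Total memory occupied by page data in the buffer pool.
constexpr std::size_t poolBytes = PF_PAGE_SIZE * PF_BUFFER_SIZE;
static_assert(poolBytes == 163840, "40 pages of 4096 bytes = 160 KiB");
```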
Page Numbers
Pages in a file are identified by page numbers, which
correspond to their physical offsets within the file on disk. When
you initially create a file and allocate pages, page numbering will be
sequential. Once some pages have been deleted, there may be gaps in
the numbering. Subsequent page allocation will attempt to reuse the
lowest-numbered free page.
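The numbering behavior can be sketched with a toy allocator. ToyPageAllocator and its methods are hypothetical, the sketch assumes that page numbers start at 0, and it does not check that a disposed page was actually allocated.

```cpp
#include <set>

// Toy model of page-number allocation (not DavisDB code).
class ToyPageAllocator {
public:
    // Reuses the lowest-numbered free page if one exists;
    // otherwise extends the file with the next sequential number.
    int allocate() {
        if (!free_.empty()) {
            int pageNum = *free_.begin();  // std::set iterates in ascending order
            free_.erase(free_.begin());
            return pageNum;
        }
        return next_++;
    }

    // Marks a page as free, leaving a gap in the numbering.
    void dispose(int pageNum) { free_.insert(pageNum); }

private:
    std::set<int> free_;  // page numbers available for reuse
    int next_ = 0;        // assumption: numbering starts at 0
};
```

After allocating pages 0 through 3 and disposing pages 2 and 1, the next three allocations in this model return 1, 2, and 4.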
When you scan through a file by calling FileHandle::getFirstPage and
FileHandle::getNextPage
(see FileHandle), you will
obtain pages in numeric order, skipping those pages that were
allocated and then de-allocated. Since numeric scan order is
guaranteed, and because initial page ordering is sequential, it is
possible for clients to implement a policy where the first one or more
pages of a file are used for header information.
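The header-page policy can be sketched against a toy stand-in for a page file. ToyFile, firstPage, nextPage, NO_PAGE, and dataPages are all hypothetical names; the real FileHandle::getFirstPage and FileHandle::getNextPage have their own signatures, described in the FileHandle documentation.

```cpp
#include <set>
#include <vector>

const int NO_PAGE = -1;  // hypothetical "no more pages" sentinel

// Toy stand-in for a page file: it only tracks which page numbers are live.
struct ToyFile {
    std::set<int> live;

    // Lowest-numbered live page, or NO_PAGE if the file is empty.
    int firstPage() const { return live.empty() ? NO_PAGE : *live.begin(); }

    // Next live page after `current` in numeric order, skipping gaps.
    int nextPage(int current) const {
        std::set<int>::const_iterator it = live.upper_bound(current);
        return it == live.end() ? NO_PAGE : *it;
    }
};

// Treats the first page in scan order as the header and
// returns the remaining (data) pages in numeric order.
std::vector<int> dataPages(const ToyFile& f) {
    std::vector<int> out;
    int header = f.firstPage();
    if (header == NO_PAGE) return out;
    for (int p = f.nextPage(header); p != NO_PAGE; p = f.nextPage(p)) {
        out.push_back(p);
    }
    return out;
}
```

This policy works only as long as the header page itself is never de-allocated; otherwise the first page returned by the scan would be a data page.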
Scratch Pages
Your implementation will probably store and manipulate all of its data
on pages associated with files. However, if you wish to implement
more advanced algorithms that require storing and manipulating pages
in temporary "scratch" memory, the buffer pool manager includes
methods for doing so
(see PageFileManager::allocateBlock
and PageFileManager::disposeBlock).
Implementing such advanced algorithms is not required, but may be
attempted by teams aiming to win
the DavisDB I/O Efficiency
Contest.
Page File C++ Interface
The Page File component comprises two main classes,
PageFileManager
and FileHandle.
PageFileManager
This class handles the creation, deletion, opening, and closing of
page files, the management of the buffer pool, and the allocation and
disposal of scratch pages. Your program should create exactly one
instance of this class, to be shared globally. The methods of the
class are described in detail in the
PageFileManager
documentation. Note the use
of ReturnCode for error
codes. A code of RC_OK = 0 indicates success, while
other ReturnCode values indicate various errors and
exceptions.
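A common way to work with such return codes is to stop at the first failure and pass the code up to the caller. The sketch below shows that pattern under stated assumptions: ReturnCode and RC_OK are redeclared locally so that the example compiles on its own (in DavisDB they come from the project headers, and ReturnCode may not be a plain int), while RC_TOY_FAILURE, RETURN_IF_ERROR, and the step functions are hypothetical.

```cpp
// Local stand-ins so the sketch is self-contained; the real definitions
// live in the DavisDB headers and may differ in type.
typedef int ReturnCode;
const ReturnCode RC_OK = 0;
const ReturnCode RC_TOY_FAILURE = 1;  // hypothetical error code

// Hypothetical helper: evaluate expr once and return early on any error.
#define RETURN_IF_ERROR(expr)            \
    do {                                 \
        ReturnCode rc_ = (expr);         \
        if (rc_ != RC_OK) return rc_;    \
    } while (0)

// Hypothetical operations standing in for Page File method calls.
ReturnCode stepThatSucceeds(int& counter) { ++counter; return RC_OK; }
ReturnCode stepThatFails(int&) { return RC_TOY_FAILURE; }

// Runs the steps in order; the third step is never reached because
// the second one fails and its code is propagated to the caller.
ReturnCode runSteps(int& counter) {
    RETURN_IF_ERROR(stepThatSucceeds(counter));
    RETURN_IF_ERROR(stepThatFails(counter));
    RETURN_IF_ERROR(stepThatSucceeds(counter));
    return RC_OK;
}
```

When returning early like this, remember that any pages pinned before the failure must still be unpinned.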
FileHandle
This class handles allocation, disposal, and modification of pages
within a page file. The methods of the class are described in detail in
the FileHandle
documentation.
Acknowledgments
Major aspects of the DavisDB project are derived from the RedBase
project developed by Jennifer Widom for use in CS 346 at Stanford,
and are used here with her permission.