Patrice Koehl
Department of Computer Science
Genome Center
Room 4319, Genome Center, GBSF
451 East Health Sciences Drive
University of California
Davis, CA 95616
Phone: (530) 754 5121
koehl@cs.ucdavis.edu




AIX008: Introduction to Data Science: Summer 2022


Data exploration


Data exploration is the initial step in any data science project, where users explore a data set to uncover possible initial patterns, characteristics, and points of interest. This process is not meant to be exhaustive, but rather to help create a broad picture of important trends that will need to be studied in greater detail.

Some elements to look at:

  • Source of the data. Is it reliable?
  • The data themselves: are there missing data? Are the data reasonable?
  • Basic statistics: measures of central tendency and variation. Are there outliers?
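
The checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture material: the `readings` list and the `summarize` helper are hypothetical, missing values are assumed to be encoded as `None`, and outliers are flagged with the common 1.5 × IQR rule, which is only one of several possible conventions.

```python
import statistics

# Hypothetical sample: ten temperature readings (Celsius); None marks a missing value.
readings = [21.5, 22.0, None, 20.8, 21.9, 58.0, 22.3, None, 21.1, 20.5]

def summarize(values):
    """Return basic exploration statistics for a list that may contain None."""
    # Missing data: separate the observed values from the gaps.
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)

    # Central tendency and variation.
    mean = statistics.mean(present)
    median = statistics.median(present)
    stdev = statistics.stdev(present)  # sample standard deviation

    # Outliers: values beyond 1.5 * IQR from the quartiles (assumed convention).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    return {
        "n_missing": n_missing,
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "outliers": outliers,
    }

summary = summarize(readings)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample, two values are missing and the reading of 58.0 is flagged as an outlier. Note how that single value pulls the mean well above the median, which is one reason to look at both.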

A key element of data exploration is visualization!




Lecture Notes


Download document:

Powerpoint document (click to download)
or
PDF document (click to download)
or
PDF document: 3 slides/page (click to download)


Further Reading









  Page last modified 13 July 2022 http://www.cs.ucdavis.edu/~koehl/