Computer Science Notes

Notes From CS Undergrad Courses FSU

This project is maintained by awa03


What is Data Science

Systems, algorithms, and processes for managing and deriving insights from heterogeneous and/or large data sets.

Typical Workflow for a Data Scientist

  1. Data Collection
  2. Data Processing
  3. Exploration / Visualization
  4. Analysis / Machine Learning

Data scientists spend the majority of their time on the first two stages. Data fusion (combining data from multiple sources) is long and tedious, and most data will be unstructured. Data management tools (AWS and Azure, for example) are very useful and make both data management and processing much more efficient.
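As a rough illustration of the four stages, here is a minimal Python sketch using only the standard library. The sample data, the function names (`collect`, `process`, `explore`, `analyze`), and the choice of "mean age" as the analysis are all hypothetical, picked only to make each stage visible. A real pipeline would look quite different.

```python
import csv
import io
import statistics

# Hypothetical raw input standing in for collected data.
RAW_CSV = """name,age
alice,34
bob,
carol,29
"""

def collect(raw_text):
    """Stage 1: read raw records (here from an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def process(rows):
    """Stage 2: drop rows with a missing age and convert types."""
    return [
        {"name": row["name"].title(), "age": int(row["age"])}
        for row in rows
        if row["age"].strip()
    ]

def explore(rows):
    """Stage 3: compute simple summary statistics."""
    ages = [row["age"] for row in rows]
    return {"count": len(ages), "min": min(ages), "max": max(ages)}

def analyze(rows):
    """Stage 4: a stand-in for analysis, here just the mean age."""
    return statistics.mean(row["age"] for row in rows)

cleaned = process(collect(RAW_CSV))
summary = explore(cleaned)
mean_age = analyze(cleaned)
```

Even in this toy version, the cleaning in `process` is where the messy cases (the missing age for `bob`) have to be handled, which hints at why the first two stages take most of the time.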

Structured Data

A relation $R$ is a subset of $S_1 \times S_2 \times \dots \times S_n$, where $S_i$ is the domain of attribute $i$ for $i \in [1, n]$, and $n$ is the number of attributes of $R$.

A tuple $t$ is an element of $S_1 \times S_2 \times \dots \times S_n$.
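To make the definition concrete, here is a small Python sketch with $n = 2$. The domains `S1` and `S2`, their values, and the helper `is_relation` are hypothetical and exist only for illustration.

```python
from itertools import product

# Hypothetical attribute domains (n = 2).
S1 = {"alice", "bob"}   # domain of attribute 1, e.g. a name
S2 = {"CS", "Math"}     # domain of attribute 2, e.g. a major

# The Cartesian product of the domains holds every possible tuple.
cartesian = set(product(S1, S2))

# A relation R is any subset of that product; each element is a tuple t.
R = {("alice", "CS"), ("bob", "Math")}

def is_relation(candidate, domains):
    """Return True if every tuple in candidate lies in the product of domains."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in candidate
    )
```

Here `R <= cartesian` holds, so `R` is a valid relation over these domains, while a set containing `("alice", "Physics")` would not be, because `"Physics"` is outside `S2`.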

Relation Schema
