Center for Curriculum and Transfer Articulation
Python for Data Analysis
Course: CIS256DA

First Term: 2022 Spring
Lec + Lab   3.0 Credit(s)   4.0 Period(s)   4.0 Load  
Subject Type: Occupational
Load Formula: T- Lab Load


Description: Introduction to data analysis concepts using Python’s rich set of tools, libraries, and packages. Includes basic data analysis, creation of meaningful data visualizations, and advanced topics such as supervised and unsupervised machine learning techniques.



MCCCD Official Course Competencies
1. Utilize Python array libraries for performing element-wise computations with arrays or mathematical operations between arrays. (I)
2. Use Python libraries and tools to extract, transform, and load datasets. (II)
3. Create meaningful data visualizations using Python visualization libraries. (III)
4. Analyze and manipulate time series data. (IV)
5. Examine various data modeling algorithms. (V)
6. Determine the best modeling algorithm to be used within machine learning. (V)
7. Apply supervised and unsupervised machine learning algorithms to perform classification, regression, and clustering. (V, VI)
 
MCCCD Official Course Outline
I. Working with NumPy
   A. NumPy basics
      1. Array indexing
      2. Array selection
      3. ndarray
   B. NumPy operations
      1. Array with array
      2. Array with scalars
      3. Universal array functions
II. Working with datasets
   A. Development environments
   B. Pandas
      1. Series
      2. Data frames
         a. Filtering
         b. Sorting
         c. Ranking
         d. Data extraction
         e. Multi-indexing
         f. GroupBy object
   C. Data loading and storing
      1. Text files
      2. JSON data
      3. XML and HTML
      4. CSV and Excel files
      5. Data from databases
   D. Data cleaning and preparation
      1. Handling missing data
      2. Data transformation
      3. String manipulation
   E. Data wrangling
      1. Hierarchical indexing
      2. Combining and merging datasets
      3. Reshaping and pivoting
III. Visualizing data
   A. Matplotlib
      1. Figures and subplots
      2. Colors, markers, and line styles
      3. Ticks, labels, and legends
      4. Saving plots to file
   B. Seaborn
      1. Distribution plots
      2. Categorical plots
      3. Matrix plots
      4. Regression plots
   C. Other Python visualization libraries
IV. Time series data
   A. Date and time data types and tools
   B. Date ranges, frequencies, and shifting
   C. Time zone handling
   D. Periods and period arithmetic
V. Algorithms
   A. Classification
   B. Regression
   C. Clustering
VI. Machine learning
   A. scikit-learn
   B. Supervised learning
      1. K-nearest neighbors
      2. Logistic regression
      3. Linear regression
      4. Decision trees and random forests
      5. Naïve Bayes and Support Vector Machine (SVM)
      6. Performance evaluation of models
   C. Unsupervised learning
      1. K-means
      2. Performance evaluation of model
 
MCCCD Governing Board Approval Date: September 28, 2021

All information published is subject to change without notice. Every effort has been made to ensure the accuracy of the information presented, but because the curricular process is dynamic, course and program information may be updated to reflect the most current information available.