### Welcome to

## ITEC 4220 - Advanced Data Analytics

#### Cengiz Gunay - Fall 2025
Let's start with a quiz!
## Overview

1. Sign up for discussion forum
1. Syllabus
1. Message in a bottle
1. How to access textbooks
1. Assignments
## Graded events

1. Several small assignments
1. One larger project throughout the semester
1. Present at local/nearby symposia
1. Final exam
## Important dates for presenting/attending

GGC CREATE Symposium poster/talk presentations, date TBA

Outside events:

- [Southern Data Science Conference](https://www.southerndatascience.com/)
- Kennesaw State Univ [Analytics Day](https://ccpe.kennesaw.edu/sasday/)
- [South Hub Events](http://sbdh-prod.ideas.gatech.edu/resources/events)
- [ATLytics meetup](https://www.meetup.com/ATLytiCS-Analytics-For-Good/)
- [PyData Meetup](https://www.meetup.com/PyData-Atlanta/)
- [Georgia Tech Hacklytics](https://hacklytics.io/)
## Semester breakdown:

1. Data analysis basics
1. Data formats and languages
1. Data Intensive Computing
1. Analysis and statistics; some additional techniques
### Meet your instructor

- Instructor: [Dr. Cengiz Gunay](http://www.ggc.edu/about-ggc/directory/cengiz-gunay)
- Email: cgunay@ggc.edu
- Office: Virtual and [W-2215](https://ggc-sd.github.io/ggcmaps3/#W-2215) (by [appointment](https://cgunay.youcanbook.me/))
- Phone: 678-951-9621 (also GroupMe)
- Discord: cengique/Dr Gunay
### About the instructor: Dr Cengiz Gunay

- Went to computer science grad school for artificial intelligence and neural networks
- Then worked as a post-doctoral fellow at Emory Univ. on simulating models of biological neurons and large-scale biological data mining (e.g., SQL, neural nets, genetic algos)
- Started teaching as visiting faculty at Emory Univ., Math & CS Dept.
- At GGC, became IT faculty with a Soft Dev focus and helped start a Data Science and Analytics major
### Your turn!

Quickly introduce yourself now in class!

- Name
- Major/Year/Domain if DSA
- What do you aim to achieve by taking this course?
- Something fun about yourself
## First: data analysis basics

1. Review: Data scrubbing/regexp - Data Munging
1. Hypothesis testing and visualization with R
1. Data sources: SQL vs NoSQL
1. Interactive dashboards (JavaScript and Tableau)
1. Review: Interactive computing with Python Jupyter Notebooks
## Next: Data Intensive methods

1. Review: MapReduce in Apache Hadoop
1. Querying in Apache Spark
1. Programming in Spark
## Meanwhile: projects

1. Explore dataset
1. Use Pandas for deep analysis
1. Use Spark for time-consuming processing
1. Matplotlib and other visualization
1. Refine hypotheses via complex manipulation
## Finally: some new techniques

As time (and skill) permits:

1. Review: Linear Algebra basics
1. Dimensionality reduction (Principal Component Analysis)
1. Clustering (code from scratch)
1. Classification (neural nets)
### Data Science is OSEMN: Anatomy of a project

(From textbook: Data Science at the Command Line)

Start with a question or hypothesis that is testable with existing data:

1. (O)btain data
2. (S)crub it for relevant parts
3. (E)xplore data to understand what can be done
4. Convert question into statistical (M)odel
5. Select and use a technique to optimize or test model with data
6. i(N)terpret results: visualize, summarize, make a recommendation
7. Go back to step 1 and revisit/modify/repeat
### Obtaining data

- Download
- Query database
- Extract from sources (e.g., HTML crawl/parse)
- Generate yourself (reading sensors)
### Scrubbing data

Prepping for analysis:

- Filtering lines
- Extracting only some columns
- Replacing values
- Extracting words
- Handling missing values
- Converting data formats
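
The scrubbing steps listed above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed workflow: the sample text in `RAW`, its column names, and the `scrub` helper are all made up for this example.

```python
import csv
import io
import json
import re

# Hypothetical raw input (made up): a comment line, a unit suffix, a missing value.
RAW = """# station readings (made-up sample)
city,date,temp,notes
Atlanta,2024-01-05,41F,clear
Atlanta,2024-01-06,,sensor offline
Macon,2024-01-05,45F,light rain
"""


def scrub(raw_text):
    """Return a list of cleaned records from the raw CSV-like text."""
    # Filtering lines: drop blank lines and comment lines.
    lines = [ln for ln in raw_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    rows = []
    for rec in csv.DictReader(io.StringIO("\n".join(lines))):
        # Replacing values: strip the trailing unit so the field is numeric.
        temp = re.sub(r"F$", "", rec["temp"].strip())
        # Handling missing values: keep the row, but record None.
        temp_f = float(temp) if temp else None
        # Extracting words: split the free-text notes into lowercase words.
        words = re.findall(r"[a-z]+", rec["notes"].lower())
        # Extracting only some columns: build a record with the fields we need.
        rows.append({"city": rec["city"], "date": rec["date"],
                     "temp_f": temp_f, "note_words": words})
    return rows


clean = scrub(RAW)
# Converting data formats: CSV in, JSON out.
print(json.dumps(clean, indent=2))
```

Whether a missing value should be kept as `None`, dropped, or filled in depends on the analysis that follows; keeping it is just the choice made in this sketch.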
### Exploring data

To understand the nature of the data and what can be done with it:

- Browse and look at data
- Derive statistics
- Visualize
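
As a rough sketch of the three exploration steps, the snippet below browses a small list of numbers, derives summary statistics with Python's `statistics` module, and prints a crude text histogram. The `temps` values are invented for illustration, and the 5-degree bin width is an arbitrary choice.

```python
import statistics
from collections import Counter

# Hypothetical sample (made up): daily high temperatures in degrees F.
temps = [41, 45, 39, 52, 48, 45, 60, 44, 45, 50]

# Browse: look at the first few values.
print("first five:", temps[:5])

# Derive statistics: basic summary of the sample.
summary = {
    "count": len(temps),
    "min": min(temps),
    "max": max(temps),
    "mean": statistics.mean(temps),
    "median": statistics.median(temps),
    "stdev": statistics.stdev(temps),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")

# Visualize: count values per 5-degree bin and draw one '#' per value.
bins = Counter((t // 5) * 5 for t in temps)
for start in sorted(bins):
    print(f"{start:3d}-{start + 4:<3d} {'#' * bins[start]}")
```

In a real project the same steps would more likely use Pandas and Matplotlib; the standard library is used here only to keep the sketch self-contained.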
### Modeling data

To predict from data:

- Example 1: testing whether global warming exists; check for correlation between time and temperature
- Example 2: if person A buys book X, would person B also buy it?
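
For Example 1, one possible starting point is the Pearson correlation coefficient between year and temperature. The sketch below computes it from scratch; the `pearson_r` helper and the `years`/`anomaly_c` values are hypothetical and made up for illustration, not real climate measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (denominator of r).
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (dev_x * dev_y)


# Made-up numbers for illustration only.
years = [2000, 2005, 2010, 2015, 2020]
anomaly_c = [0.40, 0.55, 0.60, 0.80, 0.95]

r = pearson_r(years, anomaly_c)
print(f"r = {r:.3f}")
```

A value of `r` close to 1 indicates a strong positive linear relationship in the sample. On its own it does not settle the question; a hypothesis test on real data would be the next step.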