# Data Science with R Training Curriculum

### 1. Introduction to Data Science

**Overview:** This module introduces Data Science and shows how it helps analyze large and unstructured data sets with a range of tools.

**Topics:**

- What is Data Science?
- What does Data Science involve?
- The era of Data Science
- Business Intelligence vs Data Science
- The life cycle of Data Science
- Tools of Data Science
- Introduction to Big Data and Hadoop
- Introduction to R
- Introduction to Spark
- Introduction to Machine Learning

### 2. Statistical Inference

**Overview:** In this module, you will learn the statistical techniques and terminology used in data analysis.

**Topics:**

- What is Statistical Inference?
- Terminologies of Statistics
- Measures of Center
- Measures of Spread
- Probability
- Normal Distribution
- Binomial Distribution

### 3. Data Extraction, Wrangling, and Exploration

**Overview:** This module discusses the sources from which data can be extracted, and how to arrange the data in a structured form, analyze it, and represent it graphically.

**Topics:**

- Data Analysis Pipeline
- What is Data Extraction?
- Types of Data
- Raw and Processed Data
- Data Wrangling
- Exploratory Data Analysis
- Visualization of Data

**Hands-On:**

- Loading different types of datasets in R
- Arranging the data
- Plotting graphs

### 4. Introduction to Machine Learning

**Overview:** This module introduces Machine Learning. You will discuss the categories of Machine Learning and implement Supervised Learning algorithms.

**Topics:**

- What is Machine Learning?
- Machine Learning Use-Cases
- Machine Learning Process Flow
- Machine Learning Categories
- Supervised Learning algorithms: Linear Regression and Logistic Regression

**Hands-On:**

- Implementing a Linear Regression model in R
- Implementing a Logistic Regression model in R

### 5. Classification Techniques

**Learning Objectives:** In this module, you will learn Supervised Learning classification techniques and how to implement them, including Decision Trees, Random Forests, Naive Bayes, and Support Vector Machines.

**Topics:**

- What is Classification and what are its use cases?
- What is a Decision Tree?
- Algorithm for Decision Tree Induction
- Creating a Perfect Decision Tree
- Confusion Matrix
- What is a Random Forest?
- What is Naive Bayes?
- Support Vector Machine: Classification

**Hands-On:**

- Implementing a Decision Tree model in R
- Implementing a Random Forest in R
- Implementing a Naive Bayes model in R
- Implementing Support Vector Machine in R

### 6. Unsupervised Learning

**Overview:** Learn about Unsupervised Learning and the types of clustering that can be used to analyze data.

**Topics:**

- What is Clustering and what are its use cases?
- What is K-means Clustering?
- What is C-means Clustering?
- What is Canopy Clustering?
- What is Hierarchical Clustering?

**Hands-On:**

- Implementing K-means Clustering in R
- Implementing C-means Clustering in R
- Implementing Hierarchical Clustering in R

### 7. Recommender Engines

**Overview:** In this module, you will learn about association rules and the different types of Recommender Engines.

**Topics:**

- What are Association Rules and their use cases?
- What is a Recommendation Engine and how does it work?
- Types of Recommendations
- User-Based Recommendation
- Item-Based Recommendation
- Difference: User-Based and Item-Based Recommendation
- Recommendation use cases

**Hands-On:**

- Implementing Association Rules in R
- Building a Recommendation Engine in R

### 8. Text Mining

**Overview:** In this module, you will discuss Unsupervised Machine Learning techniques and implement algorithms such as TF-IDF and Cosine Similarity.

**Topics:**

- Concepts of Text Mining
- Use cases
- Text Mining Algorithms
- Quantifying text
- TF-IDF
- Beyond TF-IDF

**Hands-On:**

- Implementing a Bag of Words approach in R
- Implementing Sentiment Analysis on Twitter Data using R

### 9. Time Series

**Overview:** In this module, you will learn about Time Series data and its components, and about Time Series modeling with Exponential Smoothing models and the ARIMA model for forecasting.

**Topics:**

- What is Time Series Data?
- Time Series variables
- Different components of Time Series data
- Visualize the data to identify Time Series Components
- Implement the ARIMA model for forecasting
- Exponential smoothing models
- Identifying which Exponential Smoothing model applies to a given Time Series scenario
- Implement the appropriate ETS model for forecasting

**Hands-On:**

- Visualizing and formatting Time Series data
- Plotting decomposed Time Series data
- Applying ARIMA and ETS models for Time Series Forecasting
- Forecasting for a given time period

### 10. Deep Learning

**Learning Objectives:** This module introduces the concepts of Reinforcement Learning and Deep Learning, explained with the help of use cases. You will discuss Artificial Neural Networks, their building blocks, and a few key Artificial Neural Network terms.

**Topics:**

- Reinforcement Learning
- Reinforcement Learning Process Flow
- Reinforcement Learning Use Cases
- Deep Learning
- Biological Neural Networks
- Understand Artificial Neural Networks
- Building an Artificial Neural Network
- How an ANN works
- Important ANN Terminology