Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured. The most important of the disciplines it draws on is Machine Learning (a branch of Artificial Intelligence). Simply put, Machine Learning is the field of study that gives computers the ability to learn from data without being explicitly programmed.
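
To make "learning from data without being explicitly programmed" concrete, here is a minimal sketch in plain Python. The data points and the names `fit_line` and `predict` are made up for illustration and are not taken from the course material. The program is never told the rule that links hours to scores; it estimates the rule from the examples using ordinary least squares.

```python
# Toy data: hours studied -> exam score (invented numbers, for illustration only).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": the parameters come from the data, not from hand-written rules.
slope, intercept = fit_line(hours, scores)


def predict(x):
    """Predict a score for an unseen number of hours using the learned line."""
    return slope * x + intercept


print(slope, intercept, predict(6.0))
```

Changing the numbers in `hours` and `scores` changes the learned line without touching the code, which is the sense in which the behaviour is learned rather than programmed.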

This course covers all the concepts needed to become a successful Data Scientist: Descriptive Statistics, Inferential Statistics, Basic Python, Pandas, NumPy, SciPy, Statistical Data Analysis, StatsModels, Scikit-Learn, the mathematics behind Machine Learning algorithms (Gradient Descent, SVM, Kernel SVM, etc.), error analysis, most of the common accuracy measures, and techniques for fine-tuning a model.

Course | Location | Mode of Class | Duration |
Data Science | Hyderabad | Class-Room/Online | 3 Months |

Trainer with 10+ years of experience

Certified Trainer

Real Time Project Examples

Exercises after every Topic

Trainer support after completion of the course

Placement Assistance

Trainers with Industry Experience and from IIIT Hyderabad

All Concepts and ML/DL Algorithms with Code Examples

• Central Tendency (mean, median and mode)

• Interquartile Range

• Variance

• Standard Deviation

• Z-Score/T-Score

• Covariance

• Correlation

• Binomial Distribution

• Introduction to Probability

• Normal Distribution

• Bar Chart

• Histogram

• Box-and-whisker plot

• Dot-plot

• Line plot

• Scatter Plot

• How to install Python (Anaconda)

• How to work with Jupyter Notebook

• How to work with Spyder IDE

• Compound data types

o Strings, Lists, Tuples, Sets, Dictionaries

• Control Flows

• Keywords (continue, break, pass)

• Functions (Formal/Positional/Keyword arguments)

• Predefined functions (range, len, enumerate, zip)

• One-dimensional Array

• Two-dimensional Array

• Predefined functions (arange, reshape, zeros, ones, empty, eye, linspace)

• Basic Matrix operations

o Slicing, indexing, Looping, Shape Manipulation, Stacking

o Scalar addition, subtraction, multiplication, division, broadcasting

o Matrix addition, subtraction, multiplication, division and transpose, broadcasting

• Series

• DataFrame

• df.groupby

• pd.crosstab

• df.apply

• df.map

• df.applymap

• Central Limit Theorem

• Confidence Interval and z-distribution table

• Statistical Significance

• Hypothesis testing

• P-value

• One-tailed and Two-tailed Tests

• Chi-Square Goodness of Fit Test

• F-Statistic (ANOVA)

• Skewness, Kurtosis

• Train/Test split – Data snooping bias

• Statistical Data Analysis

• Fixing missing values

• Finding outliers

• Data quality check

• Feature transformation

• Data Visualization (Matplotlib, Seaborn)

o Categorical to Categorical

o Categorical to Quantitative

o Quantitative to Quantitative

• Bi-Variate data analysis (Hypothesis Testing)

o Categorical and Quantitative (ANOVA)

o Categorical to Categorical (Chi-Square)

o Quantitative to Categorical (Chi-Square)

o Quantitative to Quantitative (Correlation)

• What is regression?

• Simple linear regression

• Linear Regression – a statistics perspective (statsmodels – OLS)

• Evaluation metrics (R-Squared, Adjusted R-Squared, MSE, RMSE)

• Mean centralization and its use in multiple linear regression

• Multiple linear regression

• P-value based feature selection methods (Backward, Forward and Mixed)

• Linear regression assumptions (linear relations – fitted vs residuals plot, homoscedasticity, normal distribution of error term, serial correlation, multicollinearity)

• Q-Q Plot, Shapiro-Wilk test – different ways to check normality of data

• Data transformation techniques.

• Label Encoding

• One-Hot (dummy variable) encoding

• Dummy variable trap

• Scikit-Learn → Custom Transformers

• Scikit-Learn → Pipeline

• Normal Equation (Linear Algebraic way of solving linear equation)

• Gradient Descent (Calculus way of solving linear equation)

• Multiple Linear Regression (SGD Regressor)

• Feature Scaling (Min-Max vs Mean Normalization)

• Feature Transformation

• Polynomial Regression

• Train/Validation/Test split

• K-Fold Cross Validation

• The Problem of Over-fitting (Bias-Variance trade-off)

• Learning Curve

• Regularization (Ridge, Lasso and Elastic-Net)

• Feature selection

• Hyper Parameter Tuning (GridSearchCV, RandomizedSearchCV)

• Pickle (pkl file)

• Model load from pkl file and prediction

• Logistic Regression Algorithm (SGD Classifier)

• Accuracy measurements – handling imbalanced dataset

o Accuracy score

o Confusion matrix

o Precision

o Recall

o Precision-Recall trade-off curve

o ROC curve

o AUC score

• Multi-class Classification

o One-vs-One

o One-vs-All

o Softmax regression classifier

• Multi-label Classification

• Multi-output Classification

• SVM Classifier (Soft/Hard – Margin)

• Linear SVM

• Non-Linear SVM

• Kernel Trick (mathematics behind kernel trick)

• Kernel SVM

• SVM Regression

• K-means

• Hierarchical

• How to use unsupervised outcome as support to solve supervised problem.

• PCA

• Math behind PCA – eigenvectors, eigenvalues, covariance matrix

• Choosing Right Number of Dimensions or Principal Components

• Incremental PCA

• Kernel PCA

• Regression Trees vs Classification Trees

• Entropy

• Gini Index

• Information Gain

• Tree pruning

• Voting Classifiers (Heterogeneous Ensemble Models)

• Homogeneous Ensemble Models

o Random Forest

o Bagging

o Pasting

o Introduction to Boosting (Ada, Gradient)

• Bayes Theorem

• Naive Bayes Algorithm

• Introduction to Text Analytics

• Tokenization

• Text Normalization, stemming, lemmatization

• Bag-of-words model

• Anomaly vs Classification

• Credit Card Fraud detection – Anomaly Detection Algorithm

• Assumptions of normality

• Overview of Hadoop architecture

• Overview of YARN architecture

• Map-Reduce example

• Overview of Spark Context (--master yarn)

• Resilient Distributed Datasets (RDDs)

• RDD Operations (Transformations, Actions)

• Spark DataFrames

• Spark ML model with Pipeline

• Classification model, MulticlassMetrics

• Perceptron, Sigmoid Neuron

• Neural Network model representation

• How it works

• Forward-Propagation

• Back-Propagation
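
The last few syllabus items (sigmoid neuron, forward-propagation, back-propagation) can be sketched for a single neuron in plain Python. This is only an illustrative sketch, not the course's own code: the function names, the squared-error loss and the learning rate `lr` are all assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, inputs):
    """Forward-propagation: weighted sum of the inputs, then the sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def backward_step(weights, bias, inputs, target, lr=0.1):
    """One back-propagation update, assuming the loss 0.5 * (a - target) ** 2."""
    a = forward(weights, bias, inputs)
    # Chain rule: dL/dz = dL/da * da/dz = (a - target) * a * (1 - a)
    delta = (a - target) * a * (1.0 - a)
    # dL/dw_i = delta * x_i and dL/db = delta; step against the gradient.
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


# Hypothetical starting point: zero weights, one training example with target 1.
weights, bias = [0.0, 0.0], 0.0
inputs, target = [1.0, 2.0], 1.0

before = forward(weights, bias, inputs)
weights, bias = backward_step(weights, bias, inputs, target)
after = forward(weights, bias, inputs)
print(before, after)
```

With zero weights the neuron outputs 0.5 regardless of input; after one update its output for this example moves towards the target of 1. A full network repeats the same two passes layer by layer.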

**Some basic programming knowledge is helpful.** **In any case, we teach the programming skills needed to execute Data Science projects.**

**The course will make you a solution provider for real-time Data Science problems.** **We start with Statistics, the basics of Python, and data analysis with Pandas and NumPy; after this we go deeper into Machine Learning, error analysis and fine-tuning models.** **The course runs for three months, one hour every day, Monday to Friday.**

**Yes, we broadcast live classes over the internet. You can join the class either in the classroom or online.** **We also provide recorded videos for further reference.**

**Business Analysts and Data Analysts.** **Database professionals, Developers, Leads and Managers from the Information Technology industry.** **Fresh graduates who wish to make a career in Data Science, Machine Learning, Statistical Data Analysis and Artificial Intelligence.**

**We will cover everything needed to make you a Data Scientist (Statistics, Mathematics, Machine Learning, Python, etc.).** **The only requirements are mathematics studied at the 10+2 (Intermediate) level, plus good intuition and logic.**

**We would say within one month of finishing the course. This depends on the conditions below.** **The course takes three months, and we teach all concepts with utmost clarity and depth. The course material covers answers to almost all interview questions, which builds a lot of confidence for facing interviews. It requires dedication and a lot of practice of the concepts we teach. Practice makes perfect.**