
- ML - Home
- ML - Introduction
- ML - Getting Started
- ML - Basic Concepts
- ML - Ecosystem
- ML - Python Libraries
- ML - Applications
- ML - Life Cycle
- ML - Required Skills
- ML - Implementation
- ML - Challenges & Common Issues
- ML - Limitations
- ML - Reallife Examples
- ML - Data Structure
- ML - Mathematics
- ML - Artificial Intelligence
- ML - Neural Networks
- ML - Deep Learning
- ML - Getting Datasets
- ML - Categorical Data
- ML - Data Loading
- ML - Data Understanding
- ML - Data Preparation
- ML - Models
- ML - Supervised Learning
- ML - Unsupervised Learning
- ML - Semi-supervised Learning
- ML - Reinforcement Learning
- ML - Supervised vs. Unsupervised
- Machine Learning Data Visualization
- ML - Data Visualization
- ML - Histograms
- ML - Density Plots
- ML - Box and Whisker Plots
- ML - Correlation Matrix Plots
- ML - Scatter Matrix Plots
- Statistics for Machine Learning
- ML - Statistics
- ML - Mean, Median, Mode
- ML - Standard Deviation
- ML - Percentiles
- ML - Data Distribution
- ML - Skewness and Kurtosis
- ML - Bias and Variance
- ML - Hypothesis
- Regression Analysis In ML
- ML - Regression Analysis
- ML - Linear Regression
- ML - Simple Linear Regression
- ML - Multiple Linear Regression
- ML - Polynomial Regression
- Classification Algorithms In ML
- ML - Classification Algorithms
- ML - Logistic Regression
- ML - K-Nearest Neighbors (KNN)
- ML - Naïve Bayes Algorithm
- ML - Decision Tree Algorithm
- ML - Support Vector Machine
- ML - Random Forest
- ML - Confusion Matrix
- ML - Stochastic Gradient Descent
- Clustering Algorithms In ML
- ML - Clustering Algorithms
- ML - Centroid-Based Clustering
- ML - K-Means Clustering
- ML - K-Medoids Clustering
- ML - Mean-Shift Clustering
- ML - Hierarchical Clustering
- ML - Density-Based Clustering
- ML - DBSCAN Clustering
- ML - OPTICS Clustering
- ML - HDBSCAN Clustering
- ML - BIRCH Clustering
- ML - Affinity Propagation
- ML - Distribution-Based Clustering
- ML - Agglomerative Clustering
- Dimensionality Reduction In ML
- ML - Dimensionality Reduction
- ML - Feature Selection
- ML - Feature Extraction
- ML - Backward Elimination
- ML - Forward Feature Construction
- ML - High Correlation Filter
- ML - Low Variance Filter
- ML - Missing Values Ratio
- ML - Principal Component Analysis
- Reinforcement Learning
- ML - Reinforcement Learning Algorithms
- ML - Exploitation & Exploration
- ML - Q-Learning
- ML - REINFORCE Algorithm
- ML - SARSA Reinforcement Learning
- ML - Actor-critic Method
- ML - Monte Carlo Methods
- ML - Temporal Difference
- Deep Reinforcement Learning
- ML - Deep Reinforcement Learning
- ML - Deep Reinforcement Learning Algorithms
- ML - Deep Q-Networks
- ML - Deep Deterministic Policy Gradient
- ML - Trust Region Methods
- Quantum Machine Learning
- ML - Quantum Machine Learning
- ML - Quantum Machine Learning with Python
- Machine Learning Miscellaneous
- ML - Performance Metrics
- ML - Automatic Workflows
- ML - Boost Model Performance
- ML - Gradient Boosting
- ML - Bootstrap Aggregation (Bagging)
- ML - Cross Validation
- ML - AUC-ROC Curve
- ML - Grid Search
- ML - Data Scaling
- ML - Train and Test
- ML - Association Rules
- ML - Apriori Algorithm
- ML - Gaussian Discriminant Analysis
- ML - Cost Function
- ML - Bayes Theorem
- ML - Precision and Recall
- ML - Adversarial
- ML - Stacking
- ML - Epoch
- ML - Perceptron
- ML - Regularization
- ML - Overfitting
- ML - P-value
- ML - Entropy
- ML - MLOps
- ML - Data Leakage
- ML - Monetizing Machine Learning
- ML - Types of Data
- Machine Learning - Resources
- ML - Quick Guide
- ML - Cheatsheet
- ML - Interview Questions
- ML - Useful Resources
- ML - Discussion
Machine Learning - Data Scaling
Data scaling is a pre-processing technique used in Machine Learning to normalize or standardize the range or distribution of features in the data. Data scaling is essential because the different features in the data may have different scales, and some algorithms may not work well with such data. By scaling the data, we can ensure that each feature has a similar scale and range, which can improve the performance of the machine learning model.
There are two common techniques used for data scaling −
Normalization − Normalization scales the values of a feature between 0 and 1. This is achieved by subtracting the minimum value of the feature from each value and dividing it by the range of the feature (the difference between the maximum and minimum values).
Standardization − Standardization scales the values of a feature to have a mean of 0 and a standard deviation of 1. This is achieved by subtracting the mean of the feature from each value and dividing it by the standard deviation.
Example
In Python, data scaling can be implemented using the sklearn module. The sklearn.preprocessing sub-module provides classes for scaling data. Below is an example implementation of data scaling in Python using the StandardScaler class for standardization −
from sklearn.preprocessing import StandardScaler from sklearn.datasets import load_iris import pandas as pd # Load the iris dataset data = load_iris() X = data.data y = data.target # Create a DataFrame from the dataset df = pd.DataFrame(X, columns=data.feature_names) print("Before scaling:") print(df.head()) # Scale the data using StandardScaler scaler = StandardScaler() X_scaled = scaler.fit_transform(X) # Create a new DataFrame from the scaled data df_scaled = pd.DataFrame(X_scaled, columns=data.feature_names) print("After scaling:") print(df_scaled.head())
In this example, we load the iris dataset and create a DataFrame from it. We then use the StandardScaler class to scale the data and create a new DataFrame from the scaled data. Finally, we print the dataframes to see the difference in the data before and after scaling. Note that we fit and transform the data using the fit_transform() method of the scaler object.
Output
When you execute this code, it will produce the following output −
Before scaling: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 0 5.1 3.5 1.4 0.2 1 4.9 3.0 1.4 0.2 2 4.7 3.2 1.3 0.2 3 4.6 3.1 1.5 0.2 4 5.0 3.6 1.4 0.2 After scaling: sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) 0 -0.900681 1.019004 -1.340227 -1.315444 1 -1.143017 -0.131979 -1.340227 -1.315444 2 -1.385353 0.328414 -1.397064 -1.315444 3 -1.506521 0.098217 -1.283389 -1.315444 4 -1.021849 1.249201 -1.340227 -1.315444