Basic Python For ML December 12, 2024

A. Matplotlib: Visualization Simplified
 

Matplotlib is a powerful Python library used for data visualization.

  • It enables the creation of static, interactive, and animated plots to understand data better.
  • The library is highly customizable, making it suitable for both basic and advanced plotting.

Key Components of Matplotlib

  1. Pyplot Module
    • The pyplot module is the interface most commonly used for plotting.
    • It offers a MATLAB-style, stateful interface: each command acts on the current figure, so a plot can be built up one call at a time.
    • Import conventionally as: import matplotlib.pyplot as plt.
  2. Figure
    • Represents the entire canvas or "container" for your plots.
    • Created using plt.figure().
    • Can include one or more subplots.
  3. Axes and Subplots
    • Axes: The actual area where data is plotted, including the X and Y axes.
    • Subplot: Multiple plots on a single figure created using plt.subplot() or plt.subplots().
  4. Artist Layer
    • Every element in Matplotlib (lines, text, legends, etc.) is an "Artist."
    • Artists are added to the canvas to construct a plot.
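
The components above can be seen together in a short sketch. It uses plt.subplots() to create the Figure and Axes in one call and then checks that the objects returned by plotting calls are Artists; the sample data and title text are illustrative only.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

# Create a Figure (the canvas) holding a single Axes (the plotting area).
fig, ax = plt.subplots()

# ax.plot returns a list of Line2D objects; unpack the single line.
(line,) = ax.plot([1, 2, 3, 4], [10, 20, 25, 30])
title = ax.set_title("Figure, Axes, and Artists")

# Lines, text, and the Axes itself all derive from Artist.
print(isinstance(line, Artist))   # True
print(isinstance(title, Artist))  # True
print(isinstance(ax, Artist))     # True

# The Figure keeps track of the Axes it contains.
print(fig.axes == [ax])           # True

# Call plt.show() here to display the figure in an interactive session.
```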

Installation:
To use Matplotlib, first install it via pip:

pip install matplotlib

Basic Plotting:
The core of Matplotlib is the pyplot module, typically imported as plt.

Example: Creating a Simple Line Plot

 

import matplotlib.pyplot as plt  

# Data for plotting  
x = [1, 2, 3, 4]  
y = [10, 20, 25, 30]  

# Create a line plot  
plt.plot(x, y)  

# Add title and labels  
plt.title("Line Plot")  
plt.xlabel("X-axis")  
plt.ylabel("Y-axis")  

# Display the plot  
plt.show()  

Output:
A simple line graph representing the relationship between x and y.

Types of Plots in Matplotlib

1. Bar Charts:

  • Used to compare data across categories.
  • Vertical bars: plt.bar()
  • Horizontal bars: plt.barh()

     

Example:

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
values = [3, 7, 8, 5]  

plt.bar(categories, values)  
plt.title("Bar Chart")  
plt.xlabel("Categories")  
plt.ylabel("Values")  
plt.show()  
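
For comparison, here is a minimal sketch of the horizontal variant using plt.barh(), with the same illustrative categories and values. Note that the axis labels swap, because the values now run along the X-axis.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [3, 7, 8, 5]

plt.figure()
# barh takes the categories first, then the bar lengths.
bars = plt.barh(categories, values)
plt.title("Horizontal Bar Chart")
plt.xlabel("Values")
plt.ylabel("Categories")

# Call plt.show() here to display the figure in an interactive session.
```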

2. Histograms:

  • Displays the frequency distribution of numerical data.
  • Useful for identifying patterns like skewness or normal distribution.
  • Created using plt.hist().

Example:

import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

plt.hist(data, bins=5, color='blue', edgecolor='black')  
plt.title("Histogram")  
plt.xlabel("Bins")  
plt.ylabel("Frequency")  
plt.show()  

3. Scatter Plots:

  • Shows relationships or correlations between two variables.
  • Created using plt.scatter().

Example:

import matplotlib.pyplot as plt

x = [5, 7, 8, 7]
y = [8, 5, 6, 7]  

plt.scatter(x, y, color='red')  
plt.title("Scatter Plot")  
plt.xlabel("X-axis")  
plt.ylabel("Y-axis")  
plt.show()  

4. Pie Charts:

  • Displays proportions as slices of a pie.
  • Created using plt.pie().
  • Supports customization like labels, colors, and percentage display.

Example:

import matplotlib.pyplot as plt

labels = ['Python', 'Java', 'C++', 'Ruby']
sizes = [40, 30, 20, 10]  

plt.pie(sizes, labels=labels, autopct='%1.1f%%')  
plt.title("Programming Language Popularity")  
plt.show()  

Advanced Features

  • Subplots: Create multiple plots in one figure.
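  • Custom Styling: Use plt.style.use() to apply built-in styles such as ggplot. The seaborn-derived styles carry a seaborn-v0_8 prefix (for example, seaborn-v0_8-darkgrid) in Matplotlib 3.6 and later.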
  • Annotations: Add text or markers to emphasize key points in the graph.
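
The three features above can be combined in one small sketch. It assumes the built-in "ggplot" style and uses made-up data; the annotation text and coordinates are illustrative choices, not values from a real dataset.

```python
import matplotlib.pyplot as plt

# Apply a built-in style sheet; this changes the defaults for later plots.
plt.style.use("ggplot")

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# One figure with two side-by-side subplots (1 row, 2 columns).
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(x, y, marker="o")
ax_line.set_title("Line")

# Annotate the last point with text and an arrow.
note = ax_line.annotate(
    "peak",
    xy=(4, 30),        # the point being highlighted
    xytext=(2.5, 28),  # where the text is placed
    arrowprops={"arrowstyle": "->"},
)

ax_bar.bar(["A", "B", "C", "D"], [3, 7, 8, 5])
ax_bar.set_title("Bars")

# Adjust spacing so titles and labels do not overlap.
fig.tight_layout()

# Call plt.show() here to display the figure in an interactive session.
```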

     

Matplotlib

Function/Attribute    Use
------------------    ---
plot()                Creates a 2D line plot.
scatter()             Creates a scatter plot.
bar()                 Creates a bar plot.
hist()                Creates a histogram.
pie()                 Creates a pie chart.
xlabel()              Sets the label for the X-axis.
ylabel()              Sets the label for the Y-axis.
title()               Sets the title of the plot.
legend()              Displays a legend for the plot.
grid()                Adds a grid to the plot.
show()                Displays the plot.
subplot()             Creates multiple subplots in a single figure.
savefig()             Saves the plot as an image file.
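
Several of the functions in this table have no example elsewhere in the article, so here is a minimal sketch of legend(), grid(), and savefig() together. The two data series are invented, and the plot is saved to an in-memory buffer only to keep the sketch free of side effects; passing a filename such as "plot.png" would write a file instead.

```python
import io

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

plt.figure()
plt.plot(x, [10, 20, 25, 30], label="series 1")
plt.plot(x, [5, 15, 20, 22], label="series 2")
plt.title("Two Series")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()    # builds the legend from the label= values above
plt.grid(True)  # turns on grid lines

# Save the current figure as PNG data in memory.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)  # True
```

Call savefig() before show(): in interactive sessions the figure may be cleared once its window is closed, which would produce an empty image.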

D. Scikit-learn: The ML Workhorse

Scikit-learn is a comprehensive library for machine learning, offering tools for data preprocessing, model building, and evaluation. It simplifies the implementation of ML algorithms.

Key Features of Scikit-learn

  1. Pre-built Algorithms:
    • Includes tools for classification (e.g., SVM, Random Forest), regression (e.g., Linear Regression), and clustering (e.g., K-means).
  2. Data Preprocessing:
    • Tools for scaling, normalization, encoding categorical variables, and splitting datasets.
  3. Model Evaluation:
    • Metrics such as accuracy, precision, and recall, along with cross-validation utilities.
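
To show how these three feature groups fit together, here is a hedged end-to-end sketch: split a dataset, scale the features, train a classifier, and measure accuracy. The Iris dataset bundled with Scikit-learn, the SVC classifier, and the 25% test split are assumptions chosen for illustration, not part of the original text.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load a small built-in dataset (150 flowers, 4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# Data preprocessing: hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Pre-built algorithm: a support vector machine classifier.
model = SVC()
model.fit(X_train_scaled, y_train)

# Model evaluation: accuracy on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print("Test accuracy:", accuracy)
```

Fitting the scaler on the training split alone keeps information from the test set out of the training process.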

Scikit-learn Basics

Installation:
Install Scikit-learn via pip:

pip install scikit-learn

Example: Linear Regression Model

Linear Regression is a basic supervised learning algorithm used for predicting numerical values.

Step-by-Step Implementation:

from sklearn.linear_model import LinearRegression  

# Step 1: Prepare the data  
X = [[1], [2], [3]]  # Feature data (input)  
y = [2, 4, 6]  # Target data (output)  

# Step 2: Create the model  
model = LinearRegression()  

# Step 3: Train the model  
model.fit(X, y)  

# Step 4: Make predictions  
prediction = model.predict([[4]])  
print("Prediction for X=4:", prediction)  # Output: approximately [8.]
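
As a follow-up sketch, the fitted model can be inspected to see what it learned. For the same data (y is exactly 2 times X), the slope should come out close to 2 and the intercept close to 0, up to floating-point rounding. The use of NumPy arrays and mean_squared_error() here is an addition for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = np.array([[1], [2], [3]])  # one feature per row
y = np.array([2, 4, 6])        # targets follow y = 2 * x

model = LinearRegression().fit(X, y)

# The learned line is y = slope * x + intercept.
slope = model.coef_[0]
intercept = model.intercept_
print("Slope:", slope)
print("Intercept:", intercept)

# Mean squared error on the training data; near zero for a perfect fit.
mse = mean_squared_error(y, model.predict(X))
print("Training MSE:", mse)
```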

Common ML Algorithms in Scikit-learn

1. Classification:

Classification is a supervised learning technique where the goal is to predict the category or class of given data points.

  • Input data is labeled with predefined categories, and the model learns to map inputs to these categories.

Applications:

  • Email spam detection (Spam/Not Spam).
  • Image recognition (Cat/Dog).
  • Disease diagnosis (Positive/Negative).

    Example algorithms: Logistic Regression, Decision Trees.

from sklearn.tree import DecisionTreeClassifier  

# Sample data  
X = [[0, 0], [1, 1]]  
y = [0, 1]  

# Train a decision tree classifier  
clf = DecisionTreeClassifier()  
clf.fit(X, y)  

# Predict category  
print(clf.predict([[2, 2]]))  # Output: [1]
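
Logistic Regression, the other algorithm named above, follows the same fit/predict pattern. The sketch below uses a hypothetical toy dataset (hours studied mapped to fail/pass) and also shows predict_proba(), which returns a probability for each class.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> fail (0) or pass (1).
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Predicted class for 1.5 and 5.5 hours of study.
predictions = clf.predict([[1.5], [5.5]])
print(predictions)  # [0 1]

# One row per sample, one column per class; each row sums to 1.
probabilities = clf.predict_proba([[1.5], [5.5]])
print(probabilities.round(2))
```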

2. Clustering:

Clustering is an unsupervised learning technique that groups data points into clusters based on their similarity.

  • Unlike classification, clustering does not use labeled data.

Applications:

  • Customer segmentation in marketing.
  • Image compression.
  • Anomaly detection.

    Example: K-means Clustering.

from sklearn.cluster import KMeans  

# Sample data  
X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]]  

# Apply K-means clustering  
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)  
print("Cluster centers:", kmeans.cluster_centers_)  
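
Beyond the cluster centers, a fitted KMeans model exposes the cluster assigned to each training point and can assign new points to the nearest center. This sketch reuses the data above; n_init=10 is set explicitly as an assumption so the result does not depend on the library version's default, and the two new points are invented.

```python
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]]

# Run K-means 10 times with different starting centers and keep the best run.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# labels_ holds the cluster index of each training point.
labels = kmeans.labels_
print("Labels:", labels)

# predict() assigns new points to the nearest cluster center.
new_labels = kmeans.predict([[0, 0], [5, 3]])
print("New point labels:", new_labels)
```

The cluster indices (0 or 1) are arbitrary; what matters is which points share an index.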

3. Model Evaluation:

  • Model evaluation is the process of assessing a machine learning model’s performance.
  • It helps determine how well the model generalizes to unseen data.

    Key Metrics in Scikit-learn:

  • Accuracy Score:
    • Measures the ratio of correctly predicted observations to the total observations.
    • Suitable for balanced datasets.
    • Formula: $\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$
  • Precision, Recall, and F1-Score:
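    • Precision: Of the instances predicted as positive, the fraction that are actually positive: TP / (TP + FP), where TP is true positives and FP is false positives.
    • Recall: Of the instances that are actually positive, the fraction the model correctly identified: TP / (TP + FN), where FN is false negatives.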
    • F1-Score: Harmonic mean of precision and recall.
  • Confusion Matrix:
    • A tabular representation of actual vs. predicted values.
    • Helps analyze where the model is making errors.

Example:

from sklearn.metrics import accuracy_score  

# True labels and predicted labels  
y_true = [0, 1, 1, 0]  
y_pred = [0, 1, 0, 0]  

accuracy = accuracy_score(y_true, y_pred)  
print("Accuracy:", accuracy)  # Output: 0.75
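
The remaining metrics can be computed on the same labels. In this sketch, class 1 is treated as the positive class (the Scikit-learn default for binary labels): the model makes one true positive, no false positives, one false negative, and two true negatives.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 1 / 1
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 1 / 2
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print("Precision:", precision)  # 1.0
print("Recall:", recall)        # 0.5
print("F1-Score:", round(f1, 3))  # 0.667

# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)
print(matrix)
# [[2 0]
#  [1 1]]
```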

Scikit-learn

Function/Attribute         Use
------------------         ---
train_test_split()         Splits data into training and testing sets.
fit()                      Fits a machine learning model to the training data.
predict()                  Predicts outputs for given input data using a trained model.
accuracy_score()           Computes the accuracy of a classification model.
mean_squared_error()       Computes the mean squared error for regression models.
StandardScaler()           Standardizes features by removing the mean and scaling to unit variance.
PCA()                      Performs Principal Component Analysis for dimensionality reduction.
KMeans()                   Implements K-Means clustering algorithm.
cross_val_score()          Evaluates a model using cross-validation.
GridSearchCV()             Finds the best hyperparameters using grid search and cross-validation.
confusion_matrix()         Creates a confusion matrix to evaluate classification performance.
classification_report()    Provides precision, recall, and F1-score for classification models.
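
Several entries in this table (StandardScaler, PCA, cross_val_score, GridSearchCV) are not demonstrated elsewhere in the article, so here is one possible sketch that chains them. It assumes the bundled Iris dataset, an SVC classifier, and make_pipeline() to connect the steps; the candidate values for C are arbitrary illustrations.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scale the features, reduce them to 2 principal components, then classify.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2), SVC())

# 5-fold cross-validation: one accuracy score per fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())

# Grid search over the SVC regularization strength C.
# make_pipeline names each step after its lowercased class, hence "svc__C".
search = GridSearchCV(pipeline, param_grid={"svc__C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print("Best C:", search.best_params_["svc__C"])
print("Best CV accuracy:", search.best_score_)
```

Wrapping the scaler and PCA in the pipeline means they are refit inside each cross-validation fold, so the held-out fold never influences preprocessing.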

Key Takeaways:

Matplotlib:

  • Essential for visualizing data, identifying trends, and communicating insights.
  • Provides diverse plot types and customization options.

Scikit-learn:

  • The backbone of machine learning in Python.
  • Simplifies the implementation of algorithms and supports complete ML workflows.
     

Next Topic: Mastering Python Libraries with pip

 

Purnima