Breast Cancer Classification using PyTorch: A Complete Binary Classification Project

1. Introduction

In the previous PyTorch lesson, we used a simple linear regression model to predict a continuous value.

For example:

Input:
study_hours
class_attendance
sleep_hours

Output:
student_score

That project was a regression problem because the output was a number.

In this project, we move to a new type of machine learning problem:

Binary Classification

The goal is to predict one of two possible classes.

In this project, we use the Breast Cancer Wisconsin Dataset to predict whether a tumor is:

0 = Malignant
1 = Benign

This project is useful because it shows a complete PyTorch workflow:

load dataset
prepare data
scale features
split train and test data
convert data to PyTorch tensors
create DataLoader
build neural network
train the model
evaluate the model
plot training loss
plot confusion matrix
plot ROC curve
save the model
load the model
predict unknown data

This article focuses on the project explanation.

The basic concepts such as neural networks, tensors, StandardScaler, and binary classification are already explained in the previous articles.


2. Project Goal

The main goal of this project is:

Given 30 tumor measurements,
predict whether the tumor is malignant or benign.

The model receives input features such as:

mean radius
mean texture
mean perimeter
mean area
mean smoothness

Then it predicts one output:

0 or 1

The target meaning is:

Target Value | Meaning
0 | Malignant
1 | Benign

In simple words:

Malignant = cancerous tumor
Benign = non-cancerous tumor

3. Dataset Overview

We use the dataset directly from Scikit-learn:

from sklearn.datasets import load_breast_cancer

This dataset contains measurements from breast tumor samples.

The dataset has:

Item | Value
Number of samples | 569
Number of input features | 30
Number of classes | 2

The target distribution is:

1    357
0    212

This means:

357 benign tumors
212 malignant tumors

So the dataset is not perfectly balanced (about 63% benign and 37% malignant), but the imbalance is mild enough for a beginner binary classification project.


4. Project Workflow

The full workflow is:

Breast Cancer Dataset
        ↓
Create X and y
        ↓
Scale input features
        ↓
Split train/test data
        ↓
Convert to PyTorch tensors
        ↓
Create TensorDataset
        ↓
Create DataLoader
        ↓
Build neural network
        ↓
Train model
        ↓
Evaluate model
        ↓
Save model
        ↓
Predict unknown patient

In this project, I separated the code into three files:

data_prep.py
train_binary.py
predict.py

The purpose of each file is:

File | Purpose
data_prep.py | Load and prepare the dataset
train_binary.py | Train and evaluate the PyTorch model
predict.py | Load the saved model and predict unknown data

5. Data Preparation File: data_prep.py

The first file prepares the data before training.

Full code:

#%% Packages
import pandas as pd
import joblib

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# =============================================================================
# LOAD DATASET
# =============================================================================

data = load_breast_cancer()

X = pd.DataFrame(
    data.data,
    columns=data.feature_names
)

y = pd.Series(
    data.target,
    name="target"
)

print("\nDataset Shape")
print(X.shape)

print("\nFeature Names")
print(X.columns.tolist())

print("\nTarget Distribution")
print(y.value_counts())

# =============================================================================
# FEATURE SCALING
# =============================================================================

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

joblib.dump(
    scaler,
    "scaler.pkl"
)

# =============================================================================
# TRAIN TEST SPLIT
# =============================================================================

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled,
    y,
    test_size=0.2,
    stratify=y,
    random_state=42
)

print("\nTrain/Test Split")
print(f"X_train: {X_train.shape}")
print(f"X_test : {X_test.shape}")

6. Explaining the Data Preparation Code

6.1 Import Packages

import pandas as pd
import joblib

pandas is used to create a table-like dataset.

joblib is used to save the StandardScaler object.

We save the scaler because future unknown data must be scaled using the same mean and standard deviation as the training data.


6.2 Load the Dataset

data = load_breast_cancer()

This loads the breast cancer dataset from Scikit-learn.

The object data contains:

data.data
data.target
data.feature_names
data.target_names

6.3 Create the Feature Matrix X

X = pd.DataFrame(
    data.data,
    columns=data.feature_names
)

X contains the input features.

Each row represents one tumor sample.

Each column represents one measurement.

For example:

mean radius
mean texture
mean perimeter
mean area
mean smoothness

The shape is:

(569, 30)

This means:

569 samples
30 input features

6.4 Create the Target Vector y

y = pd.Series(
    data.target,
    name="target"
)

y contains the correct answer for each sample.

The values are:

0 = malignant
1 = benign

So for each row in X, there is one matching label in y.

Example:

Tumor sample | Features | Target
Sample 1 | 30 measurements | 0
Sample 2 | 30 measurements | 1
Sample 3 | 30 measurements | 1

6.5 Print Dataset Information

print(X.shape)
print(X.columns.tolist())
print(y.value_counts())

This helps us understand the dataset before training.

The output is similar to:

Dataset Shape
(569, 30)

Target Distribution
1    357
0    212

This tells us:

There are 569 total samples.
There are 30 input features.
There are 357 benign tumors.
There are 212 malignant tumors.

7. Feature Scaling

The model should not train directly on raw input values.

Some features may have values like:

mean radius = 14
mean area = 650
worst area = 850

The area values are much larger than radius values.

If we train directly on these raw numbers, features with large values can dominate the weight updates and make training slower or less stable.

So we use:

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

StandardScaler changes the data so that each feature has approximately:

mean = 0
standard deviation = 1

This helps the model train more smoothly.
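
As a rough illustration of what StandardScaler computes for one column, the sketch below applies z = (x - mean) / std by hand. The three mean_area values are made up for the example and the helper name standardize is hypothetical. StandardScaler divides by the population standard deviation (dividing by n, not n - 1), which is what statistics.pstdev returns.

```python
import statistics

def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical "mean area" values, not taken from the dataset.
mean_area = [500.0, 650.0, 800.0]
scaled_area = standardize(mean_area)

# The scaled column is centred on 0 and has a standard deviation of 1.
print(scaled_area)
print(statistics.fmean(scaled_area), statistics.pstdev(scaled_area))
```

After scaling, a radius column and an area column end up on the same scale, so neither dominates just because of its units.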


8. Saving the Scaler

joblib.dump(
    scaler,
    "scaler.pkl"
)

This saves the fitted scaler.

This is very important.

During training, we use:

fit_transform()

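This calculates the mean and standard deviation from the data passed to it. Note that data_prep.py fits the scaler on the full dataset before the split, so the test rows also contribute to these statistics; fitting on the training rows only is the stricter choice.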

During prediction, we must use:

transform()

This uses the same mean and standard deviation.

The correct workflow is:

Training:
fit scaler
scale training data
save scaler

Prediction:
load scaler
scale unknown data
predict using trained model

If we do not save the scaler, unknown data may be scaled differently, and the prediction may become wrong.
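
A stricter variant of the workflow above splits the data first and fits the scaler on the training rows only, so no test-set statistics reach the model. The sketch below is an alternative, not the code that produced the results in this article; the names X_train_scaled and X_test_scaled are my own, and the numbers reported later may change slightly if you switch to it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Split the raw data first, so the test rows stay unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.shape, X_test_scaled.shape)
```

The fitted scaler would still be saved with joblib.dump in the same way, and predict.py would not need to change.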


9. Train/Test Split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled,
    y,
    test_size=0.2,
    stratify=y,
    random_state=42
)

This separates the dataset into training and testing data.

Training data = used for learning
Testing data = used for final evaluation

Because test_size=0.2, the data is split into:

80% training
20% testing

The output is:

X_train: (455, 30)
X_test : (114, 30)

This means:

455 samples are used for training.
114 samples are used for testing.

10. Why Use stratify=y?

stratify=y

This keeps the class distribution similar in both training and testing sets.

Original dataset:

357 benign
212 malignant

If we split randomly without stratify, the test set might accidentally contain too many benign or too many malignant samples.

With stratify=y, the train and test sets keep a similar class ratio.

This makes evaluation fairer.
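
One way to check this is to compare the benign fraction in each split. The sketch below assumes the same split settings as data_prep.py; the variable names full_ratio, train_ratio and test_ratio are my own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Labels are 0/1, so the mean is the fraction of benign (class 1) samples.
full_ratio = float(np.mean(y))
train_ratio = float(np.mean(y_train))
test_ratio = float(np.mean(y_test))

print(f"benign fraction: full={full_ratio:.3f} train={train_ratio:.3f} test={test_ratio:.3f}")
```

The three fractions should be almost identical. Removing stratify=y lets them drift apart, depending on the random seed.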


11. Training File: train_binary.py

After preparing the data, we train the PyTorch model.

Full code:

#%% Packages
import torch
import torch.nn as nn
import torch.optim as optim

from torch.utils.data import (
    TensorDataset,
    DataLoader
)

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    classification_report,
    roc_curve,
    auc
)

from sklearn.dummy import DummyClassifier

from data_prep import (
    X_train,
    X_test,
    y_train,
    y_test
)

# =============================================================================
# HYPERPARAMETERS
# =============================================================================

BATCH_SIZE = 16
LEARNING_RATE = 0.001
EPOCHS = 50

DEVICE = torch.device(
    "cuda" if torch.cuda.is_available() else "cpu"
)

print("Device:", DEVICE)

# =============================================================================
# DATASET
# =============================================================================

train_dataset = TensorDataset(
    torch.tensor(X_train, dtype=torch.float32),
    torch.tensor(y_train.values, dtype=torch.float32)
)

test_dataset = TensorDataset(
    torch.tensor(X_test, dtype=torch.float32),
    torch.tensor(y_test.values, dtype=torch.float32)
)

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False
)

# =============================================================================
# MODEL
# =============================================================================

class BreastCancerModel(nn.Module):

    def __init__(self, input_size):

        super().__init__()

        self.network = nn.Sequential(

            nn.Linear(input_size, 32),
            nn.ReLU(),

            nn.Linear(32, 16),
            nn.ReLU(),

            nn.Linear(16, 1)

        )

    def forward(self, x):
        return self.network(x)

# =============================================================================
# CREATE MODEL
# =============================================================================

INPUT_SIZE = X_train.shape[1]

model = BreastCancerModel(
    input_size=INPUT_SIZE
).to(DEVICE)

print(model)

# =============================================================================
# LOSS FUNCTION
# =============================================================================

loss_fn = nn.BCEWithLogitsLoss()

optimizer = optim.Adam(
    model.parameters(),
    lr=LEARNING_RATE
)

# =============================================================================
# TRAINING
# =============================================================================

train_losses = []

for epoch in range(EPOCHS):

    model.train()

    running_loss = 0

    for X_batch, y_batch in train_loader:

        X_batch = X_batch.to(DEVICE)
        y_batch = y_batch.to(DEVICE)

        optimizer.zero_grad()

        logits = model(X_batch)

        loss = loss_fn(
            logits,
            y_batch.view(-1,1)
        )

        loss.backward()

        optimizer.step()

        running_loss += loss.item()

    avg_loss = running_loss / len(train_loader)

    train_losses.append(avg_loss)

    print(
        f"Epoch [{epoch+1:02d}/{EPOCHS}] "
        f"Loss: {avg_loss:.4f}"
    )

# =============================================================================
# LOSS CURVE
# =============================================================================

plt.figure(figsize=(8,5))

plt.plot(
    range(1,EPOCHS+1),
    train_losses,
    marker='o'
)

plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss")
plt.grid()

plt.savefig(
    "breast_cancer_training_loss.png",
    dpi=300,
    bbox_inches="tight"
)

plt.show()

# =============================================================================
# EVALUATION
# =============================================================================

model.eval()

y_true = []
y_pred = []
y_prob = []

with torch.no_grad():

    for X_batch, y_batch in test_loader:

        X_batch = X_batch.to(DEVICE)

        logits = model(X_batch)

        probabilities = torch.sigmoid(logits)

        predictions = (
            probabilities > 0.5
        ).int()

        y_true.extend(
            y_batch.numpy()
        )

        y_pred.extend(
            predictions.cpu().numpy().flatten()
        )

        y_prob.extend(
            probabilities.cpu().numpy().flatten()
        )

# =============================================================================
# ACCURACY
# =============================================================================

accuracy = accuracy_score(
    y_true,
    y_pred
)

print("\nAccuracy")
print(accuracy)

# =============================================================================
# REPORT
# =============================================================================

print("\nClassification Report")

print(
    classification_report(
        y_true,
        y_pred
    )
)

# =============================================================================
# CONFUSION MATRIX
# =============================================================================

cm = confusion_matrix(
    y_true,
    y_pred
)

plt.figure(figsize=(6,5))

sns.heatmap(
    cm,
    annot=True,
    fmt='d',
    cmap='Blues'
)

plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")

plt.savefig(
    "breast_cancer_confusion_matrix.png",
    dpi=300,
    bbox_inches="tight"
)

plt.show()

# =============================================================================
# ROC CURVE
# =============================================================================

fpr, tpr, thresholds = roc_curve(
    y_true,
    y_prob
)

roc_auc = auc(
    fpr,
    tpr
)

plt.figure(figsize=(8,6))

plt.plot(
    fpr,
    tpr,
    label=f"AUC = {roc_auc:.4f}"
)

plt.plot(
    [0,1],
    [0,1],
    '--'
)

plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")

plt.legend()
plt.grid()

plt.savefig(
    "breast_cancer_roc_curve.png",
    dpi=300,
    bbox_inches="tight"
)

plt.show()

print("\nAUC Score")
print(roc_auc)

# =============================================================================
# BASELINE
# =============================================================================

baseline = DummyClassifier(
    strategy="most_frequent"
)

baseline.fit(
    X_train,
    y_train
)

baseline_pred = baseline.predict(
    X_test
)

baseline_acc = accuracy_score(
    y_test,
    baseline_pred
)

print("\nBaseline Accuracy")
print(baseline_acc)

print("\nNeural Network Accuracy")
print(accuracy)

# =============================================================================
# SAVE MODEL
# =============================================================================

torch.save(
    model.state_dict(),
    "breast_cancer_model.pth"
)

print("\nModel saved successfully!")

12. Importing PyTorch Packages

import torch
import torch.nn as nn
import torch.optim as optim

These are the main PyTorch packages.

Package | Purpose
torch | Tensor operations
torch.nn | Neural network layers
torch.optim | Optimizers such as Adam

13. Importing Dataset Tools

from torch.utils.data import (
    TensorDataset,
    DataLoader
)

TensorDataset combines input tensors and target tensors.

DataLoader helps the model train using mini-batches.

Instead of giving all 455 training samples to the model at once, DataLoader gives smaller groups of samples.


14. Hyperparameters

BATCH_SIZE = 16
LEARNING_RATE = 0.001
EPOCHS = 50

These are training settings.

Batch Size

BATCH_SIZE = 16

This means the model trains with 16 samples at a time.

Because there are 455 training samples:

455 / 16 ≈ 28.4

So one epoch has 29 mini-batches: 28 full batches of 16 samples and a final batch of 7.

One epoch means the model sees all 455 training samples once.
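
The batch arithmetic can be worked out in a few lines. This sketch assumes the DataLoader keeps the incomplete final batch, which is its default behaviour (drop_last=False); the constant N_TRAIN is my own name for the 455 training samples.

```python
import math

N_TRAIN = 455
BATCH_SIZE = 16
EPOCHS = 50

# Round up, because the leftover samples still form one (smaller) batch.
batches_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE)
last_batch_size = N_TRAIN - (batches_per_epoch - 1) * BATCH_SIZE
total_updates = batches_per_epoch * EPOCHS

print(batches_per_epoch, last_batch_size, total_updates)  # 29 7 1450
```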

Epochs

EPOCHS = 50

This means the model will go through the whole training dataset 50 times.

Learning Rate

LEARNING_RATE = 0.001

The learning rate controls how large each weight update is.

A small learning rate learns slowly but safely.

A large learning rate learns faster but may overshoot the best solution.


15. CPU or GPU

DEVICE = torch.device(
    "cuda" if torch.cuda.is_available() else "cpu"
)

This checks whether a CUDA-capable GPU is available.

If CUDA is available, the model uses the GPU.

Otherwise, it uses the CPU.

Output:

Device: cpu

This means the training runs on the CPU.

For this dataset, CPU is fine because the dataset is small.


16. Creating PyTorch Dataset

train_dataset = TensorDataset(
    torch.tensor(X_train, dtype=torch.float32),
    torch.tensor(y_train.values, dtype=torch.float32)
)

PyTorch layers operate on tensors, not on pandas objects or NumPy arrays.

So we convert the data into tensors.

Each sample becomes:

(input_features, target)

Example:

[0.21, -0.52, 1.33, ...] → 1

The first tensor contains the input features.

The second tensor contains the correct label.
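
To see the (input_features, target) pairing concretely, here is a small sketch with five placeholder samples. The zero-valued features and the label values are made up; only the shapes matter.

```python
import torch
from torch.utils.data import TensorDataset

# Five made-up samples with 30 features each (placeholder values).
features = torch.zeros(5, 30, dtype=torch.float32)
labels = torch.tensor([0, 1, 1, 0, 1], dtype=torch.float32)

dataset = TensorDataset(features, labels)

# Indexing returns one (input_features, target) tuple.
x0, y0 = dataset[0]
print(len(dataset), tuple(x0.shape), y0.item())
```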


17. Creating DataLoader

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)

The DataLoader splits the training dataset into mini-batches.

Because batch_size=16, each batch contains up to 16 samples; the last batch of an epoch is smaller when the dataset size is not a multiple of 16.

shuffle=True means the order of training samples is randomly changed each epoch.

This helps the model learn better because it does not always see the samples in the same order.

For the test loader:

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False
)

We use shuffle=False because we are only evaluating the model.


18. Neural Network Model

class BreastCancerModel(nn.Module):

This defines a custom PyTorch model.

The architecture is:

30 input features
        ↓
32 neurons
        ↓
16 neurons
        ↓
1 output

The code is:

self.network = nn.Sequential(

    nn.Linear(input_size, 32),
    nn.ReLU(),

    nn.Linear(32, 16),
    nn.ReLU(),

    nn.Linear(16, 1)

)

19. Linear Layers

nn.Linear(input_size, 32)

The first layer receives 30 input features and produces 32 outputs.

Because:

INPUT_SIZE = X_train.shape[1]

and X_train.shape[1] is 30.

So the first layer is:

Linear(30 → 32)

The second layer is:

Linear(32 → 16)

The final layer is:

Linear(16 → 1)

The final output is one number because this is binary classification.
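
As a quick check on the layer sizes, the number of trainable parameters can be counted by hand and compared with what PyTorch reports. This is a sketch that rebuilds the same three-layer stack outside the BreastCancerModel class.

```python
import torch.nn as nn

network = nn.Sequential(
    nn.Linear(30, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

# Each Linear(in, out) layer holds in*out weights plus out biases.
by_hand = (30 * 32 + 32) + (32 * 16 + 16) + (16 * 1 + 1)
counted = sum(p.numel() for p in network.parameters())

print(by_hand, counted)  # 1537 1537
```

With only 1537 parameters, this is a very small network, which is one reason it trains quickly on a CPU.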


20. ReLU Activation

nn.ReLU()

ReLU helps the neural network learn non-linear patterns.

Without activation functions, multiple linear layers would still behave like one linear layer.

ReLU changes negative values to zero and keeps positive values.

ReLU(-5) = 0
ReLU(3) = 3
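
The claim that stacked linear layers collapse into one can be checked numerically. In this sketch, two Linear layers without an activation are folded into a single layer with W = W2 @ W1 and b = W2 @ b1 + b2. The layer names first, second and merged are my own, and float64 is used only to keep rounding error small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

first = nn.Linear(30, 32).double()
second = nn.Linear(32, 16).double()

# Fold the two layers into one: W = W2 @ W1, b = W2 @ b1 + b2.
merged = nn.Linear(30, 16).double()
with torch.no_grad():
    merged.weight.copy_(second.weight @ first.weight)
    merged.bias.copy_(second.weight @ first.bias + second.bias)

x = torch.randn(4, 30, dtype=torch.float64)
stacked_out = second(first(x))             # no activation in between
relu_out = second(torch.relu(first(x)))    # ReLU in between

print(torch.allclose(stacked_out, merged(x)))  # same function as one layer
print(torch.allclose(relu_out, merged(x)))     # ReLU breaks the equivalence
```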

21. Output Layer and Logits

The final layer outputs one raw number.

This raw number is called a logit.

Example:

logit = 4.5
logit = -2.1
logit = 0.3

A positive logit means the model leans toward class 1, because sigmoid maps it to a probability above 0.5.

A negative logit means the model leans toward class 0, because sigmoid maps it to a probability below 0.5.

But logits are not probabilities yet.

To convert logits into probabilities, we use sigmoid during evaluation:

probabilities = torch.sigmoid(logits)
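
To make the logit-to-probability step concrete, the sketch below applies the sigmoid formula 1 / (1 + exp(-x)) to the three example logits in plain Python. The helper name sigmoid is my own; torch.sigmoid computes the same function on tensors.

```python
import math

def sigmoid(logit):
    """Map a raw logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))

for logit in (4.5, -2.1, 0.3):
    print(f"logit={logit:+.1f} -> probability of class 1 = {sigmoid(logit):.4f}")

# A logit of exactly 0 sits on the decision boundary.
print(sigmoid(0.0))  # 0.5
```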

22. Loss Function

loss_fn = nn.BCEWithLogitsLoss()

This loss function is used for binary classification.

It combines two steps:

sigmoid
binary cross entropy loss

We do not put sigmoid inside the model during training because BCEWithLogitsLoss() already includes sigmoid internally.

This is more numerically stable.
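
The sketch below illustrates both points: for ordinary logits the fused loss matches sigmoid followed by BCELoss, and for an extreme logit the fused loss stays accurate while the two-step version loses information. The logit and target values are made up for the example.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[4.5], [-2.1], [0.3]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

# For moderate logits the two approaches agree.
fused = nn.BCEWithLogitsLoss()(logits, targets)
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(fused.item(), two_step.item())

# For a very large logit, sigmoid rounds to exactly 1.0 in float32,
# so the two-step version can no longer tell how wrong the prediction is.
big_logit = torch.tensor([[50.0]])
wrong_target = torch.tensor([[0.0]])
saturated = torch.sigmoid(big_logit).item()
fused_big = nn.BCEWithLogitsLoss()(big_logit, wrong_target).item()
print(saturated, fused_big)
```

The fused loss for a logit of 50 with target 0 is close to 50, which is the mathematically correct value.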


23. Optimizer

optimizer = optim.Adam(
    model.parameters(),
    lr=LEARNING_RATE
)

The optimizer updates the model weights.

The model first makes predictions.

Then the loss function measures the error.

Then backpropagation calculates gradients.

Then Adam updates the weights to reduce the loss.

The learning rate controls how large the update is.


24. Training Loop

The training loop is the heart of the project.

for epoch in range(EPOCHS):

This repeats the training process 50 times.

Inside each epoch:

model.train()

This sets the model to training mode.

Then:

running_loss = 0

This stores the total loss for the epoch.


24.1 Loop Through Mini-Batches

for X_batch, y_batch in train_loader:

This takes one mini-batch at a time.

Because batch size is 16, each batch contains up to 16 samples.

For each mini-batch, the model does:

forward pass
loss calculation
backward pass
weight update

24.2 Move Batch to Device

X_batch = X_batch.to(DEVICE)
y_batch = y_batch.to(DEVICE)

If the model is on GPU, the data must also be on GPU.

If the model is on CPU, the data stays on CPU.

The model and the data must be on the same device.


24.3 Clear Old Gradients

optimizer.zero_grad()

PyTorch accumulates gradients by default.

So before calculating new gradients, we clear the old gradients.

If we forget this line, gradients from previous batches will mix with gradients from the current batch.

This can make training incorrect.
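
Gradient accumulation is easy to see on a single weight. In this sketch, w is a made-up scalar parameter and the "loss" is simply 3 * w, whose gradient is always 3. Calling w.grad.zero_() resets the gradient by hand, which has a similar effect to optimizer.zero_grad() in the training loop.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3.0 * w).backward()           # d(3w)/dw = 3
first_grad = w.grad.item()

(3.0 * w).backward()           # not cleared: the new 3 is added to the old 3
accumulated_grad = w.grad.item()

w.grad.zero_()                 # clear the stored gradient
(3.0 * w).backward()
fresh_grad = w.grad.item()

print(first_grad, accumulated_grad, fresh_grad)  # 3.0 6.0 3.0
```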


24.4 Forward Pass

logits = model(X_batch)

The model receives the input batch and produces logits.

For example:

Input batch: 16 tumor samples
Output: 16 logits

Each logit is one raw prediction.


24.5 Calculate Loss

loss = loss_fn(
    logits,
    y_batch.view(-1,1)
)

This compares the predicted logits with the true labels.

The target y_batch is reshaped using:

y_batch.view(-1,1)

This makes the target shape match the output shape.

If the model output shape is:

[16, 1]

then the target should also be:

[16, 1]
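
A short sketch of why the reshape is needed: BCEWithLogitsLoss requires the target to have the same shape as the input and raises a ValueError otherwise. The all-zero logits and all-one labels are placeholders chosen to make the shapes easy to follow.

```python
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()

logits = torch.zeros(16, 1)    # model output shape: [16, 1]
y_batch = torch.ones(16)       # labels from the DataLoader: [16]

try:
    loss_fn(logits, y_batch)   # shapes [16, 1] and [16] do not match
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

reshaped = y_batch.view(-1, 1)  # -1 lets PyTorch infer the 16
loss = loss_fn(logits, reshaped)

print(mismatch_rejected, tuple(reshaped.shape), loss.item())
```

y_batch.unsqueeze(1) would produce the same [16, 1] shape.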

24.6 Backpropagation

loss.backward()

This calculates gradients.

A gradient tells the model how each weight should change to reduce the loss.

In simple words:

loss.backward()
= calculate how much each weight contributed to the loss

24.7 Update Weights

optimizer.step()

This updates the model weights using the gradients.

This is where learning actually happens.

The weight update happens after every mini-batch, not only after each epoch.

Because there are 29 mini-batches per epoch and 50 epochs:

29 × 50 = 1450 weight updates

So the model updates its weights 1450 times.


24.8 Store the Loss

running_loss += loss.item()

This adds the current batch loss to the total epoch loss.

Then:

avg_loss = running_loss / len(train_loader)

calculates the average loss for that epoch.

This is what we print.


25. Training Output

Example output:

Epoch [01/50] Loss: 0.6512
Epoch [02/50] Loss: 0.4548
Epoch [03/50] Loss: 0.2650
Epoch [04/50] Loss: 0.1629
Epoch [05/50] Loss: 0.1127
Epoch [10/50] Loss: 0.0529
Epoch [20/50] Loss: 0.0282
Epoch [30/50] Loss: 0.0154
Epoch [40/50] Loss: 0.0082
Epoch [50/50] Loss: 0.0043

This shows that the model is learning.

At the beginning:

Loss = 0.6512

The model is still weak.

After a few epochs:

Loss = 0.1127

The model has learned many useful patterns.

At the end:

Loss = 0.0043

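The model fits the training data very well. Training loss alone does not show how well the model generalizes, which is why the test-set evaluation later in this article matters.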


26. Training Loss Graph

The training loss graph shows how the loss changes over epochs.

The graph is saved using:

plt.savefig(
    "breast_cancer_training_loss.png",
    dpi=300,
    bbox_inches="tight"
)

[Figure: training loss per epoch]

The expected shape is:

high loss at the beginning
fast decrease in the first few epochs
slow decrease later
almost flat near the end

This is a healthy training curve.

It means the model learned quickly in the beginning and then slowly fine-tuned the weights.


27. Evaluation Mode

After training, we evaluate the model.

model.eval()

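This switches the model to evaluation mode.

In evaluation mode:

dropout layers stop dropping values
batch normalization layers use their stored running statistics
other training-specific layer behavior is turned off

model.eval() does not by itself stop gradient tracking or weight updates. Gradients are disabled with torch.no_grad(), and weights only change when optimizer.step() is called.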

Even though this model does not use dropout, using model.eval() is still good practice.
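
To see what evaluation mode changes, the sketch below uses a standalone Dropout layer. This layer is hypothetical and is not part of the model in this article; it only shows the kind of behavior that model.eval() switches off.

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)   # hypothetical layer; the article's model has no dropout
x = torch.ones(1000)

layer.train()
train_out = layer(x)        # roughly half the values are zeroed, the rest are scaled up

layer.eval()
eval_out = layer(x)         # dropout is a no-op in evaluation mode

zeros_in_training = (train_out == 0).sum().item()
print(layer.training, zeros_in_training, torch.equal(eval_out, x))
```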


28. Disable Gradient Calculation

with torch.no_grad():

During evaluation, we do not need gradients.

We only need predictions.

Using torch.no_grad():

saves memory
makes prediction faster
prevents accidental gradient calculation
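
The effect of torch.no_grad() can be checked on any layer. This sketch uses a standalone Linear layer and a zero-valued input as placeholders for the real model and data.

```python
import torch
import torch.nn as nn

model = nn.Linear(30, 1)
x = torch.zeros(4, 30)

tracked = model(x)            # PyTorch records the operations for backward()

with torch.no_grad():
    untracked = model(x)      # same values, but no computation graph is stored

print(tracked.requires_grad, untracked.requires_grad)  # True False
```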

29. Sigmoid and Threshold

During evaluation:

probabilities = torch.sigmoid(logits)

This converts logits into probabilities.

Example:

logit = 5
sigmoid(logit) = 0.993

Then:

predictions = (
    probabilities > 0.5
).int()

This converts each probability into a class label.

probability > 0.5  → class 1
probability <= 0.5 → class 0

For this dataset:

class 1 = benign
class 0 = malignant

30. Accuracy

accuracy = accuracy_score(
    y_true,
    y_pred
)

Accuracy means:

correct predictions / total predictions

Example result:

Accuracy
0.956140350877193

This means the model predicted correctly about:

95.61% of the time

31. Classification Report

Example output:

Classification Report
              precision    recall  f1-score   support

         0.0       0.91      0.98      0.94        42
         1.0       0.99      0.94      0.96        72

    accuracy                           0.96       114
   macro avg       0.95      0.96      0.95       114
weighted avg       0.96      0.96      0.96       114

Class meaning:

0 = malignant
1 = benign

For class 0:

precision = 0.91
recall = 0.98
f1-score = 0.94
support = 42

This means there were 42 malignant samples in the test set.

The model detected almost all malignant cases because recall is 0.98.

This is important because missing malignant tumors would be dangerous.

For class 1:

precision = 0.99
recall = 0.94
f1-score = 0.96
support = 72

This means there were 72 benign samples in the test set.

The model also performed very well on benign cases.


32. Confusion Matrix

The confusion matrix is created using:

cm = confusion_matrix(
    y_true,
    y_pred
)

Then plotted using seaborn:

sns.heatmap(
    cm,
    annot=True,
    fmt='d',
    cmap='Blues'
)

[Figure: confusion matrix heatmap]

A confusion matrix shows:

what the true class was
what the model predicted

For binary classification, it looks like:

         | Predicted 0 | Predicted 1
Actual 0 | Correct malignant | Malignant predicted as benign
Actual 1 | Benign predicted as malignant | Correct benign

This is more useful than accuracy alone because it shows exactly where the model makes mistakes.
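
The metrics in the classification report can be recomputed from the four cells of the matrix. The counts below are inferred from the rounded report earlier in this article (supports of 42 and 72, recalls of 0.98 and 0.94); they are the only whole numbers consistent with it, but they are not read from the plotted figure.

```python
# Rows are actual classes, columns are predicted classes (0 = malignant, 1 = benign).
# Counts inferred from the rounded classification report, not from the figure.
cm = [
    [41, 1],   # actual malignant: 41 correct, 1 predicted as benign
    [4, 68],   # actual benign: 4 predicted as malignant, 68 correct
]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

# Recall divides by the row total, precision by the column total.
recall_malignant = cm[0][0] / (cm[0][0] + cm[0][1])
precision_malignant = cm[0][0] / (cm[0][0] + cm[1][0])
recall_benign = cm[1][1] / (cm[1][0] + cm[1][1])
precision_benign = cm[1][1] / (cm[0][1] + cm[1][1])

print(f"accuracy={accuracy:.4f}")
print(f"malignant: precision={precision_malignant:.2f} recall={recall_malignant:.2f}")
print(f"benign:    precision={precision_benign:.2f} recall={recall_benign:.2f}")
```

Under these inferred counts, the one malignant tumor predicted as benign is the most serious error, because it is a missed cancer.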


33. ROC Curve

The ROC curve is created using:

fpr, tpr, thresholds = roc_curve(
    y_true,
    y_prob
)

The AUC score is calculated using:

roc_auc = auc(
    fpr,
    tpr
)

Then the graph is saved using:

plt.savefig(
    "breast_cancer_roc_curve.png",
    dpi=300,
    bbox_inches="tight"
)

[Figure: ROC curve with AUC]

The ROC curve shows how well the model separates the two classes.

A random model has an AUC around:

0.5

A perfect model has an AUC of:

1.0

Example result:

AUC Score
0.9920634920634921

This is very high.

It means the model is very good at separating malignant and benign samples.
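
One way to read the AUC: it is the probability that a randomly chosen class 1 sample receives a higher score than a randomly chosen class 0 sample. The sketch below computes that directly on six made-up scores. The function name pairwise_auc is my own. In this project scikit-learn treats label 1 (benign) as the positive class, so the scores are benign probabilities.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

benign_scores = [0.9, 0.8, 0.4]      # made-up probabilities for class 1 samples
malignant_scores = [0.7, 0.3, 0.2]   # made-up probabilities for class 0 samples

# 8 of the 9 pairs are ordered correctly (only 0.4 vs 0.7 is wrong).
toy_auc = pairwise_auc(benign_scores, malignant_scores)
print(round(toy_auc, 4))
```

An AUC of 0.992 therefore means that almost every benign/malignant pair in the test set is ranked in the right order, regardless of the 0.5 threshold.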


34. Baseline Model

A baseline model is a simple model used for comparison.

baseline = DummyClassifier(
    strategy="most_frequent"
)

This model always predicts the most common class.

In this dataset, the most common class is benign.

So the baseline model predicts:

everything = benign

The baseline result is:

Baseline Accuracy
0.631578947368421

This means if we always predict the most common class, we get about:

63.16% accuracy

The PyTorch neural network result is:

Neural Network Accuracy
0.956140350877193

This means the neural network performs much better than the baseline.
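
The baseline number can be reproduced with simple arithmetic from the test-set supports in the classification report. This sketch assumes, as the stratified split implies, that benign is also the majority class in the training data, since that is where DummyClassifier learns which class to predict.

```python
# Test-set class counts, taken from the "support" column of the report.
test_counts = {"malignant": 42, "benign": 72}

# Always predicting the majority class is correct only for that class.
majority_class = max(test_counts, key=test_counts.get)
baseline_accuracy = test_counts[majority_class] / sum(test_counts.values())

print(majority_class, round(baseline_accuracy, 4))
```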


35. Saving the Model

After training, we save the model:

torch.save(
    model.state_dict(),
    "breast_cancer_model.pth"
)

This saves the learned weights.

It does not save the entire Python class.

That is why, when loading the model later, we must define the same model architecture again.

The saved file is:

breast_cancer_model.pth

This file contains the knowledge learned by the model.
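
To see what the state dict actually holds, the sketch below lists its keys and does a save/load round trip. It writes to an in-memory buffer instead of a .pth file so that it leaves nothing on disk; that buffer and the name restored are my own choices, not part of the project code.

```python
import io

import torch
import torch.nn as nn

class BreastCancerModel(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_size, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.network(x)

model = BreastCancerModel(30)

# Only the Linear layers have parameters; ReLU layers add no entries.
state_keys = list(model.state_dict().keys())
print(state_keys)

# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = BreastCancerModel(30)   # the architecture must be rebuilt first
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
```

The keys are plain names mapped to tensors, which is why the class definition has to exist before the weights can be loaded.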


36. Prediction File: predict.py

After training, we can use the saved model to predict unknown data.

Full code:

import torch
import torch.nn as nn
import joblib
import pandas as pd

# =============================================================================
# MODEL DEFINITION
# =============================================================================

class BreastCancerModel(nn.Module):

    def __init__(self, input_size):

        super().__init__()

        self.network = nn.Sequential(

            nn.Linear(input_size, 32),
            nn.ReLU(),

            nn.Linear(32, 16),
            nn.ReLU(),

            nn.Linear(16, 1)

        )

    def forward(self, x):
        return self.network(x)

# =============================================================================
# LOAD SCALER
# =============================================================================

scaler = joblib.load(
    "scaler.pkl"
)

# =============================================================================
# LOAD MODEL
# =============================================================================

INPUT_SIZE = 30

model = BreastCancerModel(
    INPUT_SIZE
)

model.load_state_dict(
    torch.load(
        "breast_cancer_model.pth",
        map_location="cpu"
    )
)

model.eval()

print("Model loaded successfully!")

# =============================================================================
# UNKNOWN PATIENT DATA
# =============================================================================

new_patient = [

    14.2,
    18.5,
    92.1,
    650.0,
    0.10,
    0.12,
    0.09,
    0.05,
    0.18,
    0.06,

    0.40,
    1.20,
    2.50,
    30.0,
    0.006,
    0.020,
    0.030,
    0.010,
    0.020,
    0.003,

    16.0,
    25.0,
    105.0,
    850.0,
    0.14,
    0.30,
    0.40,
    0.15,
    0.30,
    0.08
]

# =============================================================================
# CREATE DATAFRAME WITH FEATURE NAMES
# =============================================================================

new_patient_df = pd.DataFrame(
    [new_patient],
    columns=scaler.feature_names_in_
)

# =============================================================================
# SCALE DATA
# =============================================================================

new_patient_scaled = scaler.transform(
    new_patient_df
)

# =============================================================================
# CONVERT TO TENSOR
# =============================================================================

tensor = torch.tensor(
    new_patient_scaled,
    dtype=torch.float32
)

# =============================================================================
# PREDICTION
# =============================================================================

with torch.no_grad():

    logits = model(tensor)

    probability = torch.sigmoid(logits)

    probability = probability.item()

# =============================================================================
# RESULT
# =============================================================================

print("\nPrediction Probability")
print(f"{probability:.4f}")

print(f"Benign Probability: {probability:.4f}")
print(f"Malignant Probability: {1 - probability:.4f}")

if probability > 0.5:

    print("Prediction: BENIGN")

else:

    print("Prediction: MALIGNANT")

37. Why Load the Scaler?

scaler = joblib.load(
    "scaler.pkl"
)

The model was trained using scaled data.

So unknown data must also be scaled before prediction.

If the model was trained on scaled data but we give it raw data, the prediction can become incorrect.

This is one of the most common mistakes in machine learning deployment.


38. Why Define the Model Again?

class BreastCancerModel(nn.Module):

When we save using:

torch.save(model.state_dict(), "breast_cancer_model.pth")

we only save the weights.

We do not save the class definition.

So in predict.py, we must recreate the same architecture:

30 → 32 → 16 → 1

Then we load the saved weights into it.


39. Loading the Model Weights

model.load_state_dict(
    torch.load(
        "breast_cancer_model.pth",
        map_location="cpu"
    )
)

This loads the learned weights into the model.

map_location="cpu" makes sure the weights load correctly on a CPU-only machine even if the model was trained on a GPU.


40. Why Use a DataFrame for Unknown Data?

new_patient_df = pd.DataFrame(
    [new_patient],
    columns=scaler.feature_names_in_
)

This prevents the warning:

X does not have valid feature names, but StandardScaler was fitted with feature names

The scaler was fitted using a DataFrame with column names.

So during prediction, we also provide a DataFrame with the same feature names.


41. Scale Unknown Data

new_patient_scaled = scaler.transform(
    new_patient_df
)

We use transform() instead of fit_transform().

During prediction, we do not calculate a new mean and standard deviation.

We use the saved mean and standard deviation from training.
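
A small toy example shows what goes wrong with fit_transform() at prediction time. This is only a sketch with invented numbers: one feature whose training mean is 20 and standard deviation is 5.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data: one feature with mean 20 and standard deviation 5.
X_train = np.array([[15.0], [25.0], [15.0], [25.0]])
scaler = StandardScaler().fit(X_train)

new_sample = np.array([[30.0]])

# Correct: reuse the training statistics -> (30 - 20) / 5 = 2.0
correct = scaler.transform(new_sample)

# Wrong: refit on the single sample -> its own mean is 30, so it maps to 0
wrong = StandardScaler().fit_transform(new_sample)

print("transform:    ", correct.ravel())
print("fit_transform:", wrong.ravel())
```

Refitting on a single sample erases all information: every feature becomes 0, no matter what the patient's measurements were.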


42. Make Prediction

logits = model(tensor)

probability = torch.sigmoid(logits)

probability = probability.item()

The model first gives a logit.

Then sigmoid converts the logit into a probability.

Then .item() converts the single-element tensor into a plain Python float.
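
The three steps can be tried in isolation. In this sketch the logit is a hypothetical value chosen by hand rather than a real model output.

```python
import torch

# Hypothetical logit for one patient; shape (1, 1) like a batch of one.
logits = torch.tensor([[-5.65]])

probability = torch.sigmoid(logits)   # still a tensor of shape (1, 1)
value = probability.item()            # plain Python float

print(tuple(probability.shape), type(value).__name__, f"{value:.4f}")
```

A strongly negative logit maps to a probability close to 0, and a strongly positive one maps close to 1. Note that .item() only works when the tensor holds exactly one element.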


43. Interpreting the Prediction

Example output:

Model loaded successfully!

Prediction Probability
0.0035

Benign Probability: 0.0035
Malignant Probability: 0.9965

Prediction: MALIGNANT

The sigmoid output is the probability of class 1. Because:

probability = 0.0035

and the target meaning is:

1 = benign
0 = malignant

This means:

0.35% probability benign
99.65% probability malignant

Since the probability is below 0.5, the model predicts:

MALIGNANT

44. Important Warning

The new_patient values in this example are manually created.

So the prediction is only for testing the code.

In a real medical system, the values should come from real measurements.

This model is for learning purposes only.

It should not be used for real medical diagnosis.


45. Final Results

From one training run, the model produced:

| Metric | Value |
|---|---|
| Accuracy | 95.61% |
| AUC | 0.9921 |
| Baseline Accuracy | 63.16% |
| Neural Network Accuracy | 95.61% |

Classification report:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 Malignant | 0.91 | 0.98 | 0.94 | 42 |
| 1 Benign | 0.99 | 0.94 | 0.96 | 72 |

The result is strong: the neural network reaches 95.61% accuracy against a baseline of 63.16%, and it recovers 98% of the malignant cases in the test set.
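
The reported figures can be cross-checked with plain arithmetic. The counts below are inferred from the rounded report (supports of 42 and 72, five errors in total); they are not printed by the project itself. The baseline value also matches the benign share of the test set (72 of 114), which is what a model that always predicts benign would score.

```python
# Counts inferred from the rounded report, not taken from the original run.
malignant_correct, malignant_missed = 41, 1   # support 42
benign_correct, benign_missed = 68, 4         # support 72

total = malignant_correct + malignant_missed + benign_correct + benign_missed

accuracy = (malignant_correct + benign_correct) / total
baseline = (benign_correct + benign_missed) / total   # always predict benign

# Precision, recall, and F1 for the malignant class.
precision_malignant = malignant_correct / (malignant_correct + benign_missed)
recall_malignant = malignant_correct / (malignant_correct + malignant_missed)
f1_malignant = (
    2 * precision_malignant * recall_malignant
    / (precision_malignant + recall_malignant)
)

print(f"accuracy            {accuracy:.2%}")
print(f"baseline            {baseline:.2%}")
print(f"malignant precision {precision_malignant:.2f}")
print(f"malignant recall    {recall_malignant:.2f}")
print(f"malignant F1        {f1_malignant:.2f}")
```

Under these counts, one malignant tumor in the test set was predicted as benign, which is the most costly kind of error in this problem.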


46. What I Learned From This Project

This project helped me understand the complete PyTorch binary classification workflow.

Important lessons:

1. Data must be prepared before training.
2. Input features should be scaled.
3. Train/test split is needed for fair evaluation.
4. PyTorch models need tensors.
5. DataLoader helps train using mini-batches.
6. BCEWithLogitsLoss is suitable for binary classification.
7. Sigmoid is used during prediction to convert logits into probabilities.
8. Accuracy alone is not enough.
9. Precision, recall, F1-score, confusion matrix, and ROC curve give better understanding.
10. The trained model and scaler must both be saved for future prediction.
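
Lessons 6 and 7 are two sides of the same idea, which a short sketch can illustrate. The logits and targets below are invented values; the point is only that BCEWithLogitsLoss applies the sigmoid internally, so the model can output raw logits during training.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0], [-1.0], [0.5]])    # raw model outputs
targets = torch.tensor([[1.0], [0.0], [1.0]])    # 1 = benign, 0 = malignant

# Training: the loss takes logits directly and applies sigmoid internally.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Equivalent two-step version: explicit sigmoid followed by plain BCE.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_from_logits.item(), loss_from_probs.item())
losses_match = torch.allclose(loss_from_logits, loss_from_probs, atol=1e-6)
print("losses match:", losses_match)
```

The combined loss is preferred in practice because it is numerically more stable for large positive or negative logits. At prediction time there is no loss function, so the sigmoid has to be applied explicitly.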

47. Limitations

This project is useful for learning, but it has limitations.

First, the dataset is small.

569 samples

Second, the model architecture is simple.

30 → 32 → 16 → 1

Third, I did not perform hyperparameter tuning.

For example, I did not test many values of:

learning rate
batch size
number of hidden layers
number of neurons
dropout rate

Fourth, this model should not be used for real medical diagnosis.

It is only a learning project.


48. Future Improvements

Possible improvements include:

1. Add dropout to reduce overfitting.
2. Add a validation set.
3. Use early stopping.
4. Try different learning rates.
5. Try different batch sizes.
6. Compare with Logistic Regression.
7. Compare with Random Forest.
8. Use cross-validation.
9. Deploy the model using Flask or FastAPI.
10. Build a simple web interface for prediction.
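
As a starting point for items 6 and 8, here is a minimal sketch of a Logistic Regression comparison with 5-fold cross-validation. It was not part of this project, and the choice of `max_iter=1000` and `cv=5` is an assumption rather than a tuned setting.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline refits the scaler inside every fold, so no statistics
# from the held-out fold leak into training.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print("fold accuracies:", scores.round(4))
print("mean accuracy:  ", round(scores.mean(), 4))
```

Comparing this mean accuracy with the neural network's result shows whether the extra model complexity is actually paying off on such a small dataset.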

49. Conclusion

In this project, I built a complete binary classification model using PyTorch.

The project started from data preparation and ended with model saving and unknown data prediction.

The model learned to classify breast tumors as malignant or benign based on 30 input features.

The final model achieved around:

95.61% accuracy
0.9921 AUC score

The baseline accuracy was only around:

63.16%

So the neural network learned useful patterns from the data.

This project is an important step after linear regression because it introduces many real machine learning workflow concepts, including:

classification
train/test split
mini-batch training
loss curves
confusion matrix
ROC curve
AUC
model saving
prediction on unknown data

This makes the project a strong beginner-friendly example for learning PyTorch binary classification.