Difference between PCA and KPCA

April 20, 2022


Figure: Principal Component Analysis

Figure: Kernel Principal Component Analysis

PRINCIPAL COMPONENT ANALYSIS

The transformed variables obtained with PCA are called principal components.

They are derived from the covariance between every pair of variables in the data, i.e. from the covariance matrix.

For four variables a, b, c and d, the covariance matrix is:

[[Covariance(a, a), Covariance(a, b), Covariance(a, c), Covariance(a, d)],

[Covariance(b, a), Covariance(b, b), Covariance(b, c), Covariance(b, d)],

[Covariance(c, a), Covariance(c, b), Covariance(c, c), Covariance(c, d)],

[Covariance(d, a), Covariance(d, b), Covariance(d, c), Covariance(d, d)]]
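As a rough illustration of the matrix above, here is a minimal NumPy sketch. The toy data (6 random samples of the four variables a, b, c, d) is an assumption made only for this example.

```python
import numpy as np

# Hypothetical toy data: 6 samples (rows) of 4 variables a, b, c, d (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Center each variable, then form the 4 x 4 covariance matrix.
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (X.shape[0] - 1)

# np.cov gives the same matrix when told that the columns are the variables.
assert np.allclose(cov, np.cov(X, rowvar=False))
print(cov.shape)  # (4, 4)
```

Entry (i, j) of cov is the covariance between variable i and variable j, so the matrix is symmetric and its diagonal holds the variances.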

N-dimensional data yields N principal components, so at first glance PCA does not reduce the size of the data. The components are ordered, however: the first ones carry more of the variance (information) than the later ones. We can therefore keep only the first P principal components, depending on the requirement, and discard the rest. This reduces the data size while retaining as much information as possible. The components are found as the axes along which the data varies the most (the lines drawn in the figures below).

Figure: Data distribution (linearly separable data)

Figure: Decision boundary in PCA

Figure: n principal components in PCA

The first line (blue) is the direction of maximum variance, so the first principal component contains the most information.

The second line (pink) has somewhat less variance than the first, and so carries somewhat less information. Note that the second line is orthogonal to the first, which makes the second component uncorrelated with the first. This procedure is repeated until the desired number of principal components, P, has been reached.

(Note: P < N, where P is the dimension of the reduced data and N is the dimension of the original data.)
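The reduction from N to P dimensions can be sketched as follows. This is only an illustration: the helper name pca_reduce and the toy 4-dimensional data are assumptions, and the directions are taken from the eigen-decomposition of the covariance matrix.

```python
import numpy as np

def pca_reduce(X, p):
    """Project X (samples x N variables) onto its first p principal components."""
    X_centered = X - X.mean(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
    # eigh returns ascending eigenvalues; reorder from largest to smallest.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    Z = X_centered @ eigenvectors[:, :p]            # reduced data, samples x p
    retained = eigenvalues[:p].sum() / eigenvalues.sum()  # share of variance kept
    return Z, retained

# Hypothetical data: N = 4 variables, two of which carry almost no variance.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) * np.array([4.0, 2.0, 0.3, 0.1])

Z, retained = pca_reduce(X, p=2)
print(Z.shape)   # (300, 2)
print(retained)  # fraction of the total variance kept by the 2 components
```

The columns of Z are uncorrelated with each other, which is the sense in which each component is unrelated to the previous ones.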

So, how is this done? This is where the eigenvalues and eigenvectors of the covariance matrix come in.

The eigenvectors give the directions of the new axes, which we call principal components.

The eigenvalues give the amount of variance along each principal component.

By ranking the eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance.
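The ranking step might look like this in NumPy. The function name ranked_components and the toy data are assumptions for illustration.

```python
import numpy as np

def ranked_components(X):
    """Return eigenvalues (highest first) and the matching eigenvectors (as columns)."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical data stretched mostly along its first axis.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

eigenvalues, eigenvectors = ranked_components(X)
# Each eigenvalue divided by the total is the share of variance on that component.
print(eigenvalues / eigenvalues.sum())
```

Column k of eigenvectors is the direction of the k-th principal component, and the columns are mutually orthogonal.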

Figure: Rank eigen vectors in order of their eigen values

KERNEL PRINCIPAL COMPONENT ANALYSIS

The main difference between PCA and KPCA is that PCA is a linear method: it works well only when the structure of the data is linear (for example, linearly separable classes).

PCA can only find the best straight axis, the one along which the data has maximum variance. KPCA can also work well on non-linearly separable data.

Another major difference is that the covariance matrix used in PCA is replaced by a kernel matrix in KPCA.
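A kernel matrix can be sketched as below. The choice of an RBF (Gaussian) kernel, the gamma value and the helper names are assumptions; the post does not say which kernel is used. Note that the kernel matrix has one row and column per sample, whereas the covariance matrix has one per variable.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for every pair of samples."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

def center_kernel(K):
    """Center the kernel matrix, the counterpart of subtracting the mean in PCA."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Hypothetical toy data: 5 samples of 2 variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(5, 2))

K = rbf_kernel_matrix(X, gamma=0.5)
K_centered = center_kernel(K)
print(K.shape)  # (5, 5)
```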

If our objective is to perform classification or clustering of data, PCA is effective as a preprocessing step only if the structure of the data is linear.

If the data is not linearly separable, as in the figure below, we can use Kernel PCA instead.

Figure: Non linear data distribution

Kernel PCA can capture non-linear structure in the data by implicitly mapping the data into a higher-dimensional space.

How does this work? Take a look at the figures below. Applying a kernel corresponds to mapping the data into a higher-dimensional space, where the data can become linearly separable.
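To make the idea concrete, here is a small sketch with an explicit mapping. The two-ring data set and the feature map (x1, x2) -> (x1, x2, x1^2 + x2^2) are assumptions chosen for illustration; a real kernel performs such a mapping only implicitly.

```python
import numpy as np

def make_rings(n_per_ring, radii=(1.0, 3.0), seed=0):
    """Two concentric rings in 2-D: not separable by any straight line."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for label, r in enumerate(radii):
        angles = rng.uniform(0.0, 2.0 * np.pi, size=n_per_ring)
        points.append(np.column_stack([r * np.cos(angles), r * np.sin(angles)]))
        labels.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(labels)

X, y = make_rings(100)

# Hypothetical explicit map into 3-D: add the squared radius as a third coordinate.
X_lifted = np.column_stack([X, np.sum(X**2, axis=1)])

# In the lifted space a flat plane (third coordinate = 5) separates the rings,
# because the inner ring sits at height 1 and the outer ring at height 9.
predicted = (X_lifted[:, 2] > 5.0).astype(int)
print((predicted == y).all())  # True
```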

Figure: Data projected into higher dimension

Figure: Decision boundary in a non-linear data

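This is known as the kernel trick: the kernel function gives the inner products between points in the higher-dimensional space, so the mapping never has to be computed explicitly.

We can calculate the eigenvalues and eigenvectors directly from this kernel matrix (after centering it) and project the data onto the leading P eigenvectors.

Thus the reduced P-dimensional data is obtained.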
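Putting the steps together, a minimal from-scratch sketch of kernel PCA could look like this. The RBF kernel, the gamma value, the function name kernel_pca and the random toy data are all assumptions; it illustrates the idea and is not a reference implementation.

```python
import numpy as np

def kernel_pca(X, p, gamma=1.0):
    """Reduce X (samples x variables) to p dimensions using an RBF kernel."""
    # 1. Kernel matrix: one entry per pair of samples.
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * sq_dists)

    # 2. Center the kernel matrix.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 3. Eigen-decomposition, keeping the p largest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(K_centered)
    order = np.argsort(eigenvalues)[::-1][:p]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # 4. Coordinates of the training samples on the p kernel principal components.
    Z = eigenvectors * np.sqrt(np.maximum(eigenvalues, 0.0))
    return Z, eigenvalues

# Hypothetical toy data: 50 samples of 3 variables, reduced to P = 2.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))

Z, eigenvalues = kernel_pca(X, p=2, gamma=0.5)
print(Z.shape)  # (50, 2)
```

The value of gamma strongly affects the result and usually has to be tuned for the data at hand.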

