Dimensionality Reduction with Eigenfaces and Classification with SVM¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2021)
1. Data¶
1.1 Data Load¶
The data can be downloaded with fetch_lfw_people from sklearn.datasets.
In [2]:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
In [3]:
data, target = faces["data"], faces["target"]
1.2 Data EDA¶
The height and width of the images are as follows.
In [4]:
n_samples, h, w = faces.images.shape
In [5]:
n_samples, h, w
Out[5]:
(1288, 50, 37)
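To make the relationship between `faces.images` and `faces["data"]` concrete: each row of `data` is one image flattened into h * w = 1850 pixel values. The following is a minimal sketch using a small synthetic array in place of the real dataset (the random pixel values and the count of 4 images are assumptions for illustration only).

```python
import numpy as np

# Synthetic stand-in for faces.images: 4 "images" of 50 x 37 pixels.
h, w = 50, 37
rng = np.random.default_rng(0)
images = rng.random((4, h, w))

# Flatten each image into one row, mirroring the layout of faces["data"].
flat = images.reshape(len(images), h * w)
print(flat.shape)  # (4, 1850)

# Reshaping back recovers the original pixel grid exactly.
restored = flat.reshape(-1, h, w)
print(np.array_equal(restored, images))  # True
```

This is why the plotting code later reshapes rows of `data` back to `(h, w)` before calling `imshow`.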
Let's check the names of the people whose faces appear in the dataset.
In [6]:
target_names = faces.target_names
n_classes = target_names.shape[0]
In [7]:
target_names
Out[7]:
array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'], dtype='<U17')
Let's look at the actual images.
In [8]:
samples = data[:10].reshape(10, h, w)
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
for idx, sample in enumerate(samples):
    ax = axes[idx//5, idx%5]
    ax.imshow(sample, cmap="gray")
    ax.set_title(target_names[target[idx]])
1.3 Data Split¶
In [9]:
from sklearn.model_selection import train_test_split
train_data, test_data, train_target, test_target = train_test_split(
    data, target, train_size=0.7, random_state=2021
)
In [10]:
print(f"train_data size: {len(train_target)}, {len(train_target)/len(data):.2f}")
print(f"test_data size: {len(test_target)}, {len(test_target)/len(data):.2f}")
train_data size: 901, 0.70
test_data size: 387, 0.30
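The 901 / 387 split follows from the sample count: 0.7 * 1288 = 901.6, and the training size is rounded down, leaving the remaining 387 samples for the test set. A small sketch that reproduces these sizes without downloading the data, using an index array as a placeholder for the real images:

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 1288  # number of images reported above
indices = np.arange(n_samples)  # placeholder for the real data rows

train_idx, test_idx = train_test_split(
    indices, train_size=0.7, random_state=2021
)
print(len(train_idx), len(test_idx))  # 901 387
```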
1.4 Data Scaling¶
In [11]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
In [12]:
scaler.fit(train_data)
Out[12]:
StandardScaler(copy=True, with_mean=True, with_std=True)
In [13]:
scaled_train_data = scaler.transform(train_data)
scaled_test_data = scaler.transform(test_data)
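Note that the scaler is fitted on the training data only and then applied to both splits, so the test data is standardized with the training mean and standard deviation. As a rough sketch on synthetic data (the array sizes and distribution are assumptions), the transform can be reproduced by hand like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(40, 3))

scaler = StandardScaler().fit(train)

# Manual equivalent: per-feature mean and population std (ddof=0),
# both computed from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_test = (test - mean) / std

print(np.allclose(scaler.transform(test), manual_test))  # True
```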
2. Eigenface¶
The term "eigenface" comes from the observation that the eigenvectors obtained by applying PCA to face images themselves look like faces.
Let's build eigenfaces in a hands-on exercise.
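Before using scikit-learn on the face data, it may help to see where these eigenvectors come from. The sketch below, on synthetic data with assumed sizes, computes the principal axes from the SVD of the centered data matrix and compares them with `PCA.components_`. The comparison uses absolute values because the sign of each eigenvector is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 60 samples, 6 features with clearly separated variances.
scales = np.array([16.0, 8.0, 4.0, 2.0, 1.0, 0.5])
X = rng.normal(size=(60, 6)) * scales

# Center the data and take its SVD; rows of vt are the principal axes.
Xc = X - X.mean(axis=0)
_, s, vt = np.linalg.svd(Xc, full_matrices=False)

pca = PCA(svd_solver="full").fit(X)

# Same axes up to sign, and eigenvalues equal s^2 / (n - 1).
print(np.allclose(np.abs(pca.components_), np.abs(vt)))             # True
print(np.allclose(pca.explained_variance_, s**2 / (len(X) - 1)))    # True
```

For the face data, each such axis has 1850 entries, which is what allows it to be reshaped to 50 x 37 and displayed as an image.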
2.1 Training¶
We will compress the data with PCA.
In [14]:
from sklearn.decomposition import PCA
pca = PCA()
In [15]:
pca.fit(scaled_train_data)
Out[15]:
PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
In [16]:
plt.plot(pca.explained_variance_ratio_.cumsum())
plt.axhline(0.9, color="red", linestyle="--")
Out[16]:
<matplotlib.lines.Line2D at 0x7f936116ced0>
We will keep the number of components at which the cumulative explained variance ratio reaches 0.9.
In [17]:
pca = PCA(n_components=0.9)
pca.fit(scaled_train_data)
Out[17]:
PCA(copy=True, iterated_power='auto', n_components=0.9, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
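Passing a float to `n_components` asks PCA to keep the smallest number of components whose cumulative explained variance ratio passes that threshold. As an illustration on synthetic data (the shapes and variances are assumptions), the same count can be read off the cumulative curve of a full PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples, 30 features whose standard deviations decay from 10 to 1.
X = rng.normal(size=(200, 30)) * np.linspace(10, 1, 30)

full = PCA().fit(X)
cumulative = full.explained_variance_ratio_.cumsum()

# Index of the first component where the curve exceeds 0.9, plus one
# to turn the zero-based index into a component count.
k = int(np.argmax(cumulative > 0.9)) + 1

reduced = PCA(n_components=0.9).fit(X)
print(k, reduced.n_components_)  # the two counts should agree
```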
In [18]:
pca_train_data = pca.transform(scaled_train_data)
pca_test_data = pca.transform(scaled_test_data)
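With the default `whiten=False`, `transform` amounts to subtracting the training mean and projecting onto the learned components. A minimal sketch on synthetic data (array sizes are assumptions) that checks this equivalence:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))
X_test = rng.normal(size=(20, 10))

pca = PCA(n_components=3).fit(X_train)

# Center with the training mean, then project onto the principal axes.
manual = (X_test - pca.mean_) @ pca.components_.T

print(manual.shape)                                # (20, 3)
print(np.allclose(pca.transform(X_test), manual))  # True
```

As with the scaler, the test data is projected with statistics learned from the training data only.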
2.2 Visualization¶
Let's visualize the eigenvectors learned by PCA.
You can think of PCA as extracting the facial features that appear in the eigenvectors shown below.
In [19]:
eigenfaces = pca.components_.reshape((pca.n_components_, h, w))
samples = eigenfaces[:10].reshape(10, h, w)
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
for idx, sample in enumerate(samples):
    ax = axes[idx//5, idx%5]
    ax.imshow(sample, cmap="gray")
    ax.set_title(f"eigenface {idx}")
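One way to read these eigenfaces is that every face is approximated as the mean face plus a weighted sum of eigenfaces, with the PCA coordinates as weights. The sketch below uses synthetic low-rank data as a stand-in for face pixels (all sizes are assumptions) and shows that the reconstruction error shrinks as more components are kept.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Low-rank structure plus a little noise, standing in for face pixels.
latent = rng.normal(size=(150, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + 0.1 * rng.normal(size=(150, 40))

errors = []
for k in (1, 3, 5):
    pca = PCA(n_components=k, svd_solver="full").fit(X)
    coords = pca.transform(X)
    # Mean plus weighted sum of the k principal axes.
    recon = pca.mean_ + coords @ pca.components_
    assert np.allclose(recon, pca.inverse_transform(coords))
    errors.append(float(np.mean((X - recon) ** 2)))

print(errors)  # mean squared error decreases as k grows
```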
3. SVM¶
3.1 Raw Data¶
First, let's look at the result of the baseline from the earlier SVM exercise.
In [21]:
from sklearn.svm import SVC
svm = SVC()
svm.fit(scaled_train_data, train_target)
Out[21]:
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
In [22]:
train_pred = svm.predict(scaled_train_data)
test_pred = svm.predict(scaled_test_data)
In [23]:
from sklearn.metrics import accuracy_score
train_acc = accuracy_score(train_target, train_pred)
test_acc = accuracy_score(test_target, test_pred)
In [24]:
print(f"train accuracy is {train_acc:.4f}")
print(f"test accuracy is {test_acc:.4f}")
train accuracy is 0.9567
test accuracy is 0.7339
3.2 Eigenface¶
This time, we train an SVM using only the features extracted as eigenfaces and look at the result.
In [25]:
eigenface_svm = SVC()
eigenface_svm.fit(pca_train_data, train_target)
Out[25]:
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
In [26]:
pca_train_pred = eigenface_svm.predict(pca_train_data)
pca_test_pred = eigenface_svm.predict(pca_test_data)
In [27]:
pca_train_acc = accuracy_score(train_target, pca_train_pred)
pca_test_acc = accuracy_score(test_target, pca_test_pred)
In [28]:
print(f"Eigenface train accuracy is {pca_train_acc:.4f}")
print(f"Eigenface test accuracy is {pca_test_acc:.4f}")
Eigenface train accuracy is 0.9390
Eigenface test accuracy is 0.7364
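The scale, reduce, and classify steps used above can also be chained into a single estimator. This is a sketch of one possible arrangement with scikit-learn's `make_pipeline`, not the approach taken in this post. It runs on synthetic data from `make_classification`, and the dataset sizes are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the face data (hypothetical sizes).
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=10,
    n_classes=3, random_state=2021,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=2021
)

# fit() learns the scaler and PCA on the training split only;
# score() reuses them unchanged on the test split.
model = make_pipeline(StandardScaler(), PCA(n_components=0.9), SVC())
model.fit(X_train, y_train)

n_kept = model.named_steps["pca"].n_components_
test_acc = model.score(X_test, y_test)
print(f"components kept: {n_kept}, test accuracy: {test_acc:.4f}")
```

Bundling the steps this way keeps the fit-on-train, transform-on-test discipline in one place.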
4. Wrap-up¶
In [29]:
train_data.shape
Out[29]:
(901, 1850)
In [30]:
pca_train_data.shape
Out[30]:
(901, 72)
In [31]:
print(f"Baseline test accuracy is {test_acc:.4f}")
print(f"Eigenface test accuracy is {pca_test_acc:.4f}")
Baseline test accuracy is 0.7339
Eigenface test accuracy is 0.7364
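Putting the numbers above side by side: PCA reduced each sample from 1850 features to 72 while the test accuracy stayed essentially the same. The short calculation below only restates the figures printed in this post.

```python
n_original = 1850   # features per image before PCA
n_reduced = 72      # components kept at 0.9 explained variance
baseline_acc = 0.7339
eigenface_acc = 0.7364

kept_fraction = n_reduced / n_original
print(f"features kept: {kept_fraction:.1%}")                        # 3.9%
print(f"reduction factor: {n_original / n_reduced:.1f}x")           # 25.7x
print(f"accuracy difference: {eigenface_acc - baseline_acc:+.4f}")  # +0.0025
```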