Sample Data and Boosting Classification

2024. 3. 18. 19:00·Machine Learning/Boosting

 

 

 
 


 
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


np.random.seed(2021)
 
 

1. Data

 
 

1.1 Sample Data

 
 

Let's generate the data used in this exercise.

 
In [2]:
from sklearn.datasets import make_gaussian_quantiles


data_1, label_1 = make_gaussian_quantiles(
    cov=2, n_samples=200, n_features=2, n_classes=2, random_state=2021
)
data_2, label_2 = make_gaussian_quantiles(
    mean=(3, 3), cov=1.5, n_samples=300, n_features=2, n_classes=2, random_state=2021
)
 
In [3]:
data = np.concatenate((data_1, data_2))
label = np.concatenate((label_1, - label_2 + 1))
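
The expression `- label_2 + 1` flips the binary labels of the second blob, so that 0 becomes 1 and 1 becomes 0 before the two blobs are concatenated. A minimal check on a made-up array (`toy` is a hypothetical name used only for illustration):

```python
import numpy as np

# Hypothetical toy labels, only for illustration.
toy = np.array([0, 1, 1, 0])

# Same expression as in the cell above: maps 0 -> 1 and 1 -> 0.
flipped = -toy + 1
print(flipped)  # [1 0 0 1]
```

Writing `1 - label_2` would give the same result.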
 
In [4]:
plt.scatter(data[:,0], data[:,1], c=label)
 
Out[4]:
<matplotlib.collections.PathCollection at 0x7f6195cea2d0>
 
 
 

1.2 Data Split

 
In [5]:
from sklearn.model_selection import train_test_split

train_data, test_data, train_label, test_label = train_test_split(
    data, label, train_size=0.7, random_state=2021
)
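
With 200 + 300 = 500 samples and `train_size=0.7`, the split should yield 350 training rows and 150 test rows. The sketch below checks this on stand-in arrays of the same length (`X` and `y` here are placeholders, not the data above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same number of rows (200 + 300) as the data above.
X = np.arange(1000).reshape(500, 2)
y = np.arange(500) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=2021
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the same rows land in the same split on every run.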
 
 

1.3 Visualization Data

 
In [6]:
x_min, x_max = data[:, 0].min() - 1, data[:, 0].max() + 1
y_min, y_max = data[:, 1].min() - 1, data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
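
To see what `np.meshgrid` produces and how the grid is later turned into a list of points with `np.c_`, here is a small sketch on a 3 x 2 grid instead of the 0.02-step grid (the names `xs`, `ys`, `gx`, `gy`, and `points` are illustrative):

```python
import numpy as np

# A tiny grid instead of the 0.02-step grid used above.
xs = np.arange(0, 3, 1)  # x coordinates: 0, 1, 2
ys = np.arange(0, 2, 1)  # y coordinates: 0, 1
gx, gy = np.meshgrid(xs, ys)

# Flatten both coordinate arrays and pair them column-wise,
# giving one (x, y) row per grid point.
points = np.c_[gx.ravel(), gy.ravel()]
print(gx.shape)  # (2, 3)
print(points)
```

A classifier's `predict` can be called on `points`, and the result reshaped back to `gx.shape` to draw the decision regions.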
 
 

2. Decision Tree

 
 

First, we train a basic Decision Tree so that its results can be compared with the boosting models.

 
In [7]:
from sklearn.tree import DecisionTreeClassifier


tree = DecisionTreeClassifier(max_depth=2)
 
 

2.1 Training

 
In [8]:
tree.fit(train_data, train_label)
 
Out[8]:
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=2, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')
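
A depth-2 tree is small enough to read as nested if/else rules. The sketch below prints them with `export_text`; it uses a single stand-in blob rather than the exact data above, and `demo_tree` and the feature names `x0`, `x1` are illustrative:

```python
from sklearn.datasets import make_gaussian_quantiles
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: a single Gaussian-quantile blob, not the exact data above.
X, y = make_gaussian_quantiles(
    cov=2, n_samples=200, n_features=2, n_classes=2, random_state=2021
)

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=2021).fit(X, y)

# Print the learned splits as nested if/else rules.
rules = export_text(demo_tree, feature_names=["x0", "x1"])
print(rules)
print("depth:", demo_tree.get_depth(), "leaves:", demo_tree.get_n_leaves())
```

With `max_depth=2` the tree has at most four leaves, which is why its decision regions are a few axis-aligned rectangles.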
 
 

2.2 Prediction

 
In [9]:
tree_train_pred = tree.predict(train_data)
tree_test_pred = tree.predict(test_data)
 
 

2.3 Evaluation

 
In [10]:
from sklearn.metrics import accuracy_score

tree_train_acc = accuracy_score(train_label, tree_train_pred)
tree_test_acc = accuracy_score(test_label, tree_test_pred)
 
In [11]:
print(f"Tree train accuracy is {tree_train_acc:.4f}")
print(f"Tree test accuracy is {tree_test_acc:.4f}")
 
 
Tree train accuracy is 0.7286
Tree test accuracy is 0.6867
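
`accuracy_score` is the fraction of predictions that match the true labels. A small sketch with made-up labels (`y_true` and `y_pred` are hypothetical) compares it with a manual computation:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels and predictions, only for illustration.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

acc_sklearn = accuracy_score(y_true, y_pred)
acc_manual = np.mean(y_true == y_pred)  # fraction of exact matches
print(acc_sklearn, acc_manual)  # 0.8 0.8
```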
 
 

2.4 Visualization

 
In [12]:
tree_Z = tree.predict(np.c_[xx.ravel(), yy.ravel()])
tree_Z = tree_Z.reshape(xx.shape)
 
In [13]:
plt.figure(figsize=(14, 7))
plt.subplot(121)
cs = plt.contourf(xx, yy, tree_Z, cmap=plt.cm.Paired)
plt.scatter(train_data[:,0], train_data[:,1], c=train_label)
plt.title("train data")

plt.subplot(122)
cs = plt.contourf(xx, yy, tree_Z, cmap=plt.cm.Paired)
plt.scatter(test_data[:,0], test_data[:,1], c=test_label)
plt.title("test data")
 
Out[13]:
Text(0.5, 1.0, 'test data')
 
 
 

3. AdaBoost

 
 

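Next, we train AdaBoost. It is available as AdaBoostClassifier in sklearn.ensemble.
AdaBoostClassifier takes a base estimator as its first argument (shown as base_estimator in the output below).
So that each weak learner classifies the data with a single if/else split, we use a depth-1 tree as the base estimator.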

 
In [14]:
from sklearn.ensemble import AdaBoostClassifier


ada_boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1))
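
To illustrate how boosting combines depth-1 trees, here is a minimal sketch of classic discrete AdaBoost written by hand. It is not scikit-learn's implementation (the output below reports the `SAMME.R` variant), it uses a single stand-in blob rather than this post's data, and names such as `n_rounds`, `weights`, and `ensemble_predict` are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (a single blob), not the exact data of this post.
X, y01 = make_gaussian_quantiles(
    cov=2, n_samples=300, n_features=2, n_classes=2, random_state=2021
)
y = 2 * y01 - 1  # relabel {0, 1} as {-1, +1}

n_rounds = 50
weights = np.full(len(y), 1 / len(y))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error of this stump, clipped to avoid log(0) or division by 0.
    err = np.clip(np.sum(weights[pred != y]), 1e-10, 1 - 1e-10)
    if err >= 0.5:  # no better than chance: stop adding learners
        break
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weight of misclassified points, decrease the rest.
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)


def ensemble_predict(X_new):
    """Sign of the alpha-weighted vote of all stumps (labels in {-1, +1})."""
    votes = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.where(votes >= 0, 1, -1)


first_stump_acc = np.mean(stumps[0].predict(X) == y)
ensemble_acc = np.mean(ensemble_predict(X) == y)
print(f"first stump: {first_stump_acc:.4f}, ensemble: {ensemble_acc:.4f}")
```

Each stump alone is a single threshold on one feature; the weighted vote of many such thresholds is what produces the more detailed decision regions.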
 
 

3.1 Training

 
In [15]:
ada_boost.fit(train_data, train_label)
 
Out[15]:
AdaBoostClassifier(algorithm='SAMME.R',
                   base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                                                         class_weight=None,
                                                         criterion='gini',
                                                         max_depth=1,
                                                         max_features=None,
                                                         max_leaf_nodes=None,
                                                         min_impurity_decrease=0.0,
                                                         min_impurity_split=None,
                                                         min_samples_leaf=1,
                                                         min_samples_split=2,
                                                         min_weight_fraction_leaf=0.0,
                                                         presort='deprecated',
                                                         random_state=None,
                                                         splitter='best'),
                   learning_rate=1.0, n_estimators=50, random_state=None)
 
 

3.2 Prediction

 
In [16]:
ada_boost_train_pred = ada_boost.predict(train_data)
ada_boost_test_pred = ada_boost.predict(test_data)
 
 

3.3 Evaluation

 
In [17]:
from sklearn.metrics import accuracy_score

ada_boost_train_acc = accuracy_score(train_label, ada_boost_train_pred)
ada_boost_test_acc = accuracy_score(test_label, ada_boost_test_pred)
 
In [18]:
print(f"Ada Boost train accuracy is {ada_boost_train_acc:.4f}")
print(f"Ada Boost test accuracy is {ada_boost_test_acc:.4f}")
 
 
Ada Boost train accuracy is 0.9486
Ada Boost test accuracy is 0.8600
 
 

3.4 Visualization

 
In [19]:
ada_boost_Z = ada_boost.predict(np.c_[xx.ravel(), yy.ravel()])
ada_boost_Z = ada_boost_Z.reshape(xx.shape)
 
In [20]:
plt.figure(figsize=(14, 7))

plt.subplot(121)
cs = plt.contourf(xx, yy, ada_boost_Z, cmap=plt.cm.Paired)
plt.scatter(train_data[:,0], train_data[:,1], c=train_label)
plt.title("train_data")

plt.subplot(122)
cs = plt.contourf(xx, yy, ada_boost_Z, cmap=plt.cm.Paired)
plt.scatter(test_data[:,0], test_data[:,1], c=test_label)
plt.title("test_data")
 
Out[20]:
Text(0.5, 1.0, 'test_data')
 
 
 

4. GradientBoost

 
 

Next is Gradient Boost.
It is available as GradientBoostingClassifier in sklearn.ensemble.
As with AdaBoost, we set max_depth to 1 so that each tree is a single if/else split.

 
In [21]:
from sklearn.ensemble import GradientBoostingClassifier

grad_boost = GradientBoostingClassifier(max_depth=1)
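
As a rough picture of what gradient boosting does with depth-1 trees, the sketch below fits each new regression stump to the residual `y - p`, which is the negative gradient of the log loss with respect to the raw score. It is a simplified illustration on a stand-in blob, not scikit-learn's implementation, which differs in details such as how leaf values are computed; `n_rounds`, `step`, and the helper names are assumptions:

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss_mean(y, raw):
    """Mean binary log loss for labels in {0, 1} and raw scores (log-odds)."""
    return np.mean(np.logaddexp(0.0, raw) - y * raw)


# Stand-in data (a single blob), not the exact data of this post.
X, y = make_gaussian_quantiles(
    cov=2, n_samples=300, n_features=2, n_classes=2, random_state=2021
)

n_rounds, step = 100, 1.0  # illustrative values

# Start from the log-odds of the positive class.
p0 = y.mean()
raw = np.full(len(y), np.log(p0 / (1 - p0)))
initial_loss = log_loss_mean(y, raw)

trees = []
for _ in range(n_rounds):
    residual = y - sigmoid(raw)  # negative gradient of the log loss
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    raw += step * tree.predict(X)  # move the scores along the fitted direction
    trees.append(tree)

final_loss = log_loss_mean(y, raw)
train_acc = np.mean((raw > 0).astype(int) == y)
print(f"log loss: {initial_loss:.4f} -> {final_loss:.4f}, train acc: {train_acc:.4f}")
```

Unlike AdaBoost, which reweights samples, this procedure keeps all samples equally weighted and lets each new tree correct the remaining error of the current scores.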
 
 

4.1 Training

 
In [22]:
grad_boost.fit(train_data, train_label)
 
Out[22]:
GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=1,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
 
 

4.2 Prediction

 
In [23]:
grad_boost_train_pred = grad_boost.predict(train_data)
grad_boost_test_pred = grad_boost.predict(test_data)
 
 

4.3 Evaluation

 
In [24]:
from sklearn.metrics import accuracy_score

grad_boost_train_acc = accuracy_score(train_label, grad_boost_train_pred)
grad_boost_test_acc = accuracy_score(test_label, grad_boost_test_pred)
 
In [25]:
print(f"Gradient Boost train accuracy is {grad_boost_train_acc:.4f}")
print(f"Gradient Boost test accuracy is {grad_boost_test_acc:.4f}")
 
 
Gradient Boost train accuracy is 0.8886
Gradient Boost test accuracy is 0.8200
 
 

4.4 Visualization

 
In [26]:
grad_boost_Z = grad_boost.predict(np.c_[xx.ravel(), yy.ravel()])
grad_boost_Z = grad_boost_Z.reshape(xx.shape)
 
In [27]:
plt.figure(figsize=(14, 7))

plt.subplot(121)
cs = plt.contourf(xx, yy, grad_boost_Z, cmap=plt.cm.Paired)
plt.scatter(train_data[:,0], train_data[:,1], c=train_label)
plt.title("train_data")

plt.subplot(122)
cs = plt.contourf(xx, yy, grad_boost_Z, cmap=plt.cm.Paired)
plt.scatter(test_data[:,0], test_data[:,1], c=test_label)
plt.title("test_data")
 
Out[27]:
Text(0.5, 1.0, 'test_data')
 
 
 

5. Wrap-up

 
In [28]:
print(f"Tree test accuracy is {tree_test_acc:.4f}")
print(f"Gradient Boost test accuracy is {grad_boost_test_acc:.4f}")
print(f"Ada Boost test accuracy is {ada_boost_test_acc:.4f}")
 
 
Tree test accuracy is 0.6867
Gradient Boost test accuracy is 0.8200
Ada Boost test accuracy is 0.8600
 
In [29]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
Z_name = [
    ("tree", tree_Z),
    ("Ada Boost", ada_boost_Z),
    ("Gradient Boost", grad_boost_Z)
]
for idx, (name, Z) in enumerate(Z_name):
    ax = axes[idx]
    ax.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    ax.scatter(train_data[:,0], train_data[:,1], c=train_label)
    ax.set_title(name)
 
 
 