샘플 데이터와 Boosting Regression

2024. 3. 18. 18:59·Machine Learning/Boosting

 

 

 
 

샘플 데이터와 Boosting Regression¶

 
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


np.random.seed(2021)
 
 

1. Data¶

 
 

1.1 Sample Data¶

 
 

실습에서 사용할 데이터를 생성해보겠습니다.

 
In [2]:
data = np.linspace(0, 6, 150)[:, np.newaxis]

label = np.sin(data).ravel() + np.sin(6 * data).ravel()
noise = np.random.normal(data.shape[0]) * 0.01
label += noise
 
In [3]:
plt.scatter(data, label)
 
Out[3]:
<matplotlib.collections.PathCollection at 0x23226cc0590>
 
 
 

1.2 Data Split¶

 
 

시계열 데이터라고 가정하고 데이터를 나누겠습니다.

 
In [4]:
train_size = 125
train_data, test_data = data[:train_size], data[train_size:]
train_label, test_label = label[:train_size], label[train_size:]
 
In [5]:
plt.scatter(train_data, train_label)
plt.scatter(test_data, test_label, color="C1")
 
Out[5]:
<matplotlib.collections.PathCollection at 0x23226d1b6d0>
 
 
 

2. Decision Tree¶

 
 

우선 기본적인 Decision Tree를 학습후 결과를 비교해 보겠습니다.

 
In [6]:
from sklearn.tree import DecisionTreeRegressor


tree = DecisionTreeRegressor(max_depth=2)
 
 

2.1 학습¶

 
In [7]:
tree.fit(train_data, train_label)
 
Out[7]:
DecisionTreeRegressor(max_depth=2)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
DecisionTreeRegressor(max_depth=2)
 
 

2.2 예측¶

 
In [8]:
tree_train_pred = tree.predict(train_data)
tree_test_pred = tree.predict(test_data)
 
 

2.3 평가¶

 
In [9]:
from sklearn.metrics import mean_squared_error

tree_train_mse = mean_squared_error(train_label, tree_train_pred)
tree_test_mse = mean_squared_error(test_label, tree_test_pred)
 
In [10]:
print(f"Tree mean squared error is {tree_train_mse:.4f}")
print(f"Tree mean squared error is {tree_test_mse:.4f}")
 
 
Tree mean squared error is 0.3669
Tree mean squared error is 1.8188
 
 

2.4 시각화¶

 
In [11]:
plt.scatter(data, label)
plt.plot(train_data, tree_train_pred)
plt.plot(test_data, tree_test_pred)
 
Out[11]:
[<matplotlib.lines.Line2D at 0x232494f0710>]
 
 
 

3. AdaBoost¶

 
 

다음은 AdaBoost를 학습해 보겠습니다. AdaBoost는 sklearn.ensemble의 AdaBoostRegressor로 생성할 수 있습니다.
다만 다른 모델들과는 다르게 base_estimator를 선언해주어야 합니다.
가장 간단한 if else로 데이터가 분류 될 수 있도록 depth가 1인 tree로 base estimator로 만들겠습니다.

 
In [12]:
from sklearn.ensemble import AdaBoostRegressor


ada_boost = AdaBoostRegressor(DecisionTreeRegressor(max_depth=1))
 
 

3.1 학습¶

 
In [13]:
ada_boost.fit(train_data, train_label)
 
Out[13]:
AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=1))
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=1))
DecisionTreeRegressor(max_depth=1)
DecisionTreeRegressor(max_depth=1)
 
 

3.2 예측¶

 
In [14]:
ada_boost_train_pred = ada_boost.predict(train_data)
ada_boost_test_pred = ada_boost.predict(test_data)
 
 

3.3 평가¶

 
In [15]:
ada_boost_train_mse = mean_squared_error(train_label, ada_boost_train_pred)
ada_boost_test_mse = mean_squared_error(test_label, ada_boost_test_pred)
 
In [16]:
print(f"Ada Boost Train mean squared error is {ada_boost_train_mse:.4f}")
print(f"Ada Boost Test mean squared error is {ada_boost_test_mse:.4f}")
 
 
Ada Boost Train mean squared error is 0.4615
Ada Boost Test mean squared error is 0.5289
 
 

3.4 시각화¶

 
In [17]:
plt.scatter(data, label)
plt.plot(train_data, ada_boost_train_pred)
plt.plot(test_data, ada_boost_test_pred)
 
Out[17]:
[<matplotlib.lines.Line2D at 0x232496954d0>]
 
 
 

4. GradientBoost¶

 
 

다음은 Gradient Boost입니다.
Gradient Boost는 sklearn.ensemble 의 GradientBoostingRegressor로 생성할 수 있습니다.
Gradient Boost또한 간단한 if else로 만들 수 있도록 max_depth를 1로 주겠습니다.

 
In [18]:
from sklearn.ensemble import GradientBoostingRegressor

grad_boost = GradientBoostingRegressor(max_depth=1)
 
 

4.1 학습¶

 
In [19]:
grad_boost.fit(train_data, train_label)
 
Out[19]:
GradientBoostingRegressor(max_depth=1)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
GradientBoostingRegressor(max_depth=1)
 
 

4.2 예측¶

 
In [20]:
grad_boost_train_pred = grad_boost.predict(train_data)
grad_boost_test_pred = grad_boost.predict(test_data)
 
 

4.3 평가¶

 
In [21]:
grad_boost_train_mse = mean_squared_error(train_label, grad_boost_train_pred)
grad_boost_test_mse = mean_squared_error(test_label, grad_boost_test_pred)
 
In [22]:
print(f"Gradient Boost Train mean squared error is {grad_boost_train_mse:.4f}")
print(f"Gradient Boost Test mean squared error is {grad_boost_test_mse:.4f}")
 
 
Gradient Boost Train mean squared error is 0.2767
Gradient Boost Test mean squared error is 1.3215
 
 

4.4 시각화¶

 
In [23]:
plt.scatter(data, label)
plt.plot(train_data, grad_boost_train_pred)
plt.plot(test_data, grad_boost_test_pred)
 
Out[23]:
[<matplotlib.lines.Line2D at 0x232498dbd90>]
 
 
 

5. 마무리¶

 
In [24]:
print(f"Tree train mean squared error is {tree_train_mse:.4f}")
print(f"Ada Boost train mean squared error is {ada_boost_train_mse:.4f}")
print(f"Gradient Boost train mean squared error is {grad_boost_train_mse:.4f}")
 
 
Tree train mean squared error is 0.3669
Ada Boost train mean squared error is 0.4615
Gradient Boost train mean squared error is 0.2767
 
In [25]:
print(f"Tree test mean squared error is {tree_test_mse:.4f}")
print(f"Ada Boost test mean squared error is {ada_boost_test_mse:.4f}")
print(f"Gradient Boost test mean squared error is {grad_boost_test_mse:.4f}")
 
 
Tree test mean squared error is 1.8188
Ada Boost test mean squared error is 0.5289
Gradient Boost test mean squared error is 1.3215
 
In [26]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
preds = [
    ("tree", tree_train_pred, tree_test_pred),
    ("Ada Boost", ada_boost_train_pred, ada_boost_test_pred),
    ("Gradient Boost", grad_boost_train_pred, grad_boost_test_pred)
]
for idx, (name, train_pred, test_pred) in enumerate(preds):
    ax = axes[idx]
    ax.scatter(data, label)
    ax.plot(train_data, train_pred)
    ax.plot(test_data, test_pred)
    ax.set_title(name)
 
 

'Machine Learning > Boosting' 카테고리의 다른 글

샘플 데이터와 Stacking Classification  (0) 2024.03.18
샘플 데이터와 Stacking Regression  (0) 2024.03.18
Boosting Classification 심화 실습- 뉴스 분류  (0) 2024.03.18
Boosting Regression 심화 실습 - 부동산 가격 예측  (0) 2024.03.18
샘플 데이터와 Boosting Classification  (0) 2024.03.18
'Machine Learning/Boosting' 카테고리의 다른 글
  • 샘플 데이터와 Stacking Regression
  • Boosting Classification 심화 실습- 뉴스 분류
  • Boosting Regression 심화 실습 - 부동산 가격 예측
  • 샘플 데이터와 Boosting Classification
Juson
Juson
  • Juson
    Juson의 데이터 공부
    Juson
  • 전체
    오늘
    어제
    • 분류 전체보기 (95)
      • RAG (2)
      • AI (2)
        • NLP (0)
        • Generative Model (0)
        • Deep Reinforcement Learning (2)
        • LLM (0)
      • Logistic Optimization (0)
      • Machine Learning (37)
        • Linear Regression (2)
        • Logistic Regression (2)
        • Decision Tree (5)
        • Naive Bayes (1)
        • KNN (2)
        • SVM (2)
        • Clustering (4)
        • Dimension Reduction (3)
        • Boosting (6)
        • Abnomaly Detection (2)
        • Recommendation (4)
        • Embedding & NLP (4)
      • Reinforcement Learning (5)
      • Deep Learning (10)
        • Deep learning Bacis Mathema.. (10)
      • Optimization (2)
        • OR Optimization (0)
        • Convex Optimization (0)
        • Integer Optimization (0)
      • SNA 분석 (0)
      • 포트폴리오 최적화 공부 (0)
        • 최적화 기법 (0)
        • 금융 베이스 (0)
      • Finanancial engineering (0)
      • 프로그래머스 데브코스(Boot camp) (15)
        • SQL (9)
        • Python (5)
        • Machine Learning (1)
      • Python (22)
      • Project (0)
  • 블로그 메뉴

    • 홈
    • 태그
    • 방명록
  • 링크

  • 공지사항

  • 인기 글

  • 태그

  • 최근 댓글

  • 최근 글

  • hELLO· Designed By정상우.v4.10.4
Juson
샘플 데이터와 Boosting Regression
상단으로

티스토리툴바