Boosting Regression Advanced Practice - Real Estate Price Prediction

2024. 3. 18. 19:00·Machine Learning/Boosting

 

 

 
 

Predicting Real Estate Prices¶

 
In [16]:
pip install lightgbm
 
 
Collecting lightgbm
  Downloading lightgbm-4.1.0-py3-none-win_amd64.whl (1.3 MB)
     ---------------------------------------- 1.3/1.3 MB 3.1 MB/s eta 0:00:00
Requirement already satisfied: numpy in c:\users\sjy99\appdata\local\programs\python\python311\lib\site-packages (from lightgbm) (1.23.5)
Requirement already satisfied: scipy in c:\users\sjy99\appdata\local\programs\python\python311\lib\site-packages (from lightgbm) (1.9.3)
Installing collected packages: lightgbm
Successfully installed lightgbm-4.1.0
Note: you may need to restart the kernel to use updated packages.
 
[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: python.exe -m pip install --upgrade pip
 
In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


np.random.seed(2021)
 
 

1. Data¶

 
 

The dataset used in this exercise is for predicting California house prices.

 
 

1.1 Data Load¶

 
 

The data can be loaded with fetch_california_housing from sklearn.datasets.

 
In [5]:
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
 
In [6]:
data, target = housing["data"], housing["target"]
 
 

1.2 Data EDA¶

 
In [7]:
pd.DataFrame(data, columns=housing["feature_names"]).describe()
 
Out[7]:
             MedInc      HouseAge      AveRooms     AveBedrms    Population      AveOccup      Latitude     Longitude
count  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000
mean       3.870671     28.639486      5.429000      1.096675   1425.476744      3.070655     35.631861   -119.569704
std        1.899822     12.585558      2.474173      0.473911   1132.462122     10.386050      2.135952      2.003532
min        0.499900      1.000000      0.846154      0.333333      3.000000      0.692308     32.540000   -124.350000
25%        2.563400     18.000000      4.440716      1.006079    787.000000      2.429741     33.930000   -121.800000
50%        3.534800     29.000000      5.229129      1.048780   1166.000000      2.818116     34.260000   -118.490000
75%        4.743250     37.000000      6.052381      1.099526   1725.000000      3.282261     37.710000   -118.010000
max       15.000100     52.000000    141.909091     34.066667  35682.000000   1243.333333     41.950000   -114.310000
 
In [8]:
pd.Series(target).describe()
 
Out[8]:
count    20640.000000
mean         2.068558
std          1.153956
min          0.149990
25%          1.196000
50%          1.797000
75%          2.647250
max          5.000010
dtype: float64
 
In [9]:
fig, axes = plt.subplots(nrows=2, ncols=4, figsize=(20, 10))
for i, feature_name in enumerate(housing["feature_names"]):
    ax = axes[i // 4, i % 4]
    ax.scatter(data[:, i], target)
    ax.set_xlabel(feature_name)
    ax.set_ylabel("price")
 
 
 
 

1.3 Data Split¶

 
In [10]:
from sklearn.model_selection import train_test_split

train_data, test_data, train_target, test_target = train_test_split(
    data, target, train_size=0.7, random_state=2021
)
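As a quick check of what `train_size=0.7` and `random_state=2021` do, the sketch below splits a small synthetic array twice with the same seed. The arrays `X_demo` and `y_demo` are made-up stand-ins for `data` and `target`, not the housing data itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (data, target): 1000 rows with 8 features each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 8))
y_demo = rng.normal(size=1000)

# Two splits with the same random_state select the same rows.
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(
    X_demo, y_demo, train_size=0.7, random_state=2021
)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, train_size=0.7, random_state=2021
)

print(X_tr1.shape, X_te1.shape)        # roughly 70% / 30% of the rows
print(np.array_equal(X_tr1, X_tr2))    # same seed, same split
```

Fixing `random_state` matters here because all three models below are compared on the same test rows.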
 
 

2. XGBoost¶

 
In [13]:
# %pip is the IPython magic for installing into the running kernel's environment.
# A bare `pip install` line mixed with Python code raises a SyntaxError.
%pip install xgboost
import xgboost as xgb


xgb_reg = xgb.XGBRegressor()
 
 

2.1 Training¶

 
In [ ]:
xgb_reg.fit(train_data, train_target)
 
 
[08:16:58] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Out[ ]:
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)
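The printed parameters show 100 trees of depth 3 combined with a learning rate of 0.1. As a rough illustration of what that means, here is a minimal sketch of plain gradient boosting for squared error, built from scikit-learn decision trees on synthetic data. It is not XGBoost's actual algorithm (which adds regularization and second-order information), and `fit_boosting` and `predict_boosting` are hypothetical helper names.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Fit shallow trees sequentially, each one on the current residuals."""
    base = float(np.mean(y))               # constant starting prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        residual = y - pred                # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        pred += learning_rate * tree.predict(X)   # small step toward the target
        trees.append(tree)
    return base, trees

def predict_boosting(X, base, trees, learning_rate=0.1):
    """Sum the base value and the shrunken contribution of every tree."""
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Synthetic regression problem (not the housing data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(500, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

base, trees = fit_boosting(X_demo, y_demo)
boosted_mse = np.mean((y_demo - predict_boosting(X_demo, base, trees)) ** 2)
baseline_mse = np.mean((y_demo - base) ** 2)
print(f"baseline MSE {baseline_mse:.4f} -> boosted MSE {boosted_mse:.4f}")
```

Each tree only corrects what the previous trees got wrong, which is why a small learning rate and many shallow trees work together.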
 
 

2.2 Prediction¶

 
In [ ]:
xgb_train_pred = xgb_reg.predict(train_data)
xgb_test_pred = xgb_reg.predict(test_data)
 
In [ ]:
plt.figure(figsize=(14, 7))

plt.subplot(121)
plt.scatter(xgb_train_pred, train_target)
plt.title("train data")
plt.xlabel("predict")
plt.ylabel("target")

plt.subplot(122)
plt.scatter(xgb_test_pred, test_target)
plt.title("test data")
plt.xlabel("predict")
plt.ylabel("target")
 
Out[ ]:
Text(0, 0.5, 'target')
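The same two-panel plot is repeated for every model in this post. One possible way to avoid the repetition is a small helper like the one below. The name `plot_pred_vs_target` and the red y = x guide line are additions for illustration, and the demo values are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_pred_vs_target(train_pred, train_target, test_pred, test_target):
    """Scatter predictions against targets for train and test, with a y = x guide."""
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 7))
    panels = [
        ("train data", train_pred, train_target),
        ("test data", test_pred, test_target),
    ]
    for ax, (title, pred, target) in zip(axes, panels):
        ax.scatter(pred, target, s=5)
        lo = min(np.min(pred), np.min(target))
        hi = max(np.max(pred), np.max(target))
        ax.plot([lo, hi], [lo, hi], color="red")   # perfect-prediction line
        ax.set_title(title)
        ax.set_xlabel("predict")
        ax.set_ylabel("target")
    return fig

# Synthetic values standing in for model output.
rng = np.random.default_rng(0)
demo_target = rng.uniform(0, 5, size=200)
demo_pred = demo_target + rng.normal(scale=0.3, size=200)
fig = plot_pred_vs_target(
    demo_pred[:140], demo_target[:140], demo_pred[140:], demo_target[140:]
)
```

With the notebook's variables, the call would look like `plot_pred_vs_target(xgb_train_pred, train_target, xgb_test_pred, test_target)`. Points close to the red line are well predicted.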
 
 
 

2.3 Evaluation¶

 
In [ ]:
from sklearn.metrics import mean_squared_error

xgb_train_mse = mean_squared_error(train_target, xgb_train_pred)
xgb_test_mse = mean_squared_error(test_target, xgb_test_pred)
 
In [ ]:
print(f"XGBoost Train MSE is {xgb_train_mse:.4f}")
print(f"XGBoost Test MSE is {xgb_test_mse:.4f}")
 
 
XGBoost Train MSE is 0.2598
XGBoost Test MSE is 0.2873
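For reference, `mean_squared_error` is the average of the squared differences between targets and predictions. The sketch below checks that by hand on four toy values, which are hypothetical and not taken from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy targets and predictions (hypothetical values).
y_true = np.array([2.0, 1.5, 3.2, 0.8])
y_pred = np.array([2.1, 1.2, 3.0, 1.0])

manual_mse = np.mean((y_true - y_pred) ** 2)     # average of squared errors
sklearn_mse = mean_squared_error(y_true, y_pred)

print(manual_mse, sklearn_mse)
```

Because the errors are squared, a few large misses raise the MSE much more than many small ones.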
 
 

3. LightGBM¶

 
In [ ]:
import lightgbm as lgb

lgb_reg = lgb.LGBMRegressor()
 
 

3.1 Training¶

 
In [ ]:
lgb_reg.fit(train_data, train_target)
 
Out[ ]:
LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
 
 

3.2 Prediction¶

 
In [ ]:
lgb_train_pred = lgb_reg.predict(train_data)
lgb_test_pred = lgb_reg.predict(test_data)
 
In [ ]:
plt.figure(figsize=(14, 7))

plt.subplot(121)
plt.scatter(lgb_train_pred, train_target)
plt.title("train data")
plt.xlabel("predict")
plt.ylabel("target")

plt.subplot(122)
plt.scatter(lgb_test_pred, test_target)
plt.title("test data")
plt.xlabel("predict")
plt.ylabel("target")
 
Out[ ]:
Text(0, 0.5, 'target')
 
 
 

3.3 Evaluation¶

 
In [ ]:
lgb_train_mse = mean_squared_error(train_target, lgb_train_pred)
lgb_test_mse = mean_squared_error(test_target, lgb_test_pred)
 
In [ ]:
print(f"Light Boost Train MSE is {lgb_train_mse:.4f}")
print(f"Light Boost Test MSE is {lgb_test_mse:.4f}")
 
 
Light Boost Train MSE is 0.1543
Light Boost Test MSE is 0.2098
 
 

4. CatBoost¶

 
In [ ]:
import catboost as cb


cb_reg = cb.CatBoostRegressor()
 
 

4.1 Training¶

 
In [ ]:
cb_reg.fit(train_data, train_target, verbose=False)
 
Out[ ]:
<catboost.core.CatBoostRegressor at 0x7f134f92c4d0>
 
 

4.2 Prediction¶

 
In [ ]:
cb_train_pred = cb_reg.predict(train_data)
cb_test_pred = cb_reg.predict(test_data)
 
In [ ]:
plt.figure(figsize=(14, 7))

plt.subplot(121)
plt.scatter(cb_train_pred, train_target)
plt.title("train data")
plt.xlabel("predict")
plt.ylabel("target")

plt.subplot(122)
plt.scatter(cb_test_pred, test_target)
plt.title("test data")
plt.xlabel("predict")
plt.ylabel("target")
 
Out[ ]:
Text(0, 0.5, 'target')
 
 
 

4.3 Evaluation¶

 
In [ ]:
cb_train_mse = mean_squared_error(train_target, cb_train_pred)
cb_test_mse = mean_squared_error(test_target, cb_test_pred)
 
In [ ]:
print(f"Cat Boost Train MSE is {cb_train_mse:.4f}")
print(f"Cat Boost Test MSE is {cb_test_mse:.4f}")
 
 
Cat Boost Train MSE is 0.1147
Cat Boost Test MSE is 0.1927
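With train and test MSE available for all three models, the gap between them gives a rough sense of how much each model fits the training data more closely than unseen data. The sketch below only re-uses the numbers printed in the evaluation sections. The dictionary `reported_mse` is an illustrative name.

```python
# (train MSE, test MSE) pairs as printed in the evaluation sections.
reported_mse = {
    "XGBoost": (0.2598, 0.2873),
    "LightGBM": (0.1543, 0.2098),
    "CatBoost": (0.1147, 0.1927),
}

# A larger gap means the model does noticeably better on data it has seen.
gaps = {name: test - train for name, (train, test) in reported_mse.items()}
for name, gap in gaps.items():
    print(f"{name}: test - train MSE gap = {gap:.4f}")
```

All three models were run with default settings, so this is a comparison of defaults rather than a tuned benchmark.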
 
 

5. Wrap-up¶

 
In [ ]:
print(f"XGBoost Test MSE is {xgb_test_mse:.4f}")
print(f"Light Boost Test MSE is {lgb_test_mse:.4f}")
print(f"Cat Boost Test MSE is {cb_test_mse:.4f}")
 
 
XGBoost Test MSE is 0.2873
Light Boost Test MSE is 0.2098
Cat Boost Test MSE is 0.1927
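MSE is in squared target units, which makes it hard to read directly. The target in this dataset is the median house value in units of $100,000, so taking the square root gives an error on the same scale as the prices. The sketch below converts the printed test MSE values. The name `test_rmse` is introduced here for illustration.

```python
import math

# Test MSE values printed above; the target is in units of $100,000.
test_mse = {"XGBoost": 0.2873, "LightGBM": 0.2098, "CatBoost": 0.1927}

# The square root brings the error back to the target's own units.
test_rmse = {name: math.sqrt(mse) for name, mse in test_mse.items()}

for name, rmse in test_rmse.items():
    print(f"{name}: RMSE = {rmse:.4f} (about ${rmse * 100_000:,.0f})")
```

Read this way, the models' typical test error ranges from roughly $44,000 to $54,000, with CatBoost lowest and XGBoost highest under default settings.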
 