Predicting Real Estate Prices¶

In [16]:

%pip install lightgbm

Collecting lightgbm
  Downloading lightgbm-4.1.0-py3-none-win_amd64.whl (1.3 MB)
     ---------------------------------------- 1.3/1.3 MB 3.1 MB/s eta 0:00:00
Requirement already satisfied: numpy in c:\users\sjy99\appdata\local\programs\python\python311\lib\site-packages (from lightgbm) (1.23.5)
Requirement already satisfied: scipy in c:\users\sjy99\appdata\local\programs\python\python311\lib\site-packages (from lightgbm) (1.9.3)
Installing collected packages: lightgbm
Successfully installed lightgbm-4.1.0
Note: you may need to restart the kernel to use updated packages.

[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: python.exe -m pip install --upgrade pip

In [4]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


np.random.seed(2021)

1. Data¶

The dataset used in this exercise is for predicting California house prices.

1.1 Data Load¶

The data is available through fetch_california_housing in sklearn.datasets.

In [5]:

from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()

In [6]:

data, target = housing["data"], housing["target"]

1.2 Data EDA¶

In [7]:

pd.DataFrame(data, columns=housing["feature_names"]).describe()

Out[7]:

	MedInc	HouseAge	AveRooms	AveBedrms	Population	AveOccup	Latitude	Longitude
count	20640.000000	20640.000000	20640.000000	20640.000000	20640.000000	20640.000000	20640.000000	20640.000000
mean	3.870671	28.639486	5.429000	1.096675	1425.476744	3.070655	35.631861	-119.569704
std	1.899822	12.585558	2.474173	0.473911	1132.462122	10.386050	2.135952	2.003532
min	0.499900	1.000000	0.846154	0.333333	3.000000	0.692308	32.540000	-124.350000
25%	2.563400	18.000000	4.440716	1.006079	787.000000	2.429741	33.930000	-121.800000
50%	3.534800	29.000000	5.229129	1.048780	1166.000000	2.818116	34.260000	-118.490000
75%	4.743250	37.000000	6.052381	1.099526	1725.000000	3.282261	37.710000	-118.010000
max	15.000100	52.000000	141.909091	34.066667	35682.000000	1243.333333	41.950000	-114.310000
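
The summary above hints at extreme values in a few columns: for AveOccup, the maximum (1243.33) is far above the 75th percentile (3.28). A minimal sketch, using only the quartiles printed in the table and the common 1.5 x IQR rule of thumb (a convention chosen here for illustration, not something this notebook applies), makes that gap concrete:

```python
# Quartiles and maxima copied by hand from the describe() output above.
summary = {
    "AveRooms": {"q1": 4.440716, "q3": 6.052381, "max": 141.909091},
    "AveBedrms": {"q1": 1.006079, "q3": 1.099526, "max": 34.066667},
    "Population": {"q1": 787.0, "q3": 1725.0, "max": 35682.0},
    "AveOccup": {"q1": 2.429741, "q3": 3.282261, "max": 1243.333333},
}


def upper_fence(q1, q3, k=1.5):
    """Upper outlier fence under the k * IQR rule of thumb."""
    return q3 + k * (q3 - q1)


fences = {name: upper_fence(s["q1"], s["q3"]) for name, s in summary.items()}

for name, s in summary.items():
    print(f"{name}: upper fence={fences[name]:.3f}, max={s['max']:.3f}")
```

In all four columns the maximum lies well beyond the fence, which is worth keeping in mind when reading the scatter plots that follow.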

In [8]:

pd.Series(target).describe()

Out[8]:

count    20640.000000
mean         2.068558
std          1.153956
min          0.149990
25%          1.196000
50%          1.797000
75%          2.647250
max          5.000010
dtype: float64

In [9]:

fig, axes = plt.subplots(nrows=2, ncols=4, figsize=(20, 10))
for i, feature_name in enumerate(housing["feature_names"]):
    ax = axes[i // 4, i % 4]
    ax.scatter(data[:, i], target)
    ax.set_xlabel(feature_name)
    ax.set_ylabel("price")

1.3 Data Split¶

In [10]:

from sklearn.model_selection import train_test_split

train_data, test_data, train_target, test_target = train_test_split(
    data, target, train_size=0.7, random_state=2021
)
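
As a rough illustration of what the split does, the sketch below shuffles row indices with a fixed seed and keeps the first 70% for training. The helper `shuffle_split` is hypothetical: it is not how `train_test_split` is implemented internally, and it will not pick the same rows as scikit-learn for the same seed.

```python
import numpy as np


def shuffle_split(X, y, train_fraction=0.7, seed=2021):
    """Hypothetical helper: shuffle the rows, then cut at train_fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # random row order
    n_train = round(len(X) * train_fraction)   # number of training rows
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy arrays stand in for the housing data: row i is [2i, 2i + 1], target is i.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

X_train, X_test, y_train, y_test = shuffle_split(X, y)
print(X_train.shape, X_test.shape)
```

The important property is that features and targets are indexed with the same shuffled order, so each row stays paired with its own target.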

2. XGBoost¶

In [13]:

# %pip is an IPython magic, so it can share a cell with Python code.
%pip install xgboost
import xgboost as xgb


xgb_reg = xgb.XGBRegressor()

2.1 Training¶

In [ ]:

xgb_reg.fit(train_data, train_target)

[08:16:58] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

Out[ ]:

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)

2.2 Prediction¶

In [ ]:

xgb_train_pred = xgb_reg.predict(train_data)
xgb_test_pred = xgb_reg.predict(test_data)

In [ ]:

plt.figure(figsize=(14, 7))

plt.subplot(121)
plt.scatter(xgb_train_pred, train_target)
plt.title("train data")
plt.xlabel("predict")
plt.ylabel("target")

plt.subplot(122)
plt.scatter(xgb_test_pred, test_target)
plt.title("test data")
plt.xlabel("predict")
plt.ylabel("target")

Out[ ]:

Text(0, 0.5, 'target')

2.3 Evaluation¶

In [ ]:

from sklearn.metrics import mean_squared_error

xgb_train_mse = mean_squared_error(train_target, xgb_train_pred)
xgb_test_mse = mean_squared_error(test_target, xgb_test_pred)

In [ ]:

print(f"XGBoost Train MSE is {xgb_train_mse:.4f}")
print(f"XGBoost Test MSE is {xgb_test_mse:.4f}")

XGBoost Train MSE is 0.2598
XGBoost Test MSE is 0.2873
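
For reference, MSE is the mean of the squared differences between targets and predictions. The sketch below computes it by hand on made-up toy numbers, then takes the square root of the XGBoost test MSE printed above to get an RMSE in the target's own units. The dollar conversion assumes, per the scikit-learn dataset description, that the target is the median house value in units of $100,000.

```python
import math


def mse(targets, predictions):
    """Mean of squared errors, written out explicitly."""
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared_errors) / len(squared_errors)


# Made-up toy values, only to show the arithmetic: (0.25 + 0 + 1) / 3.
toy_mse = mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])

# Test MSE copied from the XGBoost output above.
xgb_test_mse = 0.2873
xgb_test_rmse = math.sqrt(xgb_test_mse)

print(f"toy MSE: {toy_mse:.4f}")
print(f"XGBoost test RMSE: {xgb_test_rmse:.4f}")
print(f"in dollars (assuming $100,000 units): {xgb_test_rmse * 100_000:,.0f}")
```

RMSE is often easier to interpret than MSE because it is expressed in the same units as the target.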

3. Light GBM¶

In [ ]:

import lightgbm as lgb

lgb_reg = lgb.LGBMRegressor()

3.1 Training¶

In [ ]:

lgb_reg.fit(train_data, train_target)

Out[ ]:

LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

3.2 Prediction¶

In [ ]:

lgb_train_pred = lgb_reg.predict(train_data)
lgb_test_pred = lgb_reg.predict(test_data)

In [ ]:

plt.figure(figsize=(14, 7))

plt.subplot(121)
plt.scatter(lgb_train_pred, train_target)
plt.title("train data")
plt.xlabel("predict")
plt.ylabel("target")

plt.subplot(122)
plt.scatter(lgb_test_pred, test_target)
plt.title("test data")
plt.xlabel("predict")
plt.ylabel("target")

Out[ ]:

Text(0, 0.5, 'target')

3.3 Evaluation¶

In [ ]:

lgb_train_mse = mean_squared_error(train_target, lgb_train_pred)
lgb_test_mse = mean_squared_error(test_target, lgb_test_pred)

In [ ]:

print(f"Light Boost Train MSE is {lgb_train_mse:.4f}")
print(f"Light Boost Test MSE is {lgb_test_mse:.4f}")

Light Boost Train MSE is 0.1543
Light Boost Test MSE is 0.2098

4. CatBoost¶

In [ ]:

import catboost as cb

cb_reg = cb.CatBoostRegressor()

4.1 Training¶

In [ ]:

cb_reg.fit(train_data, train_target, verbose=False)

Out[ ]:

<catboost.core.CatBoostRegressor at 0x7f134f92c4d0>

4.2 Prediction¶

In [ ]:

cb_train_pred = cb_reg.predict(train_data)
cb_test_pred = cb_reg.predict(test_data)

In [ ]:

plt.figure(figsize=(14, 7))

plt.subplot(121)
plt.scatter(cb_train_pred, train_target)
plt.title("train data")
plt.xlabel("predict")
plt.ylabel("target")

plt.subplot(122)
plt.scatter(cb_test_pred, test_target)
plt.title("test data")
plt.xlabel("predict")
plt.ylabel("target")

Out[ ]:

Text(0, 0.5, 'target')

4.3 Evaluation¶

In [ ]:

cb_train_mse = mean_squared_error(train_target, cb_train_pred)
cb_test_mse = mean_squared_error(test_target, cb_test_pred)

In [ ]:

print(f"Cat Boost Train MSE is {cb_train_mse:.4f}")
print(f"Cat Boost Test MSE is {cb_test_mse:.4f}")

Cat Boost Train MSE is 0.1147
Cat Boost Test MSE is 0.1927

5. Wrap-up¶

In [ ]:

print(f"XGBoost Test MSE is {xgb_test_mse:.4f}")
print(f"Light Boost Test MSE is {lgb_test_mse:.4f}")
print(f"Cat Boost Test MSE is {cb_test_mse:.4f}")

XGBoost Test MSE is 0.2873
Light Boost Test MSE is 0.2098
Cat Boost Test MSE is 0.1927
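
To put the three scores side by side, the sketch below ranks the models by test MSE and reports each model's train/test gap. The dictionary is typed in by hand from the outputs printed in sections 2.3, 3.3, and 4.3. All three models were created with default settings and evaluated on a single split, so this compares defaults rather than tuned models.

```python
# Train/test MSE copied from the printed results in sections 2.3, 3.3, and 4.3.
results = {
    "XGBoost": {"train": 0.2598, "test": 0.2873},
    "LightGBM": {"train": 0.1543, "test": 0.2098},
    "CatBoost": {"train": 0.1147, "test": 0.1927},
}

# Lower test MSE is better, so sort ascending.
ranking = sorted(results, key=lambda name: results[name]["test"])
best = ranking[0]
baseline = results["XGBoost"]["test"]

for name in ranking:
    r = results[name]
    gap = r["test"] - r["train"]                    # a larger gap suggests more overfitting
    rel = (baseline - r["test"]) / baseline * 100   # percent below the XGBoost test MSE
    print(f"{name}: test={r['test']:.4f}, gap={gap:.4f}, vs XGBoost={rel:.1f}%")
```

In these results the model with the lowest test MSE also has the largest train/test gap, so a low training error on its own is not a reliable guide to test performance.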
