1. Introduction to Machine Learning
2. Data Preprocessing
In [1]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
df = sns.load_dataset('titanic')
- Inspect the first five rows
In [3]:
df.head()
Out[3]:
| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
| Variable | Definition | Key |
|---|---|---|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
- Summary statistics
In [4]:
df.describe()
Out[4]:
| | survived | pclass | age | sibsp | parch | fare |
|---|---|---|---|---|---|---|
| count | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
| mean | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
| 50% | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
| 75% | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
| max | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
- DataFrame info
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   survived     891 non-null    int64
 1   pclass       891 non-null    int64
 2   sex          891 non-null    object
 3   age          714 non-null    float64
 4   sibsp        891 non-null    int64
 5   parch        891 non-null    int64
 6   fare         891 non-null    float64
 7   embarked     889 non-null    object
 8   class        891 non-null    category
 9   who          891 non-null    object
 10  adult_male   891 non-null    bool
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object
 13  alive        891 non-null    object
 14  alone        891 non-null    bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.6+ KB
- Task: Use the first 800 rows as training data and the remaining rows as test data, then compare the results across models.
In [6]:
train_df = df[:800]
test_df = df[800:]
In [7]:
print(len(train_df))
print(len(test_df))
800
91
- Relationship between pclass and survived (related)
In [8]:
train_df[['pclass', 'survived']].groupby(['pclass'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[8]:
| | pclass | survived |
|---|---|---|
| 0 | 1 | 0.615385 |
| 1 | 2 | 0.481928 |
| 2 | 3 | 0.246014 |
- Relationship between sex and survived (related)
In [9]:
train_df[["sex", "survived"]].groupby(['sex'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[9]:
| | sex | survived |
|---|---|---|
| 0 | female | 0.745583 |
| 1 | male | 0.187621 |
- Relationship between parch and survived (weak relationship)
In [10]:
train_df[["parch", "survived"]].groupby(['parch'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[10]:
| | parch | survived |
|---|---|---|
| 2 | 2 | 0.527778 |
| 1 | 1 | 0.514851 |
| 3 | 3 | 0.500000 |
| 0 | 0 | 0.350163 |
| 5 | 5 | 0.250000 |
| 4 | 4 | 0.000000 |
| 6 | 6 | 0.000000 |
- Relationship between sibsp and survived (weak relationship)
In [11]:
train_df[["sibsp", "survived"]].groupby(['sibsp'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[11]:
| | sibsp | survived |
|---|---|---|
| 1 | 1 | 0.518325 |
| 2 | 2 | 0.481481 |
| 0 | 0 | 0.348708 |
| 3 | 3 | 0.266667 |
| 4 | 4 | 0.200000 |
| 5 | 5 | 0.000000 |
| 6 | 8 | 0.000000 |
- Relationship between age and survived
In [12]:
sns.histplot(data = train_df, x = 'age', bins = 20, hue = 'survived')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f517d9f6a50>
In [ ]:
a = sns.FacetGrid(train_df, col='survived')
a.map(plt.hist, 'age', bins=20)
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x7f46972b0f10>
- Survival by age within each pclass
In [13]:
a = sns.FacetGrid(train_df, col='survived', row='pclass')
a.map(plt.hist, 'age', bins=20)
Out[13]:
<seaborn.axisgrid.FacetGrid at 0x7f517cdea0d0>
- Drop unneeded columns
In [14]:
names = train_df.columns
print(names)
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
'alive', 'alone'],
dtype='object')
In [15]:
train_df = train_df.drop(names[4:], axis = 1)
In [16]:
train_df.head()
Out[16]:
| | survived | pclass | sex | age |
|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 |
| 1 | 1 | 1 | female | 38.0 |
| 2 | 1 | 3 | female | 26.0 |
| 3 | 1 | 1 | female | 35.0 |
| 4 | 0 | 3 | male | 35.0 |
In [17]:
test_df = test_df.drop(names[4:], axis = 1)
In [18]:
test_df.head()
Out[18]:
| | survived | pclass | sex | age |
|---|---|---|---|---|
| 800 | 0 | 2 | male | 34.00 |
| 801 | 1 | 2 | female | 31.00 |
| 802 | 1 | 1 | male | 11.00 |
| 803 | 1 | 3 | male | 0.42 |
| 804 | 1 | 3 | male | 27.00 |
- Check for missing values
In [19]:
print(train_df.isnull().sum())
print(test_df.isnull().sum())
survived      0
pclass        0
sex           0
age         163
dtype: int64
survived     0
pclass       0
sex          0
age         14
dtype: int64
- Fill missing age values with the mean age
* To fill with the mean age of each pclass instead, use the commented-out code below
In [ ]:
# train_df["age"] = train_df.groupby(['pclass']).age.transform(lambda x: x.fillna(x.mean()))
# test_df["age"] = test_df.groupby(['pclass']).age.transform(lambda x: x.fillna(x.mean()))
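The difference between the two fill strategies is easier to see on a small example. The sketch below uses a made-up six-row frame (the values are illustrative, not taken from the Titanic data) and compares the overall-mean fill with the per-pclass fill.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "pclass": [1, 1, 1, 3, 3, 3],
    "age": [40.0, 50.0, np.nan, 20.0, np.nan, 30.0],
})

# Option 1: fill every gap with the mean of all observed ages
overall = toy["age"].fillna(toy["age"].mean())

# Option 2: fill each gap with the mean age of that row's pclass
by_class = toy["age"].fillna(toy.groupby("pclass")["age"].transform("mean"))

print(overall.tolist())   # gaps become 35.0 (mean of 40, 50, 20, 30)
print(by_class.tolist())  # gaps become 45.0 for pclass 1 and 25.0 for pclass 3
```

Because age is later binned into coarse groups, the two strategies may or may not change the final features; that depends on where the group means fall.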
In [20]:
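# Fill missing ages with each split's own mean age.
# Taking the mean of the age column only avoids calling DataFrame.mean()
# on the string-valued sex column, which newer pandas versions reject.
train_df['age'] = train_df['age'].fillna(train_df['age'].mean())
test_df['age'] = test_df['age'].fillna(test_df['age'].mean())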
In [21]:
print(train_df.isnull().sum())
print(test_df.isnull().sum())
survived    0
pclass      0
sex         0
age         0
dtype: int64
survived    0
pclass      0
sex         0
age         0
dtype: int64
- Encode sex as an integer
In [22]:
map_dict = {'female' : 0, 'male' : 1}
train_df['sex'] = train_df['sex'].map(map_dict).astype(int)
test_df['sex'] = test_df['sex'].map(map_dict).astype(int)
In [23]:
train_df.head()
Out[23]:
| | survived | pclass | sex | age |
|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 22.0 |
| 1 | 1 | 1 | 0 | 38.0 |
| 2 | 1 | 3 | 0 | 26.0 |
| 3 | 1 | 1 | 0 | 35.0 |
| 4 | 0 | 3 | 1 | 35.0 |
- Bin age into groups
In [24]:
def function1(x):
    # Map an age in years to a group: 1 (<20), 2 (20-39), 3 (40-59), 4 (60+)
    if x < 20:
        return 1
    elif x < 40:
        return 2
    elif x < 60:
        return 3
    else:
        return 4
In [25]:
train_df['age'] = train_df['age'].apply(function1)
test_df['age'] = test_df['age'].apply(function1)
In [26]:
train_df.head()
Out[26]:
| | survived | pclass | sex | age |
|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 2 |
| 1 | 1 | 1 | 0 | 2 |
| 2 | 1 | 3 | 0 | 2 |
| 3 | 1 | 1 | 0 | 2 |
| 4 | 0 | 3 | 1 | 2 |
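The same binning can also be written with `pd.cut`. This is a sketch of an alternative, not the approach used in the notebook: `age_group` below copies the thresholds of `function1`, and the sample ages are made up. `right=False` makes each bin closed on the left, so an age of exactly 20 lands in group 2, matching the `x < 20` test.

```python
import numpy as np
import pandas as pd

def age_group(x):
    # Same thresholds as function1 in the text
    if x < 20:
        return 1
    elif x < 40:
        return 2
    elif x < 60:
        return 3
    else:
        return 4

# Sample ages chosen to hit every boundary (illustrative values)
ages = pd.Series([0.42, 19.0, 20.0, 39.9, 40.0, 59.0, 60.0, 80.0])

by_apply = ages.apply(age_group)

# Bins are [-inf, 20), [20, 40), [40, 60), [60, inf)
by_cut = pd.cut(
    ages,
    bins=[-np.inf, 20, 40, 60, np.inf],
    right=False,
    labels=[1, 2, 3, 4],
).astype(int)

print(by_apply.tolist())
print(by_cut.tolist())
```

One difference to keep in mind: a missing age falls through to group 4 in the `if/elif` version, while `pd.cut` leaves it missing. That does not matter here because the gaps were filled beforehand.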
3. Building a Machine Learning Model and Validating the Results
In [27]:
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
- Split the data into features and labels
In [28]:
X_train = train_df.drop(["survived"], axis=1)
Y_train = train_df["survived"]
X_test = test_df.drop("survived", axis=1)
Y_test = test_df["survived"]
In [29]:
X_train.head()
Out[29]:
| | pclass | sex | age |
|---|---|---|---|
| 0 | 3 | 1 | 2 |
| 1 | 1 | 0 | 2 |
| 2 | 3 | 0 | 2 |
| 3 | 1 | 0 | 2 |
| 4 | 3 | 1 | 2 |
In [30]:
Y_train.head()
Out[30]:
0    0
1    1
2    1
3    1
4    0
Name: survived, dtype: int64
- Create and train the model (decision tree)
In [31]:
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
Out[31]:
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=None, splitter='best')
- Check the model accuracy
In [32]:
print(decision_tree.score(X_train, Y_train))
print(decision_tree.score(X_test, Y_test))
0.8
0.7692307692307693
- Compare actual and predicted values by hand
In [33]:
Y_pred = decision_tree.predict(X_test)
print(Y_pred)
[0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0]
In [34]:
Y_test
Out[34]:
800 0
801 1
802 1
803 1
804 1
..
886 0
887 1
888 0
889 1
890 0
Name: survived, Length: 91, dtype: int64
In [35]:
len(Y_pred)
Out[35]:
91
In [36]:
len(Y_test)
Out[36]:
91
In [37]:
Y_test_list = list(Y_test)
In [38]:
Y_pred[0]
Out[38]:
0
In [39]:
Y_test_list[0]
Out[39]:
0
In [40]:
total = 0
for i in range(len(Y_pred)):
    # Count the positions where the prediction matches the actual label
    if Y_pred[i] == Y_test_list[i]:
        total += 1
print(total)
print(total / len(Y_pred))
70
0.7692307692307693
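The counting loop computes plain accuracy: matching predictions divided by the number of samples. As a sketch on made-up labels (not the notebook's actual predictions), the same number can be obtained with a vectorized comparison or with `sklearn.metrics.accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration
y_true = np.array([0, 1, 1, 1, 0, 0, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Element-wise comparison gives a boolean array; True counts as 1
matches = int((y_true == y_pred).sum())
manual = matches / len(y_true)

# Library equivalent
library = accuracy_score(y_true, y_pred)

print(matches, manual, library)
```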
- Visualize the tree structure with graphviz
In [41]:
from sklearn.tree import export_graphviz
import graphviz

# Write the fitted tree to a Graphviz .dot file
export_graphviz(
    decision_tree,
    out_file="titanic.dot",
    feature_names=['pclass', 'sex', 'age'],
    class_names=['Unsurvived', 'Survived'],
    filled=True
)

# Read the .dot file back; the with-block closes the file automatically
with open("titanic.dot") as f:
    dot_graph = f.read()

# Render the tree to titanic_tree.png and display it inline
dot = graphviz.Source(dot_graph)
dot.format = 'png'
dot.render(filename='titanic_tree')
dot
Out[41]:
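If graphviz is not installed, scikit-learn's `export_text` prints the same split rules as plain text. The sketch below is an alternative to the rendering above, shown on a tiny invented dataset in which survival depends only on sex, so the tree needs a single split.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny made-up dataset: columns are pclass, sex, age group
X = [[1, 0, 2], [1, 1, 2], [3, 0, 2], [3, 1, 2], [2, 0, 1], [2, 1, 3]]
y = [1, 0, 1, 0, 1, 0]  # in this toy data, sex == 0 always survives

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Text rendering of the split rules, one line per node
report = export_text(tree, feature_names=["pclass", "sex", "age"])
print(report)
```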
4. Various Machine Learning Techniques
- Prepare the data
In [42]:
import seaborn as sns
import pandas as pd
df = sns.load_dataset('titanic')
train_df = df[:800]
test_df = df[800:]
names = train_df.columns
train_df = train_df.drop(names[4:], axis = 1)
test_df = test_df.drop(names[4:], axis = 1)
# Fill missing ages with each split's own mean age (age column only)
train_df['age'] = train_df['age'].fillna(train_df['age'].mean())
test_df['age'] = test_df['age'].fillna(test_df['age'].mean())
map_dict = {'female' : 0, 'male' : 1}
train_df['sex'] = train_df['sex'].map(map_dict).astype(int)
test_df['sex'] = test_df['sex'].map(map_dict).astype(int)
def function1(x):
    # Map an age in years to a group: 1 (<20), 2 (20-39), 3 (40-59), 4 (60+)
    if x < 20:
        return 1
    elif x < 40:
        return 2
    elif x < 60:
        return 3
    else:
        return 4
train_df['age'] = train_df['age'].apply(function1)
test_df['age'] = test_df['age'].apply(function1)
X_train = train_df.drop(["survived"], axis=1)
Y_train = train_df["survived"]
X_test = test_df.drop("survived", axis=1)
Y_test = test_df["survived"]
- Decision tree
In [43]:
from sklearn.tree import DecisionTreeClassifier
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
print(decision_tree.score(X_train, Y_train))
print(decision_tree.score(X_test, Y_test))
0.8
0.7692307692307693
- Bagging (random forest)
In [44]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)
print(random_forest.score(X_train, Y_train))
print(random_forest.score(X_test, Y_test))
0.8
0.7912087912087912
- Boosting (xgboost)
In [45]:
import xgboost as xgb
boosting_model = xgb.XGBClassifier(n_estimators = 100)
boosting_model.fit(X_train, Y_train)
print(boosting_model.score(X_train, Y_train))
print(boosting_model.score(X_test, Y_test))
0.79875
0.7802197802197802
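The three cells above repeat the same fit-and-score pattern. A minimal sketch of running that comparison in one loop follows. To stay self-contained it uses synthetic data from `make_classification` rather than the Titanic frames, and it leaves xgboost out so that only scikit-learn is required; both choices are assumptions made for illustration, and the scores it prints are unrelated to the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the Titanic dataset)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test = X[:250], X[250:]
y_train, y_test = y[:250], y[250:]

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record (train accuracy, test accuracy)
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

Fixing `random_state` makes repeated runs comparable; the notebook's models are created without it, so their scores can vary slightly between runs.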
5. Introduction to Deep Learning
6. Matrix Operations with numpy
- Create a matrix
In [46]:
import numpy as np
In [47]:
a = np.array([[1, 2], [3, 4]])
- Transpose
In [48]:
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.T)
[[1 4]
 [2 5]
 [3 6]]
- Check the matrix shape
In [49]:
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)
(2, 3)
- Reshape a matrix
In [50]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.reshape(a, (3, 2))
print(b.shape)
print(b)
(3, 2)
[[1 2]
 [3 4]
 [5 6]]
- Element-wise arithmetic between arrays
In [51]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 3, 4], [5, 6, 7]])
print(a + b)
print(a + 3)
[[ 3  5  7]
 [ 9 11 13]]
[[4 5 6]
 [7 8 9]]
In [52]:
print(a - b)
print(a - 2)
[[-1 -1 -1]
 [-1 -1 -1]]
[[-1  0  1]
 [ 2  3  4]]
In [53]:
print(a * b)
print(a * 2)
[[ 2  6 12]
 [20 30 42]]
[[ 2  4  6]
 [ 8 10 12]]
In [54]:
print(a / b)
print(a / 2)
[[0.5        0.66666667 0.75      ]
 [0.8        0.83333333 0.85714286]]
[[0.5 1.  1.5]
 [2.  2.5 3. ]]
- Element-wise operations fail when the shapes differ and cannot be broadcast
In [55]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 3], [5, 6]])
print(a + b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-f6a6202d5151> in <module>()
      2 b = np.array([[2, 3], [5, 6]])
      3
----> 4 print(a + b)

ValueError: operands could not be broadcast together with shapes (2,3) (2,2)
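Different shapes are not always an error. NumPy broadcasts two arrays when, comparing dimensions from the right, each pair is either equal or one of them is 1. The sketch below shows two shapes that work with a (2, 3) array and the (2, 2) case that fails; the array values are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,): added to every row
col = np.array([[100], [200]])        # shape (2, 1): added to every column

print(a + row)
print(a + col)

# Shapes (2, 3) and (2, 2): the trailing dimensions 3 and 2 differ
# and neither is 1, so broadcasting is not possible
try:
    a + np.array([[2, 3], [5, 6]])
    failed = False
except ValueError:
    failed = True
print(failed)
```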
- Matrix multiplication
In [56]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.dot(a, b))
32
In [57]:
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])
print(np.dot(a, b))
[[22 28]
 [49 64]]
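For 2-D arrays the `@` operator gives the same result as `np.dot`. Each output entry is the sum of products of one row of `a` with one column of `b`, so the inner dimensions must match: (2, 3) times (3, 2) gives (2, 2). A short sketch that checks one entry by hand:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])     # shape (3, 2)

c = a @ b  # same result as np.dot(a, b) for 2-D arrays

# Entry [0, 0] by hand: 1*1 + 2*3 + 3*5 = 22
manual_00 = sum(a[0, k] * b[k, 0] for k in range(3))

print(c)
print(manual_00)
print((b @ a).shape)  # reversing the order gives a (3, 3) result
```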
7. Building a Deep Learning Model and Validating the Results
- Menu: Runtime -> Change runtime type
In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
In [2]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
In [3]:
print(len(x_train), len(y_train))
print(len(x_test), len(y_test))
60000 60000
10000 10000
In [4]:
x_train[0]
Out[4]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,
18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170,
253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253, 253,
253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 18, 219, 253, 253, 253, 253,
253, 198, 182, 247, 241, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 80, 156, 107, 253, 253,
205, 11, 0, 43, 154, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1, 154, 253,
90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253,
190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190,
253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35,
241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39,
148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221,
253, 253, 253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 23, 66, 213, 253, 253,
253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253,
195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133,
11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0]], dtype=uint8)
In [5]:
x_train[0].shape
Out[5]:
(28, 28)
- Normalization
In [6]:
x_train, x_test = x_train / 255.0, x_test / 255.0
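Dividing by 255.0 rescales the 8-bit pixel values from the range 0 to 255 into 0 to 1 and converts the array from `uint8` to floating point. A small sketch on a made-up three-pixel array:

```python
import numpy as np

# Three illustrative pixel values stored as 8-bit integers
pixels = np.array([[0, 51, 255]], dtype=np.uint8)

scaled = pixels / 255.0  # true division promotes uint8 to float64

print(pixels.dtype, scaled.dtype)
print(scaled)
```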
- Plot an image
In [7]:
plt.imshow(x_test[0])
Out[7]:
<matplotlib.image.AxesImage at 0x7fa7ccf4b490>
In [8]:
y_test[0]
Out[8]:
7
- Define the model
In [9]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
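The `sparse_categorical_crossentropy` loss takes integer labels such as 7 rather than one-hot vectors. For a single sample it is the negative log of the probability the model assigns to the true class. The numpy sketch below illustrates that formula on a made-up three-class output; it is not Keras code, and Keras additionally averages the loss over the batch.

```python
import numpy as np

# Made-up softmax output for a 3-class example
probs = np.array([0.1, 0.7, 0.2])
label = 1  # integer class label, not one-hot

# Sparse form: pick the probability of the true class directly
loss = -np.log(probs[label])

# One-hot form gives the same value
one_hot = np.eye(3)[label]
loss_one_hot = -np.sum(one_hot * np.log(probs))

print(loss, loss_one_hot)
```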
- Model summary
In [10]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
dense (Dense)                (None, 128)               100480
_________________________________________________________________
dropout (Dropout)            (None, 128)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
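The parameter counts in the summary can be checked by hand. A dense layer has one weight per input-output pair plus one bias per output, while Flatten and Dropout have no trainable parameters. The helper name `dense_params` below is made up for this sketch.

```python
def dense_params(n_in, n_out):
    # Weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

flat = 28 * 28                    # Flatten turns a 28x28 image into 784 values
hidden = dense_params(flat, 128)  # first Dense layer
output = dense_params(128, 10)    # final Dense layer
total = hidden + output

print(flat, hidden, output, total)
```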
- Train and evaluate the model
In [11]:
model.fit(x_train, y_train, epochs = 5)
Epoch 1/5
1875/1875 [==============================] - 6s 2ms/step - loss: 0.2972 - accuracy: 0.9134
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1450 - accuracy: 0.9571
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1076 - accuracy: 0.9671
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0879 - accuracy: 0.9725
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0728 - accuracy: 0.9777
Out[11]:
<tensorflow.python.keras.callbacks.History at 0x7fa7cdbbd550>
In [15]:
model.evaluate(x_test, y_test, verbose = 2)
313/313 - 0s - loss: 0.0764 - accuracy: 0.9766
Out[15]:
[0.07641138881444931, 0.9765999913215637]
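`model.evaluate` returns the loss followed by the accuracy. To turn the model's output for one image into a predicted digit, take the index of the largest softmax probability. The numpy sketch below shows that step on made-up scores for a three-class case; with the real model the input would be one row of `model.predict(x_test)`, which already holds probabilities.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Made-up raw scores for a 3-class example
logits = np.array([1.0, 3.0, 0.5])

probs = softmax(logits)
predicted = int(np.argmax(probs))  # index of the most likely class

print(probs, predicted)
```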
8. Limitations of Machine Learning / Deep Learning