Python Review (4) - A Taste of Machine Learning and Deep Learning

2024. 3. 15. 12:22·Python
05_Going_One_Step_Further

1. Introduction to Machine Learning


2. Data Preprocessing

In [1]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
df = sns.load_dataset('titanic')

- Check the first 5 rows

In [3]:
df.head()
Out[3]:
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
Variable   Definition                                    Key
survival   Survival                                      0 = No, 1 = Yes
pclass     Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
sex        Sex
Age        Age in years
sibsp      # of siblings / spouses aboard the Titanic
parch      # of parents / children aboard the Titanic
ticket     Ticket number
fare       Passenger fare
embarked   Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton

- Summary statistics

In [4]:
df.describe()
Out[4]:
survived pclass age sibsp parch fare
count 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000
mean 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208
std 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429
min 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000
25% 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400
50% 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200
75% 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000
max 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200

- DataFrame info

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.6+ KB

- Task: use the first 800 rows as training data and the remaining rows as test data, then compare the results across models.

In [6]:
# .copy() gives independent frames rather than slices tied to df
train_df = df[:800].copy()
test_df = df[800:].copy()
In [7]:
print(len(train_df))
print(len(test_df))
800
91

- Relationship between pclass and survived (clearly related)

In [8]:
train_df[['pclass', 'survived']].groupby(['pclass'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[8]:
pclass survived
0 1 0.615385
1 2 0.481928
2 3 0.246014
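
The survival rate per group is just the mean of the 0/1 `survived` column within that group. A minimal sketch on a made-up toy frame (the six rows are illustrative values, not Titanic data) shows what the `groupby(...).mean().sort_values(...)` chain returns:

```python
import pandas as pd

# Toy data: illustrative values only, not taken from the Titanic dataset
toy = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3],
    "survived": [1, 1, 1, 0, 0, 0],
})

# Mean of a 0/1 column within each group = survival rate of that group
rate = (
    toy[["pclass", "survived"]]
    .groupby("pclass", as_index=False)
    .mean()
    .sort_values(by="survived", ascending=False)
)
print(rate)
```

The same pattern applies to the sex, parch, and sibsp comparisons in this section.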

- Relationship between sex and survived (clearly related)

In [9]:
train_df[["sex", "survived"]].groupby(['sex'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[9]:
sex survived
0 female 0.745583
1 male 0.187621

- Relationship between parch and survived (weak relationship)

In [10]:
train_df[["parch", "survived"]].groupby(['parch'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[10]:
parch survived
2 2 0.527778
1 1 0.514851
3 3 0.500000
0 0 0.350163
5 5 0.250000
4 4 0.000000
6 6 0.000000

- Relationship between sibsp and survived (weak relationship)

In [11]:
train_df[["sibsp", "survived"]].groupby(['sibsp'], as_index=False).mean().sort_values(by='survived', ascending=False)
Out[11]:
sibsp survived
1 1 0.518325
2 2 0.481481
0 0 0.348708
3 3 0.266667
4 4 0.200000
5 5 0.000000
6 8 0.000000

- Relationship between age and survived

In [12]:
sns.histplot(data = train_df, x = 'age', bins = 20, hue = 'survived')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f517d9f6a50>
In [ ]:
a = sns.FacetGrid(train_df, col='survived')
a.map(plt.hist, 'age', bins=20)
Out[ ]:
<seaborn.axisgrid.FacetGrid at 0x7f46972b0f10>

- Survival by age within each pclass

In [13]:
a = sns.FacetGrid(train_df, col='survived', row='pclass')
a.map(plt.hist, 'age', bins=20)
Out[13]:
<seaborn.axisgrid.FacetGrid at 0x7f517cdea0d0>

- Drop unneeded columns

In [14]:
names = train_df.columns
print(names)
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')
In [15]:
train_df = train_df.drop(names[4:], axis = 1)
In [16]:
train_df.head()
Out[16]:
survived pclass sex age
0 0 3 male 22.0
1 1 1 female 38.0
2 1 3 female 26.0
3 1 1 female 35.0
4 0 3 male 35.0
In [17]:
test_df = test_df.drop(names[4:], axis = 1)
In [18]:
test_df.head()
Out[18]:
survived pclass sex age
800 0 2 male 34.00
801 1 2 female 31.00
802 1 1 male 11.00
803 1 3 male 0.42
804 1 3 male 27.00

- Check for missing values

In [19]:
print(train_df.isnull().sum())
print(test_df.isnull().sum())
survived      0
pclass        0
sex           0
age         163
dtype: int64
survived     0
pclass       0
sex          0
age         14
dtype: int64

- Fill missing age values with the mean age

* To fill with the mean age of each pclass instead, use the commented-out code below

In [ ]:
# train_df["age"] = train_df.groupby(['pclass']).age.transform(lambda x: x.fillna(x.mean()))
# test_df["age"] = test_df.groupby(['pclass']).age.transform(lambda x: x.fillna(x.mean()))
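
To see what the per-class fill does, here is a small sketch on a toy frame (the values are made up for illustration). Each missing age is replaced by the mean age of the rows that share its pclass:

```python
import numpy as np
import pandas as pd

# Toy data: illustrative values only
toy = pd.DataFrame({
    "pclass": [1, 1, 1, 3, 3, 3],
    "age":    [40.0, 50.0, np.nan, 20.0, np.nan, 30.0],
})

# Within each pclass group, replace NaN with that group's mean age
toy["age"] = toy.groupby("pclass")["age"].transform(lambda s: s.fillna(s.mean()))
print(toy)
```

The pclass 1 gap becomes 45.0 (mean of 40 and 50) and the pclass 3 gap becomes 25.0 (mean of 20 and 30).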
In [20]:
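# Fill only the age column; DataFrame.mean() over all columns fails on the string column sex in pandas 2.x
train_df['age'] = train_df['age'].fillna(train_df['age'].mean())
test_df['age'] = test_df['age'].fillna(test_df['age'].mean())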
In [21]:
print(train_df.isnull().sum())
print(test_df.isnull().sum())
survived    0
pclass      0
sex         0
age         0
dtype: int64
survived    0
pclass      0
sex         0
age         0
dtype: int64

- Encode sex as a number

In [22]:
map_dict = {'female' : 0, 'male' : 1}

train_df['sex'] = train_df['sex'].map(map_dict).astype(int)
test_df['sex'] = test_df['sex'].map(map_dict).astype(int)
In [23]:
train_df.head()
Out[23]:
survived pclass sex age
0 0 3 1 22.0
1 1 1 0 38.0
2 1 3 0 26.0
3 1 1 0 35.0
4 0 3 1 35.0

- Bin age into groups

In [24]:
def function1(x):
    if x < 20:
        return 1
    elif x < 40:
        return 2
    elif x < 60:
        return 3
    else:
        return 4
In [25]:
train_df['age'] = train_df['age'].apply(function1)
test_df['age'] = test_df['age'].apply(function1)
In [26]:
train_df.head()
Out[26]:
survived pclass sex age
0 0 3 1 2
1 1 1 0 2
2 1 3 0 2
3 1 1 0 2
4 0 3 1 2
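
As an alternative sketch (not what the notebook itself uses), the same binning can probably be expressed with `pd.cut`. The sample ages below are made up to hit every bin edge, and `function1` is repeated only so the two results can be compared:

```python
import numpy as np
import pandas as pd

def function1(x):
    if x < 20:
        return 1
    elif x < 40:
        return 2
    elif x < 60:
        return 3
    else:
        return 4

# Illustrative ages chosen to sit on and around the bin edges
ages = pd.Series([0.42, 19.9, 20.0, 39.0, 40.0, 59.9, 60.0, 80.0])

# right=False makes each bin closed on the left: [-inf, 20), [20, 40), [40, 60), [60, inf)
binned = pd.cut(
    ages,
    bins=[-np.inf, 20, 40, 60, np.inf],
    labels=[1, 2, 3, 4],
    right=False,
).astype(int)

print(binned.tolist())
print(ages.apply(function1).tolist())
```

Both lines print the same list, so either form can be used; `pd.cut` keeps the bin edges in one place.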

3. Building a Machine Learning Model and Validating the Results

In [27]:
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

- Split the data into features and labels

In [28]:
X_train = train_df.drop(["survived"], axis=1)
Y_train = train_df["survived"]
X_test  = test_df.drop("survived", axis=1)
Y_test = test_df["survived"]
In [29]:
X_train.head()
Out[29]:
pclass sex age
0 3 1 2
1 1 0 2
2 3 0 2
3 1 0 2
4 3 1 2
In [30]:
Y_train.head()
Out[30]:
0    0
1    1
2    1
3    1
4    0
Name: survived, dtype: int64

- Create and train the model (using a decision tree)

In [31]:
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
Out[31]:
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

- Check the model accuracy

In [32]:
print(decision_tree.score(X_train, Y_train))
print(decision_tree.score(X_test, Y_test))
0.8
0.7692307692307693

- Implement the comparison of actual and predicted values by hand

In [33]:
Y_pred = decision_tree.predict(X_test)
print(Y_pred)
[0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0
 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 1 0 0
 1 1 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0]
In [34]:
Y_test
Out[34]:
800    0
801    1
802    1
803    1
804    1
      ..
886    0
887    1
888    0
889    1
890    0
Name: survived, Length: 91, dtype: int64
In [35]:
len(Y_pred)
Out[35]:
91
In [36]:
len(Y_test)
Out[36]:
91
In [37]:
Y_test_list = list(Y_test)
In [38]:
Y_pred[0]
Out[38]:
0
In [39]:
Y_test_list[0]
Out[39]:
0
In [40]:
total = 0
for i in range(len(Y_pred)):
    if Y_pred[i] == Y_test_list[i]:
        total += 1
    else:
        pass
print(total)
print(total / len(Y_pred))
70
0.7692307692307693
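
The counting loop above computes plain accuracy, which is also what `score` reports for a classifier. A shorter sketch of the same calculation with NumPy, using made-up label arrays rather than the notebook's predictions:

```python
import numpy as np

# Toy labels: illustrative values only
y_true = np.array([0, 1, 1, 1, 0, 0, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Element-wise comparison gives a boolean array; its mean is the accuracy
correct = y_pred == y_true
print(correct.sum())    # number of correct predictions
print(correct.mean())   # accuracy = correct / total
```

With the notebook's variables this would be `(Y_pred == Y_test.to_numpy()).mean()`; `sklearn.metrics.accuracy_score` gives the same number.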

- Visualize the tree structure with graphviz

In [41]:
from sklearn.tree import export_graphviz

export_graphviz(
        decision_tree,
        out_file = "titanic.dot",
        feature_names = ['pclass', 'sex', 'age'],
        class_names = ['Unsurvived','Survived'],
        filled=True
    )

import graphviz
# Read the .dot file inside a with block so the file handle is closed automatically
with open("titanic.dot") as f:
    dot_graph = f.read()
dot = graphviz.Source(dot_graph)
dot.format = 'png'
dot.render(filename = 'titanic_tree')
dot
Out[41]:
[Figure: rendered decision tree (titanic_tree.png), 45 nodes. Root node: sex <= 0.5, gini = 0.474, samples = 800, value = [492, 308], class = Unsurvived. True branch: pclass <= 2.5, gini = 0.379, samples = 283, value = [72, 211], class = Survived. False branch: pclass <= 1.5, gini = 0.305, samples = 517, value = [420, 97], class = Unsurvived.]
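
The `gini` value shown in each node is the Gini impurity, 1 minus the sum of squared class proportions. A small sketch (the helper name `gini` is my own) reproduces the numbers from the `value` counts in the tree above:

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

root = round(gini([492, 308]), 3)        # root node, value = [492, 308]
true_branch = round(gini([72, 211]), 3)  # True branch of the root, value = [72, 211]
pure = round(gini([9, 0]), 3)            # a node that holds a single class

print(root, true_branch, pure)
```

A value near 0.5 means the two classes are almost evenly mixed, and 0.0 means the node contains only one class.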

4. Various Machine Learning Techniques

- Prepare the data

In [42]:
import seaborn as sns
import pandas as pd

df = sns.load_dataset('titanic')

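train_df = df[:800].copy()
test_df = df[800:].copy()

names = train_df.columns
train_df = train_df.drop(names[4:], axis = 1)
test_df = test_df.drop(names[4:], axis = 1)

# Fill only the age column (DataFrame.mean() over all columns fails on the string column sex in pandas 2.x)
train_df['age'] = train_df['age'].fillna(train_df['age'].mean())
test_df['age'] = test_df['age'].fillna(test_df['age'].mean())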

map_dict = {'female' : 0, 'male' : 1}

train_df['sex'] = train_df['sex'].map(map_dict).astype(int)
test_df['sex'] = test_df['sex'].map(map_dict).astype(int)

def function1(x):
    if x < 20:
        return 1
    elif x < 40:
        return 2
    elif x < 60:
        return 3
    else:
        return 4

train_df['age'] = train_df['age'].apply(function1)
test_df['age'] = test_df['age'].apply(function1)

X_train = train_df.drop(["survived"], axis=1)
Y_train = train_df["survived"]
X_test  = test_df.drop("survived", axis=1)
Y_test = test_df["survived"]

- Decision tree

In [43]:
from sklearn.tree import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)

print(decision_tree.score(X_train, Y_train))
print(decision_tree.score(X_test, Y_test))
0.8
0.7692307692307693

- Bagging (random forest)

In [44]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)
print(random_forest.score(X_train, Y_train))
print(random_forest.score(X_test, Y_test))
0.8
0.7912087912087912
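
The two ingredients of bagging are bootstrap sampling (each model trains on rows drawn with replacement) and combining the models' outputs. The sketch below shows only that general idea with NumPy; the vote matrix is made up, and scikit-learn's `RandomForestClassifier` has its own implementation details that are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models = 8, 5

# Bootstrap: each model gets n_samples row indices drawn with replacement
bootstrap_indices = rng.integers(0, n_samples, size=(n_models, n_samples))
print(bootstrap_indices)

# Hypothetical 0/1 predictions from the 5 models for 4 test rows
votes = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
])

# Majority vote: predict 1 when more than half of the models say 1
majority = (votes.mean(axis=0) > 0.5).astype(int)
print(majority)
```

Because each model sees a slightly different sample, their individual mistakes tend to differ, and the vote smooths them out.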

- Boosting (xgboost)

In [45]:
import xgboost as xgb
boosting_model = xgb.XGBClassifier(n_estimators = 100)
boosting_model.fit(X_train, Y_train)
print(boosting_model.score(X_train, Y_train))
print(boosting_model.score(X_test, Y_test))
0.79875
0.7802197802197802
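
Boosting builds models one after another, each one fitted to the errors left by the models before it. The sketch below is a simplified gradient-boosting loop for regression with one-split stumps, written with NumPy on made-up data. It is not XGBoost's algorithm, and the helper names `fit_stump` and `predict_stump` are my own:

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold on 1-D x that best fits the residual (least squares)."""
    best = None
    for t in np.unique(x)[:-1]:          # skip the max so the right side is never empty
        left = residual[x <= t].mean()
        right = residual[x > t].mean()
        err = np.sum((residual - np.where(x <= t, left, right)) ** 2)
        if best is None or err < best[0]:
            best = (err, t, left, right)
    return best[1:]

def predict_stump(stump, x):
    t, left, right = stump
    return np.where(x <= t, left, right)

# Toy 1-D regression data: illustrative values only
x = np.arange(10, dtype=float)
y = np.array([1., 1., 1., 3., 3., 3., 3., 8., 8., 8.])

learning_rate = 0.5
pred = np.full_like(y, y.mean())         # start from the mean prediction
errors = []
for _ in range(20):
    residual = y - pred                  # what the current ensemble still gets wrong
    stump = fit_stump(x, residual)
    pred = pred + learning_rate * predict_stump(stump, x)
    errors.append(np.mean((y - pred) ** 2))

print(errors[0], errors[-1])
```

The mean squared error shrinks with each round, which is the behavior that `n_estimators` controls in the boosting model above.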

5. Introduction to Deep Learning


6. Matrix Operations with numpy

- Create a matrix

In [46]:
import numpy as np
In [47]:
a = np.array([[1, 2], [3, 4]])

- Transpose

In [48]:
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.T)
[[1 4]
 [2 5]
 [3 6]]

- Check the matrix shape

In [49]:
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)
(2, 3)

- Reshape a matrix

In [50]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.reshape(a, (3, 2))

print(b.shape)
print(b)
(3, 2)
[[1 2]
 [3 4]
 [5 6]]

- Arithmetic between arrays (element-wise)

In [51]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 3, 4], [5, 6, 7]])

print(a + b)
print(a + 3)
[[ 3  5  7]
 [ 9 11 13]]
[[4 5 6]
 [7 8 9]]
In [52]:
print(a - b)
print(a - 2)
[[-1 -1 -1]
 [-1 -1 -1]]
[[-1  0  1]
 [ 2  3  4]]
In [53]:
print(a * b)
print(a * 2)
[[ 2  6 12]
 [20 30 42]]
[[ 2  4  6]
 [ 8 10 12]]
In [54]:
print(a / b)
print(a / 2)
[[0.5        0.66666667 0.75      ]
 [0.8        0.83333333 0.85714286]]
[[0.5 1.  1.5]
 [2.  2.5 3. ]]

- Arithmetic fails when the shapes differ and cannot be broadcast

In [55]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[2, 3], [5, 6]])

print(a + b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-f6a6202d5151> in <module>()
      2 b = np.array([[2, 3], [5, 6]])
      3 
----> 4 print(a + b)

ValueError: operands could not be broadcast together with shapes (2,3) (2,2) 
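
Different shapes are not always an error. NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A short sketch with made-up arrays (`row` and `col` are illustrative names):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

row = np.array([10, 20, 30])           # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])         # shape (2, 1)

print(a + row)   # row is added to every row of a
print(a + col)   # col is added to every column of a

# (2, 3) and (2, 2): the last dimensions are 3 and 2, unequal and neither is 1
error = None
try:
    a + np.array([[2, 3], [5, 6]])
except ValueError as e:
    error = str(e)
print("ValueError:", error)
```

This is also why `a + 3` worked earlier: a scalar broadcasts against any shape.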

- Matrix multiplication

In [56]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.dot(a, b))
32


In [57]:
a = np.array([[1, 2, 3], 
              [4, 5, 6]])
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(np.dot(a, b))
[[22 28]
 [49 64]]
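
For 2-D arrays, entry (i, j) of the product is the dot product of row i of the first matrix and column j of the second, so the inner dimensions must match: (2, 3) times (3, 2) gives (2, 2). A small sketch that checks one entry by hand (`manual_00` is an illustrative name):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([[1, 2],
              [3, 4],
              [5, 6]])         # shape (3, 2)

c = a @ b                      # same result as np.dot(a, b) for 2-D arrays

# Entry (0, 0): row 0 of a times column 0 of b = 1*1 + 2*3 + 3*5
manual_00 = sum(a[0, k] * b[k, 0] for k in range(3))

print(c.shape)
print(manual_00, c[0, 0])
```

The `@` operator is the usual way to write matrix products in recent NumPy code.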

7. Building a Deep Learning Model and Validating the Results

- Runtime -> Change runtime type (Colab menu)

In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
In [2]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
In [3]:
print(len(x_train), len(y_train))
print(len(x_test), len(y_test))
60000 60000
10000 10000
In [4]:
x_train[0]
Out[4]:
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  30,  36,  94, 154, 170,
        253, 253, 253, 253, 253, 225, 172, 253, 242, 195,  64,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  49, 238, 253, 253, 253, 253,
        253, 253, 253, 253, 251,  93,  82,  82,  56,  39,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  18, 219, 253, 253, 253, 253,
        253, 198, 182, 247, 241,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  80, 156, 107, 253, 253,
        205,  11,   0,  43, 154,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,  14,   1, 154, 253,
         90,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 139, 253,
        190,   2,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  11, 190,
        253,  70,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  35,
        241, 225, 160, 108,   1,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         81, 240, 253, 253, 119,  25,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  45, 186, 253, 253, 150,  27,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,  16,  93, 252, 253, 187,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0, 249, 253, 249,  64,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  46, 130, 183, 253, 253, 207,   2,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  39,
        148, 229, 253, 253, 253, 250, 182,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  24, 114, 221,
        253, 253, 253, 253, 201,  78,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  23,  66, 213, 253, 253,
        253, 253, 198,  81,   2,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  18, 171, 219, 253, 253, 253, 253,
        195,  80,   9,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,  55, 172, 226, 253, 253, 253, 253, 244, 133,
         11,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0, 136, 253, 253, 253, 212, 135, 132,  16,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)
In [5]:
x_train[0].shape
Out[5]:
(28, 28)

- Normalization

In [6]:
x_train, x_test = x_train / 255.0, x_test / 255.0
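
Dividing by 255.0 rescales the uint8 pixel values from the range 0 to 255 into floats between 0 and 1. A tiny sketch on a made-up 2x2 array (not an actual MNIST image):

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 pixel values, the same dtype as MNIST
img = np.array([[0, 51], [204, 255]], dtype=np.uint8)

scaled = img / 255.0   # true division promotes uint8 to float64

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

Keeping inputs in a small, fixed range generally makes gradient-based training behave better.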

- Plot an image

In [7]:
plt.imshow(x_test[0])
Out[7]:
<matplotlib.image.AxesImage at 0x7fa7ccf4b490>
In [8]:
y_test[0]
Out[8]:
7

- Define the model

In [9]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
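
The `sparse_categorical_crossentropy` loss takes integer class labels (not one-hot vectors) and averages the negative log of the probability that the softmax output assigns to the true class. A NumPy sketch of that calculation, with made-up scores for 3 classes instead of MNIST's 10:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
labels = np.array([0, 2])          # integer class ids, not one-hot vectors

probs = softmax(logits)
# Pick the predicted probability of the true class for each sample
p_true = probs[np.arange(len(labels)), labels]
loss = -np.log(p_true).mean()

print(probs.sum(axis=1))   # each row of probabilities sums to 1
print(loss)
```

The loss approaches 0 as the probability assigned to the correct class approaches 1.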

- Model summary

In [10]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
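
The `Param #` column can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit, while Flatten and Dropout have no parameters. A short sketch (`dense_params` is an illustrative helper name):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

flatten_out = 28 * 28                     # Flatten: 28x28 image -> 784 values
hidden = dense_params(flatten_out, 128)   # Dense(128)
output = dense_params(128, 10)            # Dense(10)

print(flatten_out, hidden, output, hidden + output)
```

The result matches the summary: 100,480 and 1,290 parameters, 101,770 in total.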

- Train and evaluate the model

In [11]:
model.fit(x_train, y_train, epochs = 5)
Epoch 1/5
1875/1875 [==============================] - 6s 2ms/step - loss: 0.2972 - accuracy: 0.9134
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1450 - accuracy: 0.9571
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1076 - accuracy: 0.9671
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0879 - accuracy: 0.9725
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0728 - accuracy: 0.9777
Out[11]:
<tensorflow.python.keras.callbacks.History at 0x7fa7cdbbd550>
In [15]:
model.evaluate(x_test,  y_test, verbose = 2)
313/313 - 0s - loss: 0.0764 - accuracy: 0.9766
Out[15]:
[0.07641138881444931, 0.9765999913215637]
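
The step counts in the logs (1875 per training epoch, 313 for evaluation) follow from the batch size. Neither call passes `batch_size`, so this sketch assumes the Keras default of 32:

```python
import math

batch_size = 32   # assumed: the Keras default when batch_size is not passed

train_steps = math.ceil(60000 / batch_size)   # 60,000 training images
eval_steps = math.ceil(10000 / batch_size)    # 10,000 test images

print(train_steps, eval_steps)
```

The last evaluation batch is only partly filled, which is why the count rounds up.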

8. Limitations of Machine Learning / Deep Learning
