Creating a DataFrame¶


In [ ]:

import pandas as pd

Creating a DataFrame from a dict

In [ ]:

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})

In [ ]:

type(df)

Out[ ]:

pandas.core.frame.DataFrame

In [ ]:

df

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

dummy = {'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]}

In [ ]:

df2 = pd.DataFrame(dummy)

In [ ]:

df2

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

Creating a DataFrame from a list

In [ ]:

a = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

In [ ]:

df3 = pd.DataFrame(a)

In [ ]:

df3

Out[ ]:

	0	1	2
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df3.columns = ['a', 'b', 'c']

In [ ]:

df3

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9
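
The two steps above (build the DataFrame, then assign `columns`) can also be written as one call. A minimal sketch, assuming the same nested list; the variable name `rows` is only illustrative:

```python
import pandas as pd

rows = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Pass the column names at construction time instead of assigning them afterwards.
df3 = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(df3)
```

Each inner list becomes one row, so the column names are matched to the list positions from left to right.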

Problem: Create a DataFrame that matches the table below.

[Screenshot: target table]

In [ ]:

a = {'company' : ['abc', '회사', 123], '직원수' : [400, 10, 6]}

In [ ]:

df4 = pd.DataFrame(a)

In [ ]:

df4

Out[ ]:

	company	직원수
0	abc	400
1	회사	10
2	123	6

Problem: Create a DataFrame that matches the table below.

[Screenshot: target table]

In [ ]:

a = {'company' : ['abc', '회사', 123], '직원수' : [400, 10, 6], '위치' : ['Seoul', NaN, 'Busan']}

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-16-ae66b6ff152c> in <module>()
----> 1 a = {'company' : ['abc', '회사', 123], '직원수' : [400, 10, 6], '위치' : ['Seoul', NaN, 'Busan']}

NameError: name 'NaN' is not defined

In [ ]:

a = {'company' : ['abc', '회사', 123], '직원수' : [400, 10, 6], '위치' : ['Seoul', , 'Busan']}

  File "<ipython-input-17-ec3acaf87bae>", line 1
    a = {'company' : ['abc', '회사', 123], '직원수' : [400, 10, 6], '위치' : ['Seoul', , 'Busan']}
                                                                                ^
SyntaxError: invalid syntax

- Solution using numpy

In [ ]:

import numpy as np

In [ ]:

# np.nan is the canonical spelling; the np.NaN alias was removed in NumPy 2.0
a = {'company' : ['abc', '회사', 123], '직원수' : [400, 10, 6], '위치' : ['Seoul', np.nan, 'Busan']}

In [ ]:

df5 = pd.DataFrame(a)

In [ ]:

df5

Out[ ]:

	company	직원수	위치
0	abc	400	Seoul
1	회사	10	NaN
2	123	6	Busan
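
numpy is not the only way to write a missing entry. The sketch below assumes a single illustrative column and checks that `np.nan`, `None`, and `float('nan')` are all detected as missing by `isna()`; the names `results`, `marker`, and `frame` are made up for this example:

```python
import numpy as np
import pandas as pd

results = []
# Three spellings that pandas treats as missing in a column of strings.
for marker in (np.nan, None, float('nan')):
    frame = pd.DataFrame({'위치': ['Seoul', marker, 'Busan']})
    results.append(frame['위치'].isna().tolist())

print(results)
```

How the missing entry is displayed (`NaN` or `None`) may differ between pandas versions, but `isna()` reports it in every case.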

Getting / Renaming Column Names¶

Creating the DataFrame

In [ ]:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

Getting the column names

In [ ]:

df.columns

Out[ ]:

Index(['a', 'b', 'c'], dtype='object')

In [ ]:

df.columns[1]

Out[ ]:

'b'

- Problem: Rename the columns a, b, c to d, e, f.

Renaming columns by assignment

In [ ]:

df.columns = ['d', 'e', 'f']

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

- Problem: Among the columns d, e, f, rename d to '디' and f to '에프'.

In [ ]:

df.columns = ['디', 'e', '에프']

In [ ]:

df

Out[ ]:

	디	e	에프
0	1	4	7
1	2	5	8
2	3	6	9

Renaming columns with rename

In [ ]:

# Recreate the DataFrame
df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})
df.columns = ['d', 'e', 'f']

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df.rename(columns = {'d' : '디', 'f' : '에프'})

Out[ ]:

	디	e	에프
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

- rename returns a new DataFrame; the change is stored in df only when inplace = True is set

In [ ]:

df.rename(columns = {'d' : '디', 'f' : '에프'}, inplace = True)

In [ ]:

df

Out[ ]:

	디	e	에프
0	1	4	7
1	2	5	8
2	3	6	9
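
An alternative to `inplace = True` is to keep the DataFrame that `rename` returns. A small sketch, assuming the same d, e, f columns; the name `renamed` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'d': [1, 2, 3], 'e': [4, 5, 6], 'f': [7, 8, 9]})

# rename returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns={'d': '디', 'f': '에프'})

print(list(df.columns))       # original names
print(list(renamed.columns))  # renamed copy
```

Writing `df = df.rename(...)` has the same visible effect as `inplace = True` and makes the reassignment explicit.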

Copying Data with copy¶

- Create the data

In [ ]:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})

- Problem: Rename the fields a, b, c to d, e, f.

In [ ]:

df.columns = ['d', 'e', 'f']

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

- Problem: Rename the field a to '에이'. (df now has the columns d, e, f, so it has to be recreated first.)

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})

In [ ]:

df.rename(columns = {'a' : '에이'}, inplace = True)

In [ ]:

df

Out[ ]:

	에이	b	c
0	1	4	7
1	2	5	8
2	3	6	9

- Can this be solved by assigning the DataFrame to a second variable name?

In [ ]:

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})
df2 = df

In [ ]:

df

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df2

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

- Problem: Rename the fields a, b, c to d, e, f.

In [ ]:

df.columns = ['d', 'e', 'f']

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df2

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9
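
The reason df2 changes as well is that `df2 = df` does not copy anything: both names refer to the same object. A minimal sketch that makes this visible with the `is` operator; the name `same_object` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
df2 = df  # binds a second name to the same DataFrame, no copy is made

same_object = df2 is df
print(same_object)

# Renaming through one name is therefore visible through the other.
df.columns = ['d', 'e', 'f']
print(list(df2.columns))
```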


Solution using a deep copy

In [ ]:

import copy

In [ ]:

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})
df2 = copy.deepcopy(df)

In [ ]:

df

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df2

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df.columns = ['d', 'e', 'f']

In [ ]:

df

Out[ ]:

	d	e	f
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df2

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9
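
pandas also offers its own copy method, so the `copy` module is not strictly needed. A sketch under the assumption that an independent copy of both labels and values is wanted; `DataFrame.copy()` defaults to `deep=True`:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
df2 = df.copy()  # deep=True is the default

# Change both the labels and a value of the original.
df.columns = ['d', 'e', 'f']
df.loc[0, 'd'] = 100

print(list(df2.columns))
print(df2.loc[0, 'a'])
```

The copy keeps its original column names and values after the original is modified.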

Series¶

- Create the data

In [ ]:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

- Extract column a.

In [ ]:

df['a']

Out[ ]:

0    1
1    2
2    3
Name: a, dtype: int64

In [ ]:

type(df['a'])

Out[ ]:

pandas.core.series.Series

How to create a Series

In [ ]:

a = pd.Series([1, 2, 3, 1, 2, 3])

In [ ]:

a

Out[ ]:

0    1
1    2
2    3
3    1
4    2
5    3
dtype: int64

- Setting a custom index

In [ ]:

a = pd.Series([1, 2, 3, 1, 2, 3], index = ['a', 'b', 'c', 'd', 'e', 'f'])

In [ ]:

a

Out[ ]:

a    1
b    2
c    3
d    1
e    2
f    3
dtype: int64

In [ ]:

a['e']

Out[ ]:

2

Finding unique values

- Extract only the unique values of field a in the table (df) below.

In [ ]:

df = pd.DataFrame({'a': [1, 2, 3, 1, 2, 3], 'b' : [4, 5, 6, 6, 7, 8], 'c' : [7, 8, 9, 10, 11, 12]})

In [ ]:

a = df['a']

In [ ]:

a

Out[ ]:

0    1
1    2
2    3
3    1
4    2
5    3
Name: a, dtype: int64

In [ ]:

type(a)

Out[ ]:

pandas.core.series.Series

In [ ]:

a.unique()

Out[ ]:

array([1, 2, 3])
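
Besides `unique()`, a Series has related methods for counting distinct values. The sketch below rebuilds the same values as column a; the names `n_distinct` and `counts` are illustrative:

```python
import pandas as pd

a = pd.Series([1, 2, 3, 1, 2, 3], name='a')

n_distinct = a.nunique()    # number of distinct values
counts = a.value_counts()   # how often each value occurs

print(n_distinct)
print(counts.to_dict())
```

`unique()` returns the values themselves as a NumPy array, while `value_counts()` returns a Series indexed by value.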

In [ ]:

a.unique()[2]

Out[ ]:

3

- Problem: Extract columns a and b.

In [ ]:

df = pd.DataFrame({'a': [1, 2, 3], 'b' : [4, 5, 6], 'c' : [7, 8, 9]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	4	7
1	2	5	8
2	3	6	9

In [ ]:

df['a', 'b']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('a', 'b')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-75-8fa5ad5a23e2> in <module>()
----> 1 df['a', 'b']

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: ('a', 'b')

Extracting Data at a Given Position with loc and iloc¶

- Create the data

In [ ]:

import pandas as pd

df = pd.DataFrame({'a' : [i for i in range(1, 11)], 'b' : [i for i in range(11, 21)], 'c' : [i for i in range(21, 31)]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

- Problem: Extract columns a and b.

In [ ]:

df['a', 'b']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('a', 'b')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-78-8fa5ad5a23e2> in <module>()
----> 1 df['a', 'b']

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: ('a', 'b')

In [ ]:

df[['a', 'b']]

Out[ ]:

	a	b
0	1	11
1	2	12
2	3	13
3	4	14
4	5	15
5	6	16
6	7	17
7	8	18
8	9	19
9	10	20

In [ ]:

type(df[['a', 'b']])

Out[ ]:

pandas.core.frame.DataFrame
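
The difference comes from what is passed inside the brackets: `df['a', 'b']` looks for a single column whose label is the tuple `('a', 'b')`, while `df[['a', 'b']]` passes a list of labels. The sketch below shows that the same rule decides between a Series and a DataFrame; the names `col` and `sub` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

col = df['a']    # a single label returns a Series
sub = df[['a']]  # a list of labels returns a DataFrame, even with one column

print(type(col).__name__, type(sub).__name__)
print(sub.shape)
```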

- Problem: Print the data in the first row.

In [ ]:

df[0]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-81-ad11118bc8f3> in <module>()
----> 1 df[0]

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 0

Solution using loc

In [ ]:

df.loc[0]

Out[ ]:

a     1
b    11
c    21
Name: 0, dtype: int64

- Slicing also works (with loc, the end label is included)

In [ ]:

df.loc[2:4]

Out[ ]:

	a	b	c
2	3	13	23
3	4	14	24
4	5	15	25
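
Label slices and position slices treat the end point differently, which is easy to miss when the index happens to be 0, 1, 2, and so on. A short sketch on a similar 10-row frame; the names `by_label` and `by_pos` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': list(range(1, 11)),
                   'b': list(range(11, 21)),
                   'c': list(range(21, 31))})

by_label = df.loc[2:4]   # label-based: both end points are included
by_pos = df.iloc[2:4]    # position-based: the end point is excluded

print(by_label.index.tolist())
print(by_pos.index.tolist())
```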

- Create a DataFrame whose index consists of strings

In [ ]:

index = ['a', 'b', 'd', 'c', 'e', 'f', 'g', 'g', 'h', 'i']

In [ ]:

df = pd.DataFrame({'a' : [i for i in range(1, 11)], 'b' : [i for i in range(11, 21)], 'c' : [i for i in range(21, 31)]}, index = index)

In [ ]:

df

Out[ ]:

	a	b	c
a	1	11	21
b	2	12	22
d	3	13	23
c	4	14	24
e	5	15	25
f	6	16	26
g	7	17	27
g	8	18	28
h	9	19	29
i	10	20	30

In [ ]:

df.loc[0]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine._maybe_get_bool_indexer()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-87-7eaf75073732> in <module>()
----> 1 df.loc[0]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111 
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060 
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3491             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3492         else:
-> 3493             loc = self.index.get_loc(key)
   3494 
   3495             if isinstance(loc, np.ndarray):

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 0

In [ ]:

df.loc['g']

Out[ ]:

	a	b	c
g	7	17	27
g	8	18	28

In [ ]:

df.loc['c':]

Out[ ]:

	a	b	c
c	4	14	24
e	5	15	25
f	6	16	26
g	7	17	27
g	8	18	28
h	9	19	29
i	10	20	30

- Problem: Print the data in columns a and c for the index labels g and i.

In [ ]:

df

Out[ ]:

	a	b	c
a	1	11	21
b	2	12	22
d	3	13	23
c	4	14	24
e	5	15	25
f	6	16	26
g	7	17	27
g	8	18	28
h	9	19	29
i	10	20	30

In [ ]:

df.loc[['g', 'i'], ['a', 'c']]

Out[ ]:

	a	c
g	7	27
g	8	28
i	10	30

- Problem: Extract the first five rows of the first and third columns.

In [ ]:

df = pd.DataFrame({'a' : [i for i in range(1, 11)], 'b' : [i for i in range(11, 21)], 'c' : [i for i in range(21, 31)]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

Solution using iloc

In [ ]:

df.iloc[:5, [0, 2]]

Out[ ]:

	a	c
0	1	21
1	2	22
2	3	23
3	4	24
4	5	25

- Problem: Extract the first five rows of the first and third columns.

In [ ]:

index = ['a', 'b', 'd', 'c', 'e', 'f', 'g', 'g', 'h', 'i']
df = pd.DataFrame({'a' : [i for i in range(1, 11)], 'b' : [i for i in range(11, 21)], 'c' : [i for i in range(21, 31)]}, index = index)

In [ ]:

df

Out[ ]:

	a	b	c
a	1	11	21
b	2	12	22
d	3	13	23
c	4	14	24
e	5	15	25
f	6	16	26
g	7	17	27
g	8	18	28
h	9	19	29
i	10	20	30

In [ ]:

df.iloc[:5, [0, 2]]

Out[ ]:

	a	c
a	1	21
b	2	22
d	3	23
c	4	24
e	5	25

Extracting Data That Matches a Condition¶

- Create the data

In [ ]:

import pandas as pd

df = pd.DataFrame({'a' : [i for i in range(1, 11)], 'b' : [i for i in range(11, 21)], 'c' : [i for i in range(21, 31)]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

- Problem: Print columns a and c.

In [ ]:

df[['a', 'c']]

Out[ ]:

	a	c
0	1	21
1	2	22
2	3	23
3	4	24
4	5	25
5	6	26
6	7	27
7	8	28
8	9	29
9	10	30

- Problem: Print the rows where a is 3 or greater.

In [ ]:

df[df['a'] >= 3]

Out[ ]:

	a	b	c
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

- Problem: Among the rows where a is 3 or greater, print only columns a and c.

In [ ]:

df[df['a'] >= 3][['a', 'c']]

Out[ ]:

	a	c
2	3	23
3	4	24
4	5	25
5	6	26
6	7	27
7	8	28
8	9	29
9	10	30
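
The row filter and the column selection can also be combined in a single `loc` call instead of two chained brackets. A sketch on the same data; the names `chained` and `single` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': list(range(1, 11)),
                   'b': list(range(11, 21)),
                   'c': list(range(21, 31))})

chained = df[df['a'] >= 3][['a', 'c']]      # filter rows, then pick columns
single = df.loc[df['a'] >= 3, ['a', 'c']]   # rows and columns in one indexing step

print(single.equals(chained))
```

Both forms return the same result when reading. The single `loc` form is usually the safer habit when the selection is later used for assignment.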

- Problem: Print the rows where a is 3 or greater and b is less than 16.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

In [ ]:

df[(df['a'] >= 3) & (df['b'] < 16)]

Out[ ]:

	a	b	c
2	3	13	23
3	4	14	24
4	5	15	25

In [ ]:

a = (df['a'] >= 3) & (df['b'] < 16)

In [ ]:

a

Out[ ]:

0    False
1    False
2     True
3     True
4     True
5    False
6    False
7    False
8    False
9    False
dtype: bool

In [ ]:

type(a)

Out[ ]:

pandas.core.series.Series

In [ ]:

df[a]

Out[ ]:

	a	b	c
2	3	13	23
3	4	14	24
4	5	15	25

- Problem: Print the rows where a is 3 or less, or 7 or greater.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

In [ ]:

df[(df['a'] <= 3) | (df['a'] >= 7)]

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

- Problem: Print the rows where a is 3 or greater and either b is less than 16 or c is 30.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26
6	7	17	27
7	8	18	28
8	9	19	29
9	10	20	30

In [ ]:

df[(df['a'] >= 3) & ((df['b'] < 16) | (df['c'] == 30))]

Out[ ]:

	a	b	c
2	3	13	23
3	4	14	24
4	5	15	25
9	10	20	30
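
Compound conditions like the one above can also be written as a string with `DataFrame.query`, which some readers find easier to scan. A sketch that checks both spellings select the same rows; the names `mask_result` and `query_result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': list(range(1, 11)),
                   'b': list(range(11, 21)),
                   'c': list(range(21, 31))})

# Boolean-mask form: each comparison needs its own parentheses
# because & and | bind more tightly than >=, < and ==.
mask_result = df[(df['a'] >= 3) & ((df['b'] < 16) | (df['c'] == 30))]

# query form: column names are used directly inside the expression string.
query_result = df.query('a >= 3 and (b < 16 or c == 30)')

print(query_result.index.tolist())
```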

Sorting¶

- Create the data

In [ ]:

import pandas as pd

df = pd.DataFrame({'a' : [2, 3, 2, 7, 4], 'b' : [2, 1, 3, 5, 3], 'c' : [1, 1, 2, 3, 5]})

In [ ]:

df

Out[ ]:

	a	b	c
0	2	2	1
1	3	1	1
2	2	3	2
3	7	5	3
4	4	3	5

Sorting by index

In [ ]:

df.sort_index()

Out[ ]:

	a	b	c
0	2	2	1
1	3	1	1
2	2	3	2
3	7	5	3
4	4	3	5

- For descending order, use ascending = False

In [ ]:

df.sort_index(ascending = False)

Out[ ]:

	a	b	c
4	4	3	5
3	7	5	3
2	2	3	2
1	3	1	1
0	2	2	1

In [ ]:

df

Out[ ]:

	a	b	c
0	2	2	1
1	3	1	1
2	2	3	2
3	7	5	3
4	4	3	5

- To store the result in df, use inplace = True

In [ ]:

df.sort_index(ascending = False, inplace = True)

In [ ]:

df

Out[ ]:

	a	b	c
4	4	3	5
3	7	5	3
2	2	3	2
1	3	1	1
0	2	2	1

- Use reset_index to reset the index

In [ ]:

df.reset_index()

Out[ ]:

	index	a	b	c
0	4	4	3	5
1	3	7	5	3
2	2	2	3	2
3	1	3	1	1
4	0	2	2	1

In [ ]:

df

Out[ ]:

	a	b	c
4	4	3	5
3	7	5	3
2	2	3	2
1	3	1	1
0	2	2	1

In [ ]:

df.reset_index(drop = True)

Out[ ]:

	a	b	c
0	4	3	5
1	7	5	3
2	2	3	2
3	3	1	1
4	2	2	1

In [ ]:

df.reset_index(drop = True, inplace = True)

In [ ]:

df

Out[ ]:

	a	b	c
0	4	3	5
1	7	5	3
2	2	3	2
3	3	1	1
4	2	2	1

Sorting by value

In [ ]:

df = pd.DataFrame({'a' : [2, 3, 2, 7, 4], 'b' : [2, 1, 3, 5, 3], 'c' : [1, 1, 2, 3, 5]})

In [ ]:

df

Out[ ]:

	a	b	c
0	2	2	1
1	3	1	1
2	2	3	2
3	7	5	3
4	4	3	5

- Problem: Sort by column a in ascending order.

In [ ]:

df.sort_values(by = ['a'])

Out[ ]:

	a	b	c
0	2	2	1
2	2	3	2
1	3	1	1
4	4	3	5
3	7	5	3

In [ ]:

df

Out[ ]:

	a	b	c
0	2	2	1
1	3	1	1
2	2	3	2
3	7	5	3
4	4	3	5

In [ ]:

df.sort_values(by = ['a'], inplace = True)

In [ ]:

df

Out[ ]:

	a	b	c
0	2	2	1
2	2	3	2
1	3	1	1
4	4	3	5
3	7	5	3

- Problem: Sort by column a in descending order.

In [ ]:

df.sort_values(by = ['a'], ascending = False)

Out[ ]:

	a	b	c
3	7	5	3
4	4	3	5
1	3	1	1
0	2	2	1
2	2	3	2

- Problem: Sort by columns a and b in ascending order.

In [ ]:

df.sort_values(by = ['a', 'b'])

Out[ ]:

	a	b	c
0	2	2	1
2	2	3	2
1	3	1	1
4	4	3	5
3	7	5	3

- Problem: Sort by column a in ascending order, then by column b in descending order.

In [ ]:

df.sort_values(by = ['a', 'b'], ascending = [True, False])

Out[ ]:

	a	b	c
2	2	3	2
0	2	2	1
1	3	1	1
4	4	3	5
3	7	5	3

In [ ]:

df

Out[ ]:

	a	b	c
0	2	2	1
2	2	3	2
1	3	1	1
4	4	3	5
3	7	5	3

In [ ]:

df.sort_values(by = ['a', 'b'], ascending = [True, False], inplace = True)

In [ ]:

df

Out[ ]:

	a	b	c
2	2	3	2
0	2	2	1
1	3	1	1
4	4	3	5
3	7	5	3

In [ ]:

df.reset_index(drop = True, inplace = True)

In [ ]:

df

Out[ ]:

	a	b	c
0	2	3	2
1	2	2	1
2	3	1	1
3	4	3	5
4	7	5	3
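
The sort followed by `reset_index(drop = True)` can be collapsed into one call with the `ignore_index` parameter of `sort_values`. This sketch assumes pandas 1.0 or newer, where that parameter is available; the name `sorted_df` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3, 2, 7, 4], 'b': [2, 1, 3, 5, 3], 'c': [1, 1, 2, 3, 5]})

# ignore_index=True relabels the sorted rows 0..n-1 in the same step.
sorted_df = df.sort_values(by=['a', 'b'], ascending=[True, False], ignore_index=True)

print(sorted_df)
```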

Handling Missing Values¶

In [ ]:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a' : [1, 1, 3, 4, 5], 'b' : [2, 3, np.nan, 3, 4], 'c' : [3, 4, 7, 6, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
2	3	NaN	7
3	4	3.0	6
4	5	4.0	4

Checking for missing values

In [ ]:

df.isnull()

Out[ ]:

	a	b	c
0	False	False	False
1	False	False	False
2	False	True	False
3	False	False	False
4	False	False	False

Counting missing values

In [ ]:

df.isnull().sum()

Out[ ]:

a    0
b    1
c    0
dtype: int64
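
The per-column counts can be reduced further to a single total, and the same boolean table can be used to pull out the affected rows. A sketch on the same data; the names `total_missing` and `rows_with_missing` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, 5], 'b': [2, 3, np.nan, 3, 4], 'c': [3, 4, 7, 6, 4]})

# Sum over columns, then over the resulting Series, for one overall count.
total_missing = int(df.isnull().sum().sum())

# any(axis=1) is True for every row that has at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]

print(total_missing)
print(rows_with_missing)
```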

Dropping columns / rows that contain missing values

- Drop rows that contain missing values

In [ ]:

df.dropna()

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
3	4	3.0	6
4	5	4.0	4

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
2	3	NaN	7
3	4	3.0	6
4	5	4.0	4

In [ ]:

df.dropna(inplace = True)

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
3	4	3.0	6
4	5	4.0	4

- Drop columns that contain missing values

In [ ]:

df = pd.DataFrame({'a' : [1, 1, 3, 4, 5], 'b' : [2, 3, np.nan, 3, 4], 'c' : [3, 4, 7, 6, 4]})

In [ ]:

df.dropna(axis=1)

Out[ ]:

	a	c
0	1	3
1	1	4
2	3	7
3	4	6
4	5	4

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
2	3	NaN	7
3	4	3.0	6
4	5	4.0	4

In [ ]:

df.dropna(axis=1, inplace = True)

In [ ]:

df

Out[ ]:

	a	c
0	1	3
1	1	4
2	3	7
3	4	6
4	5	4
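
`dropna` also accepts parameters that narrow down which rows are removed. The sketch below uses a small made-up frame (not the one from the text) to show `subset`, `how`, and `thresh`:

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values spread over several columns.
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [np.nan, np.nan, 6], 'c': [7, 8, 9]})

by_subset = df.dropna(subset=['a'])  # only look at column a
only_empty = df.dropna(how='all')    # drop rows where every value is missing
by_thresh = df.dropna(thresh=2)      # keep rows with at least 2 non-missing values

print(by_subset.index.tolist())
print(only_empty.index.tolist())
print(by_thresh.index.tolist())
```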

Replacing missing values with another value

In [ ]:

df = pd.DataFrame({'a' : [1, 1, 3, 4, 5], 'b' : [2, 3, np.nan, 3, 4], 'c' : [3, 4, 7, 6, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
2	3	NaN	7
3	4	3.0	6
4	5	4.0	4

- Replace with a specific value

In [ ]:

df.fillna(0, inplace=True)

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2.0	3
1	1	3.0	4
2	3	0.0	7
3	4	3.0	6
4	5	4.0	4

- Replace with the previous or next value

In [ ]:

df = pd.DataFrame({'a' : [1, 1, 3, 4, np.nan], 'b' : [2, 3, np.nan, np.nan, 4], 'c' : [np.nan, 4, 1, 1, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	NaN	1.0
3	4.0	NaN	1.0
4	NaN	4.0	4.0

1) Fill with the next value

In [ ]:

df.fillna(method='bfill')

Out[ ]:

	a	b	c
0	1.0	2.0	4.0
1	1.0	3.0	4.0
2	3.0	4.0	1.0
3	4.0	4.0	1.0
4	NaN	4.0	4.0

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	NaN	1.0
3	4.0	NaN	1.0
4	NaN	4.0	4.0

2) Fill with the previous value

In [ ]:

df.fillna(method='ffill')

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	3.0	1.0
3	4.0	3.0	1.0
4	4.0	4.0	4.0

- Setting limit

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	NaN	1.0
3	4.0	NaN	1.0
4	NaN	4.0	4.0

In [ ]:

df.fillna(method='ffill', limit =1)

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	3.0	1.0
3	4.0	NaN	1.0
4	4.0	4.0	4.0
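
Newer pandas releases deprecate the `method=` argument of `fillna`, so the dedicated `ffill()` and `bfill()` methods are worth knowing as an equivalent spelling. A sketch on the same data; the names `forward`, `backward`, and `both` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, np.nan],
                   'b': [2, 3, np.nan, np.nan, 4],
                   'c': [np.nan, 4, 1, 1, 4]})

forward = df.ffill(limit=1)  # same idea as fillna(method='ffill', limit=1)
backward = df.bfill()        # same idea as fillna(method='bfill')
both = df.bfill().ffill()    # fill from the next value, then from the previous one

print(forward)
print(both)
```

Chaining `bfill()` and `ffill()` leaves no gaps here because every column has at least one value on one side of each missing entry.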

- Problem: Replace the missing values in the DataFrame with the next value, then replace the remaining ones with the previous value.

In [ ]:

df = pd.DataFrame({'a' : [1, 1, 3, 4, np.nan], 'b' : [2, 3, np.nan, np.nan, 4], 'c' : [np.nan, 4, 1, 1, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	NaN	1.0
3	4.0	NaN	1.0
4	NaN	4.0	4.0

In [ ]:

df.fillna(method='bfill', inplace=True)

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	4.0
1	1.0	3.0	4.0
2	3.0	4.0	1.0
3	4.0	4.0	1.0
4	NaN	4.0	4.0

In [ ]:

df.fillna(method='ffill', inplace=True)

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	4.0
1	1.0	3.0	4.0
2	3.0	4.0	1.0
3	4.0	4.0	1.0
4	4.0	4.0	4.0

- Replace with the mean

In [ ]:

df = pd.DataFrame({'a' : [1, 1, 3, 4, np.nan], 'b' : [2, 3, np.nan, np.nan, 4], 'c' : [np.nan, 4, 1, 1, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	NaN	1.0
3	4.0	NaN	1.0
4	NaN	4.0	4.0

In [ ]:

df.mean()['a']

Out[ ]:

2.25

In [ ]:

df.fillna(df.mean()['a'])

Out[ ]:

	a	b	c
0	1.00	2.00	2.25
1	1.00	3.00	4.00
2	3.00	2.25	1.00
3	4.00	2.25	1.00
4	2.25	4.00	4.00

In [ ]:

df.mean()

Out[ ]:

a    2.25
b    3.00
c    2.50
dtype: float64

In [ ]:

df.fillna(df.mean())

Out[ ]:

	a	b	c
0	1.00	2.0	2.5
1	1.00	3.0	4.0
2	3.00	3.0	1.0
3	4.00	3.0	1.0
4	2.25	4.0	4.0

In [ ]:

df

Out[ ]:

	a	b	c
0	1.0	2.0	NaN
1	1.0	3.0	4.0
2	3.0	NaN	1.0
3	4.0	NaN	1.0
4	NaN	4.0	4.0

- Problem: Replace the missing values in b and c with the mean of each column.

In [ ]:

df.fillna(df.mean()[['b','c']])

Out[ ]:

	a	b	c
0	1.0	2.0	2.5
1	1.0	3.0	4.0
2	3.0	3.0	1.0
3	4.0	3.0	1.0
4	NaN	4.0	4.0
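
`fillna` also accepts a dict that maps column names to fill values, which spells out the per-column replacement explicitly. A sketch on the same data; the name `filled` is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, np.nan],
                   'b': [2, 3, np.nan, np.nan, 4],
                   'c': [np.nan, 4, 1, 1, 4]})

# Only the columns named in the dict are filled; column a keeps its NaN.
filled = df.fillna({'b': df['b'].mean(), 'c': df['c'].mean()})

print(filled)
```

The dict form makes it easy to mix strategies, for example a mean for one column and a constant for another.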

Type Conversion¶

- Create the data

In [ ]:

import pandas as pd

df = pd.DataFrame({'판매일' : ['5/11/21', '5/12/21', '5/13/21', '5/14/21', '5/15/21'],
                   '판매량' : ['10', '15', '20', '25', '30'], '방문자수' : ['10', '-', '17', '23', '25'], 
                   '기온' : ['24.1', '24.3', '24.8', '25', '25.4']})

In [ ]:

df

Out[ ]:

	판매일	판매량	방문자수	기온
0	5/11/21	10	10	24.1
1	5/12/21	15	-	24.3
2	5/13/21	20	17	24.8
3	5/14/21	25	23	25
4	5/15/21	30	25	25.4

Checking the types

In [ ]:

df.dtypes

Out[ ]:

판매일     object
판매량     object
방문자수    object
기온      object
dtype: object

In [ ]:

df['판매량 보정'] = df['판매량'] + 1

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

/usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in evaluate(op, a, b, use_numexpr)
    232         if use_numexpr:
--> 233             return _evaluate(op, op_str, a, b)  # type: ignore
    234     return _evaluate_standard(op, op_str, a, b)

/usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in _evaluate_numexpr(op, op_str, a, b)
    118     if result is None:
--> 119         result = _evaluate_standard(op, op_str, a, b)
    120 

/usr/local/lib/python3.7/dist-packages/pandas/core/computation/expressions.py in _evaluate_standard(op, op_str, a, b)
     67     with np.errstate(all="ignore"):
---> 68         return op(a, b)
     69 

TypeError: can only concatenate str (not "int") to str

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-a19684ee6306> in <module>()
----> 1 df['판매량 보정'] = df['판매량'] + 1

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op)
    188     else:
    189         with np.errstate(all="ignore"):
--> 190             res_values = na_arithmetic_op(lvalues, rvalues, op)
    191 
    192     return res_values

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    148             #  will handle complex numbers incorrectly, see GH#32047
    149             raise
--> 150         result = masked_arith_op(left, right, op)
    151 
    152     if is_cmp and (is_scalar(result) or result is NotImplemented):

/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in masked_arith_op(x, y, op)
    110         if mask.any():
    111             with np.errstate(all="ignore"):
--> 112                 result[mask] = op(xrav[mask], y)
    113 
    114     result, _ = maybe_upcast_putmask(result, ~mask, np.nan)

TypeError: can only concatenate str (not "int") to str

- Problem: Convert 판매량 to an integer type.

In [ ]:

df.astype({'판매량' : 'int'})

Out[ ]:

	판매일	판매량	방문자수	기온
0	5/11/21	10	10	24.1
1	5/12/21	15	-	24.3
2	5/13/21	20	17	24.8
3	5/14/21	25	23	25
4	5/15/21	30	25	25.4

In [ ]:

df.dtypes

Out[ ]:

판매일     object
판매량     object
방문자수    object
기온      object
dtype: object

In [ ]:

df = df.astype({'판매량' : 'int'})

In [ ]:

df.dtypes

Out[ ]:

판매일     object
판매량      int64
방문자수    object
기온      object
dtype: object

In [ ]:

df['판매량 보정'] = df['판매량'] + 1

In [ ]:

df

Out[ ]:

	판매일	판매량	방문자수	기온	판매량 보정
0	5/11/21	10	10	24.1	11
1	5/12/21	15	-	24.3	16
2	5/13/21	20	17	24.8	21
3	5/14/21	25	23	25	26
4	5/15/21	30	25	25.4	31

- Problem: Convert 방문자수 to a numeric type.

In [ ]:

df.astype({'방문자수' : 'int'})

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-06d7c7a7b0c5> in <module>()
----> 1 df.astype({'방문자수' : 'int'})

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5531                 if col_name in dtype:
   5532                     results.append(
-> 5533                         col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
   5534                     )
   5535                 else:

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5546         else:
   5547             # else, only a single dtype is given
-> 5548             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
   5549             return self._constructor(new_data).__finalize__(self, method="astype")
   5550 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    602         self, dtype, copy: bool = False, errors: str = "raise"
    603     ) -> "BlockManager":
--> 604         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    605 
    606     def convert(

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, **kwargs)
    407                 applied = b.apply(f, **kwargs)
    408             else:
--> 409                 applied = getattr(b, f)(**kwargs)
    410             result_blocks = _extend_blocks(applied, result_blocks)
    411 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    593             vals1d = values.ravel()
    594             try:
--> 595                 values = astype_nansafe(vals1d, dtype, copy=True)
    596             except (ValueError, TypeError):
    597                 # e.g. astype_nansafe can fail on object-dtype of strings

/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    972         # work around NumPy brokenness, #1987
    973         if np.issubdtype(dtype.type, np.integer):
--> 974             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    975 
    976         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: '-'

In [ ]:

pd.to_numeric(df['방문자수'])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "-"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-12-647814c4e748> in <module>()
----> 1 pd.to_numeric(df['방문자수'])

/usr/local/lib/python3.7/dist-packages/pandas/core/tools/numeric.py in to_numeric(arg, errors, downcast)
    151         try:
    152             values = lib.maybe_convert_numeric(
--> 153                 values, set(), coerce_numeric=coerce_numeric
    154             )
    155         except (ValueError, TypeError):

pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()

ValueError: Unable to parse string "-" at position 1

In [ ]:

pd.to_numeric(df['방문자수'], errors = 'coerce')

Out[ ]:

0    10.0
1     NaN
2    17.0
3    23.0
4    25.0
Name: 방문자수, dtype: float64

In [ ]:

df.dtypes

Out[ ]:

판매일       object
판매량        int64
방문자수      object
기온        object
판매량 보정     int64
dtype: object

In [ ]:

df['방문자수'] = pd.to_numeric(df['방문자수'], errors = 'coerce')

In [ ]:

df.dtypes

Out[ ]:

판매일        object
판매량         int64
방문자수      float64
기온         object
판매량 보정      int64
dtype: object

In [ ]:

df

Out[ ]:

	판매일	판매량	방문자수	기온	판매량 보정
0	5/11/21	10	10.0	24.1	11
1	5/12/21	15	NaN	24.3	16
2	5/13/21	20	17.0	24.8	21
3	5/14/21	25	23.0	25	26
4	5/15/21	30	25.0	25.4	31

In [ ]:

df = df.astype({'방문자수' : 'int'})

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-5212a29a4b6c> in <module>()
----> 1 df = df.astype({'방문자수' : 'int'})

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5531                 if col_name in dtype:
   5532                     results.append(
-> 5533                         col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
   5534                     )
   5535                 else:

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5546         else:
   5547             # else, only a single dtype is given
-> 5548             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
   5549             return self._constructor(new_data).__finalize__(self, method="astype")
   5550 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    602         self, dtype, copy: bool = False, errors: str = "raise"
    603     ) -> "BlockManager":
--> 604         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    605 
    606     def convert(

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, **kwargs)
    407                 applied = b.apply(f, **kwargs)
    408             else:
--> 409                 applied = getattr(b, f)(**kwargs)
    410             result_blocks = _extend_blocks(applied, result_blocks)
    411 

/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    593             vals1d = values.ravel()
    594             try:
--> 595                 values = astype_nansafe(vals1d, dtype, copy=True)
    596             except (ValueError, TypeError):
    597                 # e.g. astype_nansafe can fail on object-dtype of strings

/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    966 
    967         if not np.isfinite(arr).all():
--> 968             raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
    969 
    970     elif is_object_dtype(arr):

ValueError: Cannot convert non-finite values (NA or inf) to integer

In [ ]:

df.fillna(0, inplace = True)

In [ ]:

df

Out[ ]:

	판매일	판매량	방문자수	기온	판매량 보정
0	5/11/21	10	10.0	24.1	11
1	5/12/21	15	0.0	24.3	16
2	5/13/21	20	17.0	24.8	21
3	5/14/21	25	23.0	25	26
4	5/15/21	30	25.0	25.4	31

In [ ]:

df = df.astype({'방문자수' : 'int'})

In [ ]:

df.dtypes

Out[ ]:

판매일       object
판매량        int64
방문자수       int64
기온        object
판매량 보정     int64
dtype: object

In [ ]:

df

Out[ ]:

	판매일	판매량	방문자수	기온	판매량 보정
0	5/11/21	10	10	24.1	11
1	5/12/21	15	0	24.3	16
2	5/13/21	20	17	24.8	21
3	5/14/21	25	23	25	26
4	5/15/21	30	25	25.4	31
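
Filling missing values with 0 changes what the data says (0 visitors is not the same as an unknown count). As a hedged alternative, the sketch below uses the nullable `Int64` dtype, which holds integers and missing values together. The `visitors` Series is a hypothetical stand-in for the 방문자수 column, not the original file.

```python
import pandas as pd

# Hypothetical stand-in for the 방문자수 column, where '-' marks a missing count
visitors = pd.Series(['10', '-', '17', '23', '25'], name='방문자수')

# '-' cannot be parsed, so errors='coerce' turns it into NaN (dtype float64)
as_float = pd.to_numeric(visitors, errors='coerce')

# The nullable 'Int64' dtype (capital I) stores integers alongside <NA>
as_nullable_int = as_float.astype('Int64')

print(as_nullable_int)
```

Whether to keep `<NA>` or fill it with 0 depends on the analysis; the nullable dtype only makes the first option possible.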

- Problem: Convert 판매일 (sale date) to the datetime type.

In [ ]:

df['판매일'] = pd.to_datetime(df['판매일'], format="%m/%d/%y")

In [ ]:

df

Out[ ]:

	판매일	판매량	방문자수	기온	판매량 보정
0	2021-05-11	10	10	24.1	11
1	2021-05-12	15	0	24.3	16
2	2021-05-13	20	17	24.8	21
3	2021-05-14	25	23	25	26
4	2021-05-15	30	25	25.4	31

In [ ]:

df.dtypes

Out[ ]:

판매일       datetime64[ns]
판매량                int64
방문자수               int64
기온                object
판매량 보정             int64
dtype: object
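
Once a column is `datetime64[ns]`, its parts can be read through the `.dt` accessor. The following is a minimal sketch on a hypothetical three-row Series that uses the same `%m/%d/%y` format as above.

```python
import pandas as pd

# Hypothetical sample in the same month/day/two-digit-year layout as 판매일
raw_dates = pd.Series(['5/11/21', '5/12/21', '5/13/21'])
dates = pd.to_datetime(raw_dates, format='%m/%d/%y')

months = dates.dt.month          # calendar month as an integer
weekdays = dates.dt.dayofweek    # Monday is 0, Sunday is 6

print(months.tolist())
print(weekdays.tolist())
```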

Adding / Deleting Records and Columns

In [ ]:

import pandas as pd

df = pd.DataFrame({'a' : [1, 1, 3, 4, 5], 'b' : [2, 3, 2, 3, 4], 'c' : [3, 4, 7, 6, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

Adding columns

- Problem: Add a column d containing 1, 3, 6, 4, 8.

In [ ]:

df['d'] = [1, 3, 6, 4, 8]

In [ ]:

df

Out[ ]:

	a	b	c	d
0	1	2	3	1
1	1	3	4	3
2	3	2	7	6
3	4	3	6	4
4	5	4	4	8

- Problem: Add a column e filled with 1.

In [ ]:

df['e'] = [1, 1, 1, 1, 1]

In [ ]:

df

Out[ ]:

	a	b	c	d	e
0	1	2	3	1	1
1	1	3	4	3	1
2	3	2	7	6	1
3	4	3	6	4	1
4	5	4	4	8	1

Or

In [ ]:

df['e'] = 1

In [ ]:

df

Out[ ]:

	a	b	c	d	e
0	1	2	3	1	1
1	1	3	4	3	1
2	3	2	7	6	1
3	4	3	6	4	1
4	5	4	4	8	1

In [ ]:

df.dtypes

Out[ ]:

a    int64
b    int64
c    int64
d    int64
e    int64
dtype: object

- Problem: Add a column f holding the result of a + b - c.

In [ ]:

df['f'] = df['a'] + df['b'] - df['c']

In [ ]:

df

Out[ ]:

	a	b	c	d	e	f
0	1	2	3	1	1	0
1	1	3	4	3	1	0
2	3	2	7	6	1	-2
3	4	3	6	4	1	1
4	5	4	4	8	1	5

Deleting columns

- Problem: Delete columns d, e, and f.

In [ ]:

df.drop(['d', 'e', 'f'], axis=1)

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

In [ ]:

df

Out[ ]:

	a	b	c	d	e	f
0	1	2	3	1	1	0
1	1	3	4	3	1	0
2	3	2	7	6	1	-2
3	4	3	6	4	1	1
4	5	4	4	8	1	5

In [ ]:

df.drop(['d', 'e', 'f'], axis=1, inplace=True)

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
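
The cells above show that `drop` returns a new DataFrame unless `inplace=True` is passed. A small sketch of the same idea, using the `columns=` keyword in place of `axis=1` and reassignment in place of `inplace`; the frame here is a hypothetical two-row example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'd': [5, 6], 'e': [7, 8]})

# columns=[...] is equivalent to passing the labels with axis=1
trimmed = df.drop(columns=['d', 'e'])

print(list(trimmed.columns))  # the new frame without d and e
print(list(df.columns))       # the original frame is unchanged
```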

Adding records

- Problem: Add a record with a = 6, b = 7, c = 8.

In [ ]:

df.append({'a' : 6, 'b' : 7, 'c' : 8})

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-c0622c67ea15> in <module>()
----> 1 df.append({'a' : 6, 'b' : 7, 'c' : 8})

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity, sort)
   7709             if isinstance(other, dict):
   7710                 if not ignore_index:
-> 7711                     raise TypeError("Can only append a dict if ignore_index=True")
   7712                 other = Series(other)
   7713             if other.name is None and not ignore_index:

TypeError: Can only append a dict if ignore_index=True

In [ ]:

df.append({'a' : 6, 'b' : 7, 'c' : 8}, ignore_index = True)

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

In [ ]:

df = df.append({'a' : 6, 'b' : 7, 'c' : 8}, ignore_index = True)

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8

- Problem: Add a record with a = 7, b = 8, c = 9.

In [ ]:

df.loc[6] = [7, 8, 9]

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8
6	7	8	9
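
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the `append` cells above only run on older versions. A minimal sketch of the same row addition with `pd.concat`, assuming a recent pandas:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, 5], 'b': [2, 3, 2, 3, 4], 'c': [3, 4, 7, 6, 4]})

# Wrap the new record in a one-row DataFrame, then stack it under the original
new_row = pd.DataFrame([{'a': 6, 'b': 7, 'c': 8}])
df = pd.concat([df, new_row], ignore_index=True)

print(df.tail(2))
```

As with `append`, `pd.concat` returns a new object, so the result has to be assigned back.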

Deleting records

- Problem: Delete the first record.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8
6	7	8	9

In [ ]:

df.drop(0)

Out[ ]:

	a	b	c
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8
6	7	8	9

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8
6	7	8	9

- Problem: Delete the first and second records.

In [ ]:

df = df.drop([0, 1])

In [ ]:

df

Out[ ]:

	a	b	c
2	3	2	7
3	4	3	6
4	5	4	4
5	6	7	8
6	7	8	9

- Problem: Delete the first through fourth records.

In [ ]:

df = pd.DataFrame({'a' : [1, 1, 3, 4, 5], 'b' : [2, 3, 2, 3, 4], 'c' : [3, 4, 7, 6, 4]})

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

In [ ]:

df.drop([i for i in range(4)])

Out[ ]:

	a	b	c
4	5	4	4

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

Or

In [ ]:

df.drop(df.index[:4])

Out[ ]:

	a	b	c
4	5	4	4

- Problem: Delete the records where a is less than 4.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

In [ ]:

df[df['a'] < 4].index

Out[ ]:

Int64Index([0, 1, 2], dtype='int64')

In [ ]:

df.drop(df[df['a'] < 4].index)

Out[ ]:

	a	b	c
3	4	3	6
4	5	4	4

- Problem: Delete the records where a is less than 3 and c equals 4.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2	3
1	1	3	4
2	3	2	7
3	4	3	6
4	5	4	4

In [ ]:

df[(df['a'] < 3) & (df['c'] == 4)].index

Out[ ]:

Int64Index([1], dtype='int64')

In [ ]:

df.drop(df[(df['a'] < 3) & (df['c'] == 4)].index)

Out[ ]:

	a	b	c
0	1	2	3
2	3	2	7
3	4	3	6
4	5	4	4
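
Dropping by the index of a filtered frame can also be written as keeping the rows that do not match the condition. The sketch below negates the boolean mask with `~` and, as an optional extra step, renumbers the index with `reset_index(drop=True)`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, 5], 'b': [2, 3, 2, 3, 4], 'c': [3, 4, 7, 6, 4]})

# True for the rows that should be removed
mask = (df['a'] < 3) & (df['c'] == 4)

# Keep everything else; drop=True discards the old index instead of adding it as a column
kept = df[~mask].reset_index(drop=True)

print(kept)
```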

Transforming Data with apply and map

In [ ]:

import pandas as pd

df = pd.DataFrame({'a' : [1, 2, 3, 4, 5]})

- Problem: Add a column b that stores '2 미만' (under 2) if a is less than 2, '4 미만' (under 4) if a is at least 2 but less than 4, and '4 이상' (4 or more) if a is 4 or greater.

In [ ]:

df

Out[ ]:

	a
0	1
1	2
2	3
3	4
4	5

In [ ]:

df['b'] = 0

In [ ]:

df

Out[ ]:

	a	b
0	1	0
1	2	0
2	3	0
3	4	0
4	5	0

In [ ]:

a = df[df['a'] < 2]

In [ ]:

a

Out[ ]:

	a	b
0	1	0

In [ ]:

df['b'][a.index] = '2 미만'

In [ ]:

df

Out[ ]:

	a	b
0	1	2 미만
1	2	0
2	3	0
3	4	0
4	5	0

In [ ]:

a = df[(df['a'] >= 2) & (df['a'] < 4)]

In [ ]:

a

Out[ ]:

	a	b
1	2	0
2	3	0

In [ ]:

df['b'][a.index] = '4 미만'

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

In [ ]:

pd.set_option('mode.chained_assignment',  None)

In [ ]:

df['b'][a.index] = '4 미만'

In [ ]:

df

Out[ ]:

	a	b
0	1	2 미만
1	2	4 미만
2	3	4 미만
3	4	0
4	5	0

In [ ]:

a = df[df['a'] >= 4]

In [ ]:

df['b'][a.index] = '4 이상'

In [ ]:

df

Out[ ]:

	a	b
0	1	2 미만
1	2	4 미만
2	3	4 미만
3	4	4 이상
4	5	4 이상
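
The `df['b'][a.index] = ...` pattern above is chained assignment, which is what triggers `SettingWithCopyWarning`; silencing the warning with `pd.set_option` hides the message but not the underlying ambiguity. A sketch of the same labelling with a single `.loc` call per condition follows. Starting column b as an object column (`None`) rather than 0 is a choice made here so that strings are not written into an integer column.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

# Start with an object column so string labels fit without a dtype change
df['b'] = None

# .loc[row_mask, column] selects and assigns in one step, on the original frame
df.loc[df['a'] < 2, 'b'] = '2 미만'
df.loc[(df['a'] >= 2) & (df['a'] < 4), 'b'] = '4 미만'
df.loc[df['a'] >= 4, 'b'] = '4 이상'

print(df)
```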

Solution using a function + apply

In [ ]:

def case_function(x):
    if x < 2:
        return '2 미만'
    elif x < 4:
        return '4 미만'
    else:
        return '4 이상'

In [ ]:

df['c'] = df['a'].apply(case_function)

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2 미만	2 미만
1	2	4 미만	4 미만
2	3	4 미만	4 미만
3	4	4 이상	4 이상
4	5	4 이상	4 이상
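
`apply` calls the Python function once per value. For simple threshold rules like this one, a vectorized form is also possible; the sketch below uses `numpy.select`, which picks the label of the first condition that is true and falls back to `default` otherwise. This is an alternative to, not a description of, the `case_function` approach above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

# Conditions are checked in order, like the if / elif chain in case_function
conditions = [df['a'] < 2, df['a'] < 4]
labels = ['2 미만', '4 미만']

df['c'] = np.select(conditions, labels, default='4 이상')

print(df)
```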

- Problem: Create a column d that holds 'one' if a is 1, 'two' if a is 2, 'three' if a is 3, 'four' if a is 4, and 'five' if a is 5.

In [ ]:

df

Out[ ]:

	a	b	c
0	1	2 미만	2 미만
1	2	4 미만	4 미만
2	3	4 미만	4 미만
3	4	4 이상	4 이상
4	5	4 이상	4 이상

- Solution using a user-defined function

In [ ]:

def function(x):
    if x == 1:
        return 'one'
    elif x == 2:
        return 'two'
    elif x == 3:
        return 'three'
    elif x == 4:
        return 'four'
    elif x == 5:
        return 'five'

In [ ]:

df['d'] = df['a'].apply(function)

In [ ]:

df

Out[ ]:

	a	b	c	d
0	1	2 미만	2 미만	one
1	2	4 미만	4 미만	two
2	3	4 미만	4 미만	three
3	4	4 이상	4 이상	four
4	5	4 이상	4 이상	five

- Solution using map

In [ ]:

a = { 1 : 'one', 2 : 'two', 3 : 'three', 4 : 'four', 5 : 'five'}

In [ ]:

df['e'] = df['a'].map(a)

In [ ]:

df

Out[ ]:

	a	b	c	d	e
0	1	2 미만	2 미만	one	one
1	2	4 미만	4 미만	two	two
2	3	4 미만	4 미만	three	three
3	4	4 이상	4 이상	four	four
4	5	4 이상	4 이상	five	five
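
One detail worth knowing about `map` with a dict: values that have no matching key become NaN rather than raising an error. A short sketch with a deliberately incomplete, hypothetical mapping:

```python
import pandas as pd

# The dict has no entry for 4 on purpose
names = {1: 'one', 2: 'two', 3: 'three'}
numbers = pd.Series([1, 2, 3, 4])

mapped = numbers.map(names)          # 4 has no key, so the result is NaN
filled = mapped.fillna('unknown')    # optional: replace the gaps with a placeholder

print(mapped.tolist())
print(filled.tolist())
```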

Combining DataFrames

1) Vertical concatenation

In [ ]:

import pandas as pd

df1 = pd.DataFrame({'A' : [1, 2, 3], 'B' : [11, 12, 13], 'C' : [21, 22, 23]})
df2 = pd.DataFrame({'A' : [4, 5, 6], 'B' : [14, 15, 16], 'C' : [24, 25, 26]})

In [ ]:

pd.concat([df1, df2])

Out[ ]:

	A	B	C
0	1	11	21
1	2	12	22
2	3	13	23
0	4	14	24
1	5	15	25
2	6	16	26

In [ ]:

pd.concat([df2, df1])

Out[ ]:

	A	B	C
0	4	14	24
1	5	15	25
2	6	16	26
0	1	11	21
1	2	12	22
2	3	13	23

- To reset the index, pass ignore_index = True.

In [ ]:

pd.concat([df1, df2], ignore_index = True)

Out[ ]:

	A	B	C
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24
4	5	15	25
5	6	16	26

- Checking the result when the column order differs (pd.concat aligns columns by name, not by position)

In [ ]:

df1 = pd.DataFrame({'A' : [1, 2, 3], 'B' : [11, 12, 13], 'C' : [21, 22, 23]})
df2 = pd.DataFrame({'B' : [14, 15, 16], 'A' : [4, 5, 6], 'C' : [24, 25, 26]})

In [ ]:

df1

Out[ ]:

	A	B	C
0	1	11	21
1	2	12	22
2	3	13	23

In [ ]:

df2

Out[ ]:

	B	A	C
0	14	4	24
1	15	5	25
2	16	6	26

In [ ]:

pd.concat([df1, df2])

Out[ ]:

	A	B	C
0	1	11	21
1	2	12	22
2	3	13	23
0	4	14	24
1	5	15	25
2	6	16	26

- Concatenating DataFrames that have different columns

In [ ]:

df1 = pd.DataFrame({'A' : [1, 2, 3], 'B' : [11, 12, 13], 'C' : [21, 22, 23], 'D' : [31, 32, 33]})
df2 = pd.DataFrame({'A' : [3, 4, 5], 'B' : [13, 14, 15], 'C' : [23, 24, 25], 'E' : [41, 42, 43]})

In [ ]:

df1

Out[ ]:

	A	B	C	D
0	1	11	21	31
1	2	12	22	32
2	3	13	23	33

In [ ]:

df2

Out[ ]:

	A	B	C	E
0	3	13	23	41
1	4	14	24	42
2	5	15	25	43

In [ ]:

pd.concat([df1, df2])

Out[ ]:

	A	B	C	D	E
0	1	11	21	31.0	NaN
1	2	12	22	32.0	NaN
2	3	13	23	33.0	NaN
0	3	13	23	NaN	41.0
1	4	14	24	NaN	42.0
2	5	15	25	NaN	43.0

In [ ]:

pd.concat([df1, df2], join = 'outer')

Out[ ]:

	A	B	C	D	E
0	1	11	21	31.0	NaN
1	2	12	22	32.0	NaN
2	3	13	23	33.0	NaN
0	3	13	23	NaN	41.0
1	4	14	24	NaN	42.0
2	5	15	25	NaN	43.0

In [ ]:

pd.concat([df1, df2], join = 'inner')

Out[ ]:

	A	B	C
0	1	11	21
1	2	12	22
2	3	13	23
0	3	13	23
1	4	14	24
2	5	15	25
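
After a vertical concat the original index values repeat (0, 1, 2, 0, 1, 2), which makes it hard to tell which frame a row came from. Besides `ignore_index=True`, one option is the `keys=` parameter, which adds an outer index level naming the source. The labels 'first' and 'second' below are arbitrary names chosen for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [11, 12]})
df2 = pd.DataFrame({'A': [3, 4], 'B': [13, 14]})

# keys= builds a two-level index: (source label, original row index)
stacked = pd.concat([df1, df2], keys=['first', 'second'])

# Selecting by the outer label returns the rows that came from df2
second_part = stacked.loc['second']

print(stacked)
print(second_part)
```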

2) Horizontal concatenation

In [ ]:

import pandas as pd

df1 = pd.DataFrame({'A' : [1, 2, 3], 'B' : [11, 12, 13], 'C' : [21, 22, 23], 'D' : [31, 32, 33]})
df2 = pd.DataFrame({'E' : [3, 4, 5], 'F' : [13, 14, 15], 'G' : [23, 24, 25], 'H' : [41, 42, 43]})

In [ ]:

df1

Out[ ]:

	A	B	C	D
0	1	11	21	31
1	2	12	22	32
2	3	13	23	33

In [ ]:

df2

Out[ ]:

	E	F	G	H
0	3	13	23	41
1	4	14	24	42
2	5	15	25	43

In [ ]:

pd.concat([df1, df2], axis = 1)

Out[ ]:

	A	B	C	D	E	F	G	H
0	1	11	21	31	3	13	23	41
1	2	12	22	32	4	14	24	42
2	3	13	23	33	5	15	25	43

- Combine the following two DataFrames.

In [ ]:

df1 = pd.DataFrame({'ID' : [1, 2, 3], '성별' : ['F', 'M', 'F'], '나이' : [20, 30, 40]})
df2 = pd.DataFrame({'ID' : [1, 2, 3], '키' : [160.5, 170.3, 180.1], '몸무게' : [45.1, 50.3, 72.1]})

In [ ]:

df1

Out[ ]:

	ID	성별	나이
0	1	F	20
1	2	M	30
2	3	F	40

In [ ]:

df2

Out[ ]:

	ID	키	몸무게
0	1	160.5	45.1
1	2	170.3	50.3
2	3	180.1	72.1

In [ ]:

pd.concat([df1, df2], axis = 1)

Out[ ]:

	ID	성별	나이	ID	키	몸무게
0	1	F	20	1	160.5	45.1
1	2	M	30	2	170.3	50.3
2	3	F	40	3	180.1	72.1

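- Combine the following two DataFrames on ID. (pd.concat with axis = 1 aligns rows by index, not by the ID column, so the rows in the result below are mismatched.)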

In [ ]:

df1 = pd.DataFrame({'ID' : [1, 2, 3, 4, 5], '성별' : ['F', 'M', 'F', 'M', 'F'], '나이' : [20, 30, 40, 25, 42]})
df2 = pd.DataFrame({'ID' : [3, 4, 5, 6, 7], '키' : [160.5, 170.3, 180.1, 142.3, 153.7], '몸무게' : [45.1, 50.3, 72.1, 38,  42]})

In [ ]:

df1

Out[ ]:

	ID	성별	나이
0	1	F	20
1	2	M	30
2	3	F	40
3	4	M	25
4	5	F	42

In [ ]:

df2

Out[ ]:

	ID	키	몸무게
0	3	160.5	45.1
1	4	170.3	50.3
2	5	180.1	72.1
3	6	142.3	38.0
4	7	153.7	42.0

In [ ]:

pd.concat([df1, df2], axis = 1)

Out[ ]:

	ID	성별	나이	ID	키	몸무게
0	1	F	20	3	160.5	45.1
1	2	M	30	4	170.3	50.3
2	3	F	40	5	180.1	72.1
3	4	M	25	6	142.3	38.0
4	5	F	42	7	153.7	42.0
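
The result above pairs ID 1 with ID 3 because `pd.concat(axis=1)` lines rows up by index label. If concat is to be used at all for this, the ID column has to become the index first. A minimal sketch on smaller, hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], '나이': [20, 30, 40]})
df2 = pd.DataFrame({'ID': [2, 3, 4], '키': [170.3, 180.1, 142.3]})

# Default integer index: rows are paired by position, so IDs do not match up
by_position = pd.concat([df1, df2], axis=1)

# ID as the index: rows are paired by ID, with NaN where one side has no data
by_id = pd.concat([df1.set_index('ID'), df2.set_index('ID')], axis=1)

print(by_position)
print(by_id)
```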

In [ ]:

df1 = pd.DataFrame({'ID' : [1, 2, 3, 4, 5], '성별' : ['F', 'M', 'F', 'M', 'F'], '나이' : [20, 30, 40, 25, 42]})
df2 = pd.DataFrame({'ID' : [3, 4, 5, 6, 7], '키' : [160.5, 170.3, 180.1, 142.3, 153.7], '몸무게' : [45.1, 50.3, 72.1, 38,  42]})

In [ ]:

df1

Out[ ]:

	ID	성별	나이
0	1	F	20
1	2	M	30
2	3	F	40
3	4	M	25
4	5	F	42

In [ ]:

df2

Out[ ]:

	ID	키	몸무게
0	3	160.5	45.1
1	4	170.3	50.3
2	5	180.1	72.1
3	6	142.3	38.0
4	7	153.7	42.0

- Problem: For the users whose 성별 (gender) and 나이 (age) are known, attach their 키 (height) and 몸무게 (weight).

In [ ]:

pd.merge(df1, df2, how = 'left', on = 'ID')

Out[ ]:

	ID	성별	나이	키	몸무게
0	1	F	20	NaN	NaN
1	2	M	30	NaN	NaN
2	3	F	40	160.5	45.1
3	4	M	25	170.3	50.3
4	5	F	42	180.1	72.1

- Problem: For the users whose 키 (height) and 몸무게 (weight) are known, attach their 성별 (gender) and 나이 (age).

In [ ]:

pd.merge(df2, df1, how = 'left', on = 'ID')

Out[ ]:

	ID	키	몸무게	성별	나이
0	3	160.5	45.1	F	40.0
1	4	170.3	50.3	M	25.0
2	5	180.1	72.1	F	42.0
3	6	142.3	38.0	NaN	NaN
4	7	153.7	42.0	NaN	NaN

Or

In [ ]:

pd.merge(df1, df2, how = 'right', on = 'ID')

Out[ ]:

	ID	성별	나이	키	몸무게
0	3	F	40.0	160.5	45.1
1	4	M	25.0	170.3	50.3
2	5	F	42.0	180.1	72.1
3	6	NaN	NaN	142.3	38.0
4	7	NaN	NaN	153.7	42.0

- Problem: Print the users for whom 키, 몸무게, 성별, and 나이 are all known.

In [ ]:

pd.merge(df1, df2, how = 'inner', on = 'ID')

Out[ ]:

	ID	성별	나이	키	몸무게
0	3	F	40	160.5	45.1
1	4	M	25	170.3	50.3
2	5	F	42	180.1	72.1

- Problem: Print the information for all users.

In [ ]:

pd.merge(df1, df2, how = 'outer', on = 'ID')

Out[ ]:

	ID	성별	나이	키	몸무게
0	1	F	20.0	NaN	NaN
1	2	M	30.0	NaN	NaN
2	3	F	40.0	160.5	45.1
3	4	M	25.0	170.3	50.3
4	5	F	42.0	180.1	72.1
5	6	NaN	NaN	142.3	38.0
6	7	NaN	NaN	153.7	42.0

- Problem: Print the information for all users when the key columns have different names (USER_ID in df1, ID in df2).

In [ ]:

df1 = pd.DataFrame({'USER_ID' : [1, 2, 3, 4, 5], '성별' : ['F', 'M', 'F', 'M', 'F'], '나이' : [20, 30, 40, 25, 42]})
df2 = pd.DataFrame({'ID' : [3, 4, 5, 6, 7], '키' : [160.5, 170.3, 180.1, 142.3, 153.7], '몸무게' : [45.1, 50.3, 72.1, 38,  42]})

In [ ]:

pd.merge(df1, df2, how = 'outer', left_on = 'USER_ID', right_on = 'ID')

Out[ ]:

	USER_ID	성별	나이	ID	키	몸무게
0	1.0	F	20.0	NaN	NaN	NaN
1	2.0	M	30.0	NaN	NaN	NaN
2	3.0	F	40.0	3.0	160.5	45.1
3	4.0	M	25.0	4.0	170.3	50.3
4	5.0	F	42.0	5.0	180.1	72.1
5	NaN	NaN	NaN	6.0	142.3	38.0
6	NaN	NaN	NaN	7.0	153.7	42.0
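
With `left_on` / `right_on`, both key columns are kept and each turns to float wherever the other side is missing. If a single key column is preferred, one option is to rename before merging so that `on=` can be used. A sketch under that assumption, on hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({'USER_ID': [1, 2, 3], '나이': [20, 30, 40]})
df2 = pd.DataFrame({'ID': [3, 4], '키': [160.5, 170.3]})

# rename returns a copy, so df1 itself keeps its USER_ID column
merged = pd.merge(df1.rename(columns={'USER_ID': 'ID'}), df2, how='outer', on='ID')

print(merged)
```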

- Problem: df1 stores member information and df2 stores each member's purchase history. Combine the member information and the purchase history into a single DataFrame.

In [ ]:

df1 = pd.DataFrame({'ID' : [1, 2, 3, 4, 5], '가입일' : ['2021-01-02', '2021-01-04', '2021-01-10', '2021-02-10', '2021-02-24'], '성별' : ['F', 'M', 'F', 'M', 'M']})
df2 = pd.DataFrame({'구매순서' : [1, 2, 3, 4, 5], 'ID' : [1, 1, 2, 4, 1], '구매월' : [1, 1, 2, 2, 3], '금액' : [1000, 1500, 2000, 3000, 4000]})

In [ ]:

df1

Out[ ]:

	ID	가입일	성별
0	1	2021-01-02	F
1	2	2021-01-04	M
2	3	2021-01-10	F
3	4	2021-02-10	M
4	5	2021-02-24	M

In [ ]:

df2

Out[ ]:

	구매순서	ID	구매월	금액
0	1	1	1	1000
1	2	1	1	1500
2	3	2	2	2000
3	4	4	2	3000
4	5	1	3	4000

In [ ]:

pd.merge(df1, df2, how = 'left', on = 'ID')

Out[ ]:

	ID	가입일	성별	구매순서	구매월	금액
0	1	2021-01-02	F	1.0	1.0	1000.0
1	1	2021-01-02	F	2.0	1.0	1500.0
2	1	2021-01-02	F	5.0	3.0	4000.0
3	2	2021-01-04	M	3.0	2.0	2000.0
4	3	2021-01-10	F	NaN	NaN	NaN
5	4	2021-02-10	M	4.0	2.0	3000.0
6	5	2021-02-24	M	NaN	NaN	NaN
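
The merge above repeats member 1 three times because one member row matches several purchase rows. When that one-to-many shape is what is expected, `pd.merge` can check it through the `validate` parameter and raise `pandas.errors.MergeError` if the keys do not fit. A sketch with hypothetical `members` and `purchases` frames:

```python
import pandas as pd

members = pd.DataFrame({'ID': [1, 2, 3], '성별': ['F', 'M', 'F']})
purchases = pd.DataFrame({'ID': [1, 1, 2], '금액': [1000, 1500, 2000]})

# Passes: ID is unique in members and may repeat in purchases
merged = pd.merge(members, purchases, how='left', on='ID', validate='one_to_many')

# Fails: purchases has duplicate IDs, so the relationship is not one-to-one
try:
    pd.merge(members, purchases, how='left', on='ID', validate='one_to_one')
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

print(len(merged), one_to_one_ok)
```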

Grouping

In [ ]:

import pandas as pd

df1 = pd.DataFrame({'ID' : [1, 2, 3, 4, 5], '가입일' : ['2021-01-02', '2021-01-04', '2021-01-10', '2021-02-10', '2021-02-24'], '성별' : ['F', 'M', 'F', 'M', 'M']})
df2 = pd.DataFrame({'구매순서' : [1, 2, 3, 4, 5], 'ID' : [1, 1, 2, 4, 1], '구매월' : [1, 1, 2, 2, 3], '금액' : [1000, 1500, 2000, 3000, 4000]})

- Problem: df1 stores member information and df2 stores each member's purchase history. Combine the member information and the purchase history into a single DataFrame.

In [ ]:

pd.merge(df1, df2, how = 'left', on = 'ID')

Out[ ]:

	ID	가입일	성별	구매순서	구매월	금액
0	1	2021-01-02	F	1.0	1.0	1000.0
1	1	2021-01-02	F	2.0	1.0	1500.0
2	1	2021-01-02	F	5.0	3.0	4000.0
3	2	2021-01-04	M	3.0	2.0	2000.0
4	3	2021-01-10	F	NaN	NaN	NaN
5	4	2021-02-10	M	4.0	2.0	3000.0
6	5	2021-02-24	M	NaN	NaN	NaN

- Problem: df1 stores member information and df2 stores each member's purchase history. Compute each member's cumulative purchase amount by member ID.

In [ ]:

df2

Out[ ]:

	구매순서	ID	구매월	금액
0	1	1	1	1000
1	2	1	1	1500
2	3	2	2	2000
3	4	4	2	3000
4	5	1	3	4000

In [ ]:

df2.groupby(by = ['ID'])['금액'].sum()

Out[ ]:

ID
1    6500
2    2000
4    3000
Name: 금액, dtype: int64

In [ ]:

type(df2.groupby(by = ['ID'])['금액'].sum())

Out[ ]:

pandas.core.series.Series

In [ ]:

s2 = df2.groupby(by = ['ID'])['금액'].sum()

In [ ]:

pd.merge(df1, s2, how = 'left', on = 'ID')

Out[ ]:

	ID	가입일	성별	금액
0	1	2021-01-02	F	6500.0
1	2	2021-01-04	M	2000.0
2	3	2021-01-10	F	NaN
3	4	2021-02-10	M	3000.0
4	5	2021-02-24	M	NaN

- Problem: df1 stores member information and df2 stores each member's purchase history. Compute each member's cumulative purchase amount per month, by member ID.

In [ ]:

df2

Out[ ]:

	구매순서	ID	구매월	금액
0	1	1	1	1000
1	2	1	1	1500
2	3	2	2	2000
3	4	4	2	3000
4	5	1	3	4000

In [ ]:

df2.groupby(by = ['ID', '구매월'])['금액'].sum()

Out[ ]:

ID  구매월
1   1      2500
    3      4000
2   2      2000
4   2      3000
Name: 금액, dtype: int64

In [ ]:

type(df2.groupby(by = ['ID', '구매월'])['금액'].sum())

Out[ ]:

pandas.core.series.Series

In [ ]:

s2 = df2.groupby(by = ['ID', '구매월'])['금액'].sum()

In [ ]:

s2.index

Out[ ]:

MultiIndex([(1, 1),
            (1, 3),
            (2, 2),
            (4, 2)],
           names=['ID', '구매월'])

In [ ]:

pd.merge(df1, s2, how = 'left', on = 'ID')

Out[ ]:

	ID	가입일	성별	금액
0	1	2021-01-02	F	2500.0
1	1	2021-01-02	F	4000.0
2	2	2021-01-04	M	2000.0
3	3	2021-01-10	F	NaN
4	4	2021-02-10	M	3000.0
5	5	2021-02-24	M	NaN

In [ ]:

df3 = pd.DataFrame(s2)

In [ ]:

df3

Out[ ]:

		금액
ID	구매월
1	1	2500
1	3	4000
2	2	2000
4	2	3000

In [ ]:

df3.index

Out[ ]:

MultiIndex([(1, 1),
            (1, 3),
            (2, 2),
            (4, 2)],
           names=['ID', '구매월'])

In [ ]:

pd.merge(df1, df3, how = 'left', on = 'ID')

Out[ ]:

	ID	가입일	성별	금액
0	1	2021-01-02	F	2500.0
1	1	2021-01-02	F	4000.0
2	2	2021-01-04	M	2000.0
3	3	2021-01-10	F	NaN
4	4	2021-02-10	M	3000.0
5	5	2021-02-24	M	NaN

- If you do not want the group keys used as the index, set as_index = False.

In [ ]:

df2.groupby(by = ['ID', '구매월'], as_index = False)['금액'].sum()

Out[ ]:

	ID	구매월	금액
0	1	1	2500
1	1	3	4000
2	2	2	2000
3	4	2	3000

In [ ]:

type(df2.groupby(by = ['ID', '구매월'], as_index = False)['금액'].sum())

Out[ ]:

pandas.core.frame.DataFrame

In [ ]:

df3 = df2.groupby(by = ['ID', '구매월'], as_index = False)['금액'].sum()

In [ ]:

pd.merge(df1, df3, how = 'left', on = 'ID')

Out[ ]:

	ID	가입일	성별	구매월	금액
0	1	2021-01-02	F	1.0	2500.0
1	1	2021-01-02	F	3.0	4000.0
2	2	2021-01-04	M	2.0	2000.0
3	3	2021-01-10	F	NaN	NaN
4	4	2021-02-10	M	2.0	3000.0
5	5	2021-02-24	M	NaN	NaN

- Problem: df stores each member's purchase history. Compute each member's cumulative amount and cumulative number of purchases by member ID.

In [ ]:

df = pd.DataFrame({'구매순서' : [1, 2, 3, 4, 5], 'ID' : [1, 1, 2, 4, 1], '구매월' : [1, 1, 2, 2, 3], '금액' : [1000, 1500, 2000, 3000, 4000], '수수료' : [100, 150, 200, 300, 400]})

In [ ]:

df

Out[ ]:

	구매순서	ID	구매월	금액	수수료
0	1	1	1	1000	100
1	2	1	1	1500	150
2	3	2	2	2000	200
3	4	4	2	3000	300
4	5	1	3	4000	400

In [ ]:

df.groupby(by = ['ID'])['금액'].agg([sum, len])

Out[ ]:

	sum	len
ID
1	6500	3
2	2000	1
4	3000	1

In [ ]:

df.groupby(by = ['ID'], as_index = False)['금액'].agg([sum, len])

Out[ ]:

	sum	len
ID
1	6500	3
2	2000	1
4	3000	1

In [ ]:

df2 = df.groupby(by = ['ID'])['금액'].agg([sum, len])

In [ ]:

df2.reset_index(inplace = True)

In [ ]:

df2

Out[ ]:

	ID	sum	len
0	1	6500	3
1	2	2000	1
2	4	3000	1

- Problem: df stores each member's purchase history. For each member, compute the maximum and minimum 금액 (amount) and the minimum 수수료 (fee).

In [ ]:

df

Out[ ]:

	구매순서	ID	구매월	금액	수수료
0	1	1	1	1000	100
1	2	1	1	1500	150
2	3	2	2	2000	200
3	4	4	2	3000	300
4	5	1	3	4000	400

In [ ]:

df.groupby(by = ['ID']).agg({'금액' : [max, min], '수수료' : min})

Out[ ]:

	금액		수수료
	max	min	min
ID
1	4000	1000	100
2	2000	2000	200
4	3000	3000	300

In [ ]:

df2 = df.groupby(by = ['ID']).agg({'금액' : [max, min], '수수료' : min})

In [ ]:

df2.reset_index()

Out[ ]:

	ID	금액		수수료
		max	min	min
0	1	4000	1000	100
1	2	2000	2000	200
2	4	3000	3000	300

In [ ]:

df2.columns

Out[ ]:

MultiIndex([( '금액', 'max'),
            ( '금액', 'min'),
            ('수수료', 'min')],
           )

In [ ]:

df2.columns.values

Out[ ]:

array([('금액', 'max'), ('금액', 'min'), ('수수료', 'min')], dtype=object)

In [ ]:

df2.columns = ['_'.join(col) for col in df2.columns.values]

In [ ]:

df2

Out[ ]:

	금액_max	금액_min	수수료_min
ID
1	4000	1000	100
2	2000	2000	200
4	3000	3000	300

In [ ]:

df2.reset_index()

Out[ ]:

	ID	금액_max	금액_min	수수료_min
0	1	4000	1000	100
1	2	2000	2000	200
2	4	3000	3000	300
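
The steps above (aggregate with a dict, join the MultiIndex column names, reset the index) can also be written in one call with named aggregation, available since pandas 0.25. The output column names are passed as keyword arguments, each mapped to a `(source column, function name)` pair. This is an alternative sketch, not the approach used in the cells above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 4, 1],
    '금액': [1000, 1500, 2000, 3000, 4000],
    '수수료': [100, 150, 200, 300, 400],
})

# keyword = (column to aggregate, aggregation name); as_index=False keeps ID as a column
summary = df.groupby('ID', as_index=False).agg(
    금액_max=('금액', 'max'),
    금액_min=('금액', 'min'),
    수수료_min=('수수료', 'min'),
)

print(summary)
```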

Pivot Tables

In [ ]:

import pandas as pd
import numpy as np

- Problem: The following DataFrame classifies the monthly number of members who left service A by the month they joined. Build a pivot table from it.

In [ ]:

df = pd.DataFrame({'가입월' : [1, 1, 1, 2, 2, 3], '탈퇴월' : [1, 2, 3, 2, 3, 3], '탈퇴회원수' : [101, 52, 30, 120, 60, 130]})

In [ ]:

df

Out[ ]:

	가입월	탈퇴월	탈퇴회원수
0	1	1	101
1	1	2	52
2	1	3	30
3	2	2	120
4	2	3	60
5	3	3	130

In [ ]:

pivot = pd.pivot_table(df, values = '탈퇴회원수' , index = ['가입월'], columns = ['탈퇴월'])

In [ ]:

type(pivot)

Out[ ]:

pandas.core.frame.DataFrame

In [ ]:

pivot

Out[ ]:

탈퇴월	1	2	3
가입월
1	101.0	52.0	30.0
2	NaN	120.0	60.0
3	NaN	NaN	130.0

In [ ]:

pd.pivot_table(df, values = '탈퇴회원수' , index = ['가입월'], columns = ['탈퇴월'], fill_value = 0 )

Out[ ]:

탈퇴월	1	2	3
가입월
1	101	52	30
2	0	120	60
3	0	0	130
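
`pd.pivot_table` aggregates with the mean when `aggfunc` is not given. Here every (가입월, 탈퇴월) pair occurs once, so the mean equals the value itself, but for counts like these it may be safer to state `aggfunc='sum'` explicitly. The sketch below also shows that the same table can be produced with `groupby` followed by `unstack`.

```python
import pandas as pd

df = pd.DataFrame({
    '가입월': [1, 1, 1, 2, 2, 3],
    '탈퇴월': [1, 2, 3, 2, 3, 3],
    '탈퇴회원수': [101, 52, 30, 120, 60, 130],
})

# Explicit sum instead of the default mean
pivot = pd.pivot_table(df, values='탈퇴회원수', index='가입월', columns='탈퇴월',
                       aggfunc='sum', fill_value=0)

# Same result: group by both keys, then move 탈퇴월 from the index to the columns
unstacked = df.groupby(['가입월', '탈퇴월'])['탈퇴회원수'].sum().unstack(fill_value=0)

print(pivot)
print(unstacked)
```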

- Problem: The following DataFrame is the sales history of a fruit store. For each 품목 (item) and 크기 (size), compute the number of sales and the total sales amount.

In [ ]:

import random

In [ ]:

random.randint(1,3)

Out[ ]:

In [ ]:

a = []
b = []

for i in range(100):
    a.append(random.randint(1,3))
    b.append(random.randint(1,3))

In [ ]:

df = pd.DataFrame({'품목' : a, '크기' : b})

In [ ]:

df

Out[ ]:

	품목	크기
0	2	2
1	2	3
2	2	2
3	1	3
4	1	1
...	...	...
95	1	1
96	1	2
97	1	3
98	2	2
99	2	1

100 rows × 2 columns

In [ ]:

df['금액'] = df['품목'] * df['크기'] * 500
df['수수료'] = df['금액'] * 0.1

In [ ]:

df

Out[ ]:

	품목	크기	금액	수수료
0	2	2	2000	200.0
1	2	3	3000	300.0
2	2	2	2000	200.0
3	1	3	1500	150.0
4	1	1	500	50.0
...	...	...	...	...
95	1	1	500	50.0
96	1	2	1000	100.0
97	1	3	1500	150.0
98	2	2	2000	200.0
99	2	1	1000	100.0

100 rows × 4 columns

In [ ]:

fruit_name = {1 : '토마토', 2 : '바나나', 3 : '사과'}
fruit_size = {1 : '소', 2 : '중', 3 : '대'}

df['품목'] = df['품목'].map(fruit_name)
df['크기'] = df['크기'].map(fruit_size)

In [ ]:

df

Out[ ]:

	품목	크기	금액	수수료
0	바나나	중	2000	200.0
1	바나나	대	3000	300.0
2	바나나	중	2000	200.0
3	토마토	대	1500	150.0
4	토마토	소	500	50.0
...	...	...	...	...
95	토마토	소	500	50.0
96	토마토	중	1000	100.0
97	토마토	대	1500	150.0
98	바나나	중	2000	200.0
99	바나나	소	1000	100.0

100 rows × 4 columns

In [ ]:

pd.pivot_table(df, values = '금액', index = ['품목'], columns = ['크기'], aggfunc = ( 'count', 'sum'))

Out[ ]:

	count			sum
크기	대	소	중	대	소	중
품목
바나나	10	10	12	30000	10000	24000
사과	9	16	10	40500	24000	30000
토마토	17	7	9	25500	3500	9000

- Problem: The following DataFrame is the sales history of a fruit store. For each 품목 (item) and 크기 (size), compute the number of sales and the totals of 금액 (amount) and 수수료 (fee).

In [ ]:

df

Out[ ]:

	품목	크기	금액	수수료
0	토마토	중	1000	100.0
1	사과	소	1500	150.0
2	바나나	대	3000	300.0
3	사과	소	1500	150.0
4	토마토	대	1500	150.0
...	...	...	...	...
95	토마토	중	1000	100.0
96	사과	소	1500	150.0
97	토마토	중	1000	100.0
98	바나나	대	3000	300.0
99	토마토	소	500	50.0

100 rows × 4 columns

In [ ]:

pd.pivot_table(df, index = ['품목'], columns = ['크기'], aggfunc = {'금액' : ['count', 'sum'], '수수료' : 'sum'})

Out[ ]:

	금액						수수료
	count			sum			sum
크기	대	소	중	대	소	중	대	소	중
품목
바나나	10	10	12	30000	10000	24000	3000.0	1000.0	2400.0
사과	9	16	10	40500	24000	30000	4050.0	2400.0	3000.0
토마토	17	7	9	25500	3500	9000	2550.0	350.0	900.0
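
The two `df` displays in this section show different rows because the data comes from unseeded `random.randint` calls, so each run produces a different table. A sketch of a reproducible version follows; `make_sales` and the seed value are hypothetical names introduced here, not part of the original notebook.

```python
import random
import pandas as pd

FRUIT_NAME = {1: '토마토', 2: '바나나', 3: '사과'}
FRUIT_SIZE = {1: '소', 2: '중', 3: '대'}

def make_sales(seed, n_rows=100):
    """Build the sales table from a fixed seed (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    items, sizes = [], []
    for _ in range(n_rows):
        items.append(rng.randint(1, 3))
        sizes.append(rng.randint(1, 3))
    df = pd.DataFrame({'품목': items, '크기': sizes})
    df['금액'] = df['품목'] * df['크기'] * 500
    df['수수료'] = df['금액'] * 0.1
    df['품목'] = df['품목'].map(FRUIT_NAME)
    df['크기'] = df['크기'].map(FRUIT_SIZE)
    return df

sales = make_sales(seed=0)
same_sales = make_sales(seed=0)
print(sales.equals(same_sales))  # identical seeds give identical tables
```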

Loading / Saving Files

In [1]:

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

In [2]:

import pandas as pd

Loading files

In [3]:

df= pd.read_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/과일가게.csv')

In [4]:

type(df)

Out[4]:

pandas.core.frame.DataFrame

In [5]:

df

Out[5]:

	Unnamed: 0	품목	크기	금액	수수료
0	0	바나나	중	2000	200.0
1	1	바나나	대	3000	300.0
2	2	바나나	중	2000	200.0
3	3	토마토	대	1500	150.0
4	4	토마토	소	500	50.0
...	...	...	...	...	...
95	95	토마토	소	500	50.0
96	96	토마토	중	1000	100.0
97	97	토마토	대	1500	150.0
98	98	바나나	중	2000	200.0
99	99	바나나	소	1000	100.0

100 rows × 5 columns

- head() : prints the first 5 rows by default; head(n) prints the first n rows

In [7]:

df.head(10)

Out[7]:

	Unnamed: 0	품목	크기	금액	수수료
0	0	바나나	중	2000	200.0
1	1	바나나	대	3000	300.0
2	2	바나나	중	2000	200.0
3	3	토마토	대	1500	150.0
4	4	토마토	소	500	50.0
5	5	바나나	중	2000	200.0
6	6	바나나	소	1000	100.0
7	7	사과	중	3000	300.0
8	8	바나나	중	2000	200.0
9	9	토마토	소	500	50.0

- tail() : prints the last 5 rows by default; tail(n) prints the last n rows

In [9]:

df.tail(10)

Out[9]:

	Unnamed: 0	품목	크기	금액	수수료
90	90	토마토	소	500	50.0
91	91	바나나	중	2000	200.0
92	92	사과	중	3000	300.0
93	93	바나나	소	1000	100.0
94	94	토마토	중	1000	100.0
95	95	토마토	소	500	50.0
96	96	토마토	중	1000	100.0
97	97	토마토	대	1500	150.0
98	98	바나나	중	2000	200.0
99	99	바나나	소	1000	100.0

- To use the first column as the index column

In [10]:

df = pd.read_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/과일가게.csv', index_col = 0)

In [11]:

df.head()

Out[11]:

	품목	크기	금액	수수료
0	바나나	중	2000	200.0
1	바나나	대	3000	300.0
2	바나나	중	2000	200.0
3	토마토	대	1500	150.0
4	토마토	소	500	50.0

- When the delimiter is a character other than ,

In [12]:

df = pd.read_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/read_sep.txt', index_col = 0, sep = '|')

In [13]:

df

Out[13]:

	A	B	C
index
0	1	11	21
1	2	12	22
2	3	13	23

- When the header spans multiple lines

In [14]:

df = pd.read_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/read_multi_header.csv', header = 1)

In [15]:

df

Out[15]:

	a	b	c
0	1	11	21
1	2	12	22
2	3	13	23
3	4	14	24

In [16]:

df.columns

Out[16]:

Index(['a', ' b', ' c'], dtype='object')
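
The column names above are 'a', ' b', ' c': the spaces after the commas in the file became part of the names, so `df['b']` would raise a KeyError. Two ways to handle this are sketched below, using an in-memory string as a hypothetical stand-in for read_multi_header.csv (its real contents are not shown in this post).

```python
import io
import pandas as pd

# Hypothetical file contents: one title line, then a header with spaces after the commas
raw = "this is a title line\na, b, c\n1, 11, 21\n2, 12, 22\n"

# As in the cell above: header=1 uses the second line as the header
df = pd.read_csv(io.StringIO(raw), header=1)
original_names = df.columns.tolist()

# Option 1: strip whitespace from the names after reading
df.columns = df.columns.str.strip()

# Option 2: let read_csv skip spaces that follow the delimiter
df_skip = pd.read_csv(io.StringIO(raw), header=1, skipinitialspace=True)

print(original_names, df.columns.tolist(), df_skip.columns.tolist())
```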

- To assign column names while reading the data

In [17]:

df = pd.read_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/make_column_name.csv', 
                 index_col = 0, names = ['품목', '크기', '금액', '수수료'])

In [18]:

df.head()

Out[18]:

	품목	크기	금액	수수료
0	바나나	중	2000	200.0
1	바나나	대	3000	300.0
2	바나나	중	2000	200.0
3	토마토	대	1500	150.0
4	토마토	소	500	50.0

- To read only the columns you need

In [19]:

df = pd.read_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/과일가게.csv', usecols = ['품목', '크기'])

In [20]:

df.head()

Out[20]:

	품목	크기
0	바나나	중
1	바나나	대
2	바나나	중
3	토마토	대
4	토마토	소

Saving files

In [21]:

df.head()

Out[21]:

	품목	크기
0	바나나	중
1	바나나	대
2	바나나	중
3	토마토	대
4	토마토	소

In [22]:

df.to_csv('/content/drive/MyDrive/00__강의자료/2021_05_패스트캠퍼스_데이터분석/실습자료/00_colab/02_데이터_전처리/make_csv.csv')
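
By default `to_csv` also writes the index, which is where the `Unnamed: 0` column seen earlier in this section comes from when the file is read back. Passing `index=False` leaves it out. The sketch below round-trips through an in-memory buffer (`io.StringIO`) in place of a Google Drive path, purely so it runs anywhere.

```python
import io
import pandas as pd

df = pd.DataFrame({'품목': ['바나나', '토마토'], '크기': ['중', '대']})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)

print(reread_with_index.columns.tolist())
print(reread_without_index.columns.tolist())
```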
