
Data Processing Packages

pandas [homepage]

What is pandas?

pandas is a Python package that provides fast, flexible, and expressive data structures. It is designed to make working with relational or labeled data easy and intuitive (or so the homepage says).

Put more simply, pandas is a Python library that builds data objects made of rows and columns (that is, relational or labeled data) and makes it convenient to process that data.

Why use pandas?

  • Easy handling of missing data (represented as NaN), in floating-point as well as other data
  • Size mutability: columns can be inserted into and deleted from a DataFrame and higher-dimensional objects
  • Data alignment
  • Easy conversion of irregular data held in other structures (Python, NumPy, and so on) into DataFrame objects
  • Intuitive merging and joining of data
  • Convenient loading and saving of many file types such as CSV and Excel files, converting the data into the format you want
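
A few of the points above can be sketched in one short example. The table contents and column names here are made up for illustration, and an in-memory buffer stands in for a real CSV file.

```python
import io

import pandas as pd

# Hypothetical sample data; the None entry becomes NaN in a float column.
scores = pd.DataFrame({"name": ["A", "B", "C"], "score": [90.0, None, 70.0]})

# Missing data: count the NaN values, then replace them with the column mean.
missing_count = int(scores["score"].isna().sum())
filled = scores["score"].fillna(scores["score"].mean())

# Load/save: round-trip through CSV text using an in-memory buffer.
buffer = io.StringIO()
scores.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Merge: combine with a second table on the shared "name" column.
ages = pd.DataFrame({"name": ["A", "B", "C"], "age": [20, 29, 24]})
merged = pd.merge(scores, ages, on="name")
print(merged)
```

With a real file, the buffer would simply be replaced by a file path in to_csv and read_csv.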

Data structures?

Dimensions  Name       Description
1           Series     One-dimensional array
2           DataFrame  Two-dimensional array (in the general case)

1. Series

  • pd.Series is used for one-dimensional data. Printing a Series shows the index labels, the name, and the dtype along with the values.
  • Constructor

    • Series(data, name=name): builds a Series called name from data.
    import pandas as pd
    
    ages = pd.Series([18, 21, 20, 16, 32, 22], name="ages")
    print(ages)
    ============
    0    18
    1    21
    2    20
    3    16
    4    32
    5    22
    Name: ages, dtype: int64
    ages.index=['a', 'b', 'c', 'd', 'e', 'f']
    print(ages)
    =============
    a    18
    b    21
    c    20
    d    16
    e    32
    f    22
    Name: ages, dtype: int64
    ages2 = pd.Series([18, 21, 20, 16, 32, 22],
                      index=['a', 'b', 'c', 'd', 'e', 'f'],
                      name="ages2")
    print(ages2)
    ==============
    a    18
    b    21
    c    20
    d    16
    e    32
    f    22
    Name: ages2, dtype: int64
    class_name = {'Korean': 90, 'English': 70, 'Math': 100, 'Science': 80}
    class_name = pd.Series(class_name)
    print(class_name,'\n')
    ============
    Korean      90
    English     70
    Math       100
    Science     80
    dtype: int64 
    • When a Series is built from a one-dimensional array, it is automatically given an index starting at 0.

      • To set your own index, assign a one-dimensional array to index afterwards, or pass it when creating the Series.
    • A Series is also easy to build from a Python dictionary: the keys become the index labels and the values become the data, so the whole Series is created in one step.
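
Because the index labels travel with the data, a Series can be read by label as well as by position, and arithmetic between two Series lines up on labels. A minimal sketch, using made-up subject scores:

```python
import pandas as pd

# Hypothetical subject scores; the dictionary keys become the index labels.
midterm = pd.Series({"Korean": 90, "English": 70, "Math": 100})
final = pd.Series({"Math": 95, "Korean": 85, "Science": 80})

by_label = midterm["Math"]      # label-based lookup
by_position = midterm.iloc[0]   # position-based lookup (first element)

# Arithmetic aligns on index labels, not on position.
# Labels present in only one Series produce NaN.
total = midterm + final
print(total)
```

Here "English" and "Science" each appear in only one Series, so their totals come out as NaN rather than raising an error.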

2. DataFrame

  • Unlike a Series, a DataFrame can have multiple columns.
  • A DataFrame is usually defined by passing a two-dimensional list as the argument, and it can also be built by combining several Series.
DataFrame ?
import pandas as pd

values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
index = ['one', 'two', 'three']
columns = ['A', 'B', 'C']

df = pd.DataFrame(values, index=index, columns=columns)
print(df)
===========
       A  B  C
one    1  2  3
two    4  5  6
three  7  8  9
  • A DataFrame consists of values, an index, and columns.
print(df.index) # print the index
===========
Index(['one', 'two', 'three'], dtype='object')

print(df.columns) # print the column names
===========
Index(['A', 'B', 'C'], dtype='object')

print(df.values) # print the values
===========
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Constructor
  • DataFrame(data): builds a DataFrame from data.

    import pandas as pd
    
    data = [['name', 'age'], ['A', 20], ['B', 29], ['c', 24], ['d', 26]]
    data = pd.DataFrame(data)
    
    print(data)
    ============
          0    1
    0  name  age
    1     A   20
    2     B   29
    3     c   24
    4     d   26
    • The row holding name and age, which was meant to supply the column names, ends up as row 0. Nothing requires column names to sit in that position, so pandas treats the row as ordinary data.
    • Labels are assigned in order, 0 for the first item and 1 for the next. In other words, a default index is attached.
    • This is a poor approach, though, because the column names end up in row 0.
  • data = [['A', 20], ['B', 29], ['c', 24], ['d', 26]]
    df = pd.DataFrame(data)
    print(df)
    ============
       0   1
    0  A  20
    1  B  29
    2  c  24
    3  d  26
    • With the column-name row removed, the values start from index 0.
    • In this case, column names can be added as follows.
  • df = pd.DataFrame(data, columns=['name', 'age'])
    print(df)
    ============
      name  age
    0    A   20
    1    B   29
    2    c   24
    3    d   26
    • This builds a new df from the data above with columns added, then prints it.
  • A DataFrame can also be built from a Python dictionary.

    data = {'name': ['A', 'B', 'c', 'd'],
          'age': [20, 29, 24, 26]}
    df = pd.DataFrame(data)
    print(df)
    ===========
      name  age
    0    A   20
    1    B   29
    2    c   24
    3    d   26
    • The dictionary keys automatically become the column names.
Selecting and adding data
  • loc[]: indexing/slicing by explicit index label. A label slice includes its end label.
  • iloc[]: indexing/slicing by integer position. As with Python lists, the end position is not included.
  • Pass index values to loc or iloc to select the data you want. Assigning to a new label with loc adds a row.
Deleting data
  • drop(): deletes index entries (rows) or columns
  • Pass index values to drop() to delete the matching rows. It returns a new object and leaves the original unchanged.
import pandas as pd

a = pd.Series([20, 15, 30, 25, 35], name='age')
b = pd.Series([68.5, 60.3, 53.4, 74.1, 80.7], name='weight')
c = pd.Series([180, 165, 155, 178, 185], name='height')
human = pd.DataFrame([a, b, c])

print(human)
===========
            0      1      2      3      4
age      20.0   15.0   30.0   25.0   35.0
weight   68.5   60.3   53.4   74.1   80.7
height  180.0  165.0  155.0  178.0  185.0


# Select a specific row with loc[] / iloc[]
print(human.loc['age'], '\n')
===========
0    20.0
1    15.0
2    30.0
3    25.0
4    35.0
Name: age, dtype: float64 
        
print(human.iloc[0], '\n')
===========
0    20.0
1    15.0
2    30.0
3    25.0
4    35.0
Name: age, dtype: float64 

# Select a range of rows with loc[] / iloc[]
print(human.loc['weight': 'height'], '\n')
===========
            0      1      2      3      4
weight   68.5   60.3   53.4   74.1   80.7
height  180.0  165.0  155.0  178.0  185.0 


print(human.iloc[1:3], '\n')
===========
            0      1      2      3      4
weight   68.5   60.3   53.4   74.1   80.7
height  180.0  165.0  155.0  178.0  185.0 


sex = ['F', 'M', 'F', 'M', 'F']
# Add new data
human.loc['sex'] = sex
print(human, '\n')
===========
           0     1     2     3     4
age       20    15    30    25    35
weight  68.5  60.3  53.4  74.1  80.7
height   180   165   155   178   185
sex        F     M     F     M     F 


# Delete the rows/columns you want
tmp = human.drop(['height'])
print(tmp, '\n')
===========
           0     1     2     3     4
age       20    15    30    25    35
weight  68.5  60.3  53.4  74.1  80.7
sex        F     M     F     M     F 
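
The example above stores each Series as a row, which is why the integer values are printed as floats. As an alternative sketch (not the layout used above), pd.concat with axis=1 places each Series in its own column, which also makes the difference between loc and iloc slicing easy to see. The variable names are chosen for illustration.

```python
import pandas as pd

age = pd.Series([20, 15, 30, 25, 35], name="age")
weight = pd.Series([68.5, 60.3, 53.4, 74.1, 80.7], name="weight")
height = pd.Series([180, 165, 155, 178, 185], name="height")

# axis=1 places each Series in its own column, so every column keeps its dtype.
people = pd.concat([age, weight, height], axis=1)

label_slice = people.loc[1:3]      # label slice: rows 1, 2 and 3 (end included)
position_slice = people.iloc[1:3]  # position slice: rows 1 and 2 (end excluded)

# drop() returns a new DataFrame; `people` itself is unchanged.
without_height = people.drop(columns=["height"])
print(without_height)
```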
Plus
  • The following are also available.

    • df.head(n) - show only the first n rows
    • df.tail(n) - show only the last n rows
    • df['column_name'] - select the column with that name
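
A quick sketch of these three, using a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical data: ten rows with two columns.
df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})

first_three = df.head(3)   # rows 0, 1, 2
last_two = df.tail(2)      # rows 8, 9
y_column = df["y"]         # a single column, returned as a Series

print(first_three)
print(last_two)
```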

numpy

What is NumPy?

  • NumPy is a Python library for high-performance numerical computing. It offers very convenient features for vector and matrix operations.
  • It provides many functions for easily creating arrays, the basis of every computation.

    • Easy-to-use N-dimensional array object
    • Useful linear algebra, Fourier transform, and random number capabilities
  • The SciPy package builds on NumPy to provide further scientific computing routines.
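
As a rough illustration of the linear algebra, Fourier transform, and random number features mentioned above, the sketch below touches one function from each area. The inputs are arbitrary values picked to keep the results easy to check by hand.

```python
import numpy as np

# Linear algebra: the determinant of a 2x2 matrix.
m = np.array([[1.0, 2.0], [3.0, 4.0]])
det = np.linalg.det(m)             # 1*4 - 2*3 = -2, up to rounding

# Fourier transform: a constant signal has all its energy in the zero-frequency bin.
spectrum = np.fft.fft(np.ones(4))  # [4, 0, 0, 0] as complex numbers

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.random(3)            # three floats in [0, 1)
```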

Why use NumPy?

It is worth using because it provides such a wide range of functions. Its biggest advantage is how easy it makes vector and matrix operations.

So how simple is it to use?

In plain Python, what happens when you add two matrices, here represented as lists? And when you subtract them?

a = [1, 2, 3]
b = [1, 1, 1]

print(a + b)

out:

[1, 2, 3, 1, 1, 1]

print(a - b)

out:

Traceback (most recent call last):
  File ".../numpy_test.py", line 4, in <module>
    print(a - b)
TypeError: unsupported operand type(s) for -: 'list' and 'list'

As shown above, adding two lists in Python just concatenates them, and subtracting them raises an error.


So what about NumPy?

import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 1, 1])

print("a + b : ", a + b)
print("a - b : ", a - b)

out:

a + b :  [2 3 4]
a - b :  [0 1 2]

NumPy applies the operation to the whole array at once, element by element. Put bluntly, easy operations between matrices are such an improvement that you could think of them as the whole point of using NumPy.

In data analysis, finding a model that fits our data sometimes means solving many equations. NumPy is a package that makes solving those equations easy.
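
As a small illustration of solving equations, np.linalg.solve handles a system of linear equations written in matrix form. The system below is a made-up example.

```python
import numpy as np

# Hypothetical system of two equations:
#   2x +  y = 5
#    x + 3y = 10
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
constants = np.array([5.0, 10.0])

# Solves coefficients @ solution = constants.
solution = np.linalg.solve(coefficients, constants)
x, y = solution
print(x, y)  # approximately 1.0 and 3.0
```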

Useful functions

There are many functions, but this post focuses on the ones that are used often and are most useful in machine learning.


Functions related to array information

  • np.array() : creates an array from the values you pass in.
  • np.ndarray() : creates an array with the given n-dimensional shape. Its contents are uninitialized (whatever happened to be in memory), not random numbers.

    • Arrays created by np.array() and np.ndarray() are the same type (numpy.ndarray).
  • np.ones(), np.zeros() : create arrays filled with only 1s or only 0s.
  • array.shape : the size of the array along each dimension
  • array.ndim : the number of dimensions
  • array.dtype : the element type of the array
import numpy as np
a = np.array([[1, 2, 3]])
print(a)
	=> [[1 2 3]]
print(np.ndarray((1, 3)))  # uninitialized memory: the values are arbitrary and differ between runs
	=> [[1.71457464e+214 9.30277090e+242 4.56535246e-085]]
print(np.ones((1, 3)))
	=> [[1. 1. 1.]]
print(np.zeros((1, 3)))
	=> [[0. 0. 0.]]
print(a.dtype)
	=> int32  (platform-dependent; int64 on most 64-bit systems)
print(a.shape)
	=> (1, 3)
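
The example above does not print ndim, and the default integer dtype depends on the platform. The sketch below fills in both points. It uses np.empty, a creation function not covered above, as the usual way to ask for an uninitialized array.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)   # (2, 3): 2 rows, 3 columns
print(a.ndim)    # 2: number of dimensions

# np.empty allocates without initializing, so its contents are arbitrary.
scratch = np.empty((2, 3))

# Passing dtype explicitly avoids platform-dependent defaults.
b = np.zeros((2, 3), dtype=np.int64)
c = np.array([1, 2, 3], dtype=np.float32)
print(b.dtype, c.dtype)   # int64 float32
```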

Functions for statistical information about an array

  • np.min(x) : minimum of array x
  • np.max(x) : maximum of array x
  • np.mean(x) : mean of array x
  • np.median(x) : median of array x
  • np.var(x) : variance of array x
  • np.std(x) : standard deviation of array x
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.min(a))
	=> 1
print(np.max(a))
	=> 9
print(np.mean(a))
	=> 5.0
print(np.median(a))
	=> 5.0
print(np.var(a))
	=> 6.666666666666667
print(np.std(a))
	=> 2.581988897471611
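
The calls above reduce the whole array to a single number. The same functions accept an axis argument to work per column or per row, and np.var / np.std accept ddof to switch from population to sample statistics. A minimal sketch with the same array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_means = np.mean(a, axis=0)   # mean of each column: [4. 5. 6.]
row_max = np.max(a, axis=1)      # max of each row: [3 6 9]

# np.var and np.std divide by N by default (population statistics).
# ddof=1 divides by N - 1 instead (sample statistics).
sample_var = np.var(a, ddof=1)   # 60 / 8 = 7.5
```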

matplotlib [homepage]

What is Matplotlib?

Simply put, it is a library that makes it possible to draw graphs in Python.

It can draw everything from simple two-dimensional lines to images, histograms, and scatter plots, as well as three-dimensional plots.

Why use Matplotlib?

Because it is good for visualizing numerical data, starting with drawing a basic line y = ax + b. In linear regression, covered later, fitting a single function to scattered data can be shown visually, which makes it intuitive to see the fit approaching the desired values.

Trying it out

import numpy as np
from matplotlib import pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

plt.scatter(x, y)
plt.plot(x, 2 * x)
plt.show()

This code plots a point at each (x, y) pair and draws the line y = 2x.

[Figure: matplotlib-line]
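
To connect this with the linear regression use case mentioned earlier, here is a minimal sketch that fits a straight line to noisy points and draws both. The data is synthetic, np.polyfit is just one simple way to get a least-squares line, and the output file name is a placeholder.

```python
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical noisy samples scattered around the line y = 2x + 1.
rng = np.random.default_rng(seed=0)
x = np.arange(1, 11)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of a degree-1 polynomial: y ~ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
ax.plot(x, slope * x + intercept, color="red", label="fitted line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("linear_fit.png")  # placeholder file name; plt.show() opens a window instead
```

The fitted slope and intercept should land close to 2 and 1, the values used to generate the data.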





Conclusion

That covers the big three data processing packages.

Adding a package such as SciPy to these three should be a great help when studying machine learning.

Ref

Introduction to Natural Language Processing Using Deep Learning (딥 러닝을 이용한 자연어 처리 입문)

Elice - Data Analysis Starting with Python (엘리스 - 파이썬으로 시작하는 데이터 분석)