10. NumPy tutorial

2024. 3. 14. 01:22·Python
9numpy-basics

This notebook is based on the SciPy NumPy tutorial.

Array Creation and Properties¶

Here we create an array using arange and then change its shape to be 3 rows and 5 columns.

In [10]:
import numpy as np

a = np.arange(15, dtype=np.float32).reshape(3, 5)
print(a.dtype)
a
float32
Out[10]:
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.]], dtype=float32)
In [7]:
np.arange?
Docstring:
arange([start,] stop[, step,], dtype=None, *, like=None)

Return evenly spaced values within a given interval.

``arange`` can be called with a varying number of positional arguments:

* ``arange(stop)``: Values are generated within the half-open interval
  ``[0, stop)`` (in other words, the interval including `start` but
  excluding `stop`).
* ``arange(start, stop)``: Values are generated within the half-open
  interval ``[start, stop)``.
* ``arange(start, stop, step)`` Values are generated within the half-open
  interval ``[start, stop)``, with spacing between values given by
  ``step``.

For integer arguments the function is roughly equivalent to the Python
built-in :py:class:`range`, but returns an ndarray rather than a ``range``
instance.

When using a non-integer step, such as 0.1, it is often better to use
`numpy.linspace`.

See the Warning sections below for more information.

Parameters
----------
start : integer or real, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : integer or real
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : integer or real, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values, ``out[i+1] - out[i]``.  The default
    step size is 1.  If `step` is specified as a position argument,
    `start` must also be given.
dtype : dtype, optional
    The type of the output array.  If `dtype` is not given, infer the data
    type from the other input arguments.
like : array_like, optional
    Reference object to allow the creation of arrays which are not
    NumPy arrays. If an array-like passed in as ``like`` supports
    the ``__array_function__`` protocol, the result will be defined
    by it. In this case, it ensures the creation of an array object
    compatible with that passed in via this argument.

    .. versionadded:: 1.20.0

Returns
-------
arange : ndarray
    Array of evenly spaced values.

    For floating point arguments, the length of the result is
    ``ceil((stop - start)/step)``.  Because of floating point overflow,
    this rule may result in the last element of `out` being greater
    than `stop`.

Warnings
--------
The length of the output might not be numerically stable.

Another stability issue is due to the internal implementation of
`numpy.arange`.
The actual step value used to populate the array is
``dtype(start + step) - dtype(start)`` and not `step`. Precision loss
can occur here, due to casting or due to using floating points when
`start` is much larger than `step`. This can lead to unexpected
behaviour. For example::

  >>> np.arange(0, 5, 0.5, dtype=int)
  array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
  >>> np.arange(-3, 3, 0.5, dtype=int)
  array([-3, -2, -1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

In such cases, the use of `numpy.linspace` should be preferred.

The built-in :py:class:`range` generates :std:doc:`Python built-in integers
that have arbitrary size <c-api/long>`, while `numpy.arange` produces
`numpy.int32` or `numpy.int64` numbers. This may result in incorrect
results for large integer values::

  >>> power = 40
  >>> modulo = 10000
  >>> x1 = [(n ** power) % modulo for n in range(8)]
  >>> x2 = [(n ** power) % modulo for n in np.arange(8)]
  >>> print(x1)
  [0, 1, 7776, 8801, 6176, 625, 6576, 4001]  # correct
  >>> print(x2)
  [0, 1, 7776, 7185, 0, 5969, 4816, 3361]  # incorrect

See Also
--------
numpy.linspace : Evenly spaced numbers with careful handling of endpoints.
numpy.ogrid: Arrays of evenly spaced numbers in N-dimensions.
numpy.mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.

Examples
--------
>>> np.arange(3)
array([0, 1, 2])
>>> np.arange(3.0)
array([ 0.,  1.,  2.])
>>> np.arange(3,7)
array([3, 4, 5, 6])
>>> np.arange(3,7,2)
array([3, 5])
Type:      builtin_function_or_method
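
The docstring's advice about non-integer steps can be tried directly. The following is a minimal sketch (the variable names are illustrative and not part of the original notebook) that contrasts arange with a float step against linspace, where the number of samples is given explicitly.

```python
import numpy as np

# arange with a float step: the length follows ceil((stop - start) / step),
# so it depends on floating point round-off.
by_step = np.arange(0.0, 1.0, 0.1)

# linspace takes the number of samples instead, so the length is exact.
# retstep=True also returns the spacing that was actually used.
by_count, step = np.linspace(0.0, 1.0, 11, retstep=True)

print(len(by_step), len(by_count), step)
```

With linspace the first and last samples are exactly the requested endpoints, which is usually what you want when building a grid.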

Note the row-major ordering -- you'll see that the numbers in each row are together (in the inner []).

In [4]:
print(a)
[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]]

A numpy array has a lot of meta-data associated with it describing its shape, datatype, etc.

In [5]:
print(a.ndim)
print(a.shape)
print(a.size)
print(a.dtype)
print(a.itemsize)
print(type(a))
2
(3, 5)
15
float32
4
<class 'numpy.ndarray'>
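
As a small illustration of how these attributes relate to each other, the sketch below rebuilds the same array and checks the byte count and the strides. The name flat_view is an assumption made for this example.

```python
import numpy as np

a = np.arange(15, dtype=np.float32).reshape(3, 5)

# Total bytes = number of elements * bytes per element.
print(a.nbytes, a.size * a.itemsize)

# Strides: bytes to step to the next row and to the next column.
# For a row-major (3, 5) float32 array this is (5 * 4, 4).
print(a.strides)

# reshape can infer one dimension when it is given as -1.
flat_view = a.reshape(-1)
print(flat_view.shape)
```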

We can create an array from a list.

In [3]:
b = np.array([1., 2, 3, 4])
print(b)
print(b.dtype)
[1. 2. 3. 4.]
float64
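
The datatype is inferred from the list contents unless you pass dtype yourself. A short sketch of that behavior, with illustrative variable names that are not from the notebook:

```python
import numpy as np

ints = np.array([1, 2, 3])                      # all Python ints -> an integer dtype
mixed = np.array([1.0, 2, 3])                   # one float promotes everything to float64
forced = np.array([1, 2, 3], dtype=np.float32)  # explicit dtype overrides inference
nested = np.array([[1, 2], [3, 4]])             # nested lists give a 2-d array

print(ints.dtype, mixed.dtype, forced.dtype, nested.shape)
```

The exact integer dtype (int32 or int64) depends on the platform, so the sketch does not assume one.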

We can easily create a multi-dimensional array of a specified size with every element initialized to 0 using zeros(). There are analogous ones() and empty() routines; note that empty() only allocates memory and leaves the contents uninitialized, so any zeros it happens to show are incidental. Here we explicitly set the datatype for the array.

Unlike lists in Python, all of the elements of a NumPy array are of the same datatype.

In [10]:
c = np.zeros((10, 7), dtype=np.float64)
print(c)
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]
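
For comparison, here is a minimal sketch of the related creation routines side by side. The shape and the fill value 7.5 are arbitrary choices for illustration.

```python
import numpy as np

shape = (2, 3)

z = np.zeros(shape, dtype=np.float64)  # every element is 0.0
o = np.ones(shape, dtype=np.float64)   # every element is 1.0
f = np.full(shape, 7.5)                # every element is the given fill value

u = np.empty(shape)   # memory is allocated but NOT initialized
u[:] = 0.0            # assign before reading, since the contents are arbitrary

like = np.zeros_like(o)  # zeros with the same shape and dtype as another array

print(z, o, f, u, like, sep="\n")
```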

linspace and logspace create arrays with evenly spaced numbers (for logspace, evenly spaced in the exponent). For logspace, you specify the start and ending powers (base**start to base**stop).

In [13]:
d = np.linspace(0, 1, 11, endpoint=False)
print(d)
[0.         0.09090909 0.18181818 0.27272727 0.36363636 0.45454545
 0.54545455 0.63636364 0.72727273 0.81818182 0.90909091]
In [14]:
e = np.logspace(-1, 2, 15, endpoint=True, base=10)
print(e)
[  0.1          0.16378937   0.26826958   0.43939706   0.71968567
   1.17876863   1.93069773   3.16227766   5.17947468   8.48342898
  13.89495494  22.75845926  37.2759372   61.05402297 100.        ]
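
The relationship between logspace and linspace can be checked numerically. This sketch reuses the arguments from the cell above; np.geomspace is included as an alternative that takes the endpoint values directly.

```python
import numpy as np

e = np.logspace(-1, 2, 15)                 # 10**-1 ... 10**2, 15 samples
same = 10.0 ** np.linspace(-1, 2, 15)      # exponents spaced evenly, then raised
geo = np.geomspace(0.1, 100.0, 15)         # endpoints given as values, not powers

print(np.allclose(e, same), np.allclose(e, geo))
```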

As always, ask for help -- the NumPy functions have very nice docstrings.

In [4]:
help(np.logspace)
Help on function logspace in module numpy:

logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
    Return numbers spaced evenly on a log scale.
    
    In linear space, the sequence starts at ``base ** start``
    (`base` to the power of `start`) and ends with ``base ** stop``
    (see `endpoint` below).
    
    .. versionchanged:: 1.16.0
        Non-scalar `start` and `stop` are now supported.
    
    Parameters
    ----------
    start : array_like
        ``base ** start`` is the starting value of the sequence.
    stop : array_like
        ``base ** stop`` is the final value of the sequence, unless `endpoint`
        is False.  In that case, ``num + 1`` values are spaced over the
        interval in log-space, of which all but the last (a sequence of
        length `num`) are returned.
    num : integer, optional
        Number of samples to generate.  Default is 50.
    endpoint : boolean, optional
        If true, `stop` is the last sample. Otherwise, it is not included.
        Default is True.
    base : array_like, optional
        The base of the log space. The step size between the elements in
        ``ln(samples) / ln(base)`` (or ``log_base(samples)``) is uniform.
        Default is 10.0.
    dtype : dtype
        The type of the output array.  If `dtype` is not given, the data type
        is inferred from `start` and `stop`. The inferred type will never be
        an integer; `float` is chosen even if the arguments would produce an
        array of integers.
    axis : int, optional
        The axis in the result to store the samples.  Relevant only if start
        or stop are array-like.  By default (0), the samples will be along a
        new axis inserted at the beginning. Use -1 to get an axis at the end.
    
        .. versionadded:: 1.16.0
    
    
    Returns
    -------
    samples : ndarray
        `num` samples, equally spaced on a log scale.
    
    See Also
    --------
    arange : Similar to linspace, with the step size specified instead of the
             number of samples. Note that, when used with a float endpoint, the
             endpoint may or may not be included.
    linspace : Similar to logspace, but with the samples uniformly distributed
               in linear space, instead of log space.
    geomspace : Similar to logspace, but with endpoints specified directly.
    
    Notes
    -----
    Logspace is equivalent to the code
    
    >>> y = np.linspace(start, stop, num=num, endpoint=endpoint)
    ... # doctest: +SKIP
    >>> power(base, y).astype(dtype)
    ... # doctest: +SKIP
    
    Examples
    --------
    >>> np.logspace(2.0, 3.0, num=4)
    array([ 100.        ,  215.443469  ,  464.15888336, 1000.        ])
    >>> np.logspace(2.0, 3.0, num=4, endpoint=False)
    array([100.        ,  177.827941  ,  316.22776602,  562.34132519])
    >>> np.logspace(2.0, 3.0, num=4, base=2.0)
    array([4.        ,  5.0396842 ,  6.34960421,  8.        ])
    
    Graphical illustration:
    
    >>> import matplotlib.pyplot as plt
    >>> N = 10
    >>> x1 = np.logspace(0.1, 1, N, endpoint=True)
    >>> x2 = np.logspace(0.1, 1, N, endpoint=False)
    >>> y = np.zeros(N)
    >>> plt.plot(x1, y, 'o')
    [<matplotlib.lines.Line2D object at 0x...>]
    >>> plt.plot(x2, y + 0.5, 'o')
    [<matplotlib.lines.Line2D object at 0x...>]
    >>> plt.ylim([-0.5, 1])
    (-0.5, 1)
    >>> plt.show()

We can also initialize an array based on a function.

In [17]:
f = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=np.int32)
f
Out[17]:
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
In [18]:
f.dtype
Out[18]:
dtype('int32')
In [19]:
def myFun(x,y): 
    return 10*x+y

g = np.fromfunction(myFun, (5,4), dtype=int)
g
Out[19]:
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
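
fromfunction does not call the function once per element; it passes whole index arrays to it in a single call. The sketch below makes that explicit with np.indices. The name my_fun mirrors the notebook's myFun and is only illustrative.

```python
import numpy as np

def my_fun(x, y):
    # x and y arrive as full index arrays, not as scalars
    return 10 * x + y

# np.indices builds the row-index and column-index arrays for a given shape.
i, j = np.indices((5, 4))
by_indices = my_fun(i, j)

by_fromfunction = np.fromfunction(my_fun, (5, 4), dtype=int)

print(np.array_equal(by_indices, by_fromfunction))
```

Because of this, the function you pass must be written with array operations; a plain Python if on x or y would not behave element-wise.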

Array Operations¶

Most operations will work on an entire array at once.

In [20]:
a = np.arange(12).reshape(3,4)
print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
In [22]:
a.sum(axis=0)
Out[22]:
array([12, 15, 18, 21])
In [23]:
a.sum()
Out[23]:
66
In [24]:
print(a.min(), a.max(axis=1))
0 [ 3  7 11]
In [25]:
a.dtype
Out[25]:
dtype('int32')
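
The axis argument names the dimension that is collapsed by the reduction. A small sketch with the same 3x4 array; keepdims and the normalization step are additions for illustration and do not appear in the notebook.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

col_sums = a.sum(axis=0)   # collapse rows    -> one value per column
row_sums = a.sum(axis=1)   # collapse columns -> one value per row

# keepdims=True keeps the reduced axis with length 1, so the result
# still broadcasts against the original array.
row_sums_2d = a.sum(axis=1, keepdims=True)
fractions = a / row_sums_2d   # each row now sums to 1

print(col_sums, row_sums, row_sums_2d.shape)
```

The int32 shown above is platform dependent; on many systems the default integer dtype is int64.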

Universal Functions¶

In [26]:
b = a * np.pi / 12.0
print(b)
[[0.         0.26179939 0.52359878 0.78539816]
 [1.04719755 1.30899694 1.57079633 1.83259571]
 [2.0943951  2.35619449 2.61799388 2.87979327]]
In [27]:
c = np.cos(b)
print(c)
[[ 1.00000000e+00  9.65925826e-01  8.66025404e-01  7.07106781e-01]
 [ 5.00000000e-01  2.58819045e-01  6.12323400e-17 -2.58819045e-01]
 [-5.00000000e-01 -7.07106781e-01 -8.66025404e-01 -9.65925826e-01]]
In [28]:
d = b * c # same as .* in MATLAB
In [29]:
print(d)
[[ 0.00000000e+00  2.52878790e-01  4.53449841e-01  5.55360367e-01]
 [ 5.23598776e-01  3.38793338e-01  9.61835347e-17 -4.74310673e-01]
 [-1.04719755e+00 -1.66608110e+00 -2.26724921e+00 -2.78166669e+00]]
In [35]:
np.dot(b, c.T)
Out[35]:
array([[ 1.261689  , -0.13551734, -1.39720633],
       [ 4.96778188,  0.38808144, -4.57970044],
       [ 8.67387476,  0.91168022, -7.76219455]])
In [36]:
b @ c.T
Out[36]:
array([[ 1.261689  , -0.13551734, -1.39720633],
       [ 4.96778188,  0.38808144, -4.57970044],
       [ 8.67387476,  0.91168022, -7.76219455]])
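
The cells above mix two different products. As a hedged summary, the sketch below rebuilds b and c and compares the element-wise product with the matrix product; the flag shapes_ok is a name introduced only for this example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a * np.pi / 12.0
c = np.cos(b)            # a ufunc: applied to every element

elementwise = b * c      # (3, 4) * (3, 4) -> (3, 4)
matrix_prod = b @ c.T    # (3, 4) @ (4, 3) -> (3, 3)

# The matrix product needs matching inner dimensions, so b @ c fails.
try:
    b @ c
    shapes_ok = True
except ValueError:
    shapes_ok = False

print(elementwise.shape, matrix_prod.shape, shapes_ok)
```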

Slicing¶

Slicing works much as it does for strings, with one slice per dimension.

In [50]:
a = np.fromfunction(myFun, (5,4), dtype=int)
print(a)
[[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]

Giving a single index (0-based) for each dimension just references a single value in the array.

In [38]:
a[2, 1]
Out[38]:
21
In [41]:
a[2][1]
Out[41]:
21

Note that a[2][1] gives the same value, but it is slower than a[2,1] because it first builds a temporary view of row 2 and then indexes into that.

Doing slices will access a range of elements. Think of the start and stop in the slice as referencing the left-edge of the slots in the array.

In [51]:
b = a[0:2, 0:2].copy()
print(b)
[[ 0  1]
 [10 11]]
In [52]:
b[0,0] = 100
print(a[0,0])
0
In [57]:
a[:, 1].shape
Out[57]:
(5,)
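
A few slicing details are easy to miss, so here is a minimal sketch. The array is rebuilt with a lambda that stands in for myFun, and the variable names are illustrative.

```python
import numpy as np

a = np.fromfunction(lambda i, j: 10 * i + j, (5, 4), dtype=int)

col = a[:, 1]        # an integer index drops that dimension -> shape (5,)
col_2d = a[:, 1:2]   # a length-1 slice keeps it             -> shape (5, 1)

view = a[0:2, 0:2]          # a basic slice is a view onto a's data
copy = a[0:2, 0:2].copy()   # .copy() makes independent data

# A negative step walks backwards: rows 4, 2, 0 of column 0.
every_other_reversed = a[::-2, 0]

print(col.shape, col_2d.shape)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
print(every_other_reversed)
```

This is why the notebook calls .copy() before assigning to b[0,0]: without it, the assignment would also change a.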

Sometimes we want a one-dimensional version of the array -- here we see the memory layout (row-major) more explicitly. Note that flatten() always returns a copy; ravel() returns a view when it can.

In [58]:
a.flatten()
Out[58]:
array([ 0,  1,  2,  3, 10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33, 40,
       41, 42, 43])
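
The difference between flatten() and ravel() shows up when you modify the result. A short sketch, with illustrative names:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

flat_copy = a.flatten()   # always a new, independent array
flat_view = a.ravel()     # a view when the data is contiguous, as it is here

flat_copy[0] = 99   # does not touch a
flat_view[1] = -1   # writes through to a

print(a[0, 0], a[0, 1])
print(np.shares_memory(a, flat_copy), np.shares_memory(a, flat_view))
```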

We can also iterate -- this is done over the first axis

In [59]:
for row in a:
    print(row)
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]
In [60]:
for col in a.T:
    print(col)
[ 0 10 20 30 40]
[ 1 11 21 31 41]
[ 2 12 22 32 42]
[ 3 13 23 33 43]

or element by element

In [61]:
for e in a.flat:
    print(e)
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43
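
If you also need the position of each element while iterating, np.ndenumerate yields index tuples together with the values. This is a small sketch on a 2x3 array chosen to keep the output short; it is not part of the original notebook.

```python
import numpy as np

a = np.fromfunction(lambda i, j: 10 * i + j, (2, 3), dtype=int)

# Each item is ((row, col), value), visited in row-major order.
pairs = [(index, int(value)) for index, value in np.ndenumerate(a)]

for index, value in pairs:
    print(index, value)
```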
In [62]:
a.flatten?

Copying Arrays¶

Simply using "=" does not make a copy, but much like with lists, you will just have multiple names pointing to the same ndarray object.

In [63]:
a = np.arange(10)
print(a)
[0 1 2 3 4 5 6 7 8 9]
In [64]:
b = a
b is a
Out[64]:
True

Since b and a are the same, changes to the shape of one are reflected in the other -- no copy is made.

In [65]:
b.shape = (2,5)
print(b)
a.shape
[[0 1 2 3 4]
 [5 6 7 8 9]]
Out[65]:
(2, 5)
In [66]:
b is a
Out[66]:
True
In [67]:
print(a)
[[0 1 2 3 4]
 [5 6 7 8 9]]

A shallow copy creates a new view into the array -- the data is the same, but the array properties can be different.

In [71]:
a = np.arange(12)
c = a[:]
a.shape = (3, 4)

print(a)
print(c)
type(c)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[ 0  1  2  3  4  5  6  7  8  9 10 11]
Out[71]:
numpy.ndarray

Since the underlying data is the same memory, changing an element of one is reflected in the other.

In [72]:
c[1] = -1
print(a)
[[ 0 -1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
In [73]:
d = c[3:8]
print(d)
[3 4 5 6 7]
In [74]:
d[:] = 0
In [75]:
print(a)
print(c)
print(d)
[[ 0 -1  2  0]
 [ 0  0  0  0]
 [ 8  9 10 11]]
[ 0 -1  2  0  0  0  0  0  8  9 10 11]
[0 0 0 0 0]
In [76]:
print(c is a)
print(c.base is a)
print(c.flags.owndata)
print(a.flags.owndata)
False
True
False
True

To make a copy of the data of the array that you can deal with independently of the original, you need a deep copy.

In [77]:
d = a.copy()
d[:, :] = 0.0

print(a)
print(d)
[[ 0 -1  2  0]
 [ 0  0  0  0]
 [ 8  9 10 11]]
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
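
The three cases (plain assignment, view, deep copy) can be told apart programmatically. The sketch below is one way to check, using np.shares_memory and the base attribute; the names alias, view, and deep are chosen for this example.

```python
import numpy as np

a = np.arange(12)

alias = a         # same object, nothing is copied
view = a[:]       # new array object that shares a's data
deep = a.copy()   # new array object with its own data

view[0] = 100     # visible through a
deep[1] = -5      # not visible through a

print(alias is a, view is a, view.base is a)
print(np.shares_memory(a, view), np.shares_memory(a, deep))
print(a[:3])
```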

Boolean Indexing¶

In [78]:
a = np.arange(12).reshape(3, 4)
a
Out[78]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [79]:
a[a > 4] = 0
a
Out[79]:
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])
In [80]:
a[a == 0] = -1
a
Out[80]:
array([[-1,  1,  2,  3],
       [ 4, -1, -1, -1],
       [-1, -1, -1, -1]])

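To combine two tests, use logical_and() or logical_or(), or the element-wise operators & and | with each test in parentheses. Python's and/or keywords do not work element-wise on arrays.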

In [81]:
a = np.arange(12).reshape(3, 4)
a[np.logical_and(a > 3, a <= 9)] = 0.0
a
Out[81]:
array([[ 0,  1,  2,  3],
       [ 0,  0,  0,  0],
       [ 0,  0, 10, 11]])
In [82]:
a > 4
Out[82]:
array([[False, False, False, False],
       [False, False, False, False],
       [False, False,  True,  True]])
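
Boolean masks are ordinary arrays, so they can be stored, combined, and counted. A minimal sketch using the same test as above; np.where is shown as a non-destructive alternative to assigning through the mask.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Parentheses are required: & binds more tightly than the comparisons.
mask = (a > 3) & (a <= 9)
same_mask = np.logical_and(a > 3, a <= 9)

# np.where builds a new array and leaves a untouched.
zeroed = np.where(mask, 0, a)

count = int(mask.sum())   # True counts as 1

print(np.array_equal(mask, same_mask), count)
print(zeroed)
```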

Avoiding Loops¶

In general, you want to avoid loops over elements of an array. Here we look at a 2-d Gaussian and create an average over angles.

Start by initializing coordinate arrays and a Gaussian function.

In [8]:
# create 1-d x and y arrays -- we define the coordinate values such that
# they are centered in the bin
N = 32
xmin = ymin = 0.0
xmax = ymax = 1.0

dx = (xmax - xmin)/N
x = np.linspace(xmin, xmax, N, endpoint=False) + 0.5*dx
y = x.copy()
In [9]:
x2d = np.repeat(x, N)
x2d.shape = (N, N)

y2d = np.repeat(y, N)
y2d.shape = (N, N)
y2d = np.transpose(y2d)

print(x2d)
print(y2d)
[[0.015625 0.015625 0.015625 ... 0.015625 0.015625 0.015625]
 [0.046875 0.046875 0.046875 ... 0.046875 0.046875 0.046875]
 [0.078125 0.078125 0.078125 ... 0.078125 0.078125 0.078125]
 ...
 [0.921875 0.921875 0.921875 ... 0.921875 0.921875 0.921875]
 [0.953125 0.953125 0.953125 ... 0.953125 0.953125 0.953125]
 [0.984375 0.984375 0.984375 ... 0.984375 0.984375 0.984375]]
[[0.015625 0.046875 0.078125 ... 0.921875 0.953125 0.984375]
 [0.015625 0.046875 0.078125 ... 0.921875 0.953125 0.984375]
 [0.015625 0.046875 0.078125 ... 0.921875 0.953125 0.984375]
 ...
 [0.015625 0.046875 0.078125 ... 0.921875 0.953125 0.984375]
 [0.015625 0.046875 0.078125 ... 0.921875 0.953125 0.984375]
 [0.015625 0.046875 0.078125 ... 0.921875 0.953125 0.984375]]
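
The repeat/reshape/transpose steps above build the same pair of coordinate arrays that np.meshgrid produces with indexing="ij". The following sketch checks that equivalence; x_rep and y_rep are names introduced here for the comparison.

```python
import numpy as np

N = 32
dx = 1.0 / N
x = np.linspace(0.0, 1.0, N, endpoint=False) + 0.5 * dx
y = x.copy()

# indexing="ij" makes x vary along the first axis and y along the second.
x2d, y2d = np.meshgrid(x, y, indexing="ij")

# The construction used in the notebook, for comparison.
x_rep = np.repeat(x, N).reshape(N, N)
y_rep = np.repeat(y, N).reshape(N, N).T

print(np.array_equal(x2d, x_rep), np.array_equal(y2d, y_rep))
```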

Now initialize an array with a Gaussian.

In [10]:
g = np.exp(-((x2d-0.5)**2 + (y2d-0.5)**2)/0.2**2)
print(g)
[[8.04100059e-06 1.67261841e-05 3.31343050e-05 ... 3.31343050e-05
  1.67261841e-05 8.04100059e-06]
 [1.67261841e-05 3.47923410e-05 6.89230749e-05 ... 6.89230749e-05
  3.47923410e-05 1.67261841e-05]
 [3.31343050e-05 6.89230749e-05 1.36535517e-04 ... 1.36535517e-04
  6.89230749e-05 3.31343050e-05]
 ...
 [3.31343050e-05 6.89230749e-05 1.36535517e-04 ... 1.36535517e-04
  6.89230749e-05 3.31343050e-05]
 [1.67261841e-05 3.47923410e-05 6.89230749e-05 ... 6.89230749e-05
  3.47923410e-05 1.67261841e-05]
 [8.04100059e-06 1.67261841e-05 3.31343050e-05 ... 3.31343050e-05
  1.67261841e-05 8.04100059e-06]]
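
To make the "avoid loops" point concrete, the sketch below computes the same Gaussian twice: once with explicit Python loops and once with broadcasting on the 1-d coordinate arrays. The names g_loop and g_vec are illustrative, and the broadcasting form is an alternative to the 2-d coordinate arrays used in the notebook.

```python
import numpy as np

N = 32
dx = 1.0 / N
x = np.linspace(0.0, 1.0, N, endpoint=False) + 0.5 * dx
y = x.copy()

# Loop version: one Python-level operation per element.
g_loop = np.empty((N, N))
for i in range(N):
    for j in range(N):
        g_loop[i, j] = np.exp(-((x[i] - 0.5)**2 + (y[j] - 0.5)**2) / 0.2**2)

# Broadcast version: x as a column (N, 1) and y as a row (1, N)
# combine into an (N, N) result without any explicit loop.
g_vec = np.exp(-((x[:, None] - 0.5)**2 + (y[None, :] - 0.5)**2) / 0.2**2)

print(np.allclose(g_loop, g_vec))
```

Both give the same values; the broadcast form does the work in compiled code, which is typically much faster for large N.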
In [6]:
import matplotlib.pyplot as plt
%matplotlib inline
In [11]:
plt.imshow(g, interpolation="nearest")
Out[11]:
<matplotlib.image.AxesImage at 0x1e0e146d420>
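
The introduction to this section mentions an average over angles, but the cells shown stop at the image. One possible loop-free way to compute such an average is sketched below using np.bincount. This is an assumption about what was intended, not the notebook's own code; the bin width of dx and all variable names are choices made for this example.

```python
import numpy as np

N = 32
dx = 1.0 / N
x = np.linspace(0.0, 1.0, N, endpoint=False) + 0.5 * dx
x2d, y2d = np.meshgrid(x, x, indexing="ij")

g = np.exp(-((x2d - 0.5)**2 + (y2d - 0.5)**2) / 0.2**2)

# Distance of every cell center from the center of the Gaussian.
r = np.sqrt((x2d - 0.5)**2 + (y2d - 0.5)**2)

# Assumption: radial bins of width dx; each cell goes into bin floor(r / dx).
bin_index = (r / dx).astype(int).ravel()

counts = np.bincount(bin_index)                    # number of cells per bin
sums = np.bincount(bin_index, weights=g.ravel())   # sum of g per bin

# Mean of g in each radial bin; np.maximum guards against empty bins.
profile = sums / np.maximum(counts, 1)
r_centers = (np.arange(len(profile)) + 0.5) * dx

print(r_centers[:3], profile[:3])
```

Plotting profile against r_centers should show the Gaussian falling off with radius.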
In [12]:
A = np.fromfunction(lambda i,j:i+j+2, (3,3), dtype=float)
In [13]:
print(A)
[[2. 3. 4.]
 [3. 4. 5.]
 [4. 5. 6.]]
In [20]:
B = np.matrix(A)
B
Out[20]:
matrix([[2., 3., 4.],
        [3., 4., 5.],
        [4., 5., 6.]])
In [22]:
B*B
Out[22]:
matrix([[29., 38., 47.],
        [38., 50., 62.],
        [47., 62., 77.]])
In [16]:
A*A
Out[16]:
array([[ 4.,  9., 16.],
       [ 9., 16., 25.],
       [16., 25., 36.]])
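
The cells above show that * means a matrix product for np.matrix but an element-wise product for a regular array. With plain arrays the @ operator gives the matrix product, and the NumPy documentation recommends regular arrays over np.matrix. A minimal sketch that stays with ndarray only:

```python
import numpy as np

A = np.fromfunction(lambda i, j: i + j + 2, (3, 3), dtype=float)

elementwise = A * A   # multiplies matching elements
matrix_prod = A @ A   # rows times columns, same result as B*B above

print(elementwise)
print(matrix_prod)
```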