13scipy-optimization

Optimization¶

In [1]:

import math
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import Image

The optimization problem is to find maxima and minima of an objective function. Optimization is often referred to as minimization, because maximization can be formulated as minimizing its negative value.

Optimization is closely related to root finding: because at the minima the slope is zero for smooth functions. In general, it only finds a local extremum.

Univariate optimization¶

Let's first look at how to find the minima of a simple function of a single variable:

In [2]:

def f(x):
    return x**4 + 4*x**3 + (x-2)**2

In [3]:

fig, ax  = plt.subplots()
x = np.linspace(-5, 3, 100)
ax.plot(x, f(x));

The minima are at critical points where the first derivative is zero.

For simple problems, this could be solved analytically.

In [4]:

import sympy

In [5]:

x = sympy.Symbol("x")
objfunc = x**4 + 4*x**3 + (x-2)**2; objfunc

Out[5]:

$\displaystyle x^{4} + 4 x^{3} + \left(x - 2\right)^{2}$

In [6]:

objfunc.diff(x)

Out[6]:

$\displaystyle 4 x^{3} + 12 x^{2} + 2 x - 4$

In [7]:

sols = sympy.solve(objfunc.diff(x), x); sols

Out[7]:

[-1 - 5/(2*(-1/2 - sqrt(3)*I/2)*(27/4 + 3*sqrt(669)*I/4)**(1/3)) - (-1/2 - sqrt(3)*I/2)*(27/4 + 3*sqrt(669)*I/4)**(1/3)/3,
 -1 - (-1/2 + sqrt(3)*I/2)*(27/4 + 3*sqrt(669)*I/4)**(1/3)/3 - 5/(2*(-1/2 + sqrt(3)*I/2)*(27/4 + 3*sqrt(669)*I/4)**(1/3)),
 -1 - (27/4 + 3*sqrt(669)*I/4)**(1/3)/3 - 5/(2*(27/4 + 3*sqrt(669)*I/4)**(1/3))]

In [8]:

sols[0].evalf(chop=True)

Out[8]:

$\displaystyle -0.796635786203095$

In [9]:

sols[1].evalf(chop=True)

Out[9]:

$\displaystyle 0.469617434058037$

In [10]:

sols[2].evalf(chop=True)

Out[10]:

$\displaystyle -2.67298164785494$

At local minimum, the second derivative is positive. In general, we may need to perform second-derivative test or compare the solutions at the critical points.

In [11]:

objfunc.diff(x, 2).evalf(subs={x:sols[0]}, chop=True)

Out[11]:

$\displaystyle -9.5037159585612$

In [12]:

objfunc.diff(x, 2).evalf(subs={x:sols[1]}, chop=True)

Out[12]:

$\displaystyle 15.9173048298479$

In [13]:

objfunc.diff(x, 2).evalf(subs={x:sols[2]}, chop=True)

Out[13]:

$\displaystyle 23.5864111287133$

We can use various functions in scipy.optimize to find a minimum (or more precisely, to find a critical point).

For univariate problems, the main function is scipy.optimize.fminbound, which finds the minimum within an interval.

In [14]:

import scipy.optimize as optimize

In [15]:

optimize.fminbound?

Signature:
optimize.fminbound(
    func,
    x1,
    x2,
    args=(),
    xtol=1e-05,
    maxfun=500,
    full_output=0,
    disp=1,
)
Docstring:
Bounded minimization for scalar functions.

Parameters
----------
func : callable f(x,*args)
    Objective function to be minimized (must accept and return scalars).
x1, x2 : float or array scalar
    The optimization bounds.
args : tuple, optional
    Extra arguments passed to function.
xtol : float, optional
    The convergence tolerance.
maxfun : int, optional
    Maximum number of function evaluations allowed.
full_output : bool, optional
    If True, return optional outputs.
disp : int, optional
    If non-zero, print messages.
        0 : no message printing.
        1 : non-convergence notification messages only.
        2 : print a message on convergence too.
        3 : print iteration results.


Returns
-------
xopt : ndarray
    Parameters (over given interval) which minimize the
    objective function.
fval : number
    The function value at the minimum point.
ierr : int
    An error flag (0 if converged, 1 if maximum number of
    function calls reached).
numfunc : int
  The number of function calls made.

See also
--------
minimize_scalar: Interface to minimization algorithms for scalar
    univariate functions. See the 'Bounded' `method` in particular.

Notes
-----
Finds a local minimizer of the scalar function `func` in the
interval x1 < xopt < x2 using Brent's method. (See `brent`
for auto-bracketing.)

Examples
--------
`fminbound` finds the minimum of the function in the given range.
The following examples illustrate the same

>>> def f(x):
...     return x**2

>>> from scipy import optimize

>>> minimum = optimize.fminbound(f, -1, 2)
>>> minimum
0.0
>>> minimum = optimize.fminbound(f, 1, 2)
>>> minimum
1.0000059608609866
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_optimize.py
Type:      function

In [16]:

optimize.fminbound(f, -4, -2)

Out[16]:

-2.672982383464524

In [17]:

optimize.fminbound(f, 0, 2)

Out[17]:

0.46961757409234745

In [18]:

def negf(x):
    return -f(x)
optimize.fminbound(negf, -2, 0)

Out[18]:

-0.7966372403655028

Alternatively, we could use scipy.optimize.minimize_scalar for automatic bracketing.

In [19]:

x = optimize.minimize_scalar(f); x

Out[19]:

     fun: 2.804987644868733
 message: '\nOptimization terminated successfully;\nThe returned value satisfies the termination criteria\n(using xtol = 1.48e-08 )'
    nfev: 17
     nit: 12
 success: True
       x: 0.46961742708176163

However, automatic bracketing is not fool proof. For example,

In [20]:

def f(r):
    return 2 * np.pi * r**2 + 2 / r

r = np.linspace(1.e-20, 2, 100)
fig, ax = plt.subplots(figsize=(8, 4))

ax.plot(r, f(r), lw=2, color='b')
ax.set_title(r"$f(r) = 2\pi r^2+2/r$", fontsize=18)
ax.set_xlabel(r"$r$", fontsize=18)
ax.set_xticks([0, 0.5, 1, 1.5, 2])
ax.set_ylim(0, 30)
fig.tight_layout()

In [21]:

try:
    x = optimize.minimize_scalar(f)
except Exception as e:
    print(e)

float division by zero

In [22]:

output = optimize.minimize_scalar(f, bracket=(0.1, 4)); output.x

Out[22]:

0.5419260772557135

Under the hood (Optional)¶

Golden section search¶

The golden-section search for minimization is analogous to the bisection method for root finding. It starts with an interval with opposite signs of slopes, and repeatedly shrinks the interval until the length of interval is within some tolerance.

The following code implements the method.

In [23]:

def golden_section(f, a, b, tol):
    ''' 
    finds a minimum of f(x) within the interval [a,b] using the
    golden-section search. It assumes f has negative slope
    at a and positive slope at b.
    '''

    tau = (np.sqrt(5) - 1) / 2
    x1 = a + (1 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)

    k = 0
    while b-a > tol:
        if f1 > f2:
            a = x1 
            x1 = x2
            f1 = f2
            x2 = a + tau * (b-a)
            f2 = f(x2)
        else:
            b = x2
            x2 = x1
            f2 = f1
            x1 = a + (1 - tau) * (b-a)
            f1 = f(x1)

        k = k + 1

    x = (a + b) / 2;

    return x, k

In [24]:

x, k = golden_section(f, 0.1, 4, 1.e-12); x

Out[24]:

0.5419260716085812

In [25]:

Out[25]:

Successive Parabolic Interpolation¶

Successive parabolic interpolation for minimization is analogous to the secant method to root finding. It uses three points to construct a quadratic interpolation. It converges faster than golden_section search.

The following code illustrates the idea:

In [26]:

def spi(f, a, b, tol):
    'SPI minimizes f(x) using successive parabolic interpolation.'

    maxiter = 100

    xs = np.array([a, (a+b)/2, b])
    fx = f(xs)

    for k in range(1, maxiter+1):
        # For simplicity, we use numpy.polyfit for constructing quadratic polynomial
        p = np.polyfit(xs, fx, 2)
        x = -0.5*p[1] / p[0]

        xs = np.array([xs[1], xs[2], x])
        fx = np.array([fx[1], fx[2], f(x)])

        if abs(xs[2]-xs[1]) < tol * abs(xs[1]):
            break

    x = xs[2]
    return x, k

In [27]:

np.polyfit?

Signature: np.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False)
Docstring:
Least squares polynomial fit.

.. note::
   This forms part of the old polynomial API. Since version 1.4, the
   new polynomial API defined in `numpy.polynomial` is preferred.
   A summary of the differences can be found in the
   :doc:`transition guide </reference/routines.polynomials>`.

Fit a polynomial ``p(x) = p[0] * x**deg + ... + p[deg]`` of degree `deg`
to points `(x, y)`. Returns a vector of coefficients `p` that minimises
the squared error in the order `deg`, `deg-1`, ... `0`.

The `Polynomial.fit <numpy.polynomial.polynomial.Polynomial.fit>` class
method is recommended for new code as it is more stable numerically. See
the documentation of the method for more information.

Parameters
----------
x : array_like, shape (M,)
    x-coordinates of the M sample points ``(x[i], y[i])``.
y : array_like, shape (M,) or (M, K)
    y-coordinates of the sample points. Several data sets of sample
    points sharing the same x-coordinates can be fitted at once by
    passing in a 2D-array that contains one dataset per column.
deg : int
    Degree of the fitting polynomial
rcond : float, optional
    Relative condition number of the fit. Singular values smaller than
    this relative to the largest singular value will be ignored. The
    default value is len(x)*eps, where eps is the relative precision of
    the float type, about 2e-16 in most cases.
full : bool, optional
    Switch determining nature of return value. When it is False (the
    default) just the coefficients are returned, when True diagnostic
    information from the singular value decomposition is also returned.
w : array_like, shape (M,), optional
    Weights. If not None, the weight ``w[i]`` applies to the unsquared
    residual ``y[i] - y_hat[i]`` at ``x[i]``. Ideally the weights are
    chosen so that the errors of the products ``w[i]*y[i]`` all have the
    same variance.  When using inverse-variance weighting, use
    ``w[i] = 1/sigma(y[i])``.  The default value is None.
cov : bool or str, optional
    If given and not `False`, return not just the estimate but also its
    covariance matrix. By default, the covariance are scaled by
    chi2/dof, where dof = M - (deg + 1), i.e., the weights are presumed
    to be unreliable except in a relative sense and everything is scaled
    such that the reduced chi2 is unity. This scaling is omitted if
    ``cov='unscaled'``, as is relevant for the case that the weights are
    w = 1/sigma, with sigma known to be a reliable estimate of the
    uncertainty.

Returns
-------
p : ndarray, shape (deg + 1,) or (deg + 1, K)
    Polynomial coefficients, highest power first.  If `y` was 2-D, the
    coefficients for `k`-th data set are in ``p[:,k]``.

residuals, rank, singular_values, rcond
    These values are only returned if ``full == True``

    - residuals -- sum of squared residuals of the least squares fit
    - rank -- the effective rank of the scaled Vandermonde
       coefficient matrix
    - singular_values -- singular values of the scaled Vandermonde
       coefficient matrix
    - rcond -- value of `rcond`.

    For more details, see `numpy.linalg.lstsq`.

V : ndarray, shape (M,M) or (M,M,K)
    Present only if ``full == False`` and ``cov == True``.  The covariance
    matrix of the polynomial coefficient estimates.  The diagonal of
    this matrix are the variance estimates for each coefficient.  If y
    is a 2-D array, then the covariance matrix for the `k`-th data set
    are in ``V[:,:,k]``


Warns
-----
RankWarning
    The rank of the coefficient matrix in the least-squares fit is
    deficient. The warning is only raised if ``full == False``.

    The warnings can be turned off by

    >>> import warnings
    >>> warnings.simplefilter('ignore', np.RankWarning)

See Also
--------
polyval : Compute polynomial values.
linalg.lstsq : Computes a least-squares fit.
scipy.interpolate.UnivariateSpline : Computes spline fits.

Notes
-----
The solution minimizes the squared error

.. math::
    E = \sum_{j=0}^k |p(x_j) - y_j|^2

in the equations::

    x[0]**n * p[0] + ... + x[0] * p[n-1] + p[n] = y[0]
    x[1]**n * p[0] + ... + x[1] * p[n-1] + p[n] = y[1]
    ...
    x[k]**n * p[0] + ... + x[k] * p[n-1] + p[n] = y[k]

The coefficient matrix of the coefficients `p` is a Vandermonde matrix.

`polyfit` issues a `RankWarning` when the least-squares fit is badly
conditioned. This implies that the best fit is not well-defined due
to numerical error. The results may be improved by lowering the polynomial
degree or by replacing `x` by `x` - `x`.mean(). The `rcond` parameter
can also be set to a value smaller than its default, but the resulting
fit may be spurious: including contributions from the small singular
values can add numerical noise to the result.

Note that fitting polynomial coefficients is inherently badly conditioned
when the degree of the polynomial is large or the interval of sample points
is badly centered. The quality of the fit should always be checked in these
cases. When polynomial fits are not satisfactory, splines may be a good
alternative.

References
----------
.. [1] Wikipedia, "Curve fitting",
       https://en.wikipedia.org/wiki/Curve_fitting
.. [2] Wikipedia, "Polynomial interpolation",
       https://en.wikipedia.org/wiki/Polynomial_interpolation

Examples
--------
>>> import warnings
>>> x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
>>> y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
>>> z = np.polyfit(x, y, 3)
>>> z
array([ 0.08703704, -0.81349206,  1.69312169, -0.03968254]) # may vary

It is convenient to use `poly1d` objects for dealing with polynomials:

>>> p = np.poly1d(z)
>>> p(0.5)
0.6143849206349179 # may vary
>>> p(3.5)
-0.34732142857143039 # may vary
>>> p(10)
22.579365079365115 # may vary

High-order polynomials may oscillate wildly:

>>> with warnings.catch_warnings():
...     warnings.simplefilter('ignore', np.RankWarning)
...     p30 = np.poly1d(np.polyfit(x, y, 30))
...
>>> p30(4)
-0.80000000000000204 # may vary
>>> p30(5)
-0.99999999999999445 # may vary
>>> p30(4.5)
-0.10547061179440398 # may vary

Illustration:

>>> import matplotlib.pyplot as plt
>>> xp = np.linspace(-2, 6, 100)
>>> _ = plt.plot(x, y, '.', xp, p(xp), '-', xp, p30(xp), '--')
>>> plt.ylim(-2,2)
(-2, 2)
>>> plt.show()
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\numpy\lib\polynomial.py
Type:      function

In [28]:

x, k = spi(f, 0.1, 4, 1.e-6); x

Out[28]:

0.5419260701665629

Hybrid Method for Efficiency and Robustness¶

Most robust 1-D optimization methods combine the golden-section search and successive parabolic interpolation. This method was due to Richard Brent, and hence it is sometimes referred to as Brent's method. In SciPy, it can be accessed through brent or minimize_scalar.

In [29]:

optimize.brent(f, brack=(0.1, 4))

Out[29]:

0.5419260772557135

In [30]:

x = optimize.minimize_scalar(f, bracket=(0.1, 4), method='Brent'); x

Out[30]:

     fun: 5.535810445932086
 message: '\nOptimization terminated successfully;\nThe returned value satisfies the termination criteria\n(using xtol = 1.48e-08 )'
    nfev: 19
     nit: 15
 success: True
       x: 0.5419260772557135

Multivariate optimization¶

Multivariate optimization is more difficult than univariate optimization.

At a conceptual level, multivariate optimization involves the graidient and Hessian of a function.

In [31]:

x1, x2 = sympy.symbols("x_1, x_2")

In [32]:

f_sym = (x1-1)**4 + 5 * (x2-1)**2 - 2*x1*x2; f_sym

Out[32]:

$\displaystyle - 2 x_{1} x_{2} + \left(x_{1} - 1\right)^{4} + 5 \left(x_{2} - 1\right)^{2}$

In [33]:

fprime_sym = [f_sym.diff(x1), f_sym.diff(x2)]; fprime_sym

Out[33]:

[-2*x_2 + 4*(x_1 - 1)**3, -2*x_1 + 10*x_2 - 10]

In [34]:

# Gradient
fprime_sym = [f_sym.diff(x_) for x_ in (x1, x2)]; sympy.Matrix(fprime_sym)

Out[34]:

$\displaystyle \left[\begin{matrix}- 2 x_{2} + 4 \left(x_{1} - 1\right)^{3}\\- 2 x_{1} + 10 x_{2} - 10\end{matrix}\right]$

In [35]:

sols = sympy.solve(fprime_sym); sols[0]

Out[35]:

{x_1: 30**(1/3)*(sqrt(72870)/30 + 9 + (1 + (30*sqrt(72870) + 8100)**(1/3))*(30*sqrt(72870) + 8100)**(1/3)/30)/(sqrt(72870) + 270)**(2/3),
 x_2: 1/(750*(sqrt(72870)/112500 + 3/1250)**(1/3)) + (sqrt(72870)/112500 + 3/1250)**(1/3) + 6/5}

In [36]:

X_opt = [sols[0][x1].evalf(), sols[0][x2].evalf()]
X_opt

Out[36]:

[1.88292612929632, 1.37658522585926]

In [37]:

# plot the contour of the function
f_lmbda = sympy.lambdify((x1, x2), f_sym, 'numpy')

fig, ax = plt.subplots(figsize=(6, 4))
x_ = y_ = np.linspace(-1, 4, 100)
X, Y = np.meshgrid(x_, y_)
c = ax.contour(X, Y, f_lmbda(X, Y), 50)
ax.plot(X_opt[0], X_opt[1], 'r*', markersize=15)
ax.set_xlabel(r"$x_1$", fontsize=18)
ax.set_ylabel(r"$x_2$", fontsize=18)
plt.colorbar(c, ax=ax)
fig.tight_layout()

At the minimum, the eigenvalues of the Hessian matrix should be positive.

In [38]:

# Hessian
fhess_sym = [[f_sym.diff(x1_, x2_) for x1_ in (x1, x2)] for x2_ in (x1, x2)]; sympy.Matrix(fhess_sym)

Out[38]:

$\displaystyle \left[\begin{matrix}12 \left(x_{1} - 1\right)^{2} & -2\\-2 & 10\end{matrix}\right]$

In [39]:

H = np.array([[fhess_sym[0][0].subs({x1:X_opt[0], x2:X_opt[1]}), 
               fhess_sym[1][0].subs({x1:X_opt[0], x2:X_opt[1]})],
              [fhess_sym[0][1].subs({x1:X_opt[0], x2:X_opt[1]}), 
               fhess_sym[1][1].subs({x1:X_opt[0], x2:X_opt[1]})]], 
             dtype=float); H

Out[39]:

array([[ 9.3547026, -2.       ],
       [-2.       , 10.       ]])

In [40]:

np.linalg.eigvals(H)

Out[40]:

array([ 7.65149292, 11.70320968])

Numerical solution¶

The optimize package provides a few functions to for multivariate minimization.

fmin uses the downhill simplex algorithm, which is gradient free (i.e., it does not require derivatives).

In [41]:

optimize.fmin?

Signature:
optimize.fmin(
    func,
    x0,
    args=(),
    xtol=0.0001,
    ftol=0.0001,
    maxiter=None,
    maxfun=None,
    full_output=0,
    disp=1,
    retall=0,
    callback=None,
    initial_simplex=None,
)
Docstring:
Minimize a function using the downhill simplex algorithm.

This algorithm only uses function values, not derivatives or second
derivatives.

Parameters
----------
func : callable func(x,*args)
    The objective function to be minimized.
x0 : ndarray
    Initial guess.
args : tuple, optional
    Extra arguments passed to func, i.e., ``f(x,*args)``.
xtol : float, optional
    Absolute error in xopt between iterations that is acceptable for
    convergence.
ftol : number, optional
    Absolute error in func(xopt) between iterations that is acceptable for
    convergence.
maxiter : int, optional
    Maximum number of iterations to perform.
maxfun : number, optional
    Maximum number of function evaluations to make.
full_output : bool, optional
    Set to True if fopt and warnflag outputs are desired.
disp : bool, optional
    Set to True to print convergence messages.
retall : bool, optional
    Set to True to return list of solutions at each iteration.
callback : callable, optional
    Called after each iteration, as callback(xk), where xk is the
    current parameter vector.
initial_simplex : array_like of shape (N + 1, N), optional
    Initial simplex. If given, overrides `x0`.
    ``initial_simplex[j,:]`` should contain the coordinates of
    the jth vertex of the ``N+1`` vertices in the simplex, where
    ``N`` is the dimension.

Returns
-------
xopt : ndarray
    Parameter that minimizes function.
fopt : float
    Value of function at minimum: ``fopt = func(xopt)``.
iter : int
    Number of iterations performed.
funcalls : int
    Number of function calls made.
warnflag : int
    1 : Maximum number of function evaluations made.
    2 : Maximum number of iterations reached.
allvecs : list
    Solution at each iteration.

See also
--------
minimize: Interface to minimization algorithms for multivariate
    functions. See the 'Nelder-Mead' `method` in particular.

Notes
-----
Uses a Nelder-Mead simplex algorithm to find the minimum of function of
one or more variables.

This algorithm has a long history of successful use in applications.
But it will usually be slower than an algorithm that uses first or
second derivative information. In practice, it can have poor
performance in high-dimensional problems and is not robust to
minimizing complicated functions. Additionally, there currently is no
complete theory describing when the algorithm will successfully
converge to the minimum, or how fast it will if it does. Both the ftol and
xtol criteria must be met for convergence.

Examples
--------
>>> def f(x):
...     return x**2

>>> from scipy import optimize

>>> minimum = optimize.fmin(f, 1)
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 17
         Function evaluations: 34
>>> minimum[0]
-8.8817841970012523e-16

References
----------
.. [1] Nelder, J.A. and Mead, R. (1965), "A simplex method for function
       minimization", The Computer Journal, 7, pp. 308-313

.. [2] Wright, M.H. (1996), "Direct Search Methods: Once Scorned, Now
       Respectable", in Numerical Analysis 1995, Proceedings of the
       1995 Dundee Biennial Conference in Numerical Analysis, D.F.
       Griffiths and G.A. Watson (Eds.), Addison Wesley Longman,
       Harlow, UK, pp. 191-208.
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_optimize.py
Type:      function

In [42]:

def func_XY_to_X_Y(f):
    """
    Wrapper for f(X) -> f(X[0], X[1])
    """
    return lambda X: np.array(f(X[0], X[1]))

f_lmbda = sympy.lambdify((x1, x2), f_sym, 'numpy')
f = func_XY_to_X_Y(f_lmbda)

In [43]:

optimize.fmin(f, (0,0))

Optimization terminated successfully.
         Current function value: -3.867223
         Iterations: 68
         Function evaluations: 128

Out[43]:

array([1.88296217, 1.37657705])

Newton's method requires the Hessian matrix. This is offered as fmin_ncg, which stands for Newton-CG method.

In [44]:

optimize.fmin_ncg?

Signature:
optimize.fmin_ncg(
    f,
    x0,
    fprime,
    fhess_p=None,
    fhess=None,
    args=(),
    avextol=1e-05,
    epsilon=1.4901161193847656e-08,
    maxiter=None,
    full_output=0,
    disp=1,
    retall=0,
    callback=None,
)
Docstring:
Unconstrained minimization of a function using the Newton-CG method.

Parameters
----------
f : callable ``f(x, *args)``
    Objective function to be minimized.
x0 : ndarray
    Initial guess.
fprime : callable ``f'(x, *args)``
    Gradient of f.
fhess_p : callable ``fhess_p(x, p, *args)``, optional
    Function which computes the Hessian of f times an
    arbitrary vector, p.
fhess : callable ``fhess(x, *args)``, optional
    Function to compute the Hessian matrix of f.
args : tuple, optional
    Extra arguments passed to f, fprime, fhess_p, and fhess
    (the same set of extra arguments is supplied to all of
    these functions).
epsilon : float or ndarray, optional
    If fhess is approximated, use this value for the step size.
callback : callable, optional
    An optional user-supplied function which is called after
    each iteration. Called as callback(xk), where xk is the
    current parameter vector.
avextol : float, optional
    Convergence is assumed when the average relative error in
    the minimizer falls below this amount.
maxiter : int, optional
    Maximum number of iterations to perform.
full_output : bool, optional
    If True, return the optional outputs.
disp : bool, optional
    If True, print convergence message.
retall : bool, optional
    If True, return a list of results at each iteration.

Returns
-------
xopt : ndarray
    Parameters which minimize f, i.e., ``f(xopt) == fopt``.
fopt : float
    Value of the function at xopt, i.e., ``fopt = f(xopt)``.
fcalls : int
    Number of function calls made.
gcalls : int
    Number of gradient calls made.
hcalls : int
    Number of Hessian calls made.
warnflag : int
    Warnings generated by the algorithm.
    1 : Maximum number of iterations exceeded.
    2 : Line search failure (precision loss).
    3 : NaN result encountered.
allvecs : list
    The result at each iteration, if retall is True (see below).

See also
--------
minimize: Interface to minimization algorithms for multivariate
    functions. See the 'Newton-CG' `method` in particular.

Notes
-----
Only one of `fhess_p` or `fhess` need to be given.  If `fhess`
is provided, then `fhess_p` will be ignored. If neither `fhess`
nor `fhess_p` is provided, then the hessian product will be
approximated using finite differences on `fprime`. `fhess_p`
must compute the hessian times an arbitrary vector. If it is not
given, finite-differences on `fprime` are used to compute
it.

Newton-CG methods are also called truncated Newton methods. This
function differs from scipy.optimize.fmin_tnc because

1. scipy.optimize.fmin_ncg is written purely in Python using NumPy
    and scipy while scipy.optimize.fmin_tnc calls a C function.
2. scipy.optimize.fmin_ncg is only for unconstrained minimization
    while scipy.optimize.fmin_tnc is for unconstrained minimization
    or box constrained minimization. (Box constraints give
    lower and upper bounds for each variable separately.)

References
----------
Wright & Nocedal, 'Numerical Optimization', 1999, p. 140.
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_optimize.py
Type:      function

In [45]:

fprime_lmbda = sympy.lambdify((x1, x2), fprime_sym, 'numpy')
fhess_lmbda = sympy.lambdify((x1, x2), fhess_sym, 'numpy')

fprime = func_XY_to_X_Y(fprime_lmbda)
fhess = func_XY_to_X_Y(fhess_lmbda)

In [46]:

optimize.fmin_ncg(f, (0,0), fprime=fprime, fhess=fhess)

Optimization terminated successfully.
         Current function value: -3.867223
         Iterations: 8
         Function evaluations: 10
         Gradient evaluations: 10
         Hessian evaluations: 8

Out[46]:

array([1.88292613, 1.37658523])

We could also use the fmin_bfgs or fmin_cg methods, which can take advantage of the gradient vectors.

In [47]:

optimize.fmin_bfgs(f, (0,0), fprime=fprime)

Optimization terminated successfully.
         Current function value: -3.867223
         Iterations: 9
         Function evaluations: 13
         Gradient evaluations: 13

Out[47]:

array([1.88292645, 1.37658596])

In [48]:

optimize.fmin_bfgs?

Signature:
optimize.fmin_bfgs(
    f,
    x0,
    fprime=None,
    args=(),
    gtol=1e-05,
    norm=inf,
    epsilon=1.4901161193847656e-08,
    maxiter=None,
    full_output=0,
    disp=1,
    retall=0,
    callback=None,
)
Docstring:
Minimize a function using the BFGS algorithm.

Parameters
----------
f : callable ``f(x,*args)``
    Objective function to be minimized.
x0 : ndarray
    Initial guess.
fprime : callable ``f'(x,*args)``, optional
    Gradient of f.
args : tuple, optional
    Extra arguments passed to f and fprime.
gtol : float, optional
    Gradient norm must be less than `gtol` before successful termination.
norm : float, optional
    Order of norm (Inf is max, -Inf is min)
epsilon : int or ndarray, optional
    If `fprime` is approximated, use this value for the step size.
callback : callable, optional
    An optional user-supplied function to call after each
    iteration. Called as ``callback(xk)``, where ``xk`` is the
    current parameter vector.
maxiter : int, optional
    Maximum number of iterations to perform.
full_output : bool, optional
    If True, return ``fopt``, ``func_calls``, ``grad_calls``, and
    ``warnflag`` in addition to ``xopt``.
disp : bool, optional
    Print convergence message if True.
retall : bool, optional
    Return a list of results at each iteration if True.

Returns
-------
xopt : ndarray
    Parameters which minimize f, i.e., ``f(xopt) == fopt``.
fopt : float
    Minimum value.
gopt : ndarray
    Value of gradient at minimum, f'(xopt), which should be near 0.
Bopt : ndarray
    Value of 1/f''(xopt), i.e., the inverse Hessian matrix.
func_calls : int
    Number of function_calls made.
grad_calls : int
    Number of gradient calls made.
warnflag : integer
    1 : Maximum number of iterations exceeded.
    2 : Gradient and/or function calls not changing.
    3 : NaN result encountered.
allvecs : list
    The value of `xopt` at each iteration. Only returned if `retall` is
    True.

Notes
-----
Optimize the function, `f`, whose gradient is given by `fprime`
using the quasi-Newton method of Broyden, Fletcher, Goldfarb,
and Shanno (BFGS).

See Also
--------
minimize: Interface to minimization algorithms for multivariate
    functions. See ``method='BFGS'`` in particular.

References
----------
Wright, and Nocedal 'Numerical Optimization', 1999, p. 198.

Examples
--------
>>> from scipy.optimize import fmin_bfgs
>>> def quadratic_cost(x, Q):
...     return x @ Q @ x
...
>>> x0 = np.array([-3, -4])
>>> cost_weight =  np.diag([1., 10.])
>>> # Note that a trailing comma is necessary for a tuple with single element
>>> fmin_bfgs(quadratic_cost, x0, args=(cost_weight,))
Optimization terminated successfully.
        Current function value: 0.000000
        Iterations: 7                   # may vary
        Function evaluations: 24        # may vary
        Gradient evaluations: 8         # may vary
array([ 2.85169950e-06, -4.61820139e-07])

>>> def quadratic_cost_grad(x, Q):
...     return 2 * Q @ x
...
>>> fmin_bfgs(quadratic_cost, x0, quadratic_cost_grad, args=(cost_weight,))
Optimization terminated successfully.
        Current function value: 0.000000
        Iterations: 7
        Function evaluations: 8
        Gradient evaluations: 8
array([ 2.85916637e-06, -4.54371951e-07])
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_optimize.py
Type:      function

In [49]:

optimize.fmin_cg(f, (0,0), fprime=fprime)

Optimization terminated successfully.
         Current function value: -3.867223
         Iterations: 8
         Function evaluations: 18
         Gradient evaluations: 18

Out[49]:

array([1.88292612, 1.37658523])

In [50]:

optimize.fmin_cg?

Signature:
optimize.fmin_cg(
    f,
    x0,
    fprime=None,
    args=(),
    gtol=1e-05,
    norm=inf,
    epsilon=1.4901161193847656e-08,
    maxiter=None,
    full_output=0,
    disp=1,
    retall=0,
    callback=None,
)
Docstring:
Minimize a function using a nonlinear conjugate gradient algorithm.

Parameters
----------
f : callable, ``f(x, *args)``
    Objective function to be minimized. Here `x` must be a 1-D array of
    the variables that are to be changed in the search for a minimum, and
    `args` are the other (fixed) parameters of `f`.
x0 : ndarray
    A user-supplied initial estimate of `xopt`, the optimal value of `x`.
    It must be a 1-D array of values.
fprime : callable, ``fprime(x, *args)``, optional
    A function that returns the gradient of `f` at `x`. Here `x` and `args`
    are as described above for `f`. The returned value must be a 1-D array.
    Defaults to None, in which case the gradient is approximated
    numerically (see `epsilon`, below).
args : tuple, optional
    Parameter values passed to `f` and `fprime`. Must be supplied whenever
    additional fixed parameters are needed to completely specify the
    functions `f` and `fprime`.
gtol : float, optional
    Stop when the norm of the gradient is less than `gtol`.
norm : float, optional
    Order to use for the norm of the gradient
    (``-np.Inf`` is min, ``np.Inf`` is max).
epsilon : float or ndarray, optional
    Step size(s) to use when `fprime` is approximated numerically. Can be a
    scalar or a 1-D array. Defaults to ``sqrt(eps)``, with eps the
    floating point machine precision.  Usually ``sqrt(eps)`` is about
    1.5e-8.
maxiter : int, optional
    Maximum number of iterations to perform. Default is ``200 * len(x0)``.
full_output : bool, optional
    If True, return `fopt`, `func_calls`, `grad_calls`, and `warnflag` in
    addition to `xopt`.  See the Returns section below for additional
    information on optional return values.
disp : bool, optional
    If True, return a convergence message, followed by `xopt`.
retall : bool, optional
    If True, add to the returned values the results of each iteration.
callback : callable, optional
    An optional user-supplied function, called after each iteration.
    Called as ``callback(xk)``, where ``xk`` is the current value of `x0`.

Returns
-------
xopt : ndarray
    Parameters which minimize f, i.e., ``f(xopt) == fopt``.
fopt : float, optional
    Minimum value found, f(xopt). Only returned if `full_output` is True.
func_calls : int, optional
    The number of function_calls made. Only returned if `full_output`
    is True.
grad_calls : int, optional
    The number of gradient calls made. Only returned if `full_output` is
    True.
warnflag : int, optional
    Integer value with warning status, only returned if `full_output` is
    True.

    0 : Success.

    1 : The maximum number of iterations was exceeded.

    2 : Gradient and/or function calls were not changing. May indicate
        that precision was lost, i.e., the routine did not converge.

    3 : NaN result encountered.

allvecs : list of ndarray, optional
    List of arrays, containing the results at each iteration.
    Only returned if `retall` is True.

See Also
--------
minimize : common interface to all `scipy.optimize` algorithms for
           unconstrained and constrained minimization of multivariate
           functions. It provides an alternative way to call
           ``fmin_cg``, by specifying ``method='CG'``.

Notes
-----
This conjugate gradient algorithm is based on that of Polak and Ribiere
[1]_.

Conjugate gradient methods tend to work better when:

1. `f` has a unique global minimizing point, and no local minima or
   other stationary points,
2. `f` is, at least locally, reasonably well approximated by a
   quadratic function of the variables,
3. `f` is continuous and has a continuous gradient,
4. `fprime` is not too large, e.g., has a norm less than 1000,
5. The initial guess, `x0`, is reasonably close to `f` 's global
   minimizing point, `xopt`.

References
----------
.. [1] Wright & Nocedal, "Numerical Optimization", 1999, pp. 120-122.

Examples
--------
Example 1: seek the minimum value of the expression
``a*u**2 + b*u*v + c*v**2 + d*u + e*v + f`` for given values
of the parameters and an initial guess ``(u, v) = (0, 0)``.

>>> args = (2, 3, 7, 8, 9, 10)  # parameter values
>>> def f(x, *args):
...     u, v = x
...     a, b, c, d, e, f = args
...     return a*u**2 + b*u*v + c*v**2 + d*u + e*v + f
>>> def gradf(x, *args):
...     u, v = x
...     a, b, c, d, e, f = args
...     gu = 2*a*u + b*v + d     # u-component of the gradient
...     gv = b*u + 2*c*v + e     # v-component of the gradient
...     return np.asarray((gu, gv))
>>> x0 = np.asarray((0, 0))  # Initial guess.
>>> from scipy import optimize
>>> res1 = optimize.fmin_cg(f, x0, fprime=gradf, args=args)
Optimization terminated successfully.
         Current function value: 1.617021
         Iterations: 4
         Function evaluations: 8
         Gradient evaluations: 8
>>> res1
array([-1.80851064, -0.25531915])

Example 2: solve the same problem using the `minimize` function.
(This `myopts` dictionary shows all of the available options,
although in practice only non-default values would be needed.
The returned value will be a dictionary.)

>>> opts = {'maxiter' : None,    # default value.
...         'disp' : True,    # non-default value.
...         'gtol' : 1e-5,    # default value.
...         'norm' : np.inf,  # default value.
...         'eps' : 1.4901161193847656e-08}  # default value.
>>> res2 = optimize.minimize(f, x0, jac=gradf, args=args,
...                          method='CG', options=opts)
Optimization terminated successfully.
        Current function value: 1.617021
        Iterations: 4
        Function evaluations: 8
        Gradient evaluations: 8
>>> res2.x  # minimum found
array([-1.80851064, -0.25531915])
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_optimize.py
Type:      function

Brute force search for initial point¶

Now consider another function with many local minima.

In [51]:

def f(X):
    x, y = X
    return (4 * np.sin(np.pi * x) + 6 * np.sin(np.pi * y)) + (x - 1)**2 + (y - 1)**2

def func_X_Y_to_XY(f, X, Y):
    s = np.shape(X)
    return f(np.vstack([X.ravel(), Y.ravel()])).reshape(*s)

fig, ax = plt.subplots(figsize=(6, 4))
x_ = y_ = np.linspace(-3, 5, 100)
X, Y = np.meshgrid(x_, y_)
c = ax.contour(X, Y, func_X_Y_to_XY(f, X, Y), 25)
ax.plot(1.47586906, 1.48365787, 'r*', markersize=15)
ax.set_xlabel(r"$x_1$", fontsize=18)
ax.set_ylabel(r"$x_2$", fontsize=18)
plt.colorbar(c, ax=ax)
fig.tight_layout()

To find a good starting point, we could use optimize.brute.

In [52]:

optimize.brute?

Signature:
optimize.brute(
    func,
    ranges,
    args=(),
    Ns=20,
    full_output=0,
    finish=<function fmin at 0x000002EB21A411B0>,
    disp=False,
    workers=1,
)
Docstring:
Minimize a function over a given range by brute force.

Uses the "brute force" method, i.e., computes the function's value
at each point of a multidimensional grid of points, to find the global
minimum of the function.

The function is evaluated everywhere in the range with the datatype of the
first call to the function, as enforced by the ``vectorize`` NumPy
function. The value and type of the function evaluation returned when
``full_output=True`` are affected in addition by the ``finish`` argument
(see Notes).

The brute force approach is inefficient because the number of grid points
increases exponentially - the number of grid points to evaluate is
``Ns ** len(x)``. Consequently, even with coarse grid spacing, even
moderately sized problems can take a long time to run, and/or run into
memory limitations.

Parameters
----------
func : callable
    The objective function to be minimized. Must be in the
    form ``f(x, *args)``, where ``x`` is the argument in
    the form of a 1-D array and ``args`` is a tuple of any
    additional fixed parameters needed to completely specify
    the function.
ranges : tuple
    Each component of the `ranges` tuple must be either a
    "slice object" or a range tuple of the form ``(low, high)``.
    The program uses these to create the grid of points on which
    the objective function will be computed. See `Note 2` for
    more detail.
args : tuple, optional
    Any additional fixed parameters needed to completely specify
    the function.
Ns : int, optional
    Number of grid points along the axes, if not otherwise
    specified. See `Note2`.
full_output : bool, optional
    If True, return the evaluation grid and the objective function's
    values on it.
finish : callable, optional
    An optimization function that is called with the result of brute force
    minimization as initial guess. `finish` should take `func` and
    the initial guess as positional arguments, and take `args` as
    keyword arguments. It may additionally take `full_output`
    and/or `disp` as keyword arguments. Use None if no "polishing"
    function is to be used. See Notes for more details.
disp : bool, optional
    Set to True to print convergence messages from the `finish` callable.
workers : int or map-like callable, optional
    If `workers` is an int the grid is subdivided into `workers`
    sections and evaluated in parallel (uses
    `multiprocessing.Pool <multiprocessing>`).
    Supply `-1` to use all cores available to the Process.
    Alternatively supply a map-like callable, such as
    `multiprocessing.Pool.map` for evaluating the grid in parallel.
    This evaluation is carried out as ``workers(func, iterable)``.
    Requires that `func` be pickleable.

    .. versionadded:: 1.3.0

Returns
-------
x0 : ndarray
    A 1-D array containing the coordinates of a point at which the
    objective function had its minimum value. (See `Note 1` for
    which point is returned.)
fval : float
    Function value at the point `x0`. (Returned when `full_output` is
    True.)
grid : tuple
    Representation of the evaluation grid. It has the same
    length as `x0`. (Returned when `full_output` is True.)
Jout : ndarray
    Function values at each point of the evaluation
    grid, i.e., ``Jout = func(*grid)``. (Returned
    when `full_output` is True.)

See Also
--------
basinhopping, differential_evolution

Notes
-----
*Note 1*: The program finds the gridpoint at which the lowest value
of the objective function occurs. If `finish` is None, that is the
point returned. When the global minimum occurs within (or not very far
outside) the grid's boundaries, and the grid is fine enough, that
point will be in the neighborhood of the global minimum.

However, users often employ some other optimization program to
"polish" the gridpoint values, i.e., to seek a more precise
(local) minimum near `brute's` best gridpoint.
The `brute` function's `finish` option provides a convenient way to do
that. Any polishing program used must take `brute's` output as its
initial guess as a positional argument, and take `brute's` input values
for `args` as keyword arguments, otherwise an error will be raised.
It may additionally take `full_output` and/or `disp` as keyword arguments.

`brute` assumes that the `finish` function returns either an
`OptimizeResult` object or a tuple in the form:
``(xmin, Jmin, ... , statuscode)``, where ``xmin`` is the minimizing
value of the argument, ``Jmin`` is the minimum value of the objective
function, "..." may be some other returned values (which are not used
by `brute`), and ``statuscode`` is the status code of the `finish` program.

Note that when `finish` is not None, the values returned are those
of the `finish` program, *not* the gridpoint ones. Consequently,
while `brute` confines its search to the input grid points,
the `finish` program's results usually will not coincide with any
gridpoint, and may fall outside the grid's boundary. Thus, if a
minimum only needs to be found over the provided grid points, make
sure to pass in `finish=None`.

*Note 2*: The grid of points is a `numpy.mgrid` object.
For `brute` the `ranges` and `Ns` inputs have the following effect.
Each component of the `ranges` tuple can be either a slice object or a
two-tuple giving a range of values, such as (0, 5). If the component is a
slice object, `brute` uses it directly. If the component is a two-tuple
range, `brute` internally converts it to a slice object that interpolates
`Ns` points from its low-value to its high-value, inclusive.

Examples
--------
We illustrate the use of `brute` to seek the global minimum of a function
of two variables that is given as the sum of a positive-definite
quadratic and two deep "Gaussian-shaped" craters. Specifically, define
the objective function `f` as the sum of three other functions,
``f = f1 + f2 + f3``. We suppose each of these has a signature
``(z, *params)``, where ``z = (x, y)``,  and ``params`` and the functions
are as defined below.

>>> params = (2, 3, 7, 8, 9, 10, 44, -1, 2, 26, 1, -2, 0.5)
>>> def f1(z, *params):
...     x, y = z
...     a, b, c, d, e, f, g, h, i, j, k, l, scale = params
...     return (a * x**2 + b * x * y + c * y**2 + d*x + e*y + f)

>>> def f2(z, *params):
...     x, y = z
...     a, b, c, d, e, f, g, h, i, j, k, l, scale = params
...     return (-g*np.exp(-((x-h)**2 + (y-i)**2) / scale))

>>> def f3(z, *params):
...     x, y = z
...     a, b, c, d, e, f, g, h, i, j, k, l, scale = params
...     return (-j*np.exp(-((x-k)**2 + (y-l)**2) / scale))

>>> def f(z, *params):
...     return f1(z, *params) + f2(z, *params) + f3(z, *params)

Thus, the objective function may have local minima near the minimum
of each of the three functions of which it is composed. To
use `fmin` to polish its gridpoint result, we may then continue as
follows:

>>> rranges = (slice(-4, 4, 0.25), slice(-4, 4, 0.25))
>>> from scipy import optimize
>>> resbrute = optimize.brute(f, rranges, args=params, full_output=True,
...                           finish=optimize.fmin)
>>> resbrute[0]  # global minimum
array([-1.05665192,  1.80834843])
>>> resbrute[1]  # function value at global minimum
-3.4085818767

Note that if `finish` had been set to None, we would have gotten the
gridpoint [-1.0 1.75] where the rounded function value is -2.892.
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_optimize.py
Type:      function

In [53]:

x_start = optimize.brute(f, (slice(-3, 5, 0.5), slice(-3, 5, 0.5)), finish=None); x_start

Out[53]:

array([1.5, 1.5])

In [54]:

x_opt = optimize.fmin_bfgs(f, x_start); x_opt

Optimization terminated successfully.
         Current function value: -9.520229
         Iterations: 4
         Function evaluations: 21
         Gradient evaluations: 7

Out[54]:

array([1.47586906, 1.48365787])

Nonlinear least square¶

We discussed linear least squares earlier. In general, a least squares problem minimizes an objective function in the form of $$g(\beta) = \sum_{i=0}^m r_i^2(\beta) = \Vert r(\beta) \Vert^2,$$ where $r(\beta)$ is a vector containing the residual $r_i(\beta)=y_i - f(x_i,\beta)$ of $m$ data points $(x_i, y_i)$.

The problem is nonlinear if $r(\beta)$ is nonlinear in $\beta$.

In scipy.optimize, the function leastsq can solve nonlinear least squares problems.

In [55]:

def f(x, beta0, beta1, beta2):
    return beta0 + beta1 * np.exp(-beta2 * x**2)

In [56]:

beta = (0.25, 0.75, 0.5)

In [57]:

xdata = np.linspace(0, 5, 50)

In [58]:

y = f(xdata, *beta)

In [59]:

ydata = y + 0.05 * np.random.randn(len(xdata))

In [60]:

def g(beta):
    return ydata - f(xdata, *beta)

In [61]:

beta_start = (1, 1, 1)

In [62]:

beta_opt, beta_cov = optimize.leastsq(g, beta_start)

In [63]:

beta_opt

Out[63]:

array([0.2607317 , 0.73566192, 0.49722151])

In [64]:

fig, ax = plt.subplots()

ax.scatter(xdata, ydata)
ax.plot(xdata, y, 'r', lw=2)
ax.plot(xdata, f(xdata, *beta_opt), 'b', lw=2)
ax.set_xlim(0, 5)
ax.set_xlabel(r"$x$", fontsize=18)
ax.set_ylabel(r"$f(x, \beta)$", fontsize=18)

fig.tight_layout()

The same functionality can also be accessed through the function optimize.curve_fit, which is a convenient wrapper of optimize.leastsq.

In [65]:

optimize.curve_fit?

Signature:
optimize.curve_fit(
    f,
    xdata,
    ydata,
    p0=None,
    sigma=None,
    absolute_sigma=False,
    check_finite=True,
    bounds=(-inf, inf),
    method=None,
    jac=None,
    *,
    full_output=False,
    **kwargs,
)
Docstring:
Use non-linear least squares to fit a function, f, to data.

Assumes ``ydata = f(xdata, *params) + eps``.

Parameters
----------
f : callable
    The model function, f(x, ...). It must take the independent
    variable as the first argument and the parameters to fit as
    separate remaining arguments.
xdata : array_like or object
    The independent variable where the data is measured.
    Should usually be an M-length sequence or an (k,M)-shaped array for
    functions with k predictors, but can actually be any object.
ydata : array_like
    The dependent data, a length M array - nominally ``f(xdata, ...)``.
p0 : array_like, optional
    Initial guess for the parameters (length N). If None, then the
    initial values will all be 1 (if the number of parameters for the
    function can be determined using introspection, otherwise a
    ValueError is raised).
sigma : None or M-length sequence or MxM array, optional
    Determines the uncertainty in `ydata`. If we define residuals as
    ``r = ydata - f(xdata, *popt)``, then the interpretation of `sigma`
    depends on its number of dimensions:

        - A 1-D `sigma` should contain values of standard deviations of
          errors in `ydata`. In this case, the optimized function is
          ``chisq = sum((r / sigma) ** 2)``.

        - A 2-D `sigma` should contain the covariance matrix of
          errors in `ydata`. In this case, the optimized function is
          ``chisq = r.T @ inv(sigma) @ r``.

          .. versionadded:: 0.19

    None (default) is equivalent of 1-D `sigma` filled with ones.
absolute_sigma : bool, optional
    If True, `sigma` is used in an absolute sense and the estimated parameter
    covariance `pcov` reflects these absolute values.

    If False (default), only the relative magnitudes of the `sigma` values matter.
    The returned parameter covariance matrix `pcov` is based on scaling
    `sigma` by a constant factor. This constant is set by demanding that the
    reduced `chisq` for the optimal parameters `popt` when using the
    *scaled* `sigma` equals unity. In other words, `sigma` is scaled to
    match the sample variance of the residuals after the fit. Default is False.
    Mathematically,
    ``pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(M-N)``
check_finite : bool, optional
    If True, check that the input arrays do not contain nans of infs,
    and raise a ValueError if they do. Setting this parameter to
    False may silently produce nonsensical results if the input arrays
    do contain nans. Default is True.
bounds : 2-tuple of array_like, optional
    Lower and upper bounds on parameters. Defaults to no bounds.
    Each element of the tuple must be either an array with the length equal
    to the number of parameters, or a scalar (in which case the bound is
    taken to be the same for all parameters). Use ``np.inf`` with an
    appropriate sign to disable bounds on all or some parameters.

    .. versionadded:: 0.17
method : {'lm', 'trf', 'dogbox'}, optional
    Method to use for optimization. See `least_squares` for more details.
    Default is 'lm' for unconstrained problems and 'trf' if `bounds` are
    provided. The method 'lm' won't work when the number of observations
    is less than the number of variables, use 'trf' or 'dogbox' in this
    case.

    .. versionadded:: 0.17
jac : callable, string or None, optional
    Function with signature ``jac(x, ...)`` which computes the Jacobian
    matrix of the model function with respect to parameters as a dense
    array_like structure. It will be scaled according to provided `sigma`.
    If None (default), the Jacobian will be estimated numerically.
    String keywords for 'trf' and 'dogbox' methods can be used to select
    a finite difference scheme, see `least_squares`.

    .. versionadded:: 0.18
full_output : boolean, optional
    If True, this function returns additioal information: `infodict`,
    `mesg`, and `ier`.

    .. versionadded:: 1.9
**kwargs
    Keyword arguments passed to `leastsq` for ``method='lm'`` or
    `least_squares` otherwise.

Returns
-------
popt : array
    Optimal values for the parameters so that the sum of the squared
    residuals of ``f(xdata, *popt) - ydata`` is minimized.
pcov : 2-D array
    The estimated covariance of popt. The diagonals provide the variance
    of the parameter estimate. To compute one standard deviation errors
    on the parameters use ``perr = np.sqrt(np.diag(pcov))``.

    How the `sigma` parameter affects the estimated covariance
    depends on `absolute_sigma` argument, as described above.

    If the Jacobian matrix at the solution doesn't have a full rank, then
    'lm' method returns a matrix filled with ``np.inf``, on the other hand
    'trf'  and 'dogbox' methods use Moore-Penrose pseudoinverse to compute
    the covariance matrix.
infodict : dict (returned only if `full_output` is True)
    a dictionary of optional outputs with the keys:

    ``nfev``
        The number of function calls. Methods 'trf' and 'dogbox' do not
        count function calls for numerical Jacobian approximation,
        as opposed to 'lm' method.
    ``fvec``
        The function values evaluated at the solution.
    ``fjac``
        A permutation of the R matrix of a QR
        factorization of the final approximate
        Jacobian matrix, stored column wise.
        Together with ipvt, the covariance of the
        estimate can be approximated.
        Method 'lm' only provides this information.
    ``ipvt``
        An integer array of length N which defines
        a permutation matrix, p, such that
        fjac*p = q*r, where r is upper triangular
        with diagonal elements of nonincreasing
        magnitude. Column j of p is column ipvt(j)
        of the identity matrix.
        Method 'lm' only provides this information.
    ``qtf``
        The vector (transpose(q) * fvec).
        Method 'lm' only provides this information.

    .. versionadded:: 1.9
mesg : str (returned only if `full_output` is True)
    A string message giving information about the solution.

    .. versionadded:: 1.9
ier : int (returnned only if `full_output` is True)
    An integer flag. If it is equal to 1, 2, 3 or 4, the solution was
    found. Otherwise, the solution was not found. In either case, the
    optional output variable `mesg` gives more information.

    .. versionadded:: 1.9

Raises
------
ValueError
    if either `ydata` or `xdata` contain NaNs, or if incompatible options
    are used.

RuntimeError
    if the least-squares minimization fails.

OptimizeWarning
    if covariance of the parameters can not be estimated.

See Also
--------
least_squares : Minimize the sum of squares of nonlinear functions.
scipy.stats.linregress : Calculate a linear least squares regression for
                         two sets of measurements.

Notes
-----
Users should ensure that inputs `xdata`, `ydata`, and the output of `f`
are ``float64``, or else the optimization may return incorrect results.

With ``method='lm'``, the algorithm uses the Levenberg-Marquardt algorithm
through `leastsq`. Note that this algorithm can only deal with
unconstrained problems.

Box constraints can be handled by methods 'trf' and 'dogbox'. Refer to
the docstring of `least_squares` for more information.

Examples
--------
>>> import matplotlib.pyplot as plt
>>> from scipy.optimize import curve_fit

>>> def func(x, a, b, c):
...     return a * np.exp(-b * x) + c

Define the data to be fit with some noise:

>>> xdata = np.linspace(0, 4, 50)
>>> y = func(xdata, 2.5, 1.3, 0.5)
>>> rng = np.random.default_rng()
>>> y_noise = 0.2 * rng.normal(size=xdata.size)
>>> ydata = y + y_noise
>>> plt.plot(xdata, ydata, 'b-', label='data')

Fit for the parameters a, b, c of the function `func`:

>>> popt, pcov = curve_fit(func, xdata, ydata)
>>> popt
array([2.56274217, 1.37268521, 0.47427475])
>>> plt.plot(xdata, func(xdata, *popt), 'r-',
...          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

Constrain the optimization to the region of ``0 <= a <= 3``,
``0 <= b <= 1`` and ``0 <= c <= 0.5``:

>>> popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 1., 0.5]))
>>> popt
array([2.43736712, 1.        , 0.34463856])
>>> plt.plot(xdata, func(xdata, *popt), 'g--',
...          label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

>>> plt.xlabel('x')
>>> plt.ylabel('y')
>>> plt.legend()
>>> plt.show()
File:      c:\users\win10hx64\appdata\local\programs\python\python310\lib\site-packages\scipy\optimize\_minpack_py.py
Type:      function

In [66]:

beta_opt, beta_cov = optimize.curve_fit(f, xdata, ydata)

In [67]:

beta_opt

Out[67]:

array([0.2607317 , 0.73566192, 0.49722151])

17. Object-Oriented Programming (0)	2024.03.14
16. Classes (0)	2024.03.14
14. scipy-basics (0)	2024.03.14
13. Linear Algebra (0)	2024.03.14
12. matplotlib-basics (0)	2024.03.14

15. scipy-optimization