$\gamma$-ray spectra visualization and fit

In this example we want fit a power law function on one high energy $\gamma$-ray spectrum (Aharonian et al 2006) acquired by HESS. We will see, how to:

  1. visualize the spectrum
  2. overlay on spectrum the model already computed
  3. fit a new model
  4. compute the integrated energy of the 2 models

Key: error plot, scatter plot, curve fit, integration

Explore the file

In [1]:
from astropy.io import fits

tev = fits.open('data/TeVJ1826-148.fits')
tev.info()
Filename: data/TeVJ1826-148.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       9   ()      
  1  HESS_INFC     1 BinTableHDU     20   26R x 4C   [1E, 1E, 1E, 1E]   
  2  MODEL_HESS_INFC    1 BinTableHDU     20   300R x 4C   [1E, 1E, 1E, 1E]   
  3  MODEL_HESS_INFC_MAN    1 BinTableHDU     21   300R x 4C   [1E, 1E, 1E, 1E]   
In [ ]:
 

Open the proper extension (HESS_INFC) and retrieve just data

In [2]:
infc=tev['HESS_INFC'].data
In [3]:
infc
Out[3]:
FITS_rec([( 0.195958  , 6.0770583e-11, 1.2443476e-11, 1.4777032e-11),
          ( 0.241226  , 4.1031869e-11, 4.0196353e-12, 7.6927215e-12),
          ( 0.301863  , 2.6202924e-11, 3.3654117e-12, 3.1804041e-12),
          ( 0.369569  , 1.7682928e-11, 1.7322793e-12, 2.3747540e-12),
          ( 0.45494398, 1.0170147e-11, 1.3062176e-12, 1.2344121e-12),
          ( 0.553947  , 8.5277436e-12, 9.2303097e-13, 1.1452271e-12),
          ( 0.681913  , 7.0764371e-12, 5.4529211e-13, 5.9083592e-13),
          ( 0.83030903, 3.2332483e-12, 4.4736605e-13, 4.7647048e-13),
          ( 1.02772   , 1.9477987e-12, 2.3061205e-13, 2.8703380e-13),
          ( 1.24455   , 1.8305073e-12, 1.9813155e-13, 2.6975433e-13),
          ( 1.52369   , 1.1143001e-12, 8.5867430e-14, 1.9383865e-13),
          ( 1.89631   , 5.1605725e-13, 8.6426261e-14, 9.6751478e-14),
          ( 2.30898   , 5.1384363e-13, 7.6140808e-14, 9.6336303e-14),
          ( 2.82687   , 3.8885307e-13, 7.2455070e-14, 5.2221624e-14),
          ( 3.49899   , 1.9280053e-13, 2.8568985e-14, 3.8784521e-14),
          ( 4.26044   , 1.5092543e-13, 3.2270847e-14, 2.6254316e-14),
          ( 5.21602   , 7.4754798e-14, 1.9255756e-14, 1.6072552e-14),
          ( 6.35114   , 5.4631165e-14, 1.1681232e-14, 1.3284150e-14),
          ( 7.73325   , 2.1507434e-14, 7.4304602e-15, 8.1334958e-15),
          ( 9.416121  , 1.5717886e-14, 5.4302965e-15, 6.7014456e-15),
          (11.5913    , 7.5261020e-15, 3.2328266e-15, 4.5117955e-15),
          (14.664599  , 0.0000000e+00, 0.0000000e+00, 3.1852115e-15),
          (17.9538    , 1.7490052e-15, 1.1180409e-15, 1.5729794e-15),
          (21.7416    , 0.0000000e+00, 0.0000000e+00, 1.4490910e-15),
          (26.185     , 1.4579962e-15, 8.4051572e-16, 1.4745043e-15),
          (31.1933    , 0.0000000e+00, 0.0000000e+00, 1.6060789e-16)],
         dtype=(numpy.record, [('ENERGY', '>f4'), ('FLUX', '>f4'), ('FLUX_ERROR_MIN', '>f4'), ('FLUX_ERROR_MAX', '>f4')]))

prepare the plot

In [4]:
#%matplotlib qt
%matplotlib notebook
In [5]:
from matplotlib import pyplot as plt

fig = plt.figure("HESS Spectrum")
ax = fig.subplots()
In [ ]:
 

Error bars

Display points and errors

In [6]:
ax.errorbar(infc['ENERGY'], infc['FLUX'], yerr=[infc['FLUX_ERROR_MIN'], infc['FLUX_ERROR_MAX']], 
            label='HESS INFC', color='red', fmt='o')
Out[6]:
<ErrorbarContainer object of 3 artists>
In [7]:
 infc
Out[7]:
FITS_rec([( 0.195958  , 6.0770583e-11, 1.2443476e-11, 1.4777032e-11),
          ( 0.241226  , 4.1031869e-11, 4.0196353e-12, 7.6927215e-12),
          ( 0.301863  , 2.6202924e-11, 3.3654117e-12, 3.1804041e-12),
          ( 0.369569  , 1.7682928e-11, 1.7322793e-12, 2.3747540e-12),
          ( 0.45494398, 1.0170147e-11, 1.3062176e-12, 1.2344121e-12),
          ( 0.553947  , 8.5277436e-12, 9.2303097e-13, 1.1452271e-12),
          ( 0.681913  , 7.0764371e-12, 5.4529211e-13, 5.9083592e-13),
          ( 0.83030903, 3.2332483e-12, 4.4736605e-13, 4.7647048e-13),
          ( 1.02772   , 1.9477987e-12, 2.3061205e-13, 2.8703380e-13),
          ( 1.24455   , 1.8305073e-12, 1.9813155e-13, 2.6975433e-13),
          ( 1.52369   , 1.1143001e-12, 8.5867430e-14, 1.9383865e-13),
          ( 1.89631   , 5.1605725e-13, 8.6426261e-14, 9.6751478e-14),
          ( 2.30898   , 5.1384363e-13, 7.6140808e-14, 9.6336303e-14),
          ( 2.82687   , 3.8885307e-13, 7.2455070e-14, 5.2221624e-14),
          ( 3.49899   , 1.9280053e-13, 2.8568985e-14, 3.8784521e-14),
          ( 4.26044   , 1.5092543e-13, 3.2270847e-14, 2.6254316e-14),
          ( 5.21602   , 7.4754798e-14, 1.9255756e-14, 1.6072552e-14),
          ( 6.35114   , 5.4631165e-14, 1.1681232e-14, 1.3284150e-14),
          ( 7.73325   , 2.1507434e-14, 7.4304602e-15, 8.1334958e-15),
          ( 9.416121  , 1.5717886e-14, 5.4302965e-15, 6.7014456e-15),
          (11.5913    , 7.5261020e-15, 3.2328266e-15, 4.5117955e-15),
          (14.664599  , 0.0000000e+00, 0.0000000e+00, 3.1852115e-15),
          (17.9538    , 1.7490052e-15, 1.1180409e-15, 1.5729794e-15),
          (21.7416    , 0.0000000e+00, 0.0000000e+00, 1.4490910e-15),
          (26.185     , 1.4579962e-15, 8.4051572e-16, 1.4745043e-15),
          (31.1933    , 0.0000000e+00, 0.0000000e+00, 1.6060789e-16)],
         dtype=(numpy.record, [('ENERGY', '>f4'), ('FLUX', '>f4'), ('FLUX_ERROR_MIN', '>f4'), ('FLUX_ERROR_MAX', '>f4')]))

Plot setup

Setup the plot: scales, axes labels and legend

In [8]:
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("E(TeV)")
ax.set_ylabel(r"Flux ($\frac{ph}{cm^2sTev}$)") #raw string, no characters escaping
Out[8]:
Text(0, 0.5, 'Flux ($\\frac{ph}{cm^2sTev}$)')
In [ ]:
 

In the same way we can retrieve data model, and plot it as a solid line without error

In [9]:
#Load the model
model=tev['MODEL_HESS_INFC'].data
ax.plot(model['ENERGY'], model['FLUX'], label='model', color='gray')
ax.legend()
Out[9]:
<matplotlib.legend.Legend at 0x7f1193ab3240>
In [10]:
model=tev['MODEL_HESS_INFC'].data
model.dtype
Out[10]:
dtype((numpy.record, [('ENERGY', '>f4'), ('FLUX', '>f4'), ('FLUX_ERROR_MIN', '>f4'), ('FLUX_ERROR_MAX', '>f4')]))

Fit a function

There are several ways to compute a fit in astropy, NumPy or SciPy.

NumPy and astropy.modeling modules offer a lot of prepared models for fitting

SciPy provides us a general purpose function for fit: curve_fit

In this example we will use SciPy and curve_fit.

In [11]:
from scipy.optimize import curve_fit
In [ ]:
 

optimize.curve_fit

curve_fit requires the definition of our function.

Signature of this function MUST be ydata = f(xdata, *params)

In our case we decide to fit a power law like that

$$N x^{-\alpha} e^{-\frac{x}{x_{off}}}$$

The parameters of the function are: $N$, $\alpha$ and $x_{off}$

In [12]:
import numpy as np

def powerlaw(x, N, alpha, x_off):
    return N * np.power(x, -alpha) * np.exp(-x/x_off)
In [ ]:
 
  • use the function with scalar
In [ ]:
powerlaw(1, 3.0, 2.0, 0.5)
In [13]:
powerlaw(1, 3.0, 2.0, 0.5)
Out[13]:
0.4060058497098381
  • and with numpy.array objects
In [ ]:
_x=np.array([0.5, 1.0, 2.0])
powerlaw(_x, 3.0, 2.0, 0.5)
In [14]:
_x=np.array([0.5, 1.0, 2.0])
powerlaw(_x, 3.0, 2.0, 0.5)
Out[14]:
array([4.41455329, 0.40600585, 0.01373673])

Compute the fit

In [15]:
xdata = infc['ENERGY']
ydata = infc['FLUX']
yerr = infc['FLUX_ERROR_MAX']
popt, pcov=curve_fit(powerlaw, xdata, ydata, sigma=yerr, p0=(1.0, 1, 6.0))
popt, pcov
Out[15]:
(array([2.97021086e-12, 1.82433667e+00, 7.68035476e+00]),
 array([[ 3.94792250e-26, -1.02996866e-14, -3.28658276e-13],
        [-1.02996866e-14,  5.28959877e-03,  1.20696691e-01],
        [-3.28658276e-13,  1.20696691e-01,  3.97762247e+00]]))
In [16]:
curve_fit?

Compute the $\chi^{2}$

$$ \chi^2 = \sum_j^n \left( \frac{f(x_j) - y_j}{y_{err}} \right)^2 $$$$ \chi^2_{r} = \frac{\chi^2}{n-n_{p}} $$
In [17]:
import numpy as np
chisq = np.sum(((powerlaw(xdata, *popt) - ydata)/yerr)**2)
reduced_chisq = (chisq)/(len(xdata)-len(popt))
In [18]:
powerlaw(xdata, *popt)
Out[18]:
array([5.6629090e-11, 3.8531258e-11, 2.5393347e-11, 1.7400369e-11,
       1.1777737e-11, 8.1182812e-12, 5.4646461e-12, 3.7425735e-12,
       2.4717812e-12, 1.6946432e-12, 1.1297008e-12, 7.2203468e-13,
       4.7777412e-13, 3.0874850e-13, 1.9168949e-13, 1.2121016e-13,
       7.3990015e-14, 4.4563884e-14, 2.5991438e-14, 1.4577113e-14,
       7.5163781e-15, 3.2801306e-15, 1.4776295e-15, 6.3637580e-16,
       2.5416812e-16, 9.6217808e-17], dtype=float32)

Plot the final model

In [19]:
# Compute the fitted function on the grid of x
x=np.linspace(xdata[0], xdata[-1], 1000)
yfit=powerlaw(x, *popt)

#Setup a new figure
fig=plt.figure()
ax = fig.subplots()

#Plot the fitted function
ax.plot(x, yfit, color='blue', label='powerlaw fit')

#Plot data
ax.errorbar(xdata, ydata, yerr=[infc['FLUX_ERROR_MIN'], infc['FLUX_ERROR_MAX']], label='HESS', color='red', fmt='o')

#Plot original model
ax.plot(model['ENERGY'], model['FLUX'], label='model', color='gray', ls='--')
ax.legend()
ax.set_xscale('log')
ax.set_yscale('log')

Fix some parameters

curve_fit assumes $f$ must have this signature ydata = f(xdata, *params) there are not additional options for fixed parameters.

curve_fit provides also the bounds parameter to set a lower and upper limits in parameters space exploration. In any case each lower bound must be strictly less than each upper bound.

In [ ]:
# This call raises an error!
popt, pcov=curve_fit(powerlaw, xdata, ydata, sigma=yerr, p0=(2.97, 1, 6.0), bounds=((2.97, 0, 0),
                                                                                    (2.97, 10., np.inf)))

Goal: we must find a way provide some parameter to the final function (the powerlaw in this case), but this parameter can not be a function argument.

We can define an auxiliary function and take advantage of variables scopes.

Variables scope

Variables can only reach the area in which they are defined, which is called scope. Think of it as the area of code where variables can be used. Python supports global variables (usable in the entire program) and local variables.

By default, all variables declared in a function are local variables. To access a global variable inside a function, it’s required to explicitly define global variable.

In [ ]:
def powerlaw(x, N, alpha, x_off):
    return N * np.power(x, -alpha) * np.exp(-x/x_off)

We can define "new" powerlaw function, with only 2 free parameter: $\alpha$ and $x_{off}$. The N parameter is defined out of powerlaw function.

In [ ]:
    def powerlaw(x, alpha, x_off):
        return N * np.power(x, -alpha) * np.exp(-x/x_off)

We nest powerlaw inside another function which provides N. In this way N is defined outside powerlaw, but it is in the powerlaw scope.

In [20]:
def powerlawN(N):
    # Variable N lives in this scope
    def powerlaw(x, alpha, x_off):
        return N * np.power(x, -alpha) * np.exp(-x/x_off)
    return powerlaw

powerlawN(2.97e-12) returns a powerlaw with a fixed N, and we can call the curve_fit in this way

In [21]:
popt2, pcov2=curve_fit(powerlawN(2.97e-12), xdata, ydata, sigma=yerr, p0=(1, 6.0))
popt2,pcov2
Out[21]:
(array([1.82439046, 7.68200904]), array([[0.00249409, 0.03350657],
        [0.03350657, 1.19053626]]))
In [ ]:
 

Comute the integrate energy: scipy.integrate

The scipy.integrate sub-package provides several integration techniques:

  • Methods for Integrating Functions given function object
  • Methods for Integrating Functions given fixed samples
  • Interface to numerical integrators of ODE systems

We use quad function, which use a technique from the Fortran library QUADPACK, on the analytic model

In [22]:
from scipy.integrate import quad

value, err = quad(powerlaw, 0.2, 20, args=tuple(popt))
value
Out[22]:
1.1697759599647136e-11

And the simps function, which use the composite Simpson’s rule, to integrate the sampled model.

In [23]:
from scipy.integrate import simps

idx = (model['ENERGY'] >=0.2) & (model['ENERGY'] <=20.0)
value = simps(model['FLUX'][idx], model['ENERGY'][idx])
value
Out[23]:
9.068625001793289e-12
In [24]:
idx
Out[24]:
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])

Fill area in plot

In [25]:
xfill = np.linspace(0.2, 20, 100)
ax.fill_between(xfill, 0, powerlaw(xfill, *popt), color='orange')
Out[25]:
<matplotlib.collections.PolyCollection at 0x7f11937d2198>
In [26]:
ax
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f11926e1160>

and close the file

In [27]:
tev.close()

Optional arguments

args and kwargs

In [28]:
def sum(*args):
    s=0
    for i in args:
        s+=i
    print(s)
In [ ]:
 
In [ ]:
sum(1, 2, 3)
sum([1, 2, 3]) # Error!
sum(*[1, 2, 3])
In [29]:
def show(**kargs):
    print(kargs)
    print("keys", *kargs)
In [30]:
show(a=1, b="x")
{'a': 1, 'b': 'x'}
keys a b

curve_fit: N dimensional fit

In this example we fit the hyperbolic paraboloid defined by 3 parameters (a, b, c):

$$ z = ax^2-by^2+c $$

By Nicoguaro - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=20570051

load the observed point

In [31]:
x,y,z=np.loadtxt("data/saddle.dat", unpack=True)

Plot the points in 3D plot

In [32]:
from mpl_toolkits.mplot3d import Axes3D
In [33]:
#Defined a 3D projection figure
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')

#Plot the data
ax.scatter(x,y,z)
Out[33]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7f1192109630>
In [ ]:
 

Define the hyperbolic paraboloid function

In [34]:
def hypar(x, y, a, b, c):
  return a*x*x - b*y*y + c

PROBLEM: we can not use hypar function with curve_fit

$f$ interface must be ydata = f(xdata, *params)

In [ ]:
popt, pcov = curve_fit(hypar, x, y , z) # x and y not allow!

Define a new interface, where $x$ and $y$ data are packed together into a generic data

In [35]:
def hypar_interface(data, a, b, c):
  _x = data[:len(data)//2]
  _y = data[len(data)//2:]
  return hypar(_x, _y, a, b, c)

prepare the input data for hypar_interface

In [36]:
#Concatenate x,y for curve_fit
input_data = np.concatenate((x, y))

compute the fit

In [ ]:
#The invalid syntax just for comparison
popt, pcov = curve_fit(hypar, x, y , z)
In [37]:
#Compute the fit
popt, pcov = curve_fit(hypar_interface, input_data, z)
popt, pcov
Out[37]:
(array([2.11000788, 3.51269877, 1.03722845]),
 array([[ 5.97524246e-05,  1.52870003e-06, -1.95994463e-03],
        [ 1.52870003e-06,  5.87180572e-05,  1.83185189e-03],
        [-1.95994463e-03,  1.83185189e-03,  1.77500926e-01]]))

Show the results

Evaluate the function on a regular grid of X and Y. Plot the surface obtained by the fit

In [38]:
#Evaluate the function on a grid of values
X,Y = np.meshgrid(np.linspace(-10,10,20), np.linspace(-10,10,20))

zfit = hypar(X, Y, *popt)

#Plot the surface
from matplotlib import cm
surf = ax.plot_surface(X, Y, zfit, cmap=cm.rainbow,
                       linewidth=0, alpha=0.5)
In [ ]: