Spectra extraction

In this example, we will create a wavelength calibration relation ($\lambda \rightarrow$ pix) and we will use it to extract a 2D (LUCI) spectrum lambda calibrated. We will see how to:

  • extract a sky 1D spectrum (the background spectrum)
  • fit expected line positions with a gaussian model
  • fit a polynomial to obtain the wavelength calibration relation
  • apply this relation on the raw frame to obtain 2D extracted spectrum
  • create the WCS of spectrum from scratch and save it

Keys: astropy models, numpy polynomial, interpolator, wcs

Wavelength calibration relation

The idea is to use OH lines (near infrared) of raw LUCI frame to derive the relation from Ã… to pixels, using OH lines (the sky lines) as reference positions.

In [1]:
# Load the frame data
import numpy as np
from astropy.io import fits

with fits.open('data/luci1.20170418.0030.fits.gz') as images:
    data = images[0].data.astype(float) #Convert int16 into floats

Create a sky spectrum

  • We identify a region on the frame near the spectrum to extract, without any other objects (the "sky region" in cyan on the picture).
  • Use this region to compute a wavelength calibration model $$p:\lambda \rightarrow pix\hspace{1cm} p(\lambda) = \sum_{i=0}^{n} c_i\lambda^i $$
  • Apply the model to the region containing the target (the blue region)

The strong assumption is: no optical distortions exist along the spatial direction, i.e. the variation in $\lambda$ is just along the $x$ direction of the picture (moving along $y$ axis $\lambda$ doesn't change)

The sky spectrum

Create the 1D spectrum

In [2]:
rows=data[1080:1090,:] # Select rows
sky=rows.mean(axis=0) # Create 1D sky spectrum

and display it

In [3]:
#%matplotlib qt
%matplotlib notebook
In [4]:
import matplotlib.pyplot as plt
fig = plt.figure("Sky 1D spectrum")
plt.plot(sky)
Out[4]:
[<matplotlib.lines.Line2D at 0x7fe6bdfa7be0>]
In [ ]:
 

Measure line positions

The idea is to find bright and isolated lines (we use Rousselot et al. 2000 as catalog) and compute their positions on the frame (e.g. fitting a gaussian)

In [5]:
#~600pix 16030.83A
flux=sky[560:640]
pix=np.arange(560, 640)

plt.cla()
plt.plot(pix, flux, 'ko', color='gray', alpha=0.5)
plt.plot(pix, flux, '--', color='gray', alpha=0.5)
Out[5]:
[<matplotlib.lines.Line2D at 0x7fe6bdfb8eb8>]

We can use scipy.otimize.curve_fit like in the previous example or use tools like astropy models

astropy.modelling

astropy.modeling provides a framework for representing models and performing model evaluation and fitting (with parameter constraints) of: 1-D models, 2-D models and compound models

In [6]:
from astropy.modeling import models
model=models.Gaussian1D(mean=600) #Define the model

This module provides also the Fitters: wrappers around some Numpy and Scipy fitting functions. All Fitters can be called as functions

In [7]:
from astropy.modeling import fitting
fitter=fitting.LevMarLSQFitter() # Levenberg-Marquardt algorithm and least squares statistic

Fit the line

In [8]:
gauss=fitter(model, pix, flux) #Compute the fit

In the model instance are stored the fitted parameters, they can be retrieved by name(.param_names)

In [9]:
gauss.param_names
Out[9]:
('amplitude', 'mean', 'stddev')
In [10]:
gauss.mean
Out[10]:
Parameter('mean', value=586.581072388651)
In [11]:
fitter.fit_info
Out[11]:
{'nfev': 42,
 'fvec': array([-1.48040000e+02, -1.69120000e+02, -1.83500000e+02, -1.82500000e+02,
        -1.62180000e+02, -1.45280000e+02, -1.23260000e+02, -1.05440000e+02,
        -9.80000000e+01, -9.44600000e+01, -1.19220000e+02, -1.70620000e+02,
        -2.23860000e+02, -2.72519999e+02, -2.98619984e+02, -2.77219657e+02,
        -2.43874405e+02, -2.14968982e+02, -2.20299436e+02, -2.52468905e+02,
        -2.63255561e+02, -1.79285708e+02,  1.39380709e+02,  3.70429558e+02,
        -1.86893091e+02, -2.20858134e+02,  2.32535538e+02,  3.41811859e-01,
        -1.57153119e+02,  1.52402154e+02,  1.47129207e+01, -1.17846488e+02,
        -1.28347848e+02, -1.42017109e+02, -2.05982321e+02, -2.44708439e+02,
        -2.67555290e+02, -2.72251406e+02, -2.33519452e+02, -1.68859973e+02,
        -1.45039999e+02, -1.61660000e+02, -1.61000000e+02, -1.62060000e+02,
        -1.68720000e+02, -1.38340000e+02, -1.16280000e+02, -1.22600000e+02,
        -1.18840000e+02, -1.12640000e+02, -1.10320000e+02, -9.70400000e+01,
        -9.94400000e+01, -9.95200000e+01, -9.28600000e+01, -1.01660000e+02,
        -9.93800000e+01, -1.56460000e+02, -2.13540000e+02, -2.45060000e+02,
        -2.59300000e+02, -2.37340000e+02, -1.55900000e+02, -1.14280000e+02,
        -9.42400000e+01, -9.11800000e+01, -1.19080000e+02, -1.55200000e+02,
        -1.60720000e+02, -1.71480000e+02, -1.47840000e+02, -1.15040000e+02,
        -8.95200000e+01, -8.07200000e+01, -7.45800000e+01, -7.59000000e+01,
        -7.14400000e+01, -6.82800000e+01, -7.24200000e+01, -7.46000000e+01]),
 'fjac': array([[-6.13688225e+003,  1.54557129e-034,  7.94289620e-032,
          3.16201516e-029,  9.74808935e-027,  2.32650436e-024,
          4.29689956e-022,  6.13885594e-020,  6.78086757e-018,
          5.78758753e-016,  3.81442549e-014,  1.93967769e-012,
          7.60289002e-011,  2.29438723e-009,  5.32312472e-008,
          9.47753763e-007,  1.29198626e-005,  1.34450133e-004,
          1.06386005e-003,  6.36597506e-003,  2.85846676e-002,
          9.52017995e-002,  2.30859648e-001,  3.94618606e-001,
          4.45679799e-001,  2.82572730e-001,  5.01209420e-002,
          2.65913305e-002,  2.41979265e-001,  4.33553737e-001,
          4.15003413e-001,  2.58175255e-001,  1.12370062e-001,
          3.54671243e-002,  8.28312787e-003,  1.44934136e-003,
          1.91575089e-004,  1.92395576e-005,  1.47416651e-006,
          8.64459962e-008,  3.88890232e-009,  1.34463988e-010,
          3.57874688e-012,  7.34055210e-014,  1.16153529e-015,
          1.41906608e-017,  1.33949996e-019,  9.77485412e-022,
          5.51729243e-024,  2.40979602e-026,  8.14776699e-029,
          2.13326924e-031,  4.32642094e-034,  6.79830186e-037,
          8.27867205e-040,  7.81445543e-043,  5.71866352e-046,
          3.24504043e-049,  1.42804008e-052,  4.87432253e-056,
          1.29061054e-059,  2.65113593e-063,  4.22541338e-067,
          5.22573856e-071,  5.01539281e-075,  3.73573091e-079,
          2.15969028e-083,  9.69130167e-088,  3.37578547e-092,
          9.12840776e-097,  1.91631232e-101,  3.12327780e-106,
          3.95228741e-111,  3.88328430e-116,  2.96265155e-121,
          1.75513090e-126,  8.07424264e-132,  2.88451152e-137,
          8.00267035e-143,  1.72426064e-148],
        [ 1.86164108e-011,  5.01074337e+003,  7.88272209e-033,
          3.27113567e-030,  1.05310847e-027,  2.62983817e-025,
          5.09313730e-023,  7.64802257e-021,  8.90251448e-019,
          8.03064490e-017,  5.61196230e-015,  3.03689981e-013,
          1.27200113e-011,  4.12126865e-010,  1.03216059e-008,
          1.99638850e-007,  2.97869858e-006,  3.42330425e-005,
          3.02441486e-004,  2.04848492e-003,  1.05958313e-002,
          4.16127248e-002,  1.22936167e-001,  2.68821173e-001,
          4.21232787e-001,  4.35992258e-001,  2.10422991e-001,
         -1.54842614e-001, -4.16019209e-001, -4.37236674e-001,
         -2.96113847e-001, -1.42526270e-001, -5.05864980e-002,
         -1.34791016e-002, -2.72364659e-003, -4.19963445e-004,
         -4.96175370e-005, -4.50473994e-006, -3.14933538e-007,
         -1.69808140e-008, -7.06979732e-010, -2.27494439e-011,
         -5.66206157e-013, -1.09064076e-014, -1.62670498e-016,
         -1.87947300e-018, -1.68273328e-020, -1.16781806e-022,
         -6.28385449e-025, -2.62218479e-027, -8.48729799e-030,
         -2.13116435e-032, -4.15211538e-035, -6.27744810e-038,
         -7.36559932e-041, -6.70793605e-044, -4.74204413e-047,
         -2.60240031e-050, -1.10878395e-053, -3.66786645e-057,
         -9.42107796e-061, -1.87902502e-064, -2.91025940e-068,
         -3.50040611e-072, -3.26972725e-076, -2.37207415e-080,
         -1.33654807e-084, -5.84918348e-089, -1.98826336e-093,
         -5.24968529e-098, -1.07667623e-102, -1.71530152e-107,
         -2.12280241e-112, -2.04080737e-117, -1.52414673e-122,
         -8.84285042e-128, -3.98571659e-133, -1.39565029e-138,
         -3.79673119e-144, -8.02440280e-150],
        [-1.08480838e+000,  4.19114817e-015,  1.53415072e+000,
          2.21195371e-029,  6.81249274e-027,  1.62406670e-024,
          2.99567933e-022,  4.27345212e-020,  4.71215082e-018,
          4.01364928e-016,  2.63882327e-014,  1.33793781e-012,
          5.22557422e-011,  1.57003118e-009,  3.62249919e-008,
          6.40427314e-007,  8.65012074e-006,  8.89072803e-005,
          6.91465506e-004,  4.03531715e-003,  1.74351179e-002,
          5.44563222e-002,  1.16951449e-001,  1.49548042e-001,
          3.36250663e-002, -2.75868079e-001, -5.89231310e-001,
         -6.18764229e-001, -3.34642172e-001, -5.23058065e-003,
          1.44051669e-001,  1.26920919e-001,  6.33547510e-002,
          2.14567733e-002,  5.22378117e-003,  9.38791793e-004,
          1.26377132e-004,  1.28586097e-005,  9.94818409e-007,
          5.87679347e-008,  2.65898845e-009,  9.23588211e-011,
          2.46721252e-012,  5.07597129e-014,  8.05220420e-016,
          9.85829550e-018,  9.32221857e-020,  6.81320928e-022,
          3.85070782e-024,  1.68380728e-026,  5.69882614e-029,
          1.49339441e-031,  3.03106457e-034,  4.76613790e-037,
          5.80756680e-040,  5.48493852e-043,  4.01590084e-046,
          2.27983258e-049,  1.00368933e-052,  3.42715020e-056,
          9.07736613e-060,  1.86521907e-063,  2.97364488e-067,
          3.67857557e-071,  3.53134515e-075,  2.63091026e-079,
          1.52128289e-083,  6.82782233e-088,  2.37876027e-092,
          6.43341106e-097,  1.35075995e-101,  2.20182967e-106,
          2.78662699e-111,  2.73831281e-116,  2.08936656e-121,
          1.23791460e-126,  5.69543952e-132,  2.03488273e-137,
          5.64600539e-143,  1.21659576e-148]]),
 'ipvt': array([3, 2, 1], dtype=int32),
 'qtf': array([-0.05315933, -0.04154148, -0.03812429]),
 'message': 'Both actual and predicted relative reductions in the sum of squares\n  are at most 0.000000',
 'ierr': 1,
 'param_jac': None,
 'param_cov': array([[ 1.29276110e+04, -1.08130729e-14, -2.28519632e+00],
        [-1.08130729e-14,  1.21185319e-03,  5.58760418e-18],
        [-2.28519632e+00,  5.58760418e-18,  1.21185319e-03]]),
 'njev': 37,
 'cov_x': array([[ 4.24877707e-01, -3.55381486e-19, -7.51050577e-05],
        [-3.55381486e-19,  3.98286585e-08,  1.83641699e-22],
        [-7.51050577e-05,  1.83641699e-22,  3.98286585e-08]])}

Errors of the fit on the parameters are stored in the fitter instance (same order of parameters model)

In [12]:
fitter.fit_info['param_cov'] #Covariance matrix on fit parameters
var =  np.diag(fitter.fit_info['param_cov']) # variance on paramters
err = np.sqrt(var)

Display the fit results

In [13]:
#Display the fitted model
xx=np.linspace(560, 640, 1000)
plt.plot(xx, gauss(xx), color='green')
Out[13]:
[<matplotlib.lines.Line2D at 0x7fe6bb860940>]

Repeat this recipe

Since we want obtain this measure for a set of lines, we iterate this fitting recipe for a pool of selected lines.

In the example directory there is the wcalib.py script which does that.

This script contains a dictionary with the lines (in Angstrom) and as values, the raw expected positions on the frame (in pixels) as keys. The script loops over the line positions and, for each line, computes a gaussian fit. The results are saved into the tmp/line_positions.dat file.

In [14]:
%pfile ../examples/wcalib.py
Object `../examples/wcalib.py` not found.
In [15]:
  %run ../examples/wcalib.py
Saved line positions in slides/tmp/line_positions.dat file
In [16]:
%cat tmp/line_positions.dat
# line  position  error
15597.6  7.16253  0.314887
15702.5  99.3664  0.293066
16030.8  395.693  0.0371374
16128.6  486.347  0.0263762
16235.4  586.581  0.0348117
16502.4  843.44  0.0446273
16553.8  893.983  0.0768731
16840.5  1181.87  0.059238
16903.7  1246.78  0.0473328
17008.8  1355.99  0.0407782
17123.7  1477.16  0.0364429
17330.9  1700.58  0.0870685
17427  1806.6  0.0943249
17450  1831.95  0.0553364

Display computed positions

In [17]:
#Load the data
positions = np.loadtxt("tmp/line_positions.dat")

plt.cla() #Clear the plot
plt.plot(sky) #Plot the sky
plt.vlines(positions[:,1], 0, sky.max(), color='red') #plot the lines

#Add labels to identify lines
for row in positions:
    plt.text(row[1], 8000, f"$\lambda$ {row[0]}", rotation='vertical', fontsize='x-large')
In [ ]:
 

Compound models

Models can be combined together in a simple way. Here we briefly see, how to combine models, to fit a double gaussian on a pair of lines quite close each other.

In [18]:
#Defined a sub region for fit with 2 sky lines very close
start, stop = 1705, 1750
pix = np.arange(start, stop)
lines = sky[start:stop]

fig = plt.figure()
plt.plot(pix, lines)
plt.plot(pix, lines, 'o', alpha=0.5)
Out[18]:
[<matplotlib.lines.Line2D at 0x7fe6b77b06a0>]
In [ ]:
 

Define and fit the compound model

In [19]:
#Defined the new model
cmodel = models.Gaussian1D(mean=1725, amplitude=700.) + \
        models.Gaussian1D(mean=1735, amplitude=700.) + \
        models.Const1D(amplitude=100)

#Use the previous fitting instance on the new compound model
double_gauss=fitter(cmodel, pix, lines)

#Check the results
xx=np.linspace(pix[0], pix[-1], 1000)
plt.plot(xx, double_gauss(xx), color='magenta')
Out[19]:
[<matplotlib.lines.Line2D at 0x7fe6bf751f60>]
In [ ]:
 

Fix and tied parameters

All fitters support fixed (see some examples) parameters through the fixed argument to models or setting the fixed attribute directly on a parameter.

A parameter can be tied (linked to another parameter).

In [20]:
def tiedf(model):
    return model.mean_0+15.0

cmodel.amplitude_0.fixed=True
cmodel.mean_1.tied = tiedf
In [21]:
tied_gauss=fitter(cmodel, pix, lines)
plt.plot(xx, tied_gauss(xx), color='green')
Out[21]:
[<matplotlib.lines.Line2D at 0x7fe6b778d668>]

Find the lambda calibration relation

We have to find the polynomial relation to convert wavelength values (in Angstrom, the expected lines positions) into pixels (the measured line positions in pixels on the frame)

$$ p(\lambda) = \sum_i c_i\lambda^i $$

numpy.polynomials

Polynomials in NumPy can be created, manipulated, and even fitted using the classes of the numpy.polynomial package.

The newer Polynomial package is more complete than older numpy.poly1d (prior to Numpy 1.4) and its convenience classes are better behaved in the numpy environment. Therefore Polynomial is recommended for new coding.

NOTE: The various routines in the Polynomial package all deal with series whose coefficients go from degree zero upward, which is the reverse order of the Poly1d convention.

Display the measures

We start displaying points and errors computed, just to have a rough idea about the relation we are looking for.

In [22]:
fig=plt.figure()
plt.errorbar(positions[:,0],positions[:,1],yerr=positions[:,2],fmt='.r')
Out[22]:
<ErrorbarContainer object of 3 artists>
In [ ]:
 

$$p(\lambda)$$

In [23]:
from numpy.polynomial import polynomial

waves = positions[:, 0]
pixels = positions[:, 1]
errs = positions[:, 2]
 
c3=polynomial.polyfit(waves, pixels, 3, w=1/errs)
c3
Out[23]:
array([-1.75683674e+04,  2.44549412e+00, -1.52856816e-04,  4.37968118e-09])

Create the polynomial

In [24]:
# Define the polynomial starting from it's coefficients
p3 = polynomial.Polynomial(c3)
p3
Out[24]:
$x \mapsto \text{-17568.36735811161} + \text{2.445494116264633}\,x - \text{0.0001528568162202772}\,x^{2} + \text{4.379681180598495e-09}\,x^{3}$
In [25]:
p3.deriv()
Out[25]:
$x \mapsto \text{2.445494116264633} - \text{0.0003057136324405544}\,x + \text{1.3139043541795484e-08}\,x^{2}$

Plot polynomial and residuals

In [26]:
#Define lambda range for plot reasons
w = np.linspace(waves[0], waves[-1], 10000)

#New plotting window with 2 subplots
fig=plt.figure()
ax1=fig.add_subplot(121)

#The points
ax1=plt.errorbar(waves, pixels,yerr=errs,fmt='.r', label="positions") 

#The polynomial p3(w)=evaluate polynomial in all points of w
ax1=plt.plot(w, p3(w), color='green', label="3rd order")
plt.legend()

#The with residuals
ax2 = fig.add_subplot(122)
ax2.plot(waves, p3(waves)-pixels, 'o', color='orange')

#Add a reference line
ax2.hlines(0, waves[0], waves[-1], ls='--', alpha=0.3)
Out[26]:
<matplotlib.collections.LineCollection at 0x7fe6bf661dd8>
In [ ]:
 

Extract the spectra

Select the 2D spectrum region, not lambda calibrated and display it

In [27]:
#Select a slice containing the target
spectrum = data[1075:1125]

#Select a log scale to highlight the object
from matplotlib.colors import LogNorm
fig = plt.figure()
ax=fig.add_subplot(111)
ax.imshow(spectrum[:,500:700], cmap='gray', norm=LogNorm())
Out[27]:
<matplotlib.image.AxesImage at 0x7fe6bf61ca90>

Apply wavelength calibration

As final step we want to apply the p3 relation row by row independently. We are allow to do that, since in this example we are assuming no variation of the calibration solution along the cross dispersion direction.

The p3 gives a the "float" pixel of each lambda we want to extract.

In [28]:
wrange=np.arange(16000., 17300, 0.8)
pixs=p3(wrange) # The "float" pixels
wrange,pixs
Out[28]:
(array([16000. , 16000.8, 16001.6, ..., 17297.6, 17298.4, 17299.2]),
 array([ 367.36766546,  368.1018391 ,  368.83608617, ..., 1664.30078055,
        1665.17176845, 1666.04285161]))
In [ ]:
 

This is a spectrum row details. Red crosses are the point where we want extract the spectrum according with the p3 solution (a subset of pixs array) Since we want to have the intensity of the spectrum at these "float" pixels, we must interpolate some how the spectrum in these points

Interpolate the spectrum

There are several general interpolation facilities available in SciPy (Scipy interpolator), for data in 1, 2, and higher dimensions:

  • a class representing an interpolant (interp1d) in 1-D, offering several interpolation methods.

  • convenience function griddata offering a simple interface to interpolation in N dimensions (N = 1, 2, 3, 4, …)

  • functions for 1- and 2-dimensional (smoothed) cubic-spline interpolation, based on the FORTRAN library FITPACK.

In [29]:
from scipy.interpolate import interp1d
interp=interp1d(np.arange(len(spectrum[0])), spectrum, kind='cubic', bounds_error=False)

NOTE: we created the interpolator object with x.shape=(N,) and y.shape=(...,N)

The row detail show before, with the interpolated values

Creating the interpolator in this way: x.shape=(N,) and y.shape=(...,N)) allow us to compute the spectrum 2D interpolated using a single call, without interpolate row by row and loop over rows:

In [ ]:
# Less efficient way (and less clear code...) to interpolate the spectrum
spec2d = []
for row in spectrum:
    interp_row=interp1d(np.arange(len(row)), row, kind='cubic', bounds_error=False)
    spec2d.append(interp_row(pixs))
In [30]:
spec2d=interp(pixs)

Create the WCS from scratch

spec2d object is a 2D spectrum wavelength calibrated. To properly display it, we create a WCS for this spectrum from scratch

In [31]:
from astropy import wcs
from astropy import units as u

w = wcs.WCS(naxis=2)

w.wcs.crpix = [1, 1]

# Along x (dispersion direction) cdelt is the delta lambda increment
# Along y (the cross dispersion direction) no rebinning has been performed cdelt is 1
w.wcs.cdelt = np.array([0.8, 1])

# Starting wavelength along x is 16000A. Starting pixel is 0 (numpy like)
w.wcs.crval = [16000, 0]

w.wcs.cunit =[u.Angstrom, u.pixel]
In [32]:
ax=plt.subplot(projection=w)
ax.imshow(spec2d, cmap='autumn', norm=LogNorm())
Out[32]:
<matplotlib.image.AxesImage at 0x7fe6bf634080>
In [33]:
specfile = fits.PrimaryHDU(data=spec2d, header=w.to_header())
specfile.writeto("tmp/spectrum.fits", overwrite=True)

2D fit with numpy.polynomial

The polynomial package provides the polyfit function, just for 1D fitting. For higher dimensional fit numpy doesn't provide a function, but we can use the Pseudo-Vandermonde matrix to perform the fit.

We are going to see how to use it on our usual saddle.dat dataset.

In [34]:
x,y,z=np.loadtxt("data/saddle.dat", unpack=True)

from mpl_toolkits.mplot3d import Axes3D
#Defined a 3D projection figure
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')

#Plot the data
ax.scatter(x,y,z)
Out[34]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7fe6be99a7b8>

2D poly fit in a nutshell

Given a set of data $x_k,y_k,z_k$ we can rewrite the polynomial relation

$$ z_k = \sum_{i,j} c_{i,j}x_k^iy_k^j\hspace{2cm} (z=ax^2-by^2+c)$$

in matrix notation

$$ z = V c \hspace{2cm} z_k=V_kc$$

where V is the Pseudo-Vandermonde matrix defined by (here for details)

$$ V_k = [1, y, ..., y^m, x, xy, ..., xy^m, ..., x^n, x^ny, ..., x^ny^m]$$

Since we want to find the fit coefficients, we must solve (her for details)

$$ Vc=z \hspace{2cm} V^TVc=V^Tz \hspace{2cm} c=\underbrace{(V^TV)^{-1}V^T}_Hz$$

$(V^TV)^{-1}Vz$ in python...

In [35]:
# Pseudo-Vandermonde
V = polynomial.polyvander2d(x, y, (2, 2))

#Compute H
H = np.linalg.inv(V.T@V)@V.T

#Compute the coeffs
coeffs = H@z

#Reshape them according with polyval2d interface
coeffs.shape = (3,3)

#Plot the fitted surface
from matplotlib import cm

X,Y = np.meshgrid(np.linspace(-10,10,20), np.linspace(-10,10,20))

zfit = polynomial.polyval2d(X, Y, coeffs)
surf = ax.plot_surface(X, Y, zfit, cmap=cm.rainbow,
                       linewidth=0, alpha=0.5)
In [ ]: