Image data (Numpy array)

In this exercise, we will:

  1. have a NumPy array overview
  2. apply array slicing and reduction methods to remove bias from one optical CCD MODS1R frame, computing the bias levels of prescan/overscan regions

Keys: NumPy, array indexing, slicing, view and reduction, astropy.stats and masked arrays

Image Data

In [1]:
#Open the file
from astropy.io import fits

with fits.open('data/mods1r.20171125.0026.fits') as hdulist:
    #Retrieve image data copy
    data = hdulist[0].data.copy()
data
Out[1]:
array([[   0, 1373,  488, ...,  981, 2860,    0],
       [ 645,  317,  322, ...,  320,  302,  636],
       [ 644,  321,  324, ...,  320,  305,  639],
       ...,
       [ 645,  304,  322, ...,  335,  324,  670],
       [ 639,  302,  323, ...,  334,  323,  665],
       [   0, 1332,  499, ...,  756, 1942,    0]], dtype=uint16)

Numpy array

  • np.array is a grid of values
  • which is indexed by a tuple of integers
  • the indexing order is the C language order by default (row major, items go from 1st row left to right, then 2nd row and so on)
  • the number of dimensions is the rank of the array
  • the shape of an array is a tuple of integers giving the size of the array along each dimension
  • shape can be easily modified

In [2]:
import numpy as np

# 1 dimensional array
a = np.array([10., 20., 30., 40., 50.])

# 2 dimensional array
b = np.array([[ 10., 20., 30., 40.,  50.], \
              [ 60., 70., 80., 90., 100.]])

# 3 dimensional array
c = np.array([[[ 10.,  20.,  30.,  40.,  50.], \
               [ 60.,  70.,  80.,  90., 100.]],\
          
              [[ 110., 120., 130., 140., 150.], \
               [ 160., 170., 180., 190., 200.]],\
            
              [[ 210., 220., 230., 240., 250.], \
               [ 260., 270., 280., 290., 300.]]])

# and so on......
In [ ]:
 

Array reshaping

  • reshape to obtain new array
In [3]:
#1D array
a_shape = np.arange(0, 15)

#Create 2D array by reshape
b_shape = a_shape.reshape((3, 5))

BE CAREFUL: (we will see later) this is not a new array, but a view of the the original array

In [4]:
b_shape
Out[4]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

We can also reshape in place

In [5]:
a_shape.shape=(5,3)
a_shape
Out[5]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
In [ ]:
 

Array creation

There are several general mechanisms for creating arrays

In [ ]:
import numpy as np
#Array full of 1.0 of length 2
np.ones((2,))
#Empty array of length 4
np.empty((4,)) 
#Array full of -infinite of length
np.full((5,), -np.inf)
z=np.zeros((3,))
#Array with the same shape of z full of 3.14
np.full_like(z, 3.14) 
#Values are generated between the half-open interval [-2.3,1.5) 0.4 is the step
np.arange(-2.3, 1.5, 0.4) 
#Returns 5 evenly spaced numbers over the closed interval [-2.3, 1.5]
np.linspace(-2.3, 1.5, 5)
In [6]:
np.linspace(-2.3, 1.5, 5)
Out[6]:
array([-2.3 , -1.35, -0.4 ,  0.55,  1.5 ])
  • conversion from other Python structures (e.g., lists, tuples)
In [7]:
import numpy as np
a = np.array([-1, 0, 3])

Numpy array types

  • numpy arrays comprise elements of a single data type
  • the data type object is accessible through the .dtype attribute
In [ ]:
# Data type
data.dtype.type  ### All items have the same type: np.array are homogenus!
# Data type name
data.dtype.name
#Memory size (byte) of each elements
data.dtype.itemsize
In [8]:
a
Out[8]:
array([-1,  0,  3])
  • array dtypes are usually automatically inferred
In [9]:
#a = np.array([-1, 0, 3])
b = np.array([-1.,0.,3.])
a.dtype, b.dtype
Out[9]:
(dtype('int64'), dtype('float64'))
  • array dtype can be defined by the user
In [10]:
c=np.array(a, dtype=bool)
In [11]:
c
Out[11]:
array([ True, False,  True])

Indexing

  • indexes start from 0
In [12]:
a = 0.8*np.arange(0, 5) #Define the array
a[0] # First element
a[4] # Last element
Out[12]:
3.2
In [13]:
 a[4]
Out[13]:
3.2
  • negative indexes are allowed (from right to left)
In [14]:
a[-1] # Last element
a[-5] # First element
Out[14]:
0.0
In [15]:
a[-5]
Out[15]:
0.0
  • slices are allowed (open right intervals)
In [16]:
#Get 2nd and 3th element [1,3)
a[1:3]
Out[16]:
array([0.8, 1.6])
In [17]:
a[1:3]
Out[17]:
array([0.8, 1.6])

Basic Slicing

Basic slicing extends Python’s basic concept of slicing to N dimensions. Basic slicing occurs when obj is a slice object (constructed by start:stop:step notation inside of brackets)

Slicing and view

  • simple assignments do not create copies of an array (same as Python)
In [18]:
b=a
b is a # True
Out[18]:
True
In [ ]:
 
  • slicing operations do not create copies either: they return views of the original array
  • an array view contains a pointer to the original data
In [19]:
c = a[3:5]
c+=1
a #changed
Out[19]:
array([0. , 0.8, 1.6, 3.4, 4.2])
In [ ]:
 

... back to reshape

reshape returns a new view object as well

N-d array

Just define the array for examples

In [20]:
a=10.*np.arange(0, 40).reshape((5,8))
a
Out[20]:
array([[  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.],
       [ 80.,  90., 100., 110., 120., 130., 140., 150.],
       [160., 170., 180., 190., 200., 210., 220., 230.],
       [240., 250., 260., 270., 280., 290., 300., 310.],
       [320., 330., 340., 350., 360., 370., 380., 390.]])
  • retrieve a single element: row,col indexing
In [21]:
a[0][1] #Element of 1st row, 2nd column
a[0,1] #The same
Out[21]:
10.0

NOTE: arrays are used to map image data, but arrays indexing is different from image (x,y) coordinates

In [22]:
 a[0][1]
Out[22]:
10.0

Retrieve rows

In [23]:
a[0] #The 1st row
a[0,:]
Out[23]:
array([ 0., 10., 20., 30., 40., 50., 60., 70.])
In [24]:
a[0]
Out[24]:
array([ 0., 10., 20., 30., 40., 50., 60., 70.])

Retrieve columns

In [25]:
a[:,1] #The 2nd column
Out[25]:
array([ 10.,  90., 170., 250., 330.])
In [26]:
a[:,1]
Out[26]:
array([ 10.,  90., 170., 250., 330.])

Retrieve matrices

In [27]:
a[2:4,3:6]
a[::2,0::3]
Out[27]:
array([[  0.,  30.,  60.],
       [160., 190., 220.],
       [320., 350., 380.]])
In [ ]:
 

Advanced indexing

There are two types of advanced indexing: integer and boolean. Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

In [28]:
i = np.array([0, 2, 3])
j = np.array([1, 1, 5])
b = a[i,j]
b
Out[28]:
array([ 10., 170., 290.])
In [ ]:
 

In [29]:
i = np.array([True, False, False, True, True])
c = a[:,3][i] # b is not a view of a
c
Out[29]:
array([ 30., 270., 350.])
In [ ]:
 

Scalar operations

In [30]:
a=np.arange(1, 5)
  • sum/subtraction a-0.5
  • multiplication a*2.5
  • division a/2 (returns float)
  • floor division a//2 (rounded away from +infinity)
  • mod operation a%3
  • power a**2
  • numpy functions np.log(a)
In [31]:
-5//2.
Out[31]:
-3.0

Arrays operations

In [32]:
a=np.array([1., 2., 3.])
b=np.array([2., -1., 0.])
  • sum/subtraction a-b
  • multiplication a*b
  • float division a/b
  • floor division a//b
  • mod operation a%b
  • power a**2
  • dot product a.dot(b) or a@b
  • tensor product np.tensordot(a,b,axes=0)
In [33]:
np.tensordot(a,b,axes=0)
Out[33]:
array([[ 2., -1.,  0.],
       [ 4., -2.,  0.],
       [ 6., -3.,  0.]])

Reduction operations

Operations like sum, mean, max, ... are array reduction operations.

We call these reduction operations because operations like sum have the effect of slicing out n-D arrays from the input array, and reducing these n-D arrays to (n-m)-D arrays or scalars.

In [34]:
a=10.*np.arange(0, 40).reshape((5,8))
  • mean: a.mean()
  • std: a.std()
  • sum: a.sum()
  • min: data.min()
  • ...
In [ ]:
 

Rows sum

In [35]:
xprofile = data.sum(axis=0)
In [36]:
 xprofile
Out[36]:
array([1998260,  974663,  999642, ..., 1013854,  982238, 2025375],
      dtype=uint64)

Columns sum

In [37]:
yprofile = data.sum(axis=1)
np.allclose(xprofile.sum(), yprofile.sum())
Out[37]:
True
In [38]:
xprofile.sum()
Out[38]:
8939461635

Floats comparison

Due to rounding errors, most floating-point numbers end up being slightly imprecise. As long as this imprecision stays small, it can usually be ignored. However, it also means that numbers expected to be equal (e.g. when calculating the same result through different correct methods) often differ slightly, and a simple equality test fails

In [39]:
a=0.15+0.15
b=0.1+0.2
a==b
Out[39]:
False

numpy.allclose: use this function to compare arrays (or scalars)

If the following equation is element-wise True

$$ |a - b| \le a_{tol} + r_{tol}|b|$$

then allclose returns True.

In [40]:
a=np.array([.5, 1., 1.05+0.45])
b=np.array([.5, 1., 1.+0.5])
a==b
np.allclose(a,b)
Out[40]:
True
In [41]:
np.allclose?

Remove MODS bias

Load the MODS frame

In [42]:
with fits.open("data/mods1r.20171125.0026.fits") as hdulist:
    image = hdulist[0].data.astype(float)

NOTE: LUCI and MODS files have been downloaded from i2a archives in Trieste.

Quadrants and prescan regions

In [43]:
quadrants = [image[:3088//2,:8288//2],
             image[:3088//2,8288//2:],
             image[3088//2:,:8288//2],
             image[3088//2:,8288//2:]]
prscans = [image[:3088//2,:48],
           image[:3088//2,-48:],
           image[3088//2:,:48],
           image[3088//2:,-48:]]

Single prescan/overscan region

For each prescan region, we compute the bias level following this recipe:

  1. we compute the median levels along columns of the prescan region
  2. we compute to separated means: for even and odd channels.

8 Bias levels

Here we compute the bias level for each channel and remove these levels from image

In [44]:
from astropy import stats

for i in range(4):
  med = np.median(prscans[i], axis=0)
  bias_even = stats.sigma_clip(med[::2], sigma=2.0, maxiters=4)
  bias_odd = stats.sigma_clip(med[1::2], sigma=2.0, maxiters=4)
  
  quadrants[i][:, ::2]-=bias_even.mean()
  quadrants[i][:, 1::2]-=bias_odd.mean()
In [45]:
fits.PrimaryHDU(data=image).writeto("tmp/bias_sub.fits", overwrite=True)

Stats

The astropy.stats package holds statistical functions or algorithms used in astronomy. While the scipy.stats and statsmodel packages contains a wide range of statistical tools, they are general-purpose packages and are missing some tools that are particularly useful or specific to astronomy. This package is intended to provide such functionality, but not to replace scipy.stats if its implementation satisfies astronomers’ needs.

In [ ]:
 

Masked array

  • masked arrays are arrays that may have missing or invalid entries.
  • these arrays are implemented by the NumPy module numpy.ma
  • can be used in image processing to mask outliers, bad pixels, cosmic
  • in catalog columns to mark invalid or missing entries
  • ...

Create a masked array

In [46]:
a=np.ma.array([1,2,3],mask=[False,True,False])
In [ ]:
 

Get data

In [47]:
a.data
Out[47]:
array([1, 2, 3])
In [ ]:
 

Get mask

In [48]:
a.mask
Out[48]:
array([False,  True, False])
In [ ]:
 

data operations

  • sum no masked data (default)
In [49]:
a.sum()
Out[49]:
4
  • sum all data
In [50]:
a.data.sum()
Out[50]:
6
  • apply numpy functions
In [51]:
np.log(a)
Out[51]:
masked_array(data=[0.0, --, 1.0986122886681098],
             mask=[False,  True, False],
       fill_value=1e+20)

mask operations

  • get no masked data
In [52]:
a[~a.mask]
Out[52]:
masked_array(data=[1, 3],
             mask=[False, False],
       fill_value=999999)
  • change mask
In [53]:
a.mask=[True, False, False]
  • change mask assigning a vald value
In [54]:
a[0]=2
a
Out[54]:
masked_array(data=[2, 2, 3],
             mask=[False, False, False],
       fill_value=999999)
In [ ]: