Image data (Numpy array)¶

In this exercise, we will:

have a NumPy array overview
apply array slicing and reduction methods to remove bias from one optical CCD MODS1R frame, computing the bias levels of prescan/overscan regions

Keys: NumPy, array indexing, slicing, view and reduction, astropy.stats and masked arrays

Image Data¶

In [1]:

#Open the file
from astropy.io import fits

with fits.open('data/mods1r.20171125.0026.fits') as hdulist:
    #Retrieve image data copy
    data = hdulist[0].data.copy()
data

Out[1]:

array([[   0, 1373,  488, ...,  981, 2860,    0],
       [ 645,  317,  322, ...,  320,  302,  636],
       [ 644,  321,  324, ...,  320,  305,  639],
       ...,
       [ 645,  304,  322, ...,  335,  324,  670],
       [ 639,  302,  323, ...,  334,  323,  665],
       [   0, 1332,  499, ...,  756, 1942,    0]], dtype=uint16)

Numpy array¶

np.array is a grid of values
which is indexed by a tuple of integers
the indexing order is the C language order by default (row major, items go from 1st row left to right, then 2nd row and so on)
the number of dimensions is the rank of the array
the shape of an array is a tuple of integers giving the size of the array along each dimension
shape can be easily modified

In [2]:

import numpy as np

# 1 dimensional array
a = np.array([10., 20., 30., 40., 50.])

# 2 dimensional array
b = np.array([[ 10., 20., 30., 40.,  50.], \
              [ 60., 70., 80., 90., 100.]])

# 3 dimensional array
c = np.array([[[ 10.,  20.,  30.,  40.,  50.], \
               [ 60.,  70.,  80.,  90., 100.]],\
          
              [[ 110., 120., 130., 140., 150.], \
               [ 160., 170., 180., 190., 200.]],\
            
              [[ 210., 220., 230., 240., 250.], \
               [ 260., 270., 280., 290., 300.]]])

# and so on......

In [ ]:

Array reshaping¶

reshape to obtain new array

In [3]:

#1D array
a_shape = np.arange(0, 15)

#Create 2D array by reshape
b_shape = a_shape.reshape((3, 5))

BE CAREFUL: (we will see later) this is not a new array, but a view of the the original array

In [4]:

b_shape

Out[4]:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

We can also reshape in place

In [5]:

a_shape.shape=(5,3)
a_shape

Out[5]:

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [ ]:

Array creation¶

There are several general mechanisms for creating arrays

intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)

In [ ]:

import numpy as np
#Array full of 1.0 of length 2
np.ones((2,))
#Empty array of length 4
np.empty((4,)) 
#Array full of -infinite of length
np.full((5,), -np.inf)
z=np.zeros((3,))
#Array with the same shape of z full of 3.14
np.full_like(z, 3.14) 
#Values are generated between the half-open interval [-2.3,1.5) 0.4 is the step
np.arange(-2.3, 1.5, 0.4) 
#Returns 5 evenly spaced numbers over the closed interval [-2.3, 1.5]
np.linspace(-2.3, 1.5, 5)

In [6]:

np.linspace(-2.3, 1.5, 5)

Out[6]:

array([-2.3 , -1.35, -0.4 ,  0.55,  1.5 ])

conversion from other Python structures (e.g., lists, tuples)

In [7]:

import numpy as np
a = np.array([-1, 0, 3])

Numpy array types¶

numpy arrays comprise elements of a single data type
the data type object is accessible through the .dtype attribute

In [ ]:

# Data type
data.dtype.type  ### All items have the same type: np.array are homogenus!
# Data type name
data.dtype.name
#Memory size (byte) of each elements
data.dtype.itemsize

In [8]:

Out[8]:

array([-1,  0,  3])

array dtypes are usually automatically inferred

In [9]:

#a = np.array([-1, 0, 3])
b = np.array([-1.,0.,3.])
a.dtype, b.dtype

Out[9]:

(dtype('int64'), dtype('float64'))

array dtype can be defined by the user

In [10]:

c=np.array(a, dtype=bool)

In [11]:

Out[11]:

array([ True, False,  True])

Indexing¶

indexes start from 0

In [12]:

a = 0.8*np.arange(0, 5) #Define the array
a[0] # First element
a[4] # Last element

Out[12]:

3.2

In [13]:

 a[4]

Out[13]:

3.2

negative indexes are allowed (from right to left)

In [14]:

a[-1] # Last element
a[-5] # First element

Out[14]:

0.0

In [15]:

a[-5]

Out[15]:

0.0

slices are allowed (open right intervals)

In [16]:

#Get 2nd and 3th element [1,3)
a[1:3]

Out[16]:

array([0.8, 1.6])

In [17]:

a[1:3]

Out[17]:

array([0.8, 1.6])

Basic Slicing¶

Basic slicing extends Python’s basic concept of slicing to N dimensions. Basic slicing occurs when obj is a slice object (constructed by start:stop:step notation inside of brackets)

Slicing and view¶

simple assignments do not create copies of an array (same as Python)

In [18]:

b=a
b is a # True

Out[18]:

True

In [ ]:

slicing operations do not create copies either: they return views of the original array
an array view contains a pointer to the original data

In [19]:

c = a[3:5]
c+=1
a #changed

Out[19]:

array([0. , 0.8, 1.6, 3.4, 4.2])

In [ ]:

... back to reshape¶

reshape returns a new view object as well

N-d array¶

Just define the array for examples

In [20]:

a=10.*np.arange(0, 40).reshape((5,8))
a

Out[20]:

array([[  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.],
       [ 80.,  90., 100., 110., 120., 130., 140., 150.],
       [160., 170., 180., 190., 200., 210., 220., 230.],
       [240., 250., 260., 270., 280., 290., 300., 310.],
       [320., 330., 340., 350., 360., 370., 380., 390.]])

retrieve a single element: row,col indexing

In [21]:

a[0][1] #Element of 1st row, 2nd column
a[0,1] #The same

Out[21]:

10.0

NOTE: arrays are used to map image data, but arrays indexing is different from image (x,y) coordinates

In [22]:

 a[0][1]

Out[22]:

10.0

Retrieve rows¶

In [23]:

a[0] #The 1st row
a[0,:]

Out[23]:

array([ 0., 10., 20., 30., 40., 50., 60., 70.])

In [24]:

a[0]

Out[24]:

array([ 0., 10., 20., 30., 40., 50., 60., 70.])

Retrieve columns¶

In [25]:

a[:,1] #The 2nd column

Out[25]:

array([ 10.,  90., 170., 250., 330.])

In [26]:

a[:,1]

Out[26]:

array([ 10.,  90., 170., 250., 330.])

Retrieve matrices¶

In [27]:

a[2:4,3:6]
a[::2,0::3]

Out[27]:

array([[  0.,  30.,  60.],
       [160., 190., 220.],
       [320., 350., 380.]])

In [ ]:

Advanced indexing¶

There are two types of advanced indexing: integer and boolean. Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

In [28]:

i = np.array([0, 2, 3])
j = np.array([1, 1, 5])
b = a[i,j]
b

Out[28]:

array([ 10., 170., 290.])

In [ ]:

In [29]:

i = np.array([True, False, False, True, True])
c = a[:,3][i] # b is not a view of a
c

Out[29]:

array([ 30., 270., 350.])

In [ ]:

Scalar operations¶

In [30]:

a=np.arange(1, 5)

sum/subtraction a-0.5
multiplication a*2.5
division a/2 (returns float)
floor division a//2 (rounded away from +infinity)
mod operation a%3
power a**2
numpy functions np.log(a)

In [31]:

-5//2.

Out[31]:

-3.0

Arrays operations¶

In [32]:

a=np.array([1., 2., 3.])
b=np.array([2., -1., 0.])

sum/subtraction a-b
multiplication a*b
float division a/b
floor division a//b
mod operation a%b
power a**2
dot product a.dot(b) or a@b
tensor product np.tensordot(a,b,axes=0)

In [33]:

np.tensordot(a,b,axes=0)

Out[33]:

array([[ 2., -1.,  0.],
       [ 4., -2.,  0.],
       [ 6., -3.,  0.]])

Reduction operations¶

Operations like sum, mean, max, ... are array reduction operations.

We call these reduction operations because operations like sum have the effect of slicing out n-D arrays from the input array, and reducing these n-D arrays to (n-m)-D arrays or scalars.

In [34]:

a=10.*np.arange(0, 40).reshape((5,8))

mean: a.mean()
std: a.std()
sum: a.sum()
min: data.min()
...

In [ ]:

Rows sum¶

In [35]:

xprofile = data.sum(axis=0)

In [36]:

 xprofile

Out[36]:

array([1998260,  974663,  999642, ..., 1013854,  982238, 2025375],
      dtype=uint64)

Columns sum¶

In [37]:

yprofile = data.sum(axis=1)
np.allclose(xprofile.sum(), yprofile.sum())

Out[37]:

True

In [38]:

xprofile.sum()

Out[38]:

8939461635

Floats comparison¶

Due to rounding errors, most floating-point numbers end up being slightly imprecise. As long as this imprecision stays small, it can usually be ignored. However, it also means that numbers expected to be equal (e.g. when calculating the same result through different correct methods) often differ slightly, and a simple equality test fails

In [39]:

a=0.15+0.15
b=0.1+0.2
a==b

Out[39]:

False

numpy.allclose: use this function to compare arrays (or scalars)

If the following equation is element-wise True

$$ |a - b| \le a_{tol} + r_{tol}|b|$$

then allclose returns True.

In [40]:

a=np.array([.5, 1., 1.05+0.45])
b=np.array([.5, 1., 1.+0.5])
a==b
np.allclose(a,b)

Out[40]:

True

In [41]:

np.allclose?

Remove MODS bias¶

Load the MODS frame¶

In [42]:

with fits.open("data/mods1r.20171125.0026.fits") as hdulist:
    image = hdulist[0].data.astype(float)

NOTE: LUCI and MODS files have been downloaded from i2a archives in Trieste.

Quadrants and prescan regions¶

In [43]:

quadrants = [image[:3088//2,:8288//2],
             image[:3088//2,8288//2:],
             image[3088//2:,:8288//2],
             image[3088//2:,8288//2:]]
prscans = [image[:3088//2,:48],
           image[:3088//2,-48:],
           image[3088//2:,:48],
           image[3088//2:,-48:]]

Single prescan/overscan region¶

For each prescan region, we compute the bias level following this recipe:

we compute the median levels along columns of the prescan region
we compute to separated means: for even and odd channels.

8 Bias levels¶

Here we compute the bias level for each channel and remove these levels from image

In [44]:

from astropy import stats

for i in range(4):
  med = np.median(prscans[i], axis=0)
  bias_even = stats.sigma_clip(med[::2], sigma=2.0, maxiters=4)
  bias_odd = stats.sigma_clip(med[1::2], sigma=2.0, maxiters=4)
  
  quadrants[i][:, ::2]-=bias_even.mean()
  quadrants[i][:, 1::2]-=bias_odd.mean()

In [45]:

fits.PrimaryHDU(data=image).writeto("tmp/bias_sub.fits", overwrite=True)

Stats¶

The astropy.stats package holds statistical functions or algorithms used in astronomy. While the scipy.stats and statsmodel packages contains a wide range of statistical tools, they are general-purpose packages and are missing some tools that are particularly useful or specific to astronomy. This package is intended to provide such functionality, but not to replace scipy.stats if its implementation satisfies astronomers’ needs.

In [ ]:

Masked array¶

masked arrays are arrays that may have missing or invalid entries.
these arrays are implemented by the NumPy module numpy.ma
can be used in image processing to mask outliers, bad pixels, cosmic
in catalog columns to mark invalid or missing entries
...

Create a masked array¶

In [46]:

a=np.ma.array([1,2,3],mask=[False,True,False])

In [ ]:

Get data¶

In [47]:

a.data

Out[47]:

array([1, 2, 3])

In [ ]:

Get mask¶

In [48]:

a.mask

Out[48]:

array([False,  True, False])

In [ ]:

`data` operations¶

sum no masked data (default)

In [49]:

a.sum()

Out[49]:

sum all data

In [50]:

a.data.sum()

Out[50]:

apply numpy functions

In [51]:

np.log(a)

Out[51]:

masked_array(data=[0.0, --, 1.0986122886681098],
             mask=[False,  True, False],
       fill_value=1e+20)

`mask` operations¶

get no masked data

In [52]:

a[~a.mask]

Out[52]:

masked_array(data=[1, 3],
             mask=[False, False],
       fill_value=999999)

change mask

In [53]:

a.mask=[True, False, False]

change mask assigning a vald value

In [54]:

a[0]=2
a

Out[54]:

masked_array(data=[2, 2, 3],
             mask=[False, False, False],
       fill_value=999999)

In [ ]:

Image data (Numpy array)¶

Image Data¶

Numpy array¶

Array reshaping¶

Array creation¶

Numpy array types¶

Indexing¶

Basic Slicing¶

Slicing and view¶

... back to reshape¶

N-d array¶

Retrieve rows¶

Retrieve columns¶

Retrieve matrices¶

Advanced indexing¶

Scalar operations¶

Arrays operations¶

Reduction operations¶

Rows sum¶

Columns sum¶

Floats comparison¶

Remove MODS bias¶

Load the MODS frame¶

Quadrants and prescan regions¶

Single prescan/overscan region¶

8 Bias levels¶

Stats¶

Masked array¶

Create a masked array¶

Get data¶

Get mask¶

data operations¶

mask operations¶

`data` operations¶

`mask` operations¶