API

Common classes and utilities in The Cannon are documented here. For more details, view the source code.

CannonModel

class thecannon.CannonModel(training_set_labels, training_set_flux, training_set_ivar, vectorizer, dispersion=None, regularization=None, censors=None, **kwargs)

A model for The Cannon which includes L1 regularization and pixel censoring.

Parameters:
  • training_set_labels – A set of objects with labels known to high fidelity. This can be given as a numpy structured array, or an astropy table.
  • training_set_flux – An array of normalised fluxes for stars in the labelled set, given as shape (num_stars, num_pixels). The num_stars should match the number of rows in training_set_labels.
  • training_set_ivar – An array of inverse variances on the normalized fluxes for stars in the training set. The shape of the training_set_ivar array should match that of training_set_flux.
  • vectorizer – A vectorizer to take input labels and produce a design matrix. This should be a sub-class of vectorizer.BaseVectorizer.
  • dispersion – [optional] The dispersion values corresponding to the given pixels. If provided, this should have a size of num_pixels.
  • regularization – [optional] The strength of the L1 regularization. This should be None, a single float giving one regularization strength for all pixels, or a float-like array of length num_pixels.
  • censors – [optional] A dictionary containing label names as keys and boolean censoring masks as values.
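The three accepted forms of regularization can be thought of as being expanded to one strength per pixel. The sketch below only illustrates that idea: expand_regularization is a hypothetical helper that is not part of thecannon, and treating None as zero strength is an assumption.

```python
def expand_regularization(regularization, num_pixels):
    """Return a list holding one regularization strength per pixel."""
    if regularization is None:
        # Assumption: no regularization behaves like zero strength.
        return [0.0] * num_pixels
    try:
        values = [float(value) for value in regularization]
    except TypeError:
        # A single number: use the same strength for every pixel.
        return [float(regularization)] * num_pixels
    if len(values) != num_pixels:
        raise ValueError(
            "expected {0} values, got {1}".format(num_pixels, len(values)))
    return values


per_pixel = expand_regularization(10, 3)
```

An array input of the wrong length raises a ValueError, which mirrors the requirement that an array must have length num_pixels.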
censors

Return the wavelength censor masks for the labels.

design_matrix

Return the design matrix for this model.

dispersion

Return the dispersion points for all pixels.

in_convex_hull(labels)

Return whether the provided labels are inside a convex hull constructed from the labelled set.

Parameters: labels – An NxK array of N sets of K labels, where K is the number of labels that make up the vectorizer.
Returns: A boolean array as to whether the points are in the convex hull of the labelled set.
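To make the convex hull test concrete, here is a minimal pure-Python sketch for the two-label case (K = 2). It is not the library's implementation, which handles any K and may use a different technique. The helper names cross, convex_hull and in_hull are hypothetical, and the sketch assumes at least three labelled points that are not collinear.

```python
def cross(o, a, b):
    """Z-component of the cross product of the vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    points = sorted(set(map(tuple, points)))
    if len(points) <= 2:
        return points
    lower, upper = [], []
    for p in points:
        # Drop the last vertex while it would make a clockwise (or flat) turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def in_hull(hull, point):
    """True if point lies inside or on the edge of a counter-clockwise hull."""
    n = len(hull)
    return all(
        cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


training_labels = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(training_labels)
flags = [in_hull(hull, p) for p in [(1.0, 0.5), (3.0, 1.0)]]
```

Here the first test point lies inside the square spanned by the training labels and the second lies outside it, so flags holds one True and one False value.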
is_trained

Return true or false for whether the model is trained.

classmethod read(path, **kwargs)

Read a saved model from disk.

Parameters: path – The path to load the model from.

regularization

Return the strength of the L1 regularization for this model.

reset()

Clear any attributes that have been trained.

s2

Return the intrinsic variance (s^2) for all pixels.

test(model, *args, **kwargs)

Run the test step on spectra.

Parameters:
  • flux – The (pseudo-continuum-normalized) spectral flux.
  • ivar – The inverse variance values for the spectral fluxes.
  • initial_labels – [optional] The initial labels to try for each spectrum. This can be a single set of initial values, or one set of initial values for each star.
  • threads – [optional] The number of parallel threads to use.

theta

Return the theta coefficients (spectral model derivatives).

train(threads=None, **kwargs)

Train the model.

Parameters: threads – [optional] The number of parallel threads to use.
Returns: A three-length tuple containing the spectral coefficients theta, the squared scatter term at each pixel s2, and metadata related to the training of each pixel.

training_set_flux

Return the training set fluxes.

training_set_ivar

Return the inverse variances of the training set fluxes.

training_set_labels

Return the labels in the training set.

vectorizer

Return the vectorizer for this model.

write(path, include_training_set_spectra=False, overwrite=False, protocol=-1)

Serialise the trained model and save it to disk. This will save all relevant training attributes, and optionally, the training data.

Parameters:
  • path – The path to save the model to.
  • include_training_set_spectra – [optional] Save the labelled set, normalised flux and inverse variance used to train the model.
  • overwrite – [optional] Overwrite the existing file path, if it already exists.
  • protocol – [optional] The Python pickling protocol to employ. Use 2 for compatibility with previous Python releases, -1 for performance.
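The overwrite and protocol options follow the usual pattern for pickling to disk. The sketch below shows that pattern with the standard library only. save_state and load_state are hypothetical helpers, and the dictionary of attributes is made up for illustration; it is not the file layout that write() produces.

```python
import os
import pickle
import tempfile


def save_state(path, state, overwrite=False, protocol=-1):
    """Pickle a dictionary of attributes to path."""
    if os.path.exists(path) and not overwrite:
        raise IOError("path already exists: {0}".format(path))
    with open(path, "wb") as fp:
        # protocol=-1 selects the highest protocol this Python supports;
        # protocol=2 can still be read by Python 2.
        pickle.dump(state, fp, protocol)


def load_state(path):
    """Read back a dictionary written by save_state."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


state = {"theta": [[1.0, 0.1], [0.9, -0.2]], "s2": [1e-4, 2e-4]}
path = os.path.join(tempfile.mkdtemp(), "example.model")
save_state(path, state)
restored = load_state(path)
```

Calling save_state a second time with the same path raises an error unless overwrite=True is given.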

Censoring

Utilities to deal with wavelength censoring.

class thecannon.censoring.Censors(label_names, num_pixels, items=None, **kwargs)

A dictionary sub-class that allows for label censoring masks to be applied on a per-pixel basis to CannonModel objects.

Parameters:
  • label_names – A list containing the label names that form the model vectorizer.
  • num_pixels – The number of pixels per star.
  • items – [optional] A dictionary containing label names as keys and masks as values.

thecannon.censoring.create_mask(dispersion, censored_regions)

Return a boolean censoring mask based on a structured list of (start, end) regions.

Parameters:
  • dispersion – An array of dispersion values.
  • censored_regions – A list of two-length tuples containing the (start, end) points of a censored region.
Returns:

A boolean mask indicating whether the pixels in the dispersion array are masked.
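The idea of turning (start, end) regions into a per-pixel mask can be sketched in a few lines. make_mask is a hypothetical stand-in written for illustration, and treating both end points as inside the region is an assumption, not documented behaviour of create_mask.

```python
def make_mask(dispersion, censored_regions):
    """Return True for every pixel that falls inside any (start, end) region."""
    mask = [False] * len(dispersion)
    for start, end in censored_regions:
        for i, value in enumerate(dispersion):
            # Assumption: both end points count as inside the region.
            if start <= value <= end:
                mask[i] = True
    return mask


dispersion = [15100.0, 15200.0, 15300.0, 15400.0, 15500.0]
mask = make_mask(dispersion, [(15150.0, 15320.0), (15500.0, 15600.0)])
```

With these values the second, third and fifth pixels are masked.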

thecannon.censoring.design_matrix_mask(censors, vectorizer)

Return a mask of which indices in the design matrix columns should be used for a given pixel.

Parameters:
  • censors – A censoring dictionary.
  • vectorizer – The model vectorizer.
Returns:

A mask of which indices in the model design matrix should be used for a given pixel.

Continuum

Continuum-normalization.

thecannon.continuum.sines_and_cosines(dispersion, flux, ivar, continuum_pixels, L=1400, order=3, regions=None, fill_value=1.0, **kwargs)

Fit the flux values of pre-defined continuum pixels using a sum of sine and cosine functions.

Parameters:
  • dispersion – The dispersion values.
  • flux – The flux values for all pixels, as they correspond to the dispersion array.
  • ivar – The inverse variances for all pixels, as they correspond to the dispersion array.
  • continuum_pixels – A mask that selects pixels that should be considered as ‘continuum’.
  • L – [optional] The length scale for the sines and cosines.
  • order – [optional] The number of sine/cosine functions to use in the fit.
  • regions – [optional] Specify sections of the spectra that should be fitted separately in each star. This may be due to gaps between CCDs, or some other physically-motivated reason. These values should be specified in the same units as the dispersion, and should be given as a list of [(start, end), …] values. For example, APOGEE spectra have gaps near the following wavelengths which could be used as regions:

    >>> regions = ([15090, 15822], [15823, 16451], [16452, 16971])

  • fill_value – [optional] The continuum value to use for when no continuum was calculated for that particular pixel (e.g., the pixel is outside of the regions).
  • full_output – [optional] If set as True, then a metadata dictionary will also be returned.
Returns:

The continuum values for all pixels, and a dictionary that contains metadata about the fit.
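A sum of sines and cosines of this kind is linear in its coefficients, so the continuum can be written as a set of basis functions times a coefficient vector. The sketch below builds such a basis and evaluates it. The exact form used here (a constant plus cosine and sine pairs with angular frequency 2πk/L for k = 1..order) is an assumption, continuum_basis and evaluate_continuum are hypothetical names, and the weighted fit to the continuum pixels is left out.

```python
import math


def continuum_basis(dispersion, L=1400, order=3):
    """Return one row per pixel: [1, cos(1x), sin(1x), ..., cos(kx), sin(kx)]."""
    scale = 2.0 * math.pi / float(L)
    rows = []
    for x in dispersion:
        row = [1.0]
        for k in range(1, order + 1):
            row.append(math.cos(k * scale * x))
            row.append(math.sin(k * scale * x))
        rows.append(row)
    return rows


def evaluate_continuum(coefficients, dispersion, L=1400, order=3):
    """Evaluate the continuum model for a given set of coefficients."""
    return [
        sum(c * b for c, b in zip(coefficients, row))
        for row in continuum_basis(dispersion, L, order)
    ]


basis = continuum_basis([0.0, 350.0], L=1400, order=1)
flat = evaluate_continuum([1.0, 0.0, 0.0], [0.0, 350.0], L=1400, order=1)
```

Each row has 2 * order + 1 entries, so order=3 gives seven coefficients per fitted region under this assumed form.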

Fitting

Fitting functions for use in The Cannon.

thecannon.fitting.fit_spectrum(flux, ivar, initial_labels, vectorizer, theta, s2, fiducials, scales, dispersion=None, **kwargs)

Fit a single spectrum by least-squares fitting.

Parameters:
  • flux – The normalized flux values.
  • ivar – The inverse variance array for the normalized fluxes.
  • initial_labels – The point(s) to initialize optimization from.
  • vectorizer – The vectorizer to use when fitting the data.
  • theta – The theta coefficients (spectral derivatives) of the trained model.
  • s2 – The pixel scatter (s^2) array for each pixel.
  • dispersion – [optional] The dispersion (e.g., wavelength) points for the normalized fluxes.
Returns:

A three-length tuple containing: the optimized labels, the covariance matrix, and metadata associated with the optimization.

thecannon.fitting.fit_pixel_fixed_scatter(flux, ivar, initial_thetas, design_matrix, regularization, censoring_mask, **kwargs)

Fit theta coefficients and noise residual for a single pixel, using an initially fixed scatter value.

Parameters:
  • flux – The normalized flux values.
  • ivar – The inverse variance array for the normalized fluxes.
  • initial_thetas – A list of initial theta values to start from, and their source. For example: [(theta_0, "guess"), (theta_1, "old_theta")]
  • design_matrix – The model design matrix.
  • regularization – The regularization strength to apply during optimization (Lambda).
  • censoring_mask – A per-label censoring mask for each pixel.
  • op_method – The optimization method to use. Valid options are: l_bfgs_b, powell.
  • op_kwds – A dictionary of arguments that will be provided to the optimizer.
Returns:

The optimized theta coefficients, the noise residual s2, and metadata related to the optimization process.

thecannon.fitting.fit_theta_by_linalg(flux, ivar, s2, design_matrix)

Fit theta coefficients to a set of normalized fluxes for a single pixel.

Parameters:
  • flux – The normalized fluxes for a single pixel (across many stars).
  • ivar – The inverse variance of the normalized flux values for a single pixel across many stars.
  • s2 – The noise residual (squared scatter term) to adopt in the pixel.
  • design_matrix – The model design matrix.
Returns:

The label vector coefficients for the pixel, and the inverse variance matrix.
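For a fixed s2, fitting theta at one pixel is a weighted linear least-squares problem. The sketch below solves it through the normal equations for a design matrix with only two columns, so that plain Python is enough. fit_theta_two_terms is a hypothetical helper, and the weighting ivar / (1 + ivar * s2), which adds s2 to each star's noise variance, is an assumption about how the scatter term enters.

```python
def fit_theta_two_terms(flux, ivar, s2, design_matrix):
    """Weighted least-squares fit for a design matrix with two columns."""
    # Assumption: effective weight = 1 / (1 / ivar + s2).
    weights = [w / (1.0 + w * s2) for w in ivar]

    # Accumulate the normal equations (A^T W A) theta = A^T W y.
    a00 = a01 = a11 = b0 = b1 = 0.0
    for (x0, x1), y, w in zip(design_matrix, flux, weights):
        a00 += w * x0 * x0
        a01 += w * x0 * x1
        a11 += w * x1 * x1
        b0 += w * x0 * y
        b1 += w * x1 * y

    det = a00 * a11 - a01 * a01
    if det == 0:
        raise ValueError("singular normal matrix")
    # Inverse of the symmetric 2x2 matrix A^T W A.
    inverse = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]
    theta = [
        inverse[0][0] * b0 + inverse[0][1] * b1,
        inverse[1][0] * b0 + inverse[1][1] * b1,
    ]
    return theta, inverse


labels = [-1.0, 0.0, 1.0, 2.0]
design_matrix = [(1.0, x) for x in labels]
flux = [1.0 - 0.05 * x for x in labels]
theta, inverse = fit_theta_two_terms(flux, [1e4] * 4, 1e-4, design_matrix)
```

Because the example fluxes are exactly linear in the label, the fit recovers an intercept close to 1.0 and a slope close to -0.05.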

thecannon.fitting.chi_sq(theta, design_matrix, flux, ivar, axis=None, gradient=True)

Calculate the chi-squared difference between the spectral model and flux.

Parameters:
  • theta – The theta coefficients.
  • design_matrix – The model design matrix.
  • flux – The normalized flux values.
  • ivar – The inverse variances of the normalized flux values.
  • axis – [optional] The axis to sum the chi-squared values across.
  • gradient – [optional] Return the chi-squared value and its derivatives (Jacobian).
Returns:

The chi-squared difference between the spectral model and flux, and optionally, the Jacobian.
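As an illustration of the quantity being computed, the sketch below evaluates a chi-squared value and its gradient with respect to theta for a linear model. chi_sq_sketch is a hypothetical pure-Python stand-in; it ignores the axis argument, and its sign and scaling conventions are assumptions.

```python
def chi_sq_sketch(theta, design_matrix, flux, ivar, gradient=True):
    """Chi-squared of a linear model, optionally with its gradient."""
    # Residual = model - data, where model = design_matrix . theta
    residuals = [
        sum(t * x for t, x in zip(theta, row)) - y
        for row, y in zip(design_matrix, flux)
    ]
    value = sum(w * r * r for w, r in zip(ivar, residuals))
    if not gradient:
        return value
    # d(chi^2)/d(theta_j) = 2 * sum_i ivar_i * residual_i * A_ij
    jacobian = [
        2.0 * sum(
            w * r * row[j]
            for w, r, row in zip(ivar, residuals, design_matrix))
        for j in range(len(theta))
    ]
    return value, jacobian


design_matrix = [(1.0, 0.0), (1.0, 1.0)]
value, jacobian = chi_sq_sketch(
    [1.0, 0.0], design_matrix, [1.0, 0.9], [100.0, 100.0])
```

In this example only the second star has a non-zero residual (0.1), so the chi-squared value is about 1.0 and both gradient entries are about 20.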

thecannon.fitting.L1Norm_variation(theta)

Return the L1 norm of theta (except the first entry) and its derivative.

Parameters: theta – An array of finite values.
Returns: A two-length tuple containing: the L1 norm of theta (except the first entry), and the derivative of the L1 norm of theta.
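The description above translates almost directly into code. The sketch below is a pure-Python illustration, not the library function: l1_norm_variation is a hypothetical name, and using a derivative of zero for entries that are exactly zero is an assumption.

```python
def l1_norm_variation(theta):
    """Return the L1 norm of theta[1:] and its derivative for every entry."""
    norm = sum(abs(t) for t in theta[1:])
    # The first entry is excluded from the norm, so its derivative is zero.
    # For the other entries the derivative is the sign of the value.
    derivative = [0.0] + [float((t > 0) - (t < 0)) for t in theta[1:]]
    return norm, derivative


norm, derivative = l1_norm_variation([1.0, -0.2, 0.0, 0.3])
```

Leaving out the first entry means the constant term of the model is not penalised by the regularization.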

Utilities

General utility functions.

thecannon.utils.short_hash(contents)

Return a short hash string of some iterable content.

Parameters: contents – The contents to calculate a hash for.
Returns: A concatenated string of 10-character length hashes for all items in the contents provided.
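The shape of the return value can be illustrated with the standard library. The hashing function that short_hash uses is not stated here, so the use of SHA-1 below is an assumption, and short_hash_sketch is a hypothetical name.

```python
import hashlib


def short_hash_sketch(contents, length=10):
    """Concatenate a fixed-length hash prefix for every item in contents."""
    return "".join(
        hashlib.sha1(str(item).encode("utf-8")).hexdigest()[:length]
        for item in contents
    )


digest = short_hash_sketch(["TEFF", "LOGG", "FE_H"])
```

Three items give a 30-character string, and the same input always produces the same string.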
class thecannon.utils.wrapper(f, args, kwds, N, message=None, size=100)

A generic wrapper with a progressbar, which can be used either in serial or in parallel.

Parameters:
  • f – The function to apply.
  • args – Additional arguments to supply to the function f.
  • kwds – Keyword arguments to supply to the function f.
  • N – The number of items that will be iterated over.
  • message – [optional] An information message to log before showing the progressbar.
  • size – [optional] The width of the progressbar in characters.
Returns:

A generator.

Vectorizer

BaseVectorizer

A base vectorizer for The Cannon.

class thecannon.vectorizer.base.BaseVectorizer(label_names, terms, **kwargs)

A vectorizer class that models spectral fluxes and their derivatives.

get_label_vector(labels, *args, **kwargs)

Return the label vector based on the labels provided.

Parameters: labels – The values of the labels. These should match the length and order of the label_names attribute.

get_label_vector_derivative(labels, *args, **kwargs)

Return the derivative of the label vector with respect to the given label.

Parameters: labels – The values of the labels to calculate the label vector for.

label_names

Return the label names that are used in this vectorizer.

terms

Return the terms provided for this vectorizer.

PolynomialVectorizer

A polynomial vectorizer for The Cannon.

class thecannon.vectorizer.polynomial.PolynomialVectorizer(label_names=None, order=None, terms=None, **kwargs)

A vectorizer that models spectral fluxes as a combination of polynomial terms. Note that either label_names and order must be provided, or the terms keyword argument needs to be explicitly specified.

Parameters:
  • label_names – [optional] A list of label names that are terms in the label vector.
  • order – [optional] The maximal order for the vectorizer.
  • terms – [optional] A structured list of tuples that defines the full extent of the label vector. Note that terms must be None if label_names or order are provided.
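To show what giving label_names and order implies, the sketch below lists every polynomial term up to a given order. The representation of a term as a list of (label name, power) tuples is an assumption made for illustration, polynomial_terms is a hypothetical helper, and the constant term is not included.

```python
from collections import Counter
from itertools import combinations_with_replacement


def polynomial_terms(label_names, order):
    """List every product of labels with total power from 1 up to order."""
    terms = []
    for degree in range(1, order + 1):
        for combo in combinations_with_replacement(label_names, degree):
            counts = Counter(combo)
            # Keep the labels in the order given by label_names.
            terms.append(
                [(name, counts[name]) for name in label_names if name in counts])
    return terms


terms = polynomial_terms(["TEFF", "LOGG"], 2)
```

For two labels and order 2 this gives five terms: two linear terms, two squared terms and one cross-term.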
get_human_readable_label_term(term_index, label_names=None, **kwargs)

Return a human-readable form of a single term in the label vector.

Parameters:
  • term_index – The term in the label vector to return.
  • label_names – [optional] The label names to use. For example, these could be LaTeX representations of the label names.
Returns:

A human-readable string representing a single term in the label vector.

get_human_readable_label_vector(mul=u'*', pow=u'^', bracket=False)

Return a human-readable form of the label vector.

Parameters:
  • mul – [optional] String to use to represent a multiplication operator. For example, if giving LaTeX label definitions one may want to use ‘\cdot’ for the mul term.
  • pow – [optional] String to use to represent a power operator.
  • bracket – [optional] Show brackets around each term.
Returns:

A human-readable string representing the label vector.

get_label_vector(labels)

Return the values of the label vector, given the scaled labels.

Parameters: labels – The scaled and offset labels to use to calculate the label vector(s). This can be a one-dimensional vector of K labels, or a two-dimensional array of N by K labels.

get_label_vector_derivative(labels)

Return the derivatives of the label vector with respect to the labels.

Parameters: labels – The scaled labels to calculate the label vector derivatives. This can be a one-dimensional vector of K labels (using the same order and length provided by self.label_names), or a two-dimensional array of N by K values. The returned array will be of shape (N, D), where D is the number of terms in the label vector description.
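A small worked example may help to show what a polynomial label vector and its derivative look like for one star. The sketch assumes that each term is a list of (label index, power) tuples and that the label vector starts with a constant 1; both are assumptions made for illustration. The hypothetical label_vector_derivative below differentiates with respect to a single chosen label, which is a simplification of the method documented above.

```python
def label_vector(terms, labels):
    """Evaluate [1, term_1, term_2, ...] for one set of K scaled labels."""
    vector = [1.0]
    for term in terms:
        value = 1.0
        for index, power in term:
            value *= labels[index] ** power
        vector.append(value)
    return vector


def label_vector_derivative(terms, labels, d_index):
    """Differentiate every entry with respect to labels[d_index]."""
    derivative = [0.0]  # the constant term does not depend on any label
    for term in terms:
        powers = dict(term)
        value = 0.0
        if d_index in powers:
            # Power rule on the chosen label, other labels act as constants.
            p = powers[d_index]
            value = p * labels[d_index] ** (p - 1)
            for index, power in term:
                if index != d_index:
                    value *= labels[index] ** power
        derivative.append(value)
    return derivative


# Second-order polynomial in two labels: x0, x1, x0^2, x0*x1, x1^2
terms = [[(0, 1)], [(1, 1)], [(0, 2)], [(0, 1), (1, 1)], [(1, 2)]]
labels = [0.5, -2.0]
vector = label_vector(terms, labels)
d_first = label_vector_derivative(terms, labels, 0)
```

With these values the label vector is [1.0, 0.5, -2.0, 0.25, -1.0, 4.0], and its derivative with respect to the first label is [0.0, 1.0, 0.0, 1.0, -2.0, 0.0].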
human_readable_label_vector

Return a human-readable form of the label vector.

tc command line utility

The Cannon code includes a command line utility called tc.

This command line tool can be used to fit spectra using a pre-trained model saved to disk, and to join many results into a single table of output labels.

usage: tc [-h] {fit,join} ...

The Cannon

optional arguments:
  -h, --help  show this help message and exit

action:
  Specify the action to perform.

  {fit,join}
    fit       Fit stacked spectra using a trained model.
    join      Join results from individual stars into a single table.

http://TheCannon.io

The fit argument requires the following input:

usage: tc fit [-h] [-v] [-t THREADS] [--parallel-chunks PARALLEL_CHUNKS]
              [--clobber] [--output-suffix OUTPUT_SUFFIX] [--from-filename]
              model_filename spectrum_filenames [spectrum_filenames ...]

positional arguments:
  model_filename        The path of a trained Cannon model.
  spectrum_filenames    Paths of spectra to fit.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose logging mode.
  -t THREADS, --threads THREADS
                        The number of threads to use.
  --parallel-chunks PARALLEL_CHUNKS
                        The number of spectra to fit in a chunk.
  --clobber             Overwrite existing output files.
  --output-suffix OUTPUT_SUFFIX
                        A string suffix that will be added to the spectrum
                        filenames when creating the result filename
  --from-filename       Read spectrum filenames from file

Once the test step is complete, the results from individual files will be saved to disk. For example, if a spectrum was saved to disk as spectrum.pkl, then the command tc fit cannon.model spectrum.pkl would produce an output file called spectrum.pkl.result. The tc join command can then collate the output from many *.result files into a single table:

usage: tc join [-h] [-v] [-t THREADS] [--from-filename] [--errors] [--cov]
               [--clobber]
               output_filename result_filenames [result_filenames ...]

positional arguments:
  output_filename       The path to write the output filename.
  result_filenames      Paths of result files to include.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose logging mode.
  -t THREADS, --threads THREADS
                        The number of threads to use.
  --from-filename       Read result filenames from a file.
  --errors              Include formal errors in destination table.
  --cov                 Include covariance matrix in destination table.
  --clobber             Overwrite an existing table file.