
Version 69 (modified by rhenders)


IPP to PSPS interface: ippToPsps

ippToPsps is the interface between IPP and PSPS. In short, ippToPsps creates FITS files from IPP data, then publishes them to a datastore in the form of batches. On the PSPS side, the DXLayer polls the datastore, collects batches when they become available, then converts the contents to csv files before sending them on to SQL Server loader software, which merges them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which ippToPsps will need to respond.
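The flow above can be sketched in miniature. This is only an illustration of the batch lifecycle (publish, collect, convert, merge); the real interface is a C program plus Perl scripts, and all names here are hypothetical.

```python
# Illustrative sketch of the ippToPsps -> datastore -> DXLayer -> loader flow.
# All identifiers are hypothetical stand-ins, not the real interface.

def publish_batch(datastore, batch_id, fits_files):
    """ippToPsps side: post a batch of FITS files to the datastore."""
    datastore[batch_id] = {"files": list(fits_files), "state": "published"}

def poll_and_load(datastore, database):
    """DXLayer side: collect new batches, convert to csv, hand to the loader."""
    for batch_id, batch in datastore.items():
        if batch["state"] == "published":
            # stand-in for FITS-to-csv conversion and the SQL Server merge
            csv_files = [f.replace(".fits", ".csv") for f in batch["files"]]
            database.extend(csv_files)
            batch["state"] = "merged"

datastore, database = {}, []
publish_batch(datastore, "o1234.p2", ["XY01.fits", "XY02.fits"])
poll_and_load(datastore, database)
```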

The binary tables in the FITS files generated by ippToPsps are intended to match the PSPS database schemas exactly, so that any alteration to the PSPS schema affects only the ippToPsps code, not the DXLayer. A certain amount of data validation is performed by ippToPsps before publication, with further validation occurring at the loading and merge stages on the PSPS side.

The outputs of ippToPsps are referred to as 'batches', and are detailed below.

Batch name     | PSPS name | Description                                                | IPP source
Initialisation | IN        | metadata for the other batches, e.g. filter ID, survey ID  | generated from XML config
Detection      | P2        | single-exposure detections                                 | one smf file per exposure plus associated DVO database
Difference     | ?         | difference-image detections                                | one cmf file per skycell per exposure
Stack          | ST        | stack-image detections                                     | generated from...


Configuration

Due to the potential for changes in both the input and output of ippToPsps, the code is heavily configurable. Configuration files are in XML, as this affords the most flexibility (human- and machine-readable, extensible, self-describing, etc.). ippToPsps is pointed at a config directory, under which subdirectories for each batch type hold the various XML config files.
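A hypothetical sketch of what one batch-type config might look like; the actual element names in the real ippToPsps configs may differ:

```xml
<!-- Hypothetical sketch of a batch-type config file.
     Element and attribute names are illustrative only. -->
<batchConfig type="detection">
  <pspsName>P2</pspsName>
  <source>smf</source>
  <tableShapes file="detectionTables.xml"/>
  <mappings file="detectionMappings.xml"/>
</batchConfig>
```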

Table shapes

All FITS tables mirror PSPS database tables. Since the PSPS schema will probably remain in a state of flux for some time, ippToPsps reads table shapes from an XML config rather than hard-coding table descriptions. This config can be regenerated from the master PSPS schema using a Perl script (pspsSchema2xml.pl) in the scripts directory. The same script also generates C header files for each batch type; these headers contain enums for each PSPS table and are used by the code at runtime. This helps minimise code changes.
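The real generator is the Perl script pspsSchema2xml.pl; this Python sketch only illustrates the idea of deriving a C-style enum from a table-shape config so that column indices never need hard-coding. The XML element names here are hypothetical.

```python
# Illustrative only: turn a (hypothetical) table-shape XML fragment into
# a C enum, mimicking what pspsSchema2xml.pl does for each PSPS table.
import xml.etree.ElementTree as ET

SHAPE_XML = """
<table name="FrameMeta">
  <column name="frameID" type="LONG"/>
  <column name="frameName" type="STRING"/>
</table>
"""

def c_enum_from_shape(xml_text):
    table = ET.fromstring(xml_text)
    prefix = table.get("name").upper()
    entries = ",\n    ".join(f"{prefix}_{c.get('name').upper()}"
                             for c in table.findall("column"))
    return f"enum {table.get('name')}Cols {{\n    {entries}\n}};"

print(c_enum_from_shape(SHAPE_XML))
```

Regenerating both the XML and the headers from the master schema means a PSPS schema change is absorbed by rerunning one script rather than editing code by hand.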

Initialisation data

The table shapes of the initialisation batch are handled as above. The actual initialisation data (lists of filters etc), which is liable to change, is held in a config and used by ippToPsps to populate the tables in the FITS file. This data is also used when generating other batch types, detections for example, as look-up tables for setting survey ID etc.
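How initialisation data can double as a look-up table when other batch types are built can be sketched as follows; the filter names and IDs are hypothetical examples, not values from the real config.

```python
# Sketch: initialisation data reused as a look-up table when populating
# detection batches. Filter names and IDs here are hypothetical.
INIT_FILTERS = {"g.00000": 1, "r.00000": 2, "i.00000": 3}

def filter_id_for(smf_filter_name):
    """Map an IPP filter name from an smf header to the PSPS filterID."""
    return INIT_FILTERS[smf_filter_name]
```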

IPP to PSPS mappings

Most data to be loaded into the FITS tables comes from IPP smf or cmf files. For many columns, there is a direct mapping between these files and the PSPS database column. These mappings are detailed in a config.
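Applying such a mapping config amounts to copying each smf/cmf column into its PSPS counterpart. The column names below are hypothetical stand-ins for entries in the real mapping config.

```python
# Sketch: fill a PSPS table row from one smf table row using a
# PSPS-column -> IPP-column mapping (hypothetical column names).
MAPPINGS = {"raMean": "RA_PSF", "decMean": "DEC_PSF", "instFlux": "PSF_INST_FLUX"}

def map_row(smf_row):
    """Build one PSPS row from one smf row via the direct mappings."""
    return {psps_col: smf_row[ipp_col] for psps_col, ipp_col in MAPPINGS.items()}
```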

Architecture

ippToPsps

ippToPsps is a C program within the IPP build. Given the correct arguments, it generates a single FITS file for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label etc.). One instance of ippToPsps is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files into a batch, then publishes it to the datastore, ready for collection by the DXLayer.
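The driver's control flow, one ippToPsps invocation per exposure followed by bundling, can be sketched as below. The real driver is a Perl script invoking the C binary; the labels and exposure IDs here are hypothetical.

```python
# Illustrative control flow of the calling script (really Perl + a C binary).

def run_ipptopsps(exp_id):
    """Stand-in for one invocation of the ippToPsps binary: one FITS per exposure."""
    return f"{exp_id}.fits"

def make_batch(label, exp_ids):
    """Run ippToPsps per exposure, then bundle the outputs as one batch."""
    fits_files = [run_ipptopsps(e) for e in exp_ids]
    return {"label": label, "files": fits_files}  # ready to publish

batch = make_batch("ThreePi.nt", ["o5432g0123", "o5432g0124"])
```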

DXLayer

The DXLayer polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format suitable for ingest by the ODM.
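The conversion step is essentially table rows out, csv in. This sketch uses stand-in row data; a real implementation would read the binary-table rows with a FITS library first.

```python
# Sketch of the DXLayer's FITS-to-csv step, using stand-in row data.
import csv
import io

def table_to_csv(colnames, rows):
    """Serialise one table (header + rows) as csv text for the ODM loader."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(colnames)
    writer.writerows(rows)
    return buf.getvalue()
```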

ODM Loader

Performs validation on incoming data based on metadata previously loaded as an initialisation batch (see above). If validation succeeds, new batches are merged into the PSPS database. One basic requirement of the ODM is that all detections in a detection batch have unique object IDs. Object IDs are assigned by the IPP DVO to each detection on a chip; the number is formulated from the RA and Dec of the detection.
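The exact DVO formula is not documented here; the sketch below only illustrates how an integer ID could be packed from RA and Dec so that it is unique per sky position (milliarcsecond resolution, entirely hypothetical).

```python
# Hypothetical illustration only: pack RA/Dec into one integer ID.
# The real DVO object-ID formula is not reproduced here.

def object_id(ra_deg, dec_deg):
    ra_mas = round(ra_deg * 3600e3)              # RA as integer milliarcsec
    dec_mas = round((dec_deg + 90.0) * 3600e3)   # Dec shifted non-negative
    return ra_mas * 10**10 + dec_mas             # concatenate into one integer
```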

Notes about the different batch types

Detections

The input for the detection batch is one IPP camera-stage smf file for a given exposure, plus an associated DVO database from which to retrieve object and other IDs. One FITS file is generated for each exposure. The extensions are:

1 Primary extension
1 FrameMeta extension
1 ImageMeta extension per chip
1 Detection extension per chip
1 SkinnyObject extension per chip
1 ObjectCalColor extension per chip

So, for the 60-chip GPC1 camera, that is 242 extensions in all (1 primary + 1 FrameMeta + 4 × 60 per-chip extensions), including the obligatory primary header. The object ID features in the last three per-chip tables and must remain unique across the exposure (it is generated within DVO). In the merged PSPS database, the primary key on the detections table is the combination of object ID and detection ID, so the same object can appear in multiple, overlapping exposures, as those detections will have different detection IDs.

Diffs

The input for difference batches is a set of cmf files, one for each skycell covered by a particular exposure. A FITS output file is generated with the following extensions:

Unresolved fields

Below are tables detailing which fields in the PSPS FITS files are still not populated by ippToPsps.

Unresolved fields for camera stage detections

PSPS field | PSPS type | PSPS description | Comments

FrameMeta
frameName        | STRING | frame name provided by camera software              |
cameraID         | SHORT  | camera identifier                                   | 1?
cameraConfigID   | SHORT  | camera configuration identifier                     |
analysisVer      | STRING | IPP software analysis release                       | need added to smf?
p1Recip          | STRING | IPP phase 1 MD5 checksum                            | need added to smf?
p2Recip          | STRING | IPP phase 2 MD5 checksum                            | need added to smf?
p3Recip          | STRING | IPP phase 3 MD5 checksum                            | need added to smf?
numPhotoRef      | LONG   | number of photometric reference sources             |
calibModNum      | SHORT  | calibration modification number                     | for future
dataRelease      | BYTE   | data release                                        | for future

ImageMeta
photoCalID       | LONG   | photometry reduction code identifier                | will use IPP dvo.photcodes in PSPS init batch
bias             | FLOAT  | detector bias level (unit = ADU)                    | need added to smf?
biasScat         | FLOAT  | scatter in bias level (unit = ADU)                  | need added to smf?
numPhotoRef      | LONG   | number of photometric reference sources             |
psfModelID       | LONG   | PSF model identifier                                | need from smf?
momentTheta      | FLOAT  | model PSF parameters at chip center (unit = deg)    | have major/minor, but angle?
detectorID       | SHORT  | identifier for actual CCD chip                      |
qaFlags          | LONG   | Q/A flags for this OTA                              | need from DVO?
calibModNum      | SHORT  | calibration modification number                     | for future
dataRelease      | BYTE   | data release                                        | for future

Detection
psfLikelihood    | FLOAT  | PSF likelihood                                      | need in smf
momentWidMajor   | FLOAT  | PSF width in major axis from moments (unit = arcsec) | only have MOMENT_XX/XY/YY in psf table
momentWidMinor   | FLOAT  | PSF width in minor axis from moments (unit = arcsec) | only have MOMENT_XX/XY/YY in psf table
momentTheta      | FLOAT  | PSF orientation angle from moments (unit = deg)     | same as 'ANGLE' used for psf?
crLikelihood     | FLOAT  | likelihood the source is a cosmic ray               | need added to smf?
infoFlag         | LONG   | flag indicating provenance information              |
historyModeNum   | SHORT  | modification number in the O-D association history  | for future
dataRelease      | BYTE   | data release when this detection was originally taken; recalibrations do not affect this value | for future

SkinnyObject
projectionCellID | LONG   | projection cell identifier at discovery time        | ???

ObjectCalColor
calColor         | FLOAT  | color adopted for magnitude calculation (unit = mag) | for future
calColorErr      | FLOAT  | error in calibrating color (unit = mag)             | for future

Recovery system design

Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by ippToPsps and posted on an IPP instance of the datastore, and collected from there by the DXLayer on the PSPS side. As the basis for a future recovery system, the IPP urgently requires feedback from PSPS telling it which batches have succeeded, which have failed, and why. With this information, data can be either deleted or regenerated accordingly. This matters because, at these data volumes, we cannot afford the high level of redundancy currently in place. At present, for a given batch, the following copies exist within the pipeline:

  • a copy exists on the IPP cluster after generation by ippToPsps program
  • a copy exists on the IPP datastore after publication by ippToPsps
  • the DXLayer retains a copy after it has sent the csv version to the ODM
  • the DXLayer also keeps a copy of these (larger) csv files

We therefore need to implement the basic framework of a feedback loop as soon as possible, so that the IPP can quickly learn whether a given batch has been successfully merged into the PSPS database. It can then safely delete the data files and remove the copy from the datastore.

Previous design

Previously, Conrad and I had discussed a design whereby a second datastore instance would be utilized, this time on the PSPS cluster. The DXLayer would act as the middle-man, polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. By polling this, ippToPsps could acquire a list of batches that are safe to discard. Simultaneously, the DXLayer could delete its copies of the same redundant data.

The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail the files that are safe to delete, but it could evolve into a more complex recovery report, i.e. which batches failed, and what the IPP is required to do in response.

New design

Instead of creating a new datastore instance within PSPS and using the DXLayer as the communication layer between the ODM and the IPP, we propose that the DXLayer form no part of the feedback system. It should be simplified so that it only facilitates loading, i.e. polling the IPP datastore for new data, converting it to csv files and sending these on to the ODM. To complete the circle, the ippToPsps code will instead poll the ODM directly, bypassing the DXLayer altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system: at a later date, ippToPsps can be coded to respond intelligently to the myriad errors that may occur within the ODM. The DXLayer need know nothing of how or why a certain batch is being submitted by the IPP; it should just grab it, convert it and pass it along to the ODM.
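The proposed feedback loop amounts to: ask the ODM which batches merged cleanly, then delete those copies from the IPP datastore. The ODM interface sketched below is hypothetical; no such polling API exists yet.

```python
# Sketch of the proposed direct ippToPsps -> ODM feedback loop.
# The ODM status interface here is a hypothetical stand-in.

def reconcile(odm_status, datastore):
    """Delete from the datastore every batch the ODM reports as merged.

    odm_status: batch ID -> 'merged' or 'failed' (hypothetical report format).
    Returns the list of batch IDs that were removed.
    """
    deletable = [b for b, state in odm_status.items() if state == "merged"]
    for batch_id in deletable:
        datastore.pop(batch_id, None)
    return deletable
```

Failed batches stay in the datastore, which is what a later, fuller recovery system would act on.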

Since ippToPsps will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for the DXLayer to duplicate this information in its own local database, as it currently does.

Rather than waste the code already written for the DXLayer, it can be reused within ippToPsps; the ODM polling scripts, for example.

The question remains: what should be done with the copies of the data currently retained by the DXLayer? The options are to delete them automatically after a defined amount of time, to have the IPP send a list of batches that are safe to delete through the datastore, or to have the DXLayer retain no files at all. Since it can quickly and easily re-acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.

Advantages over previous design

  • no need for second datastore (not a big overhead, but additional systems administration in an already complicated system).
  • no need to define new XML standard that incorporates the whole array of recovery options
  • no need for the DXLayer to poll the ODM
  • no need for the DXLayer to have a database to log the batches (already done on the IPP side)
  • no need for the DXLayer to keep data at all?

Links

  • Datastore test area for PSPS on Maui
  • Datastore test area for PSPS at JHU
