IPP Software Navigation Tools IPP Links Communication Pan-STARRS Links
wiki:ippToPsps_Database

Back to ippToPsps main page

The ippToPsps database

Overview

The ippToPsps database is a central component. It is currently hosted on the same machine as gpc1, but all databases used by ippToPsps are detailed in the settings file, here, while it's schema can be found in svn, here.

cascading deletes where necessary

+---------------------+
| Tables_in_ipptopsps |
+---------------------+
| batch               | 
| box                 | 
| clients             | 
| config              | 
| detection           | 
| object              | 
| pending             | 
| stack               | 
| stripe              | 
+---------------------+

The config table

Note: Configs can be created from scratch, or edited, using any ippToPsps program by specifying edit as the config argument at start-up. More details can be found here.

The config table contains all available configurations of 'loading campaigns'. Generally this means a row for 3pi, a row for MDF, SAS etc. All ippToPsps programs require the name of a config as their first argument when run from the command line, but changes can be made to config while programs are running and they will update accordingly.

Field Type Null Key Default value Description
timestamp timestamp NO CURRENT_TIMESTAMP all clients update this timestamp regularly, like a heartbeat
name varchar(30) NO PRI NULL the text name of this config
datastore_product varchar(30) YES NULL the datastore product, thus far only PSPS_test and PSPS_JHU have been used
datastore_type varchar(30) YES IPP_PSPS the datastore type. The defauls is only ever used
datastore_publish tinyint(1) YES 0 1 or 0 - should we publish or not?
dvo_label varchar(100) YES NULL the label for querying gpc1 to get contents of the DVO database
dvo_location varchar(1000) YES NULL the location of this DVO database on disk
min_ra double YES 0 the min RA of the box on the sky we should load from (in degrees)
max_ra double YES 360 the max RA of the box on the sky we should load from (in degrees)
min_dec double YES -30 the min Dec of the box on the sky we should load from (in degrees)
max_dec double YES 90 the max Dec of the box on the sky we should load from (in degrees)
box_size double YES 4 the size of the box (in degrees) that we should use to tile up the sky
base_path varchar(1000) YES /data/ipp005.0/ipptopsps/ the base path for all data to be stored
data_release smallint(6) YES 0 data release to end up in PSPS meta tables
delete_local tinyint(1) YES 0 0 or 1 - should we delete local data when ready?
delete_datastore tinyint(1) YES 1 0 or 1 - should we delete from datastore when ready?
delete_dxlayer tinyint(1) YES 1 0 or 1 - should we delete from DXLayer when ready?
epoch timestamp NO 0000-00-00 00:00:00 when queuing, this is taken as the beginning of time. Means we can reload surveys more than once - if we are careful
survey varchar(10) YES NULL what survey? eg 3PI, MDF (see available surveys in IN table here)
psps_survey varchar(10) YES NULL PSPS survey, eg 'SA3' for SAS 3 year survey. Always a 3-character name
queue_P2 tinyint(1) YES 1 0 or 1 - should we queue P2 batches?
queue_ST tinyint(1) YES 0 0 or 1 - should we queue ST batches?
queue_OB tinyint(1) YES 0 0 or 1 - should we queue OB batches?
active tinyint(1) YES 1 defines whether this config is in active use or not

The clients table

Whenever an ippToPsps program is started, it registers with this table in the database. It does this by storing its hostname, PID and program name as shown here:

+---------------------+------------+-------+--------+--------+--------+--------+---------------------+-----+
| started             | type       | pid   | host   | config | paused | killed | timestamp           | id  |
+---------------------+------------+-------+--------+--------+--------+--------+---------------------+-----+
| 2012-03-06 22:22:09 | queue.py   | 15477 | ipp007 | new3pi |      0 |      0 | 2012-03-06 22:25:13 | 171 | 
| 2012-03-06 16:44:03 | loader.py  | 15510 | ipp033 | new3pi |      0 |      1 | 2012-03-07 07:25:51 | 166 | 
| 2012-03-05 16:54:26 | loader.py  | 25131 | ipp005 | new3pi |      0 |      0 | 2012-03-07 09:43:34 |  94 | 
| 2012-03-07 09:01:28 | pollOdm.py |  9202 | ipp006 | new3pi |      0 |      0 | 2012-03-07 10:20:11 | 174 | 
| 2012-03-07 09:05:09 | loader.py  | 25782 | ipp005 | new3pi |      0 |      0 | 2012-03-07 10:48:13 | 175 | 
+---------------------+------------+-------+--------+--------+--------+--------+---------------------+-----+

the timestamp column acts as form of 'heartbeat' for a client. All ippToPsps programs check-in with this table routinely and update timestamp so it is easy to see of a client has stopped. To manually pause or kill a client, simply update the relevant column in this table to '1'.

The queue tables

The queue.py program is in charge of queuing up unprocessed items (be they stacks, detections or whatever). It populates the tables below so that the loader.py programs can get to work.

The box table

When queuing, the sky is split into equally sized boxes, the size of which is determined in the config in use. The box tables lists the RA/Dec centers of each box, as well as the box size, for each config that currently has items queued.

+---------------------+-------+--------+-----------+------------+----------+
| timestamp           | id    | config | ra_center | dec_center | box_side |
+---------------------+-------+--------+-----------+------------+----------+
| 2012-04-03 16:47:35 | 41879 | new3pi |         2 |        -28 |        4 | 
| 2012-04-03 16:47:35 | 41880 | new3pi |         2 |        -24 |        4 | 
| 2012-04-03 16:47:35 | 41881 | new3pi |         2 |        -20 |        4 | 
| 2012-04-03 16:47:35 | 41882 | new3pi |         2 |        -16 |        4 | 
| 2012-04-03 16:47:35 | 41883 | new3pi |         2 |        -12 |        4 | 
| 2012-04-03 16:47:35 | 41884 | new3pi |         2 |         -8 |        4 | 
| 2012-04-03 16:47:35 | 41885 | new3pi |         2 |         -4 |        4 | 
| 2012-04-03 16:47:35 | 41886 | new3pi |         2 |          0 |        4 | 
| 2012-04-03 16:47:35 | 41887 | new3pi |         2 |          4 |        4 | 
| 2012-04-03 16:47:35 | 41888 | new3pi |         2 |          8 |        4 | 
| 2012-04-03 16:47:35 | 41889 | new3pi |         2 |         12 |        4 | 
etc

The config column has a foreign key constraint with name from the config table. This means if a particular config is deleted from the config table then all related entries here in the box table will also be removed automatically.

The id column is unique and is acts as a foreign key for the pending table, i.e. removal of a box here will remove all related entries in the pending table.

The stripe table

When queuing, the sky is split up into equally size boxes. Because of the way the Dvo class ingests DVO files, it is more efficient for an instance of loader.py to process boxes in stripes so that any DVO files that overlap the boundary of two neighboring boxes can be retained when the next box-worth of DVO files is ingested.

The stripe table is where loader.py instances register that stripe that they are working on. New instances of loader.py look first for stripes that are not being processed by other clients, but failing that will choose the one with the most pending items.

The pending table

the pending table is first populated by queue.py then read by instances loader.py. It consists of unique stage_ids for each batch type that has been queues for each box on the sky, as shown here:

+--------+------------+----------+
| box_id | batch_type | stage_id |
+--------+------------+----------+
|  44579 | OB         |    93141 | 
|  44579 | OB         |    93142 | 
|  44579 | OB         |    93143 | 
|  44579 | OB         |    93144 | 
|  44579 | OB         |    93145 | 
|  44579 | OB         |    93146 | 
|  44579 | OB         |    93147 | 
|  44579 | OB         |    93148 | 
|  44579 | OB         |    93149 | 
|  44579 | OB         |    93150 | 
|  44579 | OB         |    93151 | 
|  44579 | OB         |    93152 | 
|  44579 | OB         |    93153 |
etc

A foreign key constraint on box_id with the box table means that entries will be deleted automatically if a box definition is removed from the box table.

Also, loader.py instances will remove rows from this table as they begin to process them. If they fail to process the item in question for whatever reason, it will be re-queued the next time queue.py is run.

The batch tables

The batch table

batch contains an entry for every batch, regardless of type, that has ever been processed or attempted to be processed through ippToPsps. The detection, stack and object tables all have foreign key relationships with this table using the batch_id column. batch_id is an auto-incrementing integer and ends up forming the batch name of the final file submitted to PSPS.

Setting the purged column to 1 will mean this batch will be removed from the datastore and local disk by cleanup.py regardless of load status. See here.

The detection table

This table contains one row per detection (P2) batch attempted. It references the batch table (above) with a foreign key constraint on the batch_id column.

The stack table

This table contains one row per stack (ST) batch attempted. It references the batch table (above) with a foreign key constraint on the batch_id column.

The object table

This table contains one row per object (OB) batch attempted. It references the batch table (above) with a foreign key constraint on the batch_id column.

Some handy SQL

Count pending items per strip

SELECT ra_center, COUNT(*) AS num  FROM box JOIN pending ON (id = box_id) WHERE config = 'new3pi' GROUP BY ra_center;

See what each client is working on

SELECT clients.config, host, timestamp, ra_center FROM stripe JOIN clients ON (client_id = id) ORDER BY ra_center;

Counts of pending stuff for a given config

SELECT config, batch_type, COUNT(*) FROM pending JOIN box ON (id = box_id) WHERE config = 'newmd4' GROUP BY batch_type;

Find duplicated exposures

SELECT timestamp, exp_id, batch_type, batch_id,  COUNT(exp_id) AS NumOccurrences FROM detection JOIN batch USING (batch_id) WHERE dvo_db = 'LAP.ThreePi.20110809' AND loaded_to_datastore  GROUP BY exp_id HAVING NumOccurrences > 1 order by timestamp;
Last modified 14 years ago Last modified on Apr 11, 2012, 9:38:31 AM
Note: See TracWiki for help on using the wiki.