Changes between Version 1 and Version 2 of GPC1_DataDistribution

Timestamp: Mar 4, 2009, 4:25:16 PM
Author: bills
Comment: --

GPC1_DataDistribution

-              v1
+              v2
+Overview of the PS1-IPP Bulk Data distribution system
+------------------------------------------------------
+= GPC1 Data Distribution =
+In this note we outline our plans for distributing PS1 IPP data
+to remote sites and describe some of the key features of the software
+that we are building to facilitate this process.
+The system is based on the existence of one or more mirror IPP sites.
+A mirror site will have a copy of the magic de-streaked raw images and
+the database tables for the various processing stages. It is possible for
+a site to be populated with subsets of the data.  At this point only the
+MPG cluster in Garching is known to be ready to accept the
+full raw data volume.
+As the data is processed on Maui, distribution bundles will be
+created and posted on the MHPCC IPP Data Store. Each bundle will contain the
+results for a 'run' for a particular stage and a file containing
+the IPP database information for the run.
+The remote site will have an IPP installation that includes software
+that uses the 'Data Store' protocol to manage the transfer of data bundles
+to the site, and tools to manage the database mirror.
+Due to the vast size of the PS1 data, not all of the images will
+be transferred to full-scale mirror sites. Instead the IPP's
+'clean and update' system will be used to allow processed images to be
+remade by re-running portions of processing steps at the mirror site.
+Distribution bundles are built in either the 'full' or the 'clean' state. In the full state
+all of the associated data products are included. In the clean state
+the larger images are omitted. Once the dependent products are available
+at the mirror site, a run can be set to the 'update' state, which causes the images to be
+re-created.
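The full, clean, and update states described above can be sketched as a small state model. This is only an illustration under stated assumptions: the `Run` class, its fields, and the functions `build_bundle` and `request_update` are hypothetical names and are not taken from the IPP software.

```python
from dataclasses import dataclass, field


# Hypothetical model of a processing run; none of these names come from the IPP.
@dataclass
class Run:
    run_id: str
    state: str  # 'full', 'clean', or 'update'
    small_products: list = field(default_factory=list)
    large_images: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)  # run_ids whose products are needed


def build_bundle(run: Run, state: str) -> Run:
    """Return a copy of the run as it would be bundled in the given state."""
    if state not in ("full", "clean"):
        raise ValueError("bundles are built only in 'full' or 'clean' state")
    # A clean bundle omits the larger images; a full bundle carries everything.
    images = list(run.large_images) if state == "full" else []
    return Run(run.run_id, state, list(run.small_products), images, list(run.depends_on))


def request_update(run: Run, available: set) -> bool:
    """Move a clean run to 'update' only if every dependent product is at the mirror."""
    if run.state != "clean":
        return False
    if not all(dep in available for dep in run.depends_on):
        return False
    run.state = "update"
    return True
```

In this sketch a mirror would call `request_update` with the set of run identifiers it already holds; the call succeeds only when nothing the run depends on is missing.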
+To ensure that all of the database information remains consistent, the mirror database
+shall not be used for queuing new IPP processing runs (for example, new chipRuns
+shall not be queued for an exposure).  If such processing is desired, the dependent
+data must be inserted into a different IPP database and processed from there.
+The data transfer software is being designed in such a way that a mirror
+site can serve as a source for other mirror sites.
+This 'bucket brigade' feature reduces the load on the UH IfA network, computers, and
+the intercontinental network below the level that would be required to serve several
+clients directly.
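The effect of the bucket brigade on the originating site can be illustrated by counting direct downloads. The sketch below is an assumption-laden toy: the site names, the `upstream` mapping, and the function `origin_load` are all hypothetical.

```python
def origin_load(upstream, origin="origin"):
    """Count the sites that pull bundles directly from the origin.

    `upstream` maps each mirror site to the site it downloads from.
    """
    return sum(1 for source in upstream.values() if source == origin)


# Every mirror pulls from the origin: three intercontinental transfers per bundle.
star = {"siteA": "origin", "siteB": "origin", "siteC": "origin"}

# Mirrors relay to each other: the origin serves each bundle only once.
brigade = {"siteA": "origin", "siteB": "siteA", "siteC": "siteB"}
```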
+The 'default' set of bundles to be packaged has not yet been determined.
+We expect that bundles in 'full' state will be produced for a subset of the data.
+The IPP Postage Stamp Server will be integrated with this system and will enable sites to request
+specially produced bundles using the postage stamp request system.
+The remote site will notify the IPP that a bundle has been received successfully by posting data
+on a Data Store at the site. The IPP will query this site, and once all receivers have given the all
+clear, the distribution bundles will be purged from the Data Store when space is needed.
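The purge rule above amounts to a simple check: a bundle becomes eligible only after every receiver has confirmed it, and even then it is removed only when space is needed. A minimal sketch, assuming acknowledgements have already been collected into a dictionary (the function name and data layout are hypothetical, not the IPP's):

```python
def bundles_to_purge(acks, receivers, space_needed):
    """Return the names of bundles that may be purged, in sorted order.

    `acks` maps a bundle name to the set of receivers that confirmed receipt.
    `receivers` lists every site expected to confirm.
    """
    if not space_needed:
        # Confirmed bundles are kept until the space is actually required.
        return []
    required = set(receivers)
    # A bundle qualifies only when its confirmations cover all receivers.
    return sorted(name for name, got in acks.items() if required <= set(got))
```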
+* [wiki:GPC1_DataDistribution_Overview Overview Document by Bill Sweeney]