Overview of the PS1-IPP Bulk Data distribution system
In this note we outline our plans for distributing PS1 IPP data to remote sites and describe some of the key features of the software that we are building to facilitate this process.
The system is based on the existence of one or more mirror IPP sites.
A mirror site will have a copy of the magic de-streaked raw images and the database tables for the various processing stages. It is possible for a site to be populated with subsets of the data. At this point only the MPG cluster in Garching is known to be ready to accept the full raw data volume.
As the data is processed on Maui, distribution bundles will be created and posted on the MHPCC IPP Data Store. Each bundle will contain the results for a 'run' for a particular stage and a file containing the IPP database information for the run.
The remote site will have an IPP installation that includes software that uses the 'Data Store' protocol to manage the transfer of data bundles to the site, and tools to manage the database mirror.
Due to the vast size of the PS1 data, not all of the images will be transferred, even to full-scale mirror sites. Instead, the IPP's 'clean and update' system will be used to allow processed images to be remade by re-running portions of the processing steps at the mirror site.
Distribution bundles are built in either the 'full' or the 'clean' state. In the full state all of the associated data products are included. In the clean state the larger images are omitted. Once the dependent products are available at the mirror site, a run can be set to the 'update' state, which causes the omitted images to be re-created.
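The state handling described above can be sketched as follows. This is a minimal illustration only, not IPP code: the names `RunState`, `MirrorRun`, `request_update`, and the run identifiers are hypothetical, and the sketch assumes that a run simply lists the runs whose products it depends on.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    FULL = "full"      # all associated data products are present
    CLEAN = "clean"    # the larger images were omitted from the bundle
    UPDATE = "update"  # omitted images are to be re-created locally


@dataclass
class MirrorRun:
    run_id: str                                      # hypothetical identifier
    state: RunState
    depends_on: list = field(default_factory=list)   # IDs of runs this run needs


def request_update(run, available_run_ids):
    """Set a clean run to the update state if its dependencies are present.

    Returns True if the state was changed, False otherwise.
    """
    if run.state is not RunState.CLEAN:
        return False  # only clean runs have images to re-create
    if not all(dep in available_run_ids for dep in run.depends_on):
        return False  # dependent products have not all arrived yet
    run.state = RunState.UPDATE
    return True
```

In this sketch a request made before the dependent products have arrived is refused and leaves the run in the clean state, so it can be retried later.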
To ensure that all of the database information remains consistent, the mirror database shall not be used for queuing new IPP processing runs (for example, new chipRuns shall not be queued for an exposure). If such processing is desired, the dependent data must be inserted into a different IPP database and processed from there.
The data transfer software is being designed in such a way that a mirror site can serve as a source for other mirror sites.
This 'bucket brigade' feature reduces the load on the UH IfA network, computers, and the intercontinental network below the level that would be required to serve several clients directly.
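The effect of the bucket brigade on the originating site can be illustrated with a small calculation. This is a sketch under assumptions: the site names are placeholders, and `copies_served` is a hypothetical helper that counts how many copies of each bundle every source must send for a given choice of upstream sites.

```python
from collections import Counter


def copies_served(upstream):
    """Count the copies of each bundle that every source must send.

    upstream maps each mirror site to the site it pulls bundles from.
    """
    return Counter(upstream.values())


# Every mirror pulls directly from the originating site.
direct = {"mirror_a": "origin", "mirror_b": "origin", "mirror_c": "origin"}

# Bucket brigade: each mirror pulls from the previous one in the chain.
brigade = {"mirror_a": "origin", "mirror_b": "mirror_a", "mirror_c": "mirror_b"}

print(copies_served(direct)["origin"])   # copies sent by the origin, direct case
print(copies_served(brigade)["origin"])  # copies sent by the origin, brigade case
```

With three mirrors the originating site sends three copies of each bundle in the direct case and one copy in the chained case; the remaining transfers are carried by the mirrors themselves.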
The 'default' set of bundles that is to be packaged has not been determined at this time.
We expect that bundles in 'full' state will be produced for a subset of the data.
The IPP Postage Stamp Server will be integrated with this system and will enable sites to request specially produced bundles using the postage stamp request system.
The remote site will notify the IPP that a bundle has been received successfully by posting data on a Data Store at the site. The IPP will query this Data Store, and once all receivers have given the all clear, the distribution bundles will be purged from the Data Store on which they were posted when space is needed.
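The purge rule above can be sketched as follows. This is an illustration under assumptions, not the IPP implementation: `purgeable_bundles`, the bundle IDs, and the receiver names are hypothetical, and acknowledgements are assumed to have already been collected from each receiver's Data Store into a simple mapping.

```python
def purgeable_bundles(bundles, acks, receivers, space_needed):
    """Return the IDs of bundles that may be purged from the source Data Store.

    bundles      -- bundle IDs currently posted, in posting order
    acks         -- dict mapping bundle ID to the set of receivers that
                    have reported successful receipt
    receivers    -- all sites expected to receive every bundle
    space_needed -- True when the source Data Store must free space
    """
    if not space_needed:
        return []  # nothing is purged until space is actually needed
    required = set(receivers)
    # A bundle qualifies only when every receiver has acknowledged it.
    return [b for b in bundles if required <= acks.get(b, set())]


receivers = ["garching", "site_b"]
acks = {"bundle_1": {"garching", "site_b"}, "bundle_2": {"garching"}}
posted = ["bundle_1", "bundle_2", "bundle_3"]
print(purgeable_bundles(posted, acks, receivers, space_needed=True))
```

Here only `bundle_1` qualifies, because `bundle_2` is still missing one acknowledgement and `bundle_3` has none. Note that with an empty receiver list this sketch treats every bundle as purgeable, so a real system would need to decide how to handle that case.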
