
Changes between Version 1 and Version 2 of GPC1_DataDistribution


Timestamp: Mar 4, 2009, 4:25:16 PM
Author: bills
Comment: --

  • GPC1_DataDistribution

    v1 v2  
    1 Overview of the PS1-IPP Bulk Data distribution system
    2 ------------------------------------------------------
      1 = GPC1 Data Distribution =
    3 2 
    4 In this note we outline our plans for distributing PS1 IPP data
    5 to remote sites and describe some of the key features of the software
    6 that we are building to facilitate this process.
    7 
    8 The system is based on the existence of one or more mirror IPP sites.
    9 
    10 A mirror site will have a copy of the magic de-streaked raw images and
    11 the database tables for the various processing stages. It is possible for
    12 a site to be populated with subsets of the data.  At this point only the
    13 MPG cluster in Garching is known to be ready to accept the
    14 full raw data volume.
    15 
    16 As the data is processed on Maui, distribution bundles will be
    17 created and posted on the MHPCC IPP Data Store. Each bundle will contain the
    18 results for a 'run' for a particular stage and a file containing
    19 the IPP database information for the run.
    20 
    21 The remote site will have an IPP installation that includes software
    22 that uses the 'Data Store' protocol to manage the transfer of data bundles
    23 to the remote sites and tools to manage the database mirror.
    24 
    25 Due to the vast size of the PS1 data, not all of the images will
    26 be transferred to full-scale mirror sites. Instead the IPP's
    27 'clean and update' system will be used to allow processed images to be
    28 remade by re-running portions of processing steps at the mirror site.
    29 
    30 Distribution bundles are built in either 'full' or 'clean' state. In the full state
    31 all of the associated data products are included. In the clean state
    32 the larger images are omitted. Once the dependent products are available
    33 at the mirror site, a run can be set to 'update' state, which causes the images to be
    34 re-created.
    35 
    36 To ensure that all of the database information remains consistent, the mirror database
    37 shall not be used for queuing new IPP processing runs (for example, new chipRuns
    38 shall not be queued for an exposure).  If such processing is desired, the dependent
    39 data must be inserted into a different IPP database and processed from there.
    40 
    41 The data transfer software is being designed in such a way that a mirror
    42 site can serve as a source for other mirror sites.
    43 
    44 This 'bucket brigade' feature allows the load on the UH IfA network, computers, and
    45 the intercontinental network to be reduced from the level that would be required to service several
    46 clients.
    47 
    48 The 'default' set of bundles that is to be packaged has not been determined at this time.
    49 
    50 We expect that bundles in 'full' state will be produced for a subset of the data.
    51 
    52 The IPP Postage Stamp Server will be integrated with this system and will enable sites to request
    53 specially produced bundles using the postage stamp request system.
    54 
    55 The remote site will notify the IPP that a bundle has been received successfully by posting data
    56 on a Data Store at the site. The IPP will query this site and, once all receivers give the all
    57 clear, the distribution bundles will be purged from the Data Store when space is needed.
      3 * [wiki:GPC1_DataDistribution_Overview Overview Document by Bill Sweeney]
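
The acknowledgement and purge rule in the removed overview text (a bundle is purged only once every receiver has given the all clear, and only when space is needed) can be illustrated with a minimal Python sketch. The names `Bundle`, `acknowledge`, `all_clear`, and `purge_candidates` are hypothetical and are not part of the IPP software. The real system exchanges this information through Data Stores, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """Hypothetical record of one distribution bundle and its receivers."""
    name: str
    receivers: set                      # sites expected to fetch this bundle
    acked: set = field(default_factory=set)  # sites that reported success

    def acknowledge(self, site):
        # Record that a receiving site has reported a successful transfer.
        if site not in self.receivers:
            raise ValueError(f"{site} is not a receiver of {self.name}")
        self.acked.add(site)

    def all_clear(self):
        # True only when there is at least one receiver and every
        # receiver has acknowledged the bundle.
        return bool(self.receivers) and self.receivers <= self.acked


def purge_candidates(bundles, space_needed):
    """Return names of bundles that may be purged.

    Assumption: purging is considered only when space is needed, and
    only for bundles that every receiver has acknowledged.
    """
    if not space_needed:
        return []
    return [b.name for b in bundles if b.all_clear()]
```

In this sketch a bundle with an outstanding receiver is never returned, and nothing is returned while space is not needed, which matches the two conditions stated in the overview.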