
Changes between Version 67 and Version 68 of ippToPsps


Timestamp: May 18, 2010, 1:15:23 PM
Author: rhenders
Comment: --

  • ippToPsps

    v67 v68  
    44[[PageOutline]]
    55
    6 {{{ippToPsps}}} is the interface between IPP and PSPS. In short, {{{ippToPsps}}} creates FITS files from IPP data, then publishes them to a datastore in the form of ''batches''. On the PSPS side, the DXLayer polls the datastore, collects batches when they become available, then converts the contents to {{{csv}}} files before sending them on to SQL Server loader software, which ''merges'' them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which {{{ippToPsps}}} will need to respond.
     6{{{ippToPsps}}} is the interface between IPP and PSPS. In short, {{{ippToPsps}}} creates FITS files from IPP data, then publishes them to a datastore in the form of ''batches''. On the PSPS side, the {{{DXLayer}}} polls the datastore, collects batches when they become available, then converts the contents to {{{csv}}} files before sending them on to SQL Server loader software, which ''merges'' them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which {{{ippToPsps}}} will need to respond.
    77
    8 It is intended that the binary tables in the FITS files generated by {{{ippToPsps}}} match the PSPS database schemas perfectly, the consequence being that any alterations to the PSPS database schema will only affect {{{ippToPsps}}} code, and not the DXLayer. A certain amount of data validation will be performed by {{{ippToPsps}}} before publication, more validation occurring at the loading and merge stages on the PSPS side.
     8It is intended that the binary tables in the FITS files generated by {{{ippToPsps}}} match the PSPS database schemas perfectly, the consequence being that any alterations to the PSPS database schema will only affect {{{ippToPsps}}} code, and not the {{{DXLayer}}}. A certain amount of data validation will be performed by {{{ippToPsps}}} before publication, more validation occurring at the loading and merge stages on the PSPS side.
    99
    1010The outputs of {{{ippToPsps}}} are referred to as 'batches', and are detailed below.
     
    3838== ippToPsps ==
    3939
    40 {{{ippToPsps}}} is a C program within the IPP build. When given the correct arguments it will generate a single FITS for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label etc). An instance of {{{ippToPsps}}} is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files up as a ''batch'', then publishes it to the datastore, ready for collection by the DXLayer.
     40{{{ippToPsps}}} is a C program within the IPP build. When given the correct arguments it will generate a single FITS for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label etc). An instance of {{{ippToPsps}}} is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files up as a ''batch'', then publishes it to the datastore, ready for collection by the {{{DXLayer}}}.
    4141
    4242== DXLayer ==
    4343
    44 The DXLayer polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format, suitable for ingest by the ODM.
     44The {{{DXLayer}}} polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format, suitable for ingest by the ODM.
    4545
    4646== ODM Loader ==
     
    126126 - a copy exists on the IPP cluster after generation by ippToPsps program
    127127 - a copy exists on the IPP datastore after publication by ippToPsps
    128  - the DXLayer retains a copy after it has sent the csv version to the ODM
    129  - the DXLayer also keeps a copy of these (larger) csv files
     128 - the {{{DXLayer}}} retains a copy after it has sent the csv version to the ODM
     129 - the {{{DXLayer}}} also keeps a copy of these (larger) csv files
    130130
    131131We therefore need to quickly implement the basic framework of a feedback loop such that the IPP can quickly learn if a given batch has been successfully merged into the PSPS database or not. This will enable it to safely delete the data files and remove the copy from the datastore.
     
    133133== Previous design ==
    134134
    135 Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The DXLayer would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. Polling this, {{{ippToPsps}}} could acquire a list of batches it knows are safe to be discarded. Simultaneously, the DXLayer can also delete the same redundant data.
     135Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The {{{DXLayer}}} would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. Polling this, {{{ippToPsps}}} could acquire a list of batches it knows are safe to be discarded. Simultaneously, the {{{DXLayer}}} could delete its copies of the same redundant data.
    136136
    137 The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail those files it is safe to delete, but could evolve into a more complex recovery report, i.e. which batches failed, and why and what is required to be done by the IPP.
     137The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail those files it is safe to delete, but could evolve into a more complex recovery report, i.e. which batches failed, and what is required to be done by the IPP.
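As an illustration only, the XML update described above might be written and read along the following lines. The element and attribute names (`feedback`, `batch`, `name`, `status`) and the status value `safe_to_delete` are assumptions made for this sketch; no format had been agreed at the time of writing.

```python
import xml.etree.ElementTree as ET


def build_report(safe_batches):
    """Serialize the names of batches that are safe to delete as an XML string."""
    root = ET.Element("feedback")  # hypothetical root element name
    for name in safe_batches:
        # One <batch> element per batch; the attribute names are assumptions.
        ET.SubElement(root, "batch", name=name, status="safe_to_delete")
    return ET.tostring(root, encoding="unicode")


def parse_report(xml_text):
    """Return the names of all batches marked safe to delete in a report."""
    root = ET.fromstring(xml_text)
    return [
        batch.get("name")
        for batch in root.findall("batch")
        if batch.get("status") == "safe_to_delete"
    ]
```

A later recovery report could reuse the same layout, adding further `status` values and a reason for each failed batch.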
    138138
    139139== New design ==
    140140
    141 Instead of creating a new datastore instance within PSPS and using the DXLayer as communication layer between the ODM and the IPP, we propose that the DXLayer forms no part of the feedback system. It should be simplified such that it only enables loaing, i.e. polling the IPP datastore for new data, converting it to csv files then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of the how or why a certain batch is being submitted by the IPP, it should just grab it, convert it and pass it along to the ODM.
     141Instead of creating a new datastore instance within PSPS and using the {{{DXLayer}}} as communication layer between the ODM and the IPP, we propose that the {{{DXLayer}}} forms no part of the feedback system. It should be simplified such that it only facilitates loading, i.e. polling the IPP datastore for new data, converting it to csv files then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of the how or why a certain batch is being submitted by the IPP, it should just grab it, convert it and pass it along to the ODM.
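The direct polling step could be sketched roughly as below. This is not the actual {{{ippToPsps}}} code: `query_status` and `delete_batch` are hypothetical callables standing in for whatever interface the ODM and the IPP cluster eventually provide, and the status string `"merged"` is likewise an assumption.

```python
def reap_merged_batches(pending, query_status, delete_batch):
    """Delete every pending batch the ODM reports as merged.

    pending      -- iterable of batch names still awaiting confirmation
    query_status -- hypothetical callable: batch name -> status string
    delete_batch -- hypothetical callable that removes the local and
                    datastore copies of a batch
    Returns the list of batches that are still pending.
    """
    still_pending = []
    for batch in pending:
        if query_status(batch) == "merged":  # assumed status value
            delete_batch(batch)
        else:
            # Not merged yet (or failed): keep the data for now.
            still_pending.append(batch)
    return still_pending
```

Passing the ODM query in as a callable keeps the loop independent of how the ODM is actually reached, and leaves room for handling other status values later as part of the recovery system.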
    142142
    143143Since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the {{{DXLayer}}}, which currently has its own local database for this information.
    144144
    145 Rather than waste the code already written for the DXLayer, it can be used within {{{ippToPsps}}}, for example, the ODM polling scripts.
     145Rather than waste the code already written for the {{{DXLayer}}}, it can be used within {{{ippToPsps}}}, for example, the ODM polling scripts.
    146146
    147 The question remains of what should be done with the copies of the data currently retained by the DXLayer? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send a list of batches it is safe to delete through the datastore, or perhaps the DXLayer should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
     147The question remains of what should be done with the copies of the data currently retained by the {{{DXLayer}}}? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send a list of batches it is safe to delete through the datastore, or perhaps the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
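For the first of these options (automatic deletion after a defined amount of time), the selection logic might look something like this. The function name, the `(name, received_at)` pair layout and the use of seconds are all assumptions made for the sketch.

```python
def expired_batches(batches, now, max_age_seconds):
    """Return the names of batches held for longer than max_age_seconds.

    batches -- iterable of (name, received_at) pairs, where received_at is
               a timestamp in seconds (hypothetical bookkeeping format)
    now     -- current time in the same units
    """
    return [
        name
        for name, received_at in batches
        if now - received_at > max_age_seconds
    ]
```

The caller would then delete the returned batches; the retention period itself is a policy choice that the text above leaves open.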
    148148
    149149=== Advantages over previous design ===