
Changes between Version 67 and Version 68 of ippToPsps

Timestamp: May 18, 2010, 1:15:23 PM
Author: rhenders
Comment: --

ippToPsps

-              v67
+              v68
 [[PageOutline]]
 {{{ippToPsps}}} is the interface between IPP and PSPS. In short, {{{ippToPsps}}} creates FITS files from IPP data, then publishes them to a datastore in the form of ''batches''. On the PSPS side, the DXLayer polls the datastore, collects batches when they become available, then converts the contents to {{{csv}}} files before sending them on to SQL Server loader software, which ''merges'' them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which {{{ippToPsps}}} will need to respond.
+{{{ippToPsps}}} is the interface between IPP and PSPS. In short, {{{ippToPsps}}} creates FITS files from IPP data, then publishes them to a datastore in the form of ''batches''. On the PSPS side, the {{{DXLayer}}} polls the datastore, collects batches when they become available, then converts the contents to {{{csv}}} files before sending them on to SQL Server loader software, which ''merges'' them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which {{{ippToPsps}}} will need to respond.
 It is intended that the binary tables in the FITS files generated by {{{ippToPsps}}} match the PSPS database schemas perfectly, the consequence being that any alterations to the PSPS database schema will only affect {{{ippToPsps}}} code, and not the DXLayer. A certain amount of data validation will be performed by {{{ippToPsps}}} before publication, more validation occurring at the loading and merge stages on the PSPS side.
+It is intended that the binary tables in the FITS files generated by {{{ippToPsps}}} match the PSPS database schemas perfectly, the consequence being that any alterations to the PSPS database schema will only affect {{{ippToPsps}}} code, and not the {{{DXLayer}}}. A certain amount of data validation will be performed by {{{ippToPsps}}} before publication, more validation occurring at the loading and merge stages on the PSPS side.
 The outputs of {{{ippToPsps}}} are referred to as 'batches', and are detailed below.
 …
 == ippToPsps ==
 {{{ippToPsps}}} is a C program within the IPP build. When given the correct arguments it will generate a single FITS file for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label, etc.). An instance of {{{ippToPsps}}} is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files up as a ''batch'', then publishes it to the datastore, ready for collection by the DXLayer.
+{{{ippToPsps}}} is a C program within the IPP build. When given the correct arguments it will generate a single FITS file for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label, etc.). An instance of {{{ippToPsps}}} is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files up as a ''batch'', then publishes it to the datastore, ready for collection by the {{{DXLayer}}}.
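The per-exposure flow described above (one run per exposure ID, outputs bundled into a batch) can be sketched roughly as follows. This is an illustration only, written in Python rather than the Perl used by the real calling script. The names `Batch`, `build_batch` and `fake_run` are hypothetical, and the real program invocation and datastore publication are replaced by a stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Batch:
    """Hypothetical container for the FITS files produced for one label."""
    label: str
    fits_files: List[str] = field(default_factory=list)


def build_batch(
    label: str,
    exposure_ids: Iterable[int],
    run_ipptopsps: Callable[[int], str],
) -> Batch:
    """Run one conversion per exposure ID and bundle the resulting files."""
    batch = Batch(label=label)
    for exp_id in exposure_ids:
        # Each call stands for one instance of the program, yielding one FITS file.
        batch.fits_files.append(run_ipptopsps(exp_id))
    return batch


def fake_run(exp_id: int) -> str:
    # Stand-in (assumption): returns the path the real program might write.
    return f"exp_{exp_id}.fits"


batch = build_batch("demo_label", [101, 102, 103], fake_run)
```

Passing the runner in as a function keeps the bundling logic separate from how the program is actually launched, so the same loop can be exercised without an IPP build.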
 == DXLayer ==
 The DXLayer polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format, suitable for ingest by the ODM.
+The {{{DXLayer}}} polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format, suitable for ingest by the ODM.
 == ODM Loader ==
 …
  - a copy exists on the IPP cluster after generation by ippToPsps program
  - a copy exists on the IPP datastore after publication by ippToPsps
  - the DXLayer retains a copy after it has sent the csv version to the ODM
  - the DXLayer also keeps a copy of these (larger) csv files
+ - the {{{DXLayer}}} retains a copy after it has sent the csv version to the ODM
+ - the {{{DXLayer}}} also keeps a copy of these (larger) csv files
 We therefore need to quickly implement the basic framework of a feedback loop such that the IPP can quickly learn if a given batch has been successfully merged into the PSPS database or not. This will enable it to safely delete the data files and remove the copy from the datastore.
 …
 == Previous design ==
 Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The DXLayer would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. Polling this, {{{ippToPsps}}} could acquire a list of batches it knows are safe to be discarded. Simultaneously, the DXLayer can also delete the same redundant data.
+Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The {{{DXLayer}}} would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. Polling this, {{{ippToPsps}}} could acquire a list of batches it knows are safe to be discarded. Simultaneously, the {{{DXLayer}}} could delete its copies of the same redundant data.
 The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail those files it is safe to delete, but could evolve into a more complex recovery report, i.e. which batches failed, and why and what is required to be done by the IPP.
+The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail those files it is safe to delete, but could evolve into a more complex recovery report, i.e. which batches failed, and what is required to be done by the IPP.
 == New design ==
 Instead of creating a new datastore instance within PSPS and using the DXLayer as a communication layer between the ODM and the IPP, we propose that the DXLayer forms no part of the feedback system. It should be simplified such that it only enables loading, i.e. polling the IPP datastore for new data, converting it to csv files, then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of how or why a certain batch is being submitted by the IPP; it should just grab it, convert it and pass it along to the ODM.
+Instead of creating a new datastore instance within PSPS and using the {{{DXLayer}}} as a communication layer between the ODM and the IPP, we propose that the {{{DXLayer}}} forms no part of the feedback system. It should be simplified such that it only facilitates loading, i.e. polling the IPP datastore for new data, converting it to csv files, then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of how or why a certain batch is being submitted by the IPP; it should just grab it, convert it and pass it along to the ODM.
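As a rough illustration of the proposed feedback loop, the sketch below polls for the status of each outstanding batch and deletes only those reported as merged. It is a sketch under stated assumptions, not the actual implementation: the status values in `MergeStatus`, the functions `poll_odm` and `discard_merged`, and the lookup table standing in for the ODM are all hypothetical, since the page does not specify how the ODM reports progress.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List


class MergeStatus(Enum):
    """Hypothetical states the ODM might report for a batch."""
    PENDING = "pending"
    MERGED = "merged"
    FAILED = "failed"


def poll_odm(
    batch_ids: Iterable[str],
    query_status: Callable[[str], MergeStatus],
) -> Dict[MergeStatus, List[str]]:
    """Query the status of each outstanding batch once and group IDs by status."""
    grouped: Dict[MergeStatus, List[str]] = {s: [] for s in MergeStatus}
    for batch_id in batch_ids:
        grouped[query_status(batch_id)].append(batch_id)
    return grouped


def discard_merged(
    grouped: Dict[MergeStatus, List[str]],
    delete_batch: Callable[[str], None],
) -> List[str]:
    """Delete only batches confirmed as merged; return those still outstanding."""
    for batch_id in grouped[MergeStatus.MERGED]:
        delete_batch(batch_id)
    return grouped[MergeStatus.PENDING] + grouped[MergeStatus.FAILED]


# Stand-in for the ODM (assumption): a fixed lookup table instead of a real query.
fake_odm = {
    "B001": MergeStatus.MERGED,
    "B002": MergeStatus.PENDING,
    "B003": MergeStatus.FAILED,
}
deleted: List[str] = []

grouped = poll_odm(fake_odm, fake_odm.__getitem__)
outstanding = discard_merged(grouped, deleted.append)
```

Failed batches are kept alongside pending ones rather than deleted, which leaves room for the later recovery logic the text anticipates.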
 Since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the {{{DXLayer}}}, which currently has its own local database for this information.
 Rather than waste the code already written for the DXLayer, parts of it, for example the ODM polling scripts, can be reused within {{{ippToPsps}}}.
+Rather than waste the code already written for the {{{DXLayer}}}, parts of it, for example the ODM polling scripts, can be reused within {{{ippToPsps}}}.
 The question remains of what should be done with the copies of the data currently retained by the DXLayer. The options are that they can be deleted automatically after a defined amount of time, that the IPP can send a list of batches it is safe to delete through the datastore, or that the DXLayer should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
+The question remains of what should be done with the copies of the data currently retained by the {{{DXLayer}}}. The options are that they can be deleted automatically after a defined amount of time, that the IPP can send a list of batches it is safe to delete through the datastore, or that the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
 === Advantages over previous design ===