Changes between Version 83 and Version 84 of ippToPsps

Timestamp:: May 18, 2010, 2:59:32 PM
Author:: rhenders
Comment:: --

ippToPsps

-              v83
+              v84
 = Recovery system design =
 Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by {{{ippToPsps}}} and posted on an IPP instance of the datastore. These batches are collected by the {{{DXLayer}}} on the PSPS side. As a basis for a future recovery system, the IPP urgently requires some feedback from PSPS so that it may learn which batches have succeeded and which have failed (and why). With this information data can be either deleted, or regenerated accordingly. This is important simply because, with such large data volumes, we cannot afford the high levels of redundancy currently in place. At present, for a given batch, the following copies exist within the pipeline:
+Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by {{{ippToPsps}}} and posted on an IPP instance of the datastore. These batches are collected by the {{{DXLayer}}} on the PSPS side. The IPP urgently requires some feedback from PSPS to determine which batches have succeeded and which have failed (and why they failed). With this information, data can be either deleted or regenerated accordingly. This is important simply because, with such large data volumes, we cannot afford the high levels of redundancy currently in place. At present, for a given batch, the following copies exist within the pipeline:
  - a copy exists on the IPP cluster after generation by the {{{ippToPsps}}} program
 …
  - the {{{DXLayer}}} also keeps a copy of these (larger) csv files
 We therefore need to quickly implement the basic framework of a feedback loop such that the IPP can quickly learn if a given batch has been successfully merged into the PSPS database or not. This will enable it to safely delete the data files and remove the copy from the datastore.
+We therefore need to implement, as soon as possible, the basic framework of a feedback loop so that the IPP can quickly learn whether or not a given batch has been successfully merged into the PSPS database. This will enable it to safely delete the data files and remove the copy from the datastore. This will also form the basis for a more comprehensive recovery system, to be developed at a future date.
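The core decision of the feedback loop can be sketched as a mapping from the status PSPS reports for a batch to what the IPP does next. This is a minimal Python sketch under stated assumptions: the status names ({{{PENDING}}}, {{{MERGED}}}, {{{FAILED}}}) and the action strings are hypothetical and not part of any existing IPP or PSPS interface.

```python
from enum import Enum
from typing import List


class BatchStatus(Enum):
    """Hypothetical states for a batch, as reported back by PSPS."""
    PENDING = "pending"  # posted on the datastore, no feedback yet
    MERGED = "merged"    # successfully merged into the PSPS database
    FAILED = "failed"    # rejected or failed somewhere within PSPS


def next_actions(status: BatchStatus) -> List[str]:
    """Return the IPP-side actions (hypothetical names) for a batch in this state."""
    if status is BatchStatus.MERGED:
        # The data now lives in the PSPS database, so the IPP copies are redundant.
        return ["delete_local_files", "remove_from_datastore"]
    if status is BatchStatus.FAILED:
        # Placeholder for the future recovery system: rebuild and repost the batch.
        return ["regenerate_batch"]
    # No feedback yet: keep every copy until PSPS reports back.
    return []
```

The important property is the default: without positive confirmation of a merge, nothing is deleted.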
 == Previous design ==
 …
 Instead of creating a new datastore instance within PSPS and using the {{{DXLayer}}} as a communication layer between the ODM and the IPP, we propose that the {{{DXLayer}}} forms no part of the feedback system. It should be simplified such that it only facilitates loading, i.e. polling the IPP datastore for new data, converting it to csv files, then sending these on to the ODM. To complete the circle, the {{{ippToPsps}}} code will instead poll the ODM directly, bypassing the {{{DXLayer}}} altogether. This also forms the basis of a full recovery system since, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of how or why a certain batch is being submitted by the IPP; it should just grab it, convert it and pass it along to the ODM.
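A minimal sketch of how the {{{ippToPsps}}} side might poll the ODM, in Python. The ODM query interface is not defined in this design, so {{{query_status}}} is a hypothetical callable standing in for it (returning {{{None}}} while a batch is still being processed); the polling interval and round limit are likewise illustrative.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def poll_odm(
    batch_ids: Iterable[str],
    query_status: Callable[[str], Optional[str]],
    interval_s: float = 60.0,
    max_rounds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll the ODM until every batch has a final status or max_rounds is reached.

    query_status is a hypothetical stand-in for the real ODM query; it returns
    None while a batch is still in progress. Batches still unresolved after
    max_rounds are simply absent from the returned dict.
    """
    outstanding = set(batch_ids)
    resolved: Dict[str, str] = {}
    for round_no in range(max_rounds):
        for batch_id in sorted(outstanding):
            status = query_status(batch_id)
            if status is not None:
                resolved[batch_id] = status
        # Stop asking about batches that already have a final answer.
        outstanding.difference_update(resolved)
        if not outstanding:
            break
        # Do not sleep after the final round.
        if round_no < max_rounds - 1:
            sleep(interval_s)
    return resolved
```

Passing in {{{query_status}}} and {{{sleep}}} keeps the loop independent of whatever protocol the ODM ends up exposing, and lets it be tested without a live ODM.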
 This design would therefore mean simplifying a major PSPS component, the {{{DXLayer}}}, but rather than waste the code already written, it could be taken and used within {{{ippToPsps}}}, for example the ODM polling scripts. We would simply be shifting responsibility over from PSPS to IPP. Over parts could be dropped completely, for example, since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the {{{DXLayer}}}, which currently has its own local database for this information.
+This design would therefore mean simplifying a major PSPS component, the {{{DXLayer}}}, but rather than waste the code already written, it would be taken and used within {{{ippToPsps}}} (for example, the ODM polling scripts). We would simply be shifting responsibility from PSPS to IPP. Other parts could be dropped completely. For example, since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, there is no need for the {{{DXLayer}}} to duplicate this information in its own local database.
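One way {{{ippToPsps}}} might keep its record of batches and exposure IDs, sketched in Python with the built-in {{{sqlite3}}} module as a stand-in for the IPP database. The table {{{psps_batch}}}, its columns and the status strings are hypothetical; the real IPP database schema is not described here.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per batch posted to the datastore.
SCHEMA = """
CREATE TABLE IF NOT EXISTS psps_batch (
    batch_id TEXT PRIMARY KEY,
    exp_id   INTEGER NOT NULL,
    status   TEXT NOT NULL DEFAULT 'posted'
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_batch(conn: sqlite3.Connection, batch_id: str, exp_id: int) -> None:
    """Record a newly posted batch and the exposure it was built from."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO psps_batch (batch_id, exp_id) VALUES (?, ?)",
            (batch_id, exp_id),
        )


def set_status(conn: sqlite3.Connection, batch_id: str, status: str) -> None:
    """Store the status learned by polling the ODM."""
    with conn:
        conn.execute(
            "UPDATE psps_batch SET status = ? WHERE batch_id = ?",
            (status, batch_id),
        )


def merged_batches(conn: sqlite3.Connection) -> List[str]:
    """Batches confirmed as merged by the ODM, i.e. safe to clean up."""
    rows = conn.execute(
        "SELECT batch_id FROM psps_batch WHERE status = 'merged' ORDER BY batch_id"
    )
    return [row[0] for row in rows]
```

With this bookkeeping on the IPP side, the {{{DXLayer}}} needs no local copy of the job and exposure mapping.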
 The question remains of what should be done with the copies of the data currently retained by the {{{DXLayer}}}? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send list of batches it is safe to delete through the datastore, or perhaps the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
+The question remains: what should be done with the copies of the data currently retained by the {{{DXLayer}}}? The options are that they could be deleted automatically after a defined amount of time, or the IPP could send a list of batches that are safe to delete through the datastore, or perhaps the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
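The first two retention options can be combined into a single purge rule, sketched here in Python for illustration only (the {{{DXLayer}}} is a PSPS component and may be written in another language). The function name and the assumption that the {{{DXLayer}}} knows when it fetched each batch are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def batches_to_purge(
    held: Dict[str, float],
    now: float,
    retention_s: Optional[float] = None,
    safe_from_ipp: Iterable[str] = (),
) -> List[str]:
    """Pick which retained batches could be deleted.

    held maps batch ID -> time (epoch seconds) the batch was fetched.
    A batch is purged if it is older than retention_s (time-based option)
    or if the IPP has listed it as safe to delete (IPP-driven option).
    """
    safe = set(safe_from_ipp)
    purge = []
    for batch_id, fetched_at in held.items():
        too_old = retention_s is not None and now - fetched_at > retention_s
        if too_old or batch_id in safe:
            purge.append(batch_id)
    return sorted(purge)
```

The third option, retaining nothing, corresponds to deleting each batch as soon as it has been passed to the ODM, which makes this bookkeeping unnecessary.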
 === Advantages over previous design ===
  - no need for second datastore (not a big overhead, but additional systems administration in an already complicated system).
+ - no need for second datastore (not a big overhead, but it would require additional systems administration in an already complicated system).
  - no need to define a new XML standard that incorporates the whole array of recovery options
  - no need for the {{{DXLayer}}} to poll the ODM