Changes between Version 83 and Version 84 of ippToPsps
- Timestamp:
- May 18, 2010, 2:59:32 PM (16 years ago)
ippToPsps
v83 v84
124 124   = Recovery system design =
125 125
126     - Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by {{{ippToPsps}}} and posted on an IPP instance of the datastore. These batches are collected by the {{{DXLayer}}} on the PSPS side. As a basis for a future recovery system, the IPP urgently requires some feedback from PSPS so that it may learn which batches have succeeded and which have failed (and why). With this information data can be either deleted,or regenerated accordingly. This is important simply because, with such large data volumes, we cannot afford the high levels of redundancy currently in place. At present, for a given batch, the following copies exist within the pipeline:
    126 + Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by {{{ippToPsps}}} and posted on an IPP instance of the datastore. These batches are collected by the {{{DXLayer}}} on the PSPS side. The IPP urgently requires some feedback from PSPS to determine which batches have succeeded and which have failed (and why they failed). With this information data can be either deleted or regenerated accordingly. This is important simply because, with such large data volumes, we cannot afford the high levels of redundancy currently in place. At present, for a given batch, the following copies exist within the pipeline:
127 127
128 128   - a copy exists on the IPP cluster after generation by ippToPsps program
 …   …
131 131   - the {{{DXLayer}}} also keeps a copy of these (larger) csv files
132 132
133     - We therefore need to quickly implement the basic framework of a feedback loop such that the IPP can quickly learn if a given batch has been successfully merged into the PSPS database or not. This will enable it to safely delete the data files and remove the copy from the datastore.
    133 + We therefore need to quickly implement the basic framework of a feedback loop such that the IPP can quickly learn if a given batch has been successfully merged into the PSPS database or not. This will enable it to safely delete the data files and remove the copy from the datastore. This will also form the basis for a more comprehensive recovery system, to be developed at a future date.
134 134
135 135   == Previous design ==
 …   …
181 181   Instead of creating a new datastore instance within PSPS and using the {{{DXLayer}}} as communication layer between the ODM and the IPP, we propose that the {{{DXLayer}}} forms no part of the feedback system. It should be simplified such that it only facilitates loading, i.e. polling the IPP datastore for new data, converting it to csv files then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of the how or why a certain batch is being submitted by the IPP, it should just grab it, convert it and pass it along to the ODM.
182 182
183     - This design would therefore mean simplifying a major PSPS component, the {{{DXLayer}}}, but rather than waste the code already written, it could be taken and used within {{{ippToPsps}}}, for example the ODM polling scripts. We would simply be shifting responsibility over from PSPS to IPP. Over parts could be dropped completely, for example, since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the {{{DXLayer}}}, which currently has its own local database for this information.
    183 + This design would therefore mean simplifying a major PSPS component, the {{{DXLayer}}}, but rather than waste the code already written, it would be taken and used within {{{ippToPsps}}} (for example, the ODM polling scripts). We would simply be shifting responsibility over from PSPS to IPP. Over parts could be dropped completely. For example, since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the {{{DXLayer}}}, which currently has its own local database for this information.
184 184
185     - The question remains of what should be done with the copies of the data currently retained by the {{{DXLayer}}}? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send list of batches it is safe to delete through the datastore, or perhaps the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
    185 + The question remains of what should be done with the copies of the data currently retained by the {{{DXLayer}}}? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send a list of batches it is safe to delete through the datastore, or perhaps the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
186 186
187 187
188 188   === Advantages over previous design ===
189 189
190     - - no need for second datastore (not a big overhead, but additional systems administration in an already complicated system).
    190 + - no need for second datastore (not a big overhead, but it would require additional systems administration in an already complicated system).
191 191   - no need to define new XML standard that incorporates the whole array of recovery options
192 192   - no need for the {{{DXLayer}}} to poll the ODM
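
The feedback loop proposed in the v84 text (ippToPsps polls the ODM directly, then deletes or regenerates each batch) could be sketched roughly as below. This is only an illustration under stated assumptions: the page does not define an ODM query interface, so `OdmStatus`, `OdmReport`, `Action`, `decide_action`, `poll_once` and the batch names are all hypothetical, and the ODM is replaced by an in-memory stand-in.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class OdmStatus(Enum):
    """Hypothetical batch states as they might be reported by the ODM."""
    PENDING = "pending"   # not yet processed
    MERGED = "merged"     # successfully merged into the PSPS database
    FAILED = "failed"     # rejected; a reason should accompany it


class Action(Enum):
    """What the IPP side does with its copies of a batch."""
    WAIT = "wait"               # keep everything, ask again later
    DELETE = "delete"           # safe to remove data files and datastore copy
    REGENERATE = "regenerate"   # rebuild the batch and post it again


@dataclass(frozen=True)
class OdmReport:
    status: OdmStatus
    reason: Optional[str] = None  # the "why" for a failed batch


def decide_action(report: OdmReport) -> Action:
    """Map one ODM report onto the delete-or-regenerate decision."""
    if report.status is OdmStatus.MERGED:
        return Action.DELETE
    if report.status is OdmStatus.FAILED:
        return Action.REGENERATE
    return Action.WAIT


def poll_once(batch_ids: Iterable[str],
              query_odm: Callable[[str], OdmReport]) -> Dict[str, Action]:
    """Ask the ODM about each batch once, bypassing the DXLayer."""
    return {batch_id: decide_action(query_odm(batch_id))
            for batch_id in batch_ids}


# Stand-in for the real ODM: a dictionary of canned reports (assumption).
fake_odm = {
    "batch-001": OdmReport(OdmStatus.MERGED),
    "batch-002": OdmReport(OdmStatus.FAILED, "schema mismatch"),
    "batch-003": OdmReport(OdmStatus.PENDING),
}

actions = poll_once(fake_odm, fake_odm.__getitem__)
for batch_id, action in actions.items():
    print(batch_id, action.value)
```

Keeping the decision in a small pure function such as `decide_action` would leave room for the later, more comprehensive recovery system: responses to specific ODM errors could be added there without changing how polling is done.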
