| | 122 | Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by the IPP and posted on an IPP instance of the datastore. These batches are collected by the DXLayer on the PSPS side. As a basis for a future recovery system, the IPP requires some feedback from PSPS as to which batches have succeeded, which have failed, and the reason for each failure. With this information, data can be regenerated accordingly. |
| | 123 | |
| | 124 | Currently, for a given batch, multiple copies exist throughout the pipeline: |
| | 125 | |
| | 126 | - a copy exists locally on disk after generation by the ippToPsps program |
| | 127 | - a copy also exists on the datastore after publication by ippToPsps |
| | 128 | - the DXLayer retains a copy after it has sent on a csv version to the ODM |
| | 129 | - the DXLayer also keeps a copy of the (larger) csv files |
| | 130 | |
| | 131 | |
| | 132 | With such large data volumes, it is neither practical nor necessary for so many copies to exist. Therefore, we need to quickly implement the basics of the feedback loop described above so that the IPP can learn whether a given batch has been successfully merged into the PSPS database. This will enable it to safely delete the local data files and remove the copy from the datastore. |
| | 133 | |
| | 134 | == Previous design == |
| | 135 | |
| | 136 | Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The DXLayer would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore. The IPP, polling this datastore, would then have a list of batches it knows are safe to discard. At the same time, the DXLayer could also delete its redundant data. |
| | 137 | |
| | 138 | The update placed on the PSPS datastore could take the form of an XML file. At first this would simply list those files it is safe to delete, but it could evolve into a more complex recovery report, i.e. which batches failed, why they failed, and what is required of the IPP. |
| | 139 | |
| | 140 | == New design == |
| | 141 | |
| | 142 | Instead of creating a new datastore instance within PSPS and using the DXLayer as a communication layer between the ODM and the IPP, we propose that the DXLayer forms no part of the feedback system. It should be simplified such that it only polls the IPP datastore for new data, converts it to csv files and sends it on to the ODM. To complete the circle, the ippToPsps code will poll the ODM directly, bypassing the DXLayer altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system since, at a later date, ippToPsps can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The DXLayer need know nothing of how or why a certain batch is being submitted; it should just convert it and pass it along. |
| | 143 | |
| | 144 | Since ippToPsps will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the DXLayer, which currently has its own local database for this information. |
| | 145 | |
| | 146 | Rather than waste the code already written for the DXLayer, parts of it (for example, the ODM polling scripts) can be copied over and used within ippToPsps. |
| | 147 | |
| | 148 | The question remains of what to do with the copies of the data currently retained by the DXLayer. There are three options: the copies can be deleted automatically after a defined amount of time; the IPP can send a special 'batch' which is simply a list of batches it is safe to delete; or the DXLayer can stop retaining files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold copies. |
| | 149 | |
| | 150 | === Advantages over previous design === |
| | 151 | |
| | 152 | - no need for second datastore. Not a big overhead, but additional systems admin in an already complicated system. |
| | 153 | - no need to define new XML standard that incorporates the whole array of recovery options. |
| | 154 | - no need for the DXLayer to keep data at all |
| | 155 | - no need for the DXLayer to poll the ODM |
| | 156 | - no need for the DXLayer to have a database to log the batches (already done on the IPP side) |
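
The feedback loop proposed in the new design could be sketched roughly as follows. This is a minimal illustration only, not the actual ippToPsps code: the names `BatchStatus`, `CleanupReport`, `process_feedback`, and the three callables (`query_odm`, `delete_local`, `remove_from_datastore`) are all hypothetical, and the real ODM status interface is not described here. The sketch assumes the ODM can report, per batch, whether it has merged, failed (with a reason), or is still pending.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Iterable, List, Tuple


class BatchStatus(Enum):
    """Hypothetical load states the ODM might report for a batch."""
    PENDING = "pending"
    MERGED = "merged"
    FAILED = "failed"


@dataclass
class CleanupReport:
    """Outcome of one polling pass over the outstanding batches."""
    deleted: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # batch id -> reason
    pending: List[str] = field(default_factory=list)


def process_feedback(
    batch_ids: Iterable[str],
    query_odm: Callable[[str], Tuple[BatchStatus, str]],
    delete_local: Callable[[str], None],
    remove_from_datastore: Callable[[str], None],
) -> CleanupReport:
    """Poll the ODM once per batch and clean up batches that have merged.

    All three callables are assumptions standing in for the real ODM query
    and the real IPP-side deletion routines.
    """
    report = CleanupReport()
    for batch_id in batch_ids:
        status, reason = query_odm(batch_id)
        if status is BatchStatus.MERGED:
            # Safe to discard: the data is in the PSPS database.
            delete_local(batch_id)
            remove_from_datastore(batch_id)
            report.deleted.append(batch_id)
        elif status is BatchStatus.FAILED:
            # Keep the data and record the reason for later recovery.
            report.failed[batch_id] = reason
        else:
            # Not yet loaded; check again on the next polling pass.
            report.pending.append(batch_id)
    return report
```

Passing the ODM query and the deletion routines in as callables keeps the DXLayer out of the loop entirely, and it leaves room for the failed-batch reasons to drive a fuller recovery system later.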