
Changes between Version 63 and Version 64 of ippToPsps


Timestamp:
May 18, 2010, 12:54:03 PM (16 years ago)
Author:
rhenders
Comment:

--

  • ippToPsps

    v63 v64  
    118118||calColorErr||FLOAT|| error in calibrating color (unit = mag)|| for future ||
    119119
     120= Recovery system design =
    120121
     122Currently, the IPP to PSPS interface is a 'one-way' system. Batches are created by the IPP and posted on an IPP instance of the datastore. These batches are collected by the DXLayer on the PSPS side. As a basis for a future recovery system, the IPP requires some feedback from PSPS as to which batches have succeeded, which have failed, and why. With this information, data can be regenerated accordingly.
     123
     124Currently, for a given batch, multiple copies exist throughout the pipeline:
     125
     126- a copy exists locally on disk after generation by the ippToPsps program
     127- a copy also exists on the datastore after publication by ippToPsps
     128- the DXLayer retains a copy after it has sent on a csv version to the ODM
     129- the DXLayer also keeps a copy of the (larger) csv files
     130
     131
     132With such large data volumes, it is neither practical nor necessary for so many copies to exist. Therefore, we need to quickly implement the basics of the feedback loop described above so that the IPP can learn whether a given batch has been successfully merged into the PSPS database. This will enable it to safely delete the data files and remove the copy from the datastore.
     133
     134== Previous design ==
     135
     136Previously, Conrad and I had discussed a design in which a second datastore instance was used, this time on the PSPS cluster. The DXLayer would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore. The IPP, polling this, would then have a list of batches it knows are safe to discard. Simultaneously, the DXLayer could also delete its redundant data.
     137
     138The update placed on the PSPS datastore could take the form of an XML file. At first this would simply list those files it is safe to delete, but it could evolve into a more complex recovery report, e.g. which batches failed, why, and what is required of the IPP.
     139
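The XML update described above could be consumed along these lines. This is only a sketch: no schema is defined on this page, so the element and attribute names (`batchReport`, `batch`, `id`, `status`, `reason`), the batch IDs and the failure reason are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical report layout; the real format was never specified.
SAMPLE_REPORT = """\
<batchReport>
  <batch id="B0001" status="merged"/>
  <batch id="B0002" status="failed" reason="checksum mismatch"/>
  <batch id="B0003" status="merged"/>
</batchReport>
"""


def parse_report(xml_text):
    """Split a report into batches safe to delete and batches that failed."""
    root = ET.fromstring(xml_text)
    safe_to_delete = []
    failures = {}
    for batch in root.findall("batch"):
        batch_id = batch.get("id")
        if batch.get("status") == "merged":
            safe_to_delete.append(batch_id)
        else:
            # Keep the reason so a later recovery step can act on it.
            failures[batch_id] = batch.get("reason", "unknown")
    return safe_to_delete, failures


safe, failed = parse_report(SAMPLE_REPORT)
```

In its simplest form the report would carry only the merged batches; the `reason` attribute illustrates how it might grow into a recovery report.
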
     140== New design ==
     141
     142Instead of creating a new datastore instance within PSPS and using the DXLayer as the communication layer between the ODM and the IPP, we propose that the DXLayer form no part of the feedback system. It should be simplified so that it only polls the IPP datastore for new data, converts it to csv files and sends it on to the ODM. To complete the circle, the ippToPsps code will poll the ODM directly, bypassing the DXLayer altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, ippToPsps can be coded to respond intelligently to the myriad errors that may occur within the ODM. The DXLayer need know nothing of how or why a certain batch is being submitted; it should just convert it and pass it along.
     143
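One pass of the direct polling described above might look like the following sketch. The ODM query interface is not described on this page, so `query_odm`, `delete_local` and `unpublish` are hypothetical callables supplied by the caller, and the three `Status` values are an assumed simplification of whatever the ODM actually reports.

```python
from enum import Enum


class Status(Enum):
    # Assumed states; the ODM's real status vocabulary may differ.
    PENDING = "pending"
    MERGED = "merged"
    FAILED = "failed"


def poll_once(outstanding, query_odm, delete_local, unpublish):
    """Check each outstanding batch once and clean up those that merged.

    Returns (still_outstanding, failed) so the caller can poll again later
    and decide how to recover the failures.
    """
    still_outstanding = []
    failed = []
    for batch_id in outstanding:
        status = query_odm(batch_id)
        if status is Status.MERGED:
            delete_local(batch_id)   # remove the copy on local disk
            unpublish(batch_id)      # remove the copy from the IPP datastore
        elif status is Status.FAILED:
            failed.append(batch_id)  # keep the data for regeneration
        else:
            still_outstanding.append(batch_id)
    return still_outstanding, failed


# Usage with an in-memory stand-in for the ODM.
fake_odm = {
    "B0001": Status.MERGED,
    "B0002": Status.PENDING,
    "B0003": Status.FAILED,
}
deleted, unpublished = [], []
still, failed = poll_once(
    ["B0001", "B0002", "B0003"], fake_odm.get, deleted.append, unpublished.append
)
```

Passing the ODM query and the two cleanup actions in as callables keeps the decision logic testable without a live ODM or datastore.
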
     144Since ippToPsps will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the DXLayer, which currently has its own local database for this information.
     145
     146Rather than waste the code already written for the DXLayer, parts of it, for example the ODM polling scripts, can be copied over and used within ippToPsps.
     147
     148The question remains of what to do with the copies of the data currently retained by the DXLayer. These could be deleted automatically after a defined amount of time, or the IPP could send a special 'batch' that is simply a list of batches it is safe to delete, or perhaps the DXLayer should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold copies.
     149
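The first option, deleting retained copies after a defined amount of time, can be sketched as below. The seven-day limit, the batch IDs and the `received_at` mapping are illustrative assumptions, not values taken from the DXLayer.

```python
from datetime import datetime, timedelta


def expired_batches(received_at, now, max_age):
    """Return IDs of batches held for longer than max_age, oldest first."""
    old = [(when, batch_id) for batch_id, when in received_at.items()
           if now - when > max_age]
    return [batch_id for when, batch_id in sorted(old)]


# Hypothetical record of when each batch copy was received.
now = datetime(2010, 5, 18, 12, 0, 0)
received_at = {
    "B0001": datetime(2010, 5, 1, 9, 0, 0),
    "B0002": datetime(2010, 5, 17, 9, 0, 0),
    "B0003": datetime(2010, 4, 20, 9, 0, 0),
}
to_delete = expired_batches(received_at, now, timedelta(days=7))
```

A purely time-based rule needs no message from the IPP, but it can delete a copy before the batch has merged; the explicit list of safe batches avoids that at the cost of an extra batch type.
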
     150=== Advantages over previous design ===
     151
     152- no need for a second datastore. Not a big overhead, but it is additional systems administration in an already complicated system.
     153- no need to define a new XML standard that incorporates the whole array of recovery options.
     154- no need for the DXLayer to keep data at all
     155- no need for the DXLayer to poll the ODM
     156- no need for the DXLayer to have a database to log the batches (already done on the IPP side)
    121157
    122158= Links =