Changes between Version 67 and Version 68 of ippToPsps
Timestamp: May 18, 2010, 1:15:23 PM
ippToPsps
v67 | v68 | line
4 | 4 | [[PageOutline]]
5 | 5 |
6 |  | {{{ippToPsps}}} is the interface between IPP and PSPS. In short, {{{ippToPsps}}} creates FITS files from IPP data, then publishes them to a datastore in the form of ''batches''. On the PSPS side, the DXLayer polls the datastore, collects batches when they become available, then converts the contents to {{{csv}}} files before sending them on to SQL Server loader software, which ''merges'' them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which {{{ippToPsps}}} will need to respond.
 | 6 | {{{ippToPsps}}} is the interface between IPP and PSPS. In short, {{{ippToPsps}}} creates FITS files from IPP data, then publishes them to a datastore in the form of ''batches''. On the PSPS side, the {{{DXLayer}}} polls the datastore, collects batches when they become available, then converts the contents to {{{csv}}} files before sending them on to SQL Server loader software, which ''merges'' them into the PSPS database. Ultimately there will be feedback from PSPS regarding errors in the received data, to which {{{ippToPsps}}} will need to respond.
7 | 7 |
8 |  | It is intended that the binary tables in the FITS files generated by {{{ippToPsps}}} match the PSPS database schemas perfectly, the consequence being that any alterations to the PSPS database schema will only affect {{{ippToPsps}}} code, and not the DXLayer. A certain amount of data validation will be performed by {{{ippToPsps}}} before publication, more validation occurring at the loading and merge stages on the PSPS side.
 | 8 | It is intended that the binary tables in the FITS files generated by {{{ippToPsps}}} match the PSPS database schemas perfectly, the consequence being that any alterations to the PSPS database schema will only affect {{{ippToPsps}}} code, and not the {{{DXLayer}}}. A certain amount of data validation will be performed by {{{ippToPsps}}} before publication, more validation occurring at the loading and merge stages on the PSPS side.
9 | 9 |
10 | 10 | The outputs of {{{ippToPsps}}} are referred to as 'batches', and are detailed below.
… | … |
38 | 38 | == ippToPsps ==
39 | 39 |
40 |  | {{{ippToPsps}}} is a C program within the IPP build. When given the correct arguments it will generate a single FITS for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label etc). An instance of {{{ippToPsps}}} is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files up as a ''batch'', then publishes it to the datastore, ready for collection by the DXLayer.
 | 40 | {{{ippToPsps}}} is a C program within the IPP build. When given the correct arguments it will generate a single FITS for the specified product (above). The program is run from a Perl script, which itself generates a list of exposure IDs based on arguments provided by the user (label etc). An instance of {{{ippToPsps}}} is run per exposure ID. Upon completion, the calling script bundles the resultant FITS files up as a ''batch'', then publishes it to the datastore, ready for collection by the {{{DXLayer}}}.
41 | 41 |
42 | 42 | == DXLayer ==
43 | 43 |
44 |  | The DXLayer polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format, suitable for ingest by the ODM.
 | 44 | The {{{DXLayer}}} polls the datastore waiting for new batches. Upon receipt of a new batch, the FITS files are converted to a csv format, suitable for ingest by the ODM.
45 | 45 |
46 | 46 | == ODM Loader ==
… | … |
126 | 126 | - a copy exists on the IPP cluster after generation by ippToPsps program
127 | 127 | - a copy exists on the IPP datastore after publication by ippToPsps
128 |  | - the DXLayer retains a copy after it has sent the csv version to the ODM
129 |  | - the DXLayer also keeps a copy of these (larger) csv files
 | 128 | - the {{{DXLayer}}} retains a copy after it has sent the csv version to the ODM
 | 129 | - the {{{DXLayer}}} also keeps a copy of these (larger) csv files
130 | 130 |
131 | 131 | We therefore need to quickly implement the basic framework of a feedback loop such that the IPP can quickly learn if a given batch has been successfully merged into the PSPS database or not. This will enable it to safely delete the data files and remove the copy from the datastore.
… | … |
133 | 133 | == Previous design ==
134 | 134 |
135 |  | Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The DXLayer would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. Polling this, {{{ippToPsps}}} could acquire a list of batches it knows are safe to be discarded. Simultaneously, the DXLayer can also delete the same redundant data.
 | 135 | Previously, Conrad and I had discussed a design whereby a second datastore instance was utilized, this time on the PSPS cluster. The {{{DXLayer}}} would act as the 'middle-man', polling the ODM for updates on loading progress, then posting the results on the PSPS datastore for the IPP. Polling this, {{{ippToPsps}}} could acquire a list of batches it knows are safe to be discarded. Simultaneously, the {{{DXLayer}}} could delete its copies of the same redundant data.
136 | 136 |
137 |  | The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail those files it is safe to delete, but could evolve into a more complex recovery report, i.e. which batches failed, and why and what is required to be done by the IPP.
 | 137 | The update placed on the PSPS datastore could take the form of an XML file. At first this would simply detail those files it is safe to delete, but could evolve into a more complex recovery report, i.e. which batches failed, and what is required to be done by the IPP.
138 | 138 |
139 | 139 | == New design ==
140 | 140 |
141 |  | Instead of creating a new datastore instance within PSPS and using the DXLayer as communication layer between the ODM and the IPP, we propose that the DXLayer forms no part of the feedback system. It should be simplified such that it only enables loaing, i.e. polling the IPP datastore for new data, converting it to csv files then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of the how or why a certain batch is being submitted by the IPP, it should just grab it, convert it and pass it along to the ODM.
 | 141 | Instead of creating a new datastore instance within PSPS and using the {{{DXLayer}}} as communication layer between the ODM and the IPP, we propose that the {{{DXLayer}}} forms no part of the feedback system. It should be simplified such that it only facilitates loading, i.e. polling the IPP datastore for new data, converting it to csv files then sending these on to the ODM. Instead, to complete the circle, the {{{ippToPsps}}} code will poll the ODM directly, bypassing the {{{DXLayer}}} altogether. The IPP then knows which batches have merged successfully and can delete them accordingly. This also forms the basis of a full recovery system as, at a later date, {{{ippToPsps}}} can be coded to respond intelligently to the myriad of errors that may occur within the ODM. The {{{DXLayer}}} need know nothing of the how or why a certain batch is being submitted by the IPP, it should just grab it, convert it and pass it along to the ODM.
142 | 142 |
143 | 143 | Since {{{ippToPsps}}} will (soon) keep a record of all the jobs and corresponding exposure IDs in the IPP database, it is unnecessary for this information to be duplicated by the {{{DXLayer}}}, which currently has its own local database for this information.
144 | 144 |
145 |  | Rather than waste the code already written for the DXLayer, it can be used within {{{ippToPsps}}}, for example, the ODM polling scripts.
 | 145 | Rather than waste the code already written for the {{{DXLayer}}}, it can be used within {{{ippToPsps}}}, for example, the ODM polling scripts.
146 | 146 |
147 |  | The question remains of what should be done with the copies of the data currently retained by the DXLayer? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send list of batches it is safe to delete through the datastore, or perhaps the DXLayer should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
 | 147 | The question remains of what should be done with the copies of the data currently retained by the {{{DXLayer}}}? The options are that it can either be deleted automatically after a defined amount of time, or the IPP can send list of batches it is safe to delete through the datastore, or perhaps the {{{DXLayer}}} should not retain files at all. Since it can quickly and easily acquire data from the IPP datastore anyway, it is probably unnecessary for it to hold any copies.
148 | 148 |
149 | 149 | === Advantages over previous design ===
