The text in the box at the bottom of this page describes in some detail how to fix faults caused by damaged files.

There are 4 programs in the tools directory that are of interest:
 * runchipimfile.pl (not yet written)
 * runcameraexp.pl
 * runwarpskyfile.pl
 * rundiffskyfile.pl

It would be useful if someone wrote the POD documentation for these files.

{{{
We have 2 diff faults today, September 7, 2010. Let's fix them.
The techniques described here can be used for other stages.

***************************************
Case 1.

diff_id 76390 skycell.1194.140 fails because it can't read its input warp file on ipp045.
(This node had some troublesome days last month.)

Confirm that the file is corrupt with

funpack -S /data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits > broken.fits

FITSIO status = 108: error reading from FITS file
Error reading data buffer from file:
/data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits
Error reading elements 1 thru 2147418112 from column 1 (ffgclb).
error reading compressed byte stream from binary table

Yep, it's garbage. To regenerate the file, execute the script in the tools directory.

perl $path_to_ipp_src/tools/runwarpskycell.pl --warp_id 90720 --skycell_id skycell.1194.140 --redirect-output

Unfortunately runwarpskycell.pl fails because the input chips are not found.
This is SweetSpot data from early August being diffed against the stacks.
To find out why the data can't be found, use

chiptool -processedimfile -chip_id 121251 -class_id xy52

to list the state of one of the required inputs.
We find the chipRun has been cleaned. (The data was previously processed as warp-warp diffs.
We've been keeping the warps around to run against the stacks, which were made at the end of the month.)
We can get the chip images back with

chiptool -setimfiletoupdate -chip_id 121251 -set_label update

For extra credit we could use

warptool -scmap -warp_id 90720 -skycell_id skycell.1194.140

to find the subset of chips to process, but I didn't. We'd queue them for update one at a time by adding -class_id xy?? to the command:

chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy52
chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy53
etc.

Since we'll clean the data soon, why bother?

Wait for the updates to complete, 5 minutes or so. Check with

chiptool -listrun -chip_id 121251

to see that the state is full. Then run the runwarpupdate.pl script.

Success.

Now revert the diff failure:

difftool -revertdiffskyfile -diff_id 76390

A few minutes later the diff is complete.

Now we can go clean up the chips:

chiptool -updaterun -set_state goto_cleaned -set_label goto_cleaned -chip_id 121251

*****************************************
Case 2.

Assertion failed in function psThreadLauncher at psThread.c:244.
Error stack:
Unable to perform ppSub: 4 at /home/panstarrs/ipp/psconfig//ipp-20100823.lin64/bin/diff_skycell.pl line 400.
Running [/home/panstarrs/ipp/psconfig/ipp-20100823.lin64/bin/difftool -diff_id 76373 -skycell_id skycell.1460.009 -fault 4 -adddiffskyfile -dtime_script 62.9999801516533 -hostname ipp053 -path_base neb://ipp053.0/gpc1/SweetSpot.nightlyscience/2010/09/07/RINGS.V0/skycell.1460.009/RINGS.V0.skycell.1460.009.dif.76373 -dbname gpc1]...

This is a recurring bug at the diff stage. Ticket #1422 covers it.
I have had success fixing these by running the command by hand without threads.
They also sometimes succeed after reverting.
Let's try that:

difftool -revertdiffskyfile -diff_id 76373

Wait a few minutes.

difftool -diffskyfile -diff_id 76373 -skycell_id skycell.1460.009 | grep state
data_state STR full
state STR full

It worked. It would be nice to fix this bug.

If the skycell continues to fault, the script rundiffskyfile.pl may be used to fix the problem.
This script requires that the component have a fault in the database.

First turn off diff in stdscience, because we are going to revert the fault.

perl $path_to_ipp_src/tools/rundiffskyfile.pl --diff_id 76373 --skycell_id skycell.1460.009 --redirect-output --update

Once the script finishes, echo $? to check the return status. If the value is zero, you're done.
Don't forget to turn diff processing back on in stdscience.

--update tells the script to revert the fault, run the program, and update the database.
If this parameter is not supplied, the images will be made but the database will not be updated.
}}}
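
The manual checks in the notes above could be wrapped in a few small shell helpers. This is only a rough sketch: the function names (fits_is_readable, chip_update_commands, skyfile_is_full) and the FUNPACK override variable are made up for illustration and are not part of the IPP tools. It also assumes that funpack exits with a non-zero status when it hits a read error, and that difftool prints the state lines in the form shown in the box.

```shell
#!/usr/bin/env bash
# Sketch only. Function names are hypothetical; they wrap commands quoted in the notes above.

# Succeeds if the compressed FITS file can be fully decompressed.
# Assumption: funpack exits non-zero on a read error such as FITSIO status 108.
# FUNPACK may be set to another program for dry runs.
fits_is_readable() {
    "${FUNPACK:-funpack}" -S "$1" > /dev/null 2>&1
}

# Prints (does not run) one chiptool update command per class_id,
# so the list can be reviewed before anything is queued.
chip_update_commands() {
    local chip_id="$1"
    shift
    local class_id
    for class_id in "$@"; do
        printf 'chiptool -setimfiletoupdate -chip_id %s -set_label update -class_id %s\n' \
            "$chip_id" "$class_id"
    done
}

# Reads the output of "difftool -diffskyfile ... | grep state" on stdin and
# succeeds only if both data_state and state are reported as full.
skyfile_is_full() {
    awk '$1 == "data_state" && $3 == "full" { d = 1 }
         $1 == "state"      && $3 == "full" { s = 1 }
         END { exit !(d && s) }'
}
```

For example, chip_update_commands 121251 xy52 xy53 prints the two chiptool lines from Case 1, and the difftool check from Case 2 can be piped into skyfile_is_full so that a script can wait until the skycell is done instead of checking by eye.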