The text in the box at the bottom of this page gives a somewhat detailed description of how to fix faults due to damaged files. There are 4 programs in the tools directory that are of interest:
- runchipimfile.pl (not yet written)
- runcameraexp.pl
- runwarpskyfile.pl
- rundiffskyfile.pl
It would be useful if someone would write the pod documentation for these files.
We have 2 diff faults today, September 7, 2010. Let's fix them. The techniques described here can
be used for other stages.
***************************************
Case 1.
diff_id 76390 skycell.1194.140 fails because it can't read the input warp file on ipp045
(This was a node that had some troublesome days last month)
Confirm that the file is corrupt with
funpack -S /data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits > broken.fits
FITSIO status = 108: error reading from FITS file
Error reading data buffer from file:
/data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o
.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits
Error reading elements 1 thru 2147418112 from column 1 (ffgclb).
error reading compressed byte stream from binary table
Yep it's garbage.
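The check above can be wrapped in a small helper when several files need testing in a row. This is only a sketch, not part of the IPP tools: the names fits_is_readable, check_fits_files and FITS_CHECK_CMD are made up for illustration, and it assumes funpack exits with a non-zero status when it hits a read error like the one shown.

```shell
# Hypothetical helpers, not part of IPP.
# fits_is_readable returns 0 if the file decompresses cleanly, non-zero otherwise.
# FITS_CHECK_CMD is an assumed override hook; it defaults to the
# "funpack -S" call used above (decompress to stdout).
fits_is_readable() {
    # Word splitting of the command string is intentional here.
    ${FITS_CHECK_CMD:-funpack -S} "$1" > /dev/null 2>&1
}

# Print a one-line verdict for each file given as an argument.
check_fits_files() {
    for f in "$@"; do
        if fits_is_readable "$f"; then
            echo "ok      $f"
        else
            echo "corrupt $f"
        fi
    done
}
```

Usage: check_fits_files first.fits second.fits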
To regenerate the file execute the script in the tools directory.
perl $path_to_ipp_src/tools/runwarpskycell.pl --warp_id 90720 --skycell_id skycell.1194.140 --redirect-output
Unfortunately runwarpskycell.pl fails because the input chips are not found.
This is SweetSpot data from early August being diffed against the stacks.
To find out why the data can't be found, use
chiptool -processedimfile -chip_id 121251 -class_id xy52
to list the state of one of the required inputs.
We find that the chipRun has been cleaned. (The data was previously processed as warp-warp diffs. We've been
keeping the warps around to run against the stacks, which were made at the end of the month.)
We can get the chip images back with
chiptool -setimfiletoupdate -chip_id 121251 -set_label update
For extra credit we could use warptool -scmap -warp_id 90720 -skycell_id skycell.1194.140 to find
the subset of chips to process, but I didn't. We would then queue only those chips for update, one at a time, by
adding -class_id xy?? to the command:
chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy52
chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy53
etc.
Since we'll clean the data soon, why bother?
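If the per-chip route were taken, the repeated commands could be generated rather than typed. The sketch below only prints the commands so they can be reviewed first. The function name queue_chip_updates is made up for illustration, and the chiptool flags are copied from the commands above.

```shell
# Hypothetical helper, not part of IPP. Prints one chiptool command per
# class_id so the list can be reviewed before anything is queued.
# Usage: queue_chip_updates CHIP_ID CLASS_ID...
queue_chip_updates() {
    chip_id=$1
    shift
    for class_id in "$@"; do
        echo "chiptool -setimfiletoupdate -chip_id $chip_id -set_label update -class_id $class_id"
    done
}
```

Usage: queue_chip_updates 121251 xy52 xy53 prints the two commands shown above. Piping the output to sh would run them.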
Wait for the updates to complete (5 minutes or so). Check with
chiptool -listrun -chip_id 121251
to see that the state is full.
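The waiting step can be scripted with a simple retry loop. This is a sketch under assumptions: wait_until and chip_run_is_full are hypothetical names, and the check assumes the chiptool listing contains a line ending in "full" once the run is ready, which may not match the real output format.

```shell
# Hypothetical helper, not part of IPP.
# Usage: wait_until MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping between attempts.
# Returns 0 on success, 1 if all MAX_TRIES attempts fail.
wait_until() {
    max_tries=$1
    pause=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$pause"
        fi
        try=$((try + 1))
    done
    return 1
}

# Assumed check: succeeds when the listing has a line ending in "full".
# The real chiptool output format may differ.
chip_run_is_full() {
    chiptool -listrun -chip_id "$1" | grep -q 'full$'
}
```

Usage: wait_until 10 60 chip_run_is_full 121251 checks once a minute for up to ten attempts.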
Then run the runwarpupdate.pl script. Success.
Now revert the diff failure:
difftool -revertdiffskyfile -diff_id 76390
A few minutes later the diff is complete.
Now we can go clean up the chips:
chiptool -updaterun -set_state goto_cleaned -set_label goto_cleaned -chip_id 121251
*****************************************
Case 2.
Assertion failed in function psThreadLauncher at psThread.c:244. Error stack:
Unable to perform ppSub: 4 at /home/panstarrs/ipp/psconfig//ipp-20100823.lin64/bin/diff_skycell.pl line 400.
Running [/home/panstarrs/ipp/psconfig/ipp-20100823.lin64/bin/difftool -diff_id 76373 -skycell_id skycell.1460.009 -fault 4 -adddiffskyfile -dtime_script 62.9999801516533 -hostname ipp053 -path_base neb://ipp053.0/gpc1/SweetSpot.nightlyscience/2010/09/07/RINGS.V0/skycell.1460.009/RINGS.V0.skycell.1460.009.dif.76373 -dbname gpc1]...
This is a recurring bug at the diff stage. Ticket #1422 covers it.
I have had success fixing these by running the command by hand without threads.
They also sometimes succeed after reverting. Let's try that:
difftool -revertdiffskyfile -diff_id 76373
Wait a few minutes, then check the state:
difftool -diffskyfile -diff_id 76373 -skycell_id skycell.1460.009 | grep state
data_state STR full
state STR full
It worked. It would be nice to fix this bug.
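Note that grep state matches both the data_state and the state lines. When a script needs just one of them, the field can be picked out by exact name. A minimal sketch, assuming the "name TYPE value" layout shown above. The function name field_value is made up for illustration.

```shell
# Hypothetical helper, not part of IPP. Reads "name TYPE value" lines on
# stdin (the layout shown above) and prints the value of the named field.
# Matching on the exact field name keeps "state" from also matching
# "data_state", which a plain grep would do.
field_value() {
    awk -v name="$1" '$1 == name { print $3 }'
}
```

Usage: difftool -diffskyfile -diff_id 76373 -skycell_id skycell.1460.009 | field_value state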
If the skycell continues to fault, the script rundiffskyfile.pl may be used to fix the problem.
This script requires that the component have a fault in the database.
First turn off diff in stdscience, because we are going to revert the fault.
perl $path_to_ipp_src/tools/rundiffskyfile.pl --diff_id 76373 --skycell_id skycell.1460.009 --redirect-output --update
Once the script finishes, run echo $? to check the return status. If the value is zero, you're done.
Don't forget to turn diff processing back on in stdscience.
--update tells the script to revert the fault, run the program, and update the database. If this parameter is not supplied,
the images will be made but the database will not be updated.
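The exit status check can be folded into a small wrapper so it is not forgotten. This is only a sketch. The name run_and_report is made up for illustration, and it relies on the script returning zero on success as described above.

```shell
# Hypothetical wrapper, not part of IPP. Runs the given command, prints its
# exit status, and returns that status to the caller.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "done: exit status 0"
    else
        echo "failed: exit status $status"
    fi
    return "$status"
}
```

Usage: run_and_report perl $path_to_ipp_src/tools/rundiffskyfile.pl --diff_id 76373 --skycell_id skycell.1460.009 --redirect-output --update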
