We have two diff faults today, September 7, 2010. Let's fix them. The techniques described here can
be used for other stages as well.
***************************************
Case 1.
diff_id 76390 skycell.1194.140 fails because it can't read its input warp file on ipp045
(a node that had some troublesome days last month).
Confirm that the file is corrupt with
funpack -S /data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits > broken.fits
FITSIO status = 108: error reading from FITS file
Error reading data buffer from file:
/data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o
.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits
Error reading elements 1 thru 2147418112 from column 1 (ffgclb).
error reading compressed byte stream from binary table
Yep, it's garbage.
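If there are many files to check, the same test can be scripted. Here is a minimal Python sketch. It assumes funpack is on the PATH and writes its FITSIO error messages to stderr. That is consistent with the session above, where stdout went to broken.fits but the errors still reached the terminal. check_fits and fitsio_status are hypothetical helper names, not IPP tools.

```python
import re
import subprocess
from typing import Optional

# Matches e.g. "FITSIO status = 108: error reading from FITS file"
_STATUS_RE = re.compile(r"FITSIO status\s*=\s*(\d+)")


def fitsio_status(messages: str) -> Optional[int]:
    """Return the first FITSIO status code in funpack's messages, or None if there is none."""
    match = _STATUS_RE.search(messages)
    return int(match.group(1)) if match else None


def check_fits(path: str) -> Optional[int]:
    """Decompress `path` with funpack -S, discard the pixels, and return any FITSIO status."""
    result = subprocess.run(
        ["funpack", "-S", path],
        stdout=subprocess.DEVNULL,  # the decompressed image, which we don't need
        stderr=subprocess.PIPE,     # assumed: where the FITSIO error stack goes
        text=True,
    )
    return fitsio_status(result.stderr)
```

A None result only means no FITSIO error was reported. It says nothing about whether the pixel values are sensible.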
To regenerate the file, run the new script that I am going to finish and add to the tools directory
(or perhaps I will build the feature into warp_skycell.pl):
perl runwarpskycell.pl --warp_id 90720 --skycell_id skycell.1194.140 --no-update
We use --no-update because we don't want to update the database: the warpRun is already full and the
warpSkyfile is already in the database. We just want to fix the bits.
Unfortunately, runwarpskycell.pl fails because the input chips are not found.
This is SweetSpot data from early August being diffed against the stacks.
To find out why the data can't be found, use
chiptool -processedimfile -chip_id 121251 -class_id xy52
to list the state of one of the required inputs.
We find that the chipRun has been cleaned. The data was previously processed as warp-warp diffs; we've been
keeping the warps around to run against the stacks, which were made at the end of the month.
We can get the chip images back with
chiptool -setimfiletoupdate -chip_id 121251 -set_label update
For extra credit, we could use warptool -scmap -warp_id 90720 -skycell_id skycell.1194.140 to find
the subset of chips actually needed, but I didn't. We would then queue them for update one at a time by
adding -class_id xy?? to the command:
chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy52
chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy53
etc.
Since we'll clean the data soon, why bother?
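If we did go the extra-credit route, the per-chip commands are easy to generate. This is a minimal sketch. The list of class_ids is assumed to have been read by hand from the warptool -scmap output, whose format isn't shown here, so the sketch doesn't parse it. update_commands is a hypothetical helper. It only builds and prints the commands; it doesn't run them.

```python
import shlex
from typing import Iterable, List


def update_commands(chip_id: int, class_ids: Iterable[str],
                    label: str = "update") -> List[List[str]]:
    """One chiptool -setimfiletoupdate argument list per class_id (not executed)."""
    return [
        ["chiptool", "-setimfiletoupdate", "-chip_id", str(chip_id),
         "-set_label", label, "-class_id", class_id]
        for class_id in class_ids
    ]


if __name__ == "__main__":
    # Hypothetical subset; in practice this would come from warptool -scmap.
    for cmd in update_commands(121251, ["xy52", "xy53"]):
        print(shlex.join(cmd))
```

Keeping each command as an argument list means it can be handed straight to subprocess.run later without any shell quoting worries.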
Wait for the updates to complete (5 minutes or so). Check with
chiptool -listrun -chip_id 121251
to see that the state is full.
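Rather than checking by hand every few minutes, the wait can be a polling loop. This is a minimal sketch. It assumes a caller-supplied get_state function (hypothetical) that returns the current state string, for example by running chiptool -listrun -chip_id 121251 and pulling out the state field. The 15-minute timeout and 30-second interval are arbitrary choices.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], want: str = "full",
                   timeout: float = 900.0, interval: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_state() until it returns `want`; give up after `timeout` seconds.

    Returns True if the wanted state was seen, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == want:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The deadline keeps a stuck update from hanging the script forever. On False, go look at why the chips didn't finish rather than retrying blindly.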
Then run the runwarpupdate.pl script. Success!
Now revert the failed diff:
difftool -revertdiffskyfile -diff_id 76390
A few minutes later, the diff is complete.
Now we can go clean up the chips:
chiptool -updaterun -set_state goto_cleaned -set_label goto_cleaned -chip_id 121251
*****************************************
Case 2.
Assertion failed in function psThreadLauncher at psThread.c:244. Error stack:
Unable to perform ppSub: 4 at /home/panstarrs/ipp/psconfig//ipp-20100823.lin64/bin/diff_skycell.pl line 400.
Running [/home/panstarrs/ipp/psconfig/ipp-20100823.lin64/bin/difftool -diff_id 76373 -skycell_id skycell.1460.009 -fault 4 -adddiffskyfile -dtime_script 62.9999801516533 -hostname ipp053 -path_base neb://ipp053.0/gpc1/SweetSpot.nightlyscience/2010/09/07/RINGS.V0/skycell.1460.009/RINGS.V0.skycell.1460.009.dif.76373 -dbname gpc1]...
This is a recurring bug at the diff stage; ticket #1422 covers it.
I have had success fixing these by running the command by hand without threads.
They also sometimes succeed after reverting, so let's try that:
difftool -revertdiffskyfile -diff_id 76373
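When faults pile up, it helps to build the revert straight from the logged command instead of retyping the ids. This is a minimal sketch that assumes the log line keeps the "Running [...]..." shape shown above. parse_flags and revert_command are hypothetical helpers. The flag parser is naive: it pairs each -flag with the following token unless that token also starts with "-". A negative value would therefore be missed, and bare switches such as -adddiffskyfile are skipped.

```python
import shlex
from typing import Dict, List


def parse_flags(command_line: str) -> Dict[str, str]:
    """Map each '-flag value' pair in a command line to {flag: value}; bare switches are skipped."""
    tokens = shlex.split(command_line)
    flags: Dict[str, str] = {}
    for i, tok in enumerate(tokens[:-1]):
        nxt = tokens[i + 1]
        if tok.startswith("-") and not nxt.startswith("-"):
            flags[tok.lstrip("-")] = nxt
    return flags


def revert_command(log_line: str) -> List[str]:
    """Build the difftool revert command for a 'Running [...]...' log line."""
    inner = log_line[log_line.index("[") + 1:log_line.rindex("]")]
    flags = parse_flags(inner)
    return ["difftool", "-revertdiffskyfile", "-diff_id", flags["diff_id"]]
```

The same parsed flags also give skycell_id and fault, which is handy for a follow-up -diffskyfile check.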
Wait a few minutes, then check:
difftool -diffskyfile -diff_id 76373 -skycell_id skycell.1460.009 | grep state
data_state STR full
state STR full
It worked. It would be nice to fix this bug.
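The grep above can be turned into a check that a script can act on. This is a minimal sketch that assumes every line of interest has the three-column name TYPE value layout shown above (data_state STR full). Lines with fewer than three fields are ignored. parse_metadata and diff_is_full are hypothetical helper names.

```python
from typing import Dict


def parse_metadata(text: str) -> Dict[str, str]:
    """Parse 'name TYPE value' lines into {name: value}, skipping lines that don't fit."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 2)  # name, type, and the rest of the line as the value
        if len(parts) == 3:
            name, _type, value = parts
            fields[name] = value.strip()
    return fields


def diff_is_full(text: str) -> bool:
    """True when both state and data_state are 'full', as in the output above."""
    fields = parse_metadata(text)
    return fields.get("state") == "full" and fields.get("data_state") == "full"
```

Checking both fields, not just state, matches what the grep showed for a successful diff.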
