
Version 9 (modified by Serge CHASTEL, 15 years ago)


How to fix Diff Warp Chip Cam

The text in the box at the bottom of this page gives a somewhat detailed description of how to fix faults due to damaged files. Four programs in the tools directory are of interest:

  • runchipimfile.pl (not yet written)
  • runcameraexp.pl
  • runwarpskycell.pl
  • rundiffskyfile.pl

POD documentation is supplied. For example, run perl runwarpskycell.pl --help in src/<tag>/tools.

We have 2 diff faults today, September 7, 2010. Let's fix them. The techniques described here can
be used for other stages.

***************************************
Case 1.

diff_id 76390 skycell.1194.140 fails because it can't read an input warp file on ipp045.
(This node had some troublesome days last month.)

Confirm that the file is corrupt with

    funpack -S /data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o.204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits > broken.fits

    FITSIO status = 108: error reading from FITS file
    Error reading data buffer from file:
    /data/ipp045.0/nebulous/2d/e8/391435564.gpc1:SweetSpot.nt:2010:08:08:o5416g0096o
    .204249:SR_o5416g0096o.204249.wrp.90720.skycell.1194.140.wt.fits
    Error reading elements 1 thru 2147418112 from column 1 (ffgclb).
    error reading compressed byte stream from binary table

Yep it's garbage.
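
If several files are suspect, the same test can be wrapped in a small function. This is only a sketch: check_fits and the FUNPACK variable are names I made up, and it assumes funpack exits with a non-zero status when it cannot read the file, which is worth confirming on your installation.

```shell
# check_fits FILE: succeed only if FILE decompresses without error.
# FUNPACK may be set to use a different binary (hypothetical convenience).
check_fits() {
    # -S sends the uncompressed data to stdout; we only want the exit status.
    "${FUNPACK:-funpack}" -S "$1" > /dev/null 2>&1
}
```

Usage would be something like: check_fits /path/to/file.wt.fits || echo "corrupt"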

To regenerate the file, execute the runwarpskycell.pl script in the tools directory.

    perl $path_to_ipp_src/tools/runwarpskycell.pl --warp_id 90720 --skycell_id skycell.1194.140 --redirect-output 
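
In this example the warp_id and skycell_id passed above also appear in the name of the broken file (...wrp.90720.skycell.1194.140.wt.fits). Assuming that naming pattern holds for other warp files, which I have not checked, a sketch like the following pulls the arguments out of a path. The function name warp_args_from_path is made up for illustration.

```shell
# warp_args_from_path PATH: print the --warp_id/--skycell_id arguments
# implied by a warp file name of the form ...wrp.<id>.skycell.<n>.<m>.<ext>.
# Prints nothing if the name does not match the assumed pattern.
warp_args_from_path() {
    printf '%s\n' "$1" | sed -n \
        's/.*\.wrp\.\([0-9][0-9]*\)\.\(skycell\.[0-9][0-9]*\.[0-9][0-9]*\)\..*/--warp_id \1 --skycell_id \2/p'
}
```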


Unfortunately runwarpskycell.pl fails because the input chips cannot be found.

This is SweetSpot data from early August being diffed against the stacks.

To find out why the data can't be found, use

    chiptool -processedimfile -chip_id 121251 -class_id xy52

to list the state of one of the required inputs.

We find that the chipRun has been cleaned. (The data was previously processed as warp-warp diffs. We've been
keeping the warps around to run against the stacks, which were made at the end of the month.)

We can get the chip images back with

    chiptool -setimfiletoupdate -chip_id 121251 -set_label update

For extra credit we could use warptool -scmap -warp_id 90720 -skycell_id skycell.1194.140 to find
the subset of chips that the skycell needs, but I didn't. We would then queue only those chips for
update, one at a time, by adding -class_id xy?? to the command:
     chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy52
     chiptool -setimfiletoupdate -chip_id 121251 -set_label update -class_id xy53
     etc.
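
Typing one command per chip gets tedious, so here is a rough sketch of a loop that builds them. The function name queue_chip_updates is invented for this page. It only prints the chiptool commands shown above so they can be inspected before anything is run.

```shell
# queue_chip_updates CHIP_ID CLASS_ID...: print one chiptool update
# command per class_id. Nothing is executed; this is a dry run.
queue_chip_updates() {
    chip_id=$1
    shift
    for class_id in "$@"; do
        echo chiptool -setimfiletoupdate -chip_id "$chip_id" \
            -set_label update -class_id "$class_id"
    done
}
```

For example, queue_chip_updates 121251 xy52 xy53 prints the two commands listed above. Once the output looks right it can be piped to sh.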

Since we'll clean the data soon anyway, I didn't bother.

Wait for the updates to complete (5 minutes or so). Check with
       chiptool -listrun -chip_id 121251
to see that the state is full.
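
Rather than rerunning the check by hand, a polling loop can do the waiting. This is a sketch under one loud assumption: that the chiptool -listrun output contains a line of the form "state STR full", like the difftool output shown later on this page. I have not verified the chiptool format. The function name wait_for_state is made up.

```shell
# wait_for_state WANT TRIES DELAY CMD...: rerun CMD until its output has a
# line whose first field is "state" and whose last field is WANT.
# Returns 0 on success, 1 if TRIES attempts pass without a match.
wait_for_state() {
    want=$1
    tries=$2
    delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        # awk exits 0 only if a matching "state" line was seen (assumed format).
        if "$@" | awk -v want="$want" \
            '$1 == "state" && $NF == want { found = 1 } END { exit !found }'; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example: wait_for_state full 20 30 chiptool -listrun -chip_id 121251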

Then rerun the runwarpskycell.pl script. Success.

Now revert the diff failure
    difftool -revertdiffskyfile -diff_id 76390

A few minutes later the diff is complete.

Now we can go clean up the chips
   chiptool -updaterun -set_state goto_cleaned -set_label goto_cleaned -chip_id 121251

*****************************************
Case 2.

Assertion failed in function psThreadLauncher at psThread.c:244. Error stack:
Unable to perform ppSub: 4 at /home/panstarrs/ipp/psconfig//ipp-20100823.lin64/bin/diff_skycell.pl line 400.
Running [/home/panstarrs/ipp/psconfig/ipp-20100823.lin64/bin/difftool -diff_id 76373 -skycell_id skycell.1460.009 -fault 4 -adddiffskyfile -dtime_script 62.9999801516533 -hostname ipp053 -path_base neb://ipp053.0/gpc1/SweetSpot.nightlyscience/2010/09/07/RINGS.V0/skycell.1460.009/RINGS.V0.skycell.1460.009.dif.76373 -dbname gpc1]...

This is a recurring bug at the diff stage. Ticket #1422 covers it.

I have had success fixing these by running the command by hand without threads.

They also sometimes succeed after reverting. Let's try that:

    difftool -revertdiffskyfile -diff_id 76373

Wait a few minutes, then check the state:

 difftool -diffskyfile -diff_id 76373 -skycell_id skycell.1460.009 | grep state
 data_state       STR       full            
 state            STR       full
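
The grep above leaves the reading to a human. If you want a yes/no answer for a script, something like this sketch checks that both lines say full. It relies only on the three-column layout shown in the output above. The function name skyfile_is_full is invented here.

```shell
# skyfile_is_full: read difftool -diffskyfile output on stdin and succeed
# only if both "state" and "data_state" are reported as full.
skyfile_is_full() {
    awk '
        $1 == "state"      && $3 == "full" { state_ok = 1 }
        $1 == "data_state" && $3 == "full" { data_ok = 1 }
        END { exit !(state_ok && data_ok) }
    '
}
```

For example: difftool -diffskyfile -diff_id 76373 -skycell_id skycell.1460.009 | skyfile_is_full && echo done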

It worked. It would be nice to fix this bug.


If the skycell continues to fault, the script rundiffskyfile.pl may be used to fix the problem.

This script requires that the component have a fault in the database.
First turn off diff in stdscience because we are going to revert the fault.

   perl $path_to_ipp_src/tools/rundiffskyfile.pl --diff_id 76373 --skycell_id skycell.1460.009 --redirect-output --update

Once the script finishes, echo $? to check the return status. If the value is zero, you're done.
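
Note that $? only holds the status of the most recent command, so it has to be read immediately after the script returns. A small wrapper, sketched here with the made-up name run_and_report, captures the status right away and prints it.

```shell
# run_and_report CMD...: run CMD, then report its exit status.
# The status is saved at once because any later command overwrites $?.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK: $1"
    else
        echo "FAILED (exit $status): $1" >&2
    fi
    return "$status"
}
```

For example, put run_and_report in front of the perl command line shown above.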

Don't forget to turn diff processing back on in stdscience.

--update tells the script to revert the fault, run the program, and update the database. If this parameter is not supplied,
the images will be made but the database will not be updated.