== This page contains a running log of the "special activities" of the IPP Processing Czar ==

2010-06-09 Bill

Bad weather last night; never opened. Took the opportunity to evaluate the existing faults and unfinished processing.

 * chip processing was generating lots of errors due to missing detrends. It turns out that some engineering exposures got made with obs_mode = '3PI'. Set these chipRuns to drop.
 * camRun 85769 repeatedly failed. exp_name o5354g0523o. jpeg images for exposures around this time have lots of missing chips. Observing log says 'g0490o - g0523o Most of these exposures were taken in twilight and need to be redone'. Set camRun to drop with a note to this effect.
 * 4 warpSkyfiles were repeatedly failing. This was because one of the input images, chip_id 100112 class_id XY17, was corrupted. Re-ran chip_imfile.pl by hand, reverted warp, and it completed. The command line was
{{{
chip_imfile.pl --no-update --verbose --exp_id 176615 --chip_id 100112 --chip_imfile_id 5832194 --class_id XY17 --uri neb://ipp026.0/gpc1/20100603/o5350g0442o/o5350g0442o.ota17.fits --camera GPC1 --run-state new --deburned 0 --outroot neb://ipp026.0/gpc1/ThreePi.nt/2010/06/03/o5350g0442o.176615/o5350g0442o.176615.ch.100112 --redirect-output --dbname gpc1
}}}
 * What!? camRun 85769 got changed from drop to new! Looks like camtool -revertprocessedexp is obsolete. Paul investigated and fixed in trunk.
 * 4 of the twilight exposures from MJD 55354 failed at warp stage. Set the corresponding runs' state to 'drop'.
 * Eric Morganson reports that some postage stamp requests he submitted last night (HST) didn't complete. Turns out each one referenced some warpRuns in an unexpected state: state = 'full', magicked = -1. Fixed by setting the corresponding magicDSRuns to state new. This shouldn't have happened. Magicked gets set to -1 when cleaned, and at update time, since the chips are magicked, the warps are magicked. These were the only runs in this state (at any stage) so I let this go for now.
 * Investigated incomplete runs
   * warp - 38 runs were descendants of camRuns with poor quality. This is a bug in camtool. It should not advance a run to fake if the quality is poor. Set to drop. Checked fix to camtool into the trunk.
   * stackRun 116184 gets 'no fake sources suitable for PSF fitting'
   * distRun 177188 component X15 from chipRun 94549 is corrupt (streaksremove output)
   * 117 magicDSRuns were in state new but the corresponding warpRuns had been purged. Set to goto_cleaned.
   * magicDSRun 52543 for camera stage cannot proceed because the chipRun has been cleaned. Set to goto_cleaned.
   * magicDSRun 162057 for chipRun 99733 cannot proceed because the camera mask file for chip XY17 is corrupt (camRun 83891)
   * magicDSRun 152644 for diffRun 56728 repeatedly fails on skycell.2266.020 with fault 4. The program segfaults due to a bug in the function censorSources. It looks like the input cmf file may be corrupt. Fix to streaksremove committed to the trunk.
   * 4 diff stage distRuns were in state new but the corresponding diffRuns had been cleaned. Set to goto_cleaned.
   * 49 diffRuns in state new but the corresponding warpRuns have been cleaned.
   * several magicRuns were in state new with diffRuns that have been cleaned. Set to drop. Changed magictool to allow state = 'drop'.
   * 23 magicRuns were in state new whose corresponding diffRuns were in state 'new'. This is quite bizarre. The data_group was ThreePi.20100XXX where XXX in 221, 222, 223, 227, 228, 301. Set magicRun.state to drop.
   * diffRun 58718 and stackRun 116037 are blocked because one of their inputs, warpSkyfile 70847 skycell.075, is corrupt. Reprocessed with the command
{{{
warp_skycell.pl --no-update --verbose --warp_id 70847 --warp_skyfile_id 6667056 --skycell_id skycell.075 --tess_dir MD06 --outroot neb://ipp036.0/gpc1/MD06.nt/2010/06/06/o5353g0124o.178114/o5353g0124o.178114.wrp.70847.skycell.075 --run-state new --camera GPC1
}}}
----
2010-06-10 Bill

High winds on the summit again.
No science exposures.

 * 3 postage stamp requests got stuck. There were a couple of problems with destreaking that needed some manual intervention.
   * a couple of runs failed at the end due to a database deadlock updating the magic status of a chipProcessedImfile. After reverting, these completed.
   * some others failed due to an inconsistency in the database. The diffRuns that were the input to the magic analysis were in state 'cleaned' but the data_state of the skyfiles was 'full'. Because of this, magic_destreak.pl assumed that the skyfiles were available to be used for the 'diffed pixel' calculations, but the files didn't exist. About 4000 diffRuns were in this state. I updated the database by hand and ran revert on destreak. Upon re-running, the script found the data_state as 'cleaned' so it created temporary skeleton skycells.
----
2010-06-14 15:05 Chris

 * shut down cleanup pantasks as requested via phone.
----
2010-06-15 Paul

 * Camera run 85769 failing because only 3 chips have good 'quality' flag:
{{{
mysql> update camRun set state = 'drop' where cam_id = 85769;
}}}
 * Burntooling old data for MD04 reference stack:
{{{
pantasks: ns.add.date 2009-12-21
pantasks: ns.add.date 2009-11-28
pantasks: ns.add.date 2009-11-26
pantasks: ns.add.date 2009-12-02
}}}
----
2010-06-25 Paul

 * Manually re-running corrupted files:
{{{
chip_imfile.pl --exp_id 182679 --chip_id 105326 --chip_imfile_id 6145034 --class_id XY17 --uri neb://ipp026.0/gpc1/20100617/o5364g0161o/o5364g0161o.ota17.fits --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/17/o5364g0161o.182679/o5364g0161o.182679.ch.105326 --run-state new --no-update --verbose --redirect-output --dbname gpc1
chip_imfile.pl --exp_id 184898 --chip_id 106408 --chip_imfile_id 6209945 --class_id XY05 --uri neb://ipp023.0/gpc1/20100620/o5367g0467o/o5367g0467o.ota05.fits --camera GPC1 --outroot neb://any/gpc1/STS.nt/2010/06/20/o5367g0467o.184898/o5367g0467o.184898.ch.106408 --run-state new --no-update --verbose --redirect-output --dbname gpc1
camera_exp.pl --exp_tag o5371g0244o.186524 --cam_id 91658 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/24/o5371g0244o.186524/o5371g0244o.186524.cm.91658 --run-state new --no-update --verbose --redirect-output --dbname gpc1
camera_exp.pl --exp_tag o5369g0208o.185868 --cam_id 91339 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/22/o5369g0208o.185868/o5369g0208o.185868.cm.91339 --run-state new --no-update --verbose --redirect-output --dbname gpc1
camera_exp.pl --exp_tag o5366g0349o.184150 --cam_id 90117 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/19/o5366g0349o.184150/o5366g0349o.184150.cm.90117 --run-state new --no-update --verbose --redirect-output --dbname gpc1
camera_exp.pl --exp_tag o5363g0305o.182212 --cam_id 88947 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/16/o5363g0305o.182212/o5363g0305o.182212.cm.88947 --run-state new --no-update --verbose --redirect-output --dbname gpc1
warp_skycell.pl --warp_id 76965 --warp_skyfile_id 7257921 --skycell_id skycell.2332.089 --tess_dir RINGS.V0 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/22/o5369g0092o.185751/o5369g0092o.185751.wrp.76965.skycell.2332.089 --run-state new --no-update --threads 8 --redirect-output --dbname gpc1
warp_skycell.pl --warp_id 77486 --warp_skyfile_id 7309832 --skycell_id skycell.1404.127 --tess_dir RINGS.V0 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/24/o5371g0408o.186686/o5371g0408o.186686.wrp.77486.skycell.1404.127 --redirect-output --dbname gpc1 --verbose --threads 8 --run-state new --no-update
diff_skycell.pl --diff_id 61705 --skycell_id skycell.1404.127 --outroot neb://any/gpc1/ThreePi.nightlyscience/2010/06/24/RINGS.V0/skycell.1404.127/RINGS.V0.skycell.1404.127.dif.61705 --run-state new --diff_skyfile_id 3316579 --no-update --redirect-output --verbose --threads 8 --dbname gpc1
camera_exp.pl --exp_tag o5348g0312o.175971 --cam_id 83891 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/01/o5348g0312o.175971/o5348g0312o.175971.cm.83891 --dbname gpc1 --redirect-output --verbose --no-update
}}}
 * Dropped runs that cannot complete because the inputs are not available:
{{{
stacktool -updaterun -set_state drop -stack_id 116184 -dbname gpc1
magictool -updaterun -state drop -magic_id 20031 -dbname gpc1
magictool -updaterun -state drop -magic_id 20032 -dbname gpc1
mysql> update diffRun join diffInputSkyfile using(diff_id) join warpRun on warp1 = warp_id set diffRun.state = 'drop' where diffRun.label = 'ThreePi.nightlyscience' and diffRun.state = 'new' and warpRun.state = 'cleaned';
Query OK, 48 rows affected (0.06 sec)
Rows matched: 48 Changed: 48 Warnings: 0
}}}
----
2010-06-30 Paul

 * Manually re-running corrupted files:
{{{
warp_skycell.pl --warp_id 79174 --warp_skyfile_id 7478005 --skycell_id skycell.2008.028 --tess_dir RINGS.V0 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/28/o5375g0534o.188432/o5375g0534o.188432.wrp.79174.skycell.2008.028 --run-state new --no-update --threads 4 --dbname gpc1 --redirect-output --verbose
warp_skycell.pl --warp_id 79400 --warp_skyfile_id 7500333 --skycell_id skycell.1477.006 --tess_dir RINGS.V0 --camera GPC1 --outroot neb://any/gpc1/ThreePi.nt/2010/06/29/o5376g0268o.188733/o5376g0268o.188733.wrp.79400.skycell.1477.006 --run-state new --no-update --threads 4 --dbname gpc1 --redirect-output --verbose
}}}
----
2010-07-01 Bill

 * we're out of space. Gene and Chris determined that part of the problem is that the non-destreaked files are being kept around. This effectively doubles the space used. Gene shut off processing. Bill issued some mysql queries to queue existing data for cleanup.
 * 10:00 shut off destreak/distribution and increased the number of nodes doing cleanup
 * MOPS's postage stamp requests were failing. Inverse images were missing. Turned out to be a problem with diff updates. Queued diffRuns for cleanup, killed off the jobs and asked Jan to resubmit.
 * shut everything down to allow /data/ipp005.0 to be mounted from ipp037. Restart went smoothly. Since we have some space now, turned on summit copy and registration. Distribution/destreak and stdscience are still off for the time being.
 * Jan re-submitted 2 requests with 100 jobs each. They required some updates to process. These might be blocked waiting for the previous cleanup to finish.
 * 15:14 turned off destreak cleanup to give diff.cleanup a chance to finish the 498 diff runs pending cleanup. There are 14556 destreak runs left to go.
----
2010-07-28 Bill

 * They reportedly had a good night on the summit. 754 exposures were taken. Some of those were tests.
 * As of 7:45 burntool is running. The MD08.redo warps (V2 tessellation) are running as well. Ganglia is glowing red and nebulous space used is up to 89%. Need to check on cleanup.
 * got some unfinished postage stamp requests (as usual)
 * updated ipp-20100701 with some changes to support construction of background preserved chip stage images.
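----
Note on the hand-issued state updates: several entries above (2010-06-09, 2010-06-25) drop diffRuns that are still in state 'new' although their input warpRuns have been cleaned. Before running an update like that, it can help to preview which runs the join would touch. The following is only a rough sketch of that idea, not the actual gpc1 schema or any IPP tool: it uses Python's sqlite3 with a toy in-memory schema that models just the table and column names appearing in the logged update (diffRun, diffInputSkyfile, warpRun, diff_id, warp1, warp_id, label, state). The function names, the schema definition, and all ids in the demo are made up for illustration.

```python
import sqlite3

# Toy schema (assumption): only the columns named in the logged UPDATE are
# modelled; the real gpc1 tables have many more columns.
SCHEMA = """
CREATE TABLE diffRun (diff_id INTEGER PRIMARY KEY, label TEXT, state TEXT);
CREATE TABLE diffInputSkyfile (diff_id INTEGER, warp1 INTEGER);
CREATE TABLE warpRun (warp_id INTEGER PRIMARY KEY, state TEXT);
"""

# Same join and WHERE clause as the logged UPDATE, written as a SELECT so
# nothing is modified. DISTINCT because one diffRun has many input skyfiles.
PREVIEW = """
SELECT DISTINCT diffRun.diff_id
FROM diffRun
JOIN diffInputSkyfile ON diffRun.diff_id = diffInputSkyfile.diff_id
JOIN warpRun ON diffInputSkyfile.warp1 = warpRun.warp_id
WHERE diffRun.label = ?
  AND diffRun.state = 'new'
  AND warpRun.state = 'cleaned'
ORDER BY diffRun.diff_id
"""


def blocked_diff_runs(conn, label):
    """Return diff_ids in state 'new' whose input warpRun is 'cleaned'."""
    return [row[0] for row in conn.execute(PREVIEW, (label,))]


def demo():
    """Build a tiny in-memory database with invented ids and run the preview."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO warpRun VALUES (?, ?)",
                     [(1, "cleaned"), (2, "full")])
    conn.executemany("INSERT INTO diffRun VALUES (?, ?, ?)", [
        (10, "ThreePi.nightlyscience", "new"),   # input cleaned: blocked
        (11, "ThreePi.nightlyscience", "new"),   # input still full: fine
        (12, "ThreePi.nightlyscience", "full"),  # not in state new: ignored
    ])
    conn.executemany("INSERT INTO diffInputSkyfile VALUES (?, ?)",
                     [(10, 1), (10, 1), (11, 2), (12, 1)])
    return blocked_diff_runs(conn, "ThreePi.nightlyscience")


if __name__ == "__main__":
    print(demo())
```

With the toy rows above only diff_id 10 is reported. Against the real database, the equivalent SELECT in mysql would give the row count to compare with the "Rows matched" figure after the update.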