PS1 IPP Czar Logs for the week 2010.12.13 - 2010.12.19
Monday : 2010.12.13
Lots of update processing triggered by the postage stamp server has been going on for the past several days. Except for some requests for chips with reduction STDSCIENCE_V0, things have been progressing smoothly.
- 12:40 (bills) added label MD02.2010.rerun to stdscience (562 exposures). About half an hour later pantasks died; restarted it at 13:31.
- 17:08 (heather) Under my own stdscience, processed darktest%.20101213 to investigate edge NaNs. Queued darktest.201012013, consisting of 100 i-band images, for the static masking (and the autogeneration of the static masks).
Tuesday : 2010.12.14
Update storm seems to have passed during the night.
Many faults from old chipRuns with reduction STDSCIENCE_V0 (chip_id ~= 20000). Many (if not all) of the config dump files for this reduction refer to files that no longer exist. I set the state to goto_purged but I have not yet pulled the trigger and set label to goto_cleaned.
- 08:00 Queued MD02 nightly stacks. Label MD02.2010.rerun. Data group is MD02.V2.$date to distinguish from the MD02 stacks run previously with data group MD02.$date
- 10:40 (Serge) Many diffs have faults. Started revert for diff (through czartool). I don't see anything else.
- 11:00 (Serge) Ran ~heather/sshToNodes.py ipp /usr/local/sbin/nfscheck as ipp; all nodes report that they are OK.
- 11:20 (Serge) Repeated entry in logs of failed jobs:
    I/O error code: 102
    -> pmFPAfileWrite (pmFPAfileIO.c:340): Known programming error
       Error: file->mode != PM_FPA_MODE_INTERNAL is not true.
    -> pmFPAfileIOChecks (pmFPAfileIO.c:90): I/O error
       failed WRITE in FPA_AFTER block for PSPHOT.BACKMDL.STDEV
    -> main (ppSub.c:89): I/O error
       Unable to close files.
    Unable to perform ppSub: 2 at /home/panstarrs/ipp/psconfig//ipp-20101206.lin64/bin/diff_skycell.pl line 400.
Gene speaks: "[...] hitting the same failure which [...] i mentioned earlier this morning. let's let them fail [and fix later]"
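When the same trace repeats across many failed jobs, it can help to split it into frames so that failures can be grouped by the innermost function. The following is a minimal sketch, assuming the trace has been flattened onto one line as it appears in the job logs; the names `TRACE`, `FRAME` and `split_frames` are illustrative and not part of the IPP tools.

```python
import re

# Flattened trace text, copied from the failed-job log entry above.
TRACE = (
    "I/O error code: 102 "
    "-> pmFPAfileWrite (pmFPAfileIO.c:340): Known programming error "
    "Error: file->mode != PM_FPA_MODE_INTERNAL is not true. "
    "-> pmFPAfileIOChecks (pmFPAfileIO.c:90): I/O error "
    "failed WRITE in FPA_AFTER block for PSPHOT.BACKMDL.STDEV "
    "-> main (ppSub.c:89): I/O error Unable to close files."
)

# Assumed frame shape: "-> function (file.c:line): message".
# The space after "->" keeps "file->mode" from being mistaken for a frame.
FRAME = re.compile(r"-> (\w+) \(([\w.]+):(\d+)\): ")


def split_frames(trace):
    """Return a list of (function, source_file, line_number, message) tuples."""
    parts = FRAME.split(trace)
    frames = []
    # parts[0] is the text before the first frame; after that the pieces
    # come in groups of four: function, file, line, message.
    for i in range(1, len(parts), 4):
        func, fname, line, msg = parts[i:i + 4]
        frames.append((func, fname, int(line), msg.strip()))
    return frames


if __name__ == "__main__":
    for func, fname, line, msg in split_frames(TRACE):
        print(f"{func:20s} {fname}:{line}  {msg}")
```

The first frame (here `pmFPAfileWrite`) is the one closest to the failed assertion, so it is the natural key for counting how many jobs hit the same problem.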
- 13:14 stopping processing in order to update the build. Need to wait a few minutes to let the running stacks finish.
- 13:41 Rebuild complete; processing restarted. Set label STS.20101202 to inactive temporarily until the fix is confirmed. Reverted the diff faults.
- 14:45 set label STS.20101202 to active. Added MD02.2010.rerun to survey.dist and added label to distribution pantasks.
- 15:48 (Serge) Added ippMonitor as mysql user on ippdb:
    CREATE USER 'ippMonitor'@'ipp004.ifa.hawaii.edu' IDENTIFIED BY 'ippMonitor';
    GRANT REPLICATION CLIENT ON *.* TO 'ippMonitor'@'ipp004.ifa.hawaii.edu';
    FLUSH PRIVILEGES;
Modified ippMonitor/raw/site.php so that czar tool can check replication status on ippdb02 (SVN 30034).
Added link to czar log wiki pages in czartool page (SVN 30035).
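For reference, REPLICATION CLIENT is the MySQL privilege that allows an account to run SHOW SLAVE STATUS. The following is a minimal sketch of how a monitor might interpret one row of that output. It is written in Python rather than the PHP of site.php, and it is not the czartool implementation: the function name, the 300-second lag threshold and the three status labels are assumptions made for illustration.

```python
def replication_health(row, max_lag_seconds=300):
    """Classify one SHOW SLAVE STATUS row (given as a dict) as
    'ok', 'lagging' or 'broken'.

    max_lag_seconds is a hypothetical threshold, not a value from the log.
    """
    io_ok = row.get("Slave_IO_Running") == "Yes"
    sql_ok = row.get("Slave_SQL_Running") == "Yes"
    lag = row.get("Seconds_Behind_Master")  # NULL (None) when replication is not running

    if not (io_ok and sql_ok) or lag is None:
        return "broken"
    if int(lag) > max_lag_seconds:
        return "lagging"
    return "ok"


if __name__ == "__main__":
    # Hand-written example row; a real one would come from the database client.
    example = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": 12,
    }
    print(replication_health(example))
```

Both replication threads have to be running for the replica to be healthy, which is why the sketch checks them before looking at the lag.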
- 17:40 Stopped processing in order to rebuild psphot and psmodules. Restarted a few minutes later. Set diff.revert.off in order to evaluate diff failures.
Wednesday : 2010.12.15
- 06:41 (Bill) Got lots of images last night. czartool says 631 exposures and all have been downloaded. The MD02 rerun nightly stacks are nearly done, only 5 left out of over 5549. One is failing due to a corrupt mask file from warp 136656. I regenerated it with tools/runwarpskyfile.
- set label STS.20101202 to inactive while nightlyscience is busy.
- 09:56 (Serge) Stopped chip to speed up SweetSpot data processing
- 11:34 (Serge) SweetSpot data not published. In stdscience: survey.add.publish SweetSpot.nightlyscience 5
- 11:42 (Serge) Stopped all pantasks_server
- ~12:15 (Bill) Updated production build with changes to fix the diff problems. Tweaked the survey publish task to urge along the SweetSpot data.
- 13:30 Activated label STS.20101202 and reverted all outstanding diff faults. Queued stack-stack diff runs for MD02.2010.rerun.
- 18:00 Switched production build to ipp-20101215
- 19:09 Had a few hiccups. For some reason magic didn't get built, so distribution and destreak fell over. Did the rebuild while things were running and got some "config files not found" faults (Don't do that!). Shut down everything and restarted. After that, a few more missing files were "fixed". isp registration is turned off because Chris thinks that the system is going to want to run burntool on the images.
