
Version 21 (modified by Serge CHASTEL, 15 years ago)


PS1 IPP Czar Logs for the week 2010.12.13 - 2010.12.19


Monday : 2010.12.13

Lots of update processing triggered by the postage stamp server has been going on for the past several days. Except for some requests for chips with reduction STDSCIENCE_V0, things have been progressing smoothly.

  • 12:40 (bills) Added label MD02.2010.rerun to stdscience (562 exposures). About half an hour later pantasks died. Restarted it at 13:31.
  • 17:08 (heather) Under my own stdscience, processed darktest%.20101213 to investigate edge NaNs. Queued darktest.201012013, consisting of 100 i-band images, for static masking (and the autogeneration of the static masks).
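The log does not say how the static masks are generated from the darktest images. As a purely illustrative sketch (not the IPP static-masking code), one simple approach is to flag any pixel that is NaN in more than some fraction of the input frames; the function name `static_mask` and the `max_bad_fraction` threshold are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: NOT the IPP static-masking algorithm. It only shows one
# simple way to turn a stack of frames into a per-pixel bad-pixel mask.
Frame = Sequence[Sequence[float]]


def static_mask(frames: List[Frame], max_bad_fraction: float = 0.5) -> List[List[bool]]:
    """Return True for pixels that are NaN in more than max_bad_fraction of frames."""
    if not frames:
        raise ValueError("need at least one frame")
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    mask = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Count how many frames have a NaN at this pixel.
            n_nan = sum(1 for f in frames if math.isnan(f[r][c]))
            row.append(n_nan / len(frames) > max_bad_fraction)
        mask.append(row)
    return mask


if __name__ == "__main__":
    nan = float("nan")
    # Three 2x3 frames; the left edge column is NaN in every frame,
    # while pixel (1, 1) is NaN in only one frame and is not masked.
    frames = [
        [[nan, 1.0, 2.0], [nan, 1.0, 2.0]],
        [[nan, 1.1, 2.1], [nan, nan, 2.1]],
        [[nan, 0.9, 1.9], [nan, 0.9, 1.9]],
    ]
    print(static_mask(frames))
```

A fractional threshold keeps transient NaNs (e.g. a single bad readout) out of the static mask while still catching persistent edge problems; the right threshold for 100 i-band images would have to be tuned.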

Tuesday : 2010.12.14

Update storm seems to have passed during the night.

Many faults from old chipRuns with reduction STDSCIENCE_V0 (chip_id ~= 20000). Many (if not all) of the config dump files for this reduction refer to files that no longer exist. I set the state to goto_purged, but I have not yet pulled the trigger and set the label to goto_cleaned.

  • 08:00 Queued MD02 nightly stacks with label MD02.2010.rerun. The data group is MD02.V2.$date, to distinguish them from the MD02 stacks run previously with data group MD02.$date.
  • 10:40 (Serge) Many diffs have faults. Started a revert of the diffs (through czartool). I don't see anything else wrong.
  • 11:00 (Serge) Ran ~heather/sshToNodes.py ipp /usr/local/sbin/nfscheck as user ipp; all nodes report OK.
  • 11:20 (Serge) Repeated entry in the logs of failed jobs:
    I/O error code: 102
     -> pmFPAfileWrite (pmFPAfileIO.c:340): Known programming error
        Error: file->mode != PM_FPA_MODE_INTERNAL is not true.
     -> pmFPAfileIOChecks (pmFPAfileIO.c:90): I/O error
        failed WRITE in FPA_AFTER block for PSPHOT.BACKMDL.STDEV
     -> main (ppSub.c:89): I/O error
        Unable to close files.
     Unable to perform ppSub: 2 at
     /home/panstarrs/ipp/psconfig//ipp-20101206.lin64/bin/diff_skycell.pl line 400.
    
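To tell whether many failed jobs are "the same failure", it can help to reduce each traceback to a signature and count them. The sketch below is hypothetical (it is not part of czartool): it assumes the `-> function (file.c:line): message` frame layout seen in the ppSub log above, and the names `failure_signature` and `count_signatures` are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical helper: assumes one "-> func (file.c:line): msg" frame per line,
# as in the ppSub traceback above; other IPP programs may log differently.
FRAME_RE = re.compile(r"->\s*(\w+)\s*\(([\w.]+):\d+\)")


def failure_signature(log_text: str) -> str:
    """Reduce one traceback to 'func@file > func@file ...', ignoring line numbers."""
    frames = FRAME_RE.findall(log_text)
    return " > ".join(f"{func}@{src}" for func, src in frames)


def count_signatures(logs: Iterable[str]) -> Counter:
    """Count how many failed-job logs share each traceback signature."""
    return Counter(failure_signature(text) for text in logs)


if __name__ == "__main__":
    log = (
        " -> pmFPAfileWrite (pmFPAfileIO.c:340): Known programming error\n"
        " -> pmFPAfileIOChecks (pmFPAfileIO.c:90): I/O error\n"
        " -> main (ppSub.c:89): I/O error\n"
    )
    print(count_signatures([log, log]).most_common(1))
```

Line numbers are dropped from the signature so that the same failure still groups together after a rebuild shifts the source lines.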

Gene says: "[...] hitting the same failure which [...] i mentioned earlier this morning. let's let them fail [and fix later]"

  • 13:14 Stopping processing in order to update the build. Need to wait a few minutes to let the running stacks finish.
  • 13:41 Rebuild complete; processing restarted. Set label STS.20101202 to inactive temporarily until the fix is confirmed. Reverted the diff faults.
  • 14:45 Set label STS.20101202 to active. Added MD02.2010.rerun to survey.dist and added the label to the distribution pantasks.
  • 15:48 (Serge) Added ippMonitor as a MySQL user on ippdb:
CREATE USER 'ippMonitor'@'ipp004.ifa.hawaii.edu' IDENTIFIED BY 'ippMonitor';
GRANT REPLICATION CLIENT ON *.* TO 'ippMonitor'@'ipp004.ifa.hawaii.edu';
FLUSH PRIVILEGES;

Modified ippMonitor/raw/site.php so that czartool can check the replication status on ippdb02 (SVN 30034).
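The REPLICATION CLIENT privilege granted above is what allows a user to run MySQL's `SHOW SLAVE STATUS`. The actual site.php logic is not shown here, so the following is only a hypothetical Python sketch of how one result row could be classified; the function name `replication_state` and the 300-second lag threshold are assumptions. `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master` are standard columns of that statement's output.

```python
from typing import Mapping, Optional

# Hypothetical sketch, not the ippMonitor/raw/site.php code: classify one row of
# "SHOW SLAVE STATUS" output (passed as a column-name -> value mapping).
# The max_lag_s threshold of 300 seconds is an assumed value.
def replication_state(row: Optional[Mapping[str, object]], max_lag_s: int = 300) -> str:
    if not row:
        # An empty result means this server is not configured as a replica.
        return "not a replica"
    if row.get("Slave_IO_Running") != "Yes" or row.get("Slave_SQL_Running") != "Yes":
        return "stopped"
    lag = row.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL when the lag cannot be determined.
        return "unknown lag"
    return "ok" if int(lag) <= max_lag_s else "lagging"


if __name__ == "__main__":
    healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0}
    print(replication_state(healthy))
```

In practice the row would come from a dictionary-style cursor of whatever MySQL client library the monitor uses, connected as the ippMonitor user.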

Added a link to the czar log wiki pages on the czartool page (SVN 30035).

  • 17:40 Stopped processing in order to rebuild psphot and psmodules. Restarted a few minutes later. Set diff.revert.off in order to evaluate the diff failures.

Wednesday : 2010.12.15

  • 06:41 (Bill) Got lots of images last night: czartool reports 631 exposures, all downloaded. The MD02 rerun nightly stacks are nearly done, with only 5 left out of over 5549. One is failing due to a corrupt mask file from warp 136656; I regenerated it with tools/runwarpskyfile.
  • Set label STS.20101202 to inactive while nightlyscience is busy.
  • 09:56 (Serge) Stopped chip processing to speed up SweetSpot data processing.
  • 11:34 (Serge) survey.add.publish SweetSpot.nightlyscience 5

Thursday : 2010.12.16

Friday : 2010.12.17

Saturday : 2010.12.18

Sunday : 2010.12.19
