PS1 IPP Czar Logs for the week 2011.01.03 - 2011.01.09
Monday : 2011.01.03
- heather reverted burntool/registration (using regtool -revert...).
- bill and eugene have turned off processing because we are out of disk space.
Tuesday : 2011.01.04
bill is czar today
- It appears that all data from last night has had burntool applied.
- 12:30 Set stdscience to 'run' and added ThreePi.nightlyscience back in.
- 12:52 We are getting a fairly high rate of faults due to NFS errors.
Wednesday : 2011.01.05
- (serge/07:40) cam revert on
- (serge/08:39) publishing restarted
- (bills/10:00) warp stuck: lots of entries in the warpPendingSkyCell book in the done state. ipp049 not responding to ssh; 4 warps stuck running there. Stopped everything for a while to let jobs finish, then reset the books (warp.reset, chip.reset, etc.).
- (bills/10:51) Gavin rebooted ipp049. publish was getting lots of faults. Stopped it and asked Serge to investigate.
- (bills/11:35) Turned off some reverts to debug the fault rate. Also set the poll limit to 32 to reduce the load and see whether load is the problem.
- (bills/12:20) Two chips failed repeatedly. It turned out that the log files had a storage object but no instances. Fixed with neb-mv.
- Serge found the origin of the publishFailures (some runs got queued for a client with a non-existent data store)
- turned warp off to allow the diffs to make better progress. poll limit is 64
- (bills/12:41) Turned warp back on. Still getting failures even with only a few jobs running. There is at least one corrupt camRun (152676). See http://svn.pan-starrs.ifa.hawaii.edu/trac/ipp/wiki/PS1_Operations/broken_files
- (bills/13:29) Found another corrupt file: warp_id 142806, skycell.1162.062.
Thursday : 2011.01.06
Friday : 2011.01.07
Saturday : 2011.01.08
Sunday : 2011.01.09
