| 22 | | - (serge/07:40) cam revert on |
| 23 | | - (serge/08:39) publishing restarted |
| 24 | | - (bills/10:00) warp stuck: lots of entries in warpPendingSkyCell book in done state. ipp049 not responding to ssh; 4 warps stuck running there. Stopped everything for a while to let jobs finish. Then reset the books (warp.reset, chip.reset, etc.) |
| 25 | | - (bills/10:51) Gavin rebooted ipp049. publish was getting lots of faults. Stopped it and asked Serge to investigate. |
| 26 | | - (bills/11:35) Turned off some reverts to debug the fault rate. Also set poll limit to 32 to reduce the load and see whether load is the problem. |
| 27 | | - (bills/12:20) two chips failed repeatedly. Turned out that the log files had a storage object but no instances. Fixed with neb-mv |
| | 23 | * (serge/07:40) cam revert on |
| | 24 | * (serge/08:39) publishing restarted |
| | 25 | * 10:00 warp stuck: lots of entries in warpPendingSkyCell book in done state. ipp049 not responding to ssh; 4 warps stuck running there. Stopped everything for a while to let jobs finish. Then reset the books (warp.reset, chip.reset, etc.) |
| | 26 | * 10:51 Gavin rebooted ipp049. publish was getting lots of faults. Stopped it and asked Serge to investigate. |
| | 27 | * 11:35 Turned off some reverts to debug the fault rate. Also set poll limit to 32 to reduce the load and see whether load is the problem. |
| | 28 | * 12:20 two chips failed repeatedly. Turned out that the log files had a storage object but no instances. Fixed with neb-mv |
| 30 | | - (bills/12:41) turned warp back on. Still getting failures even with only a few jobs running. There is at least one corrupt camRun: 152676. See http://svn.pan-starrs.ifa.hawaii.edu/trac/ipp/wiki/PS1_Operations/broken_files |
| 31 | | - (bills/13:29) found another corrupt file: warp_id 142806 skycell.1162.062. Increased poll limit to 128 |
| | 31 | * 12:41 turned warp back on. Still getting failures even with only a few jobs running. There is at least one corrupt camRun: 152676. See http://svn.pan-starrs.ifa.hawaii.edu/trac/ipp/wiki/PS1_Operations/broken_files |
| | 32 | * 13:29 found another corrupt file: warp_id 142806 skycell.1162.062. Increased poll limit to 128 |
| | 33 | * 13:45 found 2 publishRuns that were failing and reverting repeatedly; the result was 9GB of log files |
| | 34 | * 14:06 ran nightly_science.pl --queue_stacks --date 2011-01-05 |