
Changes between Version 32 and Version 33 of Cluster_Storage_Notes


Timestamp:
Oct 17, 2011, 5:52:56 PM (15 years ago)
Author:
watersc1
Comment:

--

  • Cluster_Storage_Notes

    v32 v33  
    33== Automatic Storage Monitoring ==
    44
    5 
    6 
    7 
     5Cron jobs are set up on all nebulous storage nodes to scan the nebulous disks and compile statistics about their contents.  These jobs are staggered across the cluster throughout the week so that the hosts do not all run the disk scan simultaneously (which would presumably reduce processing throughput).  The usage statistics are placed in nebulous in files named `neb://ipp_diskspace/YYYY-MM-DD/ippXXX.Z.neb_usage.dat`, with the date string matching the date of the Sunday of the week the scan was performed.  The script parses each filename in the nebulous directory and attempts to classify the stage it comes from as well as the type of product from that stage.  It then totals the number of files and their usage (in bytes) for each stage/product pair, and these totals are what is stored in the statistics files.  An excerpt of one of these files is listed below, showing the CHIP stage file statistics from one host.
     6{{{
     7CHIP B1FITS 774142 320650755840
     8CHIP B1JPG 71 18789330
     9CHIP B2FITS 775168 34038596160
     10CHIP B2JPG 70 902953
     11CHIP BUNDLE 6649 190039346914
     12CHIP CATALOG 462015 443818370880
     13CHIP FITS 98121 1030044401280
     14CHIP KERNEL 14 6658560
     15CHIP LOG 764083 25942316933
     16CHIP MASK 44888 93167104320
     17CHIP MDC 774856 82952703287
     18CHIP MDL 832113 34001887680
     19CHIP PNG 9838 160628007
     20CHIP PSF 774958 77240390400
     21CHIP PTN 50884 60478741440
     22CHIP SKYCELL 30 86400
     23CHIP STATS 637305 2358654287
     24CHIP TRACE 422839 29600227
     25CHIP WEIGHT 44740 608909785920
     26}}}
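The excerpt above can be read as one record per line: stage, product type, file count, and total bytes. The following is a minimal sketch of parsing such a file and summing several hosts together. The function names `parse_usage_lines` and `merge_hosts` are hypothetical and are not part of the IPP tools, and the four-column layout is inferred from the excerpt rather than from a format specification.

```python
from collections import defaultdict

def parse_usage_lines(lines):
    """Parse 'STAGE PRODUCT COUNT BYTES' records into {(stage, product): (count, bytes)}."""
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        stage, product, count, nbytes = fields
        totals[(stage, product)] = (int(count), int(nbytes))
    return totals

def merge_hosts(per_host):
    """Sum file counts and byte totals over a list of per-host dictionaries."""
    merged = defaultdict(lambda: [0, 0])
    for host_totals in per_host:
        for key, (count, nbytes) in host_totals.items():
            merged[key][0] += count
            merged[key][1] += nbytes
    return {key: tuple(value) for key, value in merged.items()}

# Three lines copied from the excerpt above.
sample = """CHIP B1FITS 774142 320650755840
CHIP B1JPG 71 18789330
CHIP SKYCELL 30 86400""".splitlines()

host_a = parse_usage_lines(sample)
# The same host is passed twice here purely to exercise the merge step.
merged = merge_hosts([host_a, host_a])
```

Keying on the (stage, product) pair keeps the per-host and cluster-wide totals in the same shape, so the same dictionary can feed a later summary step.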
     27
     28At the end of the week, the gpc1 database is polled to count the components in each data_state for each stage, and the results are stored in `neb://ipp_diskspace/YYYY-MM-DD/run_im_counts.dat`.  Each line gives the stage, the data_state, and the number of components with that data_state:
     29{{{
     30CHIP cleaned 16945531
     31CHIP error_cleaned 25155
     32CHIP error_scrubbed 5851
     33CHIP full 631356
     34CHIP purged 286850
     35CHIP scrubbed 5993
     36CHIP update 3304
     37}}}
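As an illustration of how a count file of this shape might be consumed, the sketch below reads the three-column records into a nested dictionary and totals the CHIP components. The function name `parse_state_counts` is hypothetical, and the layout is assumed from the excerpt above.

```python
def parse_state_counts(lines):
    """Parse 'STAGE DATA_STATE COUNT' records into {stage: {data_state: count}}."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        stage, data_state, count = fields
        counts.setdefault(stage, {})[data_state] = int(count)
    return counts

# Lines copied from the excerpt above.
sample = """CHIP cleaned 16945531
CHIP error_cleaned 25155
CHIP error_scrubbed 5851
CHIP full 631356
CHIP purged 286850
CHIP scrubbed 5993
CHIP update 3304""".splitlines()

counts = parse_state_counts(sample)
chip_total = sum(counts["CHIP"].values())  # all CHIP components, regardless of state
```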
     38At the end of the week, a summary file is created (`neb://ipp_diskspace/YYYY-MM-DD/summary.dat`) that gives the usage in TB of each stage/product combination:
     39{{{
     40CHIP_B1FITS 4876.40501260757
     41CHIP_B1JPG 5.90300299786031
     42CHIP_B2FITS 506.811673343182
     43CHIP_B2JPG 0.397686927579343
     44CHIP_BTTABLE 0
     45CHIP_BUNDLE 2889.25785760116
     46CHIP_CATALOG 7536.26382619143
     47CHIP_FITS 14993.0452584894
     48CHIP_KERNEL 3.89136224985123
     49CHIP_LOG 557.324770034291
     50CHIP_MASK 1228.97341176867
     51CHIP_MDC 1149.67314596102
     52CHIP_MDL 501.610046625137
     53CHIP_PNG 10.054713034071
     54CHIP_PSF 1185.04951536655
     55CHIP_PTN 1154.56684827805
     56CHIP_SKYCELL 0.0493767857551575
     57CHIP_STATS 68.9840127192438
     58CHIP_TRACE 5.00015713181347
     59CHIP_UNKNOWN 0.0171183617785573
     60CHIP_WEIGHT 9193.91913002729
     61}}}
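The summary keys appear to be the stage and product joined with an underscore. A rough sketch of producing such lines from byte totals follows. The function name `summarize` is hypothetical, and because the text does not say whether the unit is decimal or binary, the divisor is an explicit parameter; the `1e12` used in the example is an assumption, not the value the real script uses.

```python
def summarize(byte_totals, bytes_per_unit):
    """Turn {(stage, product): bytes} into sorted (STAGE_PRODUCT, size) pairs.

    bytes_per_unit is the divisor for the output unit; the caller chooses it.
    """
    rows = []
    for (stage, product), nbytes in sorted(byte_totals.items()):
        rows.append((f"{stage}_{product}", nbytes / bytes_per_unit))
    return rows

# Byte totals for a single host, copied from the usage excerpt earlier on this page.
one_host = {
    ("CHIP", "B1FITS"): 320650755840,
    ("CHIP", "B1JPG"): 18789330,
}

# Assumption: decimal terabytes (1e12 bytes per unit).
summary_rows = summarize(one_host, 1e12)
for name, size in summary_rows:
    print(name, size)
```

These single-host figures are not expected to match the cluster-wide summary excerpt above, which covers every host scanned.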
     62
     63A final summary, still largely untested, is intended to combine the usage.dat files, the run_im_counts.dat file, and the mapping file (hardcoded as `neb://ipp_diskspace/mappings_im.dat`), matching the sizes in each stage/product against the counts in each stage/data_state to estimate how much disk space is being used by permanent products (raw imfiles), transient products (chip stage images), and final output products (stacks).  This has been delayed by other concerns and by an issue with the perl installation on ippbXX.  The final summary could then be used to recalculate the table in `trunk/tools/diskspace/sizes_from_counts.pl`.  That script takes a list of counts (any of the run_im_counts.dat files), applies the average size for each stage/data_state, and reports which stages are using the most disk space.  Running this script on the count file from 2011-10-09 yields (in part):
     64{{{
     65CHIP     cleaned            PRODUCT        31179.7770     0.001840  16945531
     66CHIP     error_cleaned      PRODUCT         2652.1420     0.105432     25155
     67CHIP     error_scrubbed     PRODUCT          232.8581     0.039798      5851
     68CHIP     full               TRANSIENT      75617.5081     0.119770    631356
     69CHIP     purged             PRODUCT          527.8040     0.001840    286850
     70CHIP     scrubbed           PRODUCT           11.0271     0.001840      5993
     71CHIP     update             TRANSIENT        348.3473     0.105432      3304
     72}}}
     73This shows that, of the space used by chip (on the disks scanned), about 30% is consumed by the output products left over from previous processing (the PRODUCT rows), with the rest held by TRANSIENT data.  A summary is also printed:
     74{{{
     75        PRODUCT         272237.058713
     76        TRANSIENT       160852.076341
     77        PERMANENT       805663.824597
     78        SUM             1238752.959651
     79}}}
     80which displays the calculated sizes used by each of the three classes of products, along with the sum.  This sum appears to be accurate only to about 10%, and would certainly be improved by fixing the ippbXX perl issue and fully recalculating the size/count table used.
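The arithmetic behind the two listings can be sketched in Python. This is not the code of `sizes_from_counts.pl`. It assumes, from the numbers themselves, that the columns of the per-state listing are stage, data_state, class, total size, average size per component, and count, with total = average × count. The names `AVG_SIZE`, `COUNTS`, `sizes_from_counts`, and `class_totals` are hypothetical, and the sizes carry whatever unit the script prints, which the text does not state.

```python
# (stage, data_state) -> (class, average size per component), copied from the
# 2011-10-09 listing above.
AVG_SIZE = {
    ("CHIP", "cleaned"): ("PRODUCT", 0.001840),
    ("CHIP", "error_cleaned"): ("PRODUCT", 0.105432),
    ("CHIP", "error_scrubbed"): ("PRODUCT", 0.039798),
    ("CHIP", "full"): ("TRANSIENT", 0.119770),
    ("CHIP", "purged"): ("PRODUCT", 0.001840),
    ("CHIP", "scrubbed"): ("PRODUCT", 0.001840),
    ("CHIP", "update"): ("TRANSIENT", 0.105432),
}

# Component counts from the run_im_counts.dat excerpt.
COUNTS = {
    ("CHIP", "cleaned"): 16945531,
    ("CHIP", "error_cleaned"): 25155,
    ("CHIP", "error_scrubbed"): 5851,
    ("CHIP", "full"): 631356,
    ("CHIP", "purged"): 286850,
    ("CHIP", "scrubbed"): 5993,
    ("CHIP", "update"): 3304,
}

def sizes_from_counts(counts, avg_size):
    """Estimate the space per stage/data_state as average size times count."""
    rows = []
    for (stage, state), count in counts.items():
        product_class, avg = avg_size[(stage, state)]
        rows.append((stage, state, product_class, avg * count))
    return rows

def class_totals(rows):
    """Sum the estimated sizes by class (PRODUCT, TRANSIENT, ...)."""
    totals = {}
    for _stage, _state, product_class, size in rows:
        totals[product_class] = totals.get(product_class, 0.0) + size
    return totals

rows = sizes_from_counts(COUNTS, AVG_SIZE)
chip = class_totals(rows)
chip_product_share = chip["PRODUCT"] / sum(chip.values())

# Class totals copied from the printed summary; their sum should match SUM.
SUMMARY = {
    "PRODUCT": 272237.058713,
    "TRANSIENT": 160852.076341,
    "PERMANENT": 805663.824597,
}
summary_sum = sum(SUMMARY.values())
```

With the CHIP rows alone, `chip_product_share` comes out a little above 0.31, consistent with the roughly 30% figure quoted for chip.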
    881== Accounting ==
    982|| 2011-02-28                     || Item                || Balance             ||