[LDM #UKF-836086]: LDM 6.9.4 issue---queue size getting monstrous
- Date: Tue, 18 Jan 2011 09:32:59 -0700
Gilbert,
> Forwarded conversation
> Subject: "ldmadmin check" problem LDM may not be running
> ------------------------
>
> From: <address@hidden>
> Date: Mon, Jan 17, 2011 at 5:31 PM
> To: address@hidden, address@hidden
>
>
> Checking for a running LDM system...
> Checking the system clock...
> Checking the most-recent insertion into the queue...
> Vetting the size of the queue and the maximum acceptable latency...
> vetQueueSize(): The maximum acceptable latency (registry parameter
> "/server/max-latency": 3600 seconds) is greater than the observed minimum
> virtual residence time of data-products in the queue (2029 seconds).
> This will hinder detection of duplicate data-products.
> The value of the "/reconciliation-mode" registry-parameter is "increase
> queue"
> Increasing the capacity of the queue...
> Creating new queue of 2249086265 bytes and 106734 slots...
> Illegal size "2249086265"
> Usage: pqcreate [options] <initialsz>[k|m|g] <pqfname>
> pqcreate [options] -s <initialsz>[k|m|g] [-q <pqfname>]
> Options:
> -v
> -c
> -f
> -l logfname
> -S nproducts
> (default pqfname is "/home/ldm/var/queues/ldm.pq")
> vetQueueSize(): Couldn't create new queue: /home/ldm/var/queues/ldm.pq.new
It appears that 1) the "ldmadmin check" command noticed that the product-queue
wasn't big enough given the "max latency" parameter; 2) the reconciliation-mode
parameter was set to "increase queue"; and 3) the size of the queue that would
be necessary to guarantee detection of duplicate data-products is larger than
the operating system can handle. I suspect that the system in question is a
32-bit one that doesn't support large files (files larger than 2147483647
bytes, or about 2 GB). Each of the sizes requested in your logs (2249086265,
2322946772, and 2359502899 bytes) exceeds that limit.
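A quick way to see why those sizes were rejected is to compare them with the largest value a signed 32-bit file offset can hold, 2^31 - 1 bytes. The sketch below assumes, as suspected above, that the LDM was built without large-file support, so file offsets are 32 bits; the constant and function names are illustrative and are not part of the LDM.

```python
# Sketch: compare queue sizes from this ticket with the largest offset a
# signed 32-bit off_t can hold. Assumption: no large-file support, so
# off_t is 32 bits on the host in question.
MAX_32BIT_OFF_T = 2**31 - 1  # 2147483647 bytes, "about 2 GB"


def fits_without_large_file_support(size_bytes: int) -> bool:
    """True if a file of size_bytes can be addressed by a 32-bit off_t."""
    return 0 <= size_bytes <= MAX_32BIT_OFF_T


# Sizes copied from the log excerpts in this ticket.
reported = {
    "requested at 5:31 PM": 2249086265,
    "requested at 5:46 PM": 2322946772,
    "requested at 6:01 PM": 2359502899,
    "ldm.pq.new seen by pqcopy": 2141605888,
}

for label, size in reported.items():
    verdict = "fits" if fits_without_large_file_support(size) else "too large"
    print(f"{label}: {size} bytes -> {verdict}")
```

The three requested sizes all come out "too large". The 2141605888-byte ldm.pq.new that pqcopy tried to open is just under the limit, yet its mmap() still failed with "Cannot allocate memory", so staying under the file-size limit does not by itself guarantee that a process can map the queue.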
Your options include:
1) setting the reconciliation-mode parameter to "do nothing", which will cause
   the "ldmadmin check" command to complain but otherwise take no action; the
   LDM then won't be able to guarantee detection of duplicate data-products;
2) setting the reconciliation-mode parameter to "decrease max latency", which
   will cause the maximum-latency parameter to be adjusted downward so that
   duplicate data-product detection is guaranteed;
3) migrating to a system that supports larger files so that you can keep the
   default 3600-second maximum latency; or
4) trying to rebuild the LDM on the current system with support for large
   files.
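To make the reconciliation modes concrete, here is a hypothetical model in Python. It is not the LDM's vetQueueSize() implementation: the function name `reconcile` and the rule that "decrease max latency" lowers the limit to the observed minimum residence time are assumptions for illustration. The example numbers come from the 5:31 PM check (max latency 3600 s, minimum residence time 2029 s).

```python
# Hypothetical model of the "/reconciliation-mode" choices discussed above.
# NOT the LDM's vetQueueSize(); the adjustment rules are assumptions.


def reconcile(max_latency_s: int, min_residence_s: int, mode: str) -> str:
    """Return the (assumed) action taken for one reconciliation mode."""
    if max_latency_s <= min_residence_s:
        # Products stay in the queue at least as long as the maximum
        # acceptable latency, so duplicates can be detected.
        return "no action needed"
    if mode == "do nothing":
        return "complain only; duplicate detection not guaranteed"
    if mode == "decrease max latency":
        # Assumption: lower the limit to the observed residence time.
        return f"set /server/max-latency to {min_residence_s} s"
    if mode == "increase queue":
        # Growing the queue lengthens residence time, but the new size
        # may exceed what the system's file-size limit allows.
        return "grow the queue (subject to file-size limits)"
    raise ValueError(f"unknown reconciliation mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("do nothing", "decrease max latency", "increase queue"):
        print(f"{mode}: {reconcile(3600, 2029, mode)}")
```

The model only shows the trade-off: "do nothing" keeps both settings and gives up the guarantee, "decrease max latency" gives up latency, and "increase queue" gives up disk or memory, which is where the 32-bit file-size limit bites.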
I figured that there would be some "growing pains" with the addition of this
new feature. Let me know what you decide or if you have any questions.
> ----------
> From: <address@hidden>
> Date: Mon, Jan 17, 2011 at 5:46 PM
> To: address@hidden, address@hidden
>
>
> virtual residence time of data-products in the queue (1961 seconds).
> Creating new queue of 2322946772 bytes and 109217 slots...
> Illegal size "2322946772"
>
> ----------
> From: <address@hidden>
> Date: Mon, Jan 17, 2011 at 6:01 PM
> To: address@hidden, address@hidden
>
>
> virtual residence time of data-products in the queue (1929 seconds).
> Creating new queue of 2359502899 bytes and 111645 slots...
> Illegal size "2359502899"
>
> ----------
> From: *Gilbert Sebenste* <address@hidden>
> Date: Mon, Jan 17, 2011 at 7:10 PM
> To: address@hidden
> Cc: address@hidden
>
>
> Hello Steve,
>
> Gilbert here. Now you get to see my second job in action. :-)
> We have a problem here at AllisonHouse. One of our machines,
> feed03.allisonhouse.com, has a huge ldm.pq.new file and since
> we run those from shared memory, it is filling it up and
> crashing our LDM. I think Tom Yoksas had this problem.
> This is only occurring on one of two servers which are
> essentially identical, getting the same feeds.
> Anyway, ldmadmin check complains that:
>
>
> In the ldmd.log file, I see:
>
> Jan 17 23:16:32 feed03 10.1.1.12[25694] NOTE: LDM-6 desired product-class:
> 20110117221632.957 TS_ENDT {{NEXRAD2, "K[L-R]"},{NONE,
> "SIG=081abfd86fff9ce0e906d4df32984bbc"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25695] NOTE: LDM-6 desired product-class:
> 20110117221633.068 TS_ENDT {{NEXRAD2, "K[S-Z]"},{NONE,
> "SIG=753faf5aef690a29022c8ad31e661de0"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25696] NOTE: LDM-6 desired product-class:
> 20110117221633.194 TS_ENDT {{NEXRAD2, "P[A-Z]"},{NONE,
> "SIG=9d08e28a11b3cd7ecf0f9e31fcc7f64c"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25698] NOTE: LDM-6 desired product-class:
> 20110117221633.536 TS_ENDT {{NEXRAD3, ".*"},{NONE,
> "SIG=1c9a5dea7c65b2c63b9fbaf0c08c8eca"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25699] NOTE: LDM-6 desired product-class:
> 20110117221633.662 TS_ENDT {{IDS|DDPLUS, "^(W.....) (....)"},{NONE,
> "SIG=bdc9fbb9c60b57219c478675bc0303cb"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25700] NOTE: LDM-6 desired product-class:
> 20110117221633.787 TS_ENDT {{IDS|DDPLUS, "^(ASUS01) (KWBC)"},{NONE,
> "SIG=70ed6821f05064e0a4c8166b5e36ee80"}}
> Jan 17 23:16:33 feed03 10.1.1.12[25701] NOTE: LDM-6 desired product-class:
> 20110117221633.930 TS_ENDT {{IDS|DDPLUS, "^(FSUS02) (KWBC)"},{NONE,
> "SIG=b29855e5fefa81a4bc9a3ff915255f40"}}
> Jan 17 23:16:34 feed03 pqact[25689] NOTE: Starting from insertion-time
> 2011-01-17 23:16:05.458968 UTC
> Jan 17 23:16:05 feed03 pqact[24062] NOTE: Behind by 0.129631 s
> Jan 17 23:16:05 feed03 pqact[24061] NOTE: Behind by 0.142458 s
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Starting Up (25542)
> Jan 17 23:16:20 feed03 pqcopy[25659] ERROR: mmap: (nil) 0 2141605888: Cannot
> allocate memory
> Jan 17 23:16:20 feed03 pqcopy[25659] ERROR: pq_open failed:
> /home/ldm/var/queues/ldm.pq.new: Cannot allocate memory
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Exiting
> Jan 17 23:16:20 feed03 pqcopy[25659] NOTE: Number of products copied: 0
> Jan 17 23:16:32 feed03 pqcheck[25665] NOTE: Starting Up (25542)
> Why is it creating this new ldm.pq.new? Weird. Anyway, it's causing the LDM
> to crash. When I do an ldmadmin delqueue after stopping the LDM, it doesn't
> delete the .new file, which just takes up a bunch of unnecessary memory.
> After stopping the LDM and doing an ldmadmin delqueue, I then did a
> rm ldm.pq.new, and did this:
>
> /home/ldm% ll
> ls: unparsable value for LS_COLORS environment variable
> total 2093460
> drwxrwxrwt 2 root root 60 Jan 18 00:17 ./
> drwxr-xr-x 11 root root 3720 Oct 8 03:12 ../
> -rw-rw-r-- 1 ldm ldm 2141605888 Jan 17 23:16 ldm.pq.new
> /home/ldm% rm ldm.pq.new
> rm: remove regular file `ldm.pq.new'? y
> /home/ldm% rehash
> /home/ldm% ldmadmin clean
> ldmadmin mkqueue
> /home/ldm% ldmadmin mkqueue
> /home/ldm% rehash
> /home/ldm% ldmadmin clean
> ldmadmin newlog
> /home/ldm% ldmadmin newlog
> /home/ldm% cd /dev/shm
> /home/ldm% ls
> ls: unparsable value for LS_COLORS environment variable
> ldm.pq
> /home/ldm% ll
> ls: unparsable value for LS_COLORS environment variable
> total 1191740
> drwxrwxrwt 2 root root 60 Jan 18 00:18 ./
> drwxr-xr-x 11 root root 3720 Oct 8 03:12 ../
> -rw-rw-r-- 1 ldm ldm 1219145728 Jan 18 00:18 ldm.pq
> /home/ldm% df -k
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/SysVolGroup-LogVolRoot
> 234316432 13830396 208391368 7% /
> /dev/sda1 124427 20223 97780 18% /boot
> tmpfs 4155468 1191740 2963728 29% /dev/shm
> /dev/sdb1 240292420 61002716 167083520 27% /home/ldm/data
> /home/ldm% ldmadmin newlog
> /home/ldm% ldmadmin start
> The product-queue is OK.
> Checking pqact(1) configuration-file(s)...
> /home/ldm/etc/pqact.conf: syntactically correct
> /home/ldm/etc/pqact.conf.emwin: syntactically correct
> Checking LDM configuration-file (/home/ldm/etc/ldmd.conf)...
> Starting the LDM server...
> /home/ldm% pwd
> ---
> And it now seems to be fine. I think Tom Yoksas had a similar issue, but
> since I didn't have it, I just blew it off. Anyway...
>
> I realize that I am coming from a .com address, and therefore I completely
> understand that you have no obligation to support me in this whatsoever.
> But I do think you should know about it, in case this is a serious or
> significant issue.
>
> Thanks!
>
> Gilbert
>
> ----
>
> Gilbert Sebenste
> Chief Meteorologist
> Allisonhouse, LLC
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: UKF-836086
Department: Support LDM
Priority: Normal
Status: Closed