This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
John,

> Some portion of the LDM radar and satellite image creation failed again
> overnight and I'm not sure what the issue is. The "pqmon" shows the max age
> at 11004 so that looks better now.
>
> 20190325T150821.335052Z pqmon[13846] NOTE pqmon.c:358:main() nprods nfree nempty nbytes maxprods maxfree minempty maxext age
> 20190325T150821.335079Z pqmon[13846] NOTE pqmon.c:466:main() 1351711 3 0 123399368448 1351713 6 0 22486561072 11004

Wow! 123 gigabytes! You took what I said and ran with it! :-)

You should be OK -- although with only 24 GB of memory, your system will be continuously swapping portions of the product-queue in and out. I recommend monitoring the LDM system via the "ldmadmin addmetrics" and "ldmadmin plotmetrics" facilities; see the documentation for details.

If you can increase the amount of physical memory to, say, 120% of the size of the product-queue, the system will be more efficient. For your situation, you would need approximately 44 GB of physical memory in order to save the last hour's worth of data.

> I see a lot of these in the logs, but nothing else that stands out to me.
>
> 20190324T000001.404224Z pqact[22051] WARN filel.c:3016:reap() Child 10270 terminated by signal 10

The above means that the child process that was started by a pqact(1) EXEC entry, and whose process ID was 10270, received a SIGUSR1 signal and consequently terminated. The LDM system uses this signal to make its processes close and re-open their log files, which is necessary in order to switch to a new log file. Unfortunately, this particular child process handled the SIGUSR1 in the default manner: by terminating abnormally. Can you determine what program corresponded to PID 10270?

I consider this a bug in the LDM system and will work on a fix for the next release. Thanks for reporting it.
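[Editor's note] The memory recommendation above can be reproduced with a quick back-of-the-envelope calculation. A sketch in shell, using the nbytes and age values from the pqmon log above; it lands in the same ballpark as the ~44 GB figure, with the exact number depending on how the data rate and gigabytes are rounded:

```shell
#!/bin/sh
# Back-of-the-envelope sizing from the pqmon output above:
#   nbytes = 123399368448  bytes currently in the queue
#   age    = 11004         seconds of data the queue holds
nbytes=123399368448
age=11004

# Bytes arriving per hour, assuming a roughly constant data rate.
per_hour=$(( nbytes * 3600 / age ))

# RAM for one hour of data plus 20% headroom (the "120%" rule of
# thumb above), expressed in GiB.
ram_gib=$(( per_hour * 120 / 100 / (1024 * 1024 * 1024) ))

echo "data per hour : ${per_hour} bytes"
echo "suggested RAM : ~${ram_gib} GiB"
```

Note that the queue currently holds about three hours of data (age 11004 seconds), which is why keeping only the most recent hour resident in RAM needs considerably less than the full 123 GB.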
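[Editor's note] Until a fix is released, and assuming the EXEC'd child is a shell script (the decoder below is hypothetical, not from the original report), one workaround is to have the script ignore SIGUSR1 itself: a short-lived decoder holds no LDM log file, so there is nothing for it to re-open. A minimal sketch:

```shell
#!/bin/sh
# Hypothetical decoder script launched by a pqact(1) EXEC entry.
# Ignore SIGUSR1 so the LDM's log-rotation signal doesn't kill the
# script; it has no LDM log file open, so nothing needs re-opening.
trap '' USR1

# Demonstrate that the signal is now harmless by sending it to
# ourselves; without the trap above, the default disposition of
# SIGUSR1 would terminate the process here.
kill -s USR1 "$$"

# Real per-product processing (reading the product from standard
# input, etc.) would go here.
echo "still running after SIGUSR1"
```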
> I do see these as well, but I'm not sure this is tied to the issue:
>
> * 20190325T022401.340805Z XXX.XXX.XXX.XXX(feed)[8890] NOTE error.c:236:err_log() Couldn't flush connection; flushConnection() failure to 199.133.78.4: RPC: Unable to receive; errno = Connection reset by peer
> * 20190325T022603.051454Z XXX.XXX.XXX.XXX(feed)[2789] NOTE uldb.c:1535:sm_vetUpstreamLdm() Terminated redundant upstream LDM (addr=199.133.78.4, pid=21698, vers=6, type=feeder, mode=alternate, sub=(20190325012401.287302 TS_ENDT {{EXP, ".*"}}))
> * 20190325T022603.051555Z XXX.XXX.XXX.XXX(feed)[21698] NOTE ldmd.c:306:signal_handler() SIGTERM received
> * 20190325T022603.051605Z XXX.XXX.XXX.XXX(feed)[21698] NOTE ldmd.c:187:cleanup() Exiting
> * 20190325T022603.052320Z ldmd[22048] NOTE ldmd.c:170:reap() child 21698 exited with status 7

The above means that a receiving LDM process on host XXX.XXX.XXX.XXX subscribed to the same feed as a previous receiving LDM process on the same host. The new sending LDM process consequently terminated the sending LDM process that had been started for the previous receiving LDM process, because 1) there's no sense in duplicating work, and 2) this is a classic denial-of-service vector. This can be safely ignored unless the two receiving LDM processes are actually on different hosts behind a NAT, in which case they will appear to have the same IP address. In that case, the registry parameter "/server/enable-anti-DOS" at the sending site should be set to "false".

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: WSJ-190258
Department: Support LDM
Priority: Normal
Status: Closed
===================

NOTE: All email exchanges with Unidata User Support are recorded in the Unidata inquiry tracking system and then made publicly available through the web. If you do not want to have your interactions made available in this way, you must let us know in each email you send to us.