This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
corepuncher,

> Hi thanks for taking my question.
>
> I have a machine where LDM runs well, but only for a day or two. Then, it
> suddenly shuts off. Well...seemingly. There is no "pqact" or "noaaportinge"
> when I run "top", and data is not flowing.

The best ways to determine if data is flowing are "ldmadmin watch" and
"notifyme -vl-".

> Just happened a few minutes ago. So I try to do an "ldm clean", and it says
> The LDM system is running, and to stop it first.
>
> So I do ldmadmin stop, and I just get a perpetual:
>
> Stopping the LDM server...
> Waiting for the LDM server to terminate...
> Waiting for the LDM server to terminate...
> Waiting for the LDM server to terminate...
> [the same line, repeated]

It can take a while to stop an LDM system. If it doesn't stop within a
minute, however, then something's wrong.

> So ^C to stop it.
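Note that "top" truncates long command names (which is why noaaportIngester appears as "noaaportinge"), so pgrep(1) and ps(1) are more reliable ways to see whether LDM processes survived a hung "ldmadmin stop". A minimal sketch of the liveness check, with the real LDM commands in comments; it assumes the LDM runs as user "ldm", and a throwaway sleep(1) stands in for a stuck child so the example is self-contained:

```shell
# Sketch: check whether LDM processes are really gone after a hung stop.
# A throwaway sleep(1) stands in for a stuck LDM child process.
sleep 60 &
pid=$!

# Is it still alive? (kill -0 sends no signal; it only tests existence.)
kill -0 "$pid" 2>/dev/null && echo "still running"

# Real-world equivalents (assumption: the LDM runs as user "ldm"):
#   pgrep -l -u ldm 'ldmd|pqact|noaaportIngester'  # list surviving LDM processes
#   ldmadmin watch                                 # confirm products are arriving
#   notifyme -vl-                                  # ditto, via the LDM protocol

kill -TERM "$pid"            # ask the stand-in process to exit
wait "$pid" 2>/dev/null
kill -0 "$pid" 2>/dev/null || echo "terminated"
```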
> Here is the last thing shown in ldmd.log:
>
> Mar 2 12:00:43 newton noaaportIngester[15971] ERROR: [gb22gem.c:74] [GB 1] Couldn't get parameter values
> Mar 2 12:00:43 newton noaaportIngester[15971] ERROR: [gb2param.c:89] [GB -1] Couldn't get parameter info: disc=0, cat=16, id=3, pdtn=0
> Mar 2 12:00:43 newton noaaportIngester[15971] ERROR: [gb22gem.c:74] [GB 1] Couldn't get parameter values
> Mar 2 12:00:43 newton noaaportIngester[15971] WARN: Gap in packet sequence: 1052210005 to 1052210549 [skipped 543]
> Mar 2 12:00:43 newton noaaportIngester[15971] ERROR: Missing fragment in sequence, last 565/66075757 this 1109/66075757
> Mar 2 12:00:43 newton noaaportIngester[15971] WARN: Gap in packet sequence: 1052210549 to 1052214590 [skipped 4040]
> Mar 2 12:00:43 newton noaaportIngester[15971] WARN: Gap in packet sequence: 1052214590 to 1052214802 [skipped 211]

Aside from missing some GEMPAK GRIB2 table entries, this looks normal.

> I did look at the "ldm pid" file, and found the number. Then, I went
> into TOP, and although I could not see it, I did a kill on that pid,
> and it worked!

A SIGINT sent to the top-level LDM server should stop the system quickly --
at the risk of corrupting the product-queue.

> So that gets it to restart, but doesn't explain why it stops suddenly.
> The crazy part is, I have another server, so 2 cords coming from Novra
> receiver. The other machine never has this issue...so it must be a
> software issue?
>
> From address@hidden Mon Mar 2 11:38:29 2015
>
> Actually, I take that back. Even though it "seemed" to start after killing
> that PID listed in the file:
>
> The product-queue is OK.
> Checking pqact(1) configuration-file(s)...
> /home/ldm/etc/pqact.conf: syntactically correct
> etc/pqact.gempak: syntactically correct
> etc/pqact.grlevelx: syntactically correct
> Checking LDM configuration-file (/home/ldm/etc/ldmd.conf)...
> Starting the LDM server...
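The forced stop described above can be sketched as follows. pqcheck(1), "ldmadmin delqueue", and "ldmadmin mkqueue" are real LDM utilities, but the pidfile and queue paths here are only assumptions based on this thread's /home/ldm paths, and a throwaway process stands in for the top-level ldmd so the sketch is self-contained:

```shell
# Sketch: stop the LDM by signalling the top-level server directly,
# then verify the product-queue, since a forced stop can corrupt it.
sleep 60 &                               # stand-in for the top-level ldmd
echo $! > /tmp/demo_ldmd.pid             # stand-in for the LDM pidfile

kill -INT "$(cat /tmp/demo_ldmd.pid)"    # fast shutdown; queue is at risk
wait 2>/dev/null
echo "server stopped"
rm -f /tmp/demo_ldmd.pid

# After any forced stop, check the queue before restarting (assumed
# default queue path; adjust for your installation):
#   pqcheck -q /home/ldm/var/queues/ldm.pq   # exit 0 => queue is consistent
#   ldmadmin delqueue && ldmadmin mkqueue    # remake it if pqcheck fails
```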
> Again, there is no pqact or noaaportinge process running under top. So
> alas, only thing I can do is reboot.
>
> The log, after getting a "fake" ldm start, shows this:
>
> Mar 2 12:35:37 pqact[518] NOTE: Starting from insertion-time 2015-03-02 18:01:12.401276 UTC
> Mar 2 12:35:37 noaaportIngester[520] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[520] ERROR: [multicastReader.c:97] Couldn't bind to port 1201
> Mar 2 12:35:37 noaaportIngester[520] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 noaaportIngester[521] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[521] ERROR: [multicastReader.c:97] Couldn't bind to port 1202
> Mar 2 12:35:37 noaaportIngester[521] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 noaaportIngester[523] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[523] ERROR: [multicastReader.c:97] Couldn't bind to port 1204
> Mar 2 12:35:37 noaaportIngester[523] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 noaaportIngester[522] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[522] ERROR: [multicastReader.c:97] Couldn't bind to port 1203
> Mar 2 12:35:37 noaaportIngester[522] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 ldmd[516] NOTE: child 520 exited with status 1: noaaportIngester -m 224.0.1.1 -I 10.0.0.3
> Mar 2 12:35:37 noaaportIngester[524] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[524] ERROR: [multicastReader.c:97] Couldn't bind to port 1205
> Mar 2 12:35:37 noaaportIngester[524] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 ldmd[516] NOTE: child 521 exited with status 1: noaaportIngester -m 224.0.1.2 -I 10.0.0.3
> Mar 2 12:35:37 ldmd[516] NOTE: child 522 exited with status 1: noaaportIngester -m 224.0.1.3 -I 10.0.0.3
> Mar 2 12:35:37
> noaaportIngester[525] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[525] ERROR: [multicastReader.c:97] Couldn't bind to port 1206
> Mar 2 12:35:37 ldmd[516] NOTE: child 523 exited with status 1: noaaportIngester -m 224.0.1.4 -I 10.0.0.3
> Mar 2 12:35:37 noaaportIngester[525] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 noaaportIngester[526] ERROR: Address already in use
> Mar 2 12:35:37 noaaportIngester[526] ERROR: [multicastReader.c:97] Couldn't bind to port 1207
> Mar 2 12:35:37 noaaportIngester[526] ERROR: [noaaportIngester.c:340] Couldn't create multicast-reader
> Mar 2 12:35:37 pqact[527] NOTE: Starting Up
> Mar 2 12:35:37 ldmd[516] NOTE: child 524 exited with status 1: noaaportIngester -m 224.0.1.5 -I 10.0.0.3
> Mar 2 12:35:37 ldmd[516] NOTE: child 525 exited with status 1: noaaportIngester -m 224.0.1.6 -I 10.0.0.3
> Mar 2 12:35:37 pqact[528] NOTE: Starting Up
> Mar 2 12:35:37 ldmd[516] NOTE: child 526 exited with status 1: noaaportIngester -m 224.0.1.7 -I 10.0.0.3
> Mar 2 12:35:37 pqact[528] NOTE: Starting from insertion-time 2015-03-02 18:01:12.401276 UTC
> Mar 2 12:35:37 pqact[527] NOTE: Starting from insertion-time 2015-03-02 18:01:12.401276 UTC

I suspect that you still have noaaportIngester(1) processes running. Would it
be possible for me to log onto the system in question as the LDM user?

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: SAE-848662
Department: Support LDM
Priority: Normal
Status: Closed
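The "Address already in use" errors in the log above mean the previous noaaportIngester processes still held UDP ports 1201-1207, so the new readers could not bind and exited immediately; that is why the restart was "fake" and a reboot appeared to be the only cure. A sketch of confirming this, assuming a Linux system with ss(8) available and the LDM running as user "ldm"; the likely fix is shown in comments so the sketch is safe to run:

```shell
# Sketch: find which processes still hold the NOAAPort UDP ports.
# Ports 1201-1207 are taken from the log above.
for port in 1201 1202 1203 1204 1205 1206 1207; do
    ss -ulpn "sport = :$port" 2>/dev/null || true  # prints the owning PID, if any
done

# If stale noaaportIngester processes own the ports, terminate them and
# restart cleanly instead of rebooting:
#   pkill -u ldm -x noaaportIngester
#   ldmadmin stop && ldmadmin start
echo "port check complete"
```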