This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Karen,

> Interesting problem. If you could give me any ideas on what might have
> gone wrong here I would greatly appreciate it. I hope this all makes
> sense...
>
> Machine A (pluto) is sending data to Machine B, it also gets data from
> Machine B.
>
> Machine B (dontpanic) receives data from Machine A and sends it to
> Machine C, and vice-versa.
>
> Machine C (kyodai) gets data from Machine B and also sends to Machine B.
>
> So basically Machine B is a middle-man for A and C (which do not
> interact directly -- different networks with Machine B in a DMZ).
>
> Today Machine C created a new file which was inserted into its queue and
> then sent to Machine B. However, when Machine B tried to send the file
> on to Machine A, it got an error.
>
> Log snippet from Machine B -- 20287 is the PID of the rpc.ldmd for
> Machine A:
>
> Apr 26 16:38:03 dontpanic 172.16.20.32[20287] ERROR: pqe_new(): zero product size
> Apr 26 16:38:03 dontpanic 172.16.20.32[20287] ERROR: pqe_new() failed: Invalid argument: d41d8cd98f00b204e9800998ecf8427e 0 20070426163803.011 EXP 000 wdssii/KTLX_RVP.20070426.163650.vcp32.2.dN5.nc.gz
> Apr 26 16:38:03 dontpanic rpc.ldmd[20251] NOTE: child 20287 exited with status 10

The above indicates that the downstream LDM on host Dontpanic received a zero-length data-product from the upstream LDM on host 172.16.20.32. Due to a bug in the LDM code, this caused the downstream LDM to terminate. This would only happen if the data transfer was occurring in ALTERNATE mode. I've fixed the code and it will be in the next release (6.6.4).

Why was a zero-length data-product generated?

> However, the data originated on Machine C, so if there was a problem
> with it I'm not sure how it got into the queue successfully on Machine
> C, or how it got sent successfully to Machine B.
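The 32-hex-digit signature in the pqe_new() error above, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of zero bytes of input. Together with the reported size of 0, this suggests the file was already empty when its signature was computed, rather than having been truncated later. The following Python sketch checks this. It assumes the product-information string in the log has the field order signature, size, creation time, feedtype, sequence number, identifier (inferred from the log line itself); the helper names `parse_product_info` and `is_empty_product` are hypothetical and not part of the LDM.

```python
import hashlib

# Product-information portion of the pqe_new() error logged on dontpanic.
LOG_PRODUCT_INFO = (
    "d41d8cd98f00b204e9800998ecf8427e 0 20070426163803.011 EXP 000 "
    "wdssii/KTLX_RVP.20070426.163650.vcp32.2.dN5.nc.gz"
)


def parse_product_info(text):
    """Split a product-information string into named fields.

    Assumed field order (inferred from the log, not from LDM docs):
    signature, size, creation time, feedtype, sequence number, identifier.
    maxsplit=5 keeps any spaces inside the identifier intact.
    """
    signature, size, created, feedtype, seqno, ident = text.split(maxsplit=5)
    return {
        "signature": signature,
        "size": int(size),
        "created": created,
        "feedtype": feedtype,
        "seqno": int(seqno),
        "ident": ident,
    }


def is_empty_product(info):
    """True if the product is zero bytes and its signature is the MD5 of no data."""
    empty_md5 = hashlib.md5(b"").hexdigest()
    return info["size"] == 0 and info["signature"] == empty_md5


info = parse_product_info(LOG_PRODUCT_INFO)
print(info["ident"], info["size"], is_empty_product(info))
```

Running it prints the identifier, a size of 0, and `True`, i.e. the signature is exactly what an empty file would produce.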
>
> Log snippet from Machine A shows:
>
> Apr 26 16:38:03 pluto dontpanic(feed)[7356] ERROR: feed or notify failure; COMINGSOON: RPC: Remote system error
> Apr 26 16:38:03 pluto rpc.ldmd[7351] NOTE: child 7356 exited with status 6

The above indicates that the upstream LDM on host Pluto couldn't send a data-product to the downstream LDM on host Dontpanic. Consequently, it terminated. This occurred at the same moment that the downstream LDM on host Dontpanic terminated after receiving the zero-length data-product. Was it sent from host Pluto?

> Thereafter no data was processed between Machines A and B. Machine C
> was idle as it doesn't have anything to do unless it gets the data from
> A (via B) first.
>
> Restarting LDM on Machines A and C had no effect (not surprising), but
> restarting LDM on Machine B solved the problem.
>
> --
> Beware programmers who carry screwdrivers.
>
> -------------------------------------------
> address@hidden
>
> Phone: 405-325-6982
> Cell: 405-834-8559
> SAIC/Systems Analyst
> National Severe Storms Laboratory

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: FBV-178390
Department: Support LDM
Priority: Normal
Status: On Hold