This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: Leigh Orf <address@hidden>
>Organization: UNCA
>Keywords: 200102052335.f15NZkX27276 McIDAS-XCD

Leigh,

I just wanted to quickly touch base with you to let you know that I am still hammering away at the sounding serving problem we have run into on your system. I have a request in at another site to give me login access for some tests. The reason for this is that they are running RedHat 5.2. It will be interesting to see if the failure with sounding servicing started with RedHat 6.x (i.e., libc 2.0).

>Some more puzzle pieces here. I ran a packet sniffer on my home machine
>while viewing a skewt. I was lucky enough to capture a successful
>one from home with storm2 (in the morning when the network isn't
>saturated) along with three unsuccessful ones. I used a very cool program
>called ethereal. I used version 0.8.14 which was compiled with GTK+
>1.2.8, libpcap 0.4 and libz1.1.3 if you want to view this stuff for
>yourself. See http://ethereal.zing.org for info if you want to build
>this (rpms are available for Linux).

Thanks for the reference.

>I have created a /tmp/mcidas directory on storm2 and put the files
>strace in there. There is a subdirectory called ethereal with a file
>containing the ethereal file with all the traffic in a file which can be
>opened by ethereal, and two binary stream dumps, one of a successful read
>and one of a failure.

OK.

>Here's a summary of what I found in case you don't want to muck with
>this yourself.
>
>Each ADDE skewT load happens in two parts. I will use an example from
>my machine. A connection is made between local (home machine) port 1069
>and remote (storm2) 500 (mcserv). This stream is *always* successfully
>complete. Some text strings from this transfer include:
>
>MDFH
>USER
>MDFH
>RTPTSRC UPPERMAND POS=0 MAX=1 BPOS=1 EPOS=9999 VERSION=1
>
>IRAB
>Mand. Level RAOB for 05 FEB 2001
>USER
>MDXX0016
>IRAB
>Mand. Level RAOB for 06 FEB 2001
>USER
>MDXX0017
>IRAB
>Mand. Level RAOB for 07 FEB 2001
>USER
>MDXX0018
>IRAB
>Mand. Level RAOB for 08 FEB 2001
>USER
>MDXX0019
>IRSG
>Sig. Level RAOB for 05 FEB 2001
>USER
>MDXX0026
>IRSG
>Sig. Level RAOB for 06 FEB 2001
>USER
>MDXX0027

You can see this (although it is harder) from the strace outputs. The mandatory level MD files are being scanned to see which one contains the data that is needed for the sounding.

>... etc.
>
>Then, a second connection is made between local port 1070 and remote
>mcserv. This stream is the one that fails. Some ascii text from it (I'm
>just running strings on it):
>
>USER
>RTPTSRC UPPERMAND MAX=10000 SIG='RTPTSRC/UPPERSIG' SELECT= 'IDN 72317'
>'DAY 2001039' 'TIME 0' POS=ALL TRACE=0 VERSION=1 LAT
>HLAT LON LEV P T TD DIR SPD Z IDN DAY TIMEZS ST CO HMS MOD
>NREC
>HDEG DEG CHARMB K K DEG MPS M CYD HMS M CHARCHARHMS
>SFC
>NC US
>IRAB
>1000
>NC US
>IRAB
>925
>NC US
>IRAB
>850
...

Now, the request is for:

  o mandatory level data (IRAB)
  o significant level data (SIGT)
  o significant level data (SIGW)

>SIGT
>SIGT
>SIGT
>SIGT
>SIGT
>.
>.
>.
>SIGW
>SIGW
>SIGW
>SIGW
...

>The second stream is the one that is truncated.

Right, and it is truncated AFTER transferring all of the mandatory level and significant wrt Temperature data. It is failing near the end of all transfers when doing significant wrt Wind data.

>What is interesting
>is that this stream is usually *almost* entirely read before
>it is truncated.

I noticed this also.

>In fact, for the two stream dumps I put in
>/tmp/mcidas/ethreal (which are of the second stream for each case),
>the difference between the successful and unsuccessful read was only
>four bytes!

Man. I will have to seriously look through these files.

>But it's not always this close, if you look at the other
>unsuccessful packet traces sometimes it only gets about 90% through
>before getting truncated.

I seemed to notice this inconsistency also. I also noticed that old_mmap was being called. When I tried to find an old_mmap entry point in a library, I was stymied.
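
For anyone wanting to repeat the comparison of the two binary stream dumps mentioned above, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, the thread does not give the dump file names, and the code reports sizes and the first differing offset without decoding anything about the ADDE protocol.

```python
def compare_dumps(good: bytes, bad: bytes) -> dict:
    """Summarize how a (possibly truncated) stream differs from a complete one."""
    common = min(len(good), len(bad))
    # First offset where the streams disagree within their shared length;
    # None means the shorter stream is a byte-for-byte prefix of the longer.
    first_diff = next((i for i in range(common) if good[i] != bad[i]), None)
    return {
        "good_size": len(good),
        "bad_size": len(bad),
        "missing_bytes": len(good) - len(bad),
        "first_diff": first_diff,
        "clean_truncation": first_diff is None and len(bad) < len(good),
    }


def compare_dump_files(good_path: str, bad_path: str) -> dict:
    """Read two dump files (paths supplied by the caller) and compare them."""
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        return compare_dumps(g.read(), b.read())
```

Calling compare_dump_files() on the successful and failed dumps (whatever they are named on disk) would show whether the failed stream is simply the good stream cut short, and by how many bytes.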
>So the most interesting part of what I learned through this is how close
>the second stream gets to being read before getting truncated. Some
>process is closing that pipe *just* before it can complete. It seems to
>me the bugfix for this might be a one-liner, if you can just find where
>to put it :)

A one liner if it is in the McIDAS code, and a lot of hassle if it is something in the OS.

>Anyway, that's probably as much debugging I'm gonna do on this, I figure
>you know the code a lot better than I do. Hope this helps.

This did help, thanks. I will keep hammering away at this until I come up with some sort of solution.

Tom

>From address@hidden Tue Feb 13 14:02:31 2001
>Subject: Re: 20010211: XCD GRID decoding, SOUNDINGS HODOGRAPH (cont.)

Tom,

>>On the model stuff: Running from Fkey gives this:
>>Command (on one line):
>> FOUSDISP T OLAY +00 INT=00:30 DAY=2001042 MODE=X GRIDF=132
>> GRA=12 SF=YES PRO=CONF
>> OUT=PLO GU=GRAPHIC BLANK=NO
>>Error:
>> pipe read: Connection reset by peer
>
>Yikes!!!! The pipe read error is the same one I am running into when
>trying to serve sounding data from Linux.

Uhhh...I don't think this is your error. This is a semi-normal nonsensical message from some programs using the socket interface. After the sockets do an initial connect, a child is spawned and the port number is adjusted, internally to the kernel networking, to free the parent socket for the next incoming connection request. Otherwise only one program could ever connect to a port number. Some client programs interpret this as an error and report this as a serious sounding message. In fact, the connection was maintained throughout the process by the socket interface. The connect() process usually spawns a child for accept(). For some brief instant of time, there are actually two active ports. The remote clients often interpret this hand off as an event and kick out the above message.
It isn't an error...normal Unix multitasking stuff at the socket interface is just being reported. But, it does indicate that the network interface is being used. Of course, this is normal even on the same Unix machine (sockets are easy to use for IPC, much easier than the shared memory, message queues and semaphores of Sys V Unix). I was under the impression that this was how the ADDE worked? Thus, the LOCAL-DATA in the LOCDATA.BAT is redundant for sessions originating on the same machine. You should be able to substitute the hostname for LOCAL-DATA. When you do, you will see this same message on the local machine.

>
>> FOUSDISP: Unable to execute PTDISP command
>> FOUSDISP - Done
>> FOUSDISP failed, RC=1
>

I think this is the real error. Looks like a script error or a bad call, or just plain bad communications between two programs: FOUSDISP and PTDISP. I am assuming that FOUSDISP is a wrapper around PTDISP.

>I just tried serving FOUS14 data from the same server at UNCA that I
>am having problems with (sounding serving), and it serves FOUS14 data
>just fine. FOUSDISP plots FOUS14 data which, even though it is data
>from a model, is stored in a McIDAS MD file and is part of the RTPTSRC
>dataset. I wonder what the pipe read error you are seeing is telling
>us about ADDE service on Linux!?
>
>Tom

Absolutely. Hmmm...OK. This would be consistent with what I am seeing. There are several flags to the socket() call when the sockets are instantiated. These are macros in the /usr/include files....uhhh, socket.h, I think. (Well, close.../usr/include/sys/socket.h) Red Hat does some weird stuff with their Protected Interface stuff...These are pi_socket.h and others. ADDE has been pretty workable in the past, so I suspect the RH people of taking liberties with the include files. There are really only two major kinds of socket interfaces:

  AF_UNIX (or AF_LOCAL)
  AF_INET

A local interface and a remote interface basically.
(Pretty much corresponding to the two types of connections in LOCDATA.BAT.) Of course, you know this stuff, so I'm just thinking out loud. But I suspect the error is in there.

Gotta run, some personal support issues... If I don't send it now, you won't get it till the morrow.

jdm
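
The listen/accept hand-off and the two socket families discussed above can be tried out with a small Python sketch. This is a generic illustration on the loopback interface, not anything taken from the McIDAS or mcserv sources; the function names are made up for the example. It checks what the standard sockets API does on accept(): a new socket is returned for the connection while the listening socket stays open, and the accepted socket keeps the listener's local port number. The second function passes bytes over a local socket pair (AF_UNIX where the platform provides it).

```python
import socket
import threading


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read up to n bytes, stopping early only if the peer closes."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            break
        data += chunk
    return data


def demo_accept() -> dict:
    """Accept one loopback TCP (AF_INET) connection and report the ports."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: kernel picks a free port
    listener.listen(1)
    listen_port = listener.getsockname()[1]

    def client() -> None:
        with socket.create_connection(("127.0.0.1", listen_port)) as c:
            c.sendall(b"ping")

    worker = threading.Thread(target=client)
    worker.start()
    conn, _peer = listener.accept()  # new socket for this one connection
    try:
        result = {
            "listen_port": listen_port,
            "accepted_local_port": conn.getsockname()[1],
            "listener_still_open": listener.fileno() != -1,
            "data": recv_exact(conn, 4),
        }
    finally:
        conn.close()
        listener.close()
        worker.join()
    return result


def demo_local_pair() -> bytes:
    """Pass bytes over a connected local pair (AF_UNIX where available)."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"hi")
        return recv_exact(b, 2)
    finally:
        a.close()
        b.close()
```

Running demo_accept() and comparing "listen_port" with "accepted_local_port" is a quick way to see how the hand-off behaves on a given system; a server that forks a child per connection would typically pass the accepted socket to that child.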