This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>From: "Kevin Polston" <address@hidden> >Organization: NOAA >Keywords: 200203112125.g2BLPSa16982 IDD Kevin, >Hi there. I have been monitoring the ldm over the past few days and >here are some observations. First, data continues to be running >late.....for surface obs, radar data and satellite imagery. It is not >late all the time however. In the evening...it appears to get caught >back up and runs in a pretty timely manner. Then, after 9am (sometimes >earlier) it starts slowing down again. This sounds like the cause is network congestion either at your site or somewhere up the line. >The satellite data, when it >comes in on time, runs very well. However....it seems few and far >between when that happens. The majority of the day it seems to run >between 60 and 90 minutes behind the actual time. Hmm... It is odd that the data can be 90 minutes behind since there is a 1 hour default request limit in the LDM. >Same with the radar >data. Sometimes it is a little "better" in that it is only 45-60 >minutes behind actual time. But rarely do I see it running close to the >actual time - unless it is in the evening. Even then it seems to lag >behind. My initial thoughts were I have too much data coming in and it >is overwhelming my bandwidth. It might be overwhelming your machine's ability to do remote procedure calls to transfer the data. Just a thought. >But then I thought how could that be >since the data, even though it is ~60 minutes behind is staying >consistently at that lag time. So why couldn't it stay at the current >time? Good question. One thing that does happen on the upstream side is a downstream's data request gets reclassed to look for times that are more current. >Then when things are running well there doesn't seem to be a >problem as far as timeliness goes. So is it really a bandwidth issue or >what? I don't have enough information to answer that. It is situations like these that our ability to do notifymes to your machine help us to troubleshoot problems. Since we can not even contact your machine, using these kinds of tools is not possible. It would be a _very_ good idea to find out what at your site is making it impossible for us to contact your LDM. >I have also been downloading model data and I wonder if that is >slowing things down? If the other problem is caused by network bandwidth, then yes this would slow things down. >The model data has been doing pretty well until >the last couple of days when it seems I am missing certain fields or >times. I wonder if it is because the data actually hasn't been >processed or downloaded yet as opposed to the data actually missing. Which model data are you FTPing? Is this the decoded GEMPAK files? If so, and if the target of the FTP is motherlode, then missing fields would indicate that they are missing in the original NOAAPORT broadcast since motherlode is fed directly from a NOAAPORT satellite receiver. >But perhaps it is all related to the timeliness issue...which would >explain the missing fields. So what would you suggest. I got rid of >all the other radar products so I am just back to the "/pN0R" data. OK, but this is still a lot of products. >I am wondering if I need to cut back on that too. Perhaps. If your site were reporting statistics, we could see if the slowness in one feed is caused by bottlenecks in another. >After editing out the >WMO data coming in that solved my disk space problem OK, good. 
>After editing out the
>WMO data coming in that solved my disk space problem

OK, good.

>(and the prune
>scripts are running quite nicely) I started ingesting all the satellite
>data again but I changed it back to just the EAST/WEST-CONUS areas to
>see if that would help. So far it has not.

I believe that I had you split your feeds to cut down on possible
slowness. Yes, here are the lines from your ldmd.conf file:

request DDPLUS|IDS|HRS|FSL2 ".*" papagayo.unl.edu
request NIMAGE "WEST-CONUS|EAST-CONUS" 129.93.52.150
request NEXRAD "/pN0R" papagayo.unl.edu

With this setup, the NIMAGE stuff is being ingested by one rpc.ldmd and
all of the other stuff is ingested by a second rpc.ldmd. If one of the
feeds, like NEXRAD, is causing the bottleneck in DDPLUS|IDS|etc., then
one thing you can do is split the feed again. The way to do this is to
create an alias for papagayo.unl.edu in your /etc/hosts file and then
use that alias in the request line. Here is an example.

If /etc/nsswitch.conf sets up the search for machine names by the
/etc/hosts file and then DNS, the entry will look like:

hosts: files dns

If it is set up to use DNS before looking in /etc/hosts, the entry will
look like:

hosts: dns files

You want yours to look like:

hosts: files dns

for the following to work. Edit /etc/hosts (as root) and add:

129.93.52.150   papagayo.unl.edu   papagayo2.unl.edu

After doing this, modify your ~ldm/etc/ldmd.conf file and change the
NEXRAD entry as I indicate here:

request DDPLUS|IDS|HRS|FSL2 ".*" papagayo.unl.edu
request NIMAGE "WEST-CONUS|EAST-CONUS" 129.93.52.150
request NEXRAD "/pN0R" papagayo2.unl.edu

This will force your system to run three rpc.ldmd processes: one for
DDPLUS|IDS|HRS|FSL2, one for NIMAGE, and one for NEXRAD. Again, if the
slowness in one feed is a result of slowness in a different feed (one
not in the same request line), then this will help.

>Another thing I noticed was when I edit the ldmd.conf file and restart
>ldm (after stopping it properly of course), it seemed like the data
>could never get caught up until 00Z for the next day and even then
>sometimes it lagged. There was one day where the satellite data didn't
>even come in for several hours after I had re-started ldm.

Without more specifics, I can't make any real comments about this. I can
say, however, that there was a day recently (last week) when the NIMAGE
feed was interrupted for several hours.

>Then in the
>morning it looked timely (initially) before lagging behind. During the
>times that no satellite imagery was being ingested I checked the
>ldmd.log file. There were several entries that said something about a
>broken pipe and db_flush pqact and write errors.

This is too general a report. Specifics from the log file would help.

>When the data is
>coming in (even when it is late) I might see occasionally a line in
>there that says "skipped". I don't know if that means anything to you
>or not.

This is when the data gets RECLASSed on the upstream machine. The
RECLASS skips ahead in the queue because the data next up for relaying
is already over an hour old.
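If you want to see how often that is happening, the "skipped" entries
(and any RECLASS notices) should be sitting in your LDM log. A quick
check (just a sketch; this assumes the stock log location of
~ldm/logs/ldmd.log, so adjust the path if your logs are written
elsewhere) would be something like:

grep -E "RECLASS|skipped" ~ldm/logs/ldmd.log | tail -20

A steady stream of these entries during the slow periods would confirm
that the feed is falling more than an hour behind rather than just a few
minutes.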
>What is really perplexing is that I still FTP a couple of data
>products (primarily the regional radar composites from the University
>of Arizona) and as long as there is no problem on their end I get the
>data extremely timely....usually within 10 minutes or less of the
>actual time. So the question of the day is how to solve this tardiness
>problem. And how do I know if it is me or the upstream feed sites?

You can figure out whether the data is late coming from your upstream
feed site or late on your end by running notifyme. For instance, in two
different xterms, run side-by-side invocations of notifyme:

notifyme -vxl- -f NIMAGE -h papagayo.unl.edu
notifyme -vxl- -f NIMAGE

The first will tell you when papagayo gets a product, and the second
will tell you when your LDM gets a product. You can then compare the
times that both were received and see if the problem is upstream from
papagayo (not likely, since the NIMAGE feed is coming from motherlode,
which gets it right from the satellite dish) or in the link to you.

Also, if you are receiving the products in a timely manner but they are
not being written to disk in a timely manner, your LDM's pqact process
might be struggling to get through the queue to process products. This
could be caused by your FILEing of every NEXRAD product and possibly a
slow disk. If a slow pqact is your problem, we can address that
separately. For now, we need to isolate where the bottleneck is on a
feed-by-feed basis.

>What is frustrating is there are times when the data is very timely and
>I start to think ok we're back on track but then inevitably it falls
>behind again.

This really does sound like a network bottleneck.

>So I am seeking your help, ldm guru. Also, last time we
>chatted you mentioned a national and/or regional VIL composite imagery
>is in the works. When do you think that will be available for
>consumption?

You are already getting the national N0R product in the NIMAGE feed
(when you do not limit ingest to WEST-CONUS and EAST-CONUS). I thought I
sent you the pqact.conf action to file these images, but since I don't
see it in your attached pqact.conf, I will assume that I didn't. Here is
a version that attempts to follow the directory structure you are using:

#
# png compressed 1km radar GINI format
NIMAGE  ^rad_(........)_(....)
        PIPE    -close  util/readpng -n -l logs/png.log
        /usr1/nawips/metdat/images/sat/RADAR/1km/rad/rad_\1_\2

Also, if you want to continue to limit the NIMAGE ingest to the WEST and
EAST CONUS images, then you should add a separate NIMAGE request line to
your ldmd.conf file:

request NIMAGE "^rad_" papagayo2.unl.edu

>I have sent my current pqact.conf and ldmd.conf files for you to look at.

Got em.

Tom