This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Justin Cooke wrote:
> Chi,
>
> Was the change made to both ldm1 and ldm2?

Yes.

> Justin
>
> Chi.Y.Kang wrote:
>> Yes, I made the change to the LDM servers to test the shared memory
>> configuration.
>>
>> # Setting SHMMAX parameter to 4 GB
>> kernel.shmmax = 4294967296
>> # getconf PAGE_SIZE
>> kernel.shmmni = 4096
>> kernel.shmall = 2097152
>>
>> However, this doesn't explain the performance relief, because the LDM
>> doesn't seem to be using shared memory, or at least it is not listed in
>> the table. Mr. Cano thought the LDM might be using this.
>>
>> ldm1:~$ ipcs -a
>>
>> ------ Shared Memory Segments --------
>> key         shmid   owner   perms   bytes   nattch   status
>> 0x00000000  0       root    600     3976    4        dest
>>
>> ------ Semaphore Arrays --------
>> key         semid   owner   perms   nsems
>>
>> ------ Message Queues --------
>> key         msqid   owner   perms   used-bytes   messages
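[Archive note: the nearly empty ipcs table above is expected. The LDM
product queue is a memory-mapped file, not a System V shared memory
segment, so raising kernel.shmmax does not affect the LDM, and the queue
will never appear in ipcs output. The queue can instead be inspected with
the LDM's own pqmon utility; a minimal sketch, assuming a common default
queue path (yours may differ):

    # report product-queue state: slots and bytes in use, age of the
    # oldest product, and related statistics
    pqmon -q /usr/local/ldm/data/ldm.pq
]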
>> Justin Cooke wrote:
>>
>>> Chi,
>>>
>>> Has anything at all changed on ldm1 since yesterday? Starting at 04Z the feed on node6 improved dramatically; all other subscribers to ldm1 also noticed improved performance.
>>>
>>> Justin
>>>
>>> Steve Chiswell wrote:
>>>
>>>> Justin,
>>>>
>>>> I noticed that the feeds from ldm1 dropped as you said. Do you know if anything changed related to that machine?
>>>>
>>>> I can add daffy back to ldm1 and see if things maintain their performance, but will wait to find out whether any changes were made. Since ldm2 is still lagging, it seems this is not a network-wide issue?
>>>>
>>>> Steve
>>>>
>>>> On Thu, 21 Jun 2007, Justin Cooke wrote:
>>>>
>>>>> Steve,
>>>>>
>>>>> Looking at the graphs, it appears that transfers improved greatly after 04Z today. I did a netstat on ldm1 and I still see where atm and flood are subscribing to it, same as yesterday.
>>>>>
>>>>> Although, looking at the latency graphs you provide, it looks like those subscribing to ldm2 are still seeing delays.
>>>>>
>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+atm.cise-nsf.gov
>>>>>
>>>>> Justin
>>>>>
>>>>> Steve Chiswell wrote:
>>>>>
>>>>>> Justin,
>>>>>>
>>>>>> I am receiving the stats from node6:
>>>>>>
>>>>>> Latency:
>>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+node6.woc.noaa.gov
>>>>>>
>>>>>> Volume:
>>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_vol_nc?CONDUIT+node6.woc.noaa.gov
>>>>>>
>>>>>> The latency there to ldm1 is climbing on the initial connection, and will start off by catching up on the last hour's worth of data in the upstream queue. After that, we can see what the latency is doing.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Wed, 2007-06-20 at 12:43 -0400, Justin Cooke wrote:
>>>>>>
>>>>>>> Steve and Chi,
>>>>>>>
>>>>>>> I tried to ping rtstats.unidata.ucar.edu but was unable to.
>>>>>>>
>>>>>>> Chi, would you be able to set up a static route from node6 to rtstats.unidata.ucar.edu like Steve mentions?
>>>>>>>
>>>>>>> I actually am unable to connect to ncepldm.woc.noaa.gov either. However, I did set up a feed to "ldm1" and am receiving CONDUIT data currently.
>>>>>>>
>>>>>>> Steve, how tough would it be to do the pqact step you mention, and to get the stats reports from those, if Chi is unable to get the static route going?
>>>>>>>
>>>>>>> Thanks for all the help,
>>>>>>>
>>>>>>> Justin
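[Archive note: when ICMP ping is blocked but an LDM feed still works, as
above, the LDM service itself can be tested directly. A minimal sketch
using the standard LDM utilities; the hostname and feedtype are the ones
from this thread:

    # poll the remote LDM server (port 388) and report whether it responds
    ldmping ldm1.woc.noaa.gov

    # ask the remote LDM which CONDUIT products it would have sent over the
    # past hour, without actually transferring the data
    notifyme -vl- -f CONDUIT -o 3600 -h ldm1.woc.noaa.gov
]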
>>>>>>> On Jun 20, 2007, at 12:16 PM, Steve Chiswell wrote:
>>>>>>>
>>>>>>>> Justin,
>>>>>>>>
>>>>>>>> Is that box capable of sending stats to our rtstats.unidata.ucar.edu host? E.g., is it allowed to connect outside your domain?
>>>>>>>>
>>>>>>>> The LDM won't need to run pqact to test out the throughput and network, but will need these ldmd.conf lines:
>>>>>>>>
>>>>>>>> EXEC "rtstats -h rtstats.unidata.ucar.edu"
>>>>>>>> request CONDUIT ".*" ncepldm.woc.noaa.gov
>>>>>>>>
>>>>>>>> The pqact EXEC action can be commented out. The request line will start the feed from ncepldm, which flood.atmos.uiuc.edu is pointing to and showing high latency on. If you are able to feed from ncepldm without the latency that outside hosts are showing, then it would isolate the problem further to the border between your network and the outside. If you do show similar latency, then it would be either the LDM configuration itself or the local router that the machines are on.
>>>>>>>>
>>>>>>>> If you are able to send rtstats out to us, then we can monitor stats on our web pages. Your network might require that a static route be added for sending that outside your domain (that would be something your networking folks would know). rtstats sends a small text report about every 60 seconds, so it is not a lot of traffic.
>>>>>>>>
>>>>>>>> If you can't configure your host to send rtstats, then we could create a pqact.conf action to file the .status reports and calculate the latency from those.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Steve
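[Archive note: the fallback Steve describes, filing the .status reports
with pqact, would look roughly like the following pqact.conf entry. This
is a sketch only: the fields must be tab-separated, and the product-ID
pattern and output path are illustrative and should be checked against
notifyme output first.

    # append every CONDUIT product whose ID contains ".status" to one
    # file; arrival times can then be compared against the times embedded
    # in the reports to estimate latency
    CONDUIT	\.status
    	FILE	data/conduit_status.log
]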
>>>>>>>> On Wed, 2007-06-20 at 12:03 -0400, Justin Cooke wrote:
>>>>>>>>
>>>>>>>>> Steve,
>>>>>>>>>
>>>>>>>>> If you provide us a pqact.conf, I can have the box Chi set up feed off of ldm1 and see how its latencies are.
>>>>>>>>>
>>>>>>>>> Justin
>>>>>>>>>
>>>>>>>>> On Jun 20, 2007, at 11:36 AM, Steve Chiswell wrote:
>>>>>>>>>
>>>>>>>>>> Justin,
>>>>>>>>>>
>>>>>>>>>> Since the change at 13Z, dropping daffy.unidata.ucar.edu out of the top-level nodes, the ldm2 feed to NSF is showing little/no latency at all. The ldm1 feed to NSF, which is connected using the alternate LDM mode, is only delivering the .status messages it creates, as all the other products are duplicates of products already being received from ldm2, and that is showing high latency:
>>>>>>>>>> http://www.unidata.ucar.edu/cgi-bin/rtstats/iddstats_nc?CONDUIT+atm.cise-nsf.gov
>>>>>>>>>>
>>>>>>>>>> This configuration is getting data out to the community at the moment. The downside here is that it puts a single point of failure at NSF in getting the data to Unidata, but I'll monitor that end.
>>>>>>>>>>
>>>>>>>>>> It seems that ldm1 is either slow, or it is showing network limitations (since flood.atmos.uiuc.edu is feeding from ncepldm, which is apparently pointing to ldm1, there is load on ldm1 besides the NSF feed). ldm2 is feeding both NSF and idd.aos.wisc.edu (and Wisc looks good since 13Z as well), so it is able to handle the throughput to 2 downstreams, but adding daffy as the 3rd seems to cross some point in the volume of what can be sent out.
>>>>>>>>>>
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>> On Wed, 2007-06-20 at 09:45 -0400, Justin Cooke wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Steve,
>>>>>>>>>>>
>>>>>>>>>>> Chi has set up a box on the LAN for us to run the LDM on; I am beginning to get things running on there.
>>>>>>>>>>>
>>>>>>>>>>> Have you seen any improvement since dropping daffy?
>>>>>>>>>>>
>>>>>>>>>>> Justin
>>>>>>>>>>>
>>>>>>>>>>> On Jun 20, 2007, at 9:03 AM, Steve Chiswell wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Justin,
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, this does appear to be the case. I will drop daffy from feeding directly and instead move it to feed from NSF. That will remove one of the top-level relays of data having to go out of NCEP, and we can see if the other nodes show an improvement.
>>>>>>>>>>>>
>>>>>>>>>>>> Steve
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 20 Jun 2007, Justin Cooke wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Steve,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did you see a slowdown to ldm2 after Pete and the other sites began making connections?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chi, considering Steve saw a good connection to ldm1 before the other sites connected, doesn't that point toward a network issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> All of our queue processing on the diskserver has been running without any problems, so I don't believe anything on that system would be impacting ldm1/ldm2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 20, 2007, at 12:04 AM, Chi Y Kang wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I set up the test LDM server for the NCEP folks to test the local pull from the LDM servers. That should give us some information on whether this is a network or system related issue. We'll handle that tomorrow. I am a little bit concerned that the slowdown occurred at the same time as the ldm1 crash last week.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, can NCEP also check if there are any bad dbnet queues on the backend servers? Just to verify.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steve Chiswell wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Justin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also had a typo in my message: ldm1 is running slower than ldm2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now, if the feed to ldm2 all of a sudden slows down when Pete and other sites add a request to it, it would really signal some sort of total bandwidth limitation on the I2 connection. It seemed a little coincidental that we had a short period of good connectivity to ldm1, after which it slowed way down.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 2007-06-19 at 17:01 -0400, Justin Cooke wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I just realized the issue. When I disabled the "pqact" process on ldm2 earlier today, it caused our monitor script (in cron, every 5 min) to kill the LDM and restart it. I have removed the check for the pqact in that monitor... things should be a bit better now.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Chi.Y.Kang wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Huh, I thought you guys were on the system. Let me take a look on ldm2 and see what is going on.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Justin Cooke wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Chi.Y.Kang wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Steve Chiswell wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Pete and David,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I changed the CONDUIT request lines at NSF and Unidata to request data from ldm1.woc.noaa.gov rather than ncepldm.woc.noaa.gov after seeing lots of disconnects/reconnects to the ncepldm virtual name.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The LDM appears to have caught up here as an interim solution.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Still don't know the cause of the problem.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I know NCEP was stopping and starting the LDM service on the ldm2 box, where the VIP address is pointed at this time. How is the current connection to ldm1? Is the speed of the CONDUIT feed acceptable?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Chi, NCEP has not restarted the LDM on ldm2 at all today. But looking at the logs, it appears to be dying and getting restarted by cron.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I will watch and see if I see anything.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Justin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Chi Y. Kang
>>>>>>>>>>>>>> Contractor
>>>>>>>>>>>>>> Principal Engineer
>>>>>>>>>>>>>> Phone: 301-713-3333 x201
>>>>>>>>>>>>>> Cell: 240-338-1059
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Steve Chiswell <address@hidden>
>>>>>>>>>> Unidata
>>>>>>>>
>>>>>>>> --
>>>>>>>> Steve Chiswell <address@hidden>
>>>>>>>> Unidata
>>
>> --
>> Chi Y. Kang
>> Contractor
>> Principal Engineer
>> Phone: 301-713-3333 x201
>> Cell: 240-338-1059
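[Archive note: the monitor-script problem described in this thread (a cron
job killing the LDM because it checked for a pqact process that had been
intentionally disabled) can be avoided by asking the LDM itself whether it
is running. A minimal sketch, assuming the stock ldmadmin utility and a
typical install path; adapt both to your own layout:

    # crontab entry for the ldm user: restart only if the LDM server
    # itself is down, regardless of which child processes (pqact,
    # rtstats, ...) happen to be configured
    */5 * * * * /usr/local/ldm/bin/ldmadmin isrunning || /usr/local/ldm/bin/ldmadmin start
]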