20010516: 20010514: 20010507: 20010507: oabsnd and swap space
- Subject: 20010516: 20010514: 20010507: 20010507: oabsnd and swap space
- Date: Thu, 17 May 2001 13:17:28 -0600
Chris,
I logged in and took a look at your system. I couldn't run "sar" on
your system. You didn't have "top" that I could find so I copied one over:
----
load averages: 1.50, 0.77, 0.50 14:36:58
95 processes: 90 sleeping, 3 running, 1 zombie, 1 on cpu
CPU states: 39.4% idle, 36.5% user, 9.2% kernel, 14.9% iowait, 0.0% swap
Memory: 512M real, 13M free, 250M swap in use, 125M swap free
----
It appears that a lack of system resources really is what is causing the
fork to fail. Some of the problem could possibly be alleviated by running
fewer httpd processes. You could probably stand to add some swap space, since
you only have about 375MB of swap (250MB in use plus 125MB free) with 512MB of RAM.
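For reference, here is a rough sketch of that swap arithmetic in Bourne shell. It assumes the Solaris "swap -s" output layout you posted earlier in this thread (the sample line is copied from that message), and the function name swap_total_kb is made up for illustration. The number it gives will differ a little from the "top" figures above, since the two were taken on different days.

```shell
#!/bin/sh
# Sketch: total swap (used + available) from one line of Solaris "swap -s" output.
# swap_total_kb is a hypothetical helper, not part of GEMPAK or Solaris.
swap_total_kb() {
    # Field 9 is "<n>k" used, field 11 is "<n>k" available.
    awk '{ used = $9; avail = $11
           sub(/k$/, "", used); sub(/k$/, "", avail)
           print used + avail }'
}

# Sample line copied from the earlier message in this thread.
sample='total: 67792k bytes allocated + 167728k reserved = 235520k used, 159608k available'

total_kb=$(echo "$sample" | swap_total_kb)
ram_mb=512   # physical memory reported by "top" above

echo "swap total: $((total_kb / 1024))MB, RAM: ${ram_mb}MB"
```

On a live system you would pipe "swap -s" itself into the function instead of the sample line.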
In GEMPAK 5.4, the maximum grid size was 100,000 grid points. I increased this
size to 400,000 in GEMPAK 5.6 to enable the use of several of
NCEP's larger grids.
Since oabsnd (about 90M of space is allocated on launching) probably uses up
the available free memory, the forking of gplt fails.
We have 3 options:
1) add more swap and see if this helps. You can do this with "mkfile" and
adding it to swap with "swap -a". If you have the disk space, adding
128MB would get you closer to the 1 to 1 ratio.
2) recompile GEMPAK 5.6 with the smaller grid sizes defined so that you have the
same array sizes as when you were happy with GEMPAK 5.4.
3) I was able to get the upperair.csh script to somewhat work around the problem
by forcing gplt to be launched before running oabsnd. This worked better,
but still ran into problems when McIDAS was doing imgremap.k.
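A minimal sketch of option 1, assuming a Solaris system and root access. The swap file location (/export/swapfile) and the DRY_RUN switch are my own assumptions for illustration; by default the script only prints the commands so you can check them before running anything.

```shell
#!/bin/sh
# Sketch only: the swap file path and the dry-run switch are assumptions.
SWAPFILE=${SWAPFILE:-/export/swapfile}   # hypothetical location; use an absolute path
SIZE=${SIZE:-128m}                       # 128MB, as suggested in option 1
DRY_RUN=${DRY_RUN:-1}                    # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the file, add it as swap, then list the swap areas to confirm.
add_swap() {
    run mkfile "$SIZE" "$SWAPFILE" &&
    run swap -a "$SWAPFILE" &&
    run swap -l
}

add_swap
```

Setting DRY_RUN=0 runs the commands for real. As far as I know, swap added this way does not survive a reboot unless the file is also listed in /etc/vfstab, so treat this as a way to test whether more swap helps.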
Here is what I tried:
I created the directory $GEMDATA/tmp/upperair.chiz and copied your
upperair.csh script there for tinkering. I changed some paths
in the script so as not to overwrite your grids in $HOME, and to
use the upperair.chiz directory.
Following the gddelt section you have, I added:
# Lets just launch a program to get gplt fired up. Use this one
# gplt for the entire script.
echo " "
echo "get gplt launched....."
gpmap << GPLT
e
GPLT
Launching gpmap gets gplt running, so oabsnd no longer has to fork gplt
itself. You can still see "Killed" when trying to launch oabsnd if you
don't have enough memory.
I also removed the individual gpend commands you had in the script,
since you only need gpend once, after all of your oabsnd invocations have
finished. Since it takes time to fork the gplt process, you are better off
starting it only once.
I left the upperair.chiz directory (with "top" there too) for you.
One other thing: in your .cshrc, you source Gemenviron and then set
your path. Since Gemenviron adds GEMEXE and SCRIPTS_EXE to your path,
it is better to set your path first (without hardcoding the GEMPAK
binary directory in it) and then source Gemenviron. Because your path
setting came last and did not include the SCRIPTS_EXE directory, you were
overriding the PATH that Gemenviron had set up and preventing scripts like
"cleanup" from working.
Running "cleanup -c" will remove the message queues and kill off the
gplt and parent processes for a user.
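If you ever want to check for leftover queues by hand, a rough sketch follows. This is not the cleanup script itself; queue_ids is a made-up helper that assumes the Solaris "T ID KEY MODE OWNER GROUP ..." column layout from your ipcs listing, and the sample rows are copied from your message.

```shell
#!/bin/sh
# Sketch: pick out message queue IDs owned by one user from ipcs-style output.
# queue_ids is a hypothetical helper; on Solaris, "nawk" may be needed in
# place of "awk" for the -v option.
queue_ids() {
    awk -v owner="$1" '$1 == "q" && $5 == owner { print $2 }'
}

# Sample rows copied from the ipcs output quoted in this thread.
sample='q 2951 0x4b3fb75 --rw-rw-rw- gempak ldm 4181 0 23:30:03
q 102 0x4b3fbe4 --rw-rw-rw- gempak ldm 4292 0 23:30:11'

ids=$(echo "$sample" | queue_ids gempak)
echo "$ids"

# On a live system one might then remove each queue with ipcrm, e.g.:
#   ipcs -q | queue_ids gempak | while read id; do ipcrm -q "$id"; done
```

Only remove queues when no GEMPAK programs are running for that user, since an active gplt would lose its queue.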
Steve Chiswell
Unidata User Support
>From: address@hidden (Chris Hennon)
>Organization: UCAR/Unidata
>Keywords: 200105161652.f4GGqdp13789
>Steve -
>
>I apologize for taking up so much of your time. I'll understand if you
>have other things to take care of.
>
>I've been working with one specific script which I am running by itself.
>Hopefully, this specific example will yield some useful information. I
>was wondering if you could login to my machine and take a look. The
>script is located in:
>
>/usr/local/gempak/scripts/upperair/upperair.csh
>
>Basically, it runs oabsnd multiple times to create upperair grids, then
>calls a variety of other scripts that produce upperair plots. When this
>script completes, it leaves behind several message queues:
>
>ipcs -pt
>IPC status from <running system> as of Wed May 16 11:15:48 EDT 2001
>T ID KEY MODE OWNER GROUP LSPID LRPID STIME
>STIME RTIME CTIME
>Message Queues:
>q 2951 0x4b3fb75 --rw-rw-rw- gempak ldm 4181 0 23:30:03
>q 102 0x4b3fbe4 --rw-rw-rw- gempak ldm 4292 0 23:30:11
>q 103 0x4b3fc28 --rw-rw-rw- gempak ldm 4360 0 23:30:20
>q 104 0x4b3fca4 --rw-rw-rw- gempak ldm 4484 0 23:30:28
>q 105 0x4b3fd0d --rw-rw-rw- gempak ldm 4589 0 23:30:36
>q 106 0x4b3fd81 --rw-rw-rw- gempak ldm 4705 0 23:30:44
>q 107 0x4b3fdf1 --rw-rw-rw- gempak ldm 481 0 23:30:52
>q 1108 0x4b402d8 --rw-rw-rw- gempak ldm 6072 0 23:35:20
>q 109 0x4b402f1 --rw-rw-rw- gempak ldm 6097 0 23:35:39
>T ID KEY MODE OWNER GROUP CPID LPID
>ATIME DTIME CTIME
>Shared Memory:
>m 202 0 --rw-rw-rw- gempak ldm 19174 294
>17:01:19 17:01:19 17:01:19
>T ID KEY MODE OWNER GROUP OTIME CTIME
>Semaphores:
>twister:[/home/chennon/output/gifs/sat/1998]%
>
>but no gplt processes. There is a log file from the last time I tried to
>run the script in:
>
>/usr/local/gempak/logs/upperair.log
>
>I ran it just after a reboot, so the system should have been clean. One
>other thing that happened after the rebuild that shouldn't have an impact
>but I thought I would mention - we turned off a bunch of system processes
>due to security concerns - the ones that are no longer active are in
>/etc/rc2.d/turnedoff. I don't see any that would have an impact on gempak
>programs but I thought I would mention it.
>
>I appreciate your efforts. Thanks.
>
>Chris
>
>================================================
>| Chris Hennon Ohio State University |
>| Tropical Meteorology address@hidden |
>| |
>| Dept of Geography Office: 1155 Derby Hall |
>| 1036 Derby Hall Phone : (614) 292-2704 |
>| Columbus, OH 43210 Fax : (614) 292-6213 |
>================================================
>
>On Mon, 14 May 2001, Unidata Support wrote:
>
>>
>> Chris,
>> I'm not saying that you can't run more than 1 GEMPAK program at the same time.
>> What I can say is:
>> 1) if you have a program that frequently exits abnormally, and leaves behind
>> a gplt, or other process, then the likelihood is that system resources will
>> start to run short.
>>
>> 2) If 2 processes ask for a gplt at the same time, it is possible for both programs
>> to be issued the same message queue ID by the system. This happens because
>> until the program actually gets the gplt process running, the system will keep
>> handing out the same available message queue. Using _gf programs where the
>> gplt and gf processes are linked to the application reduces the total
>> number of processes running on your system at any one time, and avoids the
>> use of message queues- thereby avoiding the possible conflict above.
>>
>> 3) If multiple programs are running at the same time, you should have ntl running
>> on the display so that all processes use the shared color map so you don't run out of
>> colors on the display (you can run ntl on a screen:1 as well). Or, use the gif device
>> driver that doesn't require an X display to be running (you'll have to use message queues
>> for the gif driver - except with the radmap_sw program which I do have linked with gif instead of gf).
>>
>> Nothing has changed in the underlying message queue system between 5.4 and 5.6, or the
>> shared color system- so that isn't a cause for differences.
>>
>> When you say that models take 2-3 hours to run, are you saying that the time over which the data
>> arrives is 2-3 hours, or are you saying that the GEMPAK programs take that long to run?
>> I can help you organize actions to kick off when the LDM receives necessary grids, or
>> determine when all the pieces of data exist so that you don't have to run programs
>> multiple times to recreate plots as more data arrives. Let me know if I can help you.
>>
>> Steve Chiswell
>> Unidata User Support
>>
>>
>>
>>
>>
>>
>> >From: address@hidden (Chris Hennon)
>> >Organization: UCAR/Unidata
>> >Keywords: 200105142213.f4EMDfp11331
>>
>> >Steve -
>> >
>> >This issue seems to have been resolved after a reboot, though I am not
>> >sure why.
>> >
>> >Just to clarify,
>> >are you saying that two or more gempak programs cannot be running at the
>> >same time? When I was using 5.4 and before the rebuild, I sometimes had 4
>> >or 5 scripts cranking along at the same time with no problem. I've
>> >followed your suggestions, using the _gf programs where possible and using
>> >master scripts for large jobs. But there are still issues with
>> >overlapping jobs - for example, surface fields get plotted every hour, but
>> >to run the NGM,ETA, and AVN models takes at least 2-3 hours to run.
>> >
>> >Thanks ahead.
>> >
>> >Chris
>> >
>> >
>> >On Mon, 7 May 2001, Unidata Support wrote:
>> >
>> >>
>> >> Chris,
>> >>
>> >> I was actually referring to the grid dimensions, can you send me the
>> >> GDINFO for your grid file?
>> >>
>> >> Steve Chiswell
>> >> Unidata User Support
>> >>
>> >>
>> >>
>> >> >From: address@hidden (Chris Hennon)
>> >> >Organization: UCAR/Unidata
>> >> >Keywords: 200105071933.f47JXqp00391
>> >>
>> >> >Steve -
>> >> >
>> >> >The upperstr.grd file is pretty big:
>> >> >
>> >> >twister:[/usr/local/gempak/grids]% ls -l
>> >> >-rw-r--r-- 1 gempak ldm 2575360 Apr 12 23:30 upperstr.grd
>> >> >
>> >> >oabsnd is version 5.6.a, as is dcuair.
>> >> >
>> >> >Chris
>> >> >
>> >> >
>> >> >On Mon, 7 May 2001, Unidata Support wrote:
>> >> >
>> >> >>
>> >> >> Chris,
>> >> >>
>> >> >> What is the size of the $HOME/grids/upperstr.grd file?
>> >> >> What version of GEMPAK are you running (eg 5.6, 5.6.C)?
>> >> >> Are you running a different version of the dcuair decoder?
>> >> >>
>> >> >> For example:
>> >> >> GEMPAK-OABSND>version
>> >> >>
>> >> >> GEMPAK Version 5.6.c.1
>> >> >>
>> >> >> % dcuair -help
>> >> >> ....
>> >> >> >Version 5.6.c.1<
>> >> >>
>> >> >>
>> >> >> Steve Chiswell
>> >> >> Unidata User Support
>> >> >>
>> >> >>
>> >> >> >From: address@hidden (Chris Hennon)
>> >> >> >Organization: UCAR/Unidata
>> >> >> >Keywords: 200105071647.f47Gltp15071
>> >> >>
>> >> >> >Steve -
>> >> >> >
>> >> >> >I double checked and all looks well there:
>> >> >> >
>> >> >> >twister:[/usr/local/gempak/scripts/upperair]% cd $GEMEXE
>> >> >> >twister:[/usr/local/gempak/bin/sol]% ls -l gplt
>> >> >> >-rwxr-xr-x 1 gempak ldm 496276 Apr 23 13:45 gplt*
>> >> >> >twister:[/usr/local/gempak/bin/sol]% cd ../../scripts/upperair
>> >> >> >twister:[/usr/local/gempak/scripts/upperair]% oabsnd
>> >> >> > SNFILE Sounding data file $RAW_UPA/20010507_upa.gem
>> >> >> > GDFILE Grid file $HOME/grids/upperstr.grd
>> >> >> > SNPARM Sounding parameter list tmpc
>> >> >> > STNDEX Stability indices
>> >> >> > LEVELS Vertical levels 925
>> >> >> > VCOORD Vertical coordinate type PRES
>> >> >> > DATTIM Date/time 12
>> >> >> > DTAAREA Data area for OA
>> >> >> > GUESS Guess file*time
>> >> >> > GAMMA Convergence parameter 0.3
>> >> >> > SEARCH Search radius/Extrapolation 20/EX
>> >> >> > NPASS Number of passes 2
>> >> >> > QCNTL Quality control threshold
>> >> >> > Parameters requested: SNFILE,GDFILE,SNPARM,STNDEX,LEVELS,VCOORD,DATTIM,
>> >> >> > DTAAREA,GUESS,GAMMA,SEARCH,NPASS,QCNTL.
>> >> >> > GEMPAK-OABSND>r
>> >> >> >Could not fork
>> >> >> > [GEMPLT -101] NOPROC - Nonexistent executable.
>> >> >> > [OABSND -3] Fatal error initializing GEMPLT.
>> >> >> >twister:[/usr/local/gempak/scripts/upperair]%
>> >> >> >
>> >> >> >Chris
>> >> >> >
>> >> >> >
>> >> >> >On Mon, 7 May 2001, Unidata Support wrote:
>> >> >> >
>> >> >> >>
>> >> >> >> Chris,
>> >> >> >>
>> >> >> >> OABSFC requires that "gplt" be found. The non-existent
>> >> >> >> executable seems to indicate that $GEMEXE/gplt is either
>> >> >> >> not being found, that you don't have permission to execute it,
>> >> >> >> or that for some reason the system is not able to execute gplt.
>> >> >> >>
>> >> >> >> Since it says non-existent, it sounds like the program is
>> >> >> >> not being found. See if there is any problem with your $GEMEXE
>> >> >> >> environmental variable (which is set when you sourced Gemenviron),
>> >> >> >> and double check that gplt is executable as well.
>> >> >> >>
>> >> >> >> The attempt to execute gplt occurs when you run the analysis,
>> >> >> >> eg, not when you first start up oabxxx.
>> >> >> >>
>> >> >> >> Steve Chiswell
>> >> >> >> Unidata User Support
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> >From: address@hidden (Chris Hennon)
>> >> >> >> >Organization: UCAR/Unidata
>> >> >> >> >Keywords: 200105071618.f47GIbp13844
>> >> >> >>
>> >> >> >> >Steve -
>> >> >> >> >
>> >> >> >> >I've run into a curious problem. I'm trying to run "oabsnd" for just one
>> >> >> >> >level and one variable and the program exits with a NOPROC - Nonexistent
>> >> >> >> >executable and "Could not fork" errors. I think I have plenty of swap
>> >> >> >> >space:
>> >> >> >> >
>> >> >> >> >swap -s
>> >> >> >> >total: 67792k bytes allocated + 167728k reserved = 235520k used, 159608k
>> >> >> >> >available
>> >> >> >> >
>> >> >> >> >There are no rogue processes around that I can see. There are no dead
>> >> >> >> >message queues. In the past, I have run oabsnd under the same conditions
>> >> >> >> >without a problem, even with more levels and more variables. The support
>> >> >> >> >archives all seem to indicate a problem with either swap space or orphaned
>> >> >> >> >processes but it doesn't appear that I have those issues. Any ideas?
>> >> >> >> >Thanks.
>> >> >> >> >
>> >> >> >> >Chris
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >> ****************************************************************************
>> >> >> >> Unidata User Support UCAR Unidata Program
>> >> >> >> (303)497-8644 P.O. Box 3000
>> >> >> >> address@hidden Boulder, CO 80307
>> >> >> >> ----------------------------------------------------------------------------
>> >> >> >> Unidata WWW Service http://www.unidata.ucar.edu/
>> >> >> >> ****************************************************************************
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
>>
>