20020513: scouring McIDAS data files (cont.)
- Subject: 20020513: scouring McIDAS data files (cont.)
- Date: Mon, 13 May 2002 11:09:01 -0600
>From: Gilbert Sebenste <address@hidden>
>Organization: NIU
>Keywords: 200205130500.g4D50Ta13471 McIDAS mcscour.sh
Gilbert,
re: scouring setup
>You already had this set up to do this.
OK, and it was working.
re: typical location for mcscour.sh
>Yep. It's not working. It returns with a command prompt immediately with
>no errors!
This indicates a lack of the system resource needed to run the McIDAS
commands.
>> <login as 'mcidas'>
>> cd workdata
>> decinfo.k SET DMGRID INACTIVE
>
>I get an error message that says "decinfo.k: Cannot create negative UC
This confirms the lack of the system resource needed to run a McIDAS
program.
>No dice, Tom. Any other hints? Disk space still zero, and I need my
>machine back!
Given the "Cannot create negative UC" message, I decided to log in
to see what was happening. What I found was that your system had
run out of interprocess communication resources. This was caused by
_lots_ of shared memory segments ( > 150) being allocated to and owned
by 'ldm'. There were also a few ( < 10) allocated to and owned by
'mcidas'. Given that the system had no resources to run new McIDAS
programs, it is no wonder that disk scouring stopped working.
The fix for all of this was:
 o stop the LDM
 o remove all shared memory segments allocated by 'ldm' for McIDAS
   related activities (XCD decoding)
 o remove all of the temporary directories used by 'ldm' when
   McIDAS programs are executed:
     cd ~ldm/.mctmp
     rm -rf *
 o remove all of the GRID files created by XCD:
     cd /data/mcidas
     rm -f GRID*
   (lots of these files had zero lengths)
 o check on available disk space (32 GB after the above cleanup)
 o restart the LDM
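The shared memory removal step above has no command attached, so here is a
minimal sketch of one way to do it. It assumes that, with the LDM stopped,
every segment still owned by 'ldm' is a McIDAS leftover, and that your
ipcrm accepts the '-m <shmid>' form (older Linux versions used
'ipcrm shm <shmid>' instead). The function names and the DRY_RUN variable
are made up for this illustration; by default nothing is removed.

```shell
#!/bin/sh
# Print the shmid of every shared memory segment owned by user $1.
# Reads 'ipcs -m' style output on stdin: key shmid owner perms bytes ...
shm_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Remove the segments whose shmids arrive on stdin, one per line.
# DRY_RUN is a hypothetical safety switch: unless it is set to 0,
# the commands are only printed, not executed.
remove_shm_ids() {
    while read -r id; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: ipcrm -m $id"
        else
            ipcrm -m "$id"
        fi
    done
}

# Typical use as 'ldm' (or root), after the LDM has been stopped:
#   ipcs -m | shm_ids_for_owner ldm | remove_shm_ids
# and, once the list looks right, set DRY_RUN=0 and run it again.
```

Run the dry run first and compare the list against 'ipcs' output before
removing anything for real.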
As I write this, McIDAS-XCD is once again decoding data, and there is plenty
of disk space:
weather2-niu ldm-48> df -k
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda2             38314668   4188776  32179620  12% /
/dev/hda1                46636      8727     35501  20% /boot
Why your system initially went into a tailspin I cannot say. I can say
that after the resources needed to run McIDAS programs were exhausted,
it was pretty much inevitable that you would run out of disk space.
The reason is that the McIDAS-XCD GRID decoder and the LDM processes
(rpc.ldmd) never exit (nor should they), so GRID data files keep getting
written. The scouring commands, on the other hand, have to start as new
McIDAS programs, and those could no longer run.
When things are running correctly, you should see a small number
of subdirectories of ~ldm/.mctmp, and a small number of IPCs allocated
to 'ldm' for McIDAS activities. Here is how things look at this
moment:
weather2-niu ldm-52> ls -alt ~ldm/.mctmp
ls: unparsable value for LS_COLORS environment variable
total 20
drwx------ 5 ldm users 4096 May 13 12:02 ./
drwx------ 2 ldm users 4096 May 13 11:49 50888713/
drwx------ 2 ldm users 4096 May 13 11:49 50954251/
drwx------ 2 ldm users 4096 May 13 11:49 50987020/
drwxr-xr-x 24 ldm users 4096 May 13 11:49 ../
weather2-niu ldm-51> ipcs
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 294914     ldm        777        196608     2          dest
0x00000000 983043     ldm        777        196608     2          dest
0x00000000 1048580    ldm        777        196608     2          dest
0x00000000 1179653    ldm        777        196608     2          dest
0x00000000 1245190    ldm        777        196608     2          dest
0x00000000 3244039    ldm        777        196608     2          dest
0x00000000 3309576    ldm        777        196608     2          dest
0x00000000 50888713   ldm        600        384300     7
0x00000000 50921482   ldm        600        512000     0
0x00000000 50954251   ldm        600        384300     2
0x00000000 50987020   ldm        600        384300     2
0x00000000 51019789   ldm        600        512000     0
0x00000000 51052558   ldm        600        512000     0
The number of subdirectories of ~ldm/.mctmp will grow and shrink as
McIDAS PostProcess BATCH files are run upon receipt of Unidata-Wisconsin
(LDM feedtype MCIDAS) imagery. The same is true for IPCs.
Given that the problem occurred recently, I recommend that you
occasionally do a quick check of IPC use by 'ldm' (run the same
commands that I included above). If you see lots of shared memory
segments (listed under the 'shmid' column above), or if you see lots of
subdirectories under ~ldm/.mctmp, run through the cleanup procedure
that I illustrated above.
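If you would rather not eyeball this by hand, the quick check could be
scripted roughly as follows. This is only a sketch: the function names are
invented for the example, and the limit of 50 is an arbitrary assumption
sitting between the small counts shown above and the 150+ segments seen
when things went wrong.

```shell
#!/bin/sh
# Count shared memory segments owned by user $1.
# Reads 'ipcs -m' style output on stdin: key shmid owner perms bytes ...
count_shm_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { n++ } END { print n + 0 }'
}

# Count the immediate (non-hidden) subdirectories of directory $1.
count_subdirs() {
    n=0
    for d in "$1"/*/; do
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Report on the two counts: $1 = segments, $2 = subdirectories,
# $3 = limit (an assumed threshold, not a McIDAS or LDM value).
# Returns nonzero when either count exceeds the limit.
report_ipc_health() {
    if [ "$1" -gt "$3" ] || [ "$2" -gt "$3" ]; then
        echo "WARNING: $1 shm segments, $2 .mctmp subdirectories (limit $3)"
        return 1
    fi
    echo "OK: $1 shm segments, $2 .mctmp subdirectories"
}

# Typical use as 'ldm':
#   segs=$(ipcs -m | count_shm_for_owner ldm)
#   dirs=$(count_subdirs "$HOME/.mctmp")
#   report_ipc_health "$segs" "$dirs" 50
```

A check like this could be run from cron, with the warning mailed to
whoever looks after the machine.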
>From address@hidden Mon May 13 09:26:57 2002
>Subject: Re: 20020513: scouring McIDAS data files
>Grid 5002 is 1.5 GB in size. It appears as though it is scouring,
It wasn't.
>BUT something is not right. I don't know what it is.
weather2 ran out of IPCs.
Tom
>From address@hidden Mon May 13 11:14:08 2002
>Subject: Re: 20020513: scouring McIDAS data files (cont.)
Hi Tom,
re: you now have disk space aplenty
> weather2-niu ldm-48> df -k
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/hda2             38314668   4188776  32179620  12% /
> /dev/hda1                46636      8727     35501  20% /boot
Excellent.
re: why your system went into a tailspin
>It is weird. I checked weather.admin; the same tailspin didn't occur
>there. Then again, I do have grid processing turned off.
re: The number of subdirectories of ~ldm/.mctmp will grow and shrink ...
>OK.
re: recommend keeping an eye on things
>Gotcha. Thanks much again for the help, and now I understand! Never had
>this problem before, and I didn't have a clue why it was filling up.
>Take care, I'll keep an eye on this!
*******************************************************************************
Gilbert Sebenste ********
Internet: address@hidden (My opinions only!) ******
Staff Meteorologist, Northern Illinois University ****
E-mail: address@hidden ***
web: http://weather.admin.niu.edu **
Work phone: 815-753-5492 *
*******************************************************************************