20030521: ldm-mcidas decoding of Unidata-Wisconsin images on Tru64 at USU
- Subject: 20030521: ldm-mcidas decoding of Unidata-Wisconsin images on Tru64 at USU
- Date: Fri, 23 May 2003 07:53:42 -0600
>From: "Dan A. Dansereau" <address@hidden>
>Organization: USU
>Keywords: 200305201449.h4KEnaLd000625 McIDAS-X 2002 ldm-mcidas Tru64
Hi Dan,
re: why images keep getting decoded into the same output AREA
> Thanks - I'm at the wrong end of a rope!
Well, I think I just joined you at the end of that rope!
>P.S. I really want to know what I did or did not do!
The decoding on climatemine is working now, but I am not sure what I did to
get it to this point (if anything). Here is what I did:
- mucked around with the directory permissions under /var/data. I
had found that /var/data was owned by root, and that 'mcidas' couldn't
change the permissions on ROUTE.SYS and SYSKEY.TAB in /var/data/mcidas.
This doesn't make much sense since 'mcidas' owned the files I was
trying to change permission on!
- changed permissions on ROUTE.SYS and SYSKEY.TAB to rw-rw-r--; they
were rwxrwxrwx
- created the ~ldm/mcidas/data directory
- brought over the source for ldm-mcidas v2002b and built the package
from source. I then added more and more debug output to pnga2area.c
and pngsubs.c to try and get a handle on what was happening. During
this process, I changed one line of code that compares two strings.
The check that was there compared the first two characters, and I
changed it to compare the first four. This _should not_ have had
any effect on the decoding of images.
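To illustrate the one-line change described in the last item, here is a minimal Python sketch of comparing the first two versus the first four characters of two strings. The real code is C in pnga2area.c and is not shown in this message; the function name and sample strings below are made up for illustration.

```python
def same_prefix(a: str, b: str, n: int) -> bool:
    """Return True when the first n characters of a and b match."""
    return a[:n] == b[:n]


# Hypothetical sample strings, not taken from ROUTE.SYS or the decoder.
first, second = "ABCD", "ABXY"

print(same_prefix(first, second, 2))  # compares "AB" with "AB"
print(same_prefix(first, second, 4))  # compares "ABCD" with "ABXY"
```

A longer prefix can only make the match stricter, which is why the change would matter only for strings that agree in the first two characters but differ in the next two.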
When I left work yesterday afternoon, the decoding was not working
correctly. The decoder, pnga2area, would read the routing table,
ROUTE.SYS, to get the last AREA number that an image of the type being
processed was stored in. What it was not doing -- for some unknown
reason -- was incrementing that number by 1 so that the new image would
be decoded into a different AREA (this was the crux of the problem,
btw).
The debug statements that I added were to find out why that output AREA
number was not being incremented. I suspected that the code _was_
actually incrementing the number, but the information was not getting
written back to the routing table for some reason (hence the mucking
with file/directory permissions). This would cause the decoder to
think that the image being processed was the first one ever received,
so new images would keep getting decoded into the same AREA numbers.
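The failure mode described above can be sketched in Python as follows. This is only an illustration of the reasoning: the dictionary layout, the field names, and the AREA range are assumptions and do not reflect the real ROUTE.SYS format or the pnga2area code.

```python
def next_area(entry: dict) -> int:
    """Pick the AREA after the last one used, wrapping at the end of the range."""
    last = entry["last"]
    if last is None or last >= entry["end"]:
        return entry["begin"]
    return last + 1


def decode_image(entry: dict, write_back: bool) -> int:
    """Choose an output AREA and, if possible, record it in the routing entry."""
    area = next_area(entry)
    if write_back:
        entry["last"] = area  # the update that was suspected of being lost
    return area


# Hypothetical routing entry: a product that cycles through AREAs 140-149.
entry = {"begin": 140, "end": 149, "last": 140}

# Update is lost: every image is assigned the same AREA.
print([decode_image(entry, write_back=False) for _ in range(3)])

# Update is saved: images advance through the range.
print([decode_image(entry, write_back=True) for _ in range(3)])
```

If the increment happens in memory but never reaches the stored entry, each new run starts from the same "last" value, which matches the symptom of images repeatedly landing in the same AREA.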
When I got home after dinner, I logged back onto climatemine and found,
to my great surprise, that there were multiple images of each type on
disk, indicating that the information was successfully getting written
back to the routing table. The only thing I changed just before
leaving work was the permissions on the ~ldm directory itself. It had
been rwx--, and I changed it to be rwxrwxr--. I did not expect that
this would make any difference, but, combined with the creation of the
~ldm/mcidas/data directory, it might have. In fact, all of the
debug statements and the one-line change were in place before I changed
the ~ldm directory permissions, and decoding was still not working
correctly. Decoding started working only after the directory permission
change, so that is the only thing that could have made things start
working (unless you did something different to the OS in the interim).
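One way to reason about the directory permission change is to look at which classes of user have the search (execute) bit on the directory, since without it a process cannot reach files beneath that directory. The Python sketch below converts a nine-character permission string into mode bits and tests the group search bit. The helper names are made up for illustration, and the owner-only string rwx------ is used purely as an example value.

```python
import stat

# Permission bits in the same order as the characters in "rwxrwxrwx".
_BITS = [
    stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
    stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
    stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH,
]


def mode_from_string(perms: str) -> int:
    """Convert a 9-character string such as 'rwxrwxr--' to mode bits."""
    if len(perms) != 9:
        raise ValueError("expected 9 permission characters")
    mode = 0
    for char, bit in zip(perms, _BITS):
        if char != "-":
            mode |= bit
    return mode


def group_can_traverse(mode: int) -> bool:
    """On a directory, the execute bit grants permission to search it."""
    return bool(mode & stat.S_IXGRP)


for perms in ("rwx------", "rwxrwxr--"):
    mode = mode_from_string(perms)
    print(perms, oct(mode), group_can_traverse(mode))
```

This only shows what the mode bits allow in principle; it does not explain why a process running as the directory's owner would have been affected.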
The _REALLY_ puzzling thing for me is that the compositing of GOES-East
and West images _was_ working throughout this entire process and the
routing table was getting updated to reflect those changes. This
means that the processes being run by 'ldm' had to be able to
write to the routing table. This all gives me a headache, and makes
me feel that I am at the end of that rope with you :-(
Let's move on. I did some more things on climatemine that had nothing
to do with the decoding, but did have a lot to do with keeping
things running.
1) Your /var file system ran out of room while I was
working. I recognized this since I got a message while editing
using vi. I changed the number of days of GRID data being kept online
by modifying ~ldm/decoders/mcscour.sh and by deleting by hand all
GRID files in /var/data/xcd that were one day old.
2) I added a cron entry to rotate the ldm-mcidas.log files. I did this
since ~ldm/logs/ldm-mcidas was getting excessively large (the size
before rotation grew to 1.7 MB).
3) there were a number of orphaned shared memory segments (indicated
by running 'ipcs') and associated subdirectories in ~ldm/.mctmp.
These were created by McIDAS processes (like compositing of East and
West images), but were not removed for some reason when the processes
exited. I removed those segments (using 'ipcrm -m <segno>') and
the .mctmp subdirectories (using 'rm') while I had the LDM shut down
(important to not do this while the LDM is running since you might
be deleting a segment/directory that is in use)
4) while I was on climatemine, I took the opportunity to upgrade the
LDM to LDM-6.0.11. I did this to see if it eliminated a problem
which I mention below.
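As an illustration of the kind of age-based cleanup described in item 1, here is a minimal Python sketch that lists files older than a given number of days. It is not what mcscour.sh does (that script is not shown in this message); the function name is hypothetical, and the demonstration runs in a temporary directory rather than in /var/data/xcd.

```python
import os
import tempfile
import time


def files_older_than(directory, days, now=None):
    """Return paths of regular files whose modification time is older than days."""
    cutoff = (time.time() if now is None else now) - days * 86400
    old = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            old.append(path)
    return old


# Demonstration in a scratch directory with one backdated file.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("new.dat", "old.dat"):
        with open(os.path.join(scratch, name), "w") as handle:
            handle.write("placeholder\n")
    two_days_ago = time.time() - 2 * 86400
    os.utime(os.path.join(scratch, "old.dat"), (two_days_ago, two_days_ago))
    for path in files_older_than(scratch, days=1):
        print(os.path.basename(path))
```

The sketch only reports candidates; deleting them is left as a separate, deliberate step.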
Some observations:
- you are currently decoding imagery into /var/data/mcidas and XCD files
into /var/data/xcd. I recommend combining the output directories
so that everything goes into /var/data/mcidas. The reason for this
is that with the current setup (that works), you have to have copies
of SCHEMA, ROUTE.SYS, and SYSKEY.TAB in both of these directories AND
really the copies of ROUTE.SYS and SYSKEY.TAB should be the same.
The only way to do that now is to have ROUTE.SYS and SYSKEY.TAB
in one directory and then make links to those copies in the other
directory. It is just simpler in the long run to combine the
output directories.
- I am seeing a mysterious memory fault when running 'ldmadmin pqactcheck':
sh: 134600 Memory fault
The error is causing a core dump of pqact when the limit on coredump
is changed from its default size of 0 to unlimited. This memory fault
is associated with the ldmadmin action that checks the pqact.conf
file's use of /dev/null. pqact is running normally when processing
actions from ~ldm/etc/pqact.conf, so there is no urgent need to find
out what the problem is. I don't understand this memory problem, but
I think that it must be looked into fairly soon. I suspect that it
has something to do with an OS configuration/permission.
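For reference, the link arrangement mentioned in the first observation (one real copy of ROUTE.SYS and SYSKEY.TAB, with links in the other directory) could be sketched in Python as below. The function name is hypothetical, and the demonstration uses temporary stand-in directories rather than /var/data/mcidas and /var/data/xcd. Note that the sketch removes any existing file of the same name in the target directory before linking.

```python
import os
import tempfile


def link_shared_files(source_dir, target_dir, names):
    """Symlink each named file from source_dir into target_dir."""
    for name in names:
        source = os.path.join(source_dir, name)
        link = os.path.join(target_dir, name)
        if os.path.lexists(link):
            os.remove(link)  # replaces a stale copy or an old link
        os.symlink(source, link)


# Demonstration with stand-in directories and placeholder files.
with tempfile.TemporaryDirectory() as root:
    mcidas_dir = os.path.join(root, "mcidas")
    xcd_dir = os.path.join(root, "xcd")
    os.makedirs(mcidas_dir)
    os.makedirs(xcd_dir)
    shared = ["ROUTE.SYS", "SYSKEY.TAB"]
    for name in shared:
        with open(os.path.join(mcidas_dir, name), "w") as handle:
            handle.write("placeholder\n")
    link_shared_files(mcidas_dir, xcd_dir, shared)
    for name in shared:
        print(name, os.path.islink(os.path.join(xcd_dir, name)))
```

Combining the output directories, as recommended above, avoids the need for this bookkeeping altogether.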
Further investigations:
- I want to continue to try to understand why the ldm-mcidas image decoding
was not working correctly, and what actually changed to make it start
working. With your permission, I will continue to log on to climatemine
over the next few days to poke around.
- We need to understand what is causing the memory fault problem when
running 'ldmadmin pqactcheck'.
Lastly, I am hoping that you will upgrade the LDM on allegan to 6.0.11
or, if it gets cut, 6.0.12 today, this weekend, or Monday.
I have got to run right now...
Tom
>From address@hidden Fri May 23 09:58:04 2003
Tom
I have not changed a thing on the OS, so
some of your magic must have worked, however -
all of the composites (mdrtopo, gwvistopo, gew-vis)
are now blank/black.
Anyway FEEL FREE to log on, and do whatever is needed
to fix this thing! And - what can I do to help, or
pay back you/Unidata for your help?
Dan