Re: 20000923: Sunset.meteor.wisc.edu major problems
- Date: Mon, 25 Sep 2000 17:14:20 -0600
Below is the set of messages relevant to the diagnosis of a problem
with LDM 5.1.2 pqcreate resulting in a SIGBUS error on SGI/IRIX 32-bit
platforms for certain combinations of queue size and number of
products.
--Russ
To: address@hidden, address@hidden
From: address@hidden (Pete Pokrandt)
Reply-to: address@hidden
Subject: Sunset.meteor.wisc.edu major problems
Date: Sat, 23 Sep 2000 11:10:50 -0500
>To: address@hidden
>From: address@hidden (Pete Pokrandt)
>Subject: Re: 20000923: Sunset.meteor.wisc.edu major problems
>Organization: Dept of Atmos & Oceanic Sciences, University of
Wisconsin-Madison
>Keywords: sigbus, bus error, SGI/IRIX
Hi all,
Anyone feeding from sunset.meteor.wisc.edu, please fail over
to your backup until further notice. I'm having major problems
with the ldm and/or machine crashing regularly. I suspect either
a bad disk or perhaps a memory problem, but I can't go in
to deal with it right now, since there's a UW/Northwestern Football
game happening 2 blocks away from our building.
I'll try to get in tonight to have a look and try to see what's
going on. If it looks like an extended outage, I'll try to
get everyone set up on profhorn.meteor.wisc.edu as a backup.
Unidata support: can you verify that profhorn.meteor.wisc.edu
is allowed to feed from motherlode? And if not, can it be
added until I figure out what's up with sunset? Thanks.
Sorry for the hassles..
Pete
--
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+
^ Pete Pokrandt V 1447 AOSS Bldg 1225 W Dayton St^
^ Systems Programmer V Madison, WI 53706 ^
^ V address@hidden ^
^ Dept of Atmos & Oceanic Sciences V (608) 262-3086 (Phone/voicemail) ^
^ University of Wisconsin-Madison V 262-0166 (Fax) ^
+<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+
To: address@hidden
Subject: Re: Sunset.meteor.wisc.edu major problems
Date: Sat, 23 Sep 2000 10:55:58 -0600
From: Russ Rew <address@hidden>
Hi Pete,
> Unidata support: can you verify that profhorn.meteor.wisc.edu
> is allowed to feed from motherlode? And if not, can it be
> added until I figure out what's up with sunset? Thanks.
I've verified that you should be able to feed from motherlode, because
its ldmd.conf contains the following line:
allow UNIDATA|FSL2 ^(sunset|profhorn)\.meteor\.wisc\.edu$
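To illustrate which hosts that allow pattern selects, here is a minimal Python sketch. It uses Python's re module as a stand-in for the regular-expression matching the LDM performs (an assumption; the LDM's own matcher may differ in details). The function name and the hostnames other than sunset and profhorn are made up for the example.

```python
import re

# Host pattern copied from the allow line quoted above.
ALLOW_PATTERN = re.compile(r"^(sunset|profhorn)\.meteor\.wisc\.edu$")


def is_allowed(hostname: str) -> bool:
    """Return True if the hostname matches the allow pattern."""
    return ALLOW_PATTERN.search(hostname) is not None


if __name__ == "__main__":
    for host in (
        "sunset.meteor.wisc.edu",
        "profhorn.meteor.wisc.edu",
        "other.meteor.wisc.edu",             # hypothetical host, not named in the pattern
        "profhorn.meteor.wisc.edu.example",  # hypothetical; rejected by the trailing $
    ):
        print(host, is_allowed(host))
```

Only the two named hosts match, because the pattern is anchored at both ends and the dots are escaped.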
--Russ
To: address@hidden, address@hidden
From: address@hidden (Pete Pokrandt)
Subject: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
Date: Sat, 23 Sep 2000 13:34:21 -0500
Hi all,
Thanks to Unidata support, working hard on a Saturday, anyone who
normally feeds from sunset.meteor.wisc.edu can instead feed from
profhorn.meteor.wisc.edu until sunset is fixed and happy again.
One word of caution, I'm going to be slowly piping through the data from
the UIUC archive site that I've missed since 0800 UTC, so you may end
up getting more data than you are expecting until the backlog flushes
through.
Also, profhorn.meteor.wisc.edu is the machine that I also use to
ingest the high-bandwidth NMC2 feed, so I'm not sure if the 10 Mbps
line into profhorn will handle the load of everyone feeding from
it in addition to the NMC2 feed. I'll keep an eye on it and
let you all know if it seems to be a problem.
I'll be in this evening to try to figure out what's up on sunset. Very
frustrating, at first, the ldm was crashing, but now I can't even get
pqcreate to run. It dumps a core as soon as the queue file has grown to
its complete size.. I've tried it on different disk drives as well, so
it's not a bad disk. Strange..
I'm going to try first swapping in some different RAM, and if that
doesn't work, maybe a new mother board.. Nice to just happen to have a
few spare parts lying around.. Unidata Support: does this sound to you
like a memory problem? I have not seen any bad memory info in my system
logs.
Pete
To: address@hidden
Cc: support-ldm
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
Date: Sat, 23 Sep 2000 15:44:12 -0600
From: Russ Rew <address@hidden>
Pete,
> I'll be in this evening to try to figure out what's up on sunset. Very
> frustrating, at first, the ldm was crashing, but now I can't even get
> pqcreate to run. It dumps a core as soon as the queue file has grown to
> its complete size.. I've tried it on different disk drives as well, so
> it's not a bad disk. Strange..
Please send us (address@hidden) the command line you use to
invoke pqcreate and if possible also a traceback from when it crashes.
You can get the traceback by running it until it crashes and leaves a
"core" file, then running "dbx" (or whatever debugger you use, I'm not
sure what platform you are running this on) giving as arguments the
pqcreate executable and the core file, something like:
% dbx /usr/local/ldm/bin/pqcreate core
At this point dbx may produce a bunch of output, but when it finally
gives you a prompt, type "where" and then cut and paste the output to
me, along with how you invoked pqcreate.
Also it's just worth checking that you are creating the product queue
on a local disk rather than a remotely mounted disk. The latter won't
work, but it should give an error message rather than just dumping
core ...
> I'm going to try first swapping in some different RAM, and if that
> doesn't work, maybe a new mother board.. Nice to just happen to have a
> few spare parts lying around.. Unidata Support: does this sound to you
> like a memory problem? I have not seen any bad memory info in my system
> logs.
Good luck. It doesn't sound like a memory problem to me, but I
haven't had any memory problems recently, so I'm not sure what the
symptoms would be. The system should do a memory check when you
reboot it, which should catch most memory errors.
--Russ
To: Russ Rew <address@hidden>
cc: address@hidden
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
In-reply-to: Your message of "Sat, 23 Sep 2000 15:44:12 MDT."
<address@hidden>
Date: Sat, 23 Sep 2000 17:53:07 -0500
From: Pete Pokrandt <address@hidden>
Russ,
This is the same exact setup that has been running mostly flawlessly
for months. Every so often the ldm will die, usually seems to be
related to an increase in the data volume. Usually deleting and
re-making the queue will solve the problem and it'll run for weeks
with no problems.
Yesterday the ldm crashed, so I redid the queue and restarted, then
last night the machine hung, so I rebooted, redid the queue and
started again. It ran for about 1/2 hour and died, so I redid
the queue again, then again.. you get the picture.. Then this
morning after another reboot I tried again to make the queue and
started getting the core dumps.
One strange thing is, it works ok for a 2.5 MB (yes, that small, I've
tried lots of things :) queue, but 5 MB, 25 MB, 250 MB, 400 MB, and
600 MB (my normal queue size as of late) all dump a core.
It is on a local disk, not an NFS-mounted one.
I'm running on an SGI R4000 with IRIX 6.5, 192 MB of RAM, roughly
200 MB of swap.
As for starting it, I'm just running a normal ldmadmin mkqueue.
I believe the command that it is spawning is:
pqcreate -q /usr2/ldm/ldm.pq -s 25000000
sunset 10% ldmadmin mkqueue
Sep 23 22:43:37 UTC sunset.meteor.wisc.edu : make_pq: mkqueue failed
Here's the output from dbx:
sunset 31% dbx /usr/local/ldm/bin/pqcreate core
dbx version 7.2.1 patch 2991 May 14 1998 17:09:10
Core from signal SIGBUS: Bus error
(dbx) where
> 0 sx_init(sx = 0x5833a64, nalloc = 6103)
["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":2200, 0x1000a418]
1 ctl_init(pq = 0x10033fe0, align = 8)
["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":3783, 0x1000ec4c]
2 pq_create(path = 0x7fff3012 = "/usr2/ldm/ldm.pq", mode = 438, pflags =
0, align = 8, initialsz = 25000000, nproducts = 6103, pqp = 0x7fff2dec)
["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":4306, 0x100105c4]
3 main(ac = 7, av = 0x7fff2e64)
["/usr/local/ldm/ldm-5.1.2/src/pqcreate/pqcreate.c":186, 0x10003790]
4 __start()
["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177,
0x10003184]
Let me know if this helps at all, I'm still planning to go in tonight
to start swapping hardware to see if that makes a difference.
You think maybe I should recompile the ldm? Perhaps some of the binaries
got fu-bar'd somehow?
Thanks for the help!
Pete
Date: Sat, 23 Sep 2000 19:43:32 -0600 (MDT)
From: Steve Chiswell <address@hidden>
To: Pete Pokrandt <address@hidden>
cc: address@hidden, address@hidden
Subject: 20000923: sunset downstream sites may feedfrom
profhorn.meteor.wisc.edu
In-Reply-To: <address@hidden>
Pete,
pqcreate would core dump if you ran out of disk space while trying to
create the queue.....or if creating the queue was exercising some bad
disk blocks.
Assuming you have plenty of disk space, you might want to try the format
utility to test the disk for bad sectors - and map them out if found.
Steve Chiswell
To: Steve Chiswell <address@hidden>
cc: address@hidden, address@hidden
Subject: Re: 20000923: sunset downstream sites may feedfrom
profhorn.meteor.wisc.edu
In-reply-to: Your message of "Sat, 23 Sep 2000 19:43:32 MDT."
<address@hidden>
Date: Sat, 23 Sep 2000 21:00:15 -0500
From: Pete Pokrandt <address@hidden>
In a previous message to me, you wrote:
>
>
>Pete,
>
>pqcreate would core dump if you ran out of disk space while trying to
>create the queue.....or if creating the queue was exercising some bad
>disk blocks.
>Assuming you have plenty of disk space, you might want to try the format
>utility to test the disk for bad sectors - and map them out if found.
>
>
>Steve Chiswell
>
Steve,
The disk is not full, and in fact I have tried it on more than one
disk, and get the same results on both. I'll try it on a third and
see if it still happens.
I also just recompiled the ldm, I'll see if that makes any difference.
Thanks,
Pete
To: Steve Chiswell <address@hidden>
cc: address@hidden, address@hidden
Subject: Re: 20000923: sunset downstream sites may feedfrom
profhorn.meteor.wisc.edu
In-reply-to: Your message of "Sat, 23 Sep 2000 19:43:32 MDT."
<address@hidden>
Date: Sat, 23 Sep 2000 21:15:07 -0500
From: Pete Pokrandt <address@hidden>
Steve and all,
Recompiled the ldm, still dumps core.
Tried to build the queue on yet a third disk, still dumps core.
Here's the stack from dbx on the pqcreate core file:
sunset 18% dbx ~/bin/pqcreate core
dbx version 7.2.1 patch 2991 May 14 1998 17:09:10
where
Core from signal SIGBUS: Bus error
(dbx) > 0 sx_init(sx = 0x5833a64, nalloc = 6103)
["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":2200, 0x1000a418]
1 ctl_init(pq = 0x10033fe0, align = 8)
["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":3783, 0x1000ec4c]
2 pq_create(path = 0x7fff300c = "/cool.pretty/ldm/ldm.pq", mode = 438,
pflags = 1, align = 8, initialsz = 25000000, nproducts = 6103, pqp =
0x7fff2dec) ["/usr/local/ldm/ldm-5.1.2/src/pq/pq.c":4306, 0x100105c4]
3 main(ac = 5, av = 0x7fff2e64)
["/usr/local/ldm/ldm-5.1.2/src/pqcreate/pqcreate.c":186, 0x10003790]
4 __start()
["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M3/csu/crt1text.s":177,
0x10003184]
(dbx)
If I make the queue size small enough - ridiculously small,
278000 bytes or less - then it is successful.
Check out this sequence of pqcreate commands (I deleted the ldm.pq in
between each one from a different window):
sunset 22% pqcreate -q /cool.pretty/ldm/ldm.pq -v -f -s 250000
Creating /cool.pretty/ldm/ldm.pq, 250000 bytes, 61 products.
pqcreate: create "/cool.pretty/ldm/ldm.pq" failed: File exists
sunset 23% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 400000
Creating /cool.pretty/ldm/ldm.pq, 400000 bytes, 97 products.
Bus error (core dumped)
sunset 24% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 300000
Creating /cool.pretty/ldm/ldm.pq, 300000 bytes, 73 products.
Bus error (core dumped)
sunset 25% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 260000
Creating /cool.pretty/ldm/ldm.pq, 260000 bytes, 63 products.
sunset 26% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 270000
Creating /cool.pretty/ldm/ldm.pq, 270000 bytes, 65 products.
sunset 27% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 280000
Creating /cool.pretty/ldm/ldm.pq, 280000 bytes, 68 products.
Bus error (core dumped)
sunset 28% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 275000
Creating /cool.pretty/ldm/ldm.pq, 275000 bytes, 67 products.
sunset 29% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 278000
Creating /cool.pretty/ldm/ldm.pq, 278000 bytes, 67 products.
sunset 30% pqcreate -q /cool.pretty/ldm/ldm.pq -v -s 279000
Creating /cool.pretty/ldm/ldm.pq, 279000 bytes, 68 products.
Bus error (core dumped)
For whatever reason, 67 products is ok, but 68 is a no-go.
The exact same behavior is exhibited no matter what local disk
I try to create the queue on:
sunset 32% pqcreate -q /usr2/ldm/ldm.pq -v -s 278000
Creating /usr2/ldm/ldm.pq, 278000 bytes, 67 products.
sunset 33% pqcreate -q /usr2/ldm/ldm.pq -v -s 279000
Creating /usr2/ldm/ldm.pq, 279000 bytes, 68 products.
Bus error (core dumped)
I'm really stumped here.. could it be something with the
memory mapping? In all cases, it seems to create the entire
length of the file, and right at the very end, when the queue
size is almost at, or at its proper size, that's when the core
dump occurs.
I'm going to swap in some different RAM and if that doesn't work,
a new mother board, to see if either of those make any difference.
Pete
To: Steve Chiswell <address@hidden>
cc: address@hidden, address@hidden
Subject: Re: 20000923: sunset downstream sites may feedfrom
profhorn.meteor.wisc.edu
In-reply-to: Your message of "Sat, 23 Sep 2000 19:43:32 MDT."
<address@hidden>
Date: Sat, 23 Sep 2000 21:27:19 -0500
From: Pete Pokrandt <address@hidden>
In a previous message to me, you wrote:
>
>
>Pete,
>
>pqcreate would core dump if you ran out of disk space while trying to
>create the queue.....or if creating the queue was exercising some bad
>disk blocks.
>Assuming you have plenty of disk space, you might want to try the format
>utility to test the disk for bad sectors - and map them out if found.
>
>
>Steve Chiswell
>
Steve and all,
Update number n+3.. new motherboard, new memory, same problem. Still
dumping core. I suppose it is possible that all 3 of the disks that
I am running this on have bad blocks on them, I'll give the format
util a try and see if I can find anything along those lines.
Pete
To: Pete Pokrandt <address@hidden>
Cc: support-ldm, chiz, rkambic
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
Date: Sat, 23 Sep 2000 23:41:23 -0600
From: Russ Rew <address@hidden>
Pete,
Sorry to hear replacing the memory and the other things you've tried
haven't fixed the problem. The dbx traceback you sent showing the bus
error seems to indicate an alignment problem, as if something is being
stored at an address that is not properly aligned for the type of data
that is stored there, for example trying to store a 32-bit integer at
an odd byte address.
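To make the alignment idea concrete, here is a small Python sketch. The helper name is_aligned is hypothetical, and the code only does arithmetic on an address; it does not reproduce the crash. It checks the sx address from the traceback, 0x5833a64, against a few power-of-two alignments. Whether that address is the one involved in the faulting access is not established in this thread.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if address is a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1) == 0


# "sx" argument shown in frame 0 of the dbx traceback.
SX_ADDRESS = 0x5833A64

for alignment in (2, 4, 8):
    print(f"{SX_ADDRESS:#x} aligned to {alignment} bytes: "
          f"{is_aligned(SX_ADDRESS, alignment)}")
```

The address is a multiple of 4 but not of 8, while the traceback shows ctl_init called with align = 8. That is only an observation; it may or may not be related to the bus error.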
I can't remember seeing anything quite like that, and I couldn't
reproduce the problem on an SGI/IRIX 6.5 platform here.
Your experiment with changing queue sizes to show that 67 products
works but 68 products doesn't leads me to believe you might be able to
explicitly set the number of products to a larger number using the
"-S" option to pqcreate. While you're at it, you should probably be
using the "-c" (clobber) option as well, so you don't have to manually
delete the queue each time before you create a new one.
pqcreate just divides the queue size by 4096 to get the number of
product slots to use, but you can specify a different number with the
-S option, something like:
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
for example to make the queue have 6101 product slots instead of 6103.
If you played around with this, you might find a value that worked
with a large queue and there might be a pattern to the bus errors that
depends on the number of product slots.
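The default slot count described above can be checked against the sizes and product counts already reported in this thread. A minimal Python sketch, assuming the division by 4096 truncates to an integer (the function and constant names are hypothetical, not taken from the LDM source):

```python
SLOT_DIVISOR = 4096  # "divides the queue size by 4096", per the description above


def default_product_slots(queue_size_bytes: int) -> int:
    """Default number of product slots, assuming truncating integer division."""
    return queue_size_bytes // SLOT_DIVISOR


# (queue size in bytes, product count printed by "pqcreate -v") from earlier messages.
REPORTED = [
    (250000, 61),
    (400000, 97),
    (300000, 73),
    (260000, 63),
    (270000, 65),
    (280000, 68),
    (275000, 67),
    (278000, 67),
    (279000, 68),
    (25000000, 6103),
]

for size, reported in REPORTED:
    computed = default_product_slots(size)
    status = "matches" if computed == reported else "differs"
    print(f"{size:>9} bytes: computed {computed}, reported {reported} ({status})")
```

Under that assumption every reported count is reproduced, including the 67/68 boundary between 278000 and 279000 bytes.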
This is pure speculation since I can't reproduce the problem, but
maybe you are compiling with a compiler flag or optimization level
that changes the alignment restrictions. For example, if you set the
highest level of optimization when compiling, maybe that requires
strict alignment, whereas if you don't specify optimization but
instead use the debugging flag "-g", looser alignment works.
I'm afraid I'll have to wait until Monday to pursue this, but a little
more information might help:
- Do you have the CFLAGS environment variable set when you build the
LDM? If so, what value?
- Is this the first time you've tried LDM 5.1.2 on this SGI/IRIX
platform (sunset)? If so, what version were you running with
successfully before?
- What kind of platform is profhorn? Are you using LDM 5.1.2 on it?
You may have found a platform-specific bug in LDM 5.1.2, but until we
can reproduce it, we'll have trouble fixing it ...
--Russ
To: Russ Rew <address@hidden>
cc: address@hidden
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
In-reply-to: Your message of "Mon, 25 Sep 2000 12:44:23 MDT."
<address@hidden>
Date: Mon, 25 Sep 2000 14:11:22 -0500
From: Pete Pokrandt <address@hidden>
In a previous message to me, you wrote:
>Pete,
>
>> I'm running on an SGI R4000 with IRIX 6.5, 192 MB of RAM, roughly
>> 200 MB of swap
>
>> Recompiled the ldm, still dumps core.
>
>Could you please try using our precompiled binary for SGI/IRIX
>platforms on sunset, instead of what you compiled? Maybe just use
>pqcreate out of our binary to see if it fails the same way yours
>does. This would eliminate a lot of the possible sources of problems,
>such as which compiler with which flags and libraries you used to
>build LDM 5.1.2.
>
Russ,
Your pqcreate also dumps core. Also rebuilt the kernel and
no luck.
>
>Also, did you get the message I sent Saturday night? If not, I've
>appended another copy.
Yes, but in the flurry of things I was trying I totally forgot about
it. I'll go through that now and get back to you.
Thanks again,
Pete
To: Russ Rew <address@hidden>
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
In-reply-to: Your message of "Mon, 25 Sep 2000 12:44:23 MDT."
<address@hidden>
Date: Mon, 25 Sep 2000 14:57:07 -0500
From: Pete Pokrandt <address@hidden>
In a previous message to me, you wrote:
>
> Pete,
>
>
> Your experiment with changing queue sizes to show that 67 products
> works but 68 products doesn't leads me to believe you might be able to
> explicitly set the number of products to a larger number using the
> "-S" option to pqcreate. While you're at it, you should probably be
> using the "-c" (clobber) option as well, so you don't have to manually
> delete the queue each time before you create a new one.
>
> pqcreate just divides the queue size by 4096 to get the number of
> product slots to use, but you can specify a different number with the
> -S option, something like:
>
> pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
>
> for example to make the queue have 6101 product slots instead of 6103.
> If you played around with this, you might find a value that worked
> with a large queue and there might be a pattern to the bus errors that
> depends on the number of product slots.
Russ,
sunset 35% pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
Creating /cool.pretty/ldm/ldm.pq, 25000000 bytes, 6101 products.
No core dump (woohoo!).
I'll play around with it some more and see if that queue actually
works with the ldm.. Shouldn't be any reason why it wouldn't, right?
> I'm afraid I'll have to wait until Monday to pursue this, but a little
> more information might help:
>
> - Do you have the CFLAGS environment variable set when you build the
> LDM? If so, what value?
Shouldn't be, I'm just running with a straight ./configure with
no CFLAGS env variable set.
>
> - Is this the first time you've tried LDM 5.1.2 on this SGI/IRIX
> platform (sunset)? If so, what version were you running with
> successfully before?
I have been running ldm-5.1.2 on sunset since Sept 2, and a beta
version before that since August 4. Both ran just fine up until
Friday. That's the most bizarre part of this, I didn't change
anything, it just stopped working.. Kinda scary.
>
> - What kind of platform is profhorn? Are you using LDM 5.1.2 on it?
profhorn is RedHat Linux
Red Hat Linux Red Hat Linux release 6.1 (Cartman)
Kernel 2.2.14 on an i686
It is running ldm-5.1.2 beta1 since August 7.
>
> You may have found a platform-specific bug in LDM 5.1.2, but until we
> can reproduce it, we'll have trouble fixing it ...
>
> --Russ
>
To: Russ Rew <address@hidden>
cc: address@hidden
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
In-reply-to: Your message of "Mon, 25 Sep 2000 12:44:23 MDT."
<address@hidden>
Date: Mon, 25 Sep 2000 16:03:57 -0500
From: Pete Pokrandt <address@hidden>
Russ,
I have been playing a bit more with the queue sizes.. It seems
that you are correct, that only certain values for the number
of products work.
I have had success with these:
-----
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6101
(where the default would have been 6103)
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6099
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6098
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6097
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6094
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6093
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6089
and
pqcreate -c -q /usr3/ldm/data/ldm.pq -v -s 650000000 -S 158689
(where the default would have been 158691)
-----
The following all failed:
Default:
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000
Creating /cool.pretty/ldm/ldm.pq, 25000000 bytes, 6103 products.
Bus error (core dumped)
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6102
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6100
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6096
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6095
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6092
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6091
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6090
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6088
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6087
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6086
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6085
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6084
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6083
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6082
pqcreate -c -q /cool.pretty/ldm/ldm.pq -v -s 25000000 -S 6081
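The trial-and-error above could be automated. Here is a rough Python sketch that steps the -S value down from the default until pqcreate succeeds. It uses only the options that appear in the commands above (-c, -q, -v, -s, -S). It assumes that pqcreate exits with status 0 on success, and that the default slot count is the queue size divided by 4096 with truncation. The function names are hypothetical. The demonstration at the end does not run pqcreate at all: it replays the results listed in this message through a simulated runner.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_command(queue_path: str, size_bytes: int, slots: int) -> List[str]:
    """Build a pqcreate command line using the options seen in this thread."""
    return ["pqcreate", "-c", "-q", queue_path, "-v",
            "-s", str(size_bytes), "-S", str(slots)]


def run_pqcreate(command: Sequence[str]) -> bool:
    """Run the command; True only if it exits with status 0 (assumed to mean success).

    A process killed by a signal such as SIGBUS has a negative returncode
    in Python on POSIX systems, so it counts as a failure here.
    """
    return subprocess.run(command).returncode == 0


def find_working_slots(
    queue_path: str,
    size_bytes: int,
    max_tries: int = 20,
    runner: Callable[[Sequence[str]], bool] = run_pqcreate,
) -> Optional[int]:
    """Try the default slot count, then smaller ones; return the first that works."""
    default = size_bytes // 4096  # assumed default, per the earlier description
    for slots in range(default, max(default - max_tries, 0), -1):
        if runner(build_command(queue_path, size_bytes, slots)):
            return slots
    return None


# Simulation only: the -S values reported as successful above for -s 25000000.
WORKED = {6101, 6099, 6098, 6097, 6094, 6093, 6089}


def simulated_runner(command: Sequence[str]) -> bool:
    """Stand-in for run_pqcreate that replays the reported results."""
    return int(command[-1]) in WORKED


if __name__ == "__main__":
    print(find_working_slots("/cool.pretty/ldm/ldm.pq", 25000000,
                             runner=simulated_runner))
```

With the simulated runner the search stops at 6101, the first value below the default that is listed as working. Passing the real run_pqcreate instead would create (and clobber) the queue file on each attempt.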
Pete
To: Russ Rew <address@hidden>
Subject: Re: sunset downstream sites may feedfrom profhorn.meteor.wisc.edu
In-reply-to: Your message of "Mon, 25 Sep 2000 15:56:31 MDT."
<address@hidden>
Date: Mon, 25 Sep 2000 17:05:48 -0500
From: Pete Pokrandt <address@hidden>
In a previous message to me, you wrote:
>Pete,
>
>Thanks for trying our binary and for reporting back on LDM 5.1.2
>pqcreate values that worked and the ones that caused bus errors on
>SGI/IRIX. You're the first one to report this bug, and we have now
>reproduced it here so we have a chance of fixing it. The bus error
>occurs under the following circumstances:
>
> - SGI/IRIX 32-bit platform (things seem to work fine on 64-bit IRIX
> platforms when compiled with -64 flag)
>
> - LDM 5.1.2 (things seem to work with LDM 5.1.2beta3, so this bug was
> introduced late in development)
>
> - certain values of queue size and number of products, as you have
> reported
>
>The workaround, to try different values of number of products with
>"-S" option to pqcreate, will get you going until I can deliver the
>real fix.
Russ,
Got it. Yes, I am running now with the 650 MB queue I produced with
the default minus 2 (-S 158689), and it's running fine. I must have
just been lucky with the previous size queues I had been running with.
Glad I could help find the bug.. well, kinda.. :)
>
>I'll put some sort of announcement onto the ldm-users mailing list
>about this bug and the work-around soon.
>
>Thanks again for your persistence, and sorry we didn't catch this
>during testing ...
No problem, I'm just happy to have a solution that works..
Pete