This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Neil,

I moved your first post to the ldm-users email list into our inquiry tracking system so that we could address one item in more detail than would typically be appropriate for a list...

re:
> Scouring 10 days of online, or even nearline, NIDS and NEXRCOMP being held
> for a teaching environment is a big headache.

Scouring of the NEXRAD Level III (aka NIDS) data has always been a huge challenge. Scouring of the NEXRCOMP images from the FNEXRAD datastream is far easier, as the number of images that have to be dealt with is relatively small.

re:
> Donna Cote has recently polled the community about this issue.
>
> At this file count, crawling thru the millions of inodes with a scheduled
> find command will not keep up with the accumulation rate. (sounds like the
> intersection of two curves, beyond which there is no return; would anyone
> care to derive that solution?)

I agree that this can be true for the NEXRAD Level III images, given the number of NEXRADs there are and the number of products that are issued for each radar as frequently as every 5 minutes.

re:
> This leads me to wonder if one can construct a pqact entry that will do the
> FILE or STDIOFILE action on the incoming product plus remove the same
> product of the same date minus a desired day count. Or the remove action,
> -exec, could be a separate pqact entry on the same product regular
> expression minus the day count, just so it's done at the time of the new
> product arrival.

I wrote a Bourne shell script that gets piped products from a 'pqact' invocation, writes those products to disk, logs the receipt of each product, and then "scours" the products kept down to a user-defined number. We use this script for data feeds with a tractable number of products, e.g., FNEXRAD, NIMAGE, and UNIWISC. I tried using this same approach for the NEXRAD Level III products, but it could not keep up with the number of products being received, even though that test was made a number of years ago when there were far fewer Level III products to deal with. The reason for the failure is that PIPEing a product to a script is a very heavyweight operation, since the script interpreter has to be started and stopped for each invocation.

re:
> Is there a clever way to manipulate the syntax of temporal subexpression
> replacement described in the pqact man pages to generate the desired
> filename to remove? Would that even solve the problem?

I think that the answer to both questions is no.

re:
> Or is there a pqact solution? Are you just left with scripting OS commands
> (find; e.g., scour)?

Way back when I was trying the shell script approach for FILEing and logging the NEXRAD Level III products, we had a "bake off" between different approaches to the scouring problem: the LDM's scour utility vs. a Perl script vs. a Tcl script (mine). Just so you know, there was no real winner in the bake off, but the approach taken by the Tcl and Perl scripts made things easier when the products were organized into daily directories (a concrete sketch of that layout follows below).

While looking into whether there was a better way to delete the files, I ran across the concept of simply unlinking them. All of the documentation that I found warned that this was a _very_ bad thing to do, so I abandoned that investigation.

So, what to do? Apparently, organizations like Google have gotten around problems like this by developing custom (and private) file systems for their storage needs. That type of approach, however, does not fit sites running commodity hardware and software.
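To make the daily-directory idea concrete, here is a minimal, untested sketch. The feed pattern is the stock Unidata NNEXRAD idiom; the output tree (data/nexrad3), the ".nids" suffix, and the 10-day retention are hypothetical placeholders, not a recommendation:

    # Hypothetical pqact.conf entry (fields are tab-separated).  The
    # day-of-month captured as \1, combined with the temporal replacements
    # (\1:yyyy) and (\1:mm) from the pqact man page, names a daily directory.
    NNEXRAD	^SDUS[2357]. .... (..)(..)(..) /p(...)(...)
    	FILE	-close	data/nexrad3/(\1:yyyy)(\1:mm)\1/\5/\4/\4_\2\3.nids

The temporal replacement can only name the current product's date, never "date minus a day count", so the deletion has to live outside of pqact. With this layout, though, scouring degenerates to one 'rm -rf' per expired day instead of a find crawl over millions of inodes. Something like the following, run daily from cron, should suffice (GNU date(1) is assumed for the "-d" option; KEEP and DATADIR are placeholders to adjust):

    #!/bin/sh
    # Minimal scour sketch for the daily-directory layout above.
    KEEP=10                        # days of Level III products to retain
    DATADIR=/data/ldm/nexrad3      # root of the daily YYYYMMDD directories

    cd "$DATADIR" || exit 1
    cutoff=`date -u -d "$KEEP days ago" +%Y%m%d`

    for day in [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$day" ] || continue            # glob did not match anything
        # YYYYMMDD names compare correctly as integers
        [ "$day" -lt "$cutoff" ] && rm -rf "$day"
    done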
The only other thing to do is to test which stock file systems (e.g., EXT2-4, XFS, ReiserFS, ZFS) handle the problem best. As I noted in my post to ldm-users, we have had good luck with the ZFS file system on our Solaris 10 SPARC machines. We have been intending to try ZFS under Linux, but we (meaning our system administrator) have not had time to get to this test. Donna, on the other hand, has moved from XFS to ZFS and seen very good performance boosts.

Cheers,

Tom
--
****************************************************************************
Unidata User Support                                    UCAR Unidata Program
(303) 497-8642                                                 P.O. Box 3000
address@hidden                                             Boulder, CO 80307
----------------------------------------------------------------------------
Unidata HomePage                           http://www.unidata.ucar.edu
****************************************************************************

Ticket Details
===================
Ticket ID: ZDN-294180
Department: Support IDD
Priority: Normal
Status: Closed