This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Dave,

> I've just borrowed from a sample pqact.conf file for a GEMPAK
> installation (provided by Tom Yoksas) a pattern for action in my
> pqact.conf file. Rather than trigger a decoder (a la virtually every
> action in the sample file), though, I'm just trying to file the data.
>
> A regular expression issue comes up. Here's a simplified example that
> I hope illustrates my conceptual problem:
>
> WMO (^a)|(^b|(c|d)) .... ([0-3][0-9])([0-2][0-9])..
> FILE (\?:yy)(\?:mm)\?\(?+1)_type.wmo
>
> where "?" represents an integer that matches the parenthetical
> expression ([0-3][0-9]) (the day of the month) and "?+1" matches the
> next parenthetical expression, ([0-2][0-9]). The letters a, b, c, and
> d represent strings of one or more regular expressions without
> parentheses.
>
> The question is, what should "?" be?
>
> I have two conceptual uncertainties here. First, when two
> parenthetical expressions are separated by "|", are the two referred
> to by separate (sequential) values of \n (where n is an integer), or
> are they both referred to by the same value of \n (since they
> represent possibly mutually exclusive alternatives)?

Two parenthetical subexpressions separated by "|" would have two
different \n backreferences.

> Second, when parentheses are nested, how should the expressions they
> enclose be counted when determining an appropriate value of \n?

Backreference \n always refers to the subexpression enclosed by the
n-th unescaped left parenthesis.

> In the example above, "?" could be anywhere from 2 to 4, depending on
> the answers to these questions, and in one instance the number could
> vary depending on which option of the highest-level "|" ("or")
> structure in the example above is realized.
>
> The actual pattern that I'm working with is supposed to capture ship,
> buoy, and CMAN data and looks like this:
>
> WMO (^S[IMN]V[^GINS])|^S[IMN]W[^KZ]|(^S(HV|HXX|S[^X]))|(^SX(VD|V.50|US(2[0-3]|08|40|82|86)))|(^Y[HO]XX84) .... ([0-3][0-9])([0-2][0-9])..
> FILE data/surface/(\n:yy)(\n:mm)\n\m_boy.wmo
>
> where \n and \m are to be determined to get the date and time when the
> data were recorded.

There are some things wrong with the above extended regular expression
(ERE). As I recall, the first field in a WMO header has six characters:
four letters followed by two digits. The above ERE, however, would
match, for example, "SIVA ", "SIWA ", "SHV ", "SHXX ", and "SSA ",
none of which fit the pattern of the first field of a WMO header.

To simplify things, you can always break up a complicated ERE into
multiple pqact(1) entries, each one handling a subset of the
complicated ERE.

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: RJS-786355
Department: Support LDM
Priority: Normal
Status: Closed
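
The counting rule given in the reply (group n is opened by the n-th unescaped left parenthesis) can be checked against the simplified pattern from the question. The sketch below uses Python's re module rather than the POSIX ERE library that pqact(1) uses, on the assumption that the two number groups the same way. The literal letters a-d stand in for the real sub-patterns, and the sample product identifier is invented for illustration.

```python
import re

# Simplified pattern from the question. The literal letters a-d stand in
# for the real sub-patterns (an assumption made for illustration only).
PATTERN = r"(^a)|(^b|(c|d)) .... ([0-3][0-9])([0-2][0-9]).."

compiled = re.compile(PATTERN)

# Hypothetical product identifier, invented for this example.
SAMPLE = "d KWBC 151200"


def show_groups(text):
    """Return {group number: captured text or None} for the first match."""
    m = compiled.search(text)
    if m is None:
        return {}
    return {n: m.group(n) for n in range(1, compiled.groups + 1)}


if __name__ == "__main__":
    # Left parentheses, counted left to right:
    #   1: (^a)   2: (^b|(c|d))   3: (c|d)   4: day   5: hour
    print("capture groups:", compiled.groups)
    for number, value in show_groups(SAMPLE).items():
        print(number, repr(value))
```

Counting this way, the day is group 4 and the hour is group 5 in the simplified pattern, whichever alternative matched. A group that takes no part in the match is simply empty (None in Python). Since "|" binds less tightly than concatenation, a string matching only the first alternative, such as "a", leaves groups 4 and 5 empty in this sketch.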