Re: NetCDF/PERL and strings (or CHARs if you prefer)
- Subject: Re: NetCDF/PERL and strings (or CHARs if you prefer)
- Date: Fri, 12 Nov 1999 12:54:42 -0700 (MST)
On Fri, 12 Nov 1999, Steve Diggs wrote:
> Robb,
>
> I got your name from Steve Emmerson at Unidata.
>
> My problem, I'm sure, is a common one. I'm not new to either NetCDF or
> Perl, but I've only been using the Perl/NetCDF interface for a few days.
> I'm having issues with the way that NetCDF treats string data.
>
> For instance, I might have an array of scalars that looks like this:
>
> @atmospheric_conditions = qw ( cloudy sunny fog rain );
> @count_ac = ($#atmospheric_conditions);
>
> This is close to my actual problem, so if I decide to write this array of
> atmospheric conditions to a NetCDF file, I would do the following:
>
> my @start = (0);
> my $varid_ac = NetCDF::vardef($ncid, 'ATM_COND', NetCDF::CHAR, $dimid);
>
> # .. then I leave define mode and write the data out
>
> NetCDF::varput($ncid, $varid_ac, \@start, \@count_ac,
> \@atmospheric_conditions);
>
> I will only get back: 'clo' (the 1st 3 chars from the 1st element of the
> array above). What's the deal? How on earth do I encode variable length
> strings in Perl arrays into NetCDF CHAR arrays?
Steve,
NetCDF doesn't handle variable-length strings; you need to pad all strings
with nulls to a fixed length. The CDL file defines the variable type and
scope for all the writes. You can download the decoders package and look at
the netCDF Perl decoders as examples. I am also attaching an example.
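
A minimal sketch of the padding step, in plain Perl with no NetCDF calls so it can be run on its own. It assumes the strings go into a two-dimensional CHAR variable (string index by string length, like site_name(siteNum, maxNameLength) in the attached CDL). The helper name pad_strings is hypothetical, not part of the NetCDF Perl module. Note also that $#atmospheric_conditions is the last index of the array (3), not a length, which may be why only three characters came back. Whether varput accepts several rows in one call laid out this way is an assumption; the attached script writes one row per call.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the NetCDF Perl module): pad each string
# with nulls to a common width and return a flat list of single characters,
# row by row.
sub pad_strings {
    my ($width, @strings) = @_;
    my @chars;
    for my $s (@strings) {
        # "a$width" pads with nulls (and truncates anything longer)
        my $padded = pack("a$width", $s);
        push @chars, split(//, $padded);
    }
    return @chars;
}

my @atmospheric_conditions = qw(cloudy sunny fog rain);

# Width of the longest string; this would be the length of the second
# (string-length) dimension of the CHAR variable.
my $maxlen = 0;
for my $s (@atmospheric_conditions) {
    $maxlen = length($s) if length($s) > $maxlen;
}

my @chars = pad_strings($maxlen, @atmospheric_conditions);

# start/count vectors for a two-dimensional CHAR variable
my @start = (0, 0);
my @count = (scalar(@atmospheric_conditions), $maxlen);

printf "%d strings x %d chars = %d values\n",
    $count[0], $count[1], scalar(@chars);
```

The count vector here has one entry per dimension (number of strings, padded width), which is what the attached script does with @count = (1, 80) for a single site name.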
> Better yet, if I give
> this output NetCDF data set to someone unfamiliar with the data, what
> gyrations do they have to go through in NetCDF/Perl to extract the values
> correctly?
It would be better if you wrote the extraction process using NetCDF Perl
or the NetCDF library. This package is not the easiest to use, but once you
have some working examples it makes sense.
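
A sketch of what the extraction side might look like, again in plain Perl so it runs without a NetCDF file. It assumes the CHAR values arrive as numeric character codes, as the attached script implies by calling chr() on each value it reads, and that each string occupies one fixed-width, null-padded row. The helper name unpad_strings and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat list of character codes (one row per
# string, $width codes per row) back into Perl strings, cutting each
# string at its first null.
sub unpad_strings {
    my ($width, @codes) = @_;
    my @strings;
    while (my @row = splice(@codes, 0, $width)) {
        my $s = join('', map { chr($_) } @row);
        $s =~ s/\0.*\z//s;    # drop the null padding
        push @strings, $s;
    }
    return @strings;
}

# Stand-in for values read from a file: "fog" and "rain" padded to 6 chars
my @codes = map { ord($_) } split(//, "fog\0\0\0" . "rain\0\0");

my @strings = unpad_strings(6, @codes);
print join(',', @strings), "\n";
```

Someone unfamiliar with the data would only need the padded width, which is the length of the string-length dimension and can be read from the file itself.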
Robb...
>
> I'm sure that you've answered this question before, so a solid example or
> really good documentation would be sufficient. BTW, why isn't it
> apparent in the NetCDF-Perl documentation that the weirdness above happens
> and what to do about it?
>
> thanks,
> -sd
>
> p.s. I'll send you a piece of code that illustrates this behaviour in my
> next message to you.
>
> --
> --------------------------------------------------------------------
> Steve Diggs Voice: (858)534-1108
> Scripps Institution of Oceanography FAX : (858)534-7383
> WOCE Hydrographic Program Office/STS EMAIL: address@hidden
> 9500 Gilman Drive WWW : whpo.ucsd.edu
> La Jolla, CA 92093-0214
> --------------------------------------------------------------------
>
===============================================================================
Robb Kambic Unidata Program Center
Software Engineer III Univ. Corp for Atmospheric Research
address@hidden WWW: http://www.unidata.ucar.edu/
===============================================================================
#!/opt/bin/perl
#
# statsproc
#
# written by: M. Baltuch
# date written: February 1997
#
# description:
# This file parses the raw stats input and writes both a netCDF file
# containing the latency data, as well as a topo file.
#
# input files: ~ftp/pub/idd/ldmstats/latency.input raw latency input
# ~/usr/local/ldm/etc/stats.conf site configuration file
#
# output files: ~ftp/pub/idd/ldmstats/topology.cur current topo data
# ~ftp/pub/idd/ldmstats/<feed><yyyymm>.nc latency data
# feed datastream
# yyyy year
# mm month number
# ~ftp/pub/idd/ldmstats/<yyyymmddhh>.stats old style stats
# yyyy year files
# mm month number
# dd day number
# hh hour number
# ~/usr/local/ldm/logs/latency.log log file
#
# Modification History:
# who when what
###############################################################################
use NetCDF;
# necessary files
$latencydir = "/home/ftp/pub/idd/ldmstats";
$idddir = "/home/idd";
$ncbindir = "/usr/local/ldm/util";
$archivedir = "/data/iddstats";
$inputfile = "$latencydir/latency.input";
$rawfile = "$latencydir/latency.input.1";
$archivefile = "$archivedir/latency.dat";
$conffile = "$idddir/etc/stats.conf";
$topofile = "$latencydir/topology.cur";
$junkfile = "$latencydir/stats.junk";
$logfile = "$idddir/logs/latency.log";
$cdlfile = "$idddir/etc/latency.cdl";
# shift the current input file
rename $inputfile, $rawfile;
system("touch $inputfile");
chmod 0664,$inputfile;
# make sure any writes to rawfile have completed
sleep(5);
# copy the raw file to the archive
#system("cat $rawfile >> $archivefile");
# open necessary files
open (RAWFILE, "$rawfile") || &bad_exit("Could not open $rawfile: $!");
open (CONFFILE, "$conffile") || &bad_exit("Could not open $conffile: $!");
open (TOPOFILE, ">> $topofile") || &bad_exit("Could not open $topofile: $!");
open (LOGFILE, ">> $logfile") || &bad_exit("Could not open $logfile: $!");
# log start
&print_log("Started");
# read in the configuration file
while (<CONFFILE>) {
    ($confname, $confstatus) = split (/[ \t\n]+/);
    $HOSTSTATUS{$confname} = $confstatus;
}
close CONFFILE;
# main processing loop
$binfh = 'fh00';
$numproc = 0;
$nc_flag = 0;
while (<RAWFILE>) {
    undef $avglat;
    undef $maxlat;
    if (/LDMBINSTAT/) {
        $numproc++;
        while(<RAWFILE>) {
            if (/LDMEND/) {
                last;
            }
            else {
                ($version,$host,$bin,$feed,$numprods,$numbytes,$plus) =
                    split(' ', $_ );
                $host = lc($host);
                chomp($plus);
                if ($plus eq "+") {
                    chomp ( $_ ); # get rid of CR
                    chop ( $_ ); # get rid of plus sign
                    $add = <RAWFILE>;
                    $_ .= $add;
                    ($lsttime,$origin,$avglat,$maxlat) = split(' ', $add);
                    ($maxlat, $dummy) = split('\@',$maxlat);
                    $nc_flag = 1;
                }
                else {
                    &print_log("OLDSTATS $host $bin $feed");
                    $nc_flag = 0;
                }
                if ($bin !~ /^199|^200/) {
                    open (JUNK, ">> $junkfile") ||
                        &bad_exit("Could not open $junkfile: $!");
                    print JUNK;
                    close JUNK;
                    &print_log("BADENTRY $host $bin $feed $avglat $maxlat");
                    $nc_flag = 0;
                    next;
                }
                elsif (!defined $BINFILES{$bin}) {
                    open($binfh, ">> $latencydir/$bin.stats") ||
                        &bad_exit("Could not open $latencydir/$bin.stats: $!");
                    chmod 0664,"$latencydir/$bin.stats";
                    $BINFILES{$bin} = $binfh;
                    $binfh++;
                    $bfh++;
                }
                select($BINFILES{$bin});
                # change HRS to HDS
                s#HRS#HDS#;
                ( @F ) = split( / /, $_ );
                $F[ 1 ] = lc( $F[ 1 ] );
                #print $_;
                print "@F";
                undef( @F );
                # write to the netCDF file
                if ($nc_flag == 1) {
                    write_ncfile();
                }
                # check for too many file handles open
                if ($bfh == 15) {
                    foreach $key (keys %BINFILES) {
                        close($BINFILES{$key});
                    }
                    undef(%BINFILES);
                    $bfh = 0;
                }
            }
        }
    }
    elsif (/TOPOLOGY/) {
        while(<RAWFILE>) {
            if (/TOPOEND/) {
                last;
            }
            else {
                print TOPOFILE;
            }
        }
    }
}
# close all stats files
foreach $key (keys %BINFILES) {
    close($BINFILES{$key});
}
# close all netCDF files;
foreach $key (keys %NCFILEIDS) {
    NetCDF::close($NCFILEIDS{$key});
}
# close up the rest
close RAWFILE;
close TOPOFILE;
# make sure that the topology file has the right permissions
chmod 0664, $topofile;
# log finish
&print_log("Finished: $numproc sites processed");
close LOGFILE;
# eliminate old data in stats files
chdir( "$latencydir" ) ;
@FILES = split( /[ \t\n]+/, `/bin/ls -rx *.stats` ) ;
#
for( $i = 0; $i <= 24; $i++ ) {
    open( STATS, $FILES[ $i ] ) || die "could not open $FILES[ $i ]: $!\n" ;
    while( <STATS> ) {
        ( $version, $host, $bintime, $feedtype, $number, $bytes,
          $theBin, $source, $avgLat, $maxLat )
            = split( ' ', $_ );
        $DATA{ "$host $feedtype $source" } = $_ ;
    }
    close( STATS ) ;
    open( STATS, ">$FILES[ $i ]" ) ||
        die "could not open $FILES[ $i ]: $!\n" ;
    foreach $entry ( sort keys( %DATA ) ) {
        print STATS $DATA{ $entry } ;
    }
    close( STATS ) ;
    undef( %DATA ) ;
}
# all done
exit 0;
###############################################################################
# bad exit routine
###############################################################################
sub bad_exit {
    local($err_str) = @_;
    local($date_str) = &get_date();
    print STDERR "$date_str procstats: $err_str\n";
    print LOGFILE "$date_str procstats: $err_str\n";
    exit -1;
}
###############################################################################
# Date routine. Gets date and time as GMT in the same format as the LDM log
# file.
###############################################################################
sub get_date {
    @month_array = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    local($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        gmtime(time());
    local($date_string) =
        sprintf("%s %d %02d:%02d:%02d UTC", $month_array[$mon], $mday,
                $hour, $min,$sec);
    return $date_string;
}
###############################################################################
# Log message printing routine
###############################################################################
sub print_log {
    local($msg_str) = @_;
    local($date_str) = &get_date();
    print LOGFILE "$date_str procstats: $msg_str\n";
}
###############################################################################
# write data to netCDF file
###############################################################################
sub write_ncfile {
    if ($feed eq "NONE" || $feed eq "DOWN") {
        return;
    }
    local($filename) = "$feed";
    $filename =~ s/\Q|// ;
    $filename .= substr $bin, 0, 6;
    $filename .= ".nc";
    local($yeardir) = substr $bin, 0, 4;
    local($ncdir) = "$latencydir/$yeardir";
    local($recnum) = 0;
    # get the hour and day indices
    $bin =~ /\d{6}(\d{2})(\d{2})/;
    local($daynum) = $1;
    $daynum -= 1;
    local($hournum) = $2;
    # do we need to make a new netCDF file?
    if (! -e "$ncdir/$filename") {
        # make sure the year directory exists first
        if (! -e $ncdir) {
            mkdir($ncdir,0775) || &bad_exit("Can't create $ncdir: $!");
        }
        # now create the file
        system("$ncbindir/ncgen -o $ncdir/$filename $cdlfile");
    }
    # open the appropriate netCDF file if not already open
    if (!defined $NCFILEIDS{$filename}) {
        $ncid = NetCDF::open("$ncdir/$filename", NetCDF::WRITE);
        $NCFILEIDS{$filename} = $ncid;
        $numncid++;
        # get the size of the record dimension
        $dimid = NetCDF::dimid($ncid,"siteNum");
        $nameid = "xxxxxxx";
        $rsize = -1;
        NetCDF::diminq($ncid, $dimid, $nameid, $rsize);
        $RECSIZE{$filename} = $rsize;
        # read the site_name variable to get the recnum's
        $varid = NetCDF::varid($ncid,"site_name");
        $SITEVARIDS{$filename} = $varid;
        if ($rsize > 0) {
            for ($i = 0; $i < $RECSIZE{$filename}; $i++) {
                # each site name is one fixed-width row of 80 chars
                @start = ($i, 0);
                @count = (1, 80);
                @sitename = ("\0" x 80);
                NetCDF::varget($ncid, $varid, \@start, \@count, \@sitename);
                # rebuild the name, stopping at the first pad character
                # (null or backslash)
                $newsite = "";
                for ($j = 0; $j < 80; $j++) {
                    $siteChr = chr($sitename[$j]);
                    last if( $siteChr eq "\0" || $siteChr eq "\\" ) ;
                    $newsite .= $siteChr ;
                }
                $SITENUM{$filename}{$newsite} = $i;
            }
        }
        # get the variable id for the avg and max latency variables, as well
        # as for the host status variable
        $STATUSVARIDS{$filename} = NetCDF::varid($ncid, "node_status");
        $MAXVARIDS{$filename} = NetCDF::varid($ncid, "max_latency");
        $AVGVARIDS{$filename} = NetCDF::varid($ncid, "avg_latency");
    }
    else {
        $ncid = $NCFILEIDS{$filename};
    }
    # do we know about this site yet in this file
    if (!defined $SITENUM{$filename}{$host}) {
        if (!defined $HOSTSTATUS{$host}) {
            &print_log("HOSTEXIST $host $bin $feed $avglat $maxlat");
            return;
        }
        $SITENUM{$filename}{$host} = $RECSIZE{$filename};
        $RECSIZE{$filename}++;
        @start = ($SITENUM{$filename}{$host}, 0);
        @count = (1, 80);
        # pad the name to 80 chars with nulls; double quotes are needed
        # here, since a single-quoted '\0' is a backslash followed by a zero
        for ($i = 0; $i < 80; $i++) {
            $padstr[$i] = "\0";
        }
        for ($i = 0; $i < length($host); $i++) {
            $padstr[$i] = substr($host, $i, 1);
        }
        NetCDF::varput($ncid,$SITEVARIDS{$filename},\@start, \@count,\@padstr);
        @stat_indices = ($SITENUM{$filename}{$host});
        $hoststat = $HOSTSTATUS{$host} * 1;
        NetCDF::varput1($ncid,$STATUSVARIDS{$filename}, \@stat_indices,
                        $hoststat);
    }
    $hindex = $SITENUM{$filename}{$host};
    @indices = ($hindex,$daynum,$hournum);
    $avglat *= 1.0;
    $maxlat *= 1.0;
    NetCDF::varput1($ncid, $AVGVARIDS{$filename}, \@indices, $avglat);
    NetCDF::varput1($ncid, $MAXVARIDS{$filename}, \@indices, $maxlat);
    # finally, make sure we don't have too many netCDF files open
    if ($numncid == 21) {
        foreach $key (keys %NCFILEIDS) {
            NetCDF::close($NCFILEIDS{$key});
        }
        undef %NCFILEIDS;
        undef %RECSIZE;
        undef %SITENUM;
        undef %SITEVARIDS;
        undef %MAXVARIDS;
        undef %AVGVARIDS;
        undef %STATUSVARIDS;
        $numncid = 0;
    }
}
netcdf latency { // IDD latency definition
dimensions:
    siteNum = UNLIMITED;
    day = 31;
    hour = 24;
    maxNameLength = 80;
variables:
    float avg_latency(siteNum, day, hour);
        avg_latency:long_name = "Average Latency Report";
        avg_latency:_FillValue = -99999.f;
        avg_latency:units = "seconds since 1970-01-01 00 UTC";
    float max_latency(siteNum, day, hour);
        max_latency:long_name = "Maximum Latency Report";
        max_latency:_FillValue = -99999.f;
        max_latency:units = "seconds since 1970-01-01 00 UTC";
    byte node_status(siteNum);
        node_status:long_name = "host relay status";
        node_status:_FillValue = '\377';
    char site_name(siteNum, maxNameLength);
        site_name:long_name = "Site Hostname Index";
    :title = "Latency definition";
// general notes:
//
// Files will contain one month's worth of data for a single feed. It will be
// stored in a directory indicating the year. A configuration file will be
// kept in ~ldm/etc/stats.conf. This will provide current relay/leaf status
// for each host site and will be used when first adding a site to the data
// file.
}