Re: 20050302: netCDF General - Large file problem
- Date: Thu, 03 Mar 2005 11:26:40 -0700
>To: address@hidden
>From: "Jim Cowie" <address@hidden>
>Subject: netCDF General - Large file problem
>Organization: RAL
>Keywords: 200503022003.j22K3ZjW025634
Jim,
You wrote:
> I will include a CDL of the file below. As far as I can tell, the
> structure of these variables should allow a large file version,
> according to the restrictions on large files under the
> 3.5 documentation.
It looks like the structure of your CDL violates one of the
restrictions on large files in the "classic" format, as described in
the 3.5 documentation. The netCDF library should have returned an
error when you tried to define "cprob_snow", which is the first
variable to violate the format constraints. The documentation says:
There are important constraints on the structure of large netCDF
files that result from the 32-bit relative offsets that are part of
the netCDF file format:
If you don't use the unlimited dimension, only one variable can
exceed 2 Gbytes in size, but it can be as large as the underlying
file system permits. It must be the last variable in the dataset,
and the offset to the beginning of this variable must be less than
about 2 Gbytes. For example, the structure of the data might be
something like:
netcdf bigfile1 {
dimensions:
x=2000;
y=5000;
z=10000;
variables:
double x(x); // coordinate variables
double y(y);
double z(z);
double var(x, y, z); // 800 Gbytes
}
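The rule quoted above can be checked with a few lines of Python. This is only a sketch: it ignores the header size, takes "about 2 Gbytes" to mean 2**31 bytes, and the name `fits_classic` is made up for illustration, not part of any netCDF API.

```python
from itertools import accumulate

# Assumed limit: "about 2 Gbytes" from the quoted documentation, taken as 2**31.
LIMIT = 2**31

def fits_classic(sizes_in_bytes):
    """Return True if every variable starts below LIMIT (header ignored).

    sizes_in_bytes lists the data size of each fixed-size variable,
    in the order the variables are defined.
    """
    # Each variable starts where the previous ones end.
    starts = [0] + list(accumulate(sizes_in_bytes))[:-1]
    return all(start < LIMIT for start in starts)

# Sizes for x, y, z and var from the bigfile1 example (doubles are 8 bytes).
bigfile1 = [8 * 2000, 8 * 5000, 8 * 10000, 8 * 2000 * 5000 * 10000]

print(fits_classic(bigfile1))        # True: the 800 Gbyte variable is last
print(fits_classic(bigfile1[::-1]))  # False: the big variable comes first
```

The size of the last variable never enters the check, which is why it alone may exceed 2 Gbytes.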
From your CDL, the offset to the beginning of the cprob_snow variable
is at least the size of the header plus the size of the data arrays
for all the previous variables. Ignoring the size of the header, I
just added up the sizes of the preceding data arrays with the help of
a little Python script. The resulting starting offsets, in bytes, are
0 type_bytes
4 forc_time
12 creation_time
20 num_sites
24 site_list
9224 T
211977224 max_T
264969224 min_T
317961224 dewpt
529929224 wind_u
741897224 wind_v
953865224 wind_speed
1165833224 cloud_cov
1377801224 visibility
1589769224 prob_fog
1801737224 prob_thunder
2013705224 cprob_rain
2225673224 cprob_snow
2437641224 cprob_ice
2649609224 prob_precip06
2861577224 prob_precip24
2914569224 qpf06
so the last five variables all have offsets larger than 2**31 =
2147483648. The netCDF 3.5.0 library should have returned an error
when you tried to define cprob_snow with an nf_def_var() call; if
it didn't, that's a bug. I verified that the 3.6.0 library does
return an error in this case, as it should.
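For anyone who wants to reproduce the table, here is a minimal Python sketch of that kind of calculation. It is not the script used for this message: the names `DIMS`, `VARIABLES` and `data_offsets` are invented for illustration, the dimension lengths and types are copied from the CDL at the end of this message, and the header size is ignored as above.

```python
from math import prod

# Dimension lengths copied from the CDL in this message.
DIMS = {
    "max_site_num": 2300, "num_eqns": 30, "var_regressors": 3, "days": 16,
    "fc_times_per_day": 4, "daily_time": 1, "weight_vals": 4,
}
TYPE_SIZE = {"int": 4, "float": 4, "double": 8}  # bytes per value

SIX_HOURLY = ("max_site_num", "days", "fc_times_per_day",
              "num_eqns", "var_regressors", "weight_vals")
DAILY = ("max_site_num", "days", "daily_time",
         "num_eqns", "var_regressors", "weight_vals")

# (name, type, dimensions) in definition order.
VARIABLES = [
    ("type", "int", ()), ("forc_time", "double", ()),
    ("creation_time", "double", ()), ("num_sites", "int", ()),
    ("site_list", "int", ("max_site_num",)),
    ("T", "float", SIX_HOURLY), ("max_T", "float", DAILY),
    ("min_T", "float", DAILY),
] + [(name, "float", SIX_HOURLY) for name in (
    "dewpt", "wind_u", "wind_v", "wind_speed", "cloud_cov", "visibility",
    "prob_fog", "prob_thunder", "cprob_rain", "cprob_snow", "cprob_ice",
    "prob_precip06")
] + [("prob_precip24", "float", DAILY), ("qpf06", "float", SIX_HOURLY)]

def data_offsets(variables, dims=DIMS):
    """Map each variable name to its starting offset in bytes, header ignored."""
    offsets = {}
    position = 0
    for name, ctype, shape in variables:
        offsets[name] = position
        nbytes = TYPE_SIZE[ctype] * prod(dims[d] for d in shape)
        # The classic format pads each variable to a 4-byte boundary
        # (no effect here, since every size is already a multiple of 4).
        position += (nbytes + 3) // 4 * 4
    return offsets

offsets = data_offsets(VARIABLES)
too_far = [name for name, start in offsets.items() if start >= 2**31]

for name, start in offsets.items():
    print(start, name)
print("offset too large:", too_far)
```

With the dimensions above, `too_far` lists cprob_snow, cprob_ice, prob_precip06, prob_precip24 and qpf06, matching the five variables identified in the table.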
So this data can't all be stored in a classic format netCDF file, but
the good news is that it can probably be stored and accessed fine in a
64-bit offset format netCDF file, and it may be possible to fix the
headers of the CDF1 files to make them CDF2 files and copy the data
into the CDF2 files with no data loss.
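To tell which of the two formats a given file is in, it is enough to look at its first four bytes: the characters "CDF" followed by a version byte, 1 for classic and 2 for 64-bit offset. The sketch below only inspects a file and does not convert anything; the function name `netcdf_format` is made up for illustration.

```python
def netcdf_format(path):
    """Classify a file as CDF1 or CDF2 from its 4-byte magic number."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"CDF\x01":
        return "classic (CDF1)"
    if magic == b"CDF\x02":
        return "64-bit offset (CDF2)"
    # Anything else, including files shorter than 4 bytes.
    return "not CDF1 or CDF2"
```

Calling `netcdf_format()` with the path to one of the existing files should report "classic (CDF1)" if it was written by the 3.5.0 library.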
It's also possible that the 3.5.0 library wrote values for the last
five variables in the wrong place, in which case the data could not be
recovered. With a little work, I think I could convert the file you
provided to a CDF2 file and dump out the first few values of each of
the last five variables. If I did that, could you tell if they looked
right? Do you have a lot of files in this form, so that it would be
worth trying to recover the data in this way?
--Russ
netcdf gfs00_dmos_emp.20050217.0040 {
dimensions:
max_site_num = 2300 ;
num_eqns = 30 ;
var_regressors = 3 ;
days = 16 ;
fc_times_per_day = 4 ;
daily_time = 1 ;
weight_vals = 4 ;
variables:
int type ;
type:long_name = "cdl file type" ;
double forc_time ;
forc_time:long_name = "time of earliest forecast" ;
forc_time:units = "seconds since 1970-1-1 00:00:00" ;
double creation_time ;
creation_time:long_name = "time at which forecast file was created" ;
creation_time:units = "seconds since 1970-1-1 00:00:00" ;
int num_sites ;
num_sites:long_name = "number of actual_sites" ;
int site_list(max_site_num) ;
site_list:long_name = "forecast site list" ;
float T(max_site_num, days, fc_times_per_day, num_eqns, var_regressors,
weight_vals) ;
T:long_name = "temperature" ;
T:units = "Celsius" ;
float max_T(max_site_num, days, daily_time, num_eqns, var_regressors,
weight_vals) ;
max_T:long_name = "maximum temperature" ;
max_T:units = "Celsius" ;
float min_T(max_site_num, days, daily_time, num_eqns, var_regressors,
weight_vals) ;
min_T:long_name = "minimum temperature" ;
min_T:units = "Celsius" ;
float dewpt(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
dewpt:long_name = "dewpoint" ;
dewpt:units = "Celsius" ;
float wind_u(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
wind_u:long_name = "u-component of wind" ;
wind_u:units = "meters per second" ;
float wind_v(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
wind_v:long_name = "v-component of wind" ;
wind_v:units = "meters per second" ;
float wind_speed(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
wind_speed:long_name = "wind speed" ;
wind_speed:units = "meters per second" ;
float cloud_cov(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
cloud_cov:long_name = "cloud cover" ;
cloud_cov:units = "percent*100" ;
float visibility(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
visibility:long_name = "visibility" ;
visibility:units = "km" ;
float prob_fog(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
prob_fog:long_name = "probability of fog" ;
prob_fog:units = "percent*100" ;
float prob_thunder(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
prob_thunder:long_name = "probability of thunder" ;
prob_thunder:units = "percent*100" ;
float cprob_rain(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
cprob_rain:long_name = "conditional probability of rain" ;
cprob_rain:units = "percent*100" ;
float cprob_snow(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
cprob_snow:long_name = "conditional probability of snow" ;
cprob_snow:units = "percent*100" ;
float cprob_ice(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
cprob_ice:long_name = "conditional probability of ice" ;
cprob_ice:units = "percent*100" ;
float prob_precip06(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
prob_precip06:long_name = "probability of precipitation, 6 hr" ;
prob_precip06:units = "percent*100" ;
float prob_precip24(max_site_num, days, daily_time, num_eqns,
var_regressors, weight_vals) ;
prob_precip24:long_name = "probability of precipitation, 24 hr" ;
prob_precip24:units = "percent*100" ;
float qpf06(max_site_num, days, fc_times_per_day, num_eqns,
var_regressors, weight_vals) ;
qpf06:long_name = "amount of precipitation" ;
qpf06:units = "mm" ;
data:
}