This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
>To: address@hidden
>From: "Jim Cowie" <address@hidden>
>Subject: netCDF General - Large file problem
>Organization: RAL
>Keywords: 200503022003.j22K3ZjW025634

Jim,

You wrote:

> I will include a CDL of the file below. As far as I can tell, the
> structure of these variables should allow a large file version,
> according to the restrictions on large files under the 3.5
> documentation.

It looks like the structure of your CDL violates one of the restrictions on large files in the "classic" format, as described in the 3.5 documentation. However, the netCDF library should have returned an error when you tried to define the variable "cprob_snow", which is the first variable to violate these format constraints:

  There are important constraints on the structure of large netCDF files that result from the 32-bit relative offsets that are part of the netCDF file format:

  If you don't use the unlimited dimension, only one variable can exceed 2 Gbytes in size, but it can be as large as the underlying file system permits. It must be the last variable in the dataset, and the offset to the beginning of this variable must be less than about 2 Gbytes. For example, the structure of the data might be something like:

    netcdf bigfile1 {
        dimensions:
            x=2000;
            y=5000;
            z=10000;
        variables:
            double x(x);          // coordinate variables
            double y(y);
            double z(z);
            double var(x, y, z);  // 800 Gbytes
    }

From your CDL, the offset to the beginning of the cprob_snow variable is at least the size of the header plus the size of the data arrays for all the previous variables. Ignoring the size of the header, I added up the sizes of the preceding data arrays with the help of a little Python script.
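The script itself isn't included in the message. A minimal sketch of that kind of calculation might look like the following. It assumes the CDL listing in this message is complete, ignores the header (as the message does), and ignores padding, which is harmless here because every array size is already a multiple of 4 bytes. The names DIMS, VARIABLES and data_offsets are hypothetical.

```python
# Hypothetical reconstruction of the offset calculation; the original script
# was not included. Dimension sizes and shapes are copied from the CDL listing.
import math

DIMS = {
    "max_site_num": 2300, "num_eqns": 30, "var_regressors": 3, "days": 16,
    "fc_times_per_day": 4, "daily_time": 1, "weight_vals": 4,
}
TYPE_SIZE = {"int": 4, "float": 4, "double": 8}  # bytes per value in netCDF-3

def fc_shape(time_dim):
    """The 6-D forecast arrays differ only in their time-of-day dimension."""
    return ("max_site_num", "days", time_dim, "num_eqns", "var_regressors", "weight_vals")

FC4 = fc_shape("fc_times_per_day")
DAILY = fc_shape("daily_time")

# (name, type, dimensions) in definition order, which is also data order.
VARIABLES = [
    ("type", "int", ()), ("forc_time", "double", ()), ("creation_time", "double", ()),
    ("num_sites", "int", ()), ("site_list", "int", ("max_site_num",)),
    ("T", "float", FC4), ("max_T", "float", DAILY), ("min_T", "float", DAILY),
    ("dewpt", "float", FC4), ("wind_u", "float", FC4), ("wind_v", "float", FC4),
    ("wind_speed", "float", FC4), ("cloud_cov", "float", FC4),
    ("visibility", "float", FC4), ("prob_fog", "float", FC4),
    ("prob_thunder", "float", FC4), ("cprob_rain", "float", FC4),
    ("cprob_snow", "float", FC4), ("cprob_ice", "float", FC4),
    ("prob_precip06", "float", FC4), ("prob_precip24", "float", DAILY),
    ("qpf06", "float", FC4),
]

def data_offsets(variables):
    """Return [(name, offset, size)] in bytes, ignoring the header and padding."""
    result, offset = [], 0
    for name, vtype, dims in variables:
        # math.prod of an empty tuple is 1, so scalars get one value's size.
        size = TYPE_SIZE[vtype] * math.prod(DIMS[d] for d in dims)
        result.append((name, offset, size))
        offset += size
    return result

for name, offset, size in data_offsets(VARIABLES):
    # A classic-format offset must fit in a signed 32-bit integer.
    flag = "  <-- does not fit in 32 bits" if offset >= 2**31 else ""
    print(f"{offset:12d} {name}{flag}")
```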
In units of bytes, these offsets work out to:

           0  type
           4  forc_time
          12  creation_time
          20  num_sites
          24  site_list
        9224  T
   211977224  max_T
   264969224  min_T
   317961224  dewpt
   529929224  wind_u
   741897224  wind_v
   953865224  wind_speed
  1165833224  cloud_cov
  1377801224  visibility
  1589769224  prob_fog
  1801737224  prob_thunder
  2013705224  cprob_rain
  2225673224  cprob_snow
  2437641224  cprob_ice
  2649609224  prob_precip06
  2861577224  prob_precip24
  2914569224  qpf06

so the last five variables all have offsets larger than 2**31 = 2147483648. The netCDF 3.5.0 library should have returned an error when you tried to define cprob_snow with an nf_def_var() call, and if it didn't, that's a bug. I verified that the 3.6.0 library does return an error in this case, as it should.

So this data can't all be stored in a classic format netCDF file. The good news is that it can probably be stored and accessed fine in a 64-bit offset format netCDF file, and it may be possible to fix the headers of the CDF1 (classic) files to make them CDF2 (64-bit offset) files and copy the data into the CDF2 files with no data loss. It's also possible that the 3.5.0 library wrote values for the last five variables in the wrong place, in which case the data could not be recovered.

With a little work, I think I could convert the file you provided to a CDF2 file and dump out the first few values of each of the last five variables. If I did that, could you tell whether they looked right? Do you have a lot of files in this form, so that it would be worth trying to recover the data in this way?
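The header conversion discussed above starts from how the two formats identify themselves. According to the netCDF file format specification, a netCDF-3 file begins with the three bytes "CDF" followed by a version byte: 1 for classic (CDF1) and 2 for 64-bit offset (CDF2). The sketch below only reads that byte to report which format a file claims to be; netcdf3_format is a hypothetical helper, not part of the netCDF library, and it does not convert anything.

```python
# Sketch: report which netCDF-3 variant a file's header claims to be.
# A netCDF-3 file starts with b"CDF" plus a version byte:
# 1 = classic (CDF1), 2 = 64-bit offset (CDF2).
import sys

VERSIONS = {1: "classic (CDF1)", 2: "64-bit offset (CDF2)"}

def netcdf3_format(path):
    """Return a description of the format, or None if the magic number isn't CDF1/CDF2."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if len(magic) != 4 or magic[:3] != b"CDF":
        return None
    return VERSIONS.get(magic[3])  # indexing a bytes object yields an int

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, netcdf3_format(path) or "not a CDF1/CDF2 file")
```

Turning a CDF1 file into a CDF2 file takes more than changing that version byte: in CDF2 the begin offset recorded in the header for each variable is a 64-bit integer rather than a 32-bit one, so the header grows by 4 bytes per variable and the begin values and data placement have to be adjusted to match.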
--Russ

netcdf gfs00_dmos_emp.20050217.0040 {
dimensions:
        max_site_num = 2300 ;
        num_eqns = 30 ;
        var_regressors = 3 ;
        days = 16 ;
        fc_times_per_day = 4 ;
        daily_time = 1 ;
        weight_vals = 4 ;
variables:
        int type ;
                type:long_name = "cdl file type" ;
        double forc_time ;
                forc_time:long_name = "time of earliest forecast" ;
                forc_time:units = "seconds since 1970-1-1 00:00:00" ;
        double creation_time ;
                creation_time:long_name = "time at which forecast file was created" ;
                creation_time:units = "seconds since 1970-1-1 00:00:00" ;
        int num_sites ;
                num_sites:long_name = "number of actual_sites" ;
        int site_list(max_site_num) ;
                site_list:long_name = "forecast site list" ;
        float T(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                T:long_name = "temperature" ;
                T:units = "Celsius" ;
        float max_T(max_site_num, days, daily_time, num_eqns, var_regressors, weight_vals) ;
                max_T:long_name = "maximum temperature" ;
                max_T:units = "Celsius" ;
        float min_T(max_site_num, days, daily_time, num_eqns, var_regressors, weight_vals) ;
                min_T:long_name = "minimum temperature" ;
                min_T:units = "Celsius" ;
        float dewpt(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                dewpt:long_name = "dewpoint" ;
                dewpt:units = "Celsius" ;
        float wind_u(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                wind_u:long_name = "u-component of wind" ;
                wind_u:units = "meters per second" ;
        float wind_v(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                wind_v:long_name = "v-component of wind" ;
                wind_v:units = "meters per second" ;
        float wind_speed(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                wind_speed:long_name = "wind speed" ;
                wind_speed:units = "meters per second" ;
        float cloud_cov(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                cloud_cov:long_name = "cloud cover" ;
                cloud_cov:units = "percent*100" ;
        float visibility(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                visibility:long_name = "visibility" ;
                visibility:units = "km" ;
        float prob_fog(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                prob_fog:long_name = "probability of fog" ;
                prob_fog:units = "percent*100" ;
        float prob_thunder(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                prob_thunder:long_name = "probability of thunder" ;
                prob_thunder:units = "percent*100" ;
        float cprob_rain(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                cprob_rain:long_name = "conditional probability of rain" ;
                cprob_rain:units = "percent*100" ;
        float cprob_snow(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                cprob_snow:long_name = "conditional probability of snow" ;
                cprob_snow:units = "percent*100" ;
        float cprob_ice(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                cprob_ice:long_name = "conditional probability of ice" ;
                cprob_ice:units = "percent*100" ;
        float prob_precip06(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                prob_precip06:long_name = "probability of precipitation, 6 hr" ;
                prob_precip06:units = "percent*100" ;
        float prob_precip24(max_site_num, days, daily_time, num_eqns, var_regressors, weight_vals) ;
                prob_precip24:long_name = "probability of precipitation, 24 hr" ;
                prob_precip24:units = "percent*100" ;
        float qpf06(max_site_num, days, fc_times_per_day, num_eqns, var_regressors, weight_vals) ;
                qpf06:long_name = "amount of precipitation" ;
                qpf06:units = "mm" ;
data:
}
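The layout in the CDL above can also be checked mechanically against both netCDF-3 formats. The sketch below paraphrases the size rules from the netCDF large-file documentation as I understand them: in classic format every variable's begin offset must fit in a signed 32-bit integer and only the last fixed-size variable may need more than 2**31 - 4 bytes; in 64-bit offset format offsets are 64-bit, but no fixed-size variable except the last may need more than 2**32 - 4 bytes. It assumes there are no record variables (true for this CDL) and ignores the header, so the offsets are approximate. The names LAYOUT and violations are hypothetical.

```python
# Sketch: test the CDL's variable layout against paraphrased netCDF-3 size rules.
# Assumptions: header size ignored, no record variables, no padding needed.
import math

DIMS = {"max_site_num": 2300, "num_eqns": 30, "var_regressors": 3, "days": 16,
        "fc_times_per_day": 4, "daily_time": 1, "weight_vals": 4}
FC4 = ("max_site_num", "days", "fc_times_per_day", "num_eqns", "var_regressors", "weight_vals")
DAILY = ("max_site_num", "days", "daily_time", "num_eqns", "var_regressors", "weight_vals")

# (name, bytes per value, dimensions), in the order the CDL defines them
LAYOUT = [("type", 4, ()), ("forc_time", 8, ()), ("creation_time", 8, ()),
          ("num_sites", 4, ()), ("site_list", 4, ("max_site_num",))]
LAYOUT += [(n, 4, DAILY if n in ("max_T", "min_T", "prob_precip24") else FC4) for n in (
    "T", "max_T", "min_T", "dewpt", "wind_u", "wind_v", "wind_speed", "cloud_cov",
    "visibility", "prob_fog", "prob_thunder", "cprob_rain", "cprob_snow", "cprob_ice",
    "prob_precip06", "prob_precip24", "qpf06")]

def violations(layout, fmt):
    """Names of variables that break the rules of fmt ("classic" or "64bit")."""
    if fmt not in ("classic", "64bit"):
        raise ValueError(f"unknown format {fmt!r}")
    bad, offset = [], 0
    for i, (name, width, dims) in enumerate(layout):
        size = width * math.prod(DIMS[d] for d in dims)
        is_last = i == len(layout) - 1
        if fmt == "classic":
            # 32-bit begin offsets; only the last variable may exceed ~2 GiB
            if offset > 2**31 - 1 or (not is_last and size > 2**31 - 4):
                bad.append(name)
        elif not is_last and size > 2**32 - 4:
            # 64-bit offsets, but non-last fixed-size variables stay under ~4 GiB
            bad.append(name)
        offset += size
    return bad

print("classic:", violations(LAYOUT, "classic"))
print("64-bit offset:", violations(LAYOUT, "64bit"))
```

For this CDL the classic check should flag cprob_snow, cprob_ice, prob_precip06, prob_precip24 and qpf06, while the 64-bit offset check flags nothing, since each array here is about 212 MB at most.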