
Re: 20001015: Creation of large netCDF datasets using ncgen and CDL files



>To: address@hidden
>From: Frederic J Chagnon <address@hidden>
>Subject: Creation of large netCDF datasets using ncgen and CDL files
>Organization: UCAR/Unidata
>Keywords: 200010132335.e9DNZ8417714

Hi Frederic,

> I am attempting to create a netCDF file to store some model output, and am
> using a method that has worked for the past year. But now, the file I am 
> creating is very large (3 Gb), and there seems to be a problem.

The netCDF format uses 32-bit file offsets and extents, so in general
the size of a netCDF file is limited to about 2 Gbytes (2^31, since
off_t is a signed type).  However, there are some exceptions, and
below we suggest how you could keep your data in a single file by
taking advantage of one of these exceptions.

On systems that support 64-bit offsets and a large file environment
with files exceeding 2 Gbytes (IRIX64, SunOS 5.x for SPARC v9,
OSF1/Alpha, ...) it is possible to create and access very large netCDF
files.  The remaining size constraints are that the file offset to the
beginning of the record variables (if any) must be less than 2 Gbytes,
and the relative offset to the start of each fixed length variable or
each record variable within a record must be less than 2 Gbytes.
Hence, a very large netCDF file might have

  * no record variables, some ordinary fixed-length variables, and one
    very large (exceeding 2 Gbytes) fixed-length variable; or
  * some ordinary fixed-length and record variables, and one very large
    record variable; or
  * some ordinary fixed-length and record variables and a huge number
    of records.
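
As a rough illustration of the offset constraint behind these cases, here is a
minimal sketch. It assumes a simplified model in which the header size is
ignored and fixed-length variables are laid out back to back in definition
order; the helper names `begin_offsets` and `offsets_representable` are
hypothetical and are not part of the netCDF library.

```python
LIMIT = 2**31  # offsets of 2^31 or more do not fit in a signed 32-bit value


def begin_offsets(sizes, start=0):
    """Return the starting offset of each variable when laid out back to back."""
    offsets, pos = [], start
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


def offsets_representable(sizes, start=0):
    """True if every variable begins below the 2^31 limit (header ignored)."""
    return all(off < LIMIT for off in begin_offsets(sizes, start))


small = 100 * 10**6   # an "ordinary" variable of about 100 Mbytes (made-up size)
huge = 3 * 10**9      # one very large variable of about 3 Gbytes (made-up size)

ok_layout = offsets_representable([small, small, huge])   # huge variable last
bad_layout = offsets_representable([huge, small])         # huge variable first
print(ok_layout, bad_layout)
```

Under this model only the starting offsets matter, which is why a single very
large variable is acceptable as long as nothing has to start after it.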

If you create very large netCDF files, they will only be usable on
other systems that support very large files.  The netCDF file format
has not changed, so files less than 2 Gbytes in size are still
writable and readable on all systems on which netCDF is supported.

To eliminate the weaker file size constraints described above would require a
new netCDF format.  So far the original format (version 1, since 1987)
has been sufficient for all versions of the software through the
latest netCDF 3.5 release.  Implementing software that would support a
new format (based on HDF-5) but that would also continue to permit
access to files in the previous format has been in our long-term plans
for netCDF.  But there will be no release in the near future that
doesn't have the above restrictions on variable size or number of
records in a netCDF file.

> Here's my method:
> I have a CDL file in which I define the extents of the domain and all the
> variables. I use the ncgen -o command to create a netCDF file based on the
> CDL file. To test the "goodness" of the file created, I use the ncdump -c
> command.

> This method works well for me if I am creating datasets that are smaller than
> 1.5 GB. However, I have just added more time step definitions to my CDL file,
> and while it creates the netCDF file seamlessly, the ncdump -c command
> returns an error.
>
> I have even downloaded the latest version of the netCDF libraries
> (netcdf-3.5-beta) in hope that it would solve the problem, but I have had no
> luck.
>
> I am puzzled, because if I modify the CDL file and decrease either the number 
> of variables, or the size of the domain, the resulting netCDF file "works".
>
> Below is the error message from the ncdump -c command. If you could enlighten
> me in any way on the matter, I would be most grateful. I would be glad to
> supply my CDL file (but didn't want to bother you with an attachment unless
> needed.) Thanks.
>
> ncdump -c OSU_YEAR_2D.nc
> netcdf OSU_YEAR_2D {
> dimensions:
>         londot = 75 ;
>         latdot = 50 ;
>         loncrs = 75 ;
>         latcrs = 50 ;
>         levela = 23 ;
>         levelb = 24 ;
>         levelc = 1 ;
>         time = 8760 ;
> variables:
>         float GROUNDT(time, levelc, latcrs, loncrs) ;
>                 GROUNDT:long_name = "GROUND TEMPERATURE" ;
> (...)
>
>         double time(time) ;
>                 time:time_origin = "01-JAN-1998:00:00:00" ;
>                 time:units = "seconds" ;
>                 time:point_spacing = "even" ;
>
> // global attributes:
>                 :title = "OSU_YEAR SIMULATION OUTPUT" ;
> data:
>
>  londot = ncdump: Invalid argument

So it looks like the GROUNDT variable takes about 131 Mbytes (8760 * 1 *
50 * 75 values of 4 bytes each).  Only 16 such variables fit below
2^31 bytes = 2.1475 Gbytes, so with more than 16 of them the file grows
past that limit and the offsets of the later variables can no longer be
represented with the current netCDF format.
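
The estimate can be checked with a few lines of arithmetic.  This is only a
sketch: it uses the dimension lengths from the ncdump output above, assumes 4
bytes per float, and ignores the size of the file header.

```python
# Dimension lengths copied from the ncdump output above.
dims = {"time": 8760, "levelc": 1, "latcrs": 50, "loncrs": 75}
FLOAT_BYTES = 4   # size of one netCDF "float" value
LIMIT = 2**31     # offsets of 2^31 or more do not fit in a signed 32-bit value

# Number of values in GROUNDT(time, levelc, latcrs, loncrs).
values = 1
for length in dims.values():
    values *= length
groundt_bytes = values * FLOAT_BYTES

# How many GROUNDT-sized variables fit entirely below the limit.
fit_below_limit = LIMIT // groundt_bytes

print(groundt_bytes)                 # 131400000
print(fit_below_limit)               # 16
print(17 * groundt_bytes > LIMIT)    # True
```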

However, if you defined `time' as a record (unlimited) dimension, each
record of a variable like GROUNDT would only require about 15000 bytes
(1 * 50 * 75 values of 4 bytes each), and you could have up to 2^31
records, so with this structure you should be able to keep all your
data in a single netCDF file.
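
To see why the record layout helps, here is a small sketch of the same
arithmetic per record.  The count of 20 variables is a made-up number for
illustration, all of them are assumed to have the same shape as GROUNDT, and
the header size is again ignored.

```python
FLOAT_BYTES = 4
LIMIT = 2**31
levelc, latcrs, loncrs = 1, 50, 75
n_records = 8760   # length of the time dimension in the ncdump output above

# One record of GROUNDT when time is the record (unlimited) dimension.
record_bytes = levelc * latcrs * loncrs * FLOAT_BYTES

# Hypothetical number of GROUNDT-shaped record variables in the file.
n_vars = 20
bytes_per_record = n_vars * record_bytes    # all variables, one time step
total_bytes = n_records * bytes_per_record  # whole file, header ignored

print(record_bytes)               # 15000
print(bytes_per_record < LIMIT)   # True
print(total_bytes > LIMIT)        # True
```

The offsets within a record stay far below 2^31 even though the file as a
whole exceeds it, which is the case that works on systems with large file
support.  In the CDL file, this corresponds to declaring the dimension as
`time = UNLIMITED ;` instead of `time = 8760 ;`.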

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
address@hidden                     http://www.unidata.ucar.edu