[netCDF #MAN-367636]: netCDF-4 file grows enormous when dimension set to unlimited



Hi Margaret,

Thanks for sending the file. The netCDF-4 enhanced model feature you are
using is the primitive type "uint64" for one of the variables:

  uint64 qualityFlags(report_number) ;

The netCDF classic data model has only 8-, 16-, and 32-bit signed integer
types in addition to char, float, and double types. The enhanced model
added signed and unsigned 64-bit types as well as other unsigned integer
types.
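
If you want to check a header for enhanced-model types yourself, here is a
minimal Python sketch (not a netCDF or Unidata tool) that scans one-line CDL
variable declarations from ncdump -h output. The helper name
enhanced_variables and the extra sample lines are hypothetical, and the
pattern assumes plain identifiers (letters, digits, underscores).

```python
import re

# Primitive types added by the netCDF-4 enhanced data model. The classic-model
# primitives (byte, short, int, char, float, double) are deliberately absent.
ENHANCED_TYPES = {"ubyte", "ushort", "uint", "int64", "uint64", "string"}

# A one-line CDL variable declaration: type, name, optional dimension list, ";".
DECLARATION = re.compile(r"^\s*(\w+)\s+(\w+)\s*(\([^)]*\))?\s*;")


def enhanced_variables(cdl_lines):
    """Return (type, name) pairs for variables declared with enhanced-model types."""
    found = []
    for line in cdl_lines:
        match = DECLARATION.match(line)
        # Attribute lines ("var:attr = ...") and dimension lines ("dim = n ;")
        # do not match the declaration pattern, so they are skipped.
        if match and match.group(1) in ENHANCED_TYPES:
            found.append((match.group(1), match.group(2)))
    return found


header = [
    "variables:",
    "\tuint64 qualityFlags(report_number) ;",
    "\t\tqualityFlags:long_name = \"quality flags\" ;",  # hypothetical attribute
    "\tfloat irradiance(report_number) ;",  # hypothetical classic-type variable
]
print(enhanced_variables(header))  # [('uint64', 'qualityFlags')]
```

Any non-empty result means the file cannot be marked as a netCDF-4 classic
model file without changing those variables' types.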

The other issue is chunking: with small chunk sizes, the netCDF-4/HDF5
storage overhead can be large, in extreme cases like the one you have
encountered. All 61 of the variables in your file use the report_number
dimension, as can be seen with

  ncdump -h OR_EXIS-L1b-SFEU_G16_s20151772200010_e20151772200010_c20151772201050.nc | grep -v '[=:]'

(the grep filter removes all the attribute and dimension declarations, to
show just the variable declarations).
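
The same filtering can be sketched in Python, assuming the ncdump -h output
has already been captured as a string. The function names and the sample
header (including the num_channels dimension) are hypothetical; like the
grep command, the sketch relies on dimension declarations containing '=' and
attribute lines containing ':'.

```python
import re

# Hypothetical header in the style of "ncdump -h" output.
SAMPLE_HEADER = """netcdf sample {
dimensions:
\treport_number = UNLIMITED ; // (currently 1)
\tnum_channels = 23 ;
variables:
\tuint64 qualityFlags(report_number) ;
\t\tqualityFlags:long_name = "quality flags" ;
\tfloat counts(report_number, num_channels) ;
\tint version ;
}
"""


def variable_declarations(cdl_header):
    """Mimic grep -v '[=:]' and keep only the variable declaration lines."""
    # Same rule as the grep filter: dimension declarations contain '=',
    # attributes and section labels such as "variables:" contain ':'.
    kept = [ln for ln in cdl_header.splitlines() if "=" not in ln and ":" not in ln]
    # Unlike grep, also drop the "netcdf ... {" and "}" lines and blank lines.
    return [ln.strip() for ln in kept if ln.strip().endswith(";")]


def uses_dimension(declaration, dim):
    """True if dim is in the declaration's parenthesized dimension list."""
    match = re.search(r"\(([^)]*)\)", declaration)
    return bool(match) and dim in [d.strip() for d in match.group(1).split(",")]


decls = variable_declarations(SAMPLE_HEADER)
print(decls)
print(sum(uses_dimension(d, "report_number") for d in decls))  # 2
```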

Any variable that uses an unlimited dimension must use chunked storage, and
the default chunk sizes are not good in this case, resulting in lots of
chunks of size 1, 1x23, 1x35, and 1x4, as can be seen with

  ncdump -s -h OR_EXIS-L1b-SFEU_G16_s20151772200010_e20151772200010_c20151772201050.nc | grep _ChunkSizes

Each chunked variable in an HDF5 file has an associated B-tree data
structure used to index its individual chunks, and the B-tree overhead is
extreme for small chunks.
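
To see why small chunk lengths along an unlimited dimension add up, here is a
small Python sketch that only counts chunks; it does not model HDF5's actual
per-chunk byte overhead. The 10,000-record growth scenario and the (n, 23)
shape are hypothetical, not taken from your file. With chunk length 1 along
report_number, every appended record adds a new chunk to each variable, and
each chunk is one more entry that variable's B-tree must index.

```python
from math import prod


def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def num_chunks(shape, chunk_shape):
    """Number of chunks needed to cover a variable of the given shape."""
    return prod(ceil_div(n, c) for n, c in zip(shape, chunk_shape))


# Hypothetical growth: report_number reaches 10,000 records.
records = 10_000
for chunk_shape in [(1, 23), (1024, 23)]:
    print(chunk_shape, num_chunks((records, 23), chunk_shape))
# (1, 23) 10000
# (1024, 23) 10
```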

The good news is that it's relatively easy to change the chunk sizes to
something reasonable using nccopy. I'll send more about that in a
subsequent response.
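
As a rough sketch of the general idea, pending that follow-up: nccopy's -c
option takes a chunk specification of the form dim/length,dim/length. The
Python helper below (a hypothetical name) only assembles that command line.
The chunk length 1024 is a placeholder, not a recommendation from this
thread; check the nccopy documentation for your version for any limits on
chunk lengths.

```python
import shlex


def nccopy_rechunk_args(infile, outfile, chunks):
    """Build an nccopy command that rechunks along the named dimensions.

    chunks maps dimension name -> chunk length and is rendered in the
    "dim/length,dim/length" form that nccopy's -c option accepts.
    """
    spec = ",".join(f"{dim}/{length}" for dim, length in chunks.items())
    return ["nccopy", "-c", spec, infile, outfile]


# 1024 is a hypothetical placeholder chunk length, not a tuned value.
args = nccopy_rechunk_args("in.nc", "out.nc", {"report_number": 1024})
print(shlex.join(args))  # nccopy -c report_number/1024 in.nc out.nc
# To actually run it (requires nccopy on PATH):
# import subprocess; subprocess.run(args, check=True)
```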

--Russ

> Thanks for the reply, Ward. Here is the file. It is possible that some
> enhanced features are being used and I don't know about them. Anyway, I
> appreciate your taking a look.
> 
> Meg
> 
> address@hidden> wrote:
> 
> > Hello Margaret,
> >
> > In regards to the first question, I wonder if the issue is that the
> > variables along this dimension have default fill values set; if so, this
> > may explain the explosion in file size. Would it be possible to get a copy
> > of the netCDF file to play around with? Also, what version of netCDF are
> > you working with?
> >
> > In regards to your second question, it sounds like there is either a bug
> > in nccopy that makes it think you are using the enhanced model, or perhaps
> > the file is using some small part of the enhanced model. That is just a
> > guess; if you can provide the original 192K file, I will figure out why
> > nccopy is giving this error.
> >
> > Thanks in advance, have a great day,
> >
> > -Ward
> >
> > > Hello,
> > >
> > > We have a 192K netCDF-4 file with no unlimited dimensions. One of its
> > > dimensions is report_number, which has length 1. When this dimension is
> > > changed to "UNLIMITED // (currently 1)", the file expands to 213M, over
> > > 1000 times as large as the original. Do you know what could be causing
> > > this? Is there any way to avoid it aside from compressing the file?
> > >
> > > Second question: I contacted Unidata a while ago, and Russ told me that we
> > > likely had netCDF-4 files that didn't actually use any of the enhanced
> > > features. The data producer has confirmed that this is the case. I was
> > > interested in seeing if using the netCDF-4 classic model would reduce the
> > > file size, because other people here at NOAA have found netCDF-4 classic
> > > sizes to be on par with netCDF-3 (i.e., much less than the netCDF-4
> > > enhanced model). However, when I used the nccopy utility Russ recommended,
> > > I got the following error:
> > >
> > > "Attempting netcdf-4 operation on strict nc3 netcdf-4 file"
> > >
> > > Is there some way to change the "strict nc3" flag, since this is really a
> > > netCDF-4 file?
> > >
> > > Thanks for any information.
> > >
> > > Meg Tilton
> > >
> > >
> > >
> > >
> > >
> > > address@hidden> wrote:
> > >
> > > > Meg,
> > > >
> > > > > Strange! That means our files are the netCDF-4 enhanced version, so I'm
> > > > > surprised anyone could get ncdump -x to work on them. I guess it will
> > > > > remain an unsolved mystery.
> > > >
> > > > A possible explanation for the mystery is that your netCDF-4 files
> > > > really don't use any features of the enhanced model, but aren't
> > > > marked as netCDF-4 classic model files.  A netCDF-4 classic model
> > > > file is just a netCDF-4 file with a special scalar integer attribute
> > > > named "_nc3_strict" in the root group, which is tested to enforce
> > > > never adding any features of the enhanced netCDF-4 data model to
> > > > the file, so that it will always be readable using the netCDF-3 API.
> > > >
> > > > Back in version 4.1.1, I don't think ncdump tested the file type, it
> > > > just printed whatever it could see through the API and displayed it
> > > > as NcML when the "-x" flag was used. But the ncdump code was never
> > > > modified to present the NcML representations for any of the netCDF-4
> > > > enhanced model features, partly because those representations were
> > > > still under development when netCDF 4.1.1 was released.
> > > >
> > > > If your current netCDF files are really netCDF-4 files that don't
> > > > use any enhanced data model features, then you could mark them as
> > > > netCDF-4 classic model files using the "nccopy" utility:
> > > >
> > > >   nccopy -k "netCDF-4 classic model" foo4.nc foo4c.nc
> > > >
> > > > to convert a netCDF-4 file to a netCDF-4 classic model file. That
> > > > would add the extra attribute (invisible through the netCDF API).
> > > > You could also do the same thing through the HDF5 API, which would
> > > > permit adding the attribute to existing files in place, something
> > > > nccopy doesn't permit.
> > > >
> > > > --Russ
> > > >
> > > > > address@hidden> wrote:
> > > > > >
> > > > > > Meg,
> > > > > >
> > > > > > > Thanks for your responses to my email.
> > > > > > >
> > > > > > > When I ran ncdump -k on one of our netCDF4 files, the response
> > > > > > > was just "netCDF-4." Does this mean it's the enhanced model, and it
> > > > > > > would say "classic" otherwise? Or is it the other way around?
> > > > > >
> > > > > > It's the other way around.  The outputs from ncdump -k are one of
> > > > > > these four strings:
> > > > > >
> > > > > >   classic
> > > > > >   64-bit offset
> > > > > >   netCDF-4
> > > > > >   netCDF-4 classic model
> > > > > >
> > > > > > The "-x" option works to specify NcML output for all but the third
> > > > > > of those format variants, "netCDF-4".
> > > > > >
> > > > > > > I will pass on the information about the Java netCDF library to
> > > > > > > the NOAA people who run our CLASS archive. They were the ones who
> > > > > > > were originally asking about this, and they need to develop code to
> > > > > > > extract the NcML from netCDF4 files. So this may be very useful for
> > > > > > > them.
> > > > > >
> > > > > > --Russ
> > > > > >
> > > > > > > address@hidden> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Margaret Tilton - NOAA Affiliate,
> > > > > > > >
> > > > > > > > Your Ticket has been received, and a Unidata staff member will
> > > > > > > > review it and reply accordingly. Listed below are details of this
> > > > > > > > new Ticket. Please make sure the Ticket ID remains in the Subject:
> > > > > > > > line on all correspondence related to this Ticket.
> > > > > > > >
> > > > > > > >     Ticket ID: NRE-269426
> > > > > > > >     Subject: ncdump -x with netCDF4 files
> > > > > > > >     Department: Support netCDF
> > > > > > > >     Priority: Normal
> > > > > > > >     Status: Open
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Margaret Tilton
> > > > > > > Cooperative Institute for Research in Environmental Sciences (CIRES) at
> > > > > > > University of Colorado at Boulder and
> > > > > > > NOAA National Geophysical Data Center, Solar and Terrestrial Physics
> > > > > > > Division
> > > > > > > 325 Broadway, E/GC2
> > > > > > > Boulder, Colorado 80305
> > > > > > > 303-497-6223
> > > > > > >
> > > > > > >
> > > > > > Russ Rew                                         UCAR Unidata Program
> > > > > > address@hidden                      http://www.unidata.ucar.edu
> > > > > >
> > > > > >
> > > > > >
> > > > > > Ticket Details
> > > > > > ===================
> > > > > > Ticket ID: NRE-269426
> > > > > > Department: Support netCDF
> > > > > > Priority: Normal
> > > > > > Status: Closed
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Margaret Tilton
> > > Cooperative Institute for Research in Environmental Sciences (CIRES)
> > > at the University of Colorado and NOAA National Centers for
> > > Environmental Information (NCEI)
> > > 325 Broadway, E/GC2
> > > Boulder, Colorado 80305
> > > 303-497-6223
> > >
> > >
> >
> >
> > Ticket Details
> > ===================
> > Ticket ID: MAN-367636
> > Department: Support netCDF
> > Priority: Normal
> > Status: Closed
> >
> >
> 
> 