[netCDFJava #MQO-415619]: Efficiently serializing NetCDF in memory objects
- Subject: [netCDFJava #MQO-415619]: Efficiently serializing NetCDF in memory objects
- Date: Tue, 12 Dec 2017 11:49:54 -0700
Sorry, I posted the proposal to the netcdf news group and forgot to copy it
to the GitHub site. I will do that ASAP.
>
> This is good news. I am eager to provide feedback on your proposal.
> I did not find any proposal in the linked issue. Did you forget to push? :)
>
> Also, the issue has not been updated with our latest discussions. Do you
> want me to fix that?
>
> Regards,
>
> Michaël
>
> address@hidden>:
>
> > It appears I can do this. You might examine my proposed
> > API and tell me if it will serve your purposes.
> > See https://github.com/Unidata/netcdf-c/issues/708
> >
> > > I will investigate and see if I can use this function
> > > to get the desired effect.
> > > Thanks for bringing it to my attention.
> > >
> > > >
> > > > The HDF Group forum has pointed me to the H5Pset_file_image
> > > > <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFileImage>
> > > > function. This is documented further here
> > > > <https://support.hdfgroup.org/HDF5/doc/Advanced/FileImageOperations/HDF5FileImageOperations.pdf>
> > > > and seems to be available from HDF5 1.8.9 upwards.
> > > > So I should rejoice, as the limitation no longer comes from the HDF5
> > > > software. Yet I am a bit suspicious: this feature has been available
> > > > since mid-2012, and I fear there is a good reason you are not using it.
> > > >
> > > > What do you think? I am willing to try implementing it if someone can
> > > > mentor me.
> > > >
> > > > Regards,
> > > >
> > > > Michaël
> > > >
> > > > address@hidden>:
> > > >
> > > > > I think that as long as the mmap is set up with non-persist, then
> > > > > the only writes to disk will occur with paging.
> > > > >
> > > > > > Actually, my solution is less than optimal since, using mmap, data is
> > > > > > still written to disk eventually.
> > > > > >
> > > > > > The goal is to avoid both writes to disk and data copies. This is
> > > > > > sometimes called "zero copy" optimization, and it would work because
> > > > > > in my application we enforce that input data is read-only. Output
> > > > > > data is obviously modified, but it can be treated as read-only just
> > > > > > before transmission.
> > > > > >
> > > > > > I have started a thread on the HDF forum on the topic. I am also
> > > > > > looking into the Apache Commons VFS RAM filesystem as a fallback
> > > > > > workaround.
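For the Java side of the "zero copy" idea, the closest standard-library tool is FileChannel.transferTo, which lets the operating system move file bytes to a target channel (for example a SocketChannel) without staging them in a Java byte array. This is a minimal sketch under two assumptions: the dataset has already been written to a file, and the target channel is blocking. The class name ZeroCopySend is hypothetical, whether the kernel really avoids the copy depends on the platform and the kind of target channel, and it does nothing about the HDF5 in-memory limitation discussed in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: streams a finished netCDF file to a channel. */
final class ZeroCopySend {
    private ZeroCopySend() {}

    /**
     * Sends the whole file to a blocking target channel and returns the
     * number of bytes sent. transferTo may move fewer bytes than requested,
     * so we loop until the entire file has been transferred.
     */
    static long sendFile(Path file, WritableByteChannel target) {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            while (sent < size) {
                // On some platforms (e.g. Linux with a socket target) the JDK
                // can delegate this to a kernel-level zero-copy primitive.
                long n = in.transferTo(sent, size - sent, target);
                if (n <= 0) {
                    break; // file shrank or target accepted nothing; stop rather than spin
                }
                sent += n;
            }
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the transport is ZeroMQ, its Java socket is typically not a java.nio channel, so the bytes would have to be handed over as a buffer or array instead.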
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > address@hidden>:
> > > > > >
> > > > > > > Actually, we do use mmap already. The problem is that the mapped memory
> > > > > > > is hidden deep in, for example, the HDF5 memory driver, so there is no
> > > > > > > way I can see to get access to it. If we did have access to it, then
> > > > > > > of course we could copy it out and give you the functionality you
> > > > > > > desire.
> > > > > > >
> > > > > > > >
> > > > > > > > My ideal use case would allow users to create/modify NetCDF4 datasets
> > > > > > > > in memory. Once done, my software would read the binary data and send
> > > > > > > > it over the network.
> > > > > > > > One workaround I see is to use a POSIX memory-mapped file
> > > > > > > > <https://en.wikipedia.org/wiki/Memory-mapped_file> to trick the
> > > > > > > > library into working in memory.
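On the Java side, a read-only, memory-mapped view of a file that the library has already written can be obtained with FileChannel.map, the JDK's interface to mmap. This sketch assumes the netCDF file already exists; placing it on a RAM-backed filesystem such as tmpfs is a deployment assumption, not something the library provides. The class name MappedNetcdfBytes is hypothetical. Note that this maps the finished file, not the HDF5 core driver's private buffer, which Dennis's reply above describes as hidden inside the driver.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: read-only memory-mapped view of a netCDF file. */
final class MappedNetcdfBytes {
    private MappedNetcdfBytes() {}

    /**
     * Maps the whole file read-only. The mapping stays valid after the
     * channel is closed; the OS pages the contents in lazily on access.
     */
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned MappedByteBuffer can be passed to any API that accepts a ByteBuffer without first copying the file into a heap array.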
> > > > > > > >
> > > > > > > > I now understand this issue originates from a limitation of the HDF5
> > > > > > > > library. I can raise this issue with the HDF Group. Publications such
> > > > > > > > as this one <https://support.hdfgroup.org/pubs/papers/Big_HDF_FAQs.pdf>
> > > > > > > > (see also this
> > > > > > > > <https://www.hdfgroup.org/2015/03/from-hdf5-datasets-to-apache-spark-rdds/>)
> > > > > > > > seem to indicate they would be interested in the feature. Indeed,
> > > > > > > > avoiding unnecessary transfers to disk is key to achieving good
> > > > > > > > performance in Big Data systems (this is the whole point of Apache
> > > > > > > > Spark, BTW).
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > address@hidden>:
> > > > > > > >
> > > > > > > > > This question has come up before.
> > > > > > > > > This is currently not possible for netcdf-4 files.
> > > > > > > > > Using the NC_DISKLESS mode flag allows for keeping
> > > > > > > > > the file in memory. The nc_open_mem function allows for
> > > > > > > > > read-only access, treating a chunk of memory as if it were
> > > > > > > > > a netcdf file. Unfortunately, for netcdf-4, we ultimately
> > > > > > > > > depend on the HDF5 operation H5Pset_fapl_core
> > > > > > > > > (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFaplCore).
> > > > > > > > > Upon close(), this can optionally store the memory buffer in a file.
> > > > > > > > > Unfortunately, it does not (currently) provide an option to copy out
> > > > > > > > > the memory.
> > > > > > > > > The only solution I can see for now is to build an alternative to the
> > > > > > > > > core driver that provides access (somehow) to the memory.
> > > > > > > > > BTW, this is all going on at the netcdf-c library level. Our pure
> > > > > > > > > Java HDF5 reader is read-only, hence cannot create or modify files.
> > > > > > > > >
> > > > > > > > > I have created an issue for this
> > > > > > > > > (https://github.com/Unidata/netcdf-c/issues/708)
> > > > > > > > > but it is not likely to get implemented anytime soon.
> > > > > > > > >
> > > > > > > > > You will have to be content with writing the contents to a file.
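The file-based workaround can be sketched in Java as follows: let the library write the dataset to a temporary file, read the finished bytes back, and delete the file. DatasetWriter and TempFileSerializer are hypothetical names; the writer stands in for whatever netCDF-Java or netCDF-C code produces the dataset, and it is assumed to close the file before returning so that all bytes are on disk. This costs one disk round trip and one copy, which is what the thread is trying to avoid, but it needs no in-memory support from the library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: serialize a dataset by round-tripping through a temp file. */
final class TempFileSerializer {
    private TempFileSerializer() {}

    /** Hypothetical callback: writes and closes a complete netCDF file at the given path. */
    @FunctionalInterface
    interface DatasetWriter {
        void writeTo(Path path) throws IOException;
    }

    /** Returns the raw bytes of the file produced by the writer. */
    static byte[] toBytes(DatasetWriter writer) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("dataset-", ".nc");
            writer.writeTo(tmp);            // library writes the complete file
            return Files.readAllBytes(tmp); // machine-independent bytes, ready to send
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temporary file
                }
            }
        }
    }
}
```

The resulting byte array can be framed and sent with any transport, such as ZeroMQ; on the receiving side it is again a complete netCDF file.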
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I am using NetCDF Java library 4.6.10.
> > > > > > > > > >
> > > > > > > > > > My goal is to efficiently send NetcdfFile
> > > > > > > > > > <https://www.unidata.ucar.edu/software/thredds/current/netcdf-java/javadoc/ucar/nc2/NetcdfFile.html>
> > > > > > > > > > objects over the network using a communication library (such as
> > > > > > > > > > ZeroMQ). Because the NetcdfFile
> > > > > > > > > > <https://www.unidata.ucar.edu/software/thredds/current/netcdf-java/javadoc/ucar/nc2/NetcdfFile.html>
> > > > > > > > > > class does not implement Serializable, I need to implement that
> > > > > > > > > > step myself.
> > > > > > > > > >
> > > > > > > > > > Since NetCDF is a machine-independent data format, I would like to
> > > > > > > > > > access the raw binary data.
> > > > > > > > > >
> > > > > > > > > > This is trivial if the data has been written to a file on the disk.
> > > > > > > > > > But what about in-memory datasets? If possible, I would like to
> > > > > > > > > > access the binary data without writing it to disk.
> > > > > > > > > >
> > > > > > > > > > Can I access the buffer of an in-memory NetcdfFile
> > > > > > > > > > <https://www.unidata.ucar.edu/software/thredds/current/netcdf-java/javadoc/ucar/nc2/NetcdfFile.html>
> > > > > > > > > > object from the Java API? Any pointers will be appreciated.
> > > > > > > > > >
> > > > > > > > > > Kind regards,
> > > > > > > > > >
> > > > > > > > > > Michaël
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > =Dennis Heimbigner
> > > > > > > > > Unidata
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ticket Details
> > > > > > > > > ===================
> > > > > > > > > Ticket ID: MQO-415619
> > > > > > > > > Department: Support netCDF Java
> > > > > > > > > Priority: Normal
> > > > > > > > > Status: Open
> > > > > > > > > ===================
> > > > > > > > > NOTE: All email exchanges with Unidata User Support are recorded in the
> > > > > > > > > Unidata inquiry tracking system and then made publicly available through
> > > > > > > > > the web. If you do not want to have your interactions made available in
> > > > > > > > > this way, you must let us know in each email you send to us.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > =Dennis Heimbigner
> > > > > > > Unidata
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > =Dennis Heimbigner
> > > > > Unidata
> > > > >
> > > >
> > > >
> > >
> > > =Dennis Heimbigner
> > > Unidata
> > >
> >
> > =Dennis Heimbigner
> > Unidata
> >
>
>
=Dennis Heimbigner
Unidata