Re: NetCDF Java Read API
- Subject: Re: NetCDF Java Read API
- Date: Fri, 14 Nov 2008 16:56:58 -0700
Hi Greg:
1) what version of netcdf-java are you using?
2) how much memory do you give the JVM (-Xmx option)? the default can be as
low as 32 MB.
3) are you reading the entire data into memory?
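If the answer to (3) is yes, the usual fix is to read one section at a time
instead of the whole variable. Here is a minimal sketch against the
netCDF-Java read API (file and variable names are taken from Greg's CDL
below; the class name is just for illustration). Only one 2-D grid of
shorts, about 36 MB, is in memory at once:

  import java.io.IOException;
  import ucar.ma2.Array;
  import ucar.ma2.InvalidRangeException;
  import ucar.nc2.NetcdfFile;
  import ucar.nc2.Variable;

  public class ReadVilSlices {
    public static void main(String[] args) throws IOException, InvalidRangeException {
      // give the JVM a bigger heap if needed, e.g.: java -Xmx512m ReadVilSlices
      // (Runtime.getRuntime().maxMemory() reports the current limit)
      NetcdfFile ncfile =
          NetcdfFile.open("edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z.nc");
      try {
        Variable vil = ncfile.findVariable("VIL");
        int ntimes = vil.getDimension(0).getLength();  // 24 forecast times

        int[] origin = new int[] {0, 0, 0, 0};         // (time, z0, y0, x0)
        int[] shape  = new int[] {1, 1, 3520, 5120};   // one 2-D grid per read
        for (int t = 0; t < ntimes; t++) {
          origin[0] = t;
          Array grid = vil.read(origin, shape);  // reads only this section
          // ... process the forecast, then let the Array be garbage collected
        }
      } finally {
        ncfile.close();
      }
    }
  }
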
Greg Rappa wrote:
> This morning, Ai-Hoa sent a query to you all, but since then we've
> discovered more interesting behavior of the Java readers that I'd
> like to share. A question regarding Java file reading and memory
> usage has come up; I've tried to present the situation as clearly
> as possible here.
>
>
> The NetCDF files we write contain one variable, named 'VIL',
> of four dimensions: 24 times corresponding to that many forecasts
> of the VIL product, 1 altitude layer, and 2-D grids sized to the
> extent of the CONUS at 1 km resolution (3520 rows by 5120 columns).
>
> The CDL appears as follows (edited for size):
>
> netcdf edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z {
> dimensions:
> time = 24 ;
> z0 = 1 ;
> y0 = 3520 ;
> x0 = 5120 ;
> variables:
> double time(time) ;
> time:standard_name = "time" ;
> time:long_name = "Product validity time" ;
> time:units = "seconds since 1970-01-01T00:00:00Z" ;
> time:calendar = "gregorian" ;
> time:string = "2008-09-12T18:30:00Z/2008-09-12T20:25:00Z" ;
> double z0(z0) ;
> z0:standard_name = "altitude" ;
> z0:long_name = "Product altitude" ;
> z0:units = "meters" ;
> z0:axis = "Z" ;
> z0:positive = "up" ;
> double y0(y0) ;
> y0:standard_name = "projection_y_coordinate" ;
> y0:long_name = "Distance from projection reference point latitude" ;
> y0:units = "meters" ;
> double x0(x0) ;
> x0:standard_name = "projection_x_coordinate" ;
> x0:long_name = "Distance from projection reference point longitude" ;
> x0:units = "meters" ;
> short VIL(time, z0, y0, x0) ;
> VIL:standard_name = "atmosphere_cloud_liquid_water_content" ;
> VIL:long_name = "Vertically integrated liquid water (VIL)" ;
> VIL:class_name = "FCST" ;
> VIL:product_name = "FCST" ;
> VIL:units = "kg m-2" ;
> VIL:grid_mapping = "grid_mapping0" ;
> VIL:scale_factor = 0.00244148075807978 ;
> VIL:add_offset = 0. ;
> VIL:_FillValue = -1s ;
> VIL:valid_range = 0s, 32767s ;
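>
> (Aside: with the CF packed-data convention the stored shorts unpack as
> unpacked = packed * scale_factor + add_offset, so the valid_range of
> 0s..32767s maps to roughly 0..80 kg m-2.)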
>
> The variable is written to disk using the NetCDF4 C++ library,
> with compression enabled (Level 6), by a single call to NcVar::put(),
> as depicted in the following abbreviated code snippet:
>
> unsigned int tBins = tDim->size(); // 24 forecast times
> unsigned int zBins = zDim->size(); // 1 altitude layer
> unsigned int yBins = yDim->size(); // 3520 rows
> unsigned int xBins = xDim->size(); // 5120 columns
> unsigned int allBins = tBins*zBins*yBins*xBins; // 432,537,600 bins
>
> short* shortBuffer = new short[ allBins ];
>
> NcVar* ncVar = ncFile->add_var( varName.c_str(), ncShort, // short data, per the CDL
> tDim, zDim, yDim, xDim );
>
> ncVar->put( shortBuffer, tBins, zBins, yBins, xBins );
>
>
> The chunking size for this file is set equal to the X/Y grid size:
> 5120 * 3520 = 18,022,400.
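>
> (For scale: one chunk holds 18,022,400 shorts, about 36 MB uncompressed,
> and the full variable is 24 such chunks, about 865 MB.)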
>
> Files written this way can be read by the NetCDF C++ library.
> However, a number of users from different agencies have been
> reporting that their Java VMs run out of memory while reading
> the file.
>
> Ai-Hoa, Bill and I have demonstrated that the file can be read,
> but only on a 64-bit Linux platform using a Java VM configured
> with a 5 GB maximum heap. Running with a 4 GB limit results
> in a Java VM crash. The question we have is:
>
> Given that the raw variable written to disk consumes about
> 45 MB, and expands to about 865 MB when uncompressed
> into a 4-D array of shorts ... what else is the Java NetCDF
> and/or HDF5 layer doing to consume the remaining 4 GB of
> memory in the Java VM?
>
> I've exported a sample file to our public ftp site. You are all
> welcome to download the file and see what you can make of the
> Java VM memory constraints. The file is available at:
>
> ftp://ftp.ll.mit.edu/outgoing/gregr/edu.mit.ll.wx.ciws.VILForecast.20080912T182500Z.nc
>
> Of course, if there are any suggestions for alternate methods for
> writing the variable to disk, I'd appreciate that too. For instance,
> should I set my chunking size to the maximum data size, that is,
> 24 * 1 * 5120 * 3520 * 2 (bytes) = 865,075,200? That seemed
> a little extreme to me, so I stuck with 5120 * 3520.
>
> Thanks,
> Greg.
>
>
> Sanh, Ai-Hoa wrote:
>>
>> Hello,
>>
>> I hope you don’t mind our asking a question about the NetCDF Java Read
>> methods.
>>
>> Greg has written some NetCDF files with 24 hours' worth of forecasts as
>> their data. I am getting “Out of Memory” errors when I try to read
>> these files, even when I am trying to read only a small portion of the
>> data.
>>
>> I tested my code with smaller files, and the reads worked fine.
>>
>> So I was wondering if you have any suggestions for what may be wrong.
>> Perhaps I am not calling the methods correctly, or perhaps there is a
>> limit on the size of files or data.
>>
>> I’m more than happy to send you the code I am using. And we’ll find a
>> way to get the test files to you. Just let me know to whom I should
>> send them, so that I’m not inundating all of your mailboxes.
>>
>> Thanks much.
>>
>> Ai-Hoa
>>
>