This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Hi Bob:

1) What heap memory settings are you giving to the JVM on Tomcat startup? (To verify, you can look at http://server/thredds/debug and choose "showRuntimeInfo", assuming you have remote debugging enabled.)

2) What is the hardware, OS, and JVM version?

3) How big is /thredds/Hfradar/aggreghfradarC1/u33h/catalog.xml?

4) Remind me of the URL of your server so I can see how big some of these datasets are.
I'm guessing there is a "memory leak" (objects not getting GC'd), but I need to eliminate the obvious first.
Re: the info in the log, e.g.:

2007-11-28T04:55:27.959 -0800 [ 633762099][ 1094643]
= date [msecs since server start] [request number]

So this server has been up about 7 days and has served > 1 million requests (!).

I need to investigate what the TDS does with bad files, especially if you replace them. I'll get back to you ASAP.
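For readers following along, here is a minimal sketch (plain Java, not part of the TDS) that decodes the two bracketed fields from the example log line above; the class name is made up and the values are simply copied from that line:

    public class LogLineCheck {
        public static void main(String[] args) {
            // Values from the example threddsServlet.log line above:
            // "[ 633762099]" = milliseconds since server start, "[ 1094643]" = request number.
            long msSinceStart = 633762099L;
            long requestNumber = 1094643L;
            double days = msSinceStart / (1000.0 * 60 * 60 * 24);
            System.out.printf("uptime ~ %.1f days, %d requests served%n", days, requestNumber);
            // Prints roughly: uptime ~ 7.3 days, 1094643 requests served,
            // which matches the "up about 7 days, > 1 million requests" reading.
        }
    }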
Bob Simons wrote:
OK. Thredds crashed this morning. Attached are the catalina.out and threddsServlet.log files from that time period.

I hope that first bracketed number on each line of threddsServlet.log isn't memoryInUse. Is it?

A possibly related question about bad files: If there is a bad data file (e.g., 0 bytes) which Thredds notices and we subsequently delete, and we then replace that file, does Thredds notice that it is a new file (because the cacheAged cache includes information about the file's date and notices that the file's date has changed), or do we need to delete the relevant cacheAged file whenever we delete a data file?

Thanks for looking into this.

John Caron wrote:
Too bad. If/when it happens again, grab it. I will try to replicate here. (You are checking all files of the form threddsServlet.log.N, N=1..10?)

Bob Simons wrote:
Sorry. It has been too long. That information is no longer available.

John Caron wrote:
Sorry to be vague; in content/thredds/logs/threddsServlet.log there should be at least one more, possibly 2.

Bob Simons wrote:
That was the only message with "[ 6343]".

John Caron wrote:
Hi Bob: Can you look in the log file for all messages from request 6343:

2007-11-15T08:17:57.883 -0800 [ 1024030][ 6343] ERROR - thredds.server.opendap.NcDODSServlet - Error
java.lang.OutOfMemoryError: Java heap space

These will be in order, but may be interleaved with other requests. I'd like to see what size the original request was.

Bob Simons wrote:
We're having a lot of problems with Thredds crashing lately. We're hoping you can look into the problem or give us advice. We recently updated to THREDDS Data Server Version 3.16.22, Build Date = 2007-10-25 19:10:28.

The general scenario is:

1) One or more invalid .nc files are added to the data directories. Obviously, this isn't intentional, and we just found the source of many of these: a bad NIC card. But given the huge number of files involved, it has always occurred periodically. Often, it is a file with 0 bytes.

2) During aggregation, Thredds notices the bad file and reports it to catalina.out. For example:

2007-11-15T09:17:45.277 -0800 [ 1446340][ 3538] ERROR - thredds.server.opendap.NcDODSServlet - Error
java.io.IOException: Cant read /u00/satellite/AK/sstd/hday/AK2007272_014200h_sstd.nc: not a valid NetCDF file.
  at ucar.nc2.NetcdfFile.open(NetcdfFile.java:459)
  at ucar.nc2.NetcdfFile.open(NetcdfFile.java:261)
  at ucar.nc2.dataset.NetcdfDataset.openFile(NetcdfDataset.java:341)
  at ucar.nc2.ncml.Aggregation$Dataset$PolymorphicReader.open(Aggregation.java:1286)
  at ucar.nc2.NetcdfFileCache.acquire(NetcdfFileCache.java:188)
  at ucar.nc2.ncml.Aggregation$Dataset._acquireFile(Aggregation.java:1186)
  at ucar.nc2.ncml.Aggregation$Dataset.acquireFile(Aggregation.java:1172)
  at ucar.nc2.ncml.Aggregation$Dataset.getNcoords(Aggregation.java:1100)
  at ucar.nc2.ncml.Aggregation$Dataset.setStartEnd(Aggregation.java:1121)
  at ucar.nc2.ncml.Aggregation$Dataset.access$100(Aggregation.java:1002)
  at ucar.nc2.ncml.Aggregation.buildCoords(Aggregation.java:315)
  at ucar.nc2.ncml.AggregationExisting.buildDataset(AggregationExisting.java:61)
  at ucar.nc2.ncml.Aggregation.finish(Aggregation.java:293)
  at ucar.nc2.ncml.NcMLReader.readNetcdf(NcMLReader.java:397)

3) Sometime afterwards (not immediately, but it seems to be sooner if there are more bad files), Thredds crashes.
For example:

2007-11-15T08:17:57.883 -0800 [ 1024030][ 6343] ERROR - thredds.server.opendap.NcDODSServlet - Error
java.lang.OutOfMemoryError: Java heap space
DODServlet ERROR (anyExceptionHandler): java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
ReqState:
  serverClassName: 'thredds.server.opendap.NcDODSServlet'
  dataSet: 'satellite/CM/usfc/hday'
  requestSuffix: 'dods'
  CE: 'altitude,lat,lon,time'
  compressOK: false
  InitParameters:
    DebugOn: ''
    maxNetcdfFilesCached: '10'
java.lang.OutOfMemoryError: Java heap space

There may be a couple of these errors (referring to different datasets) in rapid succession, then:

2007-11-15T08:18:18.048 -0800 [ 1044195][ 6487] ERROR - org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/thredds].[NetcdfDODS] - Servlet.service() for servlet NetcdfDODS threw exception
java.lang.OutOfMemoryError: Java heap space
Nov 15, 2007 8:18:12 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor processChildren
SEVERE: Exception invoking periodic operation: java.lang.OutOfMemoryError: Java heap space
Nov 15, 2007 8:18:53 AM unknown unknown
SEVERE: Exception invoking periodic operation: java.lang.OutOfMemoryError: Java heap space

So, we aren't certain that the bad files are the cause of the OutOfMemoryErrors, but the correlation is strong. With no or few bad files, Thredds seems to stay up longer. With more bad files, Thredds seems more likely to crash.

As a possible cause, I'll mention the previous problem with .nc files not being automatically closed by an overridden finalize method. (See Original Message below.) This caused a memory and resource leak if the program didn't explicitly close every file (even if an Exception or Error was thrown during processing). A similar problem could certainly be the cause of the problem here.

And I'll mention our previous request to have Thredds simply ignore bad files but keep serving the rest of the dataset. (See Original Message below.) You said you were going to work on that. I don't know if you made changes or what changes you made. I'll just point out that we only recently upgraded to a version of Thredds which might have these changes in it. We had previously been using a somewhat older version because of other problems with the intervening versions of Thredds.

I hope that is helpful. Thanks for looking into this.

Sincerely,
Bob Simons
Satellite Data Product Manager
Environmental Research Division
NOAA Southwest Fisheries Science Center
1352 Lighthouse Ave
Pacific Grove, CA 93950-2079
(831)658-3205
address@hidden

The contents of this message are mine personally and do not necessarily reflect any position of the Government or the National Oceanic and Atmospheric Administration.
<>< <>< <>< <>< <>< <>< <>< <>< <><

-------- Original Message --------
Subject: finalize
Date: Fri, 18 Aug 2006 12:10:57 -0700
From: Bob Simons <address@hidden>
Organization: NOAA/ERD
To: John Caron <address@hidden>

I had a bug in a new procedure I was writing (my fault, I admit), wherein the close() method of many NetcdfFile objects was never called. On Windows, this code ran fine. The Java implementation on Windows (specifically the garbage collector) must have been smart enough to recover all the memory allocated and close all the files when the NetcdfFile objects were no longer referenced. But on Linux, the procedure failed after many calls, and I got an error message (from Java? from Tomcat? it doesn't matter) saying "too many files open". Fine. I fixed the problem in my code.
But I have two requests for you:

1) In the JavaDocs for the NetcdfFile class (up at the top), please include the same suggestion as you already have in NetcdfFileCache's JavaDocs regarding the use of "finally" to ensure that close() is called. E.g.:

    NetcdfFile ncfile = null;
    try {
        ncfile = NetcdfFile.open(fileName);
        ...
    } finally {
        if (ncfile != null) ncfile.close();
    }

2) I think it would be a huge improvement to NetcdfFile (and presumably NetcdfFileCache, NetcdfDataset, NetcdfDatasetCache, DODSNetcdfFile (which isn't in the latest copy of the documentation I have), and any similar object) to include the following method (or something like it):

    protected void finalize() throws Throwable {
        close();
    }

As you probably know, the finalize() method is inherited from the java.lang.Object class and is called whenever an object is eligible for garbage collection. In this case, overriding finalize() lets you ensure that the close() method always gets called.

In real life, it isn't always possible to use try/finally. That opens up the possibility of not calling close() if an error occurs. It is dangerous to have that resource leakage be a possibility. Your use of finalize() would prevent this type of resource leakage (as dangerous as memory leaks in other languages) from occurring.

I admit again that it was my coding error that led to these issues coming up. But I think it is good defensive programming to have a finalize() method for classes like these.

Thank you.
***end

-------- Original Message --------
Subject: Request for enhancement
Date: Wed, 03 Jan 2007 15:23:39 -0800
From: Bob Simons <address@hidden>
Organization: NOAA/ERD
To: John Caron <address@hidden>
CC: Roy Mendelssohn <address@hidden>, Dave Foley <address@hidden>

The Problem:
We often use THREDDS to aggregate near-real-time data. We have one directory per data set and frequently (hourly or daily) add an additional data file to each data set's directory. Sometimes things go wrong (e.g., power outages) and a partial and/or damaged file is left in a data set's directory. We try to deal with this, but our methods aren't perfect.

Currently, when THREDDS is aggregating a data set, if THREDDS finds an invalid .nc data file among the .nc data files for a given data set, it throws an Exception ("java.io.IOException: Cant read [fileName]: not a valid NetCDF file.") and rejects the entire dataset. This is a serious problem for us because one bad file makes an entire dataset disappear from THREDDS. The only way for us to find out if this has happened is to go through the <tomcat>/logs/catalina.out log file (which we do periodically) and look for the Exception messages.

The Request:
It would be nice if THREDDS simply rejected the bad files (throwing an Exception as it does now), but kept going with the aggregation of the data set.

(I realize that the proposed recommendation is good for our situation and you may not be comfortable making it the standard behavior for THREDDS.)

Thank you.
***end
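To make the pattern discussed in this thread concrete, here is a minimal, hypothetical sketch (not part of the TDS) of a check an operator could run over an aggregation directory before the TDS rebuilds the aggregation. It assumes only the netCDF-Java calls already quoted in the stack traces above (ucar.nc2.NetcdfFile.open and close); the class name, directory handling, and output format are made up for illustration:

    import java.io.File;
    import ucar.nc2.NetcdfFile;

    public class BadNcFileScanner {
        public static void main(String[] args) throws Exception {
            File dir = new File(args[0]);   // e.g. a satellite data directory served by the TDS
            File[] files = dir.listFiles();
            if (files == null) return;
            for (File f : files) {
                if (!f.getName().endsWith(".nc")) continue;
                NetcdfFile ncfile = null;
                try {
                    // open() throws IOException on 0-byte or otherwise invalid files
                    ncfile = NetcdfFile.open(f.getPath());
                    System.out.println("OK   " + f.getName());
                } catch (Exception e) {
                    System.out.println("BAD  " + f.getName() + " : " + e.getMessage());
                } finally {
                    // close() in a finally block, per request 1) above,
                    // so no file handle is leaked even when open() fails
                    if (ncfile != null) ncfile.close();
                }
            }
        }
    }

Files reported as BAD could then be removed or replaced before the aggregation is rebuilt, which is roughly the manual workaround implied by the exchange above.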