
Re: Thredds out of memory



Tennessee Leeuwenburg wrote:

Not quite. I have something called a MARS database. This is an object-oriented database, whose only access is via a compiled C program, and which does not support any kind of network request.

I wrote a Java servlet which parses the URL, extracts query information, runs the query against the database, converts the resulting GRIB file to NetCDF, and serves that back to the user (in this case, Thredds).


To the client, this is (or should be) indistinguishable from requesting a NetCDF file over HTTP from any other source, such as an Apache web server.

Thredds is capable of sourcing its data from HTTP sources as opposed to files on the local disk. My configuration file looks something like this:

<!DOCTYPE catalog SYSTEM "http://www.unidata.ucar.edu/projects/THREDDS/xml/AggServerCatalog.dtd">
<catalog name="THREDDS - DODS Aggregation Server Catalog" version="0.6"
         xmlns="http://www.unidata.ucar.edu/thredds"
         xmlns:xlink="http://www.w3.org/1999/xlink">
   <dataset name="Top-Level Dataset" dataType="Grid" serviceName="this">
       <service name="this" serviceType="DODS" base=""/>
       <service name="apache" serviceType="NetCDF" base="http://kahless.ho.bom.gov.au/"/>
       <service name="marslet" serviceType="NetCDF" base="http://kahless.ho.bom.gov.au:8080/marslet/"/>

       <dataset name="Large Internal Marslet" serviceName="this">
           <property name="internalService" value="marslet"/>
           <dataset name="Surface Data" urlPath="verylarge.nc"/>
       </dataset>
       <dataset name="Large Internal Apache" serviceName="this">
           <property name="internalService" value="apache"/>
           <dataset name="Surface Data" urlPath="laps-levels-large.nc"/>
       </dataset>

   </dataset>
</catalog>

For small files, this actually works. For larger files, the interaction with the servlet somehow breaks, whereas sourcing the file from Apache works okay.

I don't understand why this is the case. Software such as wget and Firefox can happily download the file and resume partial downloads, and Thredds can happily fetch smaller files. I fail to see why file size is affecting the system so badly.

Generally, a netCDF client like the Thredds Data Viewer will treat the file as random access, and so may skip around in the file. If all you do is read the file sequentially, HTTP is OK, but for random access it can be really slow. Opendap is much better in this case.
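For intuition about why random access over plain HTTP is slow: a client that seeks within a remote file typically turns each read at a new position into its own GET request carrying a `Range` header, so scattered reads pay one network round trip each. A minimal sketch using only the JDK's `HttpURLConnection`; the class and method names are hypothetical, and it assumes the server answers single ranges with 206 Partial Content.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

/** Hypothetical helper: every random-access read becomes its own HTTP request. */
public class HttpRangeReader {

    /** Range header value for len bytes starting at offset (HTTP byte positions are inclusive). */
    static String rangeHeader(long offset, int len) {
        if (offset < 0 || len <= 0) {
            throw new IllegalArgumentException("need offset >= 0 and len > 0");
        }
        return "bytes=" + offset + "-" + (offset + len - 1);
    }

    /** Reads exactly len bytes at offset, paying one full request/response round trip. */
    static byte[] readAt(URL url, long offset, int len) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, len));
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("expected 206 Partial Content, got " + status);
            }
            byte[] buf = new byte[len];
            try (DataInputStream in = new DataInputStream(conn.getInputStream())) {
                in.readFully(buf); // EOFException if the server sends fewer bytes
            }
            return buf;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rangeHeader(0, 8192));
        if (args.length > 0) {
            // Two scattered reads = two separate round trips to the server.
            // The second offset is arbitrary and assumes a file larger than ~1 MB.
            URL url = URI.create(args[0]).toURL();
            byte[] start = readAt(url, 0, 4);
            byte[] later = readAt(url, 1_000_000, 8192);
            System.out.println("read " + start.length + " + " + later.length + " bytes");
        }
    }
}
```

A reader that parses the netCDF header at the start of the file and then jumps to a variable's data further in makes several such requests; an Opendap server instead does the subsetting itself and returns only the requested values.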


That's exactly the goal: we want to use Opendap to serve data to the various software clients that will use it, in order to gain the many advantages it offers. However, we have to get the files into Thredds somehow.

Yes, but the Thredds AS (Aggregation Server) is a client of your HTTP server. When the AS gets a request, it skips around in the HTTP file to read it, so its access pattern depends on which request it receives.

Try giving it very simple requests that are contiguous in the file and of known, reasonable size. Those should work OK. Then increase the size or complexity of your requests and see where it degrades.
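The incremental test suggested above can be scripted. This is a rough sketch, assuming the servlet URL and the file size in bytes are passed on the command line (the class and method names and the 8 KB starting size are illustrative choices, not anything Thredds prescribes). It requests contiguous ranges from the start of the file, doubling the size each time, and reports the HTTP status and the number of bytes actually received, so the first size where the count goes wrong or the request fails stands out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: contiguous ranges from byte 0, doubling in size. */
public class RangeProbe {

    /** "bytes=0-(n-1)" for n = start, 2*start, 4*start, ..., ending with the whole file. */
    static List<String> doublingRanges(long start, long fileSize) {
        List<String> ranges = new ArrayList<>();
        if (start <= 0 || fileSize <= 0) return ranges;
        long n = start;
        while (true) {
            long len = Math.min(n, fileSize);
            ranges.add("bytes=0-" + (len - 1));
            if (len == fileSize) break;                    // reached the whole file
            n = (n > fileSize / 2) ? fileSize : n * 2;     // double without overflowing
        }
        return ranges;
    }

    /** One ranged GET; prints the status and returns how many body bytes arrived. */
    static long fetchCount(URL url, String range) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", range);
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) total += n;
            System.out.println(range + " -> HTTP " + conn.getResponseCode() + ", " + total + " bytes");
            return total;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: RangeProbe <url> <fileSizeInBytes>");
            System.out.println(doublingRanges(8192, 100_000));
            return;
        }
        URL url = URI.create(args[0]).toURL();
        long fileSize = Long.parseLong(args[1]);
        for (String range : doublingRanges(8192, fileSize)) {
            long expected = Long.parseLong(range.substring(range.indexOf('-') + 1)) + 1;
            try {
                long got = fetchCount(url, range);
                if (got != expected) {
                    System.out.println("  mismatch: expected " + expected + " bytes");
                }
            } catch (IOException e) {
                System.out.println(range + " failed: " + e);
                break; // first failing size found
            }
        }
    }
}
```

Running the same probe against the Apache URL and the servlet URL for the same file would show whether the two servers diverge, and at what size.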

Is there any way you can give it access to the file directly, like through an NFS mount?



Is there a "magic number" in Thredds for the best window size to use? Would it "prefer" to get its data in any particular way? Thredds is basically the only client for this servlet, so I will just tune it for best performance.



What do you mean by "window size"?


When I'm serving data from my servlet, I create an 8K buffer, read data from disk into it, and then flush it to the output stream.

               // Send in 8K chunks, from firstByte up to the requested nBytes
               byte[] dataBuf = new byte[8192]; // we'll read 8K chunks
               in.seek(firstByte);
               int length = 0;
               long bytecount = 0;
               while (bytecount < nBytes
                      && (length = in.read(dataBuf, 0, (int) Math.min(dataBuf.length, nBytes - bytecount))) != -1) {
                   if (debug) servletContext.log("doGet() valid: serving bytes " + bytecount + " to " + (bytecount + length - 1));
                   bytecount = bytecount + length;
                   out.write(dataBuf, 0, length);
               }

I have attached the full code for your interest.

You will see better performance as you increase this buffer size, at the cost of needing more heap space.

You should also tune the buffer size in HTTPRandomAccessFile; matching the two sizes is probably best.
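As a sketch of the trade-off, here is a copy loop with the buffer size as a parameter, assuming the caller already knows how many bytes to send. The class name is hypothetical, and it does not touch HTTPRandomAccessFile, whose configuration is not shown here. The buffer size sets how many bytes move per read/write call, and each concurrent request holds that many bytes of heap while it is being served.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copy at most nBytes using a caller-chosen buffer size. */
public class RangeCopy {

    /**
     * Copies up to nBytes from in to out through a bufferSize-byte buffer.
     * Returns the number of bytes actually copied (fewer than nBytes only if
     * the input ends early).
     */
    static long copy(InputStream in, OutputStream out, long nBytes, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long copied = 0;
        while (copied < nBytes) {
            int want = (int) Math.min(buf.length, nBytes - copied); // never overshoot nBytes
            int n = in.read(buf, 0, want);
            if (n == -1) break;                                     // input ended early
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        // 8 KB matches the servlet; 64 KB is only an example of a larger setting
        for (int size : new int[] {8 * 1024, 64 * 1024}) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            long n = copy(new ByteArrayInputStream(data), sink, 50_000, size);
            System.out.println(size + "-byte buffer copied " + n + " bytes");
        }
    }
}
```

In the servlet, the literal 8192 in `new byte[8192]` would become this parameter, which makes it easy to try several sizes against the same client.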


Cheers,
-Tennessee

------------------------------------------------------------------------

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;

public class Marslet extends HttpServlet
{

    boolean debug = true;

    /**
     * This returns the header only for the equivalent GET request. It may be
     * used by some clients to establish file sizes or otherwise make use
     * of summary information before performing a GET request. The HEAD
     * response must not include a message body, so only headers are set here.
     */
    protected void doHead(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        ServletContext servletContext = getServletContext();
        File ncFile = doMarsQuery(request);
        // Abort if file cannot be found
        if (ncFile == null) {
            servletContext.log("Null netCDF file - cannot continue");
            response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
                "Error in processing - database request failed. Please contact the administrator address@hidden");
            return;
        }
        String contentType = "application/x-netcdf";
        servletContext.log("Found file, processing HEAD request");
        try {
            long filesize = ncFile.length();
            if (debug) servletContext.log("doHead(): filesize is " + filesize);
            String rangeHeader = request.getHeader("range");
            response.setHeader("Accept-Ranges", "bytes");
            response.setContentType(contentType);

            // Behave differently if request is bad for some reason
            if (!isRangeHeaderValid(rangeHeader, filesize)) {
                // Bad numbers in range, log error
                if (rangeHeader != null && !rangeHeader.equals("")) {
                    servletContext.log("*** Invalid byte range header, sending entire file: " + rangeHeader);
                }
                servletContext.log("Sending entire file");
                // The whole file is 200 OK, not 206 Partial Content.
                // Content-Length is a header; printing it into the body would break HEAD.
                response.setStatus(HttpServletResponse.SC_OK);
                response.setHeader("Content-Length", String.valueOf(filesize));
                if (debug) servletContext.log("doHead() invalid: " + HttpServletResponse.SC_OK
                    + " Accept-Ranges: bytes Content-Length: " + filesize
                    + " Content-Type: " + contentType);
            }
            // If a valid byte range has been requested
            else {
                // Only the first requested range is used
                String range = getRangeAsTokens(rangeHeader).nextToken().trim();
                int dash = range.indexOf('-');
                long firstByte;
                long lastByte;

                // Determine firstByte ("-suffixLength" leaves it unset)
                if (dash == 0) { firstByte = -1; }
                else { firstByte = Long.parseLong(range.substring(0, dash)); }

                // Determine lastByte ("firstbyte-" leaves it unset)
                if (dash == range.length() - 1) { lastByte = -1; }
                else { lastByte = Long.parseLong(range.substring(dash + 1)); }

                // If firstByte < 0, the client wants the last 'lastByte' bytes
                if (firstByte < 0) {
                    firstByte = filesize - lastByte;
                    lastByte = filesize - 1;
                }
                // If last byte is unset or past EOF, set to EOF
                if (lastByte < 0 || lastByte >= filesize) { lastByte = filesize - 1; }
                long nBytes = (lastByte - firstByte) + 1;

                ////////////////////////////////////////
                // Send the headers, do not write data
                ////////////////////////////////////////
                String contentRange = "bytes " + firstByte + "-" + lastByte + "/" + filesize;
                response.setStatus(HttpServletResponse.SC_PARTIAL_CONTENT);
                response.setHeader("Content-Length", String.valueOf(nBytes));
                response.setHeader("Content-Range", contentRange);
                if (debug) servletContext.log("doHead() valid: " + HttpServletResponse.SC_PARTIAL_CONTENT
                    + " Accept-Ranges: bytes Content-Length: " + nBytes
                    + " Content-Type: " + contentType + " Content-Range: " + contentRange);
            }
        }
        catch (Exception e) {
            servletContext.log("Catching: " + e.toString(), e);
            e.printStackTrace();
        }
    }

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        ServletContext servletContext = getServletContext(); // For logging
        File ncFile = doMarsQuery(request);
        if (ncFile == null) {
            servletContext.log("NULL netCDF file - cannot continue");
            response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
                "Error in processing - database request failed. Please contact the administrator address@hidden");
            return;
        }
        ServletOutputStream out = response.getOutputStream();
        RandomAccessFile in = null;
        String contentType = "application/x-netcdf";

        servletContext.log("doGet(): Found file, processing GET request");
        try {
            // Log every request header
            Enumeration e = request.getHeaderNames();
            while (e.hasMoreElements()) {
                String headerName = (String) e.nextElement();
                Enumeration e2 = request.getHeaders(headerName);
                String requestStr = "";
                while (e2.hasMoreElements()) {
                    String headerValue = (String) e2.nextElement();
                    requestStr = requestStr + "Request> " + headerName + ": " + headerValue + "\n";
                }
                servletContext.log(requestStr);
            }

            long filesize = ncFile.length();
            String rangeHeader = request.getHeader("range");
            // If a bad byte-range has been requested
            if (!isRangeHeaderValid(rangeHeader, filesize)) {
                // Bad numbers in range, log error
                if (rangeHeader != null && !rangeHeader.equals("")) {
                    servletContext.log("*** Invalid byte range header, sending entire file: " + rangeHeader);
                }
                if (debug) servletContext.log("doGet() invalid: Sending entire file");
                sendEntireFile(response, out, ncFile, contentType, servletContext);
            }
            // If a valid byte range has been requested
            else {
                in = new RandomAccessFile(ncFile, "r");

                ///////////////////////////////////////////
                // Note - the original code snippet I found handled multiple
                // byte-range requests. HTTP/1.1 does allow several ranges in one
                // request (answered with a multipart/byteranges response), but
                // this servlet does not implement that, so it retrieves only the
                // first token of the byte-range header.
                ///////////////////////////////////////////
                String range = getRangeAsTokens(rangeHeader).nextToken().trim();
                int dash = range.indexOf('-');
                long firstByte;
                long lastByte;

                // Determine firstByte
                if (dash == 0) { firstByte = -1; } // range format is "-suffixLength"
                else { firstByte = Long.parseLong(range.substring(0, dash)); }
                // Determine lastByte
                if (dash == range.length() - 1) { lastByte = -1; } // range format is "firstbyte-"
                else { lastByte = Long.parseLong(range.substring(dash + 1)); }

                // If firstByte < 0, the client wants the last 'lastByte' bytes
                if (firstByte < 0) {
                    firstByte = filesize - lastByte;
                    lastByte = filesize - 1;
                }
                // If last byte is unset or past EOF, set to EOF
                if (lastByte < 0 || lastByte >= filesize) { lastByte = filesize - 1; }
                long nBytes = (lastByte - firstByte) + 1;

                ////////////////////////////////////////////
                // Send the headers and start writing data
                ////////////////////////////////////////////
                String contentRange = "bytes " + firstByte + "-" + lastByte + "/" + filesize;
                response.setStatus(HttpServletResponse.SC_PARTIAL_CONTENT);
                response.setHeader("Accept-Ranges", "bytes");
                response.setContentType(contentType);
                response.setHeader("Content-Range", contentRange);
                // setContentLength(int) cannot express sizes above 2 GB
                response.setHeader("Content-Length", String.valueOf(nBytes));

                if (debug) servletContext.log("doGet(): valid \n"
                    + "Response> Status: " + HttpServletResponse.SC_PARTIAL_CONTENT + "\n"
                    + "Response> Accept-Ranges: bytes\n"
                    + "Response> Content-Type: " + contentType + "\n"
                    + "Response> Content-Range: " + contentRange + "\n"
                    + "Response> Content-Length: " + nBytes + "\n");

                // How it used to be - suspect of causing a buffer overflow
                //byte[] dataBuf = new byte[8192]; //we'll read 8K chunks
                //byte[] dataBuf = new byte[nBytes + 1];
                //in.seek(0L);
                //in.skipBytes(firstByte);
                //int length = in.read(dataBuf, 0, nBytes);
                //out.write(dataBuf, 0, length);

                // Send in 8K chunks, starting at firstByte and stopping after
                // nBytes: reading on to EOF would send more data than the
                // Content-Length header promised.
                byte[] dataBuf = new byte[8192]; // we'll read 8K chunks
                in.seek(firstByte);
                int length = 0;
                long bytecount = 0;
                while (bytecount < nBytes
                       && (length = in.read(dataBuf, 0, (int) Math.min(dataBuf.length, nBytes - bytecount))) != -1) {
                    if (debug) servletContext.log("doGet() valid: serving bytes " + bytecount
                        + " to " + (bytecount + length - 1));
                    bytecount = bytecount + length;
                    out.write(dataBuf, 0, length);
                }
                out.flush();
            }
            out.close();
        }
        // Ignore some client-caused exceptions
        catch (IOException ioe) {
            // Ignore "Connection reset by peer" exceptions which can be caused by
            // a number of reasons attributable to the client. They are generally
            // harmless and out of our control. Log others.
            //if(ioe.toString().compareToIgnoreCase("java.io.IOException: Connection reset by peer") != 0) {
                servletContext.log(ioe.toString(), ioe);
            //}
        }
        // Log generic exceptions
        catch (Exception e) {
            servletContext.log(e.toString(), e);
        }
        // Try to close the file if it's still open
        finally {
            try {
                if (in != null) { in.close(); }
            }
            catch (Exception e2) {
                servletContext.log(e2.toString());
            }
        }
    }

    /**
    * Interpret the GET variables into a mars request and execute
    */
    private File doMarsQuery(HttpServletRequest request) {
        try {
            // As written, this ignores the request and always returns the same test file
            return new File("/data/laps-levels-large.nc");
        }
        catch (Exception e) {
            getServletContext().log(e.getMessage());
            e.printStackTrace();
        }
        return null;
    }
    private File getFromCache(String requestString) {
        String tmpDirName = "/nm/scratch/marslet/";
        File tmpDir = new File(tmpDirName);
        int hash = requestString.hashCode();
        File ncFile = new File(tmpDir, "marslet" + hash + ".nc");
        if (ncFile.exists()) { return ncFile; } else { return null; }
    }

    private void addToCache(File ncFile) {
        // Do nothing - no accounting just yet
    }

    /**
     * Send the entire file to the client
     * <p>
     * @param response the response object
     * @param out the response output stream
     * @param file the file we're sending
     * @param contentType Content-Type: header value
     * @param servletContext the servlet context (for logging)
     */
    private void sendEntireFile(HttpServletResponse response,
                                ServletOutputStream out,
                                File file,
                                String contentType,
                                ServletContext servletContext)
        throws IOException
    {
        FileInputStream inStream = null;
        try {
            inStream = new FileInputStream(file);
            int length = 0;
            response.setHeader("Accept-Ranges", "bytes");
            response.setContentType(contentType);
            // setContentLength(int) cannot express sizes above 2 GB
            response.setHeader("Content-Length", String.valueOf(file.length()));
            byte[] buf = new byte[8192]; // we'll read 8K chunks
            while ((length = inStream.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, length);
            }
        }
        finally {
            if (inStream != null) {
                inStream.close();
            }
        }
    }

    /**
     * Validate the byte range request header
     * The following byte range request header formats are supported:
     * <ul>
     *   <li> firstbyte-lastbyte (request for explicit range)
     *   <li> firstbyte- (request for 'firstbyte' byte to EOF)
     *   <li> -suffixLength (request for the last 'suffixLength' bytes of the file)
     * </ul>
     * @param rangeHeader the byte range header
     * @param filesize the size of the file being served, in bytes
     * @return boolean true=valid header, false=invalid header
     */
    private boolean isRangeHeaderValid(String rangeHeader, long filesize) {
        if (rangeHeader == null || rangeHeader.equals("")) { return false; }
        String range = "";
        long firstbyte = 0;
        long lastbyte = 0;
        StringTokenizer st = getRangeAsTokens(rangeHeader);
        try {
            while (st.hasMoreTokens()) {
                range = st.nextToken();
                range = range.trim();
                int index = range.indexOf('-');
                if (index == -1) { return false; } // Illegal: must contain a '-'
                if (range.length() <= 1) { return false; } // Illegal: must have more than '-'

                // Case -suffixLength
                if (index == 0) {
                    lastbyte = Long.parseLong(range.substring(index + 1));
                    if (lastbyte <= 0 || lastbyte > filesize) { return false; }
                    else continue;
                }
                // Case firstbyte-
                if (index == range.length() - 1) {
                    firstbyte = Long.parseLong(range.substring(0, index));
                    if (firstbyte >= filesize) { return false; }
                    else { continue; }
                }
                // Case firstbyte-lastbyte
                firstbyte = Long.parseLong(range.substring(0, index));
                lastbyte = Long.parseLong(range.substring(index + 1));
                if (firstbyte > lastbyte || firstbyte >= filesize) { return false; }
            }
        }
        catch (NumberFormatException e) {
            return false; // Illegal: byte positions must be numbers
        }
        return true;
    }
    /**
     * Break the range header into tokens. Each token represents a single
     * requested range.
     *
     * The most common range header format is ...
     *
     * Range: bytes=firstbyte-lastbyte,firstbyte-lastbyte
     *
     * ... with one or more firstbyte-lastbyte values, all comma separated
     *
     * @param rangeHeader the byte range header
     * @return StringTokenizer the tokenized string
     */
    private StringTokenizer getRangeAsTokens(String rangeHeader)
    {
        String ranges = rangeHeader.substring(rangeHeader.indexOf('=') + 1);
        return new StringTokenizer(ranges, ",");
    }
/**
    * Handle HTTP post requests
    */
public void doPost(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException
    {
        return;
    }
}
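The byte-range logic in the attached servlet is split across isRangeHeaderValid and the near-identical parsing blocks in doHead and doGet. As a hypothetical refactoring sketch (not part of the original code), the three header forms can be resolved in one place using long offsets, with the end of the range clamped to EOF and "-n" read as "the last n bytes", as HTTP/1.1 defines it. Unlike the servlet, which serves the first of several ranges, this sketch simply rejects multi-range headers.

```java
/** Hypothetical helper: resolve a single-range "Range" header into inclusive [first, last]. */
public class ByteRange {

    /**
     * Returns {first, last} (inclusive) for "bytes=a-b", "bytes=a-" or "bytes=-n",
     * clamped to a file of fileSize bytes, or null if the header is absent,
     * malformed, names more than one range, or cannot be satisfied.
     */
    static long[] resolve(String header, long fileSize) {
        if (header == null || !header.startsWith("bytes=") || fileSize <= 0) return null;
        String spec = header.substring("bytes=".length()).trim();
        if (spec.contains(",")) return null;              // multiple ranges not handled here
        int dash = spec.indexOf('-');
        if (dash < 0) return null;
        String a = spec.substring(0, dash).trim();
        String b = spec.substring(dash + 1).trim();
        try {
            if (a.isEmpty()) {                            // "-n": the last n bytes
                long n = Long.parseLong(b);
                if (n <= 0) return null;
                return new long[] {Math.max(0, fileSize - n), fileSize - 1};
            }
            long first = Long.parseLong(a);
            if (first >= fileSize) return null;           // starts past EOF: unsatisfiable
            long last = b.isEmpty() ? fileSize - 1 : Long.parseLong(b);
            if (last < first) return null;
            return new long[] {first, Math.min(last, fileSize - 1)}; // clamp to EOF
        } catch (NumberFormatException e) {
            return null;                                  // non-numeric byte positions
        }
    }

    public static void main(String[] args) {
        long size = 1000;
        String[] headers = {"bytes=0-99", "bytes=900-", "bytes=-100", "bytes=990-2000", "bytes=1000-"};
        for (String h : headers) {
            long[] r = resolve(h, size);
            System.out.println(h + " -> "
                + (r == null ? "invalid" : r[0] + "-" + r[1] + " (" + (r[1] - r[0] + 1) + " bytes)"));
        }
    }
}
```

The byte count for Content-Length is then `last - first + 1`, and the Content-Range header is `"bytes " + first + "-" + last + "/" + fileSize`, matching what doGet builds.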