Re: Thredds out of memory
- Subject: Re: Thredds out of memory
- Date: Thu, 31 Mar 2005 15:11:08 +1000
Not quite. I have something called a MARS database. This is an
object-oriented database, whose only access is via a compiled C program,
and which does not support any kind of network request.
I wrote a Java servlet which parses the URL, extracts query information,
runs the query against the database, converts the resulting GRIB file to
NetCDF, and serves that back to the client, which in this case is Thredds.
To the client, this should be indistinguishable from requesting a NetCDF
file via HTTP from any other source, such as a web server like Apache.
Thredds is capable of sourcing its data from HTTP sources as opposed to
files on the local disk. My configuration file looks something like this:
<!DOCTYPE catalog SYSTEM
"http://www.unidata.ucar.edu/projects/THREDDS/xml/AggServerCatalog.dtd">
<catalog name="THREDDS - DODS Aggregation Server Catalog" version="0.6"
xmlns="http://www.unidata.ucar.edu/thredds"
xmlns:xlink="http://www.w3.org/1999/xlink">
<dataset name="Top-Level Dataset" dataType="Grid" serviceName="this">
<service name="this" serviceType="DODS" base=""/>
<service name="apache" serviceType="NetCDF"
base="http://kahless.ho.bom.gov.au/"/>
<service name="marslet" serviceType="NetCDF"
base="http://kahless.ho.bom.gov.au:8080/marslet/"/>
<dataset name="Large Internal Marslet" serviceName="this">
<property name="internalService" value="marslet"/>
<dataset name="Surface Data" urlPath="verylarge.nc"/>
</dataset>
<dataset name="Large Internal Apache" serviceName="this">
<property name="internalService" value="apache"/>
<dataset name="Surface Data" urlPath="laps-levels-large.nc"/>
</dataset>
</dataset>
</catalog>
For small files, this actually works. For larger files, the interaction
with the servlet breaks somehow, whereas the file sourced from Apache
works fine.
I don't understand why this is the case. Software such as wget and Firefox
can happily download the file, resume partial downloads, and so on.
Thredds can happily fetch smaller files. I fail to see why file
size is affecting the system so badly.
Generally a NetCDF client like the Thredds data viewer will treat the
file as random access, and so may skip around in the file. If all you
do is read the file sequentially, HTTP is OK, but for random access it
can be really slow. Opendap is much better in this case.
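
To make the point about random access over HTTP concrete, here is a minimal sketch of how one block of a remote file can be read with an HTTP Range request. This is not taken from the NetCDF library or from Thredds; the class name RangeRequestSketch and both method names are made up for illustration. Each block read is a separate HTTP round trip, which is why skipping around in a file this way can be slow.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical illustration only: not the actual client code used by Thredds.
public class RangeRequestSketch {

    /** Builds the Range header value for reading length bytes at offset. */
    public static String rangeHeader(long offset, int length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        // Byte ranges are inclusive at both ends, hence the "- 1".
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Fetches one block of the remote file; every call is a separate HTTP request. */
    public static byte[] readBlock(URL url, long offset, int length) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        try {
            // A server that honours the range answers with 206 Partial Content.
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Range not honoured, status " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream block = new ByteArrayOutputStream(length);
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    block.write(buf, 0, n);
                }
                return block.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, rangeHeader(0, 8192) produces "bytes=0-8191", and a client that jumps between a header at the start of a file and a variable far into it would issue one such request per jump.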
That's exactly the goal - we want to use Opendap to give data to the
various software clients that will use the data, in order to gain the
many advantages offered. However, we have to get the files IN to thredds
somehow.
Is there a "magic number" in thredds which is a best window size to
use? Would it "prefer" to get its data in any particular way? Thredds
is basically the only client for this servlet, so I will just tune it
for best performance.
What do you mean by "window size"?
When I'm serving data from my servlet, I create an 8k buffer which reads
data from disk, then is flushed to the output stream.
    // Sent in 8 KB chunks, stopping after nBytes so that the body
    // matches the Content-Length and Content-Range headers
    byte[] dataBuf = new byte[8192]; // we'll read 8K chunks
    in.seek(firstByte);
    int length = 0;
    long bytecount = 0;
    while (bytecount < nBytes && (length = in.read(dataBuf, 0,
            (int) Math.min(dataBuf.length, nBytes - bytecount))) != -1) {
        if (debug) servletContext.log("doGet() valid: serving bytes "
            + (firstByte + bytecount) + " to "
            + (firstByte + bytecount + length - 1));
        bytecount = bytecount + length;
        out.write(dataBuf, 0, length);
    }
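
For testing outside the servlet container, the copy loop can be pulled into a standalone method. The sketch below is a hypothetical variant (the class BoundedCopySketch and the method copyRange are not part of the attached servlet). It takes long offsets and stops after nBytes, so the number of bytes written matches what a Content-Length header for that range would announce.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical helper for illustration; not part of the attached Marslet class.
public class BoundedCopySketch {

    /**
     * Copies nBytes starting at firstByte from the file to the stream, in
     * 8 KB chunks. Stops early if the file ends first.
     * Returns the number of bytes actually written.
     */
    public static long copyRange(RandomAccessFile in, OutputStream out,
                                 long firstByte, long nBytes) throws IOException {
        byte[] buf = new byte[8192];
        in.seek(firstByte);
        long remaining = nBytes;
        while (remaining > 0) {
            // Never ask for more than is left in the requested range.
            int want = (int) Math.min(buf.length, remaining);
            int got = in.read(buf, 0, want);
            if (got == -1) {
                break; // the file ended before the range was complete
            }
            out.write(buf, 0, got);
            remaining -= got;
        }
        return nBytes - remaining;
    }
}
```

The return value can be compared against the announced length in a debug log to spot a mismatch between headers and body.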
I have attached the full code for your interest.
Cheers,
-Tennessee
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;
public class Marslet extends HttpServlet
{
boolean debug=true;
/**
* This returns the header only for the equivalent GET request. It may be
* used by some clients to establish file-sizes or otherwise make use
* of summary information before performing a GET request. The HEAD
* response may not include a message-body.
*/
protected void doHead(HttpServletRequest request, HttpServletResponse
response)
throws ServletException, IOException
{
HttpSession session = request.getSession();
ServletContext servletContext = getServletContext();
ServletOutputStream out = response.getOutputStream();
File ncFile = doMarsQuery(request);
// Abort if file cannot be found
if(ncFile == null) {
servletContext.log("Null netCDF file - cannot continue");
out.println("Error in processing - database request failed. Please "
+ "contact the administrator address@hidden");
return;
}
String filename = ncFile.getPath();
RandomAccessFile in = null;
String contentType = "application/x-netcdf";
servletContext.log("Found file, processing HEAD request");
try {
long filesize = ncFile.length();
if(debug) servletContext.log("doHead(): filesize is "+ filesize);
String rangeHeader = request.getHeader("range");
// No usable byte range: describe the entire file instead
if(!isRangeHeaderValid(rangeHeader, filesize)) {
// Bad numbers in range, log error
if(rangeHeader != null && !rangeHeader.equals("")) {
servletContext.log("*** Invalid byte range header, sending entire file: " + rangeHeader);
}
servletContext.log("Sending entire file");
//headEntireFile(response, out, ncFile, contentType, servletContext);
int length = (int)ncFile.length();
// The whole file is described, so the status is 200 OK rather than 206
response.setStatus(HttpServletResponse.SC_OK);
response.setHeader("Accept-Ranges", "bytes");
// A HEAD response has no body, so Content-Length is set only as a header
response.setContentLength(length);
response.setContentType(contentType);
if(debug) servletContext.log("doHead() invalid: " +
HttpServletResponse.SC_OK + " Accept-Ranges: bytes" + " Content-Length: " +
length + " Content-Type: " + contentType);
}
// If a valid byte range has been requested
else {
in = new RandomAccessFile(ncFile, "r");
// byte range variables
int firstByte = 0;
int lastByte = 0;
int nBytes = 0;
StringTokenizer st = getRangeAsTokens(rangeHeader);
String range = "";
range = st.nextToken();
range = range.trim();
// Determine firstByte
if(range.indexOf('-') == 0) { firstByte = -1; }
else { firstByte = (new Integer(range.substring(0,
range.indexOf('-')))).intValue(); }
// Determine lastByte
if(range.indexOf('-') == range.length() - 1) { lastByte = -1; }
else { lastByte = (new
Integer(range.substring(range.indexOf('-') + 1))).intValue(); }
// If firstByte < 0, then client wants from there to EOF
if( firstByte < 0 ) {
firstByte = (int) filesize - lastByte;
lastByte = (int) filesize - 1;
}
// If last byte < 0, set to EOF
if(lastByte < 0) { lastByte = (int) filesize - 1; }
nBytes = (lastByte - firstByte) + 1;
////////////////////////////////////////
// Send the headers, do not write data
////////////////////////////////////////
String contentRange = "bytes " + firstByte + "-" + lastByte +
"/" + filesize;
response.setStatus(response.SC_PARTIAL_CONTENT);
response.setHeader("Accept-Ranges", "bytes");
// A HEAD response has no body, so Content-Length is set only as a header
response.setContentLength(nBytes);
response.setContentType(contentType);
response.setHeader("Content-Range", ""+contentRange);
if(debug) servletContext.log("doHead() valid: " +
response.SC_PARTIAL_CONTENT + "Accept-Ranges: bytes" + "Content-Length" +
nBytes + "Content-Type" + contentType + "Content-Range: " + contentRange);
}
out.flush();
out.close();
}
catch(Exception e) {
servletContext.log("Catching: " + e.toString(), e);
e.printStackTrace();
}
finally {
try { if(in != null) { in.close(); } }
catch( Exception e2) { servletContext.log( "Finally: " +
e2.toString(), e2); }
}
}
protected void doGet(HttpServletRequest request, HttpServletResponse
response)
throws ServletException, IOException
{
HttpSession session = request.getSession();
ServletContext servletContext = getServletContext(); // For logging
ServletOutputStream out = response.getOutputStream();
File ncFile = doMarsQuery(request);
if(ncFile == null) {
servletContext.log("NULL netCDF file - cannot continue");
out.println("Error in processing - database request failed. Please "
+ "contact the administrator address@hidden");
return;
}
String filename = ncFile.getPath();
RandomAccessFile in = null;
String contentType = "application/x-netcdf";
servletContext.log("doGet(): Found file, processing GET request");
try {
Enumeration e = request.getHeaderNames();
while(e.hasMoreElements()) {
String headerName = (String) e.nextElement();
Enumeration e2 = request.getHeaders(headerName);
String requestStr = "";
while(e2.hasMoreElements()) {
String headerValue = (String) e2.nextElement();
requestStr = requestStr + "Request> " + headerName + ": " +
headerValue + "\n";
}
servletContext.log(requestStr);
}
long filesize = ncFile.length();
String rangeHeader = request.getHeader("range");
//If a bad byte-range has been requested
if(!isRangeHeaderValid(rangeHeader, filesize)) {
// Bad numbers in range, log error
if(rangeHeader != null && !rangeHeader.equals("")) {
servletContext.log("*** Invalid byte range header, sending entire file: " + rangeHeader);
}
if(debug) servletContext.log("doGet() invalid: Sending entire file");
sendEntireFile(response, out, ncFile, contentType, servletContext);
}
// If a valid byte range has been requested
else {
in = new RandomAccessFile(ncFile, "r");
// byte range variables
int firstByte = 0;
int lastByte = 0;
int nBytes = 0;
StringTokenizer st = getRangeAsTokens(rangeHeader);
String range = "";
///////////////////////////////////////////
// Note - the original code snippet I found handled multiple
// byte-range requests. HTTP 1.1 does allow several ranges in one
// Range header, answered with a multipart/byteranges response
// (see the HTTP 1.1 Specification), but this servlet does not
// implement multipart responses. It simply serves the first range
// requested, so this code retrieves only the first token of the
// byte-range header.
///////////////////////////////////////////
range = st.nextToken();
range = range.trim();
// Determine firstByte
if(range.indexOf('-') == 0)
{ firstByte = -1; } // range format is "-lastbyte"
else
{ firstByte = (new Integer(range.substring(0,
range.indexOf('-')))).intValue(); }
// Determine lastByte
if(range.indexOf('-') == range.length() - 1)
{ lastByte = -1; }//range format is "firstbyte-"
else
{ lastByte = (new
Integer(range.substring(range.indexOf('-') + 1))).intValue(); }
// If firstByte < 0, the range is "-N": the client wants the last N bytes
if(firstByte < 0) {
firstByte = (int) filesize - lastByte;
lastByte = (int) filesize - 1;
}
// If last byte is < 0, set to EOF
if(lastByte < 0) { lastByte = (int) filesize - 1; }
nBytes = (lastByte - firstByte) + 1;
////////////////////////////////////////////
// Send the headers and start writing data
////////////////////////////////////////////
String contentRange = "bytes "+ firstByte + "-" + lastByte +
"/" + filesize;
response.setStatus(response.SC_PARTIAL_CONTENT);
response.setHeader("Accept-ranges", "bytes");
response.setContentType(contentType);
response.setHeader("Content-Range", ""+contentRange);
response.setContentLength(nBytes);
String responseStr = "";
responseStr = responseStr + "Response> Status: " +
response.SC_PARTIAL_CONTENT + "\n";
responseStr = responseStr + "Response> Accept-ranges: bytes\n";
responseStr = responseStr + "Response> Content-Type: " +
contentType + "\n";
responseStr = responseStr + "Response> Content-Range: " +
contentRange + "\n";
responseStr = responseStr + "Response> Content-length: "
+nBytes + "\n";
if(debug) servletContext.log("doGet(): valid \n" + responseStr);
//out.println("Content-Length: " + nBytes);
// How it used to be - suspect of causing a buffer overflow
//byte[] dataBuf = new byte[8192]; //we'll read 8K chunks
//byte[] dataBuf = new byte[nBytes + 1];
//in.seek(0L);
//in.skipBytes(firstByte);
//int length = in.read(dataBuf, 0, nBytes);
//out.write(dataBuf, 0, length);
// Sent in 8 KB chunks, stopping after nBytes so that the body matches
// the Content-Length and Content-Range headers sent above
byte[] dataBuf = new byte[8192]; //we'll read 8K chunks
in.seek(firstByte);
int length = 0;
long bytecount = 0;
while(bytecount < nBytes && (length = in.read(dataBuf, 0,
(int) Math.min(dataBuf.length, nBytes - bytecount))) != -1) {
if(debug) servletContext.log("doGet() valid: serving bytes "
+ (firstByte + bytecount) + " to " + (firstByte + bytecount + length - 1));
bytecount = bytecount + length;
out.write(dataBuf, 0, length);
}
out.flush();
}
out.close();
}
// Log I/O exceptions
catch(java.io.IOException ioe) {
// "Connection reset by peer" exceptions can have a number of causes
// attributable to the client. They are generally harmless and out of
// our control, but for now they are logged along with everything else.
//if(ioe.toString().compareToIgnoreCase("java.io.IOException: Connection reset by peer") != 0) {
servletContext.log(ioe.toString(), ioe);
//}
}
// Log generic exceptions
catch(Exception e) {
servletContext.log(e.toString(), e);
}
// Try to close the file if it's still open
finally {
try {
if(in != null) { in.close(); }
}
catch(Exception e2) {
servletContext.log(e2.toString());
}
}
}
/**
* Interpret the GET variables into a mars request and execute
*/
private File doMarsQuery(HttpServletRequest request) {
try
{
return new File("/data/laps-levels-large.nc");
}
catch(Exception e) {
getServletContext().log(e.getMessage());
e.printStackTrace();
}
return null;
}
private File getFromCache(String requestString) {
String tmpDirName = "/nm/scratch/marslet/";
File tmpDir = new File(tmpDirName);
int hash = requestString.hashCode();
File ncFile = new File(tmpDir, "marslet" + requestString.hashCode() +
".nc");
if(ncFile.exists()) { return ncFile; } else { return null; }
}
private void addToCache(File ncFile) {
// Do nothing - no accounting just yet
}
/**
* Send the entire file to the client
* <p>
* @param response the response object
* @param out the output stream of the response
* @param file the file we're sending
* @param contentType Content-Type: header value
* @param servletContext the servlet context
*/
private void sendEntireFile(HttpServletResponse response,
ServletOutputStream out,
File file,
String contentType,
ServletContext servletContext
)
throws IOException, Exception
{
FileInputStream inStream = null;
try {
inStream = new FileInputStream(file);
int length = 0;
response.setHeader("Accept-Ranges", "bytes");
response.setContentType(contentType);
response.setContentLength((int)file.length());
//out.println("Content-Length: " + (int)file.length());
byte[] buf = new byte[8192]; //we'll read 8K chunks
while(inStream != null && (length =
inStream.read(buf,0,buf.length)) != -1) {
out.write(buf, 0, length);
}
}
catch(IOException ioe) {
throw ioe;
}
catch(Exception e) {
throw e;
}
finally {
if(inStream != null) {
inStream.close();
}
}
}
/**
* Validate the byte range request header
* The following byte range request header formats are supported:
* <ul>
* <li> firstbyte-lastbyte (request for explicit range)
* <li> firstbyte- (request for 'firstbyte' byte to EOF)
* <li> -lastbyte (request for the last 'lastbyte' bytes of the file)
* </ul>
* @param rangeHeader the byte range header
* @param filesize the size of the file in bytes
* @return boolean true=valid header, false=invalid header
*/
private boolean isRangeHeaderValid(String rangeHeader, long filesize)
{
if(rangeHeader == null || rangeHeader.equals("")) { return false; }
String range = "";
int firstbyte = 0;
int lastbyte = 0;
StringTokenizer st = getRangeAsTokens(rangeHeader);
while(st.hasMoreTokens()) {
range = st.nextToken();
range = range.trim();
int index = range.indexOf('-');
if( index == -1 ) { return false; } //Illegal: must contain a '-'
if(range.length() <= 1) { return false; } //Illegal: must have more than '-'
//Case -lastbyte
if(index == 0) {
lastbyte = (new Integer(range.substring(range.indexOf('-') +1
))).intValue();
if(lastbyte > filesize) { return false; }
else continue;
}
//Case firstbyte-
if(index == range.length() -1) {
firstbyte = (new
Integer(range.substring(0,range.indexOf('-')))).intValue();
if( firstbyte >= filesize) { return false; } // byte offsets are zero-based
else { continue; }
}
//Case firstbyte-lastbyte
if(index != 0 && index != range.length() -1) {
firstbyte = (new
Integer(range.substring(0,range.indexOf('-')))).intValue();
lastbyte = (new Integer(range.substring(range.indexOf('-') +
1))).intValue();
if(firstbyte > lastbyte) { return false; }
else { continue; }
}
}
return true;
}
/**
* Break the range header into tokens. Each token represents a single
* requested range.
*
* The most common range header format is ...
*
* range = firstbyte-lastbyte,firstbyte-lastbyte
*
* ... with one or more firstbyte-lastbyte values, all comma separated
*
* @param String the byte range header
* @return StringTokenizer the tokenized string
*/
private StringTokenizer getRangeAsTokens(String rangeHeader)
{
String ranges = rangeHeader.substring(rangeHeader.indexOf('=') + 1,
rangeHeader.length());
return new StringTokenizer(ranges, ",");
}
/**
* Handle HTTP post requests
*/
public void doPost(HttpServletRequest request, HttpServletResponse
response)
throws ServletException, IOException
{
return;
}
}
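
As a point of comparison for the range handling above, here is a minimal sketch that resolves the first range of a Range header using long arithmetic. The class ByteRangeSketch and the method resolve are hypothetical and not part of the servlet. The servlet parses offsets with Integer, so an offset above 2147483647 would raise a NumberFormatException there. Whether that matters depends on the file sizes involved, which I have not confirmed, so treat this as something to check rather than as the diagnosis.

```java
// Hypothetical helper for illustration; not part of the attached Marslet class.
public class ByteRangeSketch {

    /**
     * Resolves the first range of a Range header against fileSize.
     * Returns {firstByte, lastByte} (both inclusive), or null if the header
     * is missing, malformed, or cannot be satisfied.
     */
    public static long[] resolve(String rangeHeader, long fileSize) {
        if (rangeHeader == null || !rangeHeader.startsWith("bytes=") || fileSize <= 0) {
            return null;
        }
        // Only the first comma-separated range is honoured, as in the servlet.
        String spec = rangeHeader.substring("bytes=".length());
        int comma = spec.indexOf(',');
        String range = (comma == -1 ? spec : spec.substring(0, comma)).trim();
        int dash = range.indexOf('-');
        if (dash == -1 || range.length() <= 1) {
            return null;
        }
        try {
            long first;
            long last;
            if (dash == 0) {
                // "-N": the last N bytes of the file
                long suffix = Long.parseLong(range.substring(1));
                if (suffix <= 0) {
                    return null;
                }
                first = Math.max(0, fileSize - suffix);
                last = fileSize - 1;
            } else {
                first = Long.parseLong(range.substring(0, dash));
                // "N-": from byte N to the end of the file
                last = (dash == range.length() - 1)
                        ? fileSize - 1
                        : Long.parseLong(range.substring(dash + 1));
                // A last byte beyond EOF is clamped to the final byte.
                last = Math.min(last, fileSize - 1);
            }
            if (first >= fileSize || first > last) {
                return null;
            }
            return new long[] { first, last };
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

The number of bytes to send is then last - first + 1, and the Content-Range value is "bytes " + first + "-" + last + "/" + fileSize.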