This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
> Organization: NOAA/PMEL
> Keywords: 199401260159.AA11032

Hi Steve,

> A quick question: I am involved with a committee that is investigating
> some options for network-wide (WAN) data sharing. One of the strategies
> being discussed is to design an API for accessing the desired classes of
> data and then to cast that API as an RPC interface. The netCDF API is
> under consideration as a subset of the full API.
>
> Do you know if anyone has already cast the netCDF API as an RPC interface?
> Do you have any thoughts/recommendations on the subject?

I don't know if anyone has already done this. We had planned to do it as
part of a joint proposal to develop a netCDF server, but the effort never
got funded. I think it would be a good idea. I've appended some discussion
generated earlier on this topic.

--Russ

Date: Tue, 1 Dec 92 10:58:33 -0700
From: Russ Rew <address@hidden>
Message-Id: <address@hidden>
To: address@hidden (Emmanuel Arbogast)
In-Reply-To: Emmanuel Arbogast's message of Wed, 25 Nov 92 19:46:08 PST <address@hidden>
Subject: netCDF

> Organization: StatSci
> Keywords: 199211260411.AA18267

Emmanuel,

Here's some email discussion we had earlier this year about a netCDF
server. I haven't asked any of the people involved whether it was OK to
forward this discussion to someone else, but I think it should be
illuminating as background on some of the issues.

--Russ

From russ Tue Jul 14 08:26:39 1992
To: address@hidden
CC: support, davis, fulker, steve, ben
In-reply-to: Joe Sirott's message of Mon, 13 Jul 92 14:12:17 -0700 <address@hidden>
Subject: netCDF data server

Hi Joe,

> Have you guys done any work on a netCDF data server? It would be nice
> to exchange netCDF data between processes without having to use
> files.
No, we haven't done any work on a netCDF data server other than
discussing the idea with the Unidata Implementation Working Group, who
decided it was lower priority than several other development tasks,
including implementing the netCDF operators we have specified. You are
the second person to bring this up recently, so perhaps it deserves to
be reexamined. Here are some excerpts from an email discussion I had
with Tomas Johannesson (address@hidden) in June on this subject (my
comments are unprefixed):

tj> 1) I see that a network server for netCDF is planned (section 1.6
tj> on p. 11 in the netCDF User's Guide). Has the date of the release
tj> of such a server been fixed? Has a design draft been distributed?
tj> Will it support TCP/IP?

The development of a netCDF server has dropped in priority, mostly
because the benefits of a server didn't seem to justify the work
required. The initial ideas behind a netCDF server were to make reads
from scattered parts of a netCDF file more efficient across a network,
to support memory-resident netCDF data, and to put a netCDF interface
on non-netCDF data. These three uses don't seem to fit together very
well. If we do implement any servers (or if someone else contributes
one), they would probably use remote procedure calls, which work over
TCP/IP but can also be made transport-independent.

tj> Regarding my question about a network server, I had plans to use a
tj> possible netCDF server to make "reads from scattered parts of a
tj> netCDF file more efficient across a network", as you say, and also,
tj> perhaps more importantly, to make a kind of database of netCDF
tj> files available over a network without the need to mount the
tj> directories containing the netCDF files over NFS on all computers
tj> that want to access the database, or to manually move each file
tj> with ftp (this must be an important and useful feature for other
tj> people than me).
tj> The latter possibility might allow you to open a file on a remote
tj> computer with something like
tj>     ncopen("<computer>:/pub/netcdfdata/file1234.cdf", NC_NOWRITE)
tj> where <computer> is a server with a large collection of netCDF
tj> files.

I hadn't realized it was so important to avoid NFS mounts. I was under
the impression that read-only NFS mounts are relatively cheap. We use
Sun's "automounter" here, and I understand there is a freely available
NFS automounter called "amd" that also runs on other NFS platforms. It
seems to me that it would be easy to have the server's netCDF file
directory mounted as needed by the automounter daemon on all the
computers that want to access the database, and you can set the
automounter timeout short enough that the filesystem gets unmounted
quickly when not needed by a client. But there may be other problems
with this that I'm not aware of; perhaps the client machines are MS-DOS
computers without multitasking, so they can't run daemons? If you know
of common circumstances under which a netCDF server would provide
significant performance advantages over using NFS and an automounter,
I'd be interested.

tj> Using a TCP/IP server should be more economical, as all the
tj> seeks/reads/writes are performed on the computer where the data is
tj> stored instead of going through the NFS layer, which probably needs
tj> a request to be sent between the client computer and the server
tj> computer for each and every seek/read/write call. I tested this
tj> briefly on my computer and found a performance gain of about a
tj> factor of two. Another point is that you don't need NFS if you
tj> have the TCP/IP server (NFS might not be installed on the
tj> computer), and you can have very strict control over the users'
tj> file access when you are using a TCP/IP server.
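A client library supporting the "<computer>:/path" naming convention
suggested above would first have to split the host from the path. Here
is a minimal sketch in C; split_remote_name is a hypothetical helper
invented for illustration (ncopen() never actually accepted
host-prefixed names):

```c
#include <stdio.h>
#include <string.h>

/* Split a "<computer>:/path" style name into a host part and a path
 * part.  Returns 1 if a host prefix was found, 0 if the name is an
 * ordinary local pathname.  Purely illustrative; not netCDF code. */
int split_remote_name(const char *name, char *host, size_t hostlen,
                      const char **path)
{
    const char *colon = strchr(name, ':');
    if (colon == NULL || colon == name) {
        *path = name;              /* no host component: local file */
        if (hostlen > 0)
            host[0] = '\0';
        return 0;
    }
    size_t n = (size_t)(colon - name);
    if (n >= hostlen)              /* truncate an over-long host name */
        n = hostlen - 1;
    memcpy(host, name, n);
    host[n] = '\0';
    *path = colon + 1;             /* path begins after the colon */
    return 1;
}
```

A client would hand the host part to its transport layer and the path
part to the server, which performs the actual opens and seeks locally.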
tj> I might also mention that most commercial DBMSs are based on
tj> client/server networking in order to boost performance by having
tj> the reading and writing of data take place on the machine where
tj> the data is physically stored. In that case an important point is
tj> that the query processing is done locally, which minimizes the
tj> data volume sent over the network. This is probably not important
tj> for netCDF.

--Russ

From russ Mon Jul 27 10:36:00 1992
To: fulker
Subject: [address@hidden: Re: netCDF data server]

Dave,

I forgot to forward this netCDF server discussion to you, as I had
promised. I've concatenated three messages. The first is Glenn's reply
to Joe Sirott, followed by Joe's reply to Glenn and finally Joe's reply
to me.

--Russ

Date: Tue, 14 Jul 1992 12:34:37 -0600
From: "Glenn P. Davis" <address@hidden>
To: address@hidden, address@hidden
Subject: Re: netCDF data server

> From: Joe Sirott <address@hidden>
>
> Some comments on your previous message:
>
> 1) One advantage a netCDF server would have over NFS mounts is that
> many sites are not willing to export a file system to the general
> public; this means that the netCDF files are only available via
> anonymous ftp.

This is an important point.

Note that NFS is a mature system with many people working on it. Its
security is important to the community at large, so it gets tested very
thoroughly. Its shortcomings are known and understood by the systems
community.

It is unlikely that we would come up with a system that has better
access control than NFS. Even if we did, it is unlikely that a site
that doesn't trust NFS to control access to its stuff is going to trust
our stuff. Think about it.

> 2) A netCDF server would require the definition of a communications
> protocol between two netCDF-dependent processes. This would lead to
> some interesting results.
> For instance, a program I developed (Freud) allows visualization of
> data sets, but no analysis; however, if netCDF data could be
> exchanged between processes without requiring the creation of files
> (via TCP or shared memory, for instance), I could seamlessly send
> data from my program to another program (like MATLAB) and then ship
> it back after transformation.

One program, the data server, is somehow going to have to instantiate
a netCDF 'object' that the client can reference (nc_open). If multiple
objects are to be available, that object must exist in a namespace.
There must be functions to find out what is in the namespace. UNIX /
NFS has a mature, well understood namespace: the file system. It also
has functions and utilities for querying and manipulating that
namespace. Again, we would be hard pressed to come up with something
better.

It is very easy for two processes to share information via the
filesystem, NFS or not. If you want something fancier, like
shared-memory access to the shared object, you can mmap(2) the file.
netCDF will soon support this transparently on systems that support
mmap(2).

> Another possibility would be shipping live data from models or
> simulations across a network to a visualization program that could
> dynamically view the model results.

People do this now. The reader does ncsync() to get the update.

----

The point is, to do a netCDF server "right", you end up duplicating
functionality that is provided by the (network) operating system:
namespace, access control, efficient I/O blocking, etc. My opinion is
that it is better to let the OS do this.

-glenn

Date: Tue, 14 Jul 92 14:49:33 -0700
From: Joe Sirott <address@hidden>
To: address@hidden
Subject: Re: netCDF data server

> > 1) One advantage a netCDF server would have over NFS mounts is that
> > many sites are not willing to export a file system to the general
> > public; this means that the netCDF files are only available via
> > anonymous ftp.
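The mmap(2) technique Glenn mentions, mapping a file so a process can
access it like shared memory instead of issuing seek/read calls, looks
roughly like this in C. map_and_check is an invented demonstration
helper, not part of netCDF, and the transparent netCDF support he
refers to is not shown:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and compare its leading bytes against an
 * expected string, reading through the mapping rather than read(2).
 * Returns 0 on match, 1 on mismatch, -1 on error. */
int map_and_check(const char *path, const char *expected)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;
    int ok = (memcmp(p, expected, strlen(expected)) == 0);
    munmap(p, (size_t)st.st_size);
    return ok ? 0 : 1;
}
```

Two processes mapping the same file with MAP_SHARED see each other's
stores, which is the "shared memory sort of access" described above;
coordinating those accesses is the handshaking question Joe raises
later in the thread.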
>
> This is an important point.
> Note that NFS is a mature system with many people working on it.
> Its security is important to the community at large, so it gets
> tested very thoroughly. Its shortcomings are known and understood by
> the systems community.
>
> It is unlikely that we would come up with a system that has better
> access control than NFS. Even if we did, it is unlikely that a site
> that doesn't trust NFS to control access to its stuff is going to
> trust our stuff. Think about it.

That's not true, for a couple of reasons. First, my understanding (I'm
not a networking guru) is that NFS relies on RPC authentication
procedures for security. That means that any application that uses the
highest levels of RPC security will be as secure as NFS. Also,
obviously the NFS server has to be able to write a file to a
filesystem. This means that there is the possibility of tricking the
server by intercepting a packet destined for the server and changing
the request mode. A netCDF server would be read-only; even if someone
played around with a request to the server, it couldn't force a write
to the filesystem. Finally, the NFS daemon has to run as root, so that
it can set ownership, etc. on files. A netCDF server would not have to
be run as root, so the damage it could do would be limited. Now,
convincing users that it's secure might be a different matter.

> > 2) A netCDF server would require the definition of a
> > communications protocol between two netCDF-dependent processes.
> > This would lead to some interesting results. For instance, a
> > program I developed (Freud) allows visualization of data sets, but
> > no analysis; however, if netCDF data could be exchanged between
> > processes without requiring the creation of files (via TCP or
> > shared memory, for instance), I could seamlessly send data from my
> > program to another program (like MATLAB) and then ship it back
> > after transformation.
>
> One program, the data server, is somehow going to have to instantiate
> a netCDF 'object' that the client can reference (nc_open). If
> multiple objects are to be available, that object must exist in a
> namespace. There must be functions to find out what is in the
> namespace. UNIX / NFS has a mature, well understood namespace: the
> file system. It also has functions and utilities for querying and
> manipulating that namespace. Again, we would be hard pressed to come
> up with something better.

I'm not sure what is so difficult about defining the namespace you
refer to. For netCDF objects that are stored as files, you can still
use the Unix namespace (for a given machine); for netCDF objects from
processes, the namespace could be defined, for instance, using a
string with the process id and the variable name. A process would
register with the server, which could demultiplex the data to as many
clients as request it.

> It is very easy for two processes to share information via the
> filesystem, NFS or not. If you want something fancier, like
> shared-memory access to the shared object, you can 'mmap(2)' the
> file. netCDF will soon support this transparently on systems that
> support mmap(2).

How do processes handshake when sharing a file in this way? I suppose
you could use file locking, but why not do it right? How do processes
on multiple machines communicate if they don't have NFS? All machines
that are connected to a network have to support SOME kind of transport
protocol, but they don't necessarily have NFS.

> > Another possibility would be shipping live data from models or
> > simulations across a network to a visualization program that could
> > dynamically view the model results.
>
> People do this now. The reader does ncsync() to get the update.
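The process namespace Joe sketches, with keys built from a process id
and a variable name, could look like the toy registry below. All names
here are hypothetical illustrations; no such registry exists in netCDF:

```c
#include <stdio.h>
#include <string.h>

/* A toy in-memory registry mapping "pid:varname" strings to the
 * process that registered the object.  A real server would also track
 * transport endpoints and demultiplex data to subscribed clients. */

#define REG_MAX 16

typedef struct {
    char name[64];    /* e.g. "4242:temperature" */
    int  owner_pid;   /* process that registered the object */
} reg_entry;

static reg_entry registry[REG_MAX];
static int reg_count = 0;

/* Register an object under "<pid>:<varname>".  Returns 0 on success,
 * -1 if the table is full. */
int reg_register(int pid, const char *varname)
{
    if (reg_count >= REG_MAX)
        return -1;
    snprintf(registry[reg_count].name, sizeof registry[reg_count].name,
             "%d:%s", pid, varname);
    registry[reg_count].owner_pid = pid;
    reg_count++;
    return 0;
}

/* Look up an object by its namespace string; returns the owner pid,
 * or -1 if no process has registered that name. */
int reg_lookup(const char *name)
{
    for (int i = 0; i < reg_count; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].owner_pid;
    return -1;
}
```

This is the piece that, in Glenn's view, duplicates what the
filesystem already provides, and in Joe's view is simple enough to be
worth building.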
>
> ----
>
> The point is, to do a netCDF server "right", you end up duplicating
> functionality that is provided by the (network) operating system:
> namespace, access control, efficient I/O blocking, etc. My opinion
> is that it is better to let the OS do this.

RPC calls don't require rewriting the operating system; they're built
on the TCP/IP transport and network layers of the OSI network model.
The namespace (when connecting to another process) is specified by the
5-tuple { protocol, server address, server port, client address,
client port }, just as a file is identified by its pathname. Access is
controlled by RPC authentication methods. You can use standard I/O
routines with network communication. The OS does this already.

Sure, you can communicate between processes with files using NFS. You
can also communicate between processes without NFS (process 1 writes a
file; process 2 spawns an ftp process and blocks until it completes;
the ftp process grabs the file; process 2 gets the ftp output). In
fact, you can communicate between different processes using tape, if
you want. That doesn't make it a good way to do it.

> -glenn

Joe S.

Date: Tue, 14 Jul 92 14:57:53 -0700
From: Joe Sirott <address@hidden>
To: address@hidden
Subject: Re: netCDF data server

...

> You're right, and we shouldn't imply that user votes determine our
> development priorities. I think the issues of the usefulness of a
> netCDF server still deserve more discussion (i.e., I don't know
> whether you or Glenn is right). We're discussing ways of getting
> more resources for netCDF development that would make it possible
> for us to take on such projects.
>
> --Russ

I'm not arguing that a server should be your top priority. I just would
like it to be A priority, depending on your resources. I'm looking at
the possibility of creating one myself, too (networking work is GREAT
to have on your resume ;-)).

Cheers.

Joe S.
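To make the RPC idea that runs through this thread concrete, here is a
minimal C sketch of the wire messages a remote ncvarget() call might
exchange. Every name and field here is invented for illustration
(Unidata never defined such a protocol); a real ONC RPC service would
describe these structures in a rpcgen .x file and marshal them with
XDR rather than sending raw structs:

```c
#include <string.h>

/* Hypothetical request/reply pair for a remote ncvarget(). */

#define NCRPC_MAX_DIMS 32

typedef struct {
    int  ncid;                   /* handle returned by a remote ncopen */
    int  varid;                  /* variable to read */
    long start[NCRPC_MAX_DIMS];  /* corner of the requested hyperslab */
    long count[NCRPC_MAX_DIMS];  /* edge lengths of the hyperslab */
    int  ndims;                  /* entries of start/count in use */
} ncrpc_varget_request;

typedef struct {
    int  status;   /* 0 on success, a netCDF error code otherwise */
    long nbytes;   /* size of the data block that follows on the wire */
} ncrpc_varget_reply;

/* Number of elements a request asks for: the product of the counts. */
long ncrpc_request_elements(const ncrpc_varget_request *req)
{
    long n = 1;
    for (int i = 0; i < req->ndims; i++)
        n *= req->count[i];
    return n;
}
```

Carrying start and count in the request is what lets the server do all
the seeks and reads locally and return only the requested hyperslab,
which is exactly the efficiency argument made for a server earlier in
this thread.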