
portability of pathnames



    Date: Sun, 22 Jun 1986  20:30 EDT
    From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
    To:   SANDRA <LOOSEMORE at UTAH-20.ARPA>
    Cc:   common-lisp@SU-AI.ARPA

    I'm going through old mail to make a list of issues we need to settle,
    or at least work on.  I came across your complaint about portability of
    pathnames and the problems with make-pathname.  I was just wondering if
    you had anything specific to propose.  If not, I'll just add this to the
    agenda of things we collectively need to think about, but I'm not sure
    there's a good solution to be had.

By the way, in my work with Macsyma I've seen most of the same problems
as Sandra mentioned in her message that kicked off this line of conversation.
She sounded in that message like she expected to get dumped on, but I
hope that she's neither dumped on nor ignored. Most of those comments were
very to the point.

I do take very minor issue with her remark that the package issue seems 
a small one next to the other issues she cited. My reason for pushing so
hard to get these package issues resolved is that they impede everyone's 
ability to get a foothold in a Lisp they're trying to port to. If we can't
get expressions to read the same in each Lisp, then we're stripped 
even of the ability to talk about language problems at the level of 
expressions and must too frequently resort to discussions of the meaning
of source text. Also, in practice, implementation-specific workarounds are
something you can get to much more easily once the syntactic barriers are
resolved.

But I don't mean to diminish the importance of these other issues. Pathnames
are a horror to use in CL. Here's a list of the gripes I have with pathnames
which come to mind just off the top of my head; I'm sure if I thought harder
I could think of others. Maybe Sandra could add some of her favorites...

 * Canonical case. If you study the Symbolics pathname system, you'll note
   that elaborate pains are taken to make the case of the components
   be stored in uppercase for interchange purposes even if they're
   composed as a namestring in another case. This allows the internal
   representation of the Unix pathname /joe/math.text and the Tops-20
   pathname <JOE>MATH.TEXT to use the same internal notation, with a
   name of "MATH" and type of "TEXT", and allows cross-file-system
   merging to be done correctly. The result of the current system is
   that one must write gross things like:
     (MAKE-PATHNAME :NAME THE-GIVEN-NAME
                    :TYPE (IF *LOWERCASE-FILENAMES-P* "text" "TEXT"))
   and initialize the *LOWERCASE-FILENAMES-P* variable on the basis of
   implementation-specific information. As Moon points out, the
   Symbolics pathname system does this sort of thing invisibly, and
   people interested in how to fix this should study the documentation.
   It may seem hairy, but a portable file system interface is going to
   necessarily be somewhat hairy just because of the variance of file
   systems. I think given the constraints, it's not gratuitously hairy.

 * What can go in a host slot? CLtL doesn't say whether a Lisp 
   implementation on host "FOO" is required to treat :HOST "FOO" 
   the same as :HOST NIL or :HOST "" in MAKE-PATHNAME. In fact,
   nothing says whether "FOO:" or "FOO::" might be allowed (depending
   on what the native notation for hosts was); I definitely don't
   think they should be, but there's nothing I can find protecting me
   from an implementation making this the -only- way to notate a host.

 * What can go in a directory slot? CLtL says it can hold a string,
   but it doesn't say whether the string contains any notational 
   devices. eg, on VMS, is "FOO" ok for a directory or do you want
   "[FOO]"? "FOO" would seem the most portable, since it doesn't get
   involved in the fact that TOPS-20 might want "<FOO>" and the LispM 
   might want ">FOO>", but it doesn't matter much anyway because if 
   you want to talk about subdirs, "FOO.BAR" doesn't completely hide the
   implementation: it works for systems that use the notation
   "<FOO.BAR>" or "[FOO.BAR]" but not for those that use ">FOO>BAR" or
   "/FOO/BAR".
   Without this much information, the kinds of operations you can do
   on the contents are unreasonably limited. On the LispM, you say
   :DIRECTORY "JOE" but in VAXLISP you say :DIRECTORY "[JOE]". The
   LispM idea of allowing this to contain a list of directory names,
   as in ("FOO" "BAR") to mean /FOO/BAR or >FOO>BAR> is clearly more
   reasonable and I can't imagine why it was not adopted.

 * Canonical types. The extension which is used for certain standard
   kinds of files varies from implementation to implementation. eg,
   some systems call text files .txt and others .text. Some call 
   lisp files .lsp, others .lisp, and others .clisp. Some call binary
   files .BIN, others .FAS, and so on. It would be nice if we'd 
   adopted the LispM's canonical type system such that certain dignified
   file types could be predefined for use with portable programs.
   Thus, (MAKE-PATHNAME :NAME "FOO" :TYPE :LISP) could refer to 
   "FOO.LISP" in some implementations, "foo.l" in others, etc.

 * This business about semi-standard features like :NEWEST and :OLDEST
   is a pain. We need those features, but we should fully enumerate the
   entire set of possible contents and exactly what they denote, even
   if not everyone supports them all. It should be possible to construct
   a program that would be "ready for anything". Perhaps each 
   implementation could keep a list of which keywords were valid for
   that implementation.

 * No way is provided for creating a relative pathname. This would
   be very useful for merging purposes even on systems which don't
   provide a namestring syntax for pathnames. It is especially 
   essential in the absence of a clear specification of what the
   directory slot contains.

 * One issue is that there are so many fields which are allowed to
   contain implementation-dependent gunk as to make those fields
   effectively write-only.
   
 * Printing pathnames. We provide no convenient way to print a
   pathname. On the LispM you can do (FORMAT T "~A" pathname) but 
   not all implementations support that because CLtL doesn't say
   it should work. Doing (FORMAT T "~A" (NAMESTRING pathname))
   seems dumb since, among other things, it forces gratuitous consing.

 * How do you compare pathnames? EQUAL pathnames are not obliged
   to be EQ. Since pathnames contain all these options for 
   implementation-dependent featurism, the user is not able to 
   write a PATHNAME-EQUAL. As far as I can tell, an implementation
   in which a directory slot of "FOO.BAR" and ("FOO" "BAR") are
   equivalent is not constrained to return T for EQUAL on two 
   pathnames which contain identical things except one uses 
   "FOO.BAR" and the other uses ("FOO" "BAR"). Indeed, even doing
   (EQUAL (NAMESTRING X) (NAMESTRING Y)) isn't good enough because,
   for example, VAX VMS allows logical names like "FOO:[.BAR]X.Y" to
   expand into "DEV1A:[FOO][.BAR]X.Y". I don't care if 
   "FOO:[.BAR]X.Y" is PATHNAME-EQUAL to "DEV1A:[FOO][.BAR]X.Y"
   because that's a semantic issue that may get caught up in how
   the FOO logical device is implemented, but I do care that
   "DEV1A:[FOO][.BAR]X.Y" and "DEV1A:[FOO.BAR]X.Y" are PATHNAME-EQUAL
   because that's just a syntactic issue ... but I see no way of
   writing a portable PATHNAME-EQUAL.

 * I consider it to be a complete bug (and the only one that I've 
   seen which I believe to also be a bug in the Symbolics pathname
   system) that you can't create a non-hosted pathname. eg, in
   the case of someone doing
      (MERGE-PATHNAMES "" "FOO")
   and later planning to do
      (MERGE-PATHNAMES * "JOE::")
   where "::" is the host syntax used by the book, if you force the
   first merge to put a host on, then the second merge won't pick
   up the "JOE" and the wrong thing will happen. This actually came
   up in MACSYMA and I was forced to invent my own pathname system
   which holds a CL pathname in a slot and also holds host-valid-p
   info that it keeps set to NIL after the first merge above (which
   must be done via MY-MERGE-PATHNAMES, not CL's MERGE-PATHNAMES)
   so that MY-MERGE-PATHNAMES can correctly do the second merge.

 * The phrase "in which case no parsing is needed, but an error
   check may be made for matching hosts" at the end of the first
   paragraph of the description of PARSE-NAMESTRING on p414 is
   an invitation to disaster since we don't specify how to obtain
   even the current machine's host name or in what syntax it should
   be presented in order to make this function happy. For example,
   (PARSE-NAMESTRING "FOO.LISP") in VAXLISP might return
   #S(PATHNAME :HOST "PETER" :DEVICE NIL :DIRECTORY NIL
               :NAME "FOO" :TYPE "LISP" :VERSION NIL)
   but (PARSE-NAMESTRING "FOO.LISP" "PETER")
   errs telling me that host "" and "PETER" conflict. I might
   report this as a bug and maybe they'd even fix it for me but
   I'd have nothing to fall back on if they disagreed because CLtL
   certainly doesn't come out and claim it's a bug.

 * The description of the pathname system offers no examples of 
   using it in any non-trivial way. All the examples use strings as
   arguments, but that's just the problem. In portable applications,
   strings just don't work. Sometimes, you're merging something that
   was typed in by the user but rarely is it being merged with something
   else typed by the user. The other thing is often something your 
   program wanted to have wired in. If you tried to write even the 
   simplest program using the given primitives in an even slightly 
   non-trivial way, the problem would become apparent. eg, try to 
   figure out how to specify the examples on p414 or p415 in a 
   portable way. To put yourself in the right frame of mind, 
   replace "DUMPER" on p414 with "MACSYMA" or "KEE" or "MYCIN" or 
   something that you don't think of as TOPS-20 specific. The 
   example on p415 is hard just as it is, for exactly the reasons 
   of canonical types I've mentioned above. What am I expected to
   write? Top of p423 is the only place CLtL tries to do this, and
   it more or less succeeds in this trivial case ... except for the
   fact MERGE-PATHNAME-DEFAULTS isn't in the index and I suspect 
   never made it into the spec. Any way you cut it, these three 
   examples just don't do enough to illustrate what you can and
   can't do with the given primitives.
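
Several of the gripes above (what a directory slot may hold, canonical
types, comparing pathnames) come down to missing normalization. What
follows is only a minimal sketch of the idea, not taken from CLtL or from
any vendor's pathname system. Every name in it (*CANONICAL-TYPES*,
NATIVE-TYPE, SPLIT-DIRECTORY, CANONICAL-DIRECTORY, DIRECTORY-EQUAL) is
hypothetical, and both the table entries and the use of "." as the
subdirectory separator are assumptions made for illustration.

```lisp
;; Hypothetical sketch: none of these names come from CLtL or any implementation.

(defparameter *canonical-types*
  '((:lisp . "lisp") (:text . "text"))
  "Assumed per-implementation table: canonical type keyword -> native type string.")

(defun native-type (type)
  "Return the native type string for TYPE, which is a keyword or a string.
Strings are passed through unchanged; unknown keywords signal an error."
  (if (keywordp type)
      (or (cdr (assoc type *canonical-types*))
          (error "Unknown canonical type ~S" type))
      type))

(defun split-directory (string)
  "Split a dotted directory string such as \"FOO.BAR\" into (\"FOO\" \"BAR\")."
  (loop with start = 0
        for pos = (position #\. string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun canonical-directory (dir)
  "Return DIR as a list of component strings, whether it was given as a
dotted string or already as a list."
  (if (stringp dir)
      (split-directory dir)
      dir))

(defun directory-equal (a b)
  "True if directory specifications A and B name the same component list."
  (equal (canonical-directory a) (canonical-directory b)))
```

Under these assumptions, (DIRECTORY-EQUAL "FOO.BAR" '("FOO" "BAR")) is
true, and (NATIVE-TYPE :LISP) yields whatever string the table assigns.
A real PATHNAME-EQUAL would still need the implementation to say which
notations each slot accepts, which is exactly the missing information.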

I do think pathnames are useful only for the most trivial purposes in CL.
I don't think this means we should flush them. I think we should 
seriously study systems, particularly those offered by the LispM 
vendors, where there's been success in dealing with multiple file 
systems, and then I think we should agree on the additional mechanisms 
necessary to make things really work.