Uploading GED2HTML Output to your Web Provider

Once you have run GED2HTML to create the HTML output files and you have checked these files by viewing them with your HTML browser, you will probably then want to upload these files to a Web provider, so that they can be served up on demand to other people on the Internet. Exactly how this is achieved is not something that I can give detailed instructions about, because it depends very strongly on the details of the service you receive from your Web provider. However, there is a bit of general strategy that always applies, and this is what I discuss below.

The output of GED2HTML on a typical GEDCOM can be a very large number (thousands) of files and directories. Uploading each of these files individually to a Web provider would be an incredibly time-consuming and tedious task that is to be avoided at all costs. Similarly, performing any kind of individual editing on every file output by GED2HTML is also generally not feasible. About the only thing that it might be feasible to do (and sometimes necessary) is to systematically rename all the output files from GED2HTML, for example to change the filename extensions from .HTM to .HTML. Even this is to be avoided if at all possible.

To avoid having to upload individual files, the following general strategy is applied to get the files (and directories) from your system to your Web provider's system:

  1. First, an archiver is used, to package up all the files and directories output by GED2HTML into a single "archive" file. Examples of archivers are various versions of ZIP, as well as the Unix program "tar".

  2. Next, the archive file is uploaded to your Web provider's system, typically using FTP, but perhaps Kermit or Zmodem. Usually, archive files are binary data, so when using FTP it is important to select "binary" or "image" mode when uploading them.

  3. Finally, the archive file is unpacked into place on your Web provider's system. This requires that an "unarchiving" program, complementary to the one you used to create the archive, be available on your Web provider's system, and also that you have sufficient access to your Web provider's system (typically a "shell account") to be able to run the unarchiver on that system.

Some good advice is to work out the whole procedure using a small GEDCOM, so that you don't waste a lot of time uploading megabytes upon megabytes of data until you are pretty sure the whole thing is going to work in the end.

One big thing that can go wrong with the above strategy is that you might not have shell access to your Web provider's system. In that case, all bets are off, and you will have to check with your Web provider's technical support staff to find out what might be the best way to get the GED2HTML output files and directories from your personal computer to the Web provider's system. Often the best strategy in this case is to use a good FTP client such as WS-FTP that is capable of uploading multiple files without manual intervention.

What Archiver to Use?

If you do have shell access, then one question you have to work out is which archiver to use. Probably any decent system these days will have an "unzip" program available, so your best choice would probably be to use some form of PKZIP/PKUNZIP or a compatible program. My recommendation is Info-ZIP "WiZ", which is a very nice, free ZIP/UNZIP program, with a GUI. The Windows 95/NT version understands long filenames. I have made available for download wiz401xN.exe for Windows 95/NT, and wiz401x.exe for Windows 3.1. These are self-extracting archives, which unpack themselves on your system when they are launched from an otherwise blank folder. Versions of Info-ZIP software are also available for Unix systems. (For more information about Info-ZIP, and for the most recent releases, see here.)

Note that when unpacking a ZIP archive with a program such as PKUNZIP, it may be necessary to tell the program to preserve the directory (folder) organization the files originally had when they were created by GED2HTML. When using PKUNZIP to unpack the archive, this is done by giving the "-d" parameter on the command line. When using WinZip to unpack the archive, the "Use Folder Names" box should be checked to do the same thing.

Another alternative is the classic Unix "tar" archiver. Many Web providers are Unix-based systems, and essentially every Unix system has the "tar" program available, so using "tar" to create the archive is often a viable option. You can download a TAR.EXE for use under DOS or Windows from here. (Note that this version of TAR.EXE is a DOS program that doesn't understand Windows 95 long filenames. There may be more up-to-date versions available on the 'net, but I haven't looked recently.)

Use the DOS command

    TAR cvf HTML.TAR HTML

(note that the "cvf" must be in lower-case letters) to create a binary file "HTML.TAR" containing the contents of the entire "HTML" subtree. On your service provider's system, execute

    tar xvf html.tar

to unpack the file and recreate the "HTML" subtree. Note that if you packaged up your data as suggested above, then when you unpack your data, the "HTML" directory will be created in the directory from which you run "tar". Thus, you want to make sure you are in the directory where you want the "HTML" subdirectory to go before you unpack your data.
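The pack/unpack round trip can be rehearsed on a Unix-like system before anything is uploaded. The following is only a sketch, assuming a POSIX shell with "tar", "diff" and "mktemp" available; the function name tar_roundtrip and the archive name upload.tar are made up for illustration. It packs a subtree, unpacks it into a scratch directory standing in for the provider's system, and compares the result with the original.

```shell
# Rehearse the tar round trip locally.
# Usage: tar_roundtrip TREE   (TREE is a directory relative to the current one)
# tar_roundtrip is a hypothetical helper name, not part of GED2HTML.
tar_roundtrip() {
    tree=$1
    scratch=$(mktemp -d) || return 1

    # Pack the whole subtree into a single archive file.
    tar cf "$scratch/upload.tar" "$tree" || return 1

    # Unpack it in a different directory, which plays the role of the
    # Web directory on the provider's system.
    mkdir "$scratch/unpacked" || return 1
    (cd "$scratch/unpacked" && tar xf ../upload.tar) || return 1

    # diff -r prints nothing and returns 0 when both trees are identical.
    diff -r "$tree" "$scratch/unpacked/$tree"
    status=$?

    rm -rf "$scratch"
    return $status
}
```

For example, running "tar_roundtrip HTML" from the directory that contains the "HTML" subtree returns success only if every file and directory survived the round trip unchanged.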

Can't Use Subdirectories

Some major online providers do not allow users to create subdirectories in their Web directory. If you subscribe to such a provider, all is not lost, because GED2HTML can be instructed to produce a "flat" set of output files that does not use subdirectories. Simply set the "Number of files per directory" option to zero to disable the use of subdirectories.

You should be aware that, with a GEDCOM of any significant size, GED2HTML will produce a great many output files. One reason why GED2HTML uses subdirectories in the first place is that it can be somewhat inconvenient to manipulate a directory containing several thousand files. In addition, under Windows, accessing such directories becomes very slow. If you ask GED2HTML to create a "flat" file structure, be aware that it could take substantially longer to create your HTML files, and that dealing with the resulting directory might be a bit cumbersome.

Upper-case Filenames and Lower-case Links

Another common thing that goes wrong is for people to find that, once they have uploaded all their files and unpacked them on their Web provider's system, the hyperlinks within the HTML files don't work because the files all end up having upper-case names on the Web provider's system, whereas all the links in the HTML files are in lower case. This type of problem is unavoidable when files are transferred from a case-insensitive operating system like DOS (Windows 3.1) to a case-sensitive operating system like Unix. The uploading program or unarchiver that is used has to make some decision about how to render the upper-case only DOS names as Unix names. Two natural choices are (1) to render all such names in upper case on the Unix system, and (2) to render all such names in lower case on the Unix system. Windows 95 has introduced a third, not so natural, choice, which is to render all such names as mixed upper and lower case, where the first letter is capitalized, and the remaining letters are in lower case.

One way to get around this problem, should you encounter it, is to see if the uploading program or unarchiver you used has an option that allows you to select whether DOS files are unpacked with upper-case or lower-case names. If so, that may be the fastest way out. If not, then GED2HTML is now fully configurable to handle this type of situation. If you find that files are ending up on your Web provider with all upper-case names, then set the "Case-fold-links" option of GED2HTML to "Upper" before processing your GEDCOM. This will cause GED2HTML to make all the links in the HTML output upper-case only, so that they will work properly once the whole batch of files arrives on your provider's system. If you find that files are ending up on your Web provider with all lower-case names, then set the "Case-fold-links" option of GED2HTML to "Lower". If both your system and your Web provider's system are case-sensitive systems, then setting "Case-fold-links" to "None" is probably the option for you. See here for more details on this, and other customization options provided by GED2HTML.
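To find out how the names actually came out on a Unix-based provider, a small check along the following lines may help. This is a sketch only, assuming a POSIX shell with "find", "sed" and "grep"; the function name name_case is made up for illustration. It looks at the base names of all files under a directory and reports whether they are all upper case, all lower case, or a mixture.

```shell
# Report the letter case of the file names under a directory.
# Usage: name_case DIR   (prints "upper", "lower" or "mixed")
# name_case is a hypothetical helper name, not part of GED2HTML.
name_case() {
    # Base names of all regular files below the directory.
    names=$(find "$1" -type f | sed 's|.*/||')

    # Count the names containing at least one upper-case letter, and the
    # names containing at least one lower-case letter. grep -c exits with
    # a nonzero status when the count is 0, hence the "|| true".
    upper=$(printf '%s\n' "$names" | grep -c '[[:upper:]]' || true)
    lower=$(printf '%s\n' "$names" | grep -c '[[:lower:]]' || true)

    if [ "$upper" -gt 0 ] && [ "$lower" -eq 0 ]; then
        echo upper
    elif [ "$lower" -gt 0 ] && [ "$upper" -eq 0 ]; then
        echo lower
    else
        # Also printed when no name contains any letters at all.
        echo mixed
    fi
}
```

A result of "upper" suggests setting "Case-fold-links" to "Upper", and "lower" suggests "Lower". A result of "mixed" means the names need a closer look before choosing a setting.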

HTM versus HTML

There is another technical problem that you might encounter when you have uploaded your data and you begin to try to access it via the Web. The symptom of this problem is that your Web browser will apparently not recognize your uploaded files as HTML code, and will instead display them as plain text. The cause of this problem actually involves the configuration of the Web server on your provider's system, in a way that is explained below.

When you create HTML files under Windows 3.1, all the files are created with a .HTM extension, because DOS only supports three-letter filename suffixes, and will truncate anything longer than that. When run under Windows 3.1, the default behavior of GED2HTML is to use a .HTM extension for all the hyperlinks internal to the output files, in order to be compatible with the .HTM extension with which the files are created.

When you upload files created under Windows 3.1 to your provider's system, everything will be fine as long as the HTTP (Web) server running on your provider's system is set up to recognize files ending in .HTM as HTML source code. The server passes this information along to your browser, which then formats and displays the information properly. However, not all HTTP servers are configured to recognize files ending in .HTM as HTML files. In this case, the server will tell your browser that the file is text, and it will not be properly displayed.

The simplest way out of this predicament is to ask for the cooperation of your Web service provider. Web servers generally have a "mime.types" configuration file, which lists mappings from filename extensions to MIME types. This information allows the server to determine what kind of information is in a file by looking at the extension. The server communicates this information to your browser when the file is retrieved, and the browser, in turn, uses the information to control how the file is displayed. Some servers lack an entry:

    text/html htm

in this file, which would tell them that files ending in ".htm" are to be interpreted as HTML source. What you need to do is to ask your friendly Web service provider to add the above entry to the server "mime.types" configuration file. Usually they will agree to do this for you, as it doesn't have any bad effect, and it will generally help out DOS users who are uploading HTML from their PCs.
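To confirm that this is really the problem, it can help to look at the Content-Type header the server sends for one of the uploaded files. The following is a sketch only, assuming a POSIX shell; the function name served_as_html is made up for illustration. It reads HTTP response headers on its standard input and succeeds only if they declare the file to be text/html.

```shell
# Succeed if the HTTP response headers read from standard input
# declare a Content-Type of text/html.
# served_as_html is a hypothetical helper name, not part of GED2HTML.
served_as_html() {
    # Strip carriage returns, pick out the Content-Type header line(s),
    # and look for text/html, ignoring letter case throughout.
    tr -d '\r' | grep -i '^content-type:' | grep -qi 'text/html'
}
```

If the "curl" program is available, its -I option fetches only the headers, so a check might look like "curl -sI http://www.example.com/HTML/INDEX.HTM | served_as_html && echo OK", where the URL is a placeholder for the address of one of your own files.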

Although many Web service providers will make the configuration change described above, some providers will refuse to make the change and will tell you that you should change all filenames from .HTM to .HTML, perhaps using a special script that they provide. In this case, it will be necessary to reprocess and re-upload your data, so that all the internal links in each of the datafiles say .html instead of .htm. Under the Windows 3.1 version of GED2HTML, this can be done by putting the following in the "Additional Options:" field in the top-level dialog box:

    -D FILENAME_TEMPLATE="%s.html"

You may also need to adjust the "Case-fold-links" option, depending on whether the files are coming out on your Web provider with upper-case names or lower-case names. After processing your GEDCOM with the FILENAME_TEMPLATE option, archive the output files, upload them to your Web provider, and unpack them, as described above. At this point, you will have a bunch of files ending in .HTM, but the links within the files will be ".HTML", so it will be necessary to systematically rename all of your files. If your provider provides a script to change .HTM to .HTML, use it. If not, then on a Unix-based provider you can use the following shell script. Enter it into a file "renameall.csh" on the provider's system exactly as shown.

    foreach f ( HTML/*.HTM )
      mv $f $f:s/.HTM/.HTML/
    end
    foreach d ( HTML/D0* HTML/INDEX HTML/SURNAMES HTML/SOURCES HTML/NOTES )
      foreach f ( $d/*.HTM )
        mv $f $f:s/.HTM/.HTML/
      end
    end

Put it into the directory on the provider's system that contains the "html" subdirectory created when you unpacked your uploaded data. From that directory, type

    csh renameall.csh

If the above command succeeds, you should be ready to go! If the command fails for some reason, you'll have to enlist the help of someone who is familiar with Unix systems, as it is not possible for me to anticipate and explain here all the possible failure modes and ways to recover.
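On a provider where "csh" is not available, roughly the same renaming can be done from a Bourne-style shell. The following is a sketch only, assuming a POSIX "sh" and "find"; the function name rename_htm_to_html is made up for illustration. Unlike the script above, it walks the whole subtree instead of naming the subdirectories one by one, and it assumes the files arrived with upper-case .HTM extensions.

```shell
# Rename every file ending in .HTM below a directory so that it ends
# in .HTML instead.
# Usage: rename_htm_to_html DIR
# rename_htm_to_html is a hypothetical helper name, not part of GED2HTML.
rename_htm_to_html() {
    # find hands the matching file names to a small inline script, which
    # strips the trailing .HTM from each name and appends .HTML.
    find "$1" -type f -name '*.HTM' -exec sh -c '
        for f do
            mv "$f" "${f%.HTM}.HTML"
        done' sh {} +
}
```

For example, "rename_htm_to_html HTML" would be run from the directory that contains the unpacked "HTML" subtree. It is worth trying it on a copy of a small set of files first.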

I have also been told about a utility called "MultiRen", which is a Multiple File Rename utility. A GED2HTML user writes:

I discovered a neat Multiple File Rename utility, MultiRen, a great idea for those of us that use your GED2HTML for generating multiple databases, so we can give the g0000... and ind00... files different names (g12xx and ind12xx for instance) for each database. [I've been doing this by hand, and editing the html with a global find & replace utility] You can find it on p. 269 of the June 9th, '98 issue of PC Magazine and http://www.pcmag.com (at the home page click on downloads) or ftp.zdnet.com. The full source code is also available from http://www.pcmag.com.

Copyright © 1995-2004 Eugene W. Stark. All rights reserved.