
BETAUTF8

NAME

betautf8 - a fast, flexible beta code to unicode (utf8) file converter

SYNOPSIS

betautf8 input_file output_file [ -n ] [ -g or -h or -l ] [ -eol=nn,nn ]

DESCRIPTION

Betautf8 is a fast, flexible translation program which converts beta code text files, as used by the PHI and the TLG, into Unicode standard text files. Betautf8 translates all beta-coded Greek into unicode (utf8), and the more common beta code symbols (brackets, diacritical marks, special symbols etc.) are converted correctly. All unknown symbols and combinations are output unconverted, except for the formatting codes (@nnn, &nnn, $nnn), whose numeric arguments are simply deleted.
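To illustrate the kind of mapping described above, here is a minimal Python sketch. It is not betautf8's own SNOBOL4 code. It assumes the common TLG conventions (`*` before a letter marks a capital, diacritics follow the letter or stand between `*` and the letter) and covers only Greek letters, breathings, accents, diaeresis and iota subscript; the names `LETTERS`, `DIACRITICS` and `beta_to_unicode` are hypothetical. As a simplification it removes formatting codes such as `$1` as a whole, ignoring any language switching they imply, and copies everything else (brackets included) unconverted.

```python
import re
import unicodedata

# Partial beta code letter table (assumed TLG conventions); '*' marks a capital.
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ", "H": "η",
    "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ", "N": "ν", "C": "ξ",
    "O": "ο", "P": "π", "R": "ρ", "S": "σ", "T": "τ", "U": "υ", "F": "φ",
    "X": "χ", "Y": "ψ", "W": "ω", "V": "ϝ",
}
# Beta code diacritics -> Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}
# Formatting codes such as @1, &3, $10 (simplification: removed entirely).
FORMAT_CODE = re.compile(r"[@&$]\d*")


def is_letter(text: str, i: int) -> bool:
    return i < len(text) and text[i].upper() in LETTERS


def beta_to_unicode(text: str) -> str:
    text = FORMAT_CODE.sub("", text)
    out = []
    after_letter = False  # True while combining marks may still attach
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "*":
            # Capital: '*', then any diacritics, then the letter (e.g. *)A).
            j = i + 1
            marks = ""
            while j < len(text) and text[j] in DIACRITICS:
                marks += DIACRITICS[text[j]]
                j += 1
            if is_letter(text, j):
                out.append(LETTERS[text[j].upper()].upper() + marks)
                after_letter = True
                i = j + 1
                continue
            out.append(ch)  # unknown combination: copied unconverted
            after_letter = False
        elif is_letter(text, i):
            letter = LETTERS[ch.upper()]
            if letter == "σ" and not is_letter(text, i + 1):
                letter = "ς"  # final sigma before a non-letter
            out.append(letter)
            after_letter = True
        elif ch in DIACRITICS and after_letter:
            out.append(DIACRITICS[ch])  # attach to the preceding letter
        else:
            out.append(ch)  # unknown symbols are copied unconverted
            after_letter = False
        i += 1
    # Compose letters and combining marks into precomposed characters.
    return unicodedata.normalize("NFC", "".join(out))
```

For instance, `beta_to_unicode("*)/ANQRWPOS")` yields the word ἄνθρωπος with a capital alpha carrying smooth breathing and acute. The final NFC step lets the table stay small: combining marks are emitted one by one and Unicode normalization merges them into precomposed letters.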

Without any parameters, the program works best with output created by Burkhard Meißner's View and Find (V&F) system for beta-coded text files (now in the PUBLIC DOMAIN): it assumes that every line in the file starts with a 32-byte reference section, which is output unconverted. If your input to BETAUTF8 was not produced by V&F, use the /N or -n parameter.

The program outputs all lines which start with a ~ (in column 0, i.e. at the beginning of the line) exactly as they appear in the original file. If this is unsatisfactory, please let me know.
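The line handling described above (a 32-byte reference section copied unchanged, the `-n` option, and pass-through of lines beginning with `~`) can be sketched as follows. This is a hypothetical Python illustration, not the program's code: `convert_line` and its `convert` parameter are invented names, the sketch assumes `~` lines are passed through whether or not `-n` is set, and since beta code files are plain ASCII it treats 32 bytes as 32 characters.

```python
from typing import Callable

REF_WIDTH = 32  # width of the V&F reference section (32 bytes = 32 ASCII chars)


def convert_line(line: str, convert: Callable[[str], str],
                 no_reference: bool = False) -> str:
    """Convert one input line, leaving any reference section untouched."""
    if line.startswith("~"):
        return line  # lines beginning with ~ in column 0 are copied as they are
    if no_reference:
        return convert(line)  # -n: the whole line is beta code
    reference, text = line[:REF_WIDTH], line[REF_WIDTH:]
    return reference + convert(text)  # reference is output unconverted
```

With `str.lower` standing in for a real converter, `convert_line("X" * 32 + "ABC", str.lower)` keeps the 32 reference characters and lowercases only `ABC`.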

COMMAND LINE OPTIONS

-n   Specify the "no reference" option. Normally the program assumes that each text line begins with 32 bytes of reference information. With this option set, no such assumption is made; instead, every line is taken to begin directly with beta code symbols.

-h -l -g   Specify the startup language: H(ebrew), L(atin) or G(reek).

-eol=nn,nn   Define the end-of-line character(s) as decimal character codes.

For DOS/Windows (ASCII) files use -eol=13,10 (CR LF); for Unix-style files, -eol=10 (LF); for classic Mac files, -eol=13 (CR).
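Because the `-eol` values are decimal character codes, an option such as `-eol=13,10` stands for the two bytes CR LF. A hypothetical Python sketch of turning such an option into a terminator and splitting input with it (the names `parse_eol` and `split_records` are illustrative and not part of betautf8):

```python
def parse_eol(option: str) -> bytes:
    """Turn an option such as '-eol=13,10' into the terminator bytes."""
    prefix = "-eol="
    if not option.startswith(prefix):
        raise ValueError(f"not an -eol option: {option!r}")
    # each comma-separated number is a decimal byte value (0-255)
    return bytes(int(code) for code in option[len(prefix):].split(","))


def split_records(data: bytes, eol: bytes) -> list[bytes]:
    """Split raw file contents into lines ending with the given terminator."""
    records = data.split(eol)
    if records and records[-1] == b"":
        records.pop()  # a final terminator leaves an empty trailing item
    return records
```

Here `parse_eol("-eol=13,10")` gives `b"\r\n"`, `parse_eol("-eol=10")` gives `b"\n"`, and `parse_eol("-eol=13")` gives `b"\r"`.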

EXAMPLES

The following example, P.Lips. I 90 = Stud.Pal. III 118 (Hermupolis, AD 614/615), shows how betautf8 converts beta code texts. If your browser or PDF viewer can display utf8 (unicode), you can compare the beta code input for betautf8 with the program's unicode output. Choose between the HTML version and the PDF file.

A second example, an anonymous Philipp history, may also be consulted. If your browser or PDF viewer can display utf8 (unicode), you can compare the beta code input with the unicode output. Choose between the HTML version and the PDF file.

The third example, a medley of astrological, mathematical, papyrological and epigraphical texts, shows how powerful betautf8's beta-code-to-unicode conversion algorithm is, and how much of a beta code text file it renders correctly in unicode (utf8). Just compare the beta code input file with the betautf8 output, the unicode (utf8) file. You may find it easier to view the output in one of the more common file formats: HTML, PDF, DOC, RTF or PostScript.

FILES AND DIRECTORIES

These are subject to local installation conventions; all you need is betautf8.exe (the 32-bit DOS version or the 32-bit Win32 version) in a suitable place and DOS or Windows (or an appropriate emulator) to run it. To run the program natively under Linux, install Phil Budne's CSnobol4 implementation of the SNOBOL4 language or Dave Shields's Linux SPITBOL compiler. When run under Mark Emmer's SPITBOL386 compiler, the program writes the standalone module betautf8.exe and then exits automatically; this module can afterwards be used independently.

INTERNET RESOURCES

Main websites:

Das Papyrus-Projekt Halle-Jena-Leipzig

Ancient History at the University of the Federal Armed Forces

Snobol4/Spitbol:

Phil Budne's SNOBOL4 resources

Phil Budne's free CSnobol4 implementation

Dave Shields's Linux SPITBOL compiler

Free Win-32 SPITBOL compiler

Mark Emmer's SPITBOL resources (including product and price lists for various versions of SNOBOL4 and SPITBOL)

Mark Emmer's free 16-bit SNOBOL4+ interpreter

Downloads:

Source Code (for SNOBOL4 and SPITBOL386)

Binary (DOS or emulation)

Binary (Win-32)

Manual page (for Unix groff compatible formatters)

General Public License

Download archives with ready-to-run files:

Linux: contains the 64-bit and 32-bit CSnobol4 + betautf8.sno source file and the new Linux SPITBOL version betautf8.spx (10 times as fast!): linux.zip

MS-DOS, emulated DOS under Linux: 32-bit DOS extended betautf8.exe program file: dos32exe.zip

MS-DOS, emulated DOS under Linux: 32-bit CSnobol4+betautf8.sno source file: dos_csno.zip

Win32, emulated Windows under Linux: 32-bit betautf8.exe program: win32exe.zip

Win32, emulated Win32 under Linux: 32-bit CSnobol4+betautf8.sno source file: win32.zip

MS-DOS, emulated DOS under Linux: 16-bit snorun.exe+betautf8.sav save module: dos_sav.zip

Mac OS X: 32-bit CSnobol4+betautf8.sno: mac_osx.bz2

OpenSolaris: 32-bit CSnobol4+betautf8.sno: solaris.zip

FreeBSD: 32-bit CSnobol4+betautf8.sno: freebsd.zip

Additional utilities:

compose/decompose: filters to normalize and convert Unicode (UTF-8) files. These filters read a Unicode text file and convert it either to a form in which every accent is, wherever possible, combined with its letter into a single character (precombined accents), or to a form in which all accents are detached from their letters as separate characters (accents uncombined). Also included are programs to convert between the UTF-8 and UTF-16 encodings.
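In Unicode terms, the two forms produced by compose/decompose correspond to the normalization forms NFC (precomposed) and NFD (decomposed). The following Python sketch shows the same kinds of conversion with the standard `unicodedata` module and the built-in codecs; it is not the code of those filters, and the function names are hypothetical.

```python
import unicodedata


def precompose(text: str) -> str:
    """Combine accents with their letters where a precomposed character exists (NFC)."""
    return unicodedata.normalize("NFC", text)


def decompose(text: str) -> str:
    """Detach all accents into separate combining characters (NFD)."""
    return unicodedata.normalize("NFD", text)


def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (Python's codec writes a byte order mark)."""
    return data.decode("utf-8").encode("utf-16")


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (a byte order mark is expected) as UTF-8."""
    return data.decode("utf-16").encode("utf-8")
```

Note that NFC maps alpha with acute to the "tonos" character U+03AC rather than the "oxia" character U+1F71, because Unicode defines the latter as canonically equivalent to the former.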

CONVCONC: a reformatting program for converting concordance files (index.v&f) to UTF-8 with betautf8.

tabtosp: replaces tabs with spaces in input files

To selectively remove references, refx.exe from the V&F distribution is a much more precise and flexible solution.

DUKE: prepares output from the internet version of the Duke Databank of Documentary Papyri for conversion by betautf8

Source code duke.sno

MS-DOS .exe duke.exe

16-bit .sav duke.sav

32-bit Windows .exe duke.exe

SPEED

To give an idea of how fast the different implementations are, the entire text of Polybius (45381 lines, 2 621 444 bytes) was translated into unicode (45381 lines, 5 779 219 bytes) on a laptop computer [Machine: Fujitsu Siemens ESPRIMO Mobile V5535 V1.06 with Intel(R) Celeron 540 @ 1.86GHz and 4 GBytes of RAM under SuSE Linux 11.0 with kernel 2.6.25.20-2 (x86_64)].

System                                    Seconds
Linux, SPITBOL, betautf8.spx                 3.80
Linux, Wine, betautf8.exe                    3.84
Linux, dosemu, MS-DOS 6.22, betautf8.exe     6.04
Linux, dosemu, FreeDOS, betautf8.exe         7.80
Linux, dosemu, CSnobol4 + betautf8.sno      24.12
Linux, Wine, CSnobol4 + betautf8.sno        23.77
Linux, CSnobol4 + betautf8.sno              26.80
Linux, dosemu, snorun.exe + betautf8.sav  2559.20
Linux, dosemu, snorun.exe+betautf8.sav 2559.20

In effect, there are therefore five speed groups: Linux SPITBOL or betautf8.exe (32-bit Windows), betautf8.exe (MS-DOS), betautf8.exe (FreeDOS), the CSnobol4 implementations, and 16-bit DOS SNOBOL4+. Their relative speeds are roughly:

100% : 65% : 50% : 15% : 0.15%
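These percentages follow from the table by dividing the fastest time by each group's time. A small Python sketch, assuming each group is represented by its fastest run (the CSnobol4 group spans 23.77 to 26.80 seconds); it also derives lines per second for the Polybius run, which naturally differ from the TLG "E" figures quoted in the next paragraph. The group labels are illustrative.

```python
# Fastest run (seconds) in each of the five speed groups, from the table above.
GROUP_SECONDS = {
    "Linux SPITBOL / betautf8.exe (Win32)": 3.80,
    "betautf8.exe (MS-DOS 6.22)": 6.04,
    "betautf8.exe (FreeDOS)": 7.80,
    "CSnobol4": 23.77,
    "16-bit SNOBOL4+": 2559.20,
}
POLYBIUS_LINES = 45381  # lines in the Polybius test file


def relative_speeds(times: dict[str, float]) -> dict[str, float]:
    """Speed of each group as a percentage of the fastest group."""
    fastest = min(times.values())
    return {name: 100 * (fastest / t) for name, t in times.items()}


def lines_per_second(seconds: float, lines: int = POLYBIUS_LINES) -> float:
    """Throughput of one run over the Polybius text."""
    return lines / seconds


for name, pct in relative_speeds(GROUP_SECONDS).items():
    rate = lines_per_second(GROUP_SECONDS[name])
    print(f"{name:40} {pct:7.2f}%  {rate:8.0f} lines/s")
```

This gives about 63%, 49%, 16% and 0.15% for the slower groups, consistent with the rounded ratios above, and roughly 11942 lines per second for the fastest Polybius run.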

Therefore, from the point of view of speed, betautf8.exe (either the DOS or the 32-bit Windows version) and Linux SPITBOL + betautf8.spx are the way to go. Experiments with different sets of texts from the TLG "E" (Fujitsu Siemens ESPRIMO Mobile V5535 laptop, SuSE Linux 11.0) have shown these systems to translate between 5827 (dosemu + betautf8.exe) and 11615 (Wine + betautf8.exe) beta code text lines per second.

LICENSING

Betautf8 is distributed under the GNU General Public License without any warranty; its source is provided together with the running program itself. However, the author kindly asks anyone who makes additions, corrections or other suitable contributions to the program and/or its source to send him a copy of these contributions, so that the community can be provided with ever-improving versions of the software. To contact the author, use either his e-mail address: bmeissne@hsu-hamburg.de

or his university post box:


Prof. Dr. Burkhard Meissner
Professur für Alte Geschichte
Helmut-Schmidt-Universität
University of the Federal Armed Forces

Holstenhofweg 85
D-22043 Hamburg (Germany)

AUTHOR

Betautf8 has been written by Burkhard Meissner, University of the Federal Armed Forces, Hamburg, Germany.



Author: Burkhard Meißner, last modified: 07.05.2015


URI: https://www2.hsu-hh.de/hisalt/projects/betautf8.htm Print date: 07.05.2015