******************************************************************************
*                                                                            *
*                                                                            *
*               Eichstaett-to-Beta-Code-Text-Conversion-Program              *
*                                                                            *
*                        (c) B.Meissner, 1992-2010                           *
*                                                                            *
******************************************************************************

Preface: New Version
====================

mc has been ported to C-SNOBOL (Phil Budne), because of this system's
wide acceptance. mc has been augmented to include all the extended codes
which have been introduced into the Eichstaett text files, long line 
conventions and other oddities.

All I/O routines in mc have been changed from SPITBOL conventions to SNOBOL4.
It would be easy to revert this at some stage (or to introduce system-aware
alternatives, as we did in betautf8.sno, which is completely transparent to all
variants of the SNOBOL4 language), but given how rarely mc is probably run, we
did not want to invest time in this.

Thus you should use snobol4 -b -d 4096k -S 500k -P 500k ./mc.sno 

instead of mc.exe. 
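
For scripted runs, the call above can be wrapped in a few lines of Python. This
is only a sketch: the flags are copied verbatim from the command line quoted
above, while the function names build_mc_command and run_mc are hypothetical
and not part of the mc distribution.

```python
import shutil
import subprocess


def build_mc_command(script="./mc.sno", interpreter="snobol4"):
    """Return the argument list for the invocation quoted in this file."""
    return [interpreter, "-b", "-d", "4096k", "-S", "500k", "-P", "500k", script]


def run_mc(script="./mc.sno"):
    """Run mc interactively; return its exit status, or None if snobol4 is missing."""
    cmd = build_mc_command(script)
    if shutil.which(cmd[0]) is None:
        return None  # interpreter not found on PATH
    return subprocess.call(cmd)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_mc_command()))
```

mc still asks its questions on the terminal; the wrapper only saves typing the
interpreter options.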

Snobol4 can be retrieved in source format from:

http://www.snobol4.org/csnobol4/curr 

We provide a couple of SNOBOL4 versions in the snobol4.dir directory.

There are calling examples in snobol4.bat, mc.bat and mc.


MC and Errors in the Text Files
===============================

mc cannot, however, correct errors which are present in the Eichstaett text
files themselves. In the most recent versions (2008 and 2009/2010), there are
three such errors which should be corrected at some stage:


1) ILS_III2 contains erroneous lines and line lengths in text no. 9259a. These
errors make the text completely incomprehensible, even to the ConcEyst program
itself. The ConcEyst program's rendering of the text is documented in the
screenshot "fehler.png" and shows that something must have gone very wrong with
the codes. 

In beta code, this text looks like:

98.9259a.1.1.1.1.1              $:E)\A(|U)\& undoc. `0 $*SHA)=*|T& undoc. `0
$*R&r[1centuria]1$I)HU)\&#59[1centuria]1$*T& undoc. `0 $*E&r$E)\A(|
H&[1milliaria]1$*T:&[1milliaria]1$*A*E&g$*SI*)T& undoc. `0
$*S*T&[1milliaria]1O$*T:&[1milliaria]1$*AHA*(|
T&[1milliaria]1x$*E&[1milliaria]1$U)\E)\A*(|R&dSVO AT EXORNATIONEM$A)=|
&BALINEI$H\|&DONO DEDIT 

Note that mc inserts an " undoc. " tag, followed by the numerical value of the
code in error (in this case: zero=0) to make identification of such errors easier.
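
A converted file can be searched for these tags with a short script. The
following Python sketch assumes that the marker always has the shape seen in
the sample above (" undoc. ", a backquote, then a decimal number); the names
UNDOC and find_undoc are hypothetical.

```python
import re

# Assumed marker shape, taken from the sample above: " undoc. " followed by a
# backquote and the decimal value of the offending code.
UNDOC = re.compile(r" undoc\. `(\d+)")


def find_undoc(lines):
    """Yield (line_number, code_value) for every marker found."""
    for number, line in enumerate(lines, start=1):
        for match in UNDOC.finditer(line):
            yield number, int(match.group(1))


# Two lines copied from the sample above.
sample = [
    "98.9259a.1.1.1.1.1              $:E)\\A(|U)\\& undoc. `0 $*SHA)=*|T& undoc. `0",
    "&BALINEI$H\\|&DONO DEDIT",
]
hits = list(find_undoc(sample))
print(hits)  # [(1, 0), (1, 0)]
```

For a real file, pass an open file object instead of the list, for example
open("CV_001.TXT", encoding="latin-1"); the encoding is an assumption chosen
so that every byte value can be read.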

2) SEG_52 appears to contain an error after inscription number 1021, since
ConcEyst does not display any inscriptions after that. The reason seems to be
an unusual ordering of the texts in the .NDX file for this corpus: The texts
are ordered not by their numbers but by their respective dates. mc corrects
this inconsistency on the fly by sorting the texts according to their numbers.
To re-arrange the texts, mc uses a queue structure built as a multiply linked
list.

3) A couple of files are referred to differently internally and externally by
ConcEyst. These are the McCrum, Meiggs&Lewis and Rhodes_Osborne files, whose
file names are spelled with differing upper and lower case in the corresponding
table files. On case-sensitive file systems (the norm on Unix-like systems)
this may lead to errors. A small Unix shell script ("rename_files") provides a
remedy by linking the files to symbolic names which correspond exactly to the
names these files have in the internal data bases.
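
The idea behind such a linking step can be sketched in Python. This is not the
rename_files script itself, whose contents are not reproduced here; the
functions plan_links and make_links and the file names in the example are
hypothetical.

```python
import os


def plan_links(existing, wanted):
    """Pair each wanted name with an existing file that differs only in case."""
    by_folded = {name.lower(): name for name in existing}
    plan = []
    for name in wanted:
        found = by_folded.get(name.lower())
        if found is not None and found != name:
            plan.append((found, name))
    return plan


def make_links(directory, wanted):
    """Create the symbolic links in `directory`; names already present are kept."""
    for source, link in plan_links(os.listdir(directory), wanted):
        target = os.path.join(directory, link)
        if not os.path.lexists(target):
            # Relative link: it points at the sibling file in the same directory.
            os.symlink(source, target)


# Hypothetical file names, for illustration only.
plan = plan_links(["MCCRUM.TXT", "ils.txt"], ["McCrum.txt", "ils.txt"])
print(plan)  # [('MCCRUM.TXT', 'McCrum.txt')]
```

Keeping the planning step separate from the linking step makes it possible to
inspect the list of links before anything is written to disk.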

Thus, mc can be used not only to make the ConcEyst text files accessible to
beta-code-aware programs (like V&F), but also to check and debug the ConcEyst
text files themselves.




                                   Purpose
                                   =======

The Catholic University of Eichstaett, Germany (Prof.  Dr. J.Malitz)  provides
a  set  of  Greek  and  Latin  epigraphical  corpora  on disk.  They come with
concordance  routines  and  browser  facilities,  but  the files use a special
format.  To  adapt them to  the V&F program  for beta-coded texts  to get more
liberty in using the texts, a conversion routine is therefore necessary.   For
this  purpose, Burkhard  Meissner  (University  of  Halle)  has written the MC
conversion program that performs the task. MC converts the Eichstaett files into
beta code format to be read by Burkhard Meissner's V&F program. Other programs
which  use  beta-coded  text  files  (as  distributed by the Thesaurus Linguae
Graecae and the Packard Humanities Institute)  might also be used to view  and
search  the  resulting  text  files,  but  the  program  has been specifically
designed for the V&F environment.

Since there is no printed documentation for MC.EXE, you should print out  this
file on your standard line printer to get a reading copy:

COPY MC.DOC LPT1


                               Legal Conditions
                               ================

MC.EXE forms an integral part of V&F.  All legal conditions that apply to  the
other software items of the V&F package apply to MC.EXE, too:


1) Conditions of Use

MC is a program that its  author, Burkhard Meissner, has written privately for
his own purposes.  However, the author of the MC software grants the right  to
use that software to any legitimate user of MC, under the following terms  and
conditions:  A legitimate user is defined as any natural or legal person  that
is already a legitimate user of the V&F software.  This entails that the  user
may  have   paid  the  license  fee  of  1000  DM  (institutions)  or  800  DM
(individuals) to Burkhard Meissner for the use of the V&F software or obtained
the software by downloading it from the Ancient History home page at the
Helmut-Schmidt-University web site.


The legitimate user  does not  obtain the  software as  his property; instead,
all rights of copying and distributing  the  software  remain with the author.
However, the legitimate  user acquires the  right to use  the software on  any
single  machine  at  a  time  and  to  make  as many backup copies for his own
purposes as he thinks appropriate.   The legitimate user  being  a non-private
institutional (i.e.:  legal)  person, it may use  the software on any  machine
that is physically  contained within the  institution's building.   Therefore,
the right to use the MC  software is transferred under the condition  that the
aforementioned fee is paid to the author, and the right to use the MC software
is transferred  only to  the person  that has  paid the  fee.   Using MC in  a
networking environment or on different machines within one physically coherent
institution does conform to these conditions of legitimate use of MC;  copying
and distributing MC to more than  one separate machine to use the  software on
more than one  machine by different  persons at the  same time does  not.  The
author of the MC software, Burkhard Meissner holds and will hold the copyright
of the  software, and  he is  its material,  intellectual and spiritual owner.
This entails  that users  have to  give due  credit to  the author  and his MC
software when they publish results produced from the use of MC. Therefore, the
right to  use MC  is transferred  under the  explicit condition  that the user
mentions the MC software and its  author in any publication that results  from
serious use  of the  MC software.   Any  use of  the MC  software that  is not
rendered legitimate according to  these conditions is considered  illegitimate
and illegal.


2) Warranty Conditions

a) Limited Warranty

The MC  software is  provided "as  is" without  warranty of  any kind,  either
expressed or implied, including, but not limited to the implied warranties  of
merchantability and fitness for a particular  purpose.  The entire risk as  to
the quality  and performance  of the  program is  with the  user.   Should the
program prove  defective, the  user assumes  the entire  cost of all necessary
servicing,  repair  or  correction.   The  author, Burkhard Meissner, does not
warrant  that  the  functions  contained  in  the program will meet the user's
requirements or  that the  operation of  the program  will be uninterrupted or
error-free.   However, he  warrants the  diskette(s) on  which the  program is
furnished,  if it  is so furnished,  to be free from defects  in materials and
workmanship under normal use for a period of ninety (90) days from the date of
delivery to the user.

b) Limitations of Remedies

The author's entire liability and the user's exclusive remedy shall be:

1) The  replacement of  any diskette  not meeting  our "Limited  Warranty" and
which is returned to the author together with a copy of the user's receipt,

or

2)  if  the  author,  Burkhard  Meissner, is  unable  to deliver a replacement
diskette which is free  of defects in materials  or workmanship, the user  may
terminate the  Agreement between  author and  user by  returning the software,
upon which his money shall be refunded.

In no event will the author, Burkhard Meissner, be liable to the user for  any
damages,  including  any  lost  profits,  lost  savings or other incidental or
consequential damages arising out of the  use or inability to use the  program
even if the author has been advised of the possibility of such damages, or for
any claim by any other party.


                                   Using MC
                                   ========

To use this program, you should  know something about the organization of  the
Eichstaett text files:  Each text corpus is  organized around a corpus  (.CPS)
file which contains pointers to a  set of index (.NDX) and text  (.TXT) files.
In order to collect all  texts of such a corpus  into one beta code file,  one
has to resolve the  file pointers in the  corpus file and the  pointers to the
single texts in the .NDX files.

Our program does all this for single corpora.  This means:  You must re-run it
for each .CPS file to obtain a full conversion of all existing text files.  It
assumes that the .TXT, .NDX and .CPS which belong together are located in  the
present directory and writes its output  into the same directory.  It  creates
AUTHTAB.DIR  automatically,  the  beta  code  author  list.  Suppose, you have
DFG.CPS with its adjacent files.  To create a beta code file set CV_001, start
our conversion program MC:

MC

Then enter the name of the input file (e.g.:  DFG) and the output file  (e.g.:
CV_001).  For Latin texts, enter Y at the third question.  The Y overrides the
default Greek font of the Eichstaett files.  This is necessary with texts which
are mainly in Latin.  As a new feature, you can then enter a string of up to
six reference level names which will be used instead of the default strings
(Inscription.....line).  E.g.:  Inscription....section  line
Confirm with Y, or else re-enter all the information.

The program will create CV_001.txt, CV_001.idt and AUTHTAB.dir.  Be sure there
are no files with the same names which you do not want to be overwritten.   An
ongoing MC conversion process can  always be interrupted using the  Ctrl-Break
key.  In  this case you  should allow some  time to the  MC.EXE virtual memory
manager system to restore  the memory state to  DOS standards, close the  swap
file and close the files written so far.

When the process finishes,  you can start V&F,  make the directory, where  the
new files are, your CDdrive directory  (Options menu, number 6, do not  forget
the back slash), Re-Read the author list and work on this file.

To append the new files to existing  sets of beta code texts, you may  use our
AUTHCONV utility with the /A switch and the DOS APPEND command (see below).

There are no restrictions  as to the use  of the resulting files;  they can be
read, searched and indexed like any other TLG or PHI beta code text.

The program  sorts the  texts by  names.   It uses  a special  algorithm which
compares the text names  with respect to the  positional value of their  first
numeric component, without comparing the text names only alphabetically.  Thus
it is ensured  that 400a sorts  after 400 but  before 4000.   A text name like
2347-9 sorts  after 2347  but before  2349.   The complicated  string and list
conversion operations during the sorting process make it quite slow with large
text collections.  In addition,  the memory requirements are gigantic.   There
is no  way of  running it  on a  1 MByte  machine; a  machine with only 2 or 3
MBytes  may  come  into  serious  difficulties  with  excessive swapping.  For
practical applications, 4 MBytes seem the minimum.
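
The ordering described above can be illustrated with a small Python sort key.
This is a sketch of the comparison rule only, not of MC's own list-based
implementation; the function name text_sort_key is hypothetical, and the
treatment of names without any digits is an assumption.

```python
import re


def text_sort_key(name):
    """Split a text name at its first run of digits and compare that run as a number."""
    match = re.search(r"\d+", name)
    if match is None:
        return (name, -1, "")  # assumption: names without digits sort by name alone
    prefix = name[:match.start()]
    number = int(match.group())
    suffix = name[match.end():]
    return (prefix, number, suffix)


names = ["4000", "2349", "400a", "2347-9", "400", "2347"]
ordered = sorted(names, key=text_sort_key)
print(ordered)  # ['400', '400a', '2347', '2347-9', '2349', '4000']
```

A purely alphabetical comparison would place 4000 before 400a and 2347-9 after
2349 only by accident of the character codes; comparing the first numeric
component by value removes that dependency.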


                           MC - An Example Session
                           =======================

Suppose you have a set of Eichstaett files.  These come as text files  (.TXT),
index files (.NDX) and  corpus files (.CPS).   The dictionary files (.DCT  and
alike) are of no interest to the conversion process and are not needed for  MC
(but  you  must  not  delete  them:   They  are  necessary  for the Eichstaett
concordance routines themselves).

You should copy all  these files in one  sub directory (for security  reasons,
you should perform  the conversions, using  copies of the  files, and not  the
original ones).

Now you should determine how many and which .CPS files there are:

DIR *.CPS

For security reasons, create  a sub directory for  each single .CPS file  with
its distinct name (but empty as yet).

Suppose you have .CPS files
ann_epig.cps        73  14.12.91  11:09
dfg     .cps       109  10.09.92  16:54
germania.cps       118  22.01.88  17:04
ils     .cps        82  23.04.91  14:56

Now create sub directories:
MD ann_epig
MD dfg
MD germania
MD ils

Now start the conversion process. Call
MC
from the DOS prompt and enter "ann_epig" when asked for a .CPS file name.
Enter a distinct name for the output files when asked (e.g.:  "cv_001").

You should know whether most of the texts are in Greek or in Latin.  If they
are in Greek, simply press <ENTER> when asked whether the texts should be
forced to have the Latin default font.  If most of them are in Latin, enter "Y"
or "y" as the answer to this third question.  This makes the program enter the
"l" language code in the resulting files instead of "g".  This, in turn, allows
for far fewer language codes within the texts themselves, which makes them more
compact and all V&F processes upon them faster.

To start the  process, enter "Y"  or "y" upon  the fourth question,  any other
letter if you want to make corrections.  In this case, you must start MC anew.

In the above example, MC converts the texts and writes three files:
CV_001.TXT
CV_001.IDT
AUTHTAB.DIR

For security reasons, you should copy (or move, if you have a directory  entry
mover like UNIX' mv) these files to your ANN_EPIG sub directory:

COPY CV_001.TXT ANN_EPIG
COPY CV_001.IDT ANN_EPIG
COPY AUTHTAB.DIR ANN_EPIG

The same process should be performed for all four .CPS files separately.  Of
course, you should select distinct file names for the output files (e.g.:
CV_002 CV_003 CV_004 and so forth).
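
With many corpora it helps to write down the assignment of output names before
starting. The following Python sketch derives such a plan from a list of .CPS
file names; the function plan_conversions is hypothetical, and the alphabetical
order of assignment is an assumption (any fixed order will do).

```python
from pathlib import Path


def plan_conversions(cps_names, start=1):
    """Map each corpus name to an output stem CV_001, CV_002, ... in sorted order."""
    plan = []
    for offset, name in enumerate(sorted(cps_names)):
        stem = Path(name).stem  # "dfg.cps" -> "dfg"
        plan.append((stem, f"CV_{start + offset:03d}"))
    return plan


plan = plan_conversions(["ils.cps", "dfg.cps", "germania.cps", "ann_epig.cps"])
for corpus, output in plan:
    print(corpus, "->", output)
```

Each pair gives the name to enter at MC's first question and the name to enter
at its second; the corpus name also serves as the name of the sub directory
that receives the results.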

After the last conversion has been performed (CV_004 being the last output in
our above example), you should have all resulting files in the sub directories
(four in our case).  Now copy the files of the first sub directory into your
present directory:
COPY ANN_EPIG\*.*

Now you have  AUTHTAB.DIR (with one  entry) in your  present directory.   Call
AUTHCONV  with  this  file  specified  AS  THE  OUTPUT  FILE  and  the  single
AUTHTAB.DIR files within the other sub directories as INPUT FILES:
AUTHCONV   c:\DFG\AUTHTAB.DIR        AUTHTAB.DIR  /A
AUTHCONV   c:\GERMANIA\AUTHTAB.DIR   AUTHTAB.DIR  /A
AUTHCONV   c:\ILS\AUTHTAB.DIR        AUTHTAB.DIR  /A

Finally, you  should move  or copy  all .TXT  and .IDT  files as  well as  the
AUTHTAB.DIR file to one separate sub  directory where they can be accessed  by
V&F:

MD C:\MALITZ
XCOPY ANN_EPIG\*.IDT     C:\MALITZ
XCOPY ANN_EPIG\*.TXT     C:\MALITZ
XCOPY DFG\*.IDT          C:\MALITZ
XCOPY DFG\*.TXT          C:\MALITZ
XCOPY GERMANIA\*.IDT     C:\MALITZ
XCOPY GERMANIA\*.TXT     C:\MALITZ
XCOPY ILS\*.IDT          C:\MALITZ
XCOPY ILS\*.TXT          C:\MALITZ
COPY AUTHTAB.DIR         C:\MALITZ

The old files in the four sub directories can be deleted:
DEL ANN_EPIG
DEL ILS
DEL DFG
DEL GERMANIA
DEL CV_*.*
DEL AUTHTAB.DIR

If you want  to have these  files available together  with one of  your CD-ROM
texts, you should use the DOS command APPEND.

Insert your CD into the CD drive.  Let us suppose the drive is G:.  Change to
the directory where your new inscription files are located:

CD C:\MALITZ

Now create a common AUTHTAB.DIR for both sets of texts.  Keep the old ones for
reference:

AUTHCONV G:\AUTHTAB.DIR AUTHTAB.DIR /A

Now use the DOS command APPEND to make both directories accessible:

APPEND G:\

You can append multiple directories, separated by ";".

In your V&F profile PROFILE.V&F you should make 'C:\MALITZ\' your CDdrive text
file directory.  In  this case you will  have both sets of  texts available at
one time.  If you insert the DOS APPEND command in your V&F.BAT file to  start
the program,  you can  automate the  process for  the future.   To  revoke the
APPEND assignments, give the command with only one ";":

APPEND ;

This should be given  in your V&F.BAT batch  file before returning you  to the
DOS prompt.

Using this strategy, we have connected  all TLG and PHI texts with  the Malitz
texts at Erlangen in one large text file inventory.


                 MC - Technical Details and Possible Problems
                 ============================================

MC uses  complicated lists  with a  multiple-pointer structure  to store  work
lists,  text  lists  and  single  texts  in  memory.  To sort the texts, these
structures must  temporarily be  converted to  linear arrays  to be sorted and
re-created afterwards.  This uses large amounts of memory and takes much time.
Once the text lists are sorted  and the conversion process is under  way (i.e.
when the CONVERTING ... lines  flicker on the screen), your  computer's memory
is  enough  for  the  process.    If,  however,  you get the SYSTEM ERROR #204
message, informing you  of the presence  of insufficient memory,  your present
configuration cannot be used for the job.

In this case, you should use  MODXCONF as described in the V&F  documentation.
Just  apply  it  to  MC  (MODXCONF  MC.EXE)  and follow the menu instructions.
Increase the region size and/or virtual memory size.  The swapping process  of
the virtual memory  manager uses large  amounts of time  and hard disk  space.
Therefore we  have decided  to set  the MC.EXE  region size  to only 3 MBytes.
This should  be enough  for most  processes.   As yet  we have  found that all
conversion processes could be fitted in  a memory region that was larger  than
1456 KBytes and smaller  than 1584 KBytes.   With MC.EXE, you have  twice that
amount of free space which should be enough by all standards.
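
The "twice that amount" can be checked with a few lines of Python, assuming
that 1 MByte is counted as 1024 KBytes. The variable names are illustrative
only.

```python
# Region size of MC.EXE as stated above, in KBytes (assuming 1 MByte = 1024 KBytes).
region_kb = 3 * 1024

# Smallest and largest memory regions observed so far, in KBytes.
observed_min_kb = 1456
observed_max_kb = 1584

# Ratio of the region size to the observed requirements.
ratio_vs_min = round(region_kb / observed_min_kb, 2)
ratio_vs_max = round(region_kb / observed_max_kb, 2)
print(region_kb, ratio_vs_min, ratio_vs_max)  # 3072 2.11 1.94
```

The region is therefore roughly twice the largest requirement observed, which
matches the statement above.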


                               Acknowledgements
                               ================

The development of  MC has been  made possible by  Prof.  Dr.  Juergen Malitz,
Eichstaett Catholic University, who disclosed his file formats, provided ample
documentation upon the internal structure of his texts and of the programs  to
handle them.  He also invested much time in beta-testing MC. Some errors  have
been  found  during  this  activity,  and  the  program  owes many features to
Professor Malitz' suggestions.  Professor Malitz' work on the  computerization
of Greek inscriptions  of Asia is  to be commended  for its yielding  positive
results.  Mark  Emmer of Catspaw,  Inc. has been  exceptionally cooperative in
responding to demands  for improving his  already excellent programming  tools
(SPITBOL-386, SPITBOL-8088).


                  MC Application Error Messages and Numbers
                  =========================================

 1  INPUT .CPS FILE CANNOT BE OPENED
 2  OUTPUT FILE CANNOT BE OPENED
 3  INPUT .CPS FILE DOES NOT HAVE CORRECT FORMAT
 4  INPUT .CPS FILE CORRUPT: NON-MATCHING NUMBER OF ENTRIES
 5  INPUT .NDX FILE CANNOT BE OPENED
 6  INPUT .NDX FILE CORRUPT: ERRONEOUS FILE FORMAT
 7  INPUT .NDX FILE CORRUPT: NON-MATCHING NUMBER OF ENTRIES
 8  INPUT .TXT FILE CANNOT BE OPENED
 9  INPUT .TXT FILE CORRUPT: TEXT OFFSET CANNOT BE REACHED
10  INPUT .TXT FILE CORRUPT: MISSING END MARKER (FFh)
11  INPUT .TXT FILE CORRUPT: ERRONEOUS LINE LENGTH
12  OUTPUT .TXT FILE CANNOT BE WRITTEN TO
13  OUTPUT .NDX FILE CANNOT BE WRITTEN TO
14  AUTHTAB.DIR CANNOT BE OPENED FOR WRITING
15  CONFLICTING I/O FILE NAMES
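
For scripts that post-process MC's output, the table above can be kept as a
lookup table. The following Python sketch copies the numbers and messages
verbatim; the names MC_ERRORS and describe are hypothetical, and how MC reports
the number to the caller is not specified here.

```python
# Error numbers and messages copied from the table above.
MC_ERRORS = {
    1: "INPUT .CPS FILE CANNOT BE OPENED",
    2: "OUTPUT FILE CANNOT BE OPENED",
    3: "INPUT .CPS FILE DOES NOT HAVE CORRECT FORMAT",
    4: "INPUT .CPS FILE CORRUPT: NON-MATCHING NUMBER OF ENTRIES",
    5: "INPUT .NDX FILE CANNOT BE OPENED",
    6: "INPUT .NDX FILE CORRUPT: ERRONEOUS FILE FORMAT",
    7: "INPUT .NDX FILE CORRUPT: NON-MATCHING NUMBER OF ENTRIES",
    8: "INPUT .TXT FILE CANNOT BE OPENED",
    9: "INPUT .TXT FILE CORRUPT: TEXT OFFSET CANNOT BE REACHED",
    10: "INPUT .TXT FILE CORRUPT: MISSING END MARKER (FFh)",
    11: "INPUT .TXT FILE CORRUPT: ERRONEOUS LINE LENGTH",
    12: "OUTPUT .TXT FILE CANNOT BE WRITTEN TO",
    13: "OUTPUT .NDX FILE CANNOT BE WRITTEN TO",
    14: "AUTHTAB.DIR CANNOT BE OPENED FOR WRITING",
    15: "CONFLICTING I/O FILE NAMES",
}


def describe(code):
    """Return the message for an MC application error number."""
    return MC_ERRORS.get(code, "UNKNOWN MC ERROR NUMBER")


print(describe(11))  # INPUT .TXT FILE CORRUPT: ERRONEOUS LINE LENGTH
```

SYSTEM ERROR #204 is not in this table: it is reported by the runtime system,
not by MC's own checks.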



                              Technical Details
                              =================

MC used to be built with the INTEL DOS extender (DPMI-compatible). This meant
that it could be run on 32-bit DOS machines only, and not under DESQView.
Since MC uses large amounts of memory, this seemed reasonable.  Nowadays,
machines are much faster and have much more memory:  An interpreter
implementation is feasible under these circumstances.