User:Jhellingman/DP20

From DPCanadaWiki


This is a proposed completely revised design for DP sites, authored by Jhellingman. DPC will follow any further development with great interest. Any member interested in participating in a discussion of such a design should PM Site Admin coachmike.

"DP 2.0 has been on my mind for a long time, the roots were already there before DP got off the ground, but I just didn't have the time to get into the serious programming effort to do it.

The main motivation for a DP 2.0 is to spend our efforts more efficiently, and automate whatever we can automate.

I estimate a team of several good programmers will need about half a year to build this type of infrastructure, using building blocks as can be found in open source projects, such as OpenOffice and ImageMagick, on top of a MySQL database and an Apache server.


DP 2.0 Architecture

Multi layer design

For a DP 2.0, I would consider a standard three-tier architecture.

  1. Database layer
  2. Middle layer with workflow support
  3. Presentation/Interaction layer

The Presentation layer may be distributed over the server and client.

We will need clean documentation of interfacing between layers.

  1. Database design
  2. Programming API (which may be based on web services, SOAP, XINS, etc.)
  3. Well designed GUI, taking into account usability for a range of users.

Database Layer

The database will contain the following main tables

  • User
    • userId
    • email
    • password
  • Author (simplifying here that authors can be known under multiple names)
    • authorId
    • name
    • dates
  • Project
    • projectId
    • title
    • clearance
  • AuthorProject (linking projects to authors)
    • authorId
    • projectId
  • Page
    • pageId
    • projectId
    • imageFile
  • Revision
    • pageId
    • text
    • state
  • Dictionary
    • language
    • projectId
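
As an illustration only, the tables above could be created roughly as follows. This is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for MySQL; the column types, the keys, the revisionId column and the word column of Dictionary are assumptions that do not appear in the list above.

```python
import sqlite3

# Column types, keys, Revision.revisionId and Dictionary.word are assumptions.
SCHEMA = """
CREATE TABLE User (userId INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE, password TEXT NOT NULL);
CREATE TABLE Author (authorId INTEGER PRIMARY KEY, name TEXT NOT NULL, dates TEXT);
CREATE TABLE Project (projectId INTEGER PRIMARY KEY, title TEXT NOT NULL, clearance TEXT);
CREATE TABLE AuthorProject (
    authorId INTEGER NOT NULL REFERENCES Author(authorId),
    projectId INTEGER NOT NULL REFERENCES Project(projectId),
    PRIMARY KEY (authorId, projectId)
);
CREATE TABLE Page (
    pageId INTEGER PRIMARY KEY,
    projectId INTEGER NOT NULL REFERENCES Project(projectId),
    imageFile TEXT
);
CREATE TABLE Revision (
    revisionId INTEGER PRIMARY KEY,
    pageId INTEGER NOT NULL REFERENCES Page(pageId),
    text TEXT,
    state TEXT
);
CREATE TABLE Dictionary (
    language TEXT NOT NULL,
    projectId INTEGER REFERENCES Project(projectId),
    word TEXT NOT NULL
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the sketched tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute("INSERT INTO Project (projectId, title, clearance) VALUES (1, 'Sample', 'OK')")
conn.execute("INSERT INTO Page (pageId, projectId, imageFile) VALUES (1, 1, '001.png')")
conn.execute("INSERT INTO Revision (pageId, text, state) VALUES (1, 'raw OCR text', 'Cleanup')")
conn.execute("INSERT INTO Revision (pageId, text, state) VALUES (1, 'cleaned text', 'Proofing')")

# The most recent revision of a page is the one with the highest revisionId.
latest = conn.execute(
    "SELECT text, state FROM Revision WHERE pageId = 1 ORDER BY revisionId DESC LIMIT 1"
).fetchone()
```

Keeping every revision as its own row is what makes the page history and delta functions of the middle layer possible.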

Middle layer with workflow support

A programming API something like this.

  • SessionToken = Login(User, Password)
  • Status = Logout(SessionToken)
  • ProjectList = GetProjects(SessionToken, Filter, Phase)
  • ProjectDetails = GetProjectDetails(SessionToken, Project)
  • Dictionary = GetProjectDictionary(SessionToken, Project)
  • Status = AddToDictionary(SessionToken, Project, Word, Language)
  • Status = CreateProject(SessionToken, ProjectDetails)
  • Status = AddPage(SessionToken, Project, PageDetails)
  • PageDetails = CheckOutPage(SessionToken, Page)
  • PageHistory = GetPageHistory(SessionToken, Page)
  • PageVersion = GetPageVersion(SessionToken, PageVersion)
  • PageDelta = GetPageDelta(SessionToken, PageVersion, PageVersion)
  • Status = ReleasePage(SessionToken, Page)
  • Status = SavePage(SessionToken, Page)
  • ChangeDetails = PreCommitPage(SessionToken, Page)
  • Status = CommitPage(SessionToken, Page, Promote)

More functions will be needed for various management tasks.

Real users as well as bots could use the API and improve work-items.
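
To make the check-out and commit cycle concrete, here is a toy in-memory sketch of a few of the calls listed above. The class name InMemoryDP, the Python method names and the extra text argument of add_page and save_page are assumptions made for illustration; a real middle layer would sit on the database and be reached over the network.

```python
import secrets


class InMemoryDP:
    """Toy in-memory stand-in for the middle layer (not a real server)."""

    def __init__(self, users):
        self._users = dict(users)  # user -> password (plain text only in this sketch)
        self._sessions = {}        # session token -> user
        self._history = {}         # page -> list of committed texts, oldest first
        self._checked_out = {}     # page -> user holding the checkout
        self._drafts = {}          # page -> saved but uncommitted text

    def _user(self, token):
        if token not in self._sessions:
            raise PermissionError("invalid session token")
        return self._sessions[token]

    def _require_checkout(self, token, page):
        if self._checked_out.get(page) != self._user(token):
            raise RuntimeError("page is not checked out by this user")

    def login(self, user, password):
        if user not in self._users or self._users[user] != password:
            raise PermissionError("unknown user or wrong password")
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def logout(self, token):
        return self._sessions.pop(token, None) is not None

    def add_page(self, token, page, text):
        self._user(token)
        self._history[page] = [text]
        return True

    def check_out_page(self, token, page):
        user = self._user(token)
        current = self._history[page][-1]
        if self._checked_out.get(page, user) != user:
            raise RuntimeError("page is checked out by another user")
        self._checked_out[page] = user
        return self._drafts.get(page, current)

    def save_page(self, token, page, text):
        self._require_checkout(token, page)
        self._drafts[page] = text
        return True

    def commit_page(self, token, page):
        self._require_checkout(token, page)
        committed = self._drafts.pop(page, self._history[page][-1])
        self._history[page].append(committed)
        del self._checked_out[page]
        return True

    def release_page(self, token, page):
        self._require_checkout(token, page)
        self._drafts.pop(page, None)  # discard uncommitted work
        del self._checked_out[page]
        return True

    def get_page_history(self, token, page):
        self._user(token)
        return list(self._history[page])


dp = InMemoryDP({"alice": "secret", "bob": "hunter2"})
alice = dp.login("alice", "secret")
bob = dp.login("bob", "hunter2")
dp.add_page(alice, "p1", "Tbe cat")
text = dp.check_out_page(alice, "p1")
dp.save_page(alice, "p1", text.replace("Tbe", "The"))
dp.commit_page(alice, "p1")
history = dp.get_page_history(bob, "p1")
```

A bot would use exactly the same calls as a human client, which is why the session token is the only thing that identifies the caller.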

Presentation/Interaction layer

This could be a web interface, but I tend towards two simple controls (objects) that can be loaded into a browser page: a graphic viewer control and a text editor control.

  • Graphic viewer control
    • Grayscale with good interpolation, to provide easy to read scans, even when working with high or medium resolution black and white scans.
    • view and manipulate (in limited ways) the page image. Note that we should not alter the original scan, we just store the manipulations and apply them again (Similar to the way 'presentation states' are applied to medical images).
      • mark text column; plate; table; music, etc...
      • eraser (hide selected area)
      • ROI (region of interest) mask (hide everything except selected area)
      • option to show masked or erased areas at 30% gray
      • option to fit only masked area on screen.
      • distortion grid (draw grid over distorted page image until it matches the distortion of the page, then transform the page image to make the grid rectangular again)
      • The distortion grids are specified as paths, each having the form (x, y)..(x, y).-(x, y), where - between the coordinates stands for a straight segment, .. stands for a curved segment (always a circle segment, such that consecutive curve points always have a smooth transition), and .- (or -.) for a straight segment smoothly attached to curved segment.
      • Options for gamma specification or bi-level cut-off.
      • Option to indicate column split.
    • Connects with server to retrieve page image, supports direct extraction from multipage tiff/PDF/DJVU with help of server.
    • can save images on server as well as on client, both before and after applying the manipulations.

Presentation specification in XML.

 <presentation project="xyz" page="123">
   <rotate degrees="90"/>
   <roi area="x, y, w, h"/>
   <erase area="x, y, w, h"/>
   <distort 
      top="(x, y)..(x, y)..(x, y).-(x, y)"
      bottom="(x, y)..(x, y)..(x, y).-(x, y)"
      left="(x, y)-(x, y)"
      right="(x, y)-(x, y)"/>
   <rotate degrees="1.64"/>
   <block type="image" area="x, y, w, h" gamma="1.6" levels="0.2, 0.9"/>
   <block type="text|music|table|math" area="x, y, w, h" levels="0.8, 1.0"/>
 </presentation>

Manipulation specifications are stored with each page version, and are thus versioned themselves.
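
The path notation used in the distort element can be read with a small parser. The following is a minimal sketch, assuming numeric coordinates and no whitespace around the separators; the function name parse_path and the segment labels are hypothetical.

```python
import re

# One "(x, y)" coordinate pair with numeric values.
POINT = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")

# Separator -> segment kind, following the description of the notation.
SEGMENT_KINDS = {
    "..": "curved",
    ".-": "smooth-straight",
    "-.": "smooth-straight",
    "-": "straight",
}


def parse_path(spec):
    """Split a path such as "(0, 0)..(10, 2).-(30, 0)" into points and segment kinds."""
    spec = spec.strip()
    points, kinds, pos = [], [], 0
    while True:
        match = POINT.match(spec, pos)
        if match is None:
            raise ValueError(f"expected a point at offset {pos}")
        points.append((float(match.group(1)), float(match.group(2))))
        pos = match.end()
        if pos == len(spec):
            return points, kinds
        # Try the two-character separators before the single "-".
        for separator in ("..", ".-", "-.", "-"):
            if spec.startswith(separator, pos):
                kinds.append(SEGMENT_KINDS[separator])
                pos += len(separator)
                break
        else:
            raise ValueError(f"expected a separator at offset {pos}")


top_points, top_kinds = parse_path("(0, 0)..(10, 2)..(20, 2).-(30, 0)")
left_points, left_kinds = parse_path("(0, 0)-(0, 100)")
```

A path with n points always yields n - 1 segment kinds, which the viewer control can use to decide how to draw each stretch of the grid.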

  • Text editor control
    • Can display styled text as well as plain text
    • supports wikipedia-like tagging
    • supports spell checker and highlights unknown words
      • not in dictionary (red under-twiggle): pop-up menu appears with at most 5 suggestions, and options to add to project dictionary, overall dictionary, or suggest the word is in another language.
      • common scanno (blue under-twiggle): pop-up menu appears with suggestion, and option to indicate this is not a scanno.
      • changed in previous round(s) (green under-twiggle): pop-up menu with original word or fragment, and option to revert.
    • All under-twiggles can be toggled on or off.
    • Option to select text and tell system this is in a foreign language.
    • Option to select text and apply a tag to it.
    • supports Unicode
      • pop-up windows to select odd characters from.
    • Can deal with "inherited tagging" (tags inherited from previous pages in the project), for display purposes.
    • metadata fields for (at option of project manager)
      • language of page.
      • type of page.
      • page number.
      • footer and header segments.
      • page binder's signature.

Page specification in XML (not to be confused with the markup the user sees, here represented as tags):

 <page project="xyz" page="123">
    <number n="14">XIV</number>
    <header></header>
    <inherit>
      <otag name="text"/>
      <otag name="body"/>
      <otag name="div" attributes="n='1'"/>
    </inherit>
    <text>
       <p><otag name="head" attributes="type='sub'"/>The Head<ctag name="head"/></p>
       <p>This is the text of the <corr round="p1" sic="pagc">page</corr> with some tagging.</p>
       <p>The <scanno status="resolved">arid</scanno> desert.</p>
       <p>The motto was <foreign lang="la">Luctor et Emergo</foreign>.</p>
    </text>
    <footer></footer>
    <signature>AA*</signature>
 </page>


This is the XML structure sent from the server to the edit control on the client and back to the server again.

The otag and ctag elements contain presentation level tags in TEI. Note that these are visualized as Wikipedia-like markup.

Both controls can operate independently but stay synchronized; browser code glues them together.

Both controls can show the first few lines of the next page, and the last few lines of the previous page (but these cannot be edited).

Main DP 2.0 Workflow

Note that currently, projects go through rounds. In my proposal, after the clearance and upload phase, individual pages go through phases, with each phase having one or more rounds until they reach the publication stage, depending on a number of criteria. This means that some pages can be almost completely done while other pages are still untouched.

The following phases are foreseen:

  1. Clearance: verify a work is eligible, that is, free from copyright restrictions.
  2. Upload: upload the complete scanned work.
  3. Metadata: add metadata and regions of interest to each page. Indicate what a page contains, such as text, illustrations, and tables.
  4. Cleanup: clean-up the OCR results.
  5. Proofing: proof the text for remaining transcription errors.
    • Visual: read text side-by-side with original.
    • Auditive: let text-to-speech software read the text while reading the original.
  6. Tagging: add tags to text elements, such as headers, italics, tables, etc.
  7. Special: add tags which require specific skills, such as transcribing Greek passages, music notation, complicated tables, etc.
  8. Publication: combine all completed pages into a complete ebook.

Image:Proposed DP20 process.png

Not all phases are required. Which phases a page goes through can be selected by the project manager. In addition, proofers in an early phase may indicate that a page needs some special processing on a page-by-page basis. This indication can be semi-automatic: for example, if a page contains a Greek passage, the appearance of a Greek tag will activate a special phase to deal with it.

A phase does not have a predefined number of rounds. The number of rounds for a page in a phase depends on metrics. These are calculated on a page-by-page basis.

When a user has worked on a page and thinks his work is complete, he can commit it. Before committing, the system may show the changes made, highlighted, and ask for a second confirmation. After this confirmation, the page is "committed" back into the pool.

After each commit, the system decides whether the page can be promoted to the next phase, based on a number of parameters:

  • User "Merit" (newbie or experienced user, good, average or sloppy work, measured for pages with similar stats (language, etc), tests passed)
  • Page "Merit" (number of corrections made, difficulty level)
  • User preference: When the system determines the page can be promoted, the user can "Commit" or "Promote" a page.

System always keeps a delta trail for each commitment. Difficult pages may make many rounds; simple pages one or two.

Optionally, the system can also run proofing rounds in parallel, combining the efforts of two proofers independently. This is especially advisable with difficult projects and type-in projects.

A typical difficult OCR-ed text thus could go through 4 rounds in the Cleanup and Proofing phases, as follows:

  • 1. Cleanup of text during first round (remove OCR artifacts, and first corrections)
  • 2a. Careful proofreading of output of 1
  • 2b. Careful proofreading of output of 1
  • 3. Reconciliation: compare differences between 2a and 2b. (omitted if pages are exactly the same)

A typical type-in text thus could go through 3 rounds:

  • 1a. Type-in by first volunteer
  • 1b. Type-in by second volunteer
  • 2. Reconciliation: compare differences between 1a and 1b. (omitted if pages are exactly the same)
  • 3. Careful proofreading of output of 2
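
The reconciliation round can be prepared automatically by comparing the two parallel versions. Here is a minimal sketch using Python's standard difflib module; comparing word by word (rather than character by character) is an assumption, and reconcile is a hypothetical name.

```python
import difflib


def reconcile(version_a, version_b):
    """Compare two parallel versions of a page.

    Returns None when the versions are exactly the same (the round can be
    omitted), otherwise a list of (words in a, words in b) pairs that differ.
    The list is empty when the versions differ only in whitespace.
    """
    if version_a == version_b:
        return None
    words_a, words_b = version_a.split(), version_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    conflicts = []
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op != "equal":
            conflicts.append((" ".join(words_a[a1:a2]), " ".join(words_b[b1:b2])))
    return conflicts


same = reconcile("The arid desert", "The arid desert")
different = reconcile("The arid desert was hot", "The and desert was hot")
```

Only the differing spans need to be shown to the volunteer doing the reconciliation, side by side with the page image.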

The following sections describe the purpose of each phase in detail.

User Registration

Before a user can work in DP, he has to register. After registration, the provided email address will be verified before the user is accepted into the system. This is done by sending a link with a random token. To avoid fake registrations, a simple "Turing" test may be included.
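
The email verification step could look roughly like this. It is a sketch using Python's standard secrets, hashlib and hmac modules; storing only a hash of the token and the address https://example.org/confirm are assumptions, not part of the design above.

```python
import hashlib
import hmac
import secrets


def new_verification_token():
    """Return (token to mail to the user, digest to store in the database)."""
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, digest


def verification_link(base_url, token):
    """Build the link sent by email; token_urlsafe output needs no escaping."""
    return f"{base_url}?token={token}"


def is_valid(token, stored_digest):
    """Check a token from a clicked link against the stored digest."""
    candidate = hashlib.sha256(token.encode("ascii", "replace")).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)


token, digest = new_verification_token()
link = verification_link("https://example.org/confirm", token)  # hypothetical URL
```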

Related user interface

  • registration form
  • registration confirmation
  • edit user details
  • edit user preferences
  • view user statistics

Phase 1: Clearance

The copyright clearance system will be integrated into DP. To apply for a clearance, a content provider does the following:

  • Provide basic facts on the book (title, authors, publisher, place and date, language of work, etc.)
  • Upload title page and verso (TP&V) and/or library records to prove these details.

Rules for scanned title page and verso for copyright clearance purposes:

  • scan, between 100 and 200 DPI, preferably full color JPEG compressed format.
  • scans should be clearly readable.
  • directly from scanner or digital camera, without further edits except scaling and cropping, that is, no removal of library stamps or other artifacts of the copy used as source.
  • when in non-English language, translation of relevant sections and phrases on each scan should be provided.
  • when working from harvested copies, a description of the source, and a URL when available.

The system will create an entry in DP database for the work, and will provide a means to verify work is not duplicated, by listing similar works that are in progress or completed.

All clearance requests should be in the open (except for the submitter details) from day 1, which means that they should not exceed fair use in themselves. Posting a title-page and verso, and a few pages with content that allows us to date a work for the purpose of establishing its copyright status is most likely fair use, even for works that cannot be cleared later on.

Books in the clearance system can have several states:

  • Submitted: A volunteer has submitted the book details.
  • On Hold: A copyright clearer has requested more information before being able to make a decision.
  • Not OK: The copyright of the book cannot be cleared.
  • OK: The US copyright of the book has been cleared.

Each clearance state can carry a reason or motivation for the decision, which may include:

  • Expired: published before 1923.
  • Non renewal: US published work before 1964 without evidence of copyright renewal even after diligent research.
  • No Notice: US published work before 1989 without valid copyright notice.
  • Government Work: US federal government work.
  • Granted: copyrighted, but owner has granted permission to put into PG.
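
The clearance interface could pre-select a likely reason from the facts entered by the submitter, leaving the decision itself to the clearance administrator. The sketch below only encodes the list above; the Work fields and the function name clearance_reason are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Facts entered with a clearance request (field names are assumptions)."""
    year: int
    us_published: bool = False
    renewal_found: bool = False
    valid_notice: bool = True
    us_government: bool = False
    permission_granted: bool = False


def clearance_reason(work: Work) -> Optional[str]:
    """Suggest a reason from the list above, or None when none applies."""
    if work.year < 1923:
        return "Expired"
    if work.us_government:
        return "Government Work"
    if work.us_published and work.year < 1964 and not work.renewal_found:
        return "Non renewal"
    if work.us_published and work.year < 1989 and not work.valid_notice:
        return "No Notice"
    if work.permission_granted:
        return "Granted"
    return None


suggestion = clearance_reason(Work(year=1950, us_published=True))
```

When the function returns None, the request simply stays in the Submitted state until a clearer has looked at it.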

Only administrators with the special copyright clearance right can approve clearance requests. The clearance should only cover the exact copy described in the request, and is only valid for Project Gutenberg purposes. (This doesn't exclude using other copies to remedy defects as long as these fall under fair use.)

Once approved, the interface becomes available for uploading all scanned images.

In parallel with the US clearances, we could introduce supplementary clearances for other jurisdictions. These have only an informational status, and will not affect the work going on-line.

Related user interface

Directly related to work-flow

  • clearance request submit
  • clearance overview (user)
  • clearance overview (clearance administrator)
  • clearance approve (clearance administrator)

Supplementary

  • Copyright clearance how-to pages
  • Search for works already done and works in progress (to avoid duplication of work)
  • Search interface on copyright renewal databases
  • Links to on-line catalogs.
  • Embed authority files (Warning: huge database)

Phase 2: Upload Scans

  • Supported formats
    • TIFF, PNG, GIF, JPG, PDF, DjVu
    • Including multipage when the format supports it
    • Including compressed archives (gzip, zip, tar, bzip2, 7zip, etc).
    • Built-in OCR
  • Scanning guidelines
    • Scan all pages, including covers and blank pages.
    • text only: at least 300 DPI B&W
    • B&W image: at least 300 DPI Grayscale (use descreen when needed)
    • Color image: at least 300 DPI 24 bit color (use descreen when needed)

Built-in OCR is the most difficult feature here. It requires a rather heavy OCR server that can convert individual page images to text. At the current processing rate, we need to process 10,000 to 20,000 pages per day. Currently, no open source engines seem to be available that achieve the required OCR quality. Commercially, ABBYY Recognition Server may meet the requirements but is Windows only. Alternatively, the ABBYY FineReader Engine 8.0 is available for Linux.

Alternatively, we need to monitor the open source Tesseract OCR Engine.

This is also the place to configure the workflow for the work. That is, selecting which rounds a page should go through. This may be achieved by offering a set of workflow templates from which the uploader can select one as default, and which may be overridden for page-ranges. Note that during processing, the workflow for an individual page may be modified, based on features of the page. For example, the presence of a Greek citation may trigger a specialist round for Greek.

Workflow can be specified in a small XML specification, for example:

<dpworkflow>
  <sequence>
    <phase name="Cleanup"/>
    <parallel>
       <phase name="Proofing"/>
       <phase name="Proofing"/>
    </parallel>
    <phase name="Tagging"/>
    <phase name="Special:Greek"/>
    <phase name="Publication"/>
  </sequence>
</dpworkflow>
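
Such a specification is easy to read back into a nested structure. Below is a minimal sketch with Python's standard xml.etree.ElementTree module; representing a phase as its name and a sequence or parallel block as a (kind, steps) tuple is an assumption.

```python
import xml.etree.ElementTree as ET

WORKFLOW = """
<dpworkflow>
  <sequence>
    <phase name="Cleanup"/>
    <parallel>
       <phase name="Proofing"/>
       <phase name="Proofing"/>
    </parallel>
    <phase name="Tagging"/>
    <phase name="Special:Greek"/>
    <phase name="Publication"/>
  </sequence>
</dpworkflow>
"""


def parse_step(element):
    """Turn one workflow element into a phase name or a (kind, steps) tuple."""
    if element.tag == "phase":
        return element.get("name")
    if element.tag in ("sequence", "parallel"):
        return (element.tag, [parse_step(child) for child in element])
    raise ValueError(f"unexpected element <{element.tag}>")


def parse_workflow(xml_text):
    """Parse a dpworkflow document into a list of top-level steps."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "dpworkflow":
        raise ValueError("root element must be <dpworkflow>")
    return [parse_step(child) for child in root]


steps = parse_workflow(WORKFLOW)
```

Because ElementTree rejects documents that are not well-formed, a workflow with an unclosed sequence element is refused at upload time instead of failing later in the pipeline.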

Phase 3: image cleanup and metadata

This phase extensively uses the graphic viewer control. It covers some tasks now typically done by content providers, and is optional.

Users only see the page image, and indicate interesting areas in graphical way.

  • overall content area (inherited from previous page or template if possible)
  • text columns (inherited from previous page or template if possible)
  • smudge
  • table
  • music
  • figures

The UI will have a range of buttons to select appropriate rubber bands to indicate the area.

The UI will not show anything outside the content area after this round. Smudge will be made plain white or very light gray.

In an advanced version of the graphics control, users may be able to correct bending and perspective distortion, by drawing a grid over the page image, which matches the distortion, and then asking the software to straighten the grid.

Users can add the following information:

  • page number (true page number as it appears on the page)
  • main section level and number
  • type of page
  • signature information (if project asks for it. Binders signatures are letters shown mostly on the bottom of the page, intended to help binders. They are mainly of interest only for very old antiquarian books.)

Built-in OCR runs again after the page is submitted.

Note that no actual editing of the source image takes place: the edits and transformations are combined and applied when the page is served out or OCR-ed.

Phase 4: Text Cleanup

Users see image and text side-by-side, and clean up garbage left by OCR software, to make the page correct. The focus here is on removing dirt left by the OCR process.

An interface will be required here to add non-standard characters that may not be present on the proofer's keyboard, for example an a with a macron.

Phase 5: Proofreading

As in the text cleanup phase, but now concentrating on removing the remaining transcription errors.

This phase includes a spell-check feature, using both language and project specific word lists.

Software highlights in text:

  • Not in dictionary (with drop-down menu with suggestions; Add to dictionary; Add to project Dictionary; Accept as is)
  • Scannos and other suspect words (with drop-down menu with suggestions and other options)

During this phase, a tailored project dictionary will be constructed. This dictionary can be reviewed later on in the project.

The following information is collected:

  • A word-frequency list, listing each word with its frequency in the latest version of the document (all pages in their most recent version)
  • A word-replacement list, listing all word-level corrections and their frequency, for the proofing phase.
  • A potential scanno-list, listing all word-level replacements, where both the original and the changed word are in the dictionary.
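
The three lists can be derived from the page versions with little code. This is a sketch under simple assumptions: words are runs of letters (optionally joined by an apostrophe or hyphen), frequencies are counted case-insensitively, and only one-for-one word substitutions count as replacements. All function names are hypothetical.

```python
import difflib
import re
from collections import Counter

# A word: letters, optionally joined by apostrophes or hyphens.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def word_frequencies(pages):
    """Word-frequency list over the most recent version of all pages."""
    counts = Counter()
    for text in pages:
        counts.update(word.lower() for word in WORD.findall(text))
    return counts


def word_replacements(before, after):
    """Count (original word, corrected word) pairs between two page versions."""
    words_a, words_b = WORD.findall(before), WORD.findall(after)
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    counts = Counter()
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        # Only one-for-one substitutions are treated as word replacements.
        if op == "replace" and a2 - a1 == b2 - b1:
            counts.update(zip(words_a[a1:a2], words_b[b1:b2]))
    return counts


def potential_scannos(replacements, dictionary):
    """Keep replacements where both the original and the new word are in the dictionary."""
    return {
        pair: count
        for pair, count in replacements.items()
        if pair[0].lower() in dictionary and pair[1].lower() in dictionary
    }


before = "The and desert. Tbe arid plain."
after = "The arid desert. The arid plain."
replacements = word_replacements(before, after)
scannos = potential_scannos(replacements, {"the", "and", "arid", "desert", "plain"})
frequencies = word_frequencies([after])
```

In this example "Tbe" is an ordinary misspelling that a spell checker would catch, while "and" for "arid" is a potential scanno because both words are valid.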

Additional interface will be required here for project dictionary management:

  • Show word-frequency, word-replacement, and scanno list.
  • Show proofer suggestions and accept or reject them. (list with check-boxes)
  • Manage project specific dictionary
  • Manage scanno and suspect word lists (bad words)

Phase 6: Tagging (aka Formatting)

Users see image and text side-by-side

Users are expected to add formatting (but corrections may also be made)

Formatting will be based on wikipedia style

The system will track differences on two different levels:

  1. Tagging
  2. Core text (non tagging)

Typically, no changes are expected on the core text level. They are allowed, but will result in a warning.

Tagging needs to be correct before a text can be submitted. Client (and server) enforce this.

The core text is a version of the text with all tagging removed.

Phase 7: Specialist Rounds

Specialists will add

  • non-Latin script (fragments, full works in a non-Latin script will be dealt with normally by users who know the language and script in question -- in fact, this round may then be a specialist round to deal with phrases in Latin script!)
  • math notation (based on (La)TeX)
  • music notation (based on LilyPond)
  • image editing (for illustrations)
  • Descriptions to images (to aid visually 'challenged' readers)

This is a place where bots (automated clients, as opposed to human beings) can come in handy. Specially designed bots could perform any of the following tasks

  • tagging and disambiguating place names mentioned (similar to the Perseus Project).
  • tagging dates mentioned.
  • tagging measurements; disambiguation of units mentioned.
  • tagging cross references (and resolving them).

Using a set of general or tuned parsing rules.

List of foreseen specialist rounds:

  • Foreign Scripts
    • Arabic
    • Greek
    • Hebrew
    • Chinese
    • Japanese
  • Specialist Notations
    • Music
    • Math
    • Dance
    • Chemistry
  • Complex Layout
    • Tables
    • Indexes
  • Special Tagging
    • Image Descriptions (Describing images using keywords from a controlled vocabulary, as aid in classification on search)
    • Language (Bot with human review)
    • Cross References (Bot with human review, enabling linking together Project Gutenberg publications)
    • Dates (Bot with human review, adding tags linking dates to ISO dates)
    • Units (Bot with human review, adding SI units for older units)
    • Place names (Bot with human review, disambiguating place names and linking them to geographical coordinates)
    • Personal names (Bot with human review, disambiguating personal names.)

Phase 8: Publication

Works will remain available in the system, in the published section.

System will automatically assemble a TEI master file from the collected metadata and the tagged pages.

Pages remain available for continued improvement and additional tagging.

The automated PP will require more precise tagging than currently in use at PG. For this reason I propose a shift to Wiki-like tagging, which is easier to learn than most pointed angle tagging (although this will still be allowed)

Especially for hyphenated words and footnotes split between pages, special normalized tagging will be required.

The system will complain about wrongly formatted tagging.

Utilities & Ideas

We foresee a number of utility pages on the DP2.0 website, many as present on the current DP website.

  1. Statistics Central, with lots of interesting statistics on pages and projects.
  2. Merit & Quality Calculations, which indicate the effort a volunteer has donated in some way, and the quality of work.
  3. Skills, which indicate what skills (language and other) a volunteer has available.
  4. Progress indication bars as used on Project Runeberg.

Statistics

We introduce the following metrics.

Metric  Description
P       Pages proofed
C       Characters proofed (counted on normalized core text)
Epn     Edit distance between input and output of proofing round n, measured on normalized core text
Efn     Edit distance between output of proofing round n and final page, measured on normalized core text
Tpn     Tags applied in tagging round n
Tf      Tags applied in final page

And the following constants. Note that these values are rather arbitrary, as they attempt to express the effort in terms of characters read.

Constant  Value  Description
Kc        1      Cost of proofing one character (used as unit of work).
Kp        200    Additional cost of proofing one page.
Ke        40     Cost of editing one character (either deletion or addition; change counts as deletion + addition).
Kt        120    Cost of adding a tag.
F1        4      Multiplication factor for missed edits in proofing round 1
F2        8      Multiplication factor for missed edits in proofing round 2
F3...Fn   16     Multiplication factor for missed edits in proofing round 3 and later
T         0.5    Threshold to be able to promote page to next phase (as rank of proofer in sorted list of proofing quality, 1.0 is best, 0.0 is worst).

Then we can calculate the following values:

  • Effort = C + (Kp * P) + (Ke * Epn) + (Kt * Tpn)
  • Residue Cost = Fn * (Ke * Efn)
  • Merit = Effort - Residue Cost
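
As a worked example of these formulas, the sketch below uses the constants from the table. The function names and the sample figures (one page of 2,000 characters, 10 edits made, 2 edits missed) are made up for illustration; Kc is written out explicitly although it equals 1.

```python
# Constants from the table above.
KC, KP, KE, KT = 1, 200, 40, 120
MISSED_EDIT_FACTOR = {1: 4, 2: 8}  # F1 and F2
LATER_ROUND_FACTOR = 16            # F3...Fn


def effort(characters, pages, edits, tags):
    """Effort = C + (Kp * P) + (Ke * Epn) + (Kt * Tpn)."""
    return KC * characters + KP * pages + KE * edits + KT * tags


def residue_cost(round_number, missed_edits):
    """Residue Cost = Fn * (Ke * Efn)."""
    factor = MISSED_EDIT_FACTOR.get(round_number, LATER_ROUND_FACTOR)
    return factor * KE * missed_edits


def merit(characters, pages, edits, tags, round_number, missed_edits):
    """Merit = Effort - Residue Cost."""
    return effort(characters, pages, edits, tags) - residue_cost(round_number, missed_edits)


# One page of 2,000 characters, proofing round 1: 10 edits made, 2 missed, no tags.
page_effort = effort(2000, 1, 10, 0)          # 2000 + 200 + 400
page_residue = residue_cost(1, 2)             # 4 * 40 * 2
page_merit = merit(2000, 1, 10, 0, 1, 2)
```

With these figures the effort is 2,600, the residue cost 320 and the merit 2,280. The same two missed edits in round 3 would cost 1,280, reflecting that errors surviving into later rounds weigh more heavily.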

We can calculate the following figures

  • Effort and merit for a round (based on difference between input and output)
  • Effort and merit for a user (based on all time sum of efforts for a round)
  • Actual effort for a page (sum of effort for all rounds)
  • Effective effort for a page (based on difference between initial OCR output and final page)
  • Effective effort for a project (total of all pages)
  • Effective effort of entire site (total of all projects)

The residue cost includes a penalty factor for missed or wrong edits.

To encourage the completion of works in the pipeline, we may give a bonus on the last 10% or so of pages of a work still to be done. Similarly, we may give a bonus on the few oldest projects in the queue. This bonus could be anything between 10 and 100 percent. Note however that the bonus also weighs in on the quality of work calculations...

Normalized Text

Normalized text is the text without tagging. To create normalized text, we apply the following steps.

  1. drop all HTML or XML like tagging (in angle brackets), optionally replacing them with spaces or new-lines depending on type of tag.
  2. drop all DP internal tagging (in square brackets).
  3. normalize spacing (all sequences of spaces to a single space)
  4. normalize new-lines (multiple new-lines to one)

This normalized text will be the base for merit scores in proofreading.
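
The four steps translate into a few regular expressions. This is a minimal sketch with several assumptions: the set of tags replaced by a new-line is hypothetical, other angle-bracket tags are dropped without replacement, and square-bracket tagging is dropped together with its content.

```python
import re

ANGLE_TAG = re.compile(r"</?([A-Za-z][\w:.-]*)[^<>]*>")
SQUARE_TAG = re.compile(r"\[[^\[\]]*\]")

# Assumption: these tags separate blocks and are replaced by a new-line.
BLOCK_TAGS = {"p", "br", "div", "head"}


def _replace_angle_tag(match):
    return "\n" if match.group(1).lower() in BLOCK_TAGS else ""


def normalize(text):
    """Return the normalized core text of a page."""
    # 1. Drop HTML or XML like tagging.
    text = ANGLE_TAG.sub(_replace_angle_tag, text)
    # 2. Drop DP internal tagging; repeat so nested brackets disappear too.
    previous = None
    while previous != text:
        previous = text
        text = SQUARE_TAG.sub("", text)
    # 3. Normalize spacing.
    text = re.sub(r"[ \t]+", " ", text)
    # 4. Normalize new-lines, also removing spaces around them.
    text = re.sub(r" ?\n[ \n]*", "\n", text)
    return text.strip()


sample = "<p>The  <i>arid</i> des[-*]ert.</p>\n\n\n<p>Next   line.</p>"
core = normalize(sample)
```

For the sample page this yields two lines, "The arid desert." and "Next line.", so edit distances are measured on the words themselves and not on the tagging.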

Quality of work calculation for user

Quality is based on the residue cost, that is, on the number of errors left after each round. We calculate a score based on percentiles.

  • Calculate residue cost per character for all proofers
  • Sort proofers by quality of work.
    • Best 1% get 5 stars (summa cum laude)
    • Next 4% get 4 stars (magna cum laude)
    • Next 17% get 3 stars (cum laude)
    • Next 30% get 2 stars
    • Next 40% get 1 star
    • Worst 10% get no star

People need at least a merit of 50,000 to earn one star (about 25 pages proofed), and 200,000 to earn two or more stars (about 100 pages proofed). Four and five stars will only be given with at least 1000 active proofers.
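
The percentile bands can be applied to a sorted list of proofers as sketched below. Two simplifications are assumed: the minimum merit and minimum number of active proofers are not modeled, and since the bands as listed add up to slightly more than 100%, whoever remains after the first five bands gets no star. The function name assign_stars is hypothetical.

```python
# (share of proofers, stars), best band first, as listed above.
BANDS = [(0.01, 5), (0.04, 4), (0.17, 3), (0.30, 2), (0.40, 1)]


def assign_stars(residue_per_character):
    """Map proofer -> stars from proofer -> residue cost per character.

    A lower residue cost per character means better work.
    """
    ranked = sorted(residue_per_character, key=residue_per_character.get)
    total = len(ranked)
    stars = {}
    for index, proofer in enumerate(ranked):
        fraction = (index + 1) / total  # position in the sorted list, 0..1
        stars[proofer] = 0
        cumulative = 0.0
        for share, band_stars in BANDS:
            cumulative += share
            if fraction <= cumulative + 1e-9:  # tolerance for float rounding
                stars[proofer] = band_stars
                break
    return stars


# 100 hypothetical proofers, p0 with the lowest residue cost per character.
stars = assign_stars({f"p{i}": i / 1000 for i in range(100)})
```

With 100 proofers this gives five stars to one proofer, four stars to the next four, three stars to the next 17, and so on down the list.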

To allow people to improve, only the last 200,000 merit points earned will be taken into account for quality of work calculations. People with two or more stars can promote texts to the next round. (Technically, this is the best 50%, which is a default: in the Project Manager's interface, PMs can select any percentage for this value, although if they set it too high, their projects will progress very slowly.)

Note that since the guidelines prescribe to match the text of the image exactly, proofers will not be penalized for mistakes in the source. Proofers too are entitled to add tagging, and are encouraged to do so when they encounter mistakes in the source.

Merit will be calculated both overall and per language and per type of page (based on page metadata), such that norms and stars can be awarded independently (besides the overall Hall of Fame). This is especially important for less widely spoken languages or old languages, and for difficult types of material, such as dictionaries and mathematical books.

Page Reservation

For some people, proofing is more fun when they can work on an unbroken sequence of pages. To facilitate this, the system will, besides edit-locks on pages actually being proofed, use reservations for a range of pages, say the next 20 or 50 pages, so that when a second proofer starts work on the same book, the two proofers' pages are not interleaved. Reservations are not absolute: if unreserved pages run out for a certain project, reserved pages can be taken away again.

Use Case Scenarios

Read Published works

  1. User arrives (anonymous access possible)
  2. Uses browse or search

Register for proofreading

  1. User arrives, selects register
  2. Enters email address and nickname, and selects a password
  3. Receives confirmation email
  4. Confirms email
  5. Fills in interesting details (language proficiency, preferences, etc.)
  6. Done

Browse Projects

Apart from selecting a project, users can also just select a subject and get a page from a random project in that subject.

Configure Preferences

Users can set a large range of options and facts, including

  • language skills.
  • interesting subjects.
  • name, nickname, contact details.
  • connection type.

Make Quiz / Test

Before people can get started in a certain Phase, they will have to go through a one-page quiz, where they will be confronted with some of the bottlenecks, and be given guidance on common mistakes. This should not take more than 5 minutes for the easier phases.

Proofread page

  1. User arrives, logs in
  2. Selects project from lists
  3. Receives page
  4. Proofs page, makes corrections
  5. Submits page, receives an overview of the changes made
  6. Commits page
  7. (Optionally more pages)
  8. Done.

This task will be specialized for each specific phase.

Create Project

Add Pages to Project

Provide Clearance for Project

This step is limited to a few trusted admins. After this step, the project becomes visible to the world; before it, only the project creator and the admins can see it.

Add Project to Release Queue

To limit the number of projects people work on simultaneously (so as to maintain some notion of progress on individual projects), the site still works with release queues, taking care that a wide spectrum of material remains available in each round.

Notation

The notation will be wiki-like; following wikipedia conventions whenever possible, with additions when needed.

Additional tagging will be provided using brackets.


End-of-line Hyphenated words will be undone or marked as follows.

example[-*?]text: doubtful end-of-line hyphen.

ex[-*]ample: end-of-line hyphen that can be removed.

example[-*!]text: end-of-line hyphen that needs to stay.
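
At publication time these markers can be resolved mechanically. The sketch below assumes that doubtful hyphens are left in place for a human to decide; the function names are hypothetical.

```python
import re

DOUBTFUL = re.compile(re.escape("[-*?]"))


def resolve_hyphens(text):
    """Resolve the certain end-of-line hyphen markers; leave doubtful ones alone."""
    text = text.replace("[-*!]", "-")  # hyphen that needs to stay
    text = text.replace("[-*]", "")    # hyphen that can be removed
    return text


def doubtful_hyphens(text):
    """Return the offsets of the markers that still need a decision."""
    return [match.start() for match in DOUBTFUL.finditer(text)]


resolved = resolve_hyphens("ex[-*]ample well[-*!]known to[-*?]day")
pending = doubtful_hyphens(resolved)
```

Here resolved becomes "example well-known to[-*?]day", and pending holds the offset of the one marker that is still doubtful.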


Footnotes will be placed out of line, at the place they occur. The footnote markers will be placed in brackets.

Footnote marker[2] indicated as such. The following markers can be used [*], [**], [&dagger], [|]

[Footnote 2: Text of footnote.]

When the text of a footnote is continued on the next page it is as follows:

[Footnote 2: Text of foot[-*!]]*


Corrections made to the source will be marked as follows:

[sic: wroong speling] [corr: wroong speling|correct spelling] [ins: "] [del: .]


Illustrations are indicated as follows

[Illustration 1: =Caption=

Some more text.]

Sample Pages

Select

Project

  • Bibliographic Details
  • Project History
  • Project Discussion
  • Page Overview

Proofread Page

  • Page image
  • Page text

Format Page

Page History

n  page  Phase
1  -     Proofread

page history

Phase      Round  Resp  Delta  Norm. Delta
Upload     1      JH    -      -
Cleanup    1      JH    23     12
Cleanup    2      JH    2      2
Proofread  1      JH    4      3
"