Proofreading Guidelines Explanation

From DPCanadaWiki


The Proofreading Guidelines: "why do we do that?"

The aim of this page is to collect explanations of why certain guidelines are as they are. These explanations are not the guidelines themselves and should not be taken as directions on how to proof. This page is intended as a resource for volunteers who would like to better understand the reasoning behind the Proofreading Guidelines. (A "sister" article, Formatting Guidelines Explanation, does the same for the Formatting Guidelines.)

Paragraphs in italics are quoted from the current Proofreading Guidelines.

If you want to suggest changes or additions to the guidelines, please do so in the Documentation Forum. This article is only for explaining the reasoning behind the current Guidelines.



Line Breaks

Leave all line breaks in so that later in the process other volunteers can easily compare the lines in the text to the lines in the image. If the previous proofreader removed the line breaks, please replace them so that they once again match the image.

During the proofing and formatting rounds we keep the line breaks as they are to make it easier to compare with the page image. The lines will usually be re-wrapped during post-processing.
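To see why the breaks matter, here is a minimal Python sketch of what rewrapping does to one paragraph. It assumes, for illustration only, that post-processing joins a paragraph's lines with single spaces and refills them to a fixed width; the function name `rewrap` and the 72-column default are hypothetical, not DP's actual tooling.

```python
import textwrap

def rewrap(lines, width=72):
    """Hypothetical rewrap step: join a paragraph's lines, then refill them.

    Every original line break becomes a single space, so the proofed line
    breaks disappear and new ones are chosen purely by width.
    """
    joined = " ".join(line.strip() for line in lines)
    return textwrap.fill(joined, width=width)

# Lines as proofed, matching the page image.
proofed = ["It was a dark and", "stormy night; the rain", "fell in torrents."]
print(rewrap(proofed, width=30))
```

Since the original breaks are discarded at this stage anyway, keeping them during the rounds loses nothing while making line-by-line comparison with the image easy.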

Double Quotes

Proofread these as plain ASCII " double quotes. Do not change double quotes to single quotes. Leave them as the Author wrote them.

For quotes from non-English languages, use the quotation marks appropriate to that language if they are available in the Latin-1 character set. The French equivalents, guillemets «like this», are available from the pulldown menus in the proofreading interface, since they are part of Latin-1. The quotation marks used in some German texts, „like this”, are not available in the pulldown menus, as they are not in Latin-1. The Project Manager may instruct you in the Project Comments to proofread non-English language quotation marks differently for a particular book.

Many German projects are proofread with German guillemets, »like this«. This preserves the difference between opening and closing quote marks while using only characters that are in Latin-1.

Single Quotes

Proofread these as the plain ASCII ' single quote (apostrophe). Do not change single quotes to double quotes. Leave them as the Author wrote them.

Quote Marks on each line

Proofread quotation marks at the beginning of each line of a quotation by removing all of them except for the one at the start of the first line of the quotation.

The text will be rewrapped in post-processing, changing the line breaks, so if we left the extra quote marks in the text they would end up in the middle of the paragraph.

If the quotation goes on for multiple paragraphs, each paragraph should have an opening quote mark on the first line of the paragraph.

Often there is no closing quotation mark until the very end of the quoted section of text, which may not be on the same page you are proofreading. Leave it that way—do not add closing quotation marks that are not in the page image.

This is the usual way that quotation marks work in modern English: each paragraph has an opening quote mark, and there is no closing quote mark until the speaker finishes.
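The reason for removing the extra marks can be shown with a short Python sketch. It assumes the input is the lines of a single quoted paragraph and that rewrapping simply joins lines with spaces; `strip_continuation_quotes` is a hypothetical helper, and a real proofer still decides line by line from the image, since a line can legitimately begin with a new quotation.

```python
def strip_continuation_quotes(lines, mark='"'):
    """Keep the quote mark on the first line; drop a leading one elsewhere.

    Assumes every line belongs to one quoted paragraph.
    """
    cleaned = [lines[0]]
    for line in lines[1:]:
        cleaned.append(line[len(mark):] if line.startswith(mark) else line)
    return cleaned

as_printed = ['"We set out at dawn', '"and walked until the sun', '"was high.']

# Joined as-is, the extra marks land in mid-sentence.
print(" ".join(as_printed))
# Proofed per the guideline, only the opening mark survives.
print(" ".join(strip_continuation_quotes(as_printed)))
```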

End of Sentence Periods

Proofread periods that end sentences with a single space after them.

You do not need to remove extra spaces after periods if they're already in the scanned text—we can do that automatically during post-processing.
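The automatic cleanup mentioned here can be as simple as one regular-expression substitution. This Python sketch is illustrative only; it is not the actual post-processing tool, and `collapse_spaces_after_periods` is a hypothetical name.

```python
import re

def collapse_spaces_after_periods(text):
    """Replace a run of two or more spaces after a period with one space."""
    return re.sub(r"\. {2,}", ". ", text)

print(collapse_spaces_after_periods("He left.  She stayed.   Nobody spoke."))
```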

Punctuation

In general, there should be no space before punctuation characters except opening quotation marks. If scanned text has a space before punctuation, remove it.

Spaces before punctuation sometimes appear because books typeset in the 1700s and 1800s often used partial spaces before punctuation such as a semicolon or comma.

In older texts the spacing around punctuation may be inconsistent, or different from modern practice. There may be partial spaces around some punctuation marks (something like half of a regular space). Since computers don't deal well with partial spaces, the OCR interprets these as full spaces. We remove those full spaces and attach the punctuation to the surrounding words according to current practice. Further, if we were to leave those spaces, lines might be rewrapped between the word and the punctuation
, leading to something like this line.
Also, in some languages other than English it's common to have spaces before certain punctuation marks, such as semicolons and question marks, even in modern usage. Those spaces should be removed in proofreading; the correct kind of non-breaking space will be inserted during post-processing. Proofread such punctuation like this:
blah, blah 
blah. blah 
blah; blah 
blah: blah 
blah! blah 
blah? blah 
Conversely, punctuation marks that ought to have a space after them but don't should have a space inserted:
blah,blah -> blah, blah
blah ,blah -> blah, blah (otherwise this could wrap with a line beginning ,blah)
However, punctuation marks that normally appear in pairs, such as "quotation marks", (parentheses), [brackets], and {braces}, usually have a space before the opening mark, which should be retained. For example:
blah (blah) blah 
blah [blah] blah    (except footnote markers: blah[3] blah) 
blah {blah} blah 
blah "blah" blah 
blah 'blah' blah
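The spacing rules above can be summarised as a small Python sketch. It illustrates the rules and is not a DP tool, and it makes simplifying assumptions: the period is left out of the "no space before" set so that a spaced ellipsis such as "blah ..." is untouched, only commas get a space inserted after them, and the function name is hypothetical. OCR output still has to be checked against the image by eye.

```python
import re

# Marks that should sit directly against the preceding word. The period is
# deliberately excluded so that a spaced ellipsis (" ...") is left alone.
SPACE_BEFORE = re.compile(r"[ \t]+([,;:!?])")
# A comma glued to the following letter, as in "blah,blah".
COMMA_GLUED = re.compile(r",(?=[^\W\d_])")

def fix_punctuation_spacing(line):
    """Apply the 'no space before, one space after' rules to one line."""
    line = SPACE_BEFORE.sub(r"\1", line)  # "blah ,blah" -> "blah,blah"
    return COMMA_GLUED.sub(", ", line)    # "blah,blah"  -> "blah, blah"

for raw in ["blah ,blah", "blah ; blah ?", 'blah (blah) "blah" blah ...']:
    print(repr(raw), "->", repr(fix_punctuation_spacing(raw)))
```

Opening parentheses, brackets, braces and quotation marks appear in neither pattern, so the space before them is kept, and a digit after a comma (as in "1,000") is not treated as a glued word.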

Period Pause "..." (Ellipsis)

ENGLISH: Leave a space before the three dots and a space after. The exception is at the end of a sentence, where there is no space before the dots: four dots in a row, then a space after. The same applies to any other sentence-ending punctuation mark: the three dots follow it immediately, without any space.

Q: What about when an ellipsis falls at the beginning or end of a line?

A: Unlike dashes, ellipses can normally be left at the beginning or end of a line. When the text is rewrapped during post-processing, a space will be inserted at the end of each line, so text like this:

blah blah ...
blah blah,
blah blah
... blah blah.

will become:

blah blah ... blah blah, blah
blah ... blah blah.

The ellipsis is treated just like a word, and it gets the appropriate spacing around it automatically. However, if the text looks like this:

blah blah.
... blah blah,

then you do need to move the ellipsis up (creating four dots together). If you don't, then after rewrapping it would become:

blah blah. ... blah blah,
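All three cases can be reproduced with a minimal Python sketch, assuming (as described above) that rewrapping turns every line break into a single space. `join_lines` is a hypothetical helper for illustration.

```python
def join_lines(lines):
    """Rewrap sketch: each line break becomes one space."""
    return " ".join(line.strip() for line in lines)

# Mid-sentence ellipses can stay at a line edge.
print(join_lines(["blah blah ...", "blah blah,", "blah blah", "... blah blah."]))
# After a full stop, leaving the ellipsis on the next line leaves a gap ...
print(join_lines(["blah blah.", "... blah blah,"]))
# ... while moving it up keeps the four dots together.
print(join_lines(["blah blah....", "blah blah,"]))
```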

Contractions

Extra spaces or tabs between Words

Trailing Space at End-of-line

Line Numbers

Italic and Bold Text

Italicized text may occasionally appear with <i> inserted at the start and </i> inserted at the end of the italics. Bold text (text printed in a heavier typeface) may occasionally appear with <b> inserted before the bold text and </b> after it. Do not remove this formatting information, unless it surrounds junk that does not appear on the page. Do not add it where it does not appear. The formatters will do that later in the process.

Some reasons not to do any formatting:
  1. It may distract you from the proofreading tasks.
  2. It may confuse other proofers.
  3. The formatters would miss all the fun.
  4. Formatters in F1 may be trying to qualify for F2. To do that, they need pages to format in F1, and if the formatting has already been done, there's nothing left for them to qualify with.
In the proofreading rounds you should simply ignore the markup completely and proof the text around it. However, if there is a lot of markup on the page and it interferes with your proofing, it's okay to remove it. An easy way to do this is to use the Remove Formatting button in the bottom right corner of the proofreading interface, which looks like a crossed-out 'x'. Select all the text, then click the button.
Another way to do this is to use the Search/Replace button in the bottom left corner of the proofreading interface. Click on Search/Replace, and a window will pop up. In the "Search" box, type:
<[/ib]+>
Don't put anything in the "Replace" box, and be sure to check the Regular Expression? checkbox. Click "Replace all."
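The same pattern works outside the proofreading interface. This Python sketch applies it with the standard `re` module, for example to check what the Search/Replace step would remove; the function name is hypothetical. Note that the character class `[/ib]+` matches any run of `/`, `i` and `b`, so it removes `<i>`, `</i>`, `<b>` and `</b>` but leaves other tags such as `<sc>` alone.

```python
import re

# The pattern from the Search/Replace instructions above.
ITALIC_BOLD_TAGS = re.compile(r"<[/ib]+>")

def strip_italic_bold(text):
    """Delete <i>, </i>, <b> and </b> tags, keeping the text they enclose."""
    return ITALIC_BOLD_TAGS.sub("", text)

print(strip_italic_bold("The <i>Beagle</i> sailed in <b>1831</b>; <sc>Darwin</sc> too."))
```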
Alternatively, if you don't want to remove the formatting markup, you can view the proofed page in the "Show All Text" window (in the Enhanced Interface, it's the button with an eye on it). This window displays the formatting instead of the markup tags, so the clutter disappears while the markup itself stays in the page text. You can't make changes in this window, but it can make it easier to spot errors that are hard to see amongst the markup, such as spaces before punctuation that shouldn't be there. Then go back to the Proofing Interface to correct them.

Superscripts

Subscripts

Font Size Changes

Words in Small Capitals

Large, Ornate opening Capital letter (Drop Cap)

Accented/Non-ASCII Characters

Please proofread these using the proper accented Latin-1 characters, where possible. See Diacritical marks for ways to proof some non-Latin-1 characters.

However, we usually don't use the fraction symbols (such as ½) that are in Latin-1 because there are very few of them available. It would look inconsistent if we had a mixture of forms like ¼ and 1/3 in the same text, so we just use the long form (1/2) for all fractions.

Q: Why do we use æ but proof œ as [oe]?

A: æ is in Latin-1, the character set we use on DP. œ is not, even though it is available in HTML, in the DPCustomMono font, and in the Windows character set. The [oe] will be replaced during post-processing, but during proofreading we are limited to the characters in Latin-1.
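Whether a character is in Latin-1 can be checked directly. This Python sketch uses the standard `latin-1` (ISO-8859-1) codec; the helper names are hypothetical. It also lists the vulgar-fraction characters that Latin-1 contains, which shows how few are available, and confirms that œ exists in the Windows-1252 (`cp1252`) character set but not in Latin-1.

```python
import unicodedata

def can_encode(ch, codec="latin-1"):
    """Return True if `ch` exists in the given character set."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def latin1_fractions():
    """All characters in Latin-1 (code points 0-255) named VULGAR FRACTION."""
    return [chr(cp) for cp in range(256)
            if unicodedata.name(chr(cp), "").startswith("VULGAR FRACTION")]

for ch in "æœ½⅓«„":
    print(ch, can_encode(ch))
print("Latin-1 fractions:", latin1_fractions())
print("œ in Windows-1252:", can_encode("œ", "cp1252"))
```

Only ¼, ½ and ¾ turn up, so a fraction like 1/3 could never use a single Latin-1 symbol, which is why the long form is used for all fractions.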

Characters with Diacritical marks

Non-Latin Characters

Fractions

Dashes, Hyphens, and Minus Signs

If an em-dash appears at the start or end of a line of your OCR'd text, join it with the other line so that there are no spaces or line breaks around it. Only if the author used an em-dash to start or end the paragraph or line of poetry or dialog should you leave it at the start or end of a line.

We do this because when the text gets rewrapped during post-processing, a space will be inserted at the end of each line of text. If the text is proofed like this:
senses--touch, smell, hearing, and sight--
with which we are here concerned,
then after rewrapping it would become:
senses--touch, smell, hearing, and sight-- with which we 
are here concerned,
To make the spacing around dashes consistent in the final text, proofers need to make sure that the dashes are always "clothed"--that there is always text on both sides of the dash.
Don't clothe dashes at the beginning or end of a paragraph, or in poetry, because in those cases the line break won't be changed in the final text.
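Here is a minimal Python sketch of both points: what rewrapping does to an unclothed dash, and a simple check that flags lines starting or ending with `--` so a proofer can look at them. It assumes rewrapping joins lines with single spaces; the helper names are hypothetical, and flagged lines at the end of a paragraph or in poetry are legitimate and should be left alone.

```python
def join_lines(lines):
    """Rewrap sketch: each line break becomes one space."""
    return " ".join(line.strip() for line in lines)

def unclothed_dash_lines(lines):
    """Indexes of lines that begin or end with an em-dash ("--")."""
    return [i for i, line in enumerate(lines)
            if line.strip().startswith("--") or line.strip().endswith("--")]

as_scanned = ["senses--touch, smell, hearing, and sight--",
              "with which we are here concerned,"]
clothed = ["senses--touch, smell, hearing, and sight--with",
           "which we are here concerned,"]

print(unclothed_dash_lines(as_scanned))  # [0]: the first line needs a look
print(join_lines(as_scanned))            # contains "sight-- with": stray space
print(join_lines(clothed))               # contains "sight--with": consistent
```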


Some suggestions on how to distinguish normal em-dashes (proofed as --) from longer em-dashes (proofed as ----)

The safe way: if you have seen other em-dashes in the book but this dash looks considerably longer, it's probably a long dash.

Letter-width ways (your mileage may vary): shorter em-dashes are roughly the width of 2-3 lowercase letters, or an uppercase M, while longer em-dashes are as long as 4-5 letters or two uppercase Ms.

If there are no points of comparison and the dash is in between the lengths mentioned above, it's probably best to leave a [**note] and/or post in the forum thread.

End-of-line Hyphenation

End-of-page Hyphenation

Paragraph Spacing/Indenting

Put a blank line to separate paragraphs. You should not indent the start of paragraphs, but if all paragraphs are already indented, don't bother removing those spaces—that can be done automatically during post-processing.

If the page begins with a new paragraph, many proofreaders put a blank line at the top of the page, before the first line of text, and remove blank lines at the start of the page if the page begins mid-paragraph. This is neither required nor forbidden, and it will be checked in the formatting rounds.

The consensus seems to be not to change blank lines at the start of the page when mentoring, so as not to burden newcomers with optional changes. We have also been asked not to change blank lines at the start of the page when second-round proofing P3Qual projects if this is the only change to be made on the page. This is so that the people doing the evaluations don't have to open the page to find only an optional change, which isn't counted in the evaluation.

Multiple Columns

Blank Page

Page Headers/Page Footers

Remove page headers and page footers, but not footnotes, from the text.

During post-processing all of the pages will be joined together into one text, so if we left in the header (or footer) on each page it would disrupt the flow of the text.

Chapter Headers

Illustrations

Proofread any caption text as it is printed, preserving the line breaks. If the caption falls in the middle of a paragraph, use blank lines to set it apart from the rest of the text. If there is no caption in the original text, then the mark-up of the illustration is left to the formatters.

Most pages with an Illustration but no text will already be marked with [Blank Page]. Leave this marking as is.

If the body text wraps around the Illustration, leave the caption text wherever the OCR put it. Just make sure that it's actually present on the page somewhere, and that all the letters, punctuation, etc. are correct. Proofers don't need to worry about where it belongs; the formatters will move the caption to the correct position and mark it.
Sometimes an illustration will contain text, such as a map legend, a family tree, or a picture of a page from another book. That text content is often useful for the plaintext version of the posted e-book, even if it's replaced with an image of the illustration for the HTML version. Because of this, it's usually best to include all the text when proofing. If in doubt, ask about it in the Project Discussion, or add a note on the page to call the post-processor's attention to it.

Footnotes/Endnotes

Poetry/Epigrams

Paragraph Side-Descriptions (Sidenotes)

Some books will have short descriptions of the paragraph along the side of the text. These are called sidenotes. Proofread the sidenote text as it is printed, preserving the line breaks. Leave a blank line before and after the sidenote, so that it can be distinguished from the text around it. The OCR may place the sidenotes anywhere on the page, and may even intermingle the sidenote text with the rest of the text. Separate them so that the sidenote text is all together, but don't worry about the position of the sidenotes on the page. The formatters will move them to the correct locations.

If a sidenote is rotated and written alongside the body text, just treat it as a normal sidenote. Separate it with a blank line before and after, like normal. It's a good idea to leave a [**comment] attached, explaining the situation, or to post in the project discussion to let the PPer know about it.

Tables

Front/Back Title Page

Table of Contents

Indexes

Please retain page numbers in index pages. You don't need to align the numbers as they appear in the scan; just make sure that the numbers and punctuation match the scan and retain the line breaks.

Specific formatting of indexes will occur later in the process. The proofreader's job is to be sure that all the text and numbers are correct.

If you are concerned that a space after punctuation in an Index entry (e.g. p. 70, 71) might let the last number rewrap onto the beginning of a new line, since such numbers often come at the end of the entry, be aware that Indexes are handled differently from the rest of the text during post-processing: the PPer will manage the rewrapping carefully so that situations like this won't arise.

Plays: Actor Names/Stage Directions

Anything else that needs special handling or that you're unsure of

Start your note with a square bracket and two asterisks [** and end it with another square bracket ]. This clearly separates it from the Author's text and signals the Post-Processor to stop and carefully examine this part of the text & the matching image to address any issues.

During post-processing, the PPer will search for [** to find all proofers' notes and comments, so it's important to use that format. Single asterisks * are used in some formatting items, so you shouldn't just leave an asterisk when you're unsure of something. It's better to write out a note explaining the problem, so that everyone in later rounds understands the situation.
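The search the PPer does can be sketched in Python with one regular expression. This is a hypothetical illustration, not the actual post-processing tool; it assumes each note sits on one line and contains no `]` of its own, so a note that breaks either assumption would be cut short. A lone `*`, which has other uses, is not matched.

```python
import re

# "[**" followed by anything up to the first closing bracket.
NOTE = re.compile(r"\[\*\*[^\]]*\]")

def find_notes(text):
    """Return (line_number, note) pairs for every [**...] note in `text`."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        found.extend((number, m.group()) for m in NOTE.finditer(line))
    return found

page = ("place a note in the txet [**typo for text?]\n"
        "proof it as the scan shows and and[**duplicate word]\n"
        "* * * * *")
for number, note in find_notes(page):
    print(number, note)
```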

Previous Proofreaders' Notes/Comments

Any notes or comments put in by a previous volunteer must be left in place. You may add agreement or disagreement to the existing note but even if you know the answer, you absolutely must not remove the comment. If you have found a source which clarifies the problem, please cite it so the post-processor can also refer to it.

Sometimes you may think that there is no need for a note, but others may disagree, so it's best if all notes are left just in case. Post-processors often like to see these notes, even if the situation has been resolved, so that they know what was going on during the proofing of the text.

Printer Errors/Misspellings

Correct all of the words that the OCR has misread (scannos), but do not correct what may appear to you to be misspellings or printer errors that occur on the scanned image. Many of the older texts have words spelled differently from modern usage and we retain these older spellings, including any accented characters.

If you are unsure, place a note in the txet [**typo for text?] and ask in the Project Discussion thread. If you do make a change, include a note describing what you changed: [**Transcriber's Note: typo fixed, changed from "txet" to "text"]. Include the two asterisks ** so the post-processor will notice it.

Sometimes a word or punctuation mark may seem incorrect, but it could turn out to be what the author intended. The older the text, the more differences there are compared to modern usage, so it's best to just reproduce what's in the image.
If you think it may have been an error on the part of the printer, then you should leave a note. Some post-processors correct these errors, and some don't; some note the errors (corrected or uncorrected) and some don't. The decision about how to deal with printing errors is left for the PPer, so during proofing we just mark them to make them easy to find later on. For instance:
If you believe the original printer made an error or has been inconsistent[**spelled "inconsistant" on previous 3 pages], or something just [**missing word here?] wrong somehow, proof it as the scan shows and and[**duplicate word] add a note at the place of debate describing your concren[**typo for concern?][**missing period]