WordCheck FAQ

Contents
General Questions
Proofer Questions
Project Manager Questions
General Questions

What's up with the new spellcheck interface?

The previous spellcheck interface had a couple of areas that could have used improvement:
To address these and other areas, the spellcheck code was revamped to add the following enhancements:
The new interface has been relabeled WordCheck to reflect the broader scope of the tool.

What are 'Good', 'Bad', and 'Flagged' words?

The WordCheck interface is designed to help proofers catch differences between the page image and the page text. Often, when the OCR software misreads a word, the result is a misspelling that a spell-checker can catch. Other times the OCR software misreads a word in the image but the resulting text is still a valid word. Such words are wrong despite being valid. The team decided to use the Good/Bad nomenclature to better reflect the intent of the WordCheck interface, which is to help the proofer match the image and the text, rather than use an inaccurate label like 'misspelling'.

After WordCheck has processed words at the various levels, it comes up with a final set of Bad words to present to the user for validation or correction. These words are called Flagged words because the system has flagged them for closer inspection.

Where do Flagged words come from?

Flagged words can come from a variety of sources. These sources fall into one of three levels: World, Site, and Project.
Each level takes precedence over the level before it. Words identified as Bad at the World level (by an external spell-checker) that are valid at the Project level (project Good words) will not be flagged. This gives the person closest to the text more control over what is flagged: Project Managers can adjust the Good and Bad Words Lists at the Project level. Site administrators can manage Bad words commonly found as stealth scannos at the Site level. Spell-checkers and other external validators can be used to determine Bad words at the World level.

Can you give me a simple example of how the levels work to flag words for the proofer to correct or accept?

To help illustrate how the WordCheck system works, consider the following pseudo-project.
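For readers who think in code, the precedence rule (World, then Site, then Project) can be sketched as a small Python function. This is only an illustrative sketch, not the actual WordCheck implementation: the function name `flag_words`, its parameters, and the simple `\w+` tokenizer are all hypothetical. The spell-checker's output is passed in as a ready-made set rather than obtained by calling a real spell-checker, and the sample text and word lists are inferred from the level-by-level walkthrough in this answer. The order of the two Project-level steps (Good words removed, then Bad words added) follows the walkthrough; how a word on both lists is treated is an assumption.

```python
import re

def flag_words(text, world_bad, site_bad, project_good, project_bad):
    """Return the set of words in `text` to flag, applying each level in turn.

    All names here are hypothetical; this is a sketch, not WordCheck's code.
    """
    # Exact, case-sensitive tokens ("Lubbock" and "lubbock" stay distinct).
    words = set(re.findall(r"\w+", text))

    # World: words an external spell-checker rejected (results assumed given).
    flagged = words & world_bad

    # Site: known stealth scannos, plus words mixing letters and digits.
    flagged |= words & site_bad
    flagged |= {w for w in words
                if any(c.isalpha() for c in w) and any(c.isdigit() for c in w)}

    # Project: Good words are removed, then Bad words are added (assumed order).
    flagged -= project_good
    flagged |= words & project_bad
    return flagged

text = ("Lubbock is a town of many things: arid fiat 1and, "
        "grid-like roads, arid the infamous tumbleweed.")
result = flag_words(
    text,
    world_bad={"Lubbock", "tumbleweed"},  # assumed spell-checker output
    site_bad={"arid"},                    # site-level stealth scanno
    project_good={"Lubbock"},             # project Good Words List
    project_bad={"fiat"},                 # project Bad Words List
)
print(sorted(result))  # ['1and', 'arid', 'fiat', 'tumbleweed']
```

Using sets keeps each level to one or two operations, at the cost of losing word positions. A real interface would also need to track every occurrence of a flagged word on the page.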
Now let's consider the following OCR'd text:

Lubbock is a town of many things: arid fiat 1and, grid-like roads, arid the infamous tumbleweed.

When a proofer chooses to WordCheck the text, WordCheck evaluates the text at three levels: World, Site, and Project. At each level, words are added to or removed from the Flagged word list to determine which words will be flagged in the page text for the proofer to evaluate. Here's an example of how the "flagging" process works, level by level.

World

Current list of Flagged words entering level: none

At the World level, the text is run through an external spell-checker (such as aspell) using the dictionaries of the project's Primary and Secondary (if specified) languages. In this case the text would be checked against the English dictionary. The results depend on the particulars of the spell-checker and dictionary, but let's assume that the following words are flagged as misspelled, or Bad: Lubbock and tumbleweed.

Current list of Flagged words leaving level: Lubbock tumbleweed

Site

Current list of Flagged words entering level: Lubbock tumbleweed

At the Site level, the text is checked for possible stealth scannos, that is, OCR errors that produce valid, correctly spelled, yet incorrect words. In addition, words may be checked against a series of patterns that are frequently incorrect, such as a word containing both alphabetic and numeric characters. In the text above, the following would be flagged as Bad: arid (a common stealth scanno) and 1and (matches a suspicious pattern).

Current list of Flagged words leaving level: Lubbock tumbleweed arid 1and

Project

Current list of Flagged words entering level: Lubbock tumbleweed arid 1and

The Project level allows the Project Manager to have more control over which words are considered Good and Bad. At this level the Flagged words are compared to the project's Good Words List.
Any words found on the project's Good Words List are assumed to be correct and are removed from the page's list of Flagged words. This would result in Lubbock being removed from the Flagged words for this page. Also at this level, the text is compared against the project's Bad Words List. Any words in the text that are found on the project's Bad Words List are added to the list of Flagged words for this page. For this example, fiat is added to the list.

Current list of Flagged words leaving level: tumbleweed arid 1and fiat

The final list of Flagged words is presented to the user, who is prompted to correct or accept them. The proofer might click the Unflag All button (

Because arid is a Site-level Bad word (a stealth scanno in this case), it will not have an Unflag All button. This forces the proofer to look closely at every instance. In this example, the first instance of arid is correct, while the second is a scanno for the word and.

How does capitalization affect the word lists?

Good and Bad words are treated as exact matches and are therefore case-sensitive; for example, "Lubbock" and "lubbock" are considered separate words.

Proofer Questions

Why should I use a spell-checker? I'm a good speller!

WordCheck does much more than simply check the text for misspelled words: it helps detect scannos and other OCR errors. It is intended to flag words that are not in the dictionaries and Good Words Lists, because such words often arise where the OCR process has confused a letter or word with one that is visually similar. Because the result is often visually similar, it is easy for a proofer to skip over it, "seeing" it as the correct word. The Unflag All button exists for the common case where the word has been correctly transcribed but isn't in the dictionaries.

The spell-checker is also used to flag words that OCR commonly misidentifies.
The classic example is "arid", which is a perfectly good word but is often a scanno for "and", a much more common word. Another example is "modem", which is very uncommon in books from before the 1960s but can easily be a scanno for "modern". The checker will attempt to flag these kinds of situations for the proofer's attention, so that the proofer can consider them carefully and take proper action in each case.

Should I run WordCheck before or after I "manually" proof a page?

The answer to this question is entirely up to you. Some people like to use WordCheck as a "first pass" through the page text to catch the more obvious OCR errors and to highlight potential typographical errors and stealth scannos. Some folks believe that finding and fixing those types of errors before they proof the page in regular text-editing mode eliminates them as a possible distraction from finding other errors remaining in the page.

Other people prefer to proof the page in text-editing mode first, and then use WordCheck as a "final pass" through the page to re-check the punctuation and potential stealth scannos one more time. Some folks feel a great deal of satisfaction in finding that any word WordCheck flags is actually a "false flag", since they see it as an affirmation of their proofreading skills.

Other proofers will prefer other approaches to using WordCheck. So run WordCheck at whatever point best fits your particular page-proofreading method.

What's the "Unflag All & Suggest" button (