The acronym CAT stands for “Computer-Aided Translation”. CAT tools are software applications that support the human translator by providing databases and a wealth of useful functions. CAT tools should not be confused with pure machine translation.
Benefits of using a CAT tool
In contrast to machine translation, which requires large amounts of data and high computing power, CAT programs will run on a conventional Windows PC. A variety of data formats, such as Excel, Word, InDesign and XML, can be processed by the CAT tool, which converts them into its own native format in which the translator works. Once the translation has been completed, it is converted back into its original format. The databases used by CAT tools must first be created and populated by the user.
The source text and translation are displayed side by side in the editor. When a file is imported, the CAT tool segments the text, i.e., divides it into so-called translation units. As a rule, the period at the end of a sentence or the end of a paragraph triggers segmentation. Most CAT tools display the text sentence by sentence, but some offer paragraph-by-paragraph segmentation.
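The segmentation step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `segment` and its rule set are hypothetical, and real CAT tools use more elaborate, configurable rules (for example, to avoid breaking after abbreviations such as “e.g.”).

```python
import re

def segment(text: str, by_paragraph: bool = False) -> list[str]:
    """Split text into translation units (hypothetical, simplified rules)."""
    # A line break ends a paragraph and therefore always ends a unit.
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    if by_paragraph:
        return paragraphs
    units = []
    for paragraph in paragraphs:
        # Break after ".", "!" or "?" when whitespace follows.
        units.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
    return units

sample = "Press the button. The lamp lights up.\nSwitch off the device."
print(segment(sample))                     # sentence-by-sentence units
print(segment(sample, by_paragraph=True))  # paragraph-by-paragraph units
```

With sentence segmentation the sample yields three translation units; with paragraph segmentation it yields two.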
As with conventional text editing, the translator has formatting tools at his or her disposal, as well as grammar and spell checkers and other tools for quality assurance. These tools can verify that number formats are represented correctly, that brackets are complete, that punctuation marks are correct, that there are no superfluous spaces and, depending on the selected settings, that a segment does not exceed a maximum length.
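To make these quality assurance checks more concrete, here is a minimal Python sketch. The function `qa_check` and its messages are hypothetical and do not reproduce any particular tool; the number check, for instance, only compares groups of digits and ignores locale-specific separators.

```python
import re
from typing import Optional

def qa_check(source: str, target: str, max_length: Optional[int] = None) -> list[str]:
    """Return a list of QA warnings for one translated segment (simplified)."""
    issues = []
    # The digit groups in the source should reappear in the target.
    if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
        issues.append("numbers differ")
    # Every opening bracket needs a closing counterpart.
    for opener, closer in ("()", "[]", "{}"):
        if target.count(opener) != target.count(closer):
            issues.append(f"unbalanced {opener}{closer}")
    # Double spaces and leading or trailing spaces are superfluous.
    if "  " in target or target != target.strip():
        issues.append("superfluous spaces")
    # The final punctuation mark should be carried over.
    if source[-1:] in ".!?:" and target[-1:] != source[-1:]:
        issues.append("end punctuation differs")
    # Optional length limit, e.g. for software strings.
    if max_length is not None and len(target) > max_length:
        issues.append("segment too long")
    return issues

print(qa_check("Tighten 4 screws (M6).", "Ziehen Sie 4  Schrauben (M6 an."))
```

In this example the target segment is flagged for an unclosed bracket and a double space, while the numbers and the final period pass the check.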
What is translation memory?
The CAT tool databases are initially empty. Translation memories must be created by the user and populated with sentence pairs during translation. Termbases are separate databases in which technical terminology can be saved alongside example sentences, definitions and other information. Both types of database can be integrated into the editor. During translation, the CAT tool displays existing translation units and technical terms. The translator checks these and can incorporate them into the translation and adapt them as required.
When translating, the CAT tool extracts existing content from its “memory”, which makes the translation more consistent, even if different translators contribute to it. Within the CAT tool, the final decision lies with the translator. Even if the translation memory contains exact matches, the so-called 100% matches, a translator must check and accept the CAT tool’s suggested translation. Any hits with lower match rates, the so-called fuzzy matches, must be edited accordingly. More information about fuzzy matches and how inconsistent texts affect them can be found in other articles in this magazine.
The different types of matches
CAT tools are used to store translations produced by a translator in a database, usually as sentence pairs. Very large databases may be created in the course of a long-term collaboration with a given language service provider, so there is a good chance that a number of matches will appear in new but related projects. Think, for example, of the “boilerplate” texts often included in press releases, or the standard headings for user manuals.
When a new text is imported into the CAT tool, it is automatically compared with the database that already contains sentence pairs from completed translations, i.e., the new text is checked for its degree of conformity with existing content. If a new text contains sentences that are identical to those translated in a previous project, the match analysis function will report them as 100% matches or even context matches. Context matches are 100% matches that are already stored in the translation memory with matching adjacent sentences. Matches below 100% are divided into groups or intervals, for example 90-99%, 80-89% or 70-79%. A match below 70% (for some tools, below 50%) is normally considered a no match, since translators usually have to completely reformulate the sentence in question.
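The match analysis can be illustrated with a short Python sketch. Note the assumptions: the similarity measure is `difflib.SequenceMatcher` from the Python standard library applied to words, which merely stands in for the proprietary algorithms of real CAT tools, and the functions `match_rate`, `band` and `analyse` as well as the layout of the memory are hypothetical. A context match is simplified here to “the same sentence was stored with the same preceding sentence”.

```python
from difflib import SequenceMatcher

def match_rate(new_segment: str, stored_segment: str) -> int:
    """Word-level similarity as a whole percentage (stand-in metric)."""
    ratio = SequenceMatcher(None, new_segment.split(), stored_segment.split()).ratio()
    return int(ratio * 100)

def band(rate: int, in_context: bool, no_match_below: int = 70) -> str:
    """Map a match rate to the interval it is reported in."""
    if rate == 100:
        return "context match" if in_context else "100% match"
    if rate < no_match_below:
        return "no match"
    low = rate // 10 * 10
    return f"{low}-{low + 9}%"

def analyse(segments: list[str], memory: list[tuple]) -> list[str]:
    """memory holds (preceding_source, source) pairs from earlier jobs."""
    report = []
    previous = None
    for segment in segments:
        best = max((match_rate(segment, stored) for _, stored in memory), default=0)
        in_context = (previous, segment) in memory
        report.append(band(best, in_context))
        previous = segment
    return report

memory = [
    (None, "Read the safety instructions."),
    ("Read the safety instructions.", "Switch off the device before cleaning."),
]
new_text = [
    "Read the safety instructions.",
    "Switch off the device before cleaning.",
    "Switch off the machine before cleaning.",
    "Contact our service team for spare parts.",
    "Read the safety instructions.",
]
for segment, result in zip(new_text, analyse(new_text, memory)):
    print(f"{result:>14}  {segment}")
```

The first two sentences are reported as context matches, the third as a fuzzy match because one word differs, the fourth as a no match, and the repeated first sentence as a plain 100% match because its preceding sentence differs from the stored one.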
Cost savings through matches
Language service providers usually grant their customers discounts, depending on the match rates for the text in question. A long-term collaboration with a specific service provider is therefore worthwhile, because the databases grow over time, which often results in higher match rates. Nevertheless, 100% matches still have to be remunerated, because the translator must check and accept them.
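How match rates translate into a price can be shown with a small calculation in Python. All figures are invented for illustration only: the percentages in `PRICE_SHARE`, the word counts and the per-word price are hypothetical and do not reflect the rates of any actual language service provider.

```python
# Hypothetical share of the full per-word price charged for each interval.
PRICE_SHARE = {
    "100% match": 30,
    "90-99%": 50,
    "80-89%": 70,
    "70-79%": 90,
    "no match": 100,
}

def quote_in_cents(word_counts: dict[str, int], cents_per_word: int) -> int:
    """Total price in cents for an analysis result, weighted by match interval."""
    total = 0
    for interval, words in word_counts.items():
        total += words * cents_per_word * PRICE_SHARE[interval]
    return total // 100  # shares are percentages

# Hypothetical analysis result for a 1,000-word text.
analysis = {"100% match": 400, "80-89%": 200, "no match": 400}
discounted = quote_in_cents(analysis, cents_per_word=20)
full_price = sum(analysis.values()) * 20
print(f"Full price: {full_price / 100:.2f}")
print(f"With match discounts: {discounted / 100:.2f}")
```

Under these assumed figures, the 100% matches are still charged, but at a reduced share of the full price.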
What causes inconsistencies?
CAT tools use different data sets. If the text is imported into a tool in which the database of previously translated content (the translation memory) is very large, high match rates become more likely. However, a CAT tool in which the databases are empty will produce no matches. These differences can also occur between an internal user, who has direct access to all data on a server, and an external translator, who has received only an incomplete excerpt from the database in a data package.
But even with the same data set, differences can occur because different CAT tools react differently to formatting, line breaks, paragraph marks, embedded objects, links, tags, and so on. The result also depends on the fine-tuning of the respective CAT tool. CAT tools can be set so that there are deductions from matches, for example, if hits originate from the database of another subject area or another type of text. Such deductions are made deliberately to ensure that these hits are carefully checked by the translator.
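Such a deduction can be sketched as follows in Python. The function `apply_penalty` and the default deduction of two percentage points are hypothetical; in practice the value is whatever has been configured in the respective tool.

```python
def apply_penalty(rate: int, same_subject_area: bool, penalty: int = 2) -> int:
    """Deduct points from a hit that comes from another subject area (hypothetical)."""
    if same_subject_area:
        return rate
    return max(rate - penalty, 0)

# An exact hit from another subject area is no longer shown as a 100% match,
# so the translator has to check it like a fuzzy match.
print(apply_penalty(100, same_subject_area=True))
print(apply_penalty(100, same_subject_area=False))
```

The same stored sentence thus appears as a 100% match in one setup and as a high fuzzy match in another, even though the underlying data is identical.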
In addition, the file format itself can determine how the analysis result is displayed. Formats such as HTML, RTF, or IDML files can contain tags, macros, or formatting that differ from those in the existing database, even if an identical translation job has already been completed, so 100% matches can quickly become fuzzy matches. Different segmentation, for example due to different line breaks in a PowerPoint slide or IDML document, can also reduce the match rate.
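The effect of differing tags can be demonstrated with a small Python sketch. As an assumption, similarity is measured with `difflib.SequenceMatcher` from the Python standard library, with each tag counted as one token; real CAT tools use their own algorithms and may weight tags differently. The names `tokens`, `rate` and `TAG` are hypothetical.

```python
import re
from difflib import SequenceMatcher

TAG = re.compile(r"<[^>]+>")

def tokens(segment: str) -> list[str]:
    """Split a segment into words, treating each tag as its own token."""
    return re.findall(r"<[^>]+>|[^\s<]+", segment)

def rate(new_segment: str, stored_segment: str) -> int:
    """Token-level similarity as a whole percentage (stand-in metric)."""
    ratio = SequenceMatcher(None, tokens(new_segment), tokens(stored_segment)).ratio()
    return int(ratio * 100)

stored = "Press the <b>Start</b> button."
new = "Press the <i>Start</i> button."

# Identical wording, but different formatting tags.
print(rate(new, stored))
# With the tags stripped, the two segments are identical.
print(rate(TAG.sub("", new), TAG.sub("", stored)))
```

Although the wording is identical, the first comparison falls below 100% because the formatting tags differ, while the comparison without tags is an exact match.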