doc/kbabel/dictionaries.docbook


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517

<!-- <?xml version="1.0" ?>
<!DOCTYPE chapter PUBLIC "-//KDE//DTD DocBook XML V4.2-Based Variant V1.1//EN" "dtd/kdex.dtd"> -->
<!-- Uncomment the previous two lines to validate this document -->
<!-- standalone.  Be sure to recomment them before attempting to -->
<!-- process index.docbook -->

<chapter id="dictionaries">

<chapterinfo>
<!-- Fill in this section if this document has a different author -->
<authorgroup>
<author>
<personname><firstname></firstname><surname></surname></personname>
</author>
</authorgroup>

<!-- TRANS:ROLES_OF_TRANSLATORS -->
</chapterinfo>

<title>Dictionaries</title>

<para>&kbabel; has 3 modes which can be used to search translated
<acronym>PO</acronym> message strings:</para>

<itemizedlist>
     <listitem>
       <para>Searching translation, using a translation database
       </para>
     </listitem>
      <listitem>
         <para>Rough translation
	 </para>
      </listitem>
      <listitem>
         <para>&kbabeldict;
	 </para>
      </listitem>
</itemizedlist>

<sect1 id="database">
<!-- FIXME: settings -->
<title>Translation database</title>

<!-- ### TODO: only *one* file? Seems more to be four... -->
<para>Translation database allows you to store translations in a
database based on Berkeley Database IV, &ie; it is stored in a binary
file on your disk. The database guarantees fast searching in a large
number of translations.</para>

<para>This mode is the one best integrated with &kbabel;. Besides
searching and rough translation it also supports the following
features:</para>

<itemizedlist>
<listitem>
<para>Every new translation typed in the &kbabel; editor can be
automatically stored in the database.</para>
</listitem>
<listitem>
<para>This database can be used for <quote>diff</quote>-ing
<acronym>msgid</acronym>.</para>
</listitem>
</itemizedlist>

<para>Of course, the more translations are stored in the database, the
more productive you can be. To fill the database, you can use the
<guilabel>Database</guilabel> tab in the preferences dialog or you can
turn on automatic addition of every translated messages on the same
tab.</para>

<sect2 id="database-settings">
<title>Settings</title>
<para>
You can configure this searching mode and how it should be used by selecting
<menuchoice>
  <guisubmenu>Settings</guisubmenu>
  <guisubmenu>Configure Dictionary</guisubmenu>
  <guimenuitem>Translation Database</guimenuitem>
</menuchoice>
in &kbabel; menu.
</para>
<para>
The <guilabel>Generic</guilabel> tab contains general settings for searching in the
database.
</para>
<variablelist>
  <varlistentry>
    <term><guilabel>Search in whole database (slow)</guilabel></term>
    <listitem>
    <para>
    Do not use <quote>good keys</quote>, search in the whole database.
    This is slow, but will return the most precise results.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Search in list of "good keys" (best)</guilabel></term>
    <listitem>
    <para>
    Use <quote>good keys</quote> strategy. This option will give you the
    best tradeoff between speed and exact matching.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Return the list of "good keys" (fast)</guilabel></term>
    <listitem>
    <para>
    Just return <quote>good keys</quote>, do not try to eliminate any more
    texts. This is the fastest provided method, but can lead to a quite large
    number of imprecise matches.
    </para>
    </listitem>
  </varlistentry>
<varlistentry>
    <term><guibutton>Case sensitive</guibutton></term>
    <listitem>
    <para>
    Distinguish case of letters when searching the text.
    </para>
    </listitem>
  </varlistentry>
<varlistentry>
    <term><guibutton>Normalize white space</guibutton></term>
    <listitem>
    <para>
    Skip unnecessary white space in the texts, so the searching will ignore small
    differences of white space, &eg; number of spaces in the text.
    </para>
    </listitem>
  </varlistentry>
<varlistentry>
    <term><guibutton>Remove context comment</guibutton></term>
    <listitem>
    <para>
    Do not include context comments in search. You will want this to be turned on.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Character to be ignored</guilabel></term>
    <listitem>
    <para>Here you can enter characters, which should be ignored while searching.
    Typical example would be accelerator mark, &ie; &amp; for &kde; texts.
    </para>
    </listitem>
  </varlistentry>
  </variablelist>
<para>
The <guilabel>Search</guilabel> tab contains finer specification for searching the text.
You can define how to search and also allows to use another special way of searching
called <emphasis><guilabel>Word substitution</guilabel></emphasis>. By substituting
one or two words the approximate text can be found as well. For example, assume you
are trying to find the text <userinput>My name is Andrea</userinput>.
</para>
<variablelist>
  <varlistentry>
    <term><guilabel>Equal</guilabel></term>
    <listitem>
    <para>
    Text from database matches if it is the same as the searched string. In our example it can
    be <emphasis>My name is &amp;Andrea</emphasis> (if &amp; is set as ignored character
    in <guilabel>Characters to be ignored</guilabel> on <guilabel>Generic</guilabel> tab).
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Query is contained</guilabel></term>
    <listitem>
    <para>
    Text from database matches if the searched string is contained in it. For our example it can
    be <emphasis>My name is Andrea, you know?</emphasis>.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Query contains</guilabel></term>
    <listitem>
    <para>
    Text from database matches if the searched string contains it. For our example it can
    be <emphasis>Andrea</emphasis>. You can use this for enumerating the possibilities to
    be found.
    </para>
    </listitem>
  </varlistentry>
<varlistentry>
    <term><guibutton>Regular Expression</guibutton></term>
    <listitem>
    <para>
    Consider searched text as a regular expression. This is mainly used for
    &kbabeldict;. You can hardly expect regular expressions in PO files.
    </para>
    </listitem>
  </varlistentry>
<varlistentry>
    <term><guibutton>Use one word substitution</guibutton></term>
    <listitem>
    <para>
    If the query text contains less words than specified below, it also
    tries to replace one of the words in the query. In our example it will
    find <emphasis>Your name is Andrea</emphasis> as well.
    </para>
    </listitem>
  </varlistentry>
<varlistentry>
    <term><guibutton>Max number of words in the query</guibutton></term>
    <listitem>
    <para>
    Maximal number of words in a query to enable one word substitution.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Local characters for regular expressions</guilabel></term>
    <listitem>
    <para>
    Characters to be considered part of regular expressions.
    </para>
    </listitem>
  </varlistentry>
  </variablelist>
<note>
<para>
Two-word substitution is not implemented yet.
</para>
</note>
</sect2>

<sect2 id="database-fill">
<title>Filling the database</title>
<para>
The <guilabel>Database</guilabel> tab allows to define where is the database stored on
disk (<guilabel>Database folder</guilabel>) and if it should be used for automatic
storing of the new translations (<guibutton>Auto add entry to database</guibutton>).
In this case you should specify the author of the new translation in <guilabel>Auto added
entry author</guilabel>.
</para>
<para>
The rest of the tab allows you to fill the database from PO files that already exist. Use one
of the buttons in the middle of the dialog box. The progress of the file load will be
shown by progress bars below the buttons. The <guilabel>Repeated strings</guilabel>
button should be used in the special case where one translated string is repeated many
times, to prevent storing unnecessary copies. Here you can limit the stored strings.
</para>
<screenshot>
<screeninfo>Filling the database</screeninfo>
<mediaobject>
<imageobject>
<imagedata fileref="dbcan.png" format="PNG"/>
</imageobject>
<textobject><phrase>Filling the database by existing PO-files</phrase></textobject>
</mediaobject>
</screenshot></sect2>

<sect2 id="database-goodkeys">
<title>Defining good keys</title>
<para>
On the <guilabel>Good keys</guilabel> tab are the thresholds to specify how to fill 
the list of good keys.
<guilabel>Minimum number of query words in the key (%)</guilabel> specifies exactly that.
Text will need to contain only this per cent of the words to qualify as good key. Opposite can
be specified via <guilabel>Minimum number of words of the key also in the query (%)</guilabel>.
The length of the words can be set by <guilabel>Max length</guilabel> spinbox.
</para>
<para>Searched text typically contains number of generic words, &eg; articles. You can
eliminate the words based on the frequency. You can discard them by 
<guilabel>Discard words more frequent than</guilabel> or consider as always present by
<guilabel>frequent words are considered as in every key</guilabel>. This way the
frequent words will be almost invisible for queries.
</para>
</sect2>
</sect1>


<sect1 id="auxiliary">
<title>Auxiliary PO file</title>

<para>This searching mode is based on matching the same original
English string (the msgid) translated in some other language in an
auxillary <acronym>PO</acronym> file.  It is very common for Romance
<!-- ### TODO: is "Anglo-Saxon" not too Romance or too technical for an English text? -->
languages to have similar words, similarly for Anglo-Saxon and
Slavic ones.</para>

<para>
For example, say I wanted to translate the word
<quote>on</quote>, from <filename>tdelibs.po</filename>, into Romanian
but have no clue.  I look in the same file for French and find
<foreignphrase lang="fr">actif</foreignphrase>, and in the Spanish one find
<foreignphrase lang="es">activado</foreignphrase>.  So, I conclude that the best one in Romanian
will be <foreignphrase lang="ro">active</foreignphrase>.
(Of course, in English instead of <quote>on</quote> the word could have been
<quote>active</quote> or <quote>activated</quote>,
which would have made the translation process easier.)
&kbabel; automates this task. Currently you can define only one auxiliary file to search.
</para>

<sect2 id="auxiliary-settings">
<title>Settings</title>
<para>
You can configure this searching mode by selecting
<menuchoice>
  <guisubmenu>Settings</guisubmenu>
  <guisubmenu>Configure Dictionary</guisubmenu>
  <guimenuitem>PO Auxiliary</guimenuitem>
</menuchoice>
from the &kbabel; menu.</para>

<para>In the <guilabel>Configure Dictionary PO Auxiliary</guilabel>
dialog you can select the path to the auxiliary <acronym>PO</acronym>
file.  To automate <acronym>PO</acronym>-file switching when you
change current edited file there are many variables delimited by
<literal>@</literal> char that are replaced by appropriate
values:</para>

<variablelist>
  <varlistentry>
    <term>@PACKAGE@</term>
    <listitem><para>
      The name of application or package currently being translated.
      For example, it can expand to kbabel, tdelibs, konqueror
      and so on.
    </para></listitem>
  </varlistentry>
  <varlistentry>
    <term>@LANG@</term>
    <listitem><para>
      The language code.
      For example can expand to: de, ro, fr etc.
    </para></listitem>
  </varlistentry>
  <varlistentry>
    <term>@DIRn@</term>
    <listitem><para>
      where <quote>n</quote> is a positive integer.  This expands to
      the <quote>n</quote>-th folder counted from the filename (right to 
      left).
    </para></listitem>
  </varlistentry>
</variablelist>

<para>The edit line displays the actual path to the auxiliary
<acronym>PO</acronym> file. While it is best to use the
provided variables in a path it is possible to choose an absolute,
real path to an existing <acronym>PO</acronym> file.  Let's take an
example.</para>

<para>I'm Romanian and I have some knowledge about French language and
I work on &kde; translation.</para>

<!-- ### TODO: check URL, especially the kde-l10n part -->
<para>First step is to download a very fresh
<filename>kde-l10n-fr.tar.bz2</filename> from the <ulink
url="ftp://ftp.kde.org/pub/kde/snapshots/kde-l10n">&kde; &FTP;
site</ulink> or to use the <acronym>CVS</acronym> system to put on my
hard-disk a French translation tree. I do this into
<filename>/home/clau/cvs-cvs.kde.org/kde-l10n/fr</filename>.</para>

<para>My <acronym>PO</acronym> sources folder is in
<filename>/home/clau/cvs-cvs.kde.org/kde-l10n/ro</filename>.  Do not
forget to select <guilabel>PO Auxiliary</guilabel> as the default
dictionary and check <guilabel>Automatically start search</guilabel>
on the <guilabel>Search</guilabel> tab from &kbabel;'s
<guilabel>Preferences</guilabel> dialog.</para>

</sect2>
</sect1>

<sect1 id="compendium">
<!-- FIXME: examples -->
<title>PO compendium</title>

<para>A compendium is a file containing a collection of all
translation messages (pairs of <acronym>msgid</acronym> and
<acronym>msgstr</acronym>) in a project, &eg; in &kde;. Typically,
compendium for a given language is created by concatenating all
<acronym>PO</acronym> files of the project for the
language. Compendium can contain translated, untranslated and fuzzy
messages. Untranslated ones are ignored by this module.  </para>

<para>Similarly to Auxiliary <acronym>PO</acronym>, this searching
mode is based on matching the <quote>same</quote> original string
(<acronym>msgid</acronym>) in a compendium. Currently you can define
only one compendium file to search.  </para>

<para>This mode is very useful if you are not using the translation
database and you want to achieve consistent translation with other
translations. By the way, compendium files are much easier to share
with other translators and even other translation projects because
they can be generated for them as well.  </para>

<sect2 id="compendium-settings">
<title>Settings</title>

<para>
You can configure this searching mode by selecting
<menuchoice>
  <guisubmenu>Settings</guisubmenu>
  <guisubmenu>Configure Dictionary</guisubmenu>
  <guimenuitem>PO Compendium</guimenuitem>
</menuchoice>
in &kbabel;'s menu.
</para>

<para>In <guilabel>Configure Dictionary PO Compendium</guilabel>
dialog you can select the path to a compendium file. To automate
compendium file switching when you change the translation language,
there is a variable delimited by <literal>@</literal> char which si
replaced by appropriate value:</para>

<variablelist>
  <varlistentry>
    <term>@LANG@</term>
    <listitem><para>
      The language code.
      For example can expand to: de, ro, fr etc.
    </para></listitem>
  </varlistentry>
</variablelist>

<para>In the edit line is displayed the actual path to compendium
<acronym>PO</acronym> file. While you had best use provided variables in
path, it's possible to choose an absolute, real path to an existing
<acronym>PO</acronym> file to be used as a compendium.</para>

<!-- ### TODO: check URL, especially the kde-l10n part -->
<para>A very fresh compendium for &kde; translation into &eg; French
you can download <filename>fr.messages.bz2</filename> from the <ulink
url="ftp://ftp.kde.org/pub/kde/snapshots/kde-l10n">&kde; &FTP;
site</ulink>.  </para>

<para>You can define how to search in the compendium using options
below the path. They are divided into two groups: text-matching
options, where you can specify how the text is compared and whether to
ignore fuzzy translations, and message-matching options, which
determine if the translation from compendium should be a substring of
searching message or vice versa.</para>

<variablelist>
  <varlistentry>
    <term><guilabel>Case sensitive</guilabel></term>
    <listitem>
    <para>
    If the matching of message in compendium should distinguish between uppercase and lowercase letters.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Ignore fuzzy string</guilabel></term>
    <listitem>
    <para>
    If the fuzzy messages in the compendium should be ignored for searching. The compendium can contain fuzzy messages, since it is typically created by concatenating the <acronym>PO</acronym> files of the project which can include fuzzy messages. Untranslated ones are ignored always (You can't search for translation in untranslated messages, right?)</para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term><guilabel>Only whole words</guilabel></term>
    <listitem>
    <para>
    If the matching text should start and end at the boundaries of words.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>A text matches if it <guilabel>is equal to search text</guilabel></term> 
    <listitem>
    <para>
    A text in compendium matches the search text only if it is exactly the same (of course using the options above).
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>A text matches if it <guilabel>is similar to search text</guilabel></term>
    <listitem>
    <para>
    A text in compendium matches the search text only if it is <quote>similar</quote>. Both texts are compared by short chunks of letters (<quote>3-grams</quote>) and at least half of the chunks has to be same.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>A text matches if it <guilabel>contains search text</guilabel></term>
    <listitem>
    <para>
    A text in compendium matches the search text if it contains the search text.</para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>A text matches if it <guilabel>is contained in search text</guilabel></term>
    <listitem>
    <para>
    A text in compendium matches the search text if it is contained the search text.
    </para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>A text matches if it <guilabel>contains a word of search text</guilabel></term>
    <listitem>
    <para>
    The texts are divided to words and a text in compendium matches the search text only if it contains some word from the search text.
    </para>
    </listitem>
  </varlistentry>
</variablelist>
</sect2>
</sect1>
</chapter>
<!--
Local Variables:
mode: xml
sgml-minimize-attributes:nil
sgml-general-insert-case:lower
sgml-indent-step:0
sgml-indent-data:nil
End:

vim:tabstop=2:shiftwidth=2:expandtab 
-->