- A named query result can be "translated" into an aligned corpus, which
allows more flexible display of the aligned regions, access to metadata, etc.
- Consider the following example:
> Zeit = [lemma = "Zeit"];
- The NQR Zeit now contains all occurrences of the German word for time in the German part of Europarl.
The following command "translates" the NQR to the English part of Europarl, i.e. it replaces each match by the complete aligned region in the target corpus (as would be displayed with show +europarl-en;).
> Time = from Zeit to EUROPARL-EN;
- This creates a new NQR EUROPARL-EN:Time containing the aligned regions. You can now e.g. tabulate metadata:
> tabulate EUROPARL-EN:Time match text_date;
- Some important notes are in order:
- matching ranges that are not aligned to the target corpus are silently discarded; you cannot expect the new NQR to contain the same number of matches as the original NQR
- if there are multiple matches in the same alignment bead, they will not be collapsed in the target corpus; i.e. the new NQR will contain several identical ranges
- in order to collate source matches with the aligned regions, make sure to discard unaligned hits from the original NQR first; the only practicable solution at the moment seems to be to specify an empty alignment constraint in the query:
> Zeit = [lemma = "Zeit"] :EUROPARL-EN ;
or post-hoc as a subquery filter:
> ZeitAligned = <match>  :EUROPARL-EN  !;
- the somewhat arcane syntax of the new command avoids introduction of a new reserved keyword such as translate
- while it looks similar to a corpus query or set operation, the assignment to a new NQR is mandatory (otherwise the parser won't accept the syntax)
- note that the new NQR isn't fully qualified with a corpus name; the name of the target corpus is implied and added automatically with the assignment
- don't cat the translated query directly (cat EUROPARL-EN:Time;), as this will mess up your context descriptor due to a long-standing bug in CQP; this is also the reason why the assignment to an NQR is mandatory
- it is safe to apply dump, tabulate and similar operations, though
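The first three notes above can be illustrated with a small Python model. This is only a sketch of the described behaviour, not CQP's implementation: the names `find_bead` and `translate` are hypothetical, the corpus positions are invented, and locating a bead by the start position of the match is an assumption made for simplicity.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]    # inclusive (start, end) corpus positions
Bead = Tuple[Span, Span]  # (source region, aligned target region)

def find_bead(beads: List[Bead], span: Span) -> Optional[Bead]:
    """Return the bead whose source region contains the match start, if any."""
    start, _ = span
    for bead in beads:
        (src_start, src_end), _ = bead
        if src_start <= start <= src_end:
            return bead
    return None

def translate(matches: List[Span], beads: List[Bead]) -> List[Span]:
    """Replace each match by its aligned target region; drop unaligned matches."""
    translated = []
    for m in matches:
        bead = find_bead(beads, m)
        if bead is not None:
            translated.append(bead[1])  # identical regions are kept, not collapsed
    return translated

# Toy data: two beads, with an unaligned gap at source positions 10-14
beads = [((0, 9), (0, 11)), ((15, 24), (12, 20))]
matches = [(3, 3), (7, 7), (12, 12), (18, 18)]

result = translate(matches, beads)
print(result)  # [(0, 11), (0, 11), (12, 20)]

# To collate source matches with target regions, discard unaligned hits first
aligned = [m for m in matches if find_bead(beads, m) is not None]
pairs = list(zip(aligned, result))
```

Four source matches yield only three target regions, two of them identical. Once the unaligned hit at position 12 has been filtered out, source matches and target regions line up one to one.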
A random note
Note from Stefan on using a-atts:
A brief note on using alignment information in CQP, for the VMGERMAN-VMENGLISH alignment.
The following commands are typed in a CQP session:
set Context 1 s;
# sentence alignment makes most sense if you're also viewing sentence context
"Bahn.+";
# some CQP query, here German words starting with "Bahn-"
show +vmenglish;
# activate display of sentence alignment
cat;
# redisplays query result, now giving aligned sentence for every query match
"Bahn.+" :VMENGLISH "rail.*";
# only those matches where aligned sentence contains "rail" or a similar word
"Bahn.+" :VMENGLISH ! "rail.*";
# only those matches where aligned sentence does NOT contain "rail"
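
The effect of the two alignment constraints can be mimicked in a few lines of Python. This is a toy model under stated assumptions, not how CQP evaluates the query: `filter_by_alignment` is a hypothetical helper, the sentence pairs are invented rather than taken from the corpus, and tokens are obtained by splitting on whitespace. Patterns are matched against whole tokens, as CQP does for regular expressions.

```python
import re
from typing import List, Tuple

def filter_by_alignment(pairs: List[Tuple[str, str]], source_pattern: str,
                        target_pattern: str, negate: bool = False) -> List[str]:
    """Return source tokens matching source_pattern whose aligned sentence
    does (or, with negate=True, does not) contain a token matching target_pattern."""
    src_re = re.compile(source_pattern)
    tgt_re = re.compile(target_pattern)
    hits = []
    for src_sentence, tgt_sentence in pairs:
        # The constraint is satisfied if any token of the aligned sentence matches
        target_ok = any(tgt_re.fullmatch(tok) for tok in tgt_sentence.split())
        if target_ok == negate:
            continue  # sentence pair fails the (possibly negated) constraint
        hits.extend(tok for tok in src_sentence.split() if src_re.fullmatch(tok))
    return hits

# Invented sentence pairs, for illustration only
pairs = [
    ("ich fahre zum Bahnhof", "I am going to the railway station"),
    ("die Bahnfahrt dauert lange", "the train journey takes a long time"),
    ("wir fliegen morgen", "we are flying tomorrow"),
]

with_rail = filter_by_alignment(pairs, "Bahn.+", "rail.*")
without_rail = filter_by_alignment(pairs, "Bahn.+", "rail.*", negate=True)
print(with_rail)     # ['Bahnhof']
print(without_rail)  # ['Bahnfahrt']
```

The third pair satisfies the negated constraint but contributes nothing, because its source sentence contains no match for "Bahn.+".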