A generic text template for subdomains of a larger domain
Motivation
The painting verbalization grammar, released in D8.2, is built for one ontology, and its text building functions use words and expressions that are fixed for paintings. To verbalize the ontology of a different museum, for example a war museum, the grammar could be copy-pasted and the relevant parts modified. A better solution, however, would use abstraction rather than repeated code. The motivation and the goal for this work are stated in D4.2:
"In analogy with the resource grammar API, we can envisage extending the museum case into a reusable library of verbalization/textualization patterns. In order to maintain simplicity, the GF templates can be made more abstract. Instead of hard-coding ontology specific description words (paint, display, size units), we generalize them as parameters chosen according to the domain and the ontology in question. UHEL has conducted some tests in this direction, generalizing the museum case patterns to more generic object description patterns."
Realistically, the differences are likely to be subtler: they would be between different kinds of art objects within one museum rather than between different kinds of museums. For example, paintings and sculptures have enough in common to share the same discourse patterns, and differ only in word choices such as *painter* vs. *sculptor*. Paintings and tanks, by contrast, are different enough that the benefit of finding common description patterns would probably be small.
Grammar design
The structure of the GF grammar is shown below. The code is in `TextTemplate.zip`, with small differences that are explained under GF questions.
```
instance               interface              instance
LexArtEng  - - - - -   LexEng    - - - - -    LexWarEng
    |                     |                       |
    |                 incomplete                  |
resource               resource               resource
TextArtEng - - - - -   TextEng   - - - - -    TextWarEng


abstract Museum = {
  cat <generic categories>
  fun <descriptions>
}

abstract ArtMuseum = Museum ** {
  fun <art museum specific>
}

abstract WarMuseum = Museum ** {
  fun <war museum specific>
}

incomplete concrete MuseumEng = open TextEng in {
  lincat <generic categories linearized>
  lin    <using textgen opers from TextEng>
}

concrete ArtMuseum = MuseumEng with (TextEng=TextArtEng) ** {
  lin <art museum specific>
}

concrete WarMuseum = MuseumEng with (TextEng=TextWarEng) ** {
  lin <war museum specific>
}
```
Ignoring the first six modules, the grammar consists of a `Domain` (here `Museum`) that is extended by subdomains, `SubDomain = Domain ** {...}` (here `ArtMuseum` and `WarMuseum`).
New concepts should be added in `SubDomainL` with user-friendly functions in the style `mkConcept "str"`. The morphological functions are hidden in `DomainL` as follows:
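```
-- DomainL: the categories are RGL noun phrases,
-- hidden behind smart constructors that take a plain string
lincat
  Item   = NP ;
  Author = NP ;

oper
  mkItem   : Str -> Item   = \n -> mkNP (mkN n) ;
  mkAuthor : Str -> Author = \a -> mkNP (mkPN a) ;
```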
They are then used in `SubDomainL`:
```
lin
  Mona_Lisa        = mkItem "Mona Lisa" ;
  PortraitPainting = mkItemType "portrait" ;
  Rembrandt        = mkAuthor "Rembrandt" ;
  Ateneum          = mkMuseum "Ateneum" ;
  Wood             = mkMaterial "wood" ;
```
This in itself follows a recommended design principle (see e.g. D2.3, 5.2.2): a base grammar plus domain extensions. The content of `SubDomain` should be mostly lexical. The idea is that the textualization patterns are the same for all subdomains, apart from some lexical choices, so they can all be linearized in the common part.
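The sketch below makes the domain/subdomain split concrete on the abstract side. It is not taken from `TextTemplate.zip`. The signatures of `IType` and `Authorship` are inferred from the `gt` output shown further down, and the result category `GenText` is borrowed from the `TextEng` variant discussed under GF questions. Both should be treated as assumptions. As GF requires, each module would go in its own file.

```gf
-- Museum.gf (sketch): generic categories and description functions
abstract Museum = {
  flags startcat = GenText ;
  cat
    Item ; ItemType ; Author ; GenText ;
  fun
    -- "<item> is a <type> ."
    IType      : Item -> ItemType -> GenText ;
    -- "<item> was <made> by <author> ."
    Authorship : Item -> Author -> GenText ;
}

-- ArtMuseum.gf (sketch): the subdomain only adds lexical constants
abstract ArtMuseum = Museum ** {
  fun
    Mona_Lisa        : Item ;
    PortraitPainting : ItemType ;
    Rembrandt        : Author ;
}
```

A war museum would be another small module of the same shape, `WarMuseum = Museum ** {...}`, declaring constants such as `Pasi : Item`.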
The first six modules handle the lexical variation in the textualization patterns. The abstract text descriptions in `Domain` are linearized using `TextEng`, an incomplete resource module with parameterized text generation functions. For example, the following function in `TextEng` describes the author of an item:
```
incomplete resource TextEng = open SyntaxEng, LexEng in {
  oper
    -- "<item> was <made> by <author> .", where the verb comes from LexEng
    AuthorText : NP -> NP -> Text = \item,author ->
      mkText (mkS pastTense
        (mkCl item (mkVP (passiveVP make_V2) (mkAdv by8agent_Prep author)))) ;
}
```
The verb `make_V2` comes from the interface `LexEng`, and it can have different values in `LexArtEng` and `LexWarEng`, for example *paint* and *manufacture*, respectively.
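The following minimal sketch shows the interface, its two instances and the resulting functor instantiations, in the style of the functor examples in the GF tutorial. Only `make_V2` is shown. A real `LexEng` would declare every word that differs between subdomains. The module bodies are illustrative and are not copied from the attachment.

```gf
-- LexEng.gf: lexical parameters shared by all subdomains
interface LexEng = open SyntaxEng in {
  oper
    make_V2 : V2 ;
}

-- LexArtEng.gf: word choices for the art museum
instance LexArtEng of LexEng = open SyntaxEng, ParadigmsEng in {
  oper
    make_V2 = mkV2 (mkV "paint") ;
}

-- LexWarEng.gf: word choices for the war museum
instance LexWarEng of LexEng = open SyntaxEng, ParadigmsEng in {
  oper
    make_V2 = mkV2 (mkV "manufacture") ;
}

-- TextArtEng.gf and TextWarEng.gf (one module per file):
-- the incomplete TextEng instantiated with each lexicon
resource TextArtEng = TextEng with (LexEng = LexArtEng) ;
resource TextWarEng = TextEng with (LexEng = LexWarEng) ;
```

`SyntaxEng` is already a concrete instance, so `LexEng` is the only interface that `TextEng` leaves open. It is therefore the only one that has to be supplied when instantiating.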
The result is shown below. The function `Authorship` is linearized differently in `WarMuseum` and `ArtMuseum`, even though its concrete syntax is the same in both: `Authorship = AuthorText ;`.
```
WarMuseum> gt -tr | l
IType Pasi BattleTank
Authorship Pasi FinnishArmy
Pasi is a tank .
Pasi was manufactured by Puolustusvoimat .

ArtMuseum> gt -tr | l
IType Mona_Lisa PortraitPainting
Authorship Mona_Lisa Rembrandt
Mona Lisa is a portrait .
Mona Lisa was painted by Rembrandt .
```
GF questions
For reasons that are still unknown, making a concrete syntax by extending an incomplete concrete syntax does not work. So instead of the intended design,

```
concrete ArtMuseum = MuseumEng
  with (TextEng=TextArtEng) ** {
  lin
    <art museum specific>
}
```

the code in the attached file is structured as follows:

```
concrete ArtMuseum = MuseumEng ** open TextArtEng in {
  lin
    <art museum specific>
    Authorship = AuthorText ;
}
```
This means that instead of writing `Authorship = AuthorText ;` only once in `MuseumEng`, the line is repeated in each `MuseumSubdomainEng`. This is not a big problem for abstraction, because the function `AuthorText` is still defined in only one place.
The second design decision concerns the types of the arguments in the text patterns. The functions in `TextEng` could operate on GF resource grammar (RGL) types, such as `CN`, `NP` and `Adv`. The downside is that the functions look messy, and mistakes are easy to make when modifying them. For instance, a slightly longer description has the following type:
```
DescriptionText : NP -> CN -> NP -> NP -> Adv -> NP -> Text =
  \item,itype,author,museum,year,material -> ...
```
On the other hand, using RGL types makes the functions usable with any grammar, although this is not a very realistic concern. Another option is to connect `TextEng` to the categories defined in `MuseumEng`, as follows:
```
incomplete resource TextEng = MuseumEng [Item,Author,ItemType,GenText] **
  open SyntaxEng, LexTemplate in {
  oper
    AuthorText : Item -> Author -> GenText = \item,author -> ...
}
```
The body of the function would still consist of `mkText` and `mkNP` calls, so `Item`, `Author` and `GenText` are nothing more than type synonyms that improve readability. The drawback is that if the lincats of those types are changed in `MuseumEng`, the functions in `TextEng` need to be changed too.
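Here is an illustration of what "type synonyms" means in practice. The lincats below are read off the RGL types in the `DescriptionText` signature above: item and author as `NP`, item type as `CN`, and the result as `Text`. They are assumptions, not the lincats from the attachment. With these lincats, `AuthorText` keeps exactly the body shown earlier, and only its signature changes.

```gf
-- In MuseumEng (assumed lincats, matching the RGL types used above)
lincat
  Item     = NP ;
  Author   = NP ;
  ItemType = CN ;
  GenText  = Text ;

-- In TextEng: same body as the RGL-typed version,
-- but the signature now names the museum concepts
oper
  AuthorText : Item -> Author -> GenText = \item,author ->
    mkText (mkS pastTense
      (mkCl item (mkVP (passiveVP make_V2) (mkAdv by8agent_Prep author)))) ;
```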
Ontology compatibility
The grammar in D8.2 uses a database in which the existing paintings are defined as types. The textualization function accepts only those combinations of parameters for which such a type exists:
```
data
  MkVerifiedText : (pg : Painting) -> (pr : Painter) -> (pt : PaintingType) ->
                   (cr : OptColour) -> (se : OptSize) -> (ml : OptMaterial) ->
                   (yr : OptYear) -> (mm : OptMuseum) ->
                   CompletePainting pg pt pr yr mm cr se ml -> VerifiedText ;

  GSM940042ObjPainting : CompletePainting GSM940042Obj MiniaturePortrait JKFViertel
                           (MkYear (YInt 1814)) (MkMuseum GoteborgsCityMuseum)
                           (MkColour Grey) (MkSize (SIntInt 349 776))
                           (MkMaterial Wood) ;
```
There should be no problem combining the generic text template with this approach to ensure that only valid combinations of data are verbalized, although this has not been tested yet.
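The following untested sketch shows what that combination might look like on the abstract side. It generalizes `CompletePainting` to the generic categories. The names `CompleteItem` and `GSM940042ObjItem` are hypothetical, and only three of the eight fields of `MkVerifiedText` are kept for brevity.

```gf
-- Museum.gf (sketch): validity proofs over generic categories
abstract Museum = {
  cat
    Item ; ItemType ; Author ; VerifiedText ;
    -- one inhabitant per database record: a proof that the combination exists
    CompleteItem Item ItemType Author ;
  fun
    MkVerifiedText : (it : Item) -> (ty : ItemType) -> (au : Author) ->
                     CompleteItem it ty au -> VerifiedText ;
}

-- ArtMuseum.gf (sketch): lexical constants plus the database records
abstract ArtMuseum = Museum ** {
  fun
    GSM940042Obj      : Item ;
    MiniaturePortrait : ItemType ;
    JKFViertel        : Author ;
  data
    GSM940042ObjItem : CompleteItem GSM940042Obj MiniaturePortrait JKFViertel ;
}
```

On the concrete side, `MkVerifiedText it ty au proof` could then be linearized with the shared text patterns, for example `AuthorText it au`, simply ignoring the proof argument.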
Summary
- What is it?
  - A parameterized text template for a domain and its subdomains.
- What is it good for?
  - Avoiding copy-paste when making textual representation patterns for things that are almost the same. For example, an art museum has an ontology that contains paintings, sculptures and wood carvings. We can make `MuseumL` contain the lincats and the `mkConcept "str"` style of end-user constructors, and make `PaintingL` and `WoodCarvingL` contain all the paintings and wood carvings in the collection. Parameterized textualization patterns for the different items can be defined in `TextL`, and the right word choices come from `TextPaintingL` and `TextCarvingL`.
| Attachment | Size |
|---|---|
| TextTemplate.zip | 3.68 KB |