Multilingual CNL-based Semantic Wiki
Laura Canedo, Norbert E. Fuchs, Kaarel Kaljurand,
Tobias Kuhn, Victor Ungureanu
Institute of Computational Linguistics, University of Zurich
MOLTO Final Meeting, Barcelona
2013-05-23

Presenter Notes

Contributors: grammar: John, Aarne, Inari; GF cloud: Thomas; evaluation: Jussi, Maarit; users/testers: Olga, Jordi.

Structure of the talk

  • current wiki systems
    • types
    • shortcomings
  • multilingual CNL-based semantic wiki
    • definition
    • possible use cases
  • AceWiki-GF
    • combination and extension of Attempto Controlled English, Grammatical Framework, AceWiki
    • implementation
    • evaluation

Presenter Notes

Do not spend too much time on this slide.

Existing wiki systems

  • wiki
    • user-friendly collaborative environment for knowledge management
    • content typically unconstrained natural language (NL), therefore not easily automatically processable
    • powered by software, e.g. MediaWiki
    • e.g. Wikipedia
  • semantic wiki (= wiki + formal semantics)
    • provides: richer query language, consistency checking (via automatic reasoning)
    • content typically NL + typed links (i.e. RDF triples)
    • software: Semantic MediaWiki, ...
  • CNL-based semantic wiki (= semantic wiki using CNL)
    • formal languages hidden (=> can use more expressive formal languages)
    • software: AceWiki

Presenter Notes

Shortcomings: cannot copy content from one language to another, cannot ask questions, cannot check that the versions of an article in different languages are about the same thing.

Multilingual CNL-based Semantic Wiki

  • multiple languages
    • natural: English, German, ...
    • formal: first-order logic, OWL, ...
    • languages for content vs user interface
  • CNL-based
    • backed by formal grammar(s)
    • formal languages are hidden
  • semantic
    • content automatically kept in sync via precise translation
    • consistency checking, question answering, ... (depending on the domain)
  • wiki
    • user-friendly
    • collaborative

Presenter Notes

Possible use cases

  • multilingual ontology editor
    • e.g. environment where users agree on the content and multilingual vocabulary of an OWL-style geography ontology
  • collection of tourist phrases
    • book structure (chapters and sections)
    • multilingual content presented in parallel
  • catalog of museum objects (paintings, painters)
    • each object on its own wiki page
    • rich queries (e.g. "which Dutch painter painted which French painter?")

Presenter Notes

AceWiki-GF

Presenter Notes

AceWiki-GF is the name that we currently use to refer to the outcome of WP11.

Background technologies (as of 2011)

  • Attempto Controlled English (ACE)
    • subset of natural English
    • well-defined translation to and from first-order logic, OWL, ...
    • end-user documentation: construction and interpretation rules
    • monolingual, fixed grammar, deterministic ambiguity handling
  • Grammatical Framework (GF)
    • framework for multilingual grammar engineering
    • parser (translation, completion, ...) and libraries (resource grammar)
    • no GF-based wiki system connecting users, texts and grammars
  • AceWiki
    • expressive semantic wiki system
    • front-end language: ACE
    • background reasoning language: OWL
    • monolingual, fixed grammar, no ambiguity handling

Presenter Notes

Tasks of Work Package 11

  • D11.1: integrate ACE with GF (ACE-in-GF)
    • implement a multilingual grammar of ACE in the GF framework
    • cover the European languages supported by the GF resource grammar
    • joint work with the University of Gothenburg
  • D11.2: integrate AceWiki with GF (AceWiki-GF)
    • implement connection to GF tools (GF Webservice / Cloud Service)
    • add support for the management of multilinguality, ambiguity, grammar
    • make the user interface multilingual
  • D11.3: evaluate the outcome
    • multilingual and collaborative aspects of AceWiki-GF
    • translation accuracy of the ACE-in-GF grammar
    • suitability of the AceWiki-GF platform for other GF grammars
    • joint work with the University of Helsinki

Presenter Notes

The MOLTO partners tested the suitability of the AceWiki-GF platform for other GF grammars, such as MathTalk and Painting.

ACE-in-GF (main idea)

An ACE grammar implemented in GF adds multiple natural languages as front-ends to ACE. As a result, these languages can be mapped to and from various formal languages already supported by ACE.

Multilinguality

Presenter Notes

ACE in GF

  • implementation of the ACE syntax
    • extension of Angelov and Ranta (CNL 2009)
    • focus on the subset of ACE that can be mapped to OWL
    • almost 100% coverage at almost 0% ambiguity
    • no direct generation of discourse representation structures (DRS)
  • multilinguality
    • support most RGL languages, among them the European languages Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Latvian, Norwegian, Polish, Romanian, Russian, Spanish, Swedish
    • RGL-based design provides automatic increase in quality and language-coverage over time
  • status
    • some precision problems, e.g. anaphoric references do not obey DRS accessibility constraints
    • ambiguity and coverage problems in some languages

Presenter Notes

More development effort has gone into German, Spanish and Finnish. The implementations for the other languages have gaps in their coverage of those ACE constructs that the RGL does not provide.

AceWiki integration with GF

  • wiki content is based on a (single) GF grammar
    • provided by GF Webservice / Cloud service
    • optimized for ACE-in-GF (but other GF grammars can also be used)
  • each wiki entry is a set of GF abstract trees
    • viewed via linearization(s)
    • can represent ambiguity
  • multilingual viewing and editing of wiki content
    • grammar-based look-ahead editing that shows next possible tokens
    • ambiguity resolution via another concrete language
  • grammar integrated into the wiki
    • grammar modules as wiki articles (wiki-linking of grammar and content)
    • grammar can be changed while editing the wiki
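
A minimal sketch of grammar-based look-ahead editing, assuming a toy grammar small enough that all of its sentences can be enumerated. The grammar, the vocabulary and the function names are illustrative assumptions and are not AceWiki-GF or GF APIs; the real editor gets its completions from the GF parser.

```python
from itertools import product

# Toy grammar (hypothetical): Sentence -> Subject Verb Object "."
SUBJECTS = [("every", "country"), ("Switzerland",)]
VERBS = [("borders",), ("does", "not", "border")]
OBJECTS = [("a", "sea"), ("Germany",)]


def sentences():
    """Enumerate every token sequence that the toy grammar accepts."""
    for subj, verb, obj in product(SUBJECTS, VERBS, OBJECTS):
        yield subj + verb + obj + (".",)


def next_tokens(prefix):
    """Return the sorted tokens that can directly follow the given prefix."""
    prefix = tuple(prefix)
    n = len(prefix)
    return sorted({
        sent[n]
        for sent in sentences()
        if len(sent) > n and sent[:n] == prefix
    })


if __name__ == "__main__":
    print(next_tokens(["Switzerland"]))
    print(next_tokens(["Switzerland", "borders"]))
```

Enumerating all sentences only works for a finite toy language; it stands in for the incremental parsing that a real grammar needs.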

Presenter Notes

ACE-based geography article

Presenter Notes

Depicted are the ACE version and the German version (containing the look-ahead editor).

Note that the UI is language dependent.

Ambiguity resolution


Presenter Notes

The ambiguity is between an object and a subject relative clause; it occurs in German and Dutch. Wiki users can choose the correct tree by viewing the tree set in a language other than German, e.g. DisambGer (if it exists).
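
The idea of storing a wiki entry as a tree set and resolving ambiguity through another concrete language can be sketched as follows. This is a toy model under stated assumptions: the tree constructors `SubjRel` and `ObjRel`, the example phrase and all function names are hypothetical and are not taken from the ACE-in-GF grammar.

```python
# Abstract trees as nested tuples (constructor names are hypothetical).
SUBJ_REL = ("SubjRel", "country", "surround", "sea")  # the country surrounds a sea
OBJ_REL = ("ObjRel", "country", "surround", "sea")    # a sea surrounds the country
TREES = [SUBJ_REL, OBJ_REL]


def linearize(tree, lang):
    """Render a tree in one concrete language of the toy grammar."""
    kind = tree[0]
    if lang == "Ger":
        # Neuter nominative and accusative forms coincide in German,
        # so both trees produce the same string.
        return "ein Land, das ein Meer umgibt"
    if lang == "Eng":
        if kind == "SubjRel":
            return "a country that surrounds a sea"
        return "a country that a sea surrounds"
    raise ValueError(f"unknown language: {lang}")


def parse(text, lang):
    """Return the tree set for a string: every tree that linearizes to it."""
    return [tree for tree in TREES if linearize(tree, lang) == text]


def disambiguation_choices(text, lang, other_lang):
    """Map each reading of `text` to its rendering in another language."""
    return {linearize(tree, other_lang): tree for tree in parse(text, lang)}


if __name__ == "__main__":
    for rendering in disambiguation_choices(
        "ein Land, das ein Meer umgibt", "Ger", "Eng"
    ):
        print(rendering)
```

The user picks one of the renderings in the other language, and the wiki keeps only the tree behind it.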

Grammar module page

Presenter Notes

GF source editing is available in the GF Cloud Service. AceWiki-GF just reflects that. Some types of errors can be pinpointed.

Reasoning via translation to OWL

Every country that does not border a sea is a landlocked-country.

SubClassOf(
   ObjectIntersectionOf(
      :country
      ObjectComplementOf(
         ObjectSomeValuesFrom(
            :border
            :sea
         )
      )
   )
   :landlocked-country
)

Which country is a landlocked-country?

ObjectIntersectionOf(
    :country
    :landlocked-country
)
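
To make the two class expressions above concrete, here is a minimal sketch that evaluates them over a tiny hand-written model. It is an illustration only: the facts, the `instances` function and the tuple encoding are assumptions, and the evaluation is closed-world, whereas OWL reasoning is open-world (a real reasoner would not infer "does not border a sea" from the mere absence of a border fact).

```python
# Class expressions as nested tuples that mirror OWL functional syntax.
LANDLOCKED_DEF = (
    "ObjectIntersectionOf",
    "country",
    ("ObjectComplementOf", ("ObjectSomeValuesFrom", "border", "sea")),
)

# Hypothetical facts, treated as complete (closed-world assumption).
DOMAIN = {"Switzerland", "Italy", "Mediterranean"}
CLASSES = {"country": {"Switzerland", "Italy"}, "sea": {"Mediterranean"}}
PROPERTIES = {"border": {("Italy", "Mediterranean"), ("Switzerland", "Italy")}}


def instances(expr):
    """Return the individuals in DOMAIN that satisfy a class expression."""
    if isinstance(expr, str):  # named class
        return set(CLASSES.get(expr, set()))
    op, *args = expr
    if op == "ObjectIntersectionOf":
        result = set(DOMAIN)
        for arg in args:
            result &= instances(arg)
        return result
    if op == "ObjectComplementOf":
        return DOMAIN - instances(args[0])
    if op == "ObjectSomeValuesFrom":
        prop, filler = args
        targets = instances(filler)
        return {s for (s, o) in PROPERTIES.get(prop, set()) if o in targets}
    raise ValueError(f"unsupported constructor: {op}")


def apply_subclass_axiom(sub, sup):
    """SubClassOf(sub, sup): add every instance of sub to the named class sup."""
    CLASSES.setdefault(sup, set()).update(instances(sub))


# "Every country that does not border a sea is a landlocked-country."
apply_subclass_axiom(LANDLOCKED_DEF, "landlocked-country")

# "Which country is a landlocked-country?"
answer = instances(("ObjectIntersectionOf", "country", "landlocked-country"))

if __name__ == "__main__":
    print(sorted(answer))
```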

Presenter Notes

Automatic question answering


Presenter Notes

Evaluation of ACE-in-GF

Design

  • select trees and automatically linearize them to all the languages
  • two selection methods:
    • hand-picked ACE-in-GF regression test sentences in ACE, parsed to trees (100 trees)
    • automatic bottom-up partial tree construction followed by probability-biased random generation (80 trees)
  • exclude some trees and languages, resulting in 114 trees and 10 languages
  • measure translation accuracy from ACE to other languages
  • use Google Translate as the baseline and human evaluators as the gold standard

Results

  • participants preferred ACE-in-GF translations to Google translations and post-edited them less

Presenter Notes

Evaluation of AceWiki-GF

Design

  • develop a 500-word geography lexicon
    • 3 languages: English, German and Spanish
    • 3 authors (incl. native speakers of German and Spanish, and a GF engineer)
  • ask users of different languages to supply the wiki with sentences and tag each as true or false
  • then ask them to evaluate others' sentences as true or false
  • measure the user (dis)agreement and how much it is influenced by the automatic translation

Results

  • 30 participants entered 316 sentences
  • agreement level was ~83% with no significant influence from the translation
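
The slides do not state how the agreement level was computed. One plausible reading, sketched below under that assumption, is the share of evaluator judgments that match the author's own true/false tag, compared between sentences read in their original language and sentences read in translation. The records are made-up toy data, not the study's results.

```python
def agreement_rate(pairs):
    """Share of (author_tag, evaluator_tag) pairs in which both tags match."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no judgments given")
    matches = sum(1 for author_tag, evaluator_tag in pairs if author_tag == evaluator_tag)
    return matches / len(pairs)


def agreement_by_condition(records):
    """Group (author_tag, evaluator_tag, translated) records by `translated`."""
    groups = {}
    for author_tag, evaluator_tag, translated in records:
        groups.setdefault(translated, []).append((author_tag, evaluator_tag))
    return {cond: agreement_rate(pairs) for cond, pairs in groups.items()}


# Hypothetical records: (author's tag, evaluator's tag, read via translation?)
TOY_RECORDS = [
    (True, True, False),
    (False, False, False),
    (True, False, False),
    (True, True, True),
    (False, True, True),
    (False, False, True),
]

if __name__ == "__main__":
    overall = agreement_rate((a, e) for a, e, _ in TOY_RECORDS)
    print(f"overall agreement: {overall:.2f}")
    for translated, rate in agreement_by_condition(TOY_RECORDS).items():
        print(f"translated={translated}: {rate:.2f}")
```

Similar rates in the two conditions would correspond to the reported finding that translation had no significant influence; a real analysis would also need a significance test.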

Presenter Notes

Future work

  • customize to other types of grammars and reasoning
  • improve collaborative grammar editing features
  • improve ambiguity management (e.g. automatic reasoning-based ambiguity resolution)
  • use the wiki content to automatically generate documentation, grammar fragments, look-ahead editor customizations, etc. for novice users (depending on their language)

Presenter Notes

Links

Presenter Notes

Thank You!

Presenter Notes
