Program

The program is also available as a public Google Calendar for those who want to subscribe and receive instant notifications of changes.

Monday: Nov 21, 2011

Hour What
14:00 Introduction to Text Mining
Gerard de Melo, ICSI Berkeley

This talk will introduce some popular techniques used to process and organize text document collections. Along the way, some of the most elementary natural language processing ingredients will be discussed, including tokenization, stemming, lemmatization, and stop word removal. We will then present the Vector Space Model and describe how it can be used 1) to assess how similar two documents are, 2) to categorize documents based on their contents, and 3) to cluster similar documents into groups. Finally, we will also point out how this model is used for information retrieval and web search.
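As a rough illustration of the first use of the Vector Space Model mentioned above (assessing how similar two documents are), here is a minimal sketch. It is not taken from the talk: the tiny stop word list, the whitespace tokenizer, the raw term counts and the function names are all assumptions chosen for brevity.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "on", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def term_vector(text):
    """Represent a document as a sparse vector of raw term counts."""
    return Counter(tokenize(text))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


doc1 = term_vector("the introduction to text mining")
doc2 = term_vector("text mining and web search")
print(cosine_similarity(doc1, doc2))
```

Real systems typically replace raw counts with weighted ones (for example TF-IDF) and add stemming or lemmatization before counting; the sketch omits both.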
15:30 Coffee
16:00 Introducing SUMO, the Suggested Upper Merged Ontology
Valeria de Paiva, Rearden Commerce, CA

This talk presents an overview of ontology creation and development in SUMO, including how we envisage using it to help with reasoning about Portuguese texts.

The Suggested Upper Merged Ontology will be introduced (no previous knowledge assumed) and Sigma (the ontology environment) associated with it will be demonstrated. Issues of the capabilities and tradeoffs in first order logic inference are briefly explored. The SUMO mappings to the WordNet lexicon are also discussed.

Hopefully we will do some hands-on axiom writing together, so bring your laptops!

17:00 The MIST project and the CPDOC/FGV information systems
Asla Sa, Suemi Higuchi, Moacyr Alvim and Renato Rocha

(in Portuguese) Asla will present the MIST project, a partnership between CPDOC and EMAp. Suemi will present CPDOC's information systems and the demands for natural language processing that arose during the work on conceptual modeling and search improvement.

Tuesday: Nov 22, 2011

Hour What
14:00 Graph-Based Methods for Multilingual Knowledge about Words and Entities
Gerard de Melo, ICSI Berkeley

An increasing number of applications are making use of explicit knowledge about words and the entities they represent. This talk presents three graph-based data integration techniques to obtain such knowledge. The first involves learning models to connect words to their meanings. The second reconciles equivalence and distinctness information about entities from multiple sources. The third method adds a comprehensive taxonomic hierarchy, reflecting how different entities relate to each other. Together, they can be used to produce a large-scale multilingual knowledge base semantically describing over 5 million entities and over 16 million natural language words and names in more than 200 different languages.
15:00 A Semantic-Inferentialist Model for Natural Language Processing
Vládia Pinheiro

The information necessary for a complete understanding of texts in natural language is sometimes implicit, which requires drawing inferences from the use of concepts in the linguistic praxis. We claim that it is within linguistic practice that the circumstances for using a word and the consequences thereof can be grasped, and that by disregarding them much of what could be inferred is lost. In this paper, we present a computational model to treat the semantic-pragmatic level of natural languages, the Semantic Inferentialism Model (SIM). The semantic knowledge bases, the inferential relatedness measure and the inferential reasoning of SIM are detailed. SIM was used in the development of WikiCrimesIE, an application for extracting information about crimes reported in news articles. Information related to the type of crime, causes of crime and type of weapon was implicit in most of the texts analyzed, and the results of WikiCrimesIE demonstrate the feasibility and the distinctive advantage of using SIM in natural language understanding systems.
16:00 Coffee
16:15 Contexts for Quantification
Valeria de Paiva, Rearden Commerce, CA

Logical systems conceived for providing semantics and logical forms for sentences of English abound. From Montague's original Higher-order Intensional Logic in the 1970s, to Situation Theory and Discourse Representation Theory, as well as the several frame languages (e.g. KL-ONE) and their descendants, the Description Logics, in the 1980s and 1990s, to vanilla versions of first-order logic (FOL) with bells and whistles more recently, the field is rife with possibilities and issues.

I want to describe one more such language, the product of several years of development of the NLP-based knowledge representation system Bridge, at PARC. While the design of the language was historically tied to the development of the software system, I believe that the language and its inferential system are of independent interest. In previous publications, we have called this logic language TIL, for Textual Inference Logic. TIL can be considered one of several systems associated with Natural Logic.

Within the traditions of Natural Logic, TIL distinguishes itself by its unorthodox treatment of quantification in terms of instantiation of concepts within contexts. I want to describe this mechanism of quantification, necessitated by what we consider a better modelling of negation and other intensional phenomena, which are ubiquitous in natural language. Further, we want to map and relate TIL's expressive power to that of more traditional systems, for example the ones described by Moss.

17:15 NLP/Light
Renato Rocha

(in Portuguese) The NLP techniques used in the joint project between EMAp and the energy distributor Light will be presented.

Wednesday: Nov 23, 2011

Hour What
14:00 Web Mining with Advanced Text Representations
Gerard de Melo, ICSI Berkeley

This talk will build on the Introduction to Text Mining and will discuss how language resources can allow us to construct more sophisticated text representations. Such advanced text representations are particularly useful when organizing information on the Web. For instance, we will discuss how user attitudes towards products can be extracted from blogs using sentiment mining techniques. We will also describe in great detail how language resources can be used to extend text mining to work with very short pieces of text (e.g. Twitter posts) and with multilingual text collections.
15:30 Coffee
16:00 History of logic and language crisis in representation of scientific knowledge
Diego Munk London, COPPE-UFRJ

The main hypothesis of this work is that as the catalogs of experimental data of each specific science expand, as a consequence of time, experience and technical advances, the specific formal languages used to construct their models exhaust their capacity for semantic elasticity and force scientists to give up significant portions of their knowledge base. This leads logicians and philosophers of science to investigate the nature of formal languages and the origins of, and possible solutions to, this crisis.
17:00 A wide-coverage free/open-source deep parser for Brazilian Portuguese: a work in progress
Leonel Figueiredo de Alencar
Research Group on Language and Computation, UFC

Aiming at reducing the shortage of freely distributed resources for the computational processing of Portuguese, we report on an ongoing project to build an X-bar-theory-based parser for a wide range of texts in the Brazilian variety. In this talk we focus on the first stage of the project, which mainly resulted in the Aelius and Donatus modules. The former is a tool for morpho-syntactic tagging which features stochastic and hybrid taggers in three different architectures and also provides a user-friendly interface for third-party taggers. The latter is an interface between Aelius and NLTK's context-free grammar parsers. Donatus therefore performs deep parsing of text resorting only to a formalization of phrase structure rules, dispensing with the compilation of lexical entries or the implementation of a morphological lexical analyzer. The current phase of the project consists in developing a nominal expression chunker to be integrated into Aelius, and in implementing, by means of Donatus, a parser for a comprehensive grammar of DPs, taking as its starting point existing generative descriptions of Brazilian Portuguese.

Thursday: Nov 24, 2011

Hour What
14:00 Information Extraction and Large-Scale Knowledge Bases
Gerard de Melo, ICSI Berkeley

This talk will describe methods that are used to extract knowledge from text. We start out with important natural language processing techniques like named entity recognition and coreference resolution, which help us identify what people and objects a given sentence is talking about. We then cover pattern-based information extraction techniques that allow us to extract facts about such objects from a text collection. Finally, we describe some of the more recent large-scale knowledge bases that provide large numbers of facts about the world.
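To give a flavour of pattern-based information extraction, the sketch below applies a single hand-written lexical pattern ("X such as Y") to pull candidate is-a facts out of text. The pattern, the function name and the example sentence are assumptions made for illustration, not material from the talk, and the single-word matching is far cruder than what practical systems use.

```python
import re

# One hand-written pattern: "<category> such as <instance>".
# Each side is restricted to a single word to keep the sketch short.
SUCH_AS = re.compile(r"(\w+) such as (\w+)")


def extract_is_a(text):
    """Return (instance, category) pairs for every pattern match in text."""
    return [(m.group(2), m.group(1)) for m in SUCH_AS.finditer(text)]


facts = extract_is_a("He visited cities such as Berlin last year.")
print(facts)
```

In practice the patterns would be matched over text that has already been through named entity recognition and coreference resolution, so that multi-word names and pronouns are handled properly.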
15:30 Coffee
16:00 Localizing Math
Isabel Cafezeiro and Ivan da Costa Marques

We start from an analysis of Turing's work in "On Computable Numbers, with an Application to the Entscheidungsproblem", gathering evidence about how he proceeded to build the definition of computability. Taking advantage of his stepwise construction, which starts from materialities to achieve a satisfactory level of abstraction, we show how his way of doing mathematics fits an approach to knowledge construction in which there is no definite separation between matter and form, and thus the world and language are not closed spheres. In the same line of reasoning, the abstract and the concrete, deduction and induction, the technical and the social, the objective and the subjective are unthinkable as pure entities. Considering controversies and discussions from the mid-nineteenth century to the present day, we identify a social component that permanently takes part in what is usually considered "technical content" or "objectivity", thus undermining the axis of authority of mathematics, logic and computing. Under this view, new possibilities for knowledge construction are acknowledged, allowing for mathematics that is done and lived outside the major centers.

17:00 On the Computational Complexity of Intuitionistic Modal and Description Logics
Edward Hermann, DI/PUC-Rio

In this talk we will present a proof of PSPACE-completeness for the satisfiability problem of Intuitionistic Modal Logic IK and Intuitionistic Description Logic iALC. We propose a 2-person game which can be implemented in polynomial time on an Alternating Turing Machine, and which relates the existence of a winning strategy for one player to the satisfiability of a given formula. It is well known that any problem decidable in polynomial time on an Alternating Turing Machine can be decided by an ordinary Turing Machine using polynomial space. We also prove the finite model property for both logics. This follows because, if there exists a winning strategy for a given formula, then a finite model is built at the final stage of the game.
18:00 Reception, 12th floor of FGV's building (Praia de Botafogo, 190).

Friday: Nov 25, 2011

Hour What
14:00 Intuitionistic Description Logic and Legal Reasoning
Edward Hermann, DI/PUC-Rio

Classical Logic has frequently been used as a basis for knowledge representation and reasoning in many specific domains. Legal Knowledge Representation is particularly interesting due to the natural occurrence of conflicts among law systems, individual laws and cases. Those conflicts are usually taken as logical inconsistencies. Due to its inherently normative character, coherence (consistency) in legal ontologies is more subtle than in most other domains. An adequate intuitionistic semantics for negation in a legal domain comes to the fore when we take legally valid individual statements as the inhabitants of our legal ontology. This allows us to deal elegantly with particular situations of legal coherence, such as conflicts of laws, like those solved by Private International Law analysis. In this talk we (1) briefly present our version of Intuitionistic Description Logic, called IALC for Intuitionistic ALC; and (2) show a study of the logical coherence analysis of "Conflict of Laws in Space", in the scope of Private International Law, by means of the IALC Sequent Calculus proposed herein.
15:00 Automatic Knowledge Acquisition (Aquisição Automática de Conhecimento)
Christian Nunes Aranha, Cortex Intelligence

(in Portuguese) Web 3.0 has not yet happened; the promise is that it will be a more Semantic Web: an Internet that links not only documents but also information. A new layer will sit on top of the current Internet, recognizing things and entities and enriching them with metadata. This metadata will help computers exchange information with each other more effectively, providing better services to users. Some specifications are already gaining ground, such as microformats, RDF and OWL. But they are still too sophisticated for humans to use in spreading enriched content across the Web. The proposal here is to show a level of artificial intelligence that can manipulate knowledge ontologies to automatically enrich the content of the current Web.
16:00 Coffee
16:15 A database approach to monitoring the quality of information in RDF stores
Alexandre Rademaker

We will start by presenting the problem of Truth Maintenance in database systems. Roughly speaking, this problem addresses the question of how to maintain integrity constraints during the lifetime of a database. Some recent papers discuss the use of inconsistency-tolerant methods for obtaining partial satisfaction of integrity constraints. Partial integrity is obtained by means of flexible repairs, that is, integrity repair updates that follow a failed integrity constraint check and try to recover the truth state of the database.

Absolute consistency is out of the question due to its intractability. On the other hand, naive inconsistency-tolerant repairs can be data-destructive. In order to have a rational flexible repair strategy, one needs criteria. These criteria can be expressed in terms of metrics.

We can consider extending what was discussed above to non-SQL databases. A natural extension is to consider association-based knowledge representation bases, such as those based on RDF.

17:15 TBA