| XML Based Linking Concept | Table of contents | Indexes | The Future is Today: Case Studies in Innovation | |||
| Luca Iocchi |
| Post-Doc student |
| Fondazione Ugo Bordoni |
| Via B. Castiglione 59
Rome
Italy
(00142)
Email: iocchi@fub.it |
| Claudio Carpineto |
| Researcher |
| Fondazione Ugo Bordoni |
| Via B. Castiglione 59
Rome
Italy
(00142)
Email: carpinet@fub.it |
Introduction |
|
Cognitive agents for Web Information Extraction |
| 1. the cognitive agent does not depend on the application domain, i.e. it is a general purpose extraction agent driven by its own knowledge base; |
| 2. the user is involved in the extraction process only at design time. |
The stock market agent |
<!DOCTYPE LISTSHARES [
<!ELEMENT LISTSHARES (SHARE+)>
<!ELEMENT SHARE (NAME, PRICE, DATE)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
<!ELEMENT DATE (DAY, MONTH, YEAR)>
<!ELEMENT DAY (#PCDATA)>
<!ELEMENT MONTH (#PCDATA)>
<!ELEMENT YEAR (#PCDATA)>
]>
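To make the content model concrete, the following Python sketch builds a document with the shape the DTD describes and checks it with a hand-written structural test. The helper names `make_share` and `conforms` are hypothetical, and the check only mirrors the element declarations above; it is not a DTD validator.

```python
import xml.etree.ElementTree as ET


def make_share(name, price, day, month, year):
    """Build one SHARE element with the children the DTD lists, in order."""
    share = ET.Element("SHARE")
    ET.SubElement(share, "NAME").text = name
    ET.SubElement(share, "PRICE").text = price
    date = ET.SubElement(share, "DATE")
    ET.SubElement(date, "DAY").text = day
    ET.SubElement(date, "MONTH").text = month
    ET.SubElement(date, "YEAR").text = year
    return share


def conforms(root):
    """Structural check mirroring the DTD (illustrative, not a real validator)."""
    # LISTSHARES (SHARE+): the root must hold at least one SHARE.
    if root.tag != "LISTSHARES" or len(root) == 0:
        return False
    for share in root:
        # SHARE (NAME, PRICE, DATE): exactly these children, in this order.
        if share.tag != "SHARE" or [c.tag for c in share] != ["NAME", "PRICE", "DATE"]:
            return False
        # DATE (DAY, MONTH, YEAR)
        if [c.tag for c in share.find("DATE")] != ["DAY", "MONTH", "YEAR"]:
            return False
    return True


root = ET.Element("LISTSHARES")
root.append(make_share("ABB AG I", "2084", "08", "april", "1999"))
print(conforms(root))
print(ET.tostring(root, encoding="unicode"))
```

A SHARE that lacks one of the three children, for example a missing PRICE, would be rejected by this check, just as it would not be valid against the DTD.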
The agent's KB |
| The axioms in the KB are expressed in the following notation (see [1, 2, 5] for a formal description of the syntax and semantics of the language): |
| P : A → Q |
| Page: findbigtable → [BigTable ; BigTable] |
| Page: finddate → [IsDate ; IsDate] |
| BigTable: findcolname → [ColName ; ColName] |
| BigTable: findcolprice → [ColPrice ; ColPrice] |
| BigTable: findbigpretable → [PreBigTable ; PreBigTable] |
| PreBigTable: findprecolname → [ColName ; ] |
| ColName: extractname → Name |
| ColPrice: extractprice → Price |
| IsDate: extractdate → Date |
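One way to make the notation concrete is to store each axiom as a record. The sketch below assumes a reading in which P is the state where action A applies and Q lists the resulting state or states; that reading, the `Axiom` type and the `actions_in` helper are illustrative assumptions, not definitions taken from [1, 2, 5]. The right-hand sides are copied verbatim from the axioms above, including the empty second entry for findprecolname.

```python
from typing import NamedTuple, Tuple


class Axiom(NamedTuple):
    state: str                 # P: state in which the action applies (assumed reading)
    action: str                # A: name of the action
    outcomes: Tuple[str, ...]  # Q: right-hand side, copied verbatim from the KB


KB = [
    Axiom("Page", "findbigtable", ("BigTable", "BigTable")),
    Axiom("Page", "finddate", ("IsDate", "IsDate")),
    Axiom("BigTable", "findcolname", ("ColName", "ColName")),
    Axiom("BigTable", "findcolprice", ("ColPrice", "ColPrice")),
    Axiom("BigTable", "findbigpretable", ("PreBigTable", "PreBigTable")),
    Axiom("PreBigTable", "findprecolname", ("ColName", "")),  # second entry empty in the source
    Axiom("ColName", "extractname", ("Name",)),
    Axiom("ColPrice", "extractprice", ("Price",)),
    Axiom("IsDate", "extractdate", ("Date",)),
]


def actions_in(state):
    """Return the names of all actions whose axiom starts from the given state."""
    return [axiom.action for axiom in KB if axiom.state == state]


print(actions_in("BigTable"))  # ['findcolname', 'findcolprice', 'findbigpretable']
```

Keeping the axioms as plain data makes it easy to see which actions start from a state such as Page or BigTable, which is the information a planner needs when it assembles an extraction program.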
The extraction program |
FINDDATE();
if (FINDDATE_T) {
    FINDBIGTABLE();
    if (FINDBIGTABLE_T) {
        FINDCOLNAME();
        if (FINDCOLNAME_T) {
            FINDCOLPRICE();
            if (FINDCOLPRICE_T) {
                EXTRACTDATE();
                EXTRACTPRICE();
                EXTRACTNAME();
            }
            else {
                FAIL();
            }
        }
        else {
            FAIL();
        }
    }
    else {
        FINDPRETABLE();
        if (FINDPRETABLE_T) {
            FINDPRECOLNAME();
            if (FINDPRECOLNAME_T) {
                FINDPRECOLPRICE();
                if (FINDPRECOLPRICE_T) {
                    EXTRACTDATE();
                    EXTRACTPRICE();
                    EXTRACTNAME();
                }
                else {
                    FAIL();
                }
            }
            else {
                FAIL();
            }
        }
        else {
            FAIL();
        }
    }
}
else {
    FAIL();
}
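The control flow of the extraction program can be sketched in Python as follows. This is an illustrative rendering only: the `run_plan` function and its `sense` and `do` callbacks are hypothetical stand-ins for the agent's sensing actions (the FIND steps, each followed by a test of its `_T` flag) and its extraction actions, and a return value of False stands in for FAIL().

```python
def run_plan(sense, do):
    """Run the conditional plan.

    sense(name) -> bool executes a sensing action and reports its outcome;
    do(name) executes an extraction action. Returns False where the
    original program calls FAIL().
    """
    def extract_all():
        for action in ("EXTRACTDATE", "EXTRACTPRICE", "EXTRACTNAME"):
            do(action)
        return True

    if not sense("FINDDATE"):
        return False
    if sense("FINDBIGTABLE"):
        # Main branch: each later step runs only if the previous test succeeded.
        return sense("FINDCOLNAME") and sense("FINDCOLPRICE") and extract_all()
    # Fallback branch, taken when no big table is found.
    return (sense("FINDPRETABLE") and sense("FINDPRECOLNAME")
            and sense("FINDPRECOLPRICE") and extract_all())


# Simulated page on which only the fallback branch succeeds.
outcomes = {"FINDDATE": True, "FINDBIGTABLE": False, "FINDPRETABLE": True,
            "FINDPRECOLNAME": True, "FINDPRECOLPRICE": True}
log = []
ok = run_plan(lambda name: outcomes.get(name, False), log.append)
print(ok, log)  # True ['EXTRACTDATE', 'EXTRACTPRICE', 'EXTRACTNAME']
```

The short-circuiting `and` chain keeps the ordering of the original nested conditionals: a sensing action is executed only after every test before it has succeeded.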
The data extraction process |
|
<LISTSHARES>
  <SHARE>
    <NAME>B Agr Mantov</NAME>
    <PRICE>24009.75</PRICE>
    <DATE>
      <DAY>9</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  <SHARE>
    <NAME>B Des-Br r99</NAME>
    <PRICE>3446.56</PRICE>
    <DATE>
      <DAY>9</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  <SHARE>
    <NAME>B Desio-Br</NAME>
    <PRICE>6718.8598</PRICE>
    <DATE>
      <DAY>9</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  ...
</LISTSHARES>
<LISTSHARES>
  <SHARE>
    <NAME>1 % 1 AG % CO. KGAA AKTIEN DM 5 /DE0005089007</NAME>
    <DATE>
      <DAY>09</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  <SHARE>
    <NAME>1 % 1 AG % CO. KGAA AKTIEN DM 5 /DE0005089007</NAME>
    <PRICE>120.00</PRICE>
    <DATE>
      <DAY>09</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  <SHARE>
    <NAME>AC-SERVICE AG NAMENS-AKTIEN O.N. /DE0005110001</NAME>
    <PRICE>27.90</PRICE>
    <DATE>
      <DAY>09</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  ...
</LISTSHARES>
<LISTSHARES>
  <SHARE>
    <NAME>ABB AG I</NAME>
    <PRICE>2084</PRICE>
    <DATE>
      <DAY>08</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  <SHARE>
    <NAME>ABB AG N</NAME>
    <PRICE>418</PRICE>
    <DATE>
      <DAY>08</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  <SHARE>
    <NAME>ADECCO I</NAME>
    <PRICE>710.00</PRICE>
    <DATE>
      <DAY>08</DAY>
      <MONTH>april</MONTH>
      <YEAR>1999</YEAR>
    </DATE>
  </SHARE>
  ...
</LISTSHARES>
Querying the extracted data |
construct <SHARE>$s</SHARE>
where <*.SHARE>$s </> in "CH-Apr08.xml",
      <NAME>$n</> in $s, <PRICE.PCDATA>$p</> in $s,
      $p>1000
| The XML-QL interpreter generated the following output: |
<SHARE>
  <NAME>ABB AG I</NAME>
  <PRICE>2084</PRICE>
  <DATE>
    <DAY>08</DAY>
    <MONTH>april</MONTH>
    <YEAR>1999</YEAR>
  </DATE>
</SHARE>
<SHARE>
  <NAME>ALETSCH I</NAME>
  <PRICE>3700.00</PRICE>
  <DATE>
    <DAY>08</DAY>
    <MONTH>april</MONTH>
    <YEAR>1999</YEAR>
  </DATE>
</SHARE>
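For readers without an XML-QL interpreter, the same selection (every SHARE whose PRICE exceeds 1000) can be approximated with Python's standard library. This is a sketch, not the XML-QL implementation: the `shares_above` helper is hypothetical, and the inline sample holds only the three Swiss shares listed in the extraction output, so shares from the elided part of the file, such as ALETSCH I, do not appear in its result.

```python
import xml.etree.ElementTree as ET

# The three SHARE elements shown in the extracted Swiss data (rest elided there).
SAMPLE = """<LISTSHARES>
<SHARE><NAME>ABB AG I</NAME><PRICE>2084</PRICE>
<DATE><DAY>08</DAY><MONTH>april</MONTH><YEAR>1999</YEAR></DATE></SHARE>
<SHARE><NAME>ABB AG N</NAME><PRICE>418</PRICE>
<DATE><DAY>08</DAY><MONTH>april</MONTH><YEAR>1999</YEAR></DATE></SHARE>
<SHARE><NAME>ADECCO I</NAME><PRICE>710.00</PRICE>
<DATE><DAY>08</DAY><MONTH>april</MONTH><YEAR>1999</YEAR></DATE></SHARE>
</LISTSHARES>"""


def shares_above(root, threshold):
    """Return the SHARE elements, at any depth, whose PRICE exceeds threshold."""
    selected = []
    for share in root.iter("SHARE"):  # any depth, like the *.SHARE path
        price = share.findtext("PRICE")
        # A SHARE without a PRICE cannot match the query pattern, so skip it.
        if price is not None and float(price) > threshold:
            selected.append(share)
    return selected


root = ET.fromstring(SAMPLE)
names = [share.findtext("NAME") for share in shares_above(root, 1000)]
print(names)  # ['ABB AG I']
```

To run it against a full extraction result, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()` for the file of interest.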
Conclusions |
References |
| [2] G. De Giacomo, L. Iocchi, D. Nardi, and R. Rosati. Planning with sensing for a mobile robot. In Proc. of 4th European Conference on Planning (ECP'97), 1997. |
| [3] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. XML-QL: A query language for XML. http://www.w3.org/TR/NOTE-xml-ql/. |
| [4] D. Florescu, A. Levy, and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, September 1998. |
| [5] L. Iocchi. Design and Development of Cognitive Robots. PhD thesis, DIS, Università di Roma "La Sapienza", 1999. |
| [6] L. Iocchi and D. Nardi. Information access in the Web. In Proceedings of WebNet'97, 1997. |
| [7] World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0 (1998). http://www.w3.org/TR/1998/REC-xml-19980210. |
| [8] World Wide Web Consortium (W3C). XHTML 1.0: The extensible HyperText Markup Language. http://www.w3.org/TR/xhtml1. |