Using XML files as real corpora: making an XML database with the dbXML program (http://www.dbxml.com)

Dec 19, 2015

Page 1

Using XML files as real corpora

making an XML database with the dbXML program

http://www.dbxml.com

Page 2

The dbXML program

• The dbXML program is one of a range of programs that let you use a set of XML files as a database.

• The program is free and can be downloaded from the web.

• It is likely that many more programs like this will be springing up over the next couple of years.

Page 3

Basic concepts

• Using a database requires the following basic concepts:

– the set of files you are looking at is called a collection

– a collection of files must be indexed so that the program can find things quickly

– you ask questions by posting queries to the database manager
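The three concepts above can be sketched in a few lines of Python. This is only an illustration of the ideas, not how dbXML works internally: the file names, the markup, and the `query` helper are all invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample files; the names and markup are invented for illustration.
FILES = {
    "dialogue1.xml": "<dialogue><turn>Hello</turn><turn>Hi</turn></dialogue>",
    "dialogue2.xml": "<dialogue><turn>Bye</turn></dialogue>",
}

# 1. Collection: the set of files we are looking at, parsed and kept together.
collection = {name: ET.fromstring(text) for name, text in FILES.items()}

# 2. Index: record where each element name occurs, so that a lookup
#    does not have to rescan every file.
index = defaultdict(list)
for name, root in collection.items():
    for element in root.iter():
        index[element.tag].append((name, element))

# 3. Query: ask for every occurrence of an element name.
def query(tag):
    """Return (file name, text) pairs for every element called `tag`."""
    return [(name, element.text) for name, element in index.get(tag, [])]

print(query("turn"))
```

A real database manager does the same three things on disk and at a much larger scale; the point here is only the division of labour between collection, index and query.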

Page 4

Using the dbXML program to manage an XML database

• Our starting point assumes that we have some set of marked-up XML files that we want to manage.

• We first set up these files as a database.

• We then use the dbXML tool to extract information from this database.

Page 5

Example XML files in our data set

Page 6

Steps…

• Now we will see:

– how to add a collection of files to a database

– how to index those files

– how to ask queries to get information about the content of those files

Page 7

Getting started… (1)

• First, we need to start up the dbXML server program.

This is the program that does all the actual work.

To do this:

– Make sure you know where the dbxml folder is

– Run the program startup-server.bat in that folder (e.g., by double-clicking on it).

– This should start the dbxml server with a message like:

dbXML 2.0 (Dragonfly)

Logging to E:\junk\logging\dbXML.out

Page 8

Getting started… (2)

• Next, we turn a set of XML files into an XML database. To do this we must start the dbxml administration program and tell it which files to use.

– Start a DOS-Command window

– Make sure you know where the dbxml folder is

– Run the command ‘startup-command-line.bat’ that is in the dbxml folder

– This should then start the dbxml program and you should get something that looks like the window on the next slide…

Page 9

The program when it starts…

Page 10

The DBXML administration actions

• Now you can tell the program which files you want to include in your database.

– To do this, you first have to log in to the program:

connect user= scott pass= tiger

You must use exactly this name and password for the moment!

– Then make a collection:

mkcol myXMLfiles

– Finally, go to the collection and say that everyone is allowed to look at it and exit:

col myXMLfiles

grant admin READ WRITE EXECUTE CREATE

exit

Page 11

The dbXML program proper

• With the administrative details out of the way, we can start the main program.

• Find the dbxml item in the Windows Start menu and click on it.

• This should bring up the following window:

If it does not, or if you cannot find it, you will have to ask for help.

Page 12

Finding your collection

Expand the items in the list under “localhost” until you find the collection that you made in the previous step.

Page 13

Finding your collection

Page 14

Adding files to your collection

Expand your collection to find the ‘documents’ folder.

Click on this.

Select ‘Documents>Import Documents’ from the menu bar.

You will then be asked which files are to be added to the collection.

Page 15

When you have added your documents (select them all in one go if possible)…

… you then have to index them…

Page 16

Select the indexes folder in your collection…

Page 17

Define an index as follows…

1. Give the index a name.

2. Then you must type “pattern=*@*” to index all ELEMENTS + ATTRIBUTES.

3. Click on create.
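To get a feel for what "all ELEMENTS + ATTRIBUTES" covers, the following Python sketch lists every element name and every element/attribute pair in a small document. It does not use dbXML and says nothing about how dbXML stores its index; the sample markup and the `element@attribute` key notation (chosen to echo the pattern above) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the markup is invented for illustration.
doc = ET.fromstring(
    "<dialogue id='d1'>"
    "<turn speaker='A'>Hello</turn>"
    "<turn speaker='B' mood='happy'>Hi</turn>"
    "</dialogue>"
)

def index_keys(root):
    """Collect every element name and every element@attribute pair."""
    keys = set()
    for element in root.iter():
        keys.add(element.tag)  # the element itself
        for attribute in element.attrib:
            keys.add(f"{element.tag}@{attribute}")  # element plus attribute
    return sorted(keys)

print(index_keys(doc))
# ['dialogue', 'dialogue@id', 'turn', 'turn@mood', 'turn@speaker']
```

An index over everything, as here, is the simplest choice for a small corpus because any later query on any element or attribute can use it.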

Page 18

… you can now ask questions about their content

• using XPath

• XSLT

• full text

(Screenshot labels: QUERY WINDOW, RESULT WINDOW)

Page 19

Selecting all ‘turns’ in the corpus

Page 20

Selecting all ‘attrib’ in the corpus
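The queries on these two slides survive only as screenshots, so the exact expressions are not recoverable here. As a rough sketch of the same kind of selection, the Python below runs two XPath-style lookups with the standard library instead of dbXML. The element names `turn` and `attrib` are guessed from the slide titles, and the sample corpus is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus; element names are guessed from the slide titles.
corpus = ET.fromstring(
    "<corpus>"
    "<dialogue><turn>Hello</turn><attrib>greeting</attrib></dialogue>"
    "<dialogue><turn>Bye</turn></dialogue>"
    "</corpus>"
)

# ElementTree supports a subset of XPath; ".//turn" finds every <turn>
# at any depth below the root, in document order.
turns = corpus.findall(".//turn")
attribs = corpus.findall(".//attrib")

print([t.text for t in turns])    # ['Hello', 'Bye']
print([a.text for a in attribs])  # ['greeting']
```

In a full XPath processor the equivalent expression would typically be written `//turn`; ElementTree needs the leading dot because its paths are relative to the element being searched.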

Page 21

The results…

• are presented as XML

• therefore you can pass them straight to a style sheet to look at them…
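Because the results are XML, turning them into something readable is a mechanical step. The Python standard library has no XSLT processor, so the sketch below uses plain ElementTree code as a stand-in for a style sheet: it rewrites each result element as an HTML list item. The `<result>` wrapper, the `turn` elements and the `speaker` attribute are all invented for the example and are not dbXML's actual output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result; the wrapper and element names are invented.
result_xml = (
    "<result>"
    "<turn speaker='A'>Hello</turn>"
    "<turn speaker='B'>Bye</turn>"
    "</result>"
)

def render_as_html(xml_text):
    """Turn each <turn> in the result into an HTML list item."""
    result = ET.fromstring(xml_text)
    html_list = ET.Element("ul")
    for turn in result.iter("turn"):
        item = ET.SubElement(html_list, "li")
        item.text = f"{turn.get('speaker')}: {turn.text}"
    return ET.tostring(html_list, encoding="unicode")

print(render_as_html(result_xml))
# <ul><li>A: Hello</li><li>B: Bye</li></ul>
```

An XSLT style sheet would express the same mapping declaratively, with one template per element type.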