Jan 04, 2016
Introduction to Computing Using Python
Data Storage and Processing
How many of you have taken IT 240?
Databases and Structured Query Language
Python Database Programming
Data storage
[Figure: five web pages, one.html through five.html, each labeled with the number of times certain city names occur on it, for example Beijing × 3, Paris × 5 and Chicago × 5.]
We wish to store data about web pages in a way that lets Python programs access the data conveniently.
To do this, we will use a database
Databases
A database consists of one or more tables
Each table has a name and consists of rows (records) and columns (attributes).
Each attribute has a name and contains data of a specific type.

Hyperlinks
Url         Link
one.html    two.html
one.html    three.html
two.html    four.html
three.html  four.html
four.html   five.html
five.html   one.html
five.html   two.html
five.html   four.html

Keywords
Url         Word     Freq
one.html    Beijing  3
one.html    Paris    5
one.html    Chicago  5
two.html    Bogota   3
two.html    Beijing  2
two.html    Paris    1
three.html  Chicago  3
three.html  Beijing  6
four.html   Chicago  3
four.html   Paris    2
four.html   Nairobi  5
five.html   Nairobi  7
five.html   Bogota   2
Database files
Database files are not text files – you can’t read from or write to them directly
Instead, communication is performed by commands written in a database language called Structured Query Language (SQL)
SQL SELECT FROM statement
The SQL statement SELECT is used to make queries into a database. The result of a query is called a result table.
SELECT Link FROM Hyperlinks
Result table
Link
two.html
three.html
four.html
four.html
five.html
one.html
two.html
four.html
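As a minimal sketch, the query above can be run from Python with the Standard Library module sqlite3, which these slides cover further on. The in-memory database (":memory:"), the column types, and the variable names are assumptions made for illustration; the rows are those of the Hyperlinks table.

```python
import sqlite3

# Rows of the Hyperlinks table: (Url, Link)
links = [
    ("one.html", "two.html"), ("one.html", "three.html"),
    ("two.html", "four.html"), ("three.html", "four.html"),
    ("four.html", "five.html"), ("five.html", "one.html"),
    ("five.html", "two.html"), ("five.html", "four.html"),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Hyperlinks (Url text, Link text)")  # column types assumed
cur.executemany("INSERT INTO Hyperlinks VALUES (?, ?)", links)

# Select a single column; duplicate values are kept
cur.execute("SELECT Link FROM Hyperlinks")
result = [row[0] for row in cur.fetchall()]
print(result)
con.close()
```

The result has one entry per record of Hyperlinks, so four.html is listed three times.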
SQL SELECT FROM statement
The SQL statement SELECT is used to make queries into a database.
SELECT Url, Word FROM Keywords
Result table
Url         Word
one.html    Beijing
one.html    Paris
one.html    Chicago
two.html    Bogota
two.html    Beijing
two.html    Paris
three.html  Chicago
three.html  Beijing
four.html   Chicago
four.html   Paris
four.html   Nairobi
five.html   Nairobi
five.html   Bogota
SQL SELECT FROM statement
SELECT statements can use *, a wildcard that stands for all columns.
SELECT * FROM Hyperlinks
Result table
Url         Link
one.html    two.html
one.html    three.html
two.html    four.html
three.html  four.html
four.html   five.html
five.html   one.html
five.html   two.html
five.html   four.html
SQL DISTINCT keyword
The SQL keyword DISTINCT removes duplicate records from the result table.
SELECT DISTINCT Link FROM Hyperlinks
Result table
Link
two.html
three.html
four.html
five.html
one.html
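The effect of DISTINCT can be checked by comparing the same query with and without the keyword. This is a sketch only: the in-memory database, the column types, and the variable names are assumptions, and it uses the Standard Library module sqlite3.

```python
import sqlite3

# Rows of the Hyperlinks table: (Url, Link)
links = [
    ("one.html", "two.html"), ("one.html", "three.html"),
    ("two.html", "four.html"), ("three.html", "four.html"),
    ("four.html", "five.html"), ("five.html", "one.html"),
    ("five.html", "two.html"), ("five.html", "four.html"),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Hyperlinks (Url text, Link text)")  # column types assumed
cur.executemany("INSERT INTO Hyperlinks VALUES (?, ?)", links)

# Without DISTINCT: one record per row of the table
cur.execute("SELECT Link FROM Hyperlinks")
all_links = [row[0] for row in cur]

# With DISTINCT: each link target is reported once
cur.execute("SELECT DISTINCT Link FROM Hyperlinks")
distinct_links = [row[0] for row in cur]

print(len(all_links), len(distinct_links))
con.close()
```

The first query yields 8 records and the second 5, one for each page that is the target of at least one link.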
SQL WHERE clause
The SQL clause WHERE is used to select only those records that satisfy a condition.
“In which pages does word X appear?”
SELECT Url FROM Keywords WHERE Word = 'Paris'
Result table
Url
one.html
two.html
four.html

Operator  Explanation
=         Equal
<>        Not equal
>         Greater than
<         Less than
>=        Greater than or equal
<=        Less than or equal
BETWEEN   Within an inclusive range
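A short sketch of the Paris query run from Python follows. The in-memory database and the variable names are assumptions for illustration; the rows are those of the Keywords table.

```python
import sqlite3

# Rows of the Keywords table: (Url, Word, Freq)
rows = [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
    ("two.html", "Bogota", 3), ("two.html", "Beijing", 2), ("two.html", "Paris", 1),
    ("three.html", "Chicago", 3), ("three.html", "Beijing", 6),
    ("four.html", "Chicago", 3), ("four.html", "Paris", 2), ("four.html", "Nairobi", 5),
    ("five.html", "Nairobi", 7), ("five.html", "Bogota", 2),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

# Keep only the records whose Word column equals 'Paris'
cur.execute("SELECT Url FROM Keywords WHERE Word = 'Paris'")
urls = [row[0] for row in cur]
print(urls)
con.close()
```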
SQL WHERE clause
The SQL clause WHERE is used to select only those records that satisfy a condition.
SELECT Column(s) FROM Table WHERE Column operator value
SELECT Column(s) FROM Table WHERE Column BETWEEN value1 AND value2
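The two general forms can be tried with a few different operators. In this sketch the three conditions, the in-memory database, and the variable names are chosen for illustration only; the data is the Keywords table.

```python
import sqlite3

# Rows of the Keywords table: (Url, Word, Freq)
rows = [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
    ("two.html", "Bogota", 3), ("two.html", "Beijing", 2), ("two.html", "Paris", 1),
    ("three.html", "Chicago", 3), ("three.html", "Beijing", 6),
    ("four.html", "Chicago", 3), ("four.html", "Paris", 2), ("four.html", "Nairobi", 5),
    ("five.html", "Nairobi", 7), ("five.html", "Bogota", 2),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

# Example conditions (hypothetical choices): not equal, greater than, inclusive range
conditions = ["Word <> 'Paris'", "Freq > 5", "Freq BETWEEN 1 AND 2"]

counts = {}
for condition in conditions:
    # The conditions are fixed strings written in this script, not user input
    cur.execute("SELECT * FROM Keywords WHERE " + condition)
    counts[condition] = len(cur.fetchall())
    print(condition, "matches", counts[condition], "records")
con.close()
```

Because BETWEEN is inclusive, records with Freq equal to 1 or 2 both satisfy the last condition.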
Exercise
Using the Hyperlinks and Keywords tables, write an SQL query that returns:
1. The URL of every page that has a link to web page four.html
SELECT DISTINCT Url FROM Hyperlinks WHERE Link = 'four.html'
Exercise
Write an SQL query that returns:
2. The URL of every page that has an incoming link from page four.html
SELECT DISTINCT Link FROM Hyperlinks WHERE Url = 'four.html'
Exercise
Write an SQL query that returns:
3. The URL and word for every word that appears exactly three times in the web page associated with the URL
SELECT Url, Word FROM Keywords WHERE Freq = 3
Exercise
Write an SQL query that returns:
4. The URL, word, and frequency for every word that appears between 3 and 5 times, inclusive, in the web page associated with the URL
SELECT * FROM Keywords WHERE Freq BETWEEN 3 AND 5
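One way to check the four exercise answers is to run them against a copy of the two tables. The sketch below assumes an in-memory database and text columns for Hyperlinks; the dictionary of answers and the variable names are illustrative.

```python
import sqlite3

links = [
    ("one.html", "two.html"), ("one.html", "three.html"),
    ("two.html", "four.html"), ("three.html", "four.html"),
    ("four.html", "five.html"), ("five.html", "one.html"),
    ("five.html", "two.html"), ("five.html", "four.html"),
]
rows = [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
    ("two.html", "Bogota", 3), ("two.html", "Beijing", 2), ("two.html", "Paris", 1),
    ("three.html", "Chicago", 3), ("three.html", "Beijing", 6),
    ("four.html", "Chicago", 3), ("four.html", "Paris", 2), ("four.html", "Nairobi", 5),
    ("five.html", "Nairobi", 7), ("five.html", "Bogota", 2),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Hyperlinks (Url text, Link text)")  # column types assumed
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Hyperlinks VALUES (?, ?)", links)
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

# The four answers from the exercise slides, keyed by exercise number
answers = {
    1: "SELECT DISTINCT Url FROM Hyperlinks WHERE Link = 'four.html'",
    2: "SELECT DISTINCT Link FROM Hyperlinks WHERE Url = 'four.html'",
    3: "SELECT Url, Word FROM Keywords WHERE Freq = 3",
    4: "SELECT * FROM Keywords WHERE Freq BETWEEN 3 AND 5",
}

results = {}
for number, query in answers.items():
    cur.execute(query)
    results[number] = cur.fetchall()
    print(number, results[number])
con.close()
```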
SQL built-in functions
SQL includes built-in math functions such as COUNT() and SUM().
“How many pages contain the word Paris?”
SELECT COUNT(*) FROM Keywords WHERE Word = 'Paris'
Result table
3
There are 3 web pages that mention Paris.
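A sketch of the COUNT() query run from Python is shown below. Since the result table holds a single value, the sketch reads it with the Cursor method fetchone(); the in-memory database and the variable names are assumptions.

```python
import sqlite3

# Rows of the Keywords table: (Url, Word, Freq)
rows = [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
    ("two.html", "Bogota", 3), ("two.html", "Beijing", 2), ("two.html", "Paris", 1),
    ("three.html", "Chicago", 3), ("three.html", "Beijing", 6),
    ("four.html", "Chicago", 3), ("four.html", "Paris", 2), ("four.html", "Nairobi", 5),
    ("five.html", "Nairobi", 7), ("five.html", "Bogota", 2),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

# COUNT(*) counts the records that satisfy the WHERE condition
cur.execute("SELECT COUNT(*) FROM Keywords WHERE Word = 'Paris'")
page_count = cur.fetchone()[0]  # the single record is a 1-tuple
print(page_count)
con.close()
```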
SQL built-in functions
SQL includes built-in math functions such as COUNT(), SUM() and AVG().
SELECT SUM(Freq) FROM Keywords WHERE Word = 'Paris'
Result table
8
There are a total of 8 occurrences of ‘Paris’ on these web pages.
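SUM() and AVG() can be requested in the same query. The following sketch assumes an in-memory database and illustrative variable names; the AVG() query is an addition for illustration and does not appear on the slide.

```python
import sqlite3

# Rows of the Keywords table: (Url, Word, Freq)
rows = [
    ("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5),
    ("two.html", "Bogota", 3), ("two.html", "Beijing", 2), ("two.html", "Paris", 1),
    ("three.html", "Chicago", 3), ("three.html", "Beijing", 6),
    ("four.html", "Chicago", 3), ("four.html", "Paris", 2), ("four.html", "Nairobi", 5),
    ("five.html", "Nairobi", 7), ("five.html", "Bogota", 2),
]

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

# Both functions are computed over the records that mention Paris
cur.execute("SELECT SUM(Freq), AVG(Freq) FROM Keywords WHERE Word = 'Paris'")
total, average = cur.fetchone()
print(total, average)
con.close()
```

The total is 5 + 1 + 2 = 8, and the average is that total divided by the 3 matching records.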
Another example database
weather.db contains two tables:
weatherdata (city text, country text, season int, temperature float)
seasons (name text, number int)

seasons
name    number
winter  1
spring  2
summer  3
fall    4

weatherdata
city    season  temperature
Mumbai  1       24.8
Mumbai  2       28.4
Mumbai  3       27.9
Mumbai  4       27.6
London  1       4.2
London  2       8.3
London  3       15.7
London  4       10.4
Cairo   1       13.6
Cairo   2       20.7
Cairo   3       27.7
Cairo   4       22.2

“What is the average summer temperature in Mumbai?”
SQL queries involving multiple tables
Assume we don’t know the number coding of seasons. Then answering this question requires a lookup in both tables:
• Use seasons to find the number that matches the season name
• Use weatherdata to find the temperature
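The two lookups can be combined in a single query that lists both tables and matches weatherdata.season against seasons.number. The slides do not show this query, so the sketch below is one possible formulation, not necessarily the intended answer. It assumes an in-memory database and leaves out the country column, which the table shown above does not include.

```python
import sqlite3

seasons = [("winter", 1), ("spring", 2), ("summer", 3), ("fall", 4)]
weather = [
    ("Mumbai", 1, 24.8), ("Mumbai", 2, 28.4), ("Mumbai", 3, 27.9), ("Mumbai", 4, 27.6),
    ("London", 1, 4.2), ("London", 2, 8.3), ("London", 3, 15.7), ("London", 4, 10.4),
    ("Cairo", 1, 13.6), ("Cairo", 2, 20.7), ("Cairo", 3, 27.7), ("Cairo", 4, 22.2),
]

con = sqlite3.connect(":memory:")  # assumption: stands in for weather.db
cur = con.cursor()
cur.execute("CREATE TABLE seasons (name text, number int)")
# Assumption: country column omitted because the slide's table has no such data
cur.execute("CREATE TABLE weatherdata (city text, season int, temperature float)")
cur.executemany("INSERT INTO seasons VALUES (?, ?)", seasons)
cur.executemany("INSERT INTO weatherdata VALUES (?, ?, ?)", weather)

# Match each weather record to its season name, then filter by name and city
cur.execute(
    "SELECT AVG(temperature) FROM weatherdata, seasons "
    "WHERE weatherdata.season = seasons.number "
    "AND seasons.name = 'summer' AND weatherdata.city = 'Mumbai'"
)
average_summer = cur.fetchone()[0]
print(average_summer)
con.close()
```

With this data only one record matches (Mumbai, season 3), so the average equals that record's temperature.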
Standard Library module sqlite3
The Python Standard Library includes the module sqlite3, which allows Python programs to access databases.
>>> import sqlite3
>>> con = sqlite3.connect('web.db')
The sqlite3 function connect() takes as input the name of a database file and returns an object of type Connection, a type defined in module sqlite3.
• The Connection object con is associated with database file web.db
• If database file web.db does not exist in the current working directory, a new database file web.db is created
Standard Library module sqlite3
>>> import sqlite3
>>> con = sqlite3.connect('web.db')
>>> cur = con.cursor()
The Connection method cursor() returns an object of type Cursor, another type defined in the module sqlite3.
• Cursor objects are responsible for executing SQL statements
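The types of the two objects can be inspected directly. This sketch connects to the special name ":memory:" instead of web.db, an assumption made so that running it does not create a file.

```python
import sqlite3

# Assumption: ":memory:" gives a temporary database that lives only in RAM
con = sqlite3.connect(":memory:")
cur = con.cursor()

print(type(con))  # the Connection type defined in sqlite3
print(type(cur))  # the Cursor type defined in sqlite3

is_connection = isinstance(con, sqlite3.Connection)
is_cursor = isinstance(cur, sqlite3.Cursor)
con.close()
```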
Standard Library module sqlite3
The Python Standard Library module sqlite3 provides an API for accessing database files.
• It is an interface to a library of functions that access the database files directly
The Cursor class supports the method execute(), which takes an SQL statement as a string and executes it.
>>> import sqlite3
>>> con = sqlite3.connect('web.db')
>>> cur = con.cursor()
>>> cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
<sqlite3.Cursor object at 0x100575730>
>>> cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
<sqlite3.Cursor object at 0x100575730>
The values in this INSERT statement are hardcoded.
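The sketch below runs the same two statements against a real file and then reopens the file to confirm that the record was stored. The temporary folder, the call to commit(), and the variable names are additions for illustration; commit() is what saves pending changes to the database file.

```python
import os
import sqlite3
import tempfile

# Assumption: work in a temporary folder so no web.db is left behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "web.db")
    existed_before = os.path.exists(path)

    con = sqlite3.connect(path)  # creates the file because it does not exist
    cur = con.cursor()
    cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
    cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
    con.commit()  # save the pending changes to the file
    con.close()

    exists_after = os.path.exists(path)

    # Reopen the file and read the record back
    con = sqlite3.connect(path)
    cur = con.cursor()
    cur.execute("SELECT * FROM Keywords")
    stored = cur.fetchall()
    con.close()

print(existed_before, exists_after, stored)
```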
Parameter substitution
In general, the values used in an SQL statement will not be hardcoded in the program but will come from Python variables.
>>> cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
<sqlite3.Cursor object at 0x100575730>
>>> url, word, freq = 'one.html', 'Paris', 5
>>>
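The session above stops before the values are used. One way to pass Python variables to an SQL statement with sqlite3 is the ? placeholder: each ? in the statement is filled in from a tuple given as the second argument to execute(). The sketch below assumes an in-memory database and is not necessarily the statement that followed on the slide.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")

url, word, freq = 'one.html', 'Paris', 5

# Each ? is replaced, in order, by a value from the tuple
cur.execute("INSERT INTO Keywords VALUES (?, ?, ?)", (url, word, freq))

# Placeholders work in queries too; note the one-element tuple (word,)
cur.execute("SELECT * FROM Keywords WHERE Word = ?", (word,))
inserted = cur.fetchall()
print(inserted)
con.close()
```

Letting sqlite3 fill in the values avoids building the statement by hand with string formatting, which is error-prone when values contain quotes.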
Querying a database
>>> import sqlite3
>>> con = sqlite3.connect('links.db')
>>> cur = con.cursor()
>>> cur.execute('SELECT * FROM Keywords')
<sqlite3.Cursor object at 0x102686960>
>>> cur.fetchall()
[('one.html', 'Beijing', 3), ('one.html', 'Paris', 5), ('one.html', 'Chicago', 5),
 ('two.html', 'Bogota', 5), ('two.html', 'Beijing', 2), ('two.html', 'Paris', 1),
 ('three.html', 'Chicago', 3), ('three.html', 'Beijing', 6), ('four.html', 'Chicago', 3),
 ('four.html', 'Paris', 2), ('four.html', 'Nairobi', 5), ('five.html', 'Nairobi', 7),
 ('five.html', 'Bogota', 2)]
>>>
The result of a query is stored in the Cursor object.
To obtain the result as a list of tuple objects, use the Cursor method fetchall().
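A small self-contained sketch of fetchall() follows. It assumes an in-memory database holding only the three one.html records, so that it can run without links.db.

```python
import sqlite3

# Assumption: a three-record sample of the Keywords table
rows = [("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5)]

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

cur.execute("SELECT * FROM Keywords")
records = cur.fetchall()   # list of tuples, one tuple per record
leftover = cur.fetchall()  # nothing is left to fetch after the first call

print(records)
print(leftover)
con.close()
```

A second call to fetchall() returns an empty list because the records have already been retrieved from the Cursor object.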
Querying a database
>>> cur.execute('SELECT * FROM Keywords')
<sqlite3.Cursor object at 0x102686960>
>>> for record in cur:
...     print(record)
...
('one.html', 'Beijing', 3)
('one.html', 'Paris', 5)
('one.html', 'Chicago', 5)
('two.html', 'Bogota', 5)
('two.html', 'Beijing', 2)
('two.html', 'Paris', 1)
('three.html', 'Chicago', 3)
('three.html', 'Beijing', 6)
('four.html', 'Chicago', 3)
('four.html', 'Paris', 2)
('four.html', 'Nairobi', 5)
('five.html', 'Nairobi', 7)
('five.html', 'Bogota', 2)
>>>
An alternative is to iterate over the Cursor object.
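Since each record is a tuple, the loop can unpack it into separate variables. The sketch below assumes an in-memory database with a three-record sample of the Keywords table; the message format is illustrative.

```python
import sqlite3

# Assumption: a three-record sample of the Keywords table
rows = [("one.html", "Beijing", 3), ("one.html", "Paris", 5), ("one.html", "Chicago", 5)]

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", rows)

cur.execute("SELECT * FROM Keywords")
lines = []
# Iterating over the cursor yields one record (a tuple) at a time
for url, word, freq in cur:
    lines.append(f"{word} appears {freq} times in {url}")

for line in lines:
    print(line)
con.close()
```

Iterating fetches records one by one, so the whole result never has to be held in a list at once.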
Exercises
In week10exercisesstart.py