LIS654 lecture 1 Introduction to the course, the ssh protocol Thomas Krichel 2012-09-07
Feb 20, 2016
what's up doc?
• This lecture is to introduce the topic of digital libraries.
• First part: we look at the nature of digital libraries. This part is informed by the first chapter of Arms' book.
• Second part: we talk ssh. A sorry excuse to play with computers.
first part contents
• digital libraries
• digital librarianship
• a course on digital libraries
• with the aim of training digital librarians
digital libraries
• Generally, we can think of digital libraries as
  – information stored on a computer
  – delivered via a network
  – mimicking existing libraries
• As Arms puts it “a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network”.
prospects
• We are at the start of digital libraries.
• The problem is that the technology is still expensive, the cost is still coming down.
• The opportunity is that we can build pioneering systems now, that will have a lasting social impact.
example
• ISI journal citation report is based on two years of data of citations to journals.
• When Eugene Garfield founded it, he published the report in the second year of getting data.
• For the next issue, he chose the same horizon of data.
• Citation rankings of journals still use 2 years, almost 50 years after.
benefits: availability
• Digital libraries bring the information closer to the user than physical libraries can
  – physically
  – temporally
• Even when you are in the physical library you still get faster access to digital library items.
benefits: findability
• Information can be more easily found in digital than in print.
• Some non-textual information is still only findable via metadata.
• But computer scientists are working on that.
benefits: sharing
• Information can be shared.
• Items cannot be damaged.
• Items cannot be stolen.
benefits: updating
• Information can be kept up-to-date more easily.
• To update a book, you have to reprint all copies, and replace them.
benefits: new media
• Information can be created and manipulated in completely new ways.
• For example location information can be mixed up with subject information.
issue: costs
• The cost of storing print information is very high. It is a multiple of acquisition costs.
• Digital storage devices decline in price.
• But digital information manipulation requires skills that are not easy to procure.
• The overall cost comparison is difficult to assess.
drawback: preservation
• Preserving information is easy on paper.
• Preserving digital information looks very hard.
• We will not look at this issue in the course, because there is a specialized Palmer School course dealing with this.
drawbacks: monopoly dangers
• Since the information only needs to be kept in one copy, and others can access it, there are inherent dangers of the build-up of monopolies.
• One example is the Google search engine.
drawbacks: free information
• Since the information is easier to copy, it is harder to police illegal sharing.
• Some creators and intermediaries are feeling the pinch.
• The newspaper industry is one.
• Physical libraries are one potential victim.
drawbacks: professional upheaval
• Digital librarianship is, as yet, largely undefined.
• This leads me to the next topic.
digital librarianship
• Librarianship has always been a bicephalous occupation.
• Libraries always have a collection aspect and a service aspect to them.
• Digital libraries are no different.
collection aspect
• The collection has to be managed and organized.
• The organizers deal with dead matter, documents.
• This organization is a scientific activity.
• Librarianship is a natural science.
• The librarian is a cataloger in a corner.
service aspect
• Users have to be shown how the library works.
• Librarians have to understand users’ needs to build services users want.
• All these are social activities.
• Librarianship is a social science.
• The librarian is a people service person.
digital information was hard to use
• Computers had to be driven by esoteric commands.
• Screens were hard to read from.
• Telephone lines were hard to get to work to transmit information.
• Access costs to digital information were high.
• The service aspect was important.
digital information is becoming easier
• Computers are more and more easy to use.
• Digital information providers tend to communicate directly with customers, bypassing libraries.
• Subject literacy becomes relatively more important than information literacy.
• The service aspect is being reduced over time.
an important caveat
• Most items in the modern (19th, 20th century) library are mass-produced.
• There is no mass production or mass storage in the digital library.
• The difference between publishers, archives and libraries becomes very blurred.
a course on digital libraries?
• My initial thought is that a course on digital libraries is nonsensical.
• In the near future, all libraries will be digital.
digital libraries course
• Literacy and use of digital media.
• The idea is to look at what digital libraries exist and how to use them.
• This is really already done in LIS511.
• The course has the “building” theme to it.
building aspect
• Building a digital library can basically take three forms
  – electronic resource management
  – repository building
  – cross-repository services
electronic resource management
• Libraries license digital contents from providers and make them available.
• There are some minor technical issues
  – authentication
  – integration with ILS
• legal issues with the licensing
• minor training issues with users
repository building
• Libraries are building repositories of local digital or digitized contents.
• This is firmly on the technical side.
• It is the main focus of the LIS654 course as it has been developed in the past.
• We cover digitization as part of repository building.
cross-repository services
• I think of repositories as publishers, rather than libraries.
• Digital libraries are cross-repository datasets and services attached to them.
• This is where I have done almost all my work.
• It cannot be done without custom computer programming.
course syllabus
• It draws on Brian Hoffman’s syllabus for his Manhattan, Spring 2011 section.
• It is quite non-technical. I will tune up the technology over the years.
• One can argue that without computer programming, one cannot be a digital librarian.
• But most digital libraries fail because of non-technical issues.
my expertise
• My main expertise is in setting up completely new open-access digital library services and collections.
• In non-technical terms, I can discuss how to set up these services and how they run.
• But I am reluctant to appear like a self-promoting pompous git.
the wider environment
• Since 2008 I have been trying to build a special digital information concentration in the Palmer MSLIS program.
• The current version is at http://openlib.org/home/krichel/proposals/wic.html.
• The LIS654 course is not part of the proposed concentration.
second part: ssh
• a bit about the computer
• the Internet
• ssh and our host
• a brief discussion of the operating system of the host
a few words about a computer
• To use a computer you need to know something about its operating system (o/s).
• The o/s sets out how the computer behaves.
• There are two o/s flavors that are widely used
  – MS Windows (in various versions)
  – UNIX-like operating systems
• Our server runs an o/s called “Debian GNU/Linux”
some generalities about Debian
• Debian is an open-source computer operating system developed and maintained by a large group of volunteers.
• Debian packages together a very large set of pieces of software into a coherent system.
• It provides a version of the UNIX operating system using Linux.
• The following notes hold for all (?) Unix flavor operating systems.
files
• Files are continuous chunks of data on disks that are required for software applications.
• Files have names.
• Files have permissions attached to them, discussed in "permissions model".
• Files have times attached to them. Usually the mtime (time of contents modification) is the only one shown.
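A minimal sketch, in Python, of how the name, permissions and mtime of a file can be read. The throwaway temporary file and the helper name describe_file are assumptions made for illustration; they are not part of the course setup.

```python
import os
import stat
import tempfile
import time

def describe_file(path):
    """Return the name, permission string and mtime of a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        # ten characters: file type, then owner, group and world bits
        "permissions": stat.filemode(info.st_mode),
        # time of last contents modification, in seconds since the epoch
        "mtime": info.st_mtime,
    }

# Create a throwaway file so the example does not depend on any real file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"hello")
    example_path = handle.name

description = describe_file(example_path)
print(description["name"], description["permissions"],
      time.ctime(description["mtime"]))
os.remove(example_path)
```

This is roughly the information that a file listing shows for each file.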
directories
• Directories are files that contain other files. Microsoft calls them folders.
• They have names, permissions and times like other files.
• In UNIX, the directory separator is “/”.
• The top directory is “/” on its own.
links
• Links are files that contain the address of other files.
• In MS Windows, links are called shortcuts.
• The times and permissions of links are kept but they are of no importance.
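A small sketch of a link on a Unix-like system, in Python. The file names target.txt and shortcut are made up for the example, and it assumes a system where ordinary users may create symbolic links.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "target.txt")
link = os.path.join(workdir, "shortcut")

with open(target, "w") as handle:
    handle.write("contents of the real file\n")

# The link is a small file that stores the address of the target.
os.symlink(target, link)

stored_address = os.readlink(link)   # what the link itself contains
with open(link) as handle:           # opening the link follows it
    contents_via_link = handle.read()

print(os.path.islink(link), stored_address == target)
```

Reading through the link gives the contents of the target, while os.readlink shows the stored address.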
users and groups
• “root” is the user name of the superuser.
• The superuser has all privileges.
• There are other physical users, i.e. persons using the machine.
• There are users that are virtual, usually created to run a daemon. For example, the web server is run by a user www-data.
• Arbitrary users can be put together in groups.
permission model
• Permissions on files are given
  – to the owner of the file
  – to the group of the file
  – and to the rest of the world
• A group is a grouping of users. Unix allows you to define any number of groups and make users members of them.
• The rest of the world are all other users who have access to the system. That includes www-data!
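The three-way split of permissions can be sketched in Python as follows. The helper name split_permissions and the example mode 640 are assumptions chosen for illustration.

```python
import stat

def split_permissions(mode):
    """Split a file mode into its owner, group and world parts."""
    text = stat.filemode(mode)   # ten characters, file type first
    return {"owner": text[1:4], "group": text[4:7], "world": text[7:10]}

# A regular file that the owner may read and write, the group may read,
# and the rest of the world may not touch.
example = split_permissions(stat.S_IFREG | 0o640)
print(example)   # {'owner': 'rw-', 'group': 'r--', 'world': '---'}
```

Each part uses r for read, w for write and x for execute, with a dash where the permission is not given.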
user name & password
• To work with our server, you need a user name and a password.
• You can choose your user name as a short form of your own name.
• It should be all lowercase and cannot contain spaces.
• Please don’t choose an insecure password.
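A hedged sketch of the two user name rules stated above, in Python. The function name is hypothetical, and the server may well impose further rules that are not covered here.

```python
def is_acceptable_user_name(name):
    """Check the two rules from the slide: all lowercase, no spaces."""
    if not name:
        return False
    has_space = any(character.isspace() for character in name)
    return name == name.lower() and not has_space

print(is_acceptable_user_name("krichel"))          # True
print(is_acceptable_user_name("Thomas Krichel"))   # False
```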
the Internet
• The Internet is an interconnected set of physically disparate networks.
• Each computer, when connected to the Internet, has an IP address. It's a four-byte number written as four decimal numbers from 0 to 255 separated by dots. Example: 148.4.2.231.
• Once a computer has an address, it can communicate with others using a protocol known as IP.
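To see that the dotted notation is just a way of writing four bytes, here is a short Python sketch using the standard ipaddress module and the example address from the slide.

```python
import ipaddress

address = ipaddress.IPv4Address("148.4.2.231")

octets = list(address.packed)   # the four bytes, one per dotted number
as_integer = int(address)       # the same four bytes read as one number

print(octets)                   # [148, 4, 2, 231]
print(len(address.packed))      # 4
```

Each of the four numbers fits in one byte, which is why none of them can exceed 255.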
Internet application protocols
• Once we have the Internet, we need protocols to work with it.
• They are called Internet application protocols.
• Their king is the domain name system.
• Two other protocols we will work with are http and ssh.
Domain Name System
• Domain Name System allows us to associate human-friendly names with IP addresses. These names are called domain names.
• Domain names can be leased from domain name registrars.
• A machine with a domain name on the Internet is called a host.
• When we know the domain name of the host, we can communicate with the host.
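Name resolution can be tried from Python with the standard socket module. This is only a sketch: the helper name resolve is made up, and the call on a real domain name is left as a comment because it needs network access and its result can change over time.

```python
import socket

def resolve(name):
    """Return the IPv4 address that the name currently resolves to."""
    return socket.gethostbyname(name)

# An IP address given in place of a name is returned unchanged,
# so this call works without any network access.
loopback = resolve("127.0.0.1")
print(loopback)

# With network access, one could try: resolve("dlib.info")
```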
protocols to communicate with hosts
• There are two protocols we use in this class.
  – We use http to work with the omeka web interface.
  – We use ssh for some special operations.
• Both protocols are client/server protocols.
• You run an ssh or http client on your local machine.
• You communicate with a machine that runs ssh or http server software.
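The client/server idea can be sketched on a single machine, using Python's standard library for both roles. Everything here is an illustration: the handler name HelloHandler and the reply text are made up, and the real server for the course is of course a separate machine.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with a short text."""

    def do_GET(self):
        body = b"hello from the server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

# Start the server software on any free local port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: connect, send a request, read the reply.
connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
connection.request("GET", "/")
response = connection.getresponse()
status = response.status
reply = response.read()
connection.close()

server.shutdown()
server.server_close()
print(status, reply)
```

The server waits for requests and the client initiates them; ssh follows the same division of roles.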
the ssh protocol
• ssh is a protocol that uses public key cryptography to encrypt a stream of communication between client and server.
• This allows us to privately manipulate the server. Our “manipulations” are really just changes to files on the server that contain our web pages.
• The ssh client software we use on the PC is called WinSCP. It is a file transfer program.
our server
• Is a machine called tiu.
• The machine has been given a domain name dlib.info.
• We also say it is a “host” on the Internet.
• It is a rented machine. It costs me money to rent it, about 50 euros a month.
• It runs the testing version of Debian GNU/Linux.
• It runs both http and ssh server software.
• It is maintained by Thomas Krichel.
tiu vs dlib.info
• Tiu is a rented machine. It is a physical box somebody can actually touch.
• dlib.info is a name. Nobody can touch it.
• When the name is resolved into an IP address, that IP address, at this point in time, is the same as tiu’s.
• In the future it may be a different machine.
the web site
• As part of the course, you are being provided with web space on the host dlib.info, at the URL http://dlib.info/home/user where user is a user name that you have chosen.
• This shows a list of available files as prepared by the web server at tiu.
• This is a page that Thomas has prepared for you.
ssh protocol
• The ssh protocol implements a secure connection to the server over which we can
  – send instructions to it
  – store files on it.
• wotan runs an ssh server.
• On your machines, you run ssh client software.
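The two uses of ssh named above can be sketched with the OpenSSH command-line clients ssh and scp. This is an assumption for illustration only: the class uses graphical clients, the helper names are hypothetical, and the Python code below merely builds the commands without running them.

```python
def ssh_command(user, host, instruction):
    """Command that sends one instruction to the server over ssh."""
    return ["ssh", f"{user}@{host}", instruction]

def scp_upload_command(local_path, user, host, remote_path):
    """Command that stores a local file on the server over ssh."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]

# "user" stands for your own user name, as elsewhere in these slides.
list_files = ssh_command("user", "dlib.info", "ls public_html")
upload = scp_upload_command("index.html", "user", "dlib.info", "public_html/")

print(" ".join(list_files))
print(" ".join(upload))
```

On a machine with OpenSSH installed, such a list could be handed to subprocess.run, which would then prompt for the password.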
ssh client software
• On MS Windows machines, we run
  – putty for interactive use
  – WinSCP for file storage and retrieval.
• Usually, students in this class only need to understand WinSCP.
• On the Mac, you can use
  – Cyberduck
  – Fugu
• For interactive use on the Mac use Terminal.
winscp
• In winscp, the client that we use here most of the time, we don't make advanced use of public keys, we simply give a password.
• Note that winscp does not establish a connection to wotan. It simply uses ssh as a means to transfer files.
• When winscp saves a file, it may need to open a new connection and will ask you for the password again. This request may be in a window you can't immediately see.
open a wotan session with winscp
• If you see a list of sessions, click on “new session”.
  – The host name is “dlib.info”.
  – Give your user name.
  – Click on “save”, this will save the session, after “ok”.
• You will be led to the list of saved sessions, double-click to open a session.
• At initial connection, you will be shown a warning message that you can ignore.
• When saving or duplicating files, you may be asked to enter your password again. Watch out for that.
home directory
• When you are connected to tiu and have authenticated as a certain user, you will be shown your home directory.
• On tiu this is /home/user where user is your user name.
• There you see a bunch of files starting with a dot. Leave them alone.
• And you see a bunch of directories.
initial files on tiu
• A directory called public_html. This is your web site. Everything you store there is on the web.
• A set of files starting with a dot. They are greyed out.
• One of them is called .my.cnf. This is an initialization file for your mySQL client. We will not use the client, but we will store the password there.
• The file should be readable and writable by you only, with no access for the group or other users.
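On a Unix-like system, "readable and writable by you only" corresponds to the permission mode 600. A minimal Python sketch follows; it works on a stand-in file in a temporary directory rather than on your real .my.cnf.

```python
import os
import stat
import tempfile

workdir = tempfile.mkdtemp()
config_path = os.path.join(workdir, ".my.cnf")   # stand-in, not the real file
with open(config_path, "w") as handle:
    handle.write("[client]\n")

# Read and write for the owner, nothing for the group or other users.
os.chmod(config_path, 0o600)

mode_text = stat.filemode(os.stat(config_path).st_mode)
print(mode_text)   # -rw------- on a Unix-like system
```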
mySQL
• mySQL is an implementation of relational database software. More about it later.
• It uses its own permission system. That means that it has a separate user/password space.
• By Thomas’ decision, your mySQL username is the same as your tiu user name. But Thomas does not know how to import your tiu password as your mySQL password. It has to be recorded separately.
• We use .my.cnf in your home directory.
initial state of .my.cnf
# on line 4, replace your_password with your chosen mysql password
[client]
user = your_user_name
password = your_password
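The layout of this file is close enough to the INI format that Python's configparser can read this simple case. This is a sketch only: mySQL option files allow constructs that configparser does not understand, and the sample text below is the placeholder content, not a real password.

```python
import configparser

sample = """\
# on line 4, replace your_password with your chosen mysql password
[client]
user = your_user_name
password = your_password
"""

# interpolation=None so that a "%" in a password is taken literally
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(sample)

client = dict(parser["client"])
print(client["user"])
```

The line starting with # is a comment and is skipped; [client] names the section that the two settings belong to.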
web home directory
• The web home directory is /var/www.
• There you see a directory home, with a series of links
  – they have a user name as file name
  – they go to the public_html directory in the user’s home directory
• There you also see a directory omeka, with a series of links
  – they have a user name as file name
  – they go to the user’s omeka directory
web site address
• http://dlib.info/ goes to the /var/www directory. There it shows the file index.html.
• http://dlib.info/home/user goes to the /var/www/home directory, where it finds the link to the public_html directory of the user user.
• In that directory, it will show the file index.html if it exists. Otherwise, it will build an index on the fly.
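The path from web address to file can be sketched as a small Python function. It only imitates the behavior described above with made-up data: the function name served_path, the dictionary of links and the set of existing files are assumptions, and the real web server is configured differently.

```python
import posixpath

WEB_ROOT = "/var/www"

def served_path(url_path, links, existing_files):
    """Return ("file", path) or ("listing", directory) for a URL path."""
    path = posixpath.normpath(posixpath.join(WEB_ROOT, url_path.strip("/")))
    # Follow a link such as /var/www/home/user -> /home/user/public_html
    path = links.get(path, path)
    index = posixpath.join(path, "index.html")
    if index in existing_files:
        return ("file", index)
    return ("listing", path)   # the server builds an index on the fly

links = {"/var/www/home/user": "/home/user/public_html"}
existing_files = {"/var/www/index.html"}

print(served_path("/", links, existing_files))
print(served_path("/home/user", links, existing_files))
```

Once the user stores an index.html in public_html, the second call would return that file instead of a listing.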
backup
• This is more of a technical issue.
• You will need backup. My general prescription would be to run the repository itself with a 3rd party provider.
• Locally, keep a staging (rather than production) server and a backup. They can both be on the same machine.
• All this should be part of the sysadmin course.
common-sensical sysadmin tips
• You need physical security for any server.
• You need to keep the software up-to-date. I do it roughly weekly.
• You need to join the mailing list for the repository software, and the security list for the operating system.
• Use encrypted access to the server whenever authentication is required.
• Run a minimal amount of software.
http://openlib.org/home/krichel
Please shut down the computers when you are done.
Thank you for your attention!