Texts 2: Markup languages, software for manipulating text
László Kálmán¹ Csaba Oravecz¹ Péter Szigetvári²
¹ Research Institute for Linguistics, Hungarian Academy of Sciences
² Department of English Linguistics, Eötvös Loránd University
Lecture 4 / 3 Oct 2007

abstract
this lecture tells you about
• ways of formatting electronic text
• important software for creating and manipulating electronic text
• the features and functions of such software

markup languages 1: *MLs
SGML (Standard Generalized Markup Language; ISO 8879)
a metalanguage used to define specific markup schemes (a system of tags)
HTML (Hypertext Markup Language)
an implementation of SGML, used for web documents
XML (Extensible Markup Language)
a simplified subset of SGML
XHTML (Extensible Hypertext Markup Language)
an implementation of XML, used for web documents
(HTML : SGML = XHTML : XML)
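One practical consequence of the HTML : SGML = XHTML : XML analogy is that XML requires every element to be closed, whereas HTML allows some end tags to be left out. The sketch below illustrates this with Python's standard XML parser; the helper name `is_well_formed` and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML style: the empty element is explicitly closed
xhtml_style = "<p>line one<br/>line two</p>"
# HTML style: <br> has no end tag, which an XML parser rejects
html_style = "<p>line one<br>line two</p>"

print(is_well_formed(xhtml_style))  # True
print(is_well_formed(html_style))   # False
```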
• less elaborate systems used for specific purposes, e.g.,
BBCode (Bulletin Board Code)
used on bulletin boards, like the SEAS Forum (btw. have you joined yet?); contains only some formatting (italics, boldface, colour, size), hyperlink tags, and emoticons (smilies)
Wikitext
used on Wiki sites, some formatting, links to other Wiki pages, external links, pictures, maps
• Wikitext is copiously documented in the relevant Wiki pages, BBCode is also usually explained in forum FAQs
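To give an idea of how simple such a markup scheme is, here is a minimal sketch that turns a few BBCode tags into HTML. It assumes the commonly seen `[b]`, `[i]` and `[url=...]` tags (forum dialects differ), the function name `bbcode_to_html` is hypothetical, and no escaping of special characters is attempted.

```python
import re

# BBCode tag name -> HTML element name (assumed subset of a typical dialect)
SIMPLE_TAGS = {"b": "strong", "i": "em"}


def bbcode_to_html(text: str) -> str:
    """Convert [b], [i] and [url=...] BBCode tags to HTML."""
    for bb, html in SIMPLE_TAGS.items():
        text = re.sub(
            rf"\[{bb}\](.*?)\[/{bb}\]",
            rf"<{html}>\1</{html}>",
            text,
            flags=re.DOTALL,
        )
    # hyperlink tag: [url=ADDRESS]label[/url]
    text = re.sub(
        r"\[url=([^\]]+)\](.*?)\[/url\]",
        r'<a href="\1">\2</a>',
        text,
        flags=re.DOTALL,
    )
    return text


print(bbcode_to_html("[b]bold[/b] and [i]italic[/i]"))
# <strong>bold</strong> and <em>italic</em>
```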
Microsoft’s proprietary platform-independent document format, human readable, but rarely edited directly
PostScript
a page description and programming language, the de facto standard for printing; human readable, editable
PDF (Portable Document Format)
Adobe’s proprietary document format, based on PostScript, encoding the exact look of the document; the most widespread format of publishing heavily formatted documents on the web; usually non-human-readable, compressed
N.B. words like TeX are used ambiguously: both for the markup language and for the text formatting programme; this ambiguity does not normally cause any misunderstanding
• opening/reading/retrieving a file: copying (part of) a file into the memory (this part of the memory is called the buffer), and usually displaying (part of) it on the screen, so that it can be read or modified by the user
• opening a new file: presenting an empty buffer so that a file can be created from scratch
• saving/writing a file: writing the contents of the buffer to the disk (this usually destroys the original file, but see VERSION CONTROL on week 13)
• saving a file as: writing the contents of the buffer to the disk with a file name different from the original, or in a format different from the original (or what the editor defaults to)
• auto(matic) saving: regular automatic saving of the contents of the buffer to minimize data loss in case of power failure
• recovering a file: restoring the contents of an unsaved file from the automatically saved version
• quitting: in some editors quitting leaves the original files intact, i.e., all the changes you made in the session are lost (except if there was autosaving during the session)
• exiting: modern editors usually ask if unsaved buffers ought to be written to the disk; sometimes this does not happen if you shut down the computer: to be on the safe side you had always better save buffers manually and perhaps close the editor before shutting down the computer
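The file operations above can be sketched in a few lines. This is only an illustration of the concepts, not how any particular editor works: the class `Buffer`, its method names, and the `.autosave` file suffix are all assumptions made for the example.

```python
import os
import tempfile


class Buffer:
    """In-memory copy of a file's text, roughly as an editor might hold it."""

    def __init__(self, path=None):
        self.path = path   # None models "opening a new file"
        self.text = ""
        if path is not None and os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.text = f.read()   # opening: copy the file into memory

    def _write(self, path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.text)

    def save(self):
        if self.path is None:
            raise ValueError("no file name yet; use save_as")
        self._write(self.path)   # overwrites the original file

    def save_as(self, new_path):
        self.path = new_path
        self.save()

    def autosave(self):
        # hypothetical naming scheme for the automatically saved copy
        self._write(self.path + ".autosave")

    def recover(self):
        with open(self.path + ".autosave", encoding="utf-8") as f:
            self.text = f.read()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")
    buf = Buffer(path)      # the file does not exist yet: empty buffer
    buf.text = "first draft"
    buf.autosave()          # the file itself is still not written
    fresh = Buffer(path)    # simulates restarting after a power failure
    fresh.recover()
    print(fresh.text)       # first draft
```

Note that editing only changes `buf.text`; nothing reaches the disk until `save`, `save_as` or `autosave` is called, which is exactly why quitting without saving loses the session's changes.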
concepts 2
• cursor: an underscore (_), vertical line (|), or rectangular box (█), which indicates the point where text will be entered if you begin to type; it may blink; often its shape is different depending on input mode (insert or overwrite)
• insert mode: typed text will be inserted, pushing out characters to the right (left in right-to-left scripts)
• overwrite mode: typed text will overwrite characters to the right (left in right-to-left scripts)
• mark: another point in the text; the region between the cursor and the mark is selected for some operation
Figure: cursor in Open Office Writer in insert mode
Figure: cursor in Open Office Writer in overwrite mode
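The difference between the two input modes, and the notion of a region, can be shown on a plain string. This is a minimal sketch for left-to-right text; the function names `type_text` and `region` are invented for the example, and the cursor and mark are simply character offsets.

```python
def type_text(buffer: str, cursor: int, typed: str, overwrite: bool = False):
    """Return the new buffer and cursor position after typing at the cursor."""
    if overwrite:
        # typed characters replace the same number of characters to the right
        rest = buffer[cursor + len(typed):]
    else:
        # typed characters push the existing characters to the right
        rest = buffer[cursor:]
    return buffer[:cursor] + typed + rest, cursor + len(typed)


def region(buffer: str, cursor: int, mark: int) -> str:
    """Return the text between cursor and mark, whichever comes first."""
    start, end = sorted((cursor, mark))
    return buffer[start:end]


print(type_text("hello world", 6, "W"))                  # ('hello Wworld', 7)
print(type_text("hello world", 6, "W", overwrite=True))  # ('hello World', 7)
print(region("hello world", 11, 6))                      # world
```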
• cutting/killing:¹ removing the selected region from the text and putting it on the clipboard/kill ring
• copying: copying the selected region to the clipboard/kill ring
• pasting/yanking: copying the contents of the clipboard/kill ring into the buffer
¹ MS dialect/Emacs dialect on this page
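These three operations can be sketched as functions on a string. The names `cut`, `copy` and `paste` are chosen for the example, and the clipboard is modelled as a single string; the Emacs kill ring additionally remembers several earlier kills, which this sketch does not attempt to show.

```python
def cut(buffer: str, start: int, end: int):
    """Remove the region and return (new buffer, clipboard contents)."""
    return buffer[:start] + buffer[end:], buffer[start:end]


def copy(buffer: str, start: int, end: int):
    """Leave the buffer unchanged and return (buffer, clipboard contents)."""
    return buffer, buffer[start:end]


def paste(buffer: str, position: int, clipboard: str) -> str:
    """Insert the clipboard contents at the given position."""
    return buffer[:position] + clipboard + buffer[position:]


text, clip = cut("one two three", 4, 8)
print(repr(text), repr(clip))   # 'one three' 'two '
print(paste(text, 0, clip))     # two one three
```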
concepts 4
• find/search: looking for a given pattern in the buffer
• overwrapped search: looking for occurrences of the pattern from the beginning of the file after the end of the file has been reached (or from the end in the case of reverse/backward searching)
• incremental search: looking for a given pattern on the fly, i.e., the match is updated as each character of the pattern is typed
• replace: removing the first given pattern from the buffer and inserting the second given pattern in its place
regular expressions
offer a very powerful tool for replacing patterns (more on them on week 10)
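The search and replace concepts can be tried out with ordinary string and regular expression functions. The sketch below is only meant as an illustration: the function names `find_wrapped` and `incremental_search` are invented, forward searching is assumed, and the pattern `colou?r` is just a sample regular expression matching both spellings of one word.

```python
import re


def find_wrapped(text: str, pattern: str, start: int) -> int:
    """Search forward from start; if nothing is found, wrap to the beginning."""
    position = text.find(pattern, start)
    if position == -1:
        position = text.find(pattern)   # start again from the top of the buffer
    return position


def incremental_search(text: str, typed: str):
    """Return the first match position after each typed character."""
    return [text.find(typed[:n]) for n in range(1, len(typed) + 1)]


print(find_wrapped("abc xyz", "abc", 1))          # 0
print(incremental_search("a plain language", "lan"))  # [3, 3, 8]

# replace with a regular expression: every match, or only the first one
print(re.sub(r"colou?r", "hue", "color, colour"))           # hue, hue
print(re.sub(r"colou?r", "hue", "color, colour", count=1))  # hue, colour
```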
When a monospaced font is used, there is a way to justify text without inserting
extra spaces. Careful word choice allows the author to write with exactly eighty
characters per line, creating a visual effect of justification. Since many words
in English mean the same thing but are different lengths, it is just a matter of
trial and error to find the proper line length. For extra points, you should end
the last line after eighty characters as well, creating an invincible paragraph.
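Whether a block of monospaced text is "justified" in this sense is easy to check mechanically: every line must have the same number of characters. A minimal sketch, with the hypothetical helper `is_block_justified` and the first two lines of the paragraph as sample data:

```python
def is_block_justified(lines, width=80):
    """Return True if every line is exactly `width` characters long."""
    return all(len(line) == width for line in lines)


sample = [
    "When a monospaced font is used, there is a way to justify text without inserting",
    "extra spaces. Careful word choice allows the author to write with exactly eighty",
]

print([len(line) for line in sample])   # [80, 80]
print(is_block_justified(sample))       # True
```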
points of hyphenation are calculated at the end of each line, they do not change later on, occasionally yielding very loose lines; paragraph-based hyphenation would burden the system too much, and would result in constantly flickering characters while text is entered
hyphenation: the TeX way
points of hyphenation are calculated for a whole paragraph, and recalculated several times, until the optimal solution is achieved (this is possible, because the calculation does not take place during the editing of the text)
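The contrast between deciding line by line and optimizing over a whole paragraph can be illustrated with a much simplified model. The sketch below ignores hyphenation and stretchable spaces altogether and is not TeX's actual algorithm: it only compares a greedy line breaker with one that minimizes the sum of squared unused space over all lines but the last. The function names, the cost function, and the sample words are assumptions made for the example; every word is assumed to fit within the line width.

```python
def greedy_breaks(words, width):
    """Fill each line as far as possible, never revisiting earlier lines."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def optimal_breaks(words, width):
    """Choose breaks for the whole paragraph by dynamic programming."""
    n = len(words)
    best = [0.0] + [float("inf")] * n   # best[j]: least cost of setting words[:j]
    prev = [0] * (n + 1)                # prev[j]: start index of the last line
    for j in range(1, n + 1):
        length = -1
        for i in range(j - 1, -1, -1):  # candidate last line: words[i:j]
            length += len(words[i]) + 1
            if length > width:
                break
            # unused space is penalized quadratically, except on the last line
            cost = 0 if j == n else (width - length) ** 2
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    lines, j = [], n
    while j > 0:
        lines.append(" ".join(words[prev[j]:j]))
        j = prev[j]
    return lines[::-1]


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))    # ['aaa bb', 'cc', 'ddddd']
print(optimal_breaks(words, 6))   # ['aaa', 'bb cc', 'ddddd']
```

The greedy version leaves the second line very loose, while the paragraph-based version accepts a slightly shorter first line to even things out.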
• data about previous edits in the document
you can lose your job if you are unwary, e.g.,
Wrong version
Gabriella Dobos, head of the press department of the Budapest Chief Prosecutor's Office, has been dismissed, and she has also been relieved of her post as spokesperson. [...]
At the same time, a mistake was made for which someone is responsible, since shortening the indictment and preparing the shortened version that does not infringe on personal data was the task of department head Gabriella Dobos. As the prosecution's inquiry found, Dobos did indeed shorten the indictment, but she only marked the parts to be deleted, and sent the shortened version to the Office of the Prosecutor General without the data having been permanently deleted from it.
Thus, for an hour, anyone could complete the version posted on the internet (by pressing certain keys) with the parts struck out of the full version. In this way anyone could also access the personal data, which violates the data protection act.