SciELO Methodology
8/11/2019 Scielo Metodologia En
http://slidepdf.com/reader/full/scielo-metodologia-en 1/30
Methodology
• SciELO PC Programs (Windows/Visual Basic/VBA Word)
– Server programs: Title Manager, Code Manager, Converter/Parser, XML SciELO
– Workstation program: Markup/Parser
– Located on: c:\scielo\bin\
• SciELO Processing (.bat, .sh, java)
– GeraPadrao
– Programs to export data
– Bibliometrics
– Etc.
– Located on: c:\home\scielo\www\proc or c:\scielo\web\proc
• SciELO Web (Apache, PHP, WWWISIS)
– Located on: c:\home\scielo\www or c:\scielo\web
Computers
Local server (Windows):
• Title Manager
• Code Manager
• Converter
• XML SciELO
• Markup/Parser
• Local web site
• File storage (img, pdf, html, etc.)
1 or more workstations (Windows):
• Markup/Parser
• Microsoft Word
Linux server:
• Processing
• Homologation web site
• Production web site
Obs.: Each of these functions can run on one or more Linux servers.
Files structure in the local server (Windows)
The programs are in: c:\scielo\bin\
The data are in: c:\scielo\serial
The data of each journal are in: c:\scielo\serial\<journal_acronym>
The data of each journal issue are in: c:\scielo\serial\<journal_acronym>\v<VOL>n<NUM>
Before using the programs, check that all files are in the correct structure.
Under the volume and number folder, the following directories must be created:
Body
Contains all articles of an issue, each article in its own file, named in the correct way.
Example: c:\SciELO\serial\sajs\v105n7-8\body\a01v29n1.html
Img
Contains all images, figures, graphics, etc., named in the correct way.
Example: c:\SciELO\serial\sajs\v105n7-8\img\a01fig01.gif
Markup
Contains the articles to be marked. The files from the body folder should be copied and pasted into this folder.
Example: c:\SciELO\serial\sajs\v105n7-8\markup\a01v29n1.html
Pdf
Contains the PDF files, which must be named in the same way as the HTML files.
Example: c:\SciELO\serial\sajs\v105n7-8\pdf\a01v29n1.pdf
Source
Contains the original files (final version), without any sort of last-minute modifications or adjustments.
Example: c:\SciELO\serial\sajs\v105n7-8\source\editorial.pm6
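The structure check asked for before using the programs could be automated along the following lines. This is only a sketch: the folder names come from the slides, but the function name check_issue_folder and the rule that every article in body needs a PDF with the same base name are illustrative assumptions, not part of the SciELO programs.

```python
from pathlib import Path

# Folder names taken from the slides; the checks are an illustrative assumption.
REQUIRED_FOLDERS = ("body", "img", "markup", "pdf", "source")


def check_issue_folder(issue_dir):
    """Return a list of human-readable problems found in one issue folder."""
    issue_dir = Path(issue_dir)
    problems = []
    # Every required sub-folder must exist under the v<VOL>n<NUM> folder.
    for name in REQUIRED_FOLDERS:
        if not (issue_dir / name).is_dir():
            problems.append(f"missing folder: {name}")
    body = issue_dir / "body"
    pdf = issue_dir / "pdf"
    if body.is_dir() and pdf.is_dir():
        # PDF files must be named in the same way as the HTML files.
        pdf_stems = {p.stem.lower() for p in pdf.glob("*.pdf")}
        for article in sorted(body.glob("*.htm*")):
            if article.stem.lower() not in pdf_stems:
                problems.append(f"no PDF named like {article.name}")
    return problems
```

Calling check_issue_folder(r"c:\scielo\serial\sajs\v105n7-8") returns an empty list when the five folders exist and every article has a matching PDF.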
Files structure in the local server (Windows)
[Figure: structure of the journal's folders directory]
SciELO PC Programs
Components
• SciELO PC programs are accessed from the Program Files menu
SciELO PC Programs: Local server
• Title Manager:
– program in Visual Basic
– to manage the databases title (journals), section (tables of contents of the journals) and issue (issues)
– located in the local server: c:\scielo\bin\config
• Code Manager:
– program in Visual Basic
– to manage the tables of codes (language, country, etc.)
– located in the local server: c:\scielo\bin\codes
• Converter:
– program in Visual Basic
– to convert the marked-up documents into ISIS databases
– located in the local server: c:\scielo\bin\converter
• XML SciELO:
– program in batch (.bat)
– to generate the XML to export to ISI and PubMed; it can be modified to generate XML for other databases
– located in the local server: c:\scielo\xml_scielo
SciELO PC Programs: Workstation
• Markup:
– program in VBA Word
– to guide the identification of the elements of the article
– located in the local server and WORKSTATION: c:\scielo\bin\markup
• SGML Parser:
– program in VB and C
– to parse the markup of the documents
– located in the local server and WORKSTATION: c:\scielo\bin\sgmlpars
Files structure of web site
• Web site files
– www/bases
• translation
• artigo
• issue
• title
• etc.
– www/htdocs
• img
• revistas
– www/cgi-bin
• Wxis
• IsisScript
• Processing files
– www/proc
– www/bases-work
Workflow in the local server
[Diagram: files reception; files preparation (doc/html -> .html); Title Manager (title, section, issue); Code Manager (code, newcode); Markup and Parser in MS-Word (issue.mds, marked files); Converter (issue of a journal, v<VOL>n<NUM>, located on serial); corrections; local GeraPadrao (scilist; artigo, etc.); local web site]
Workflow in the local server
• PEOPLE receive the files of the issues (.html, images, pdf, etc.)
• PEOPLE prepare and archive them in serial/<journal_acronym>/v<VOL>n<NUM>/ in the folders markup, body, img and pdf, with one .html file per article. At this point, the markup and body folders have the same content.
• In parallel, the issue’s data are registered using Title Manager/Create new issue
• When an unregistered journal title arrives, it must be registered using Title Manager/Create new title
• After the issue’s data are registered, Title Manager generates input files for the Markup and Converter programs in their own folders on the computer where Title Manager is running (bin\markup and bin\convert). The files in bin\markup must therefore be copied to the other computers where Markup runs
• The Markup program is used to identify the bibliographic elements of the articles/texts located in the markup folder (serial/<journal_acronym>/v<VOL>n<NUM>/markup)
• The Parser program is used to validate the files processed by the Markup program
• The Converter program reads the files located in the markup and body folders of an issue (serial/<journal_acronym>/v<VOL>n<NUM>) and then generates its database
• All the databases generated by the Converter program are used by GeraPadrao to create the database of the web site. The images, pdf, etc. have to be copied to the corresponding folders to be accessed by the web site.
• Code Manager is rarely used. It manages the tables of codes used by SciELO.
• Whenever mistakes are found, it is possible to go back, correct the data and redo the process
• Finally, the script EnviaBasesScieloPadrao.bat sends the databases to a server to be processed
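The manual preparation step (archiving the files so that markup and body start with the same content) can be sketched as a small helper. The function name prepare_issue and its parameters are hypothetical and only illustrate the step described above; the slides describe this as manual work, not as an existing SciELO tool.

```python
import shutil
from pathlib import Path


def prepare_issue(serial_dir, acronym, issue_label):
    """Create the issue folders and copy body/*.html into markup/.

    Files already present in markup/ are left untouched so that markup
    work in progress is never overwritten. Returns the names copied.
    """
    issue_dir = Path(serial_dir) / acronym / issue_label
    # Folders named in the workflow slide.
    for name in ("markup", "body", "img", "pdf"):
        (issue_dir / name).mkdir(parents=True, exist_ok=True)
    copied = []
    for article in sorted((issue_dir / "body").glob("*.html")):
        target = issue_dir / "markup" / article.name
        if not target.exists():
            shutil.copy2(article, target)
            copied.append(article.name)
    return copied
```

For example, prepare_issue(r"c:\scielo\serial", "sajs", "v105n7-8") would be run after the article files have been placed in the body folder.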
Workflow: transferring data from local to processing area
[Diagram: issue of a journal (v<VOL>n<NUM>, located on serial); code, newcode, title and newissue (located on bases, resulting from the local GeraPadrao); scilist; EnviaBasesSciELOPadrao.bat (local: proc/temp/transf2linux); FTP; processing server]
Configuration of the processing to send data and files to the processing area
(local server -> processing server)
1) Configure the files in C:\scielo\web\proc\transfer\ or C:\home\scielo\www\proc\transfer:
EnviaBasesLogOn.txt
EnviaImgPdfLogOn.txt
EnviaTranslationLogOn.txt
Logon data:
open <server_name>
<user_name>
<password>
cd <path_www>
2) Execute in C:\scielo\web\proc\ or c:\home\scielo\www\proc\:
EnviaBasesSciELOPadrao.bat: sends the databases from Windows to Linux
EnviaImgPdfSciELOPadrao.bat: sends the img and pdf files from Windows to Linux
EnviaTranslationSciELOPadrao.bat: sends the translations from Windows to Linux
3) Execute GeraPadrao.bat (on the processing server)
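The logon data listed on this slide (open, user name, password, cd) look like the opening lines of an FTP command script. Assuming that each LogOn.txt file holds exactly those four lines in that order, which the slides suggest but do not state, the content could be generated as below. The function name logon_script and the sample values are hypothetical.

```python
def logon_script(server_name, user_name, password, path_www):
    """Build the four logon lines shown on the slide, in the same order."""
    lines = [
        f"open {server_name}",  # connect to the processing server
        user_name,              # answered at the user prompt
        password,               # answered at the password prompt
        f"cd {path_www}",       # move to the web site path on the server
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values are placeholders, not real SciELO settings.
    print(logon_script("processing.example.org", "scielo", "secret", "/home/scielo/www"), end="")
```

Keeping the password in a plain text file is a property of this kind of script; the file should be readable only by the user who runs the transfer.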
Workflow in the Linux server
[Diagram: issue of a journal (v<VOL>n<NUM>, located on serial); code, newcode, title and newissue (located on bases, resulting from the local GeraPadrao); scilist; GeraPadrao.bat; databases artigo, etc. for the web site; homologation web site; copy; production/public web site]
Workflow in the Linux server
• After receiving the databases and files, the GeraPadrao.bat script must be executed to generate the databases for the web site. This is necessary because the Windows and Linux databases have incompatible formats
Processings in Windows
• Xml_scielo: part of PC Programs/server
– PubMed: generates XML for PubMed
– ISI: generates XML for Web of Science
• EnviaBasesScieloPadrao.bat sends databases to the processing server
• EnviaImgPdfScieloPadrao.bat sends img and pdf to the homologation server
• EnviaTranslationScieloPadrao.bat sends translations to the homologation server
Note: the processing, homologation and public servers can be the same
Only for SciELO Brasil
• Health indicators (Brazilian database)
• Curriculum ScienTI / Lattes (Brazilian database, but an adaptation is possible)
• Semantic highlights: a Knewco trial, interrupted
Centralized processings
• SciELO.ORG:
• Bibliometrics
• Links to Medline, LILACS, etc.
• Co-authors: not finished
• Centralized in Brazil
– DOAJ: not ready; an agreement with DOAJ is necessary
– Accesses
Processings in the instance
• scieloUpdate: to update the SciELO web site in Linux
• For data exchange:
– By sending data
• Envia2Medline.bat: feeds scielo.org. Used for: bibliometrics, etc.
• Crossref
– By making data available to harvest
• Google Scholar
• Web services
– By querying
• Scimago: query to http://www.scimagojr.com/journalrank.php
• Databases: related and cited from SciELO.org
In the instance: no processing /
exchanging data
• External services:
– Google Analytics
– OAI
Installation and configuration of SciELO web site and processings in a Linux server
• http://reddes.bvsalud.org/projects/scielo-metodologia/browser/tags/v5.0-pr/docs/SciELO-Web-5.0_installation_guide_en.pdf
Installation and configuration of the local SciELO web site
• http://webdevcodex.com/tutorial-installing-apache2-php5-mysql5-phpmyadmin3-windows-7-vista/
• Local SciELO web site
– Version 3 (php4.3.x): http://reddes.bvsalud.org/projects/scielo-metodologia/browser/branches/scielo-web_3.3
– Version 4 (php4.3.x - php5.2.x): http://reddes.bvsalud.org/projects/scielo-metodologia/browser/tags/
Using VirtualBox to host a Linux server on a Windows machine
• Configuring the network:
– Bridge
– To get an IP, move or rename the file /etc/udev/rules.d/70-persistent-net.rules
• Mapping the database folder in Linux to a folder in Windows, to use the free space in Windows. The space in the VM is limited to 28 GB.
– Linux: /home/scielo/www/bases => Windows: c:\scielo\bases\linux\public
– Linux: /home/scielo_homolog/www/bases => Windows: c:\scielo\bases\linux\homolog\
– VirtualBox:
• Settings: shared folders
• add share:
– C:\scielo\bases\linux\public
– bases
• add share:
– C:\scielo\bases\linux\homolog
– bases_homolog
– /etc/fstab:
• bases -> /home/scielo/www/bases
• bases_homolog -> /home/scielo_homolog/www/bases
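The slide gives only the share-to-mount-point mapping for /etc/fstab, not the full lines. A minimal sketch of what those lines might look like follows, assuming the VirtualBox Guest Additions are installed in the Linux guest so that the vboxsf filesystem type is available. The mount options ("defaults") and the helper name fstab_line are assumptions; a real setup may need uid/gid options so that the web server can write to the folders.

```python
def fstab_line(share_name, mount_point, options="defaults"):
    """Return one /etc/fstab line mounting a VirtualBox shared folder (type vboxsf)."""
    return f"{share_name} {mount_point} vboxsf {options} 0 0"


# Share names and mount points as listed on the slide.
SHARES = {
    "bases": "/home/scielo/www/bases",
    "bases_homolog": "/home/scielo_homolog/www/bases",
}

if __name__ == "__main__":
    for share, mount_point in SHARES.items():
        print(fstab_line(share, mount_point))
```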
Configuration in the server (Linux)
Edit /etc/hosts, including the IP and server name:
127.0.0.1 servername
• Example:
– 127.0.0.1 vm.scielo.br
NOTE: 127.0.0.1 DOES NOT CHANGE
Configuration of the Virtual Host
Edit:
Ubuntu: /etc/apache2/sites-available/scielo
Windows: <Apache_path>/conf/extra/httpd-vhosts.conf
Values to change (highlighted in colour on the slide):
Blue: <admin_email> = the e-mail of the web site administrator
Red: /home/scielo = path of the web site
Green: localscielo = URL of the web site
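The configuration shown on the slide did not survive extraction, so the following is not the SciELO virtual host file. It is a minimal sketch of how the three highlighted values typically appear in an Apache virtual host, using standard Apache directives (ServerAdmin, ServerName, DocumentRoot, ScriptAlias). The template, the function name render_vhost and the use of www/htdocs and www/cgi-bin under the site path are assumptions based on the web site file structure described earlier.

```python
# Assumed minimal template; the real SciELO virtual host file has more settings.
VHOST_TEMPLATE = """<VirtualHost *:80>
    ServerAdmin {admin_email}
    ServerName {server_name}
    DocumentRoot {site_path}/www/htdocs
    ScriptAlias /cgi-bin/ {site_path}/www/cgi-bin/
</VirtualHost>
"""


def render_vhost(admin_email, site_path, server_name):
    """Fill the template with the administrator e-mail, site path and server name."""
    return VHOST_TEMPLATE.format(
        admin_email=admin_email,
        site_path=site_path.rstrip("/"),
        server_name=server_name,
    )


if __name__ == "__main__":
    # The e-mail address is a placeholder, not a real SciELO address.
    print(render_vhost("webmaster@example.org", "/home/scielo", "localscielo"), end="")
```

Generating the homologation file is then a matter of calling render_vhost with /home/scielo_homolog and the homologation server name.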
New Virtual Host
• Create the file for the virtual host in:
– /etc/apache2/sites-available/<name_of_the_virtual_host_file>
– Example: /etc/apache2/sites-available/scielo_homolog
– Copy the scielo virtual host file and edit it to change the configuration as shown in the previous slide:
• Path
• Server name
• After changing the configuration of Apache / virtual hosts, you MUST execute:
sudo /etc/init.d/apache2 reload
Configuration in the computers which access the SciELO web sites
• Editing the hosts file: edit C:\windows\system32\drivers\etc\hosts, adding the lines:
127.0.0.1 localscielo
<IP_homolog> homologscielo
Where:
<IP_homolog> = IP of the homologation SciELO web site
homologscielo = server name of the homologation SciELO web site
Do this on all the computers which have to access localscielo and homologscielo
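Since the same hosts entries must be added on every computer, the edit can be made repeatable. The sketch below works on the text of a hosts file and appends only the entries that are missing; the function name add_host_entries is hypothetical, and 192.0.2.10 in the usage note is a documentation-only address standing in for <IP_homolog>.

```python
def add_host_entries(hosts_text, entries):
    """Return hosts_text with any missing "IP name" entries appended.

    entries maps host name -> IP address. Names already present on a
    non-comment line are left alone, so the function can be re-run safely.
    """
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        known.update(fields[1:])                # everything after the IP is a host name
    new_lines = [f"{ip} {name}" for name, ip in entries.items() if name not in known]
    if not new_lines:
        return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + "\n".join(new_lines) + "\n"
```

A caller would read C:\windows\system32\drivers\etc\hosts, pass its text together with {"localscielo": "127.0.0.1", "homologscielo": "192.0.2.10"}, and write the result back with administrator rights.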