Conserving Linguistic Heritage the FOSS way...

Jul 12, 2015

Page 1: Conserving Linguistic Heritage the FOSS way...

Conserving Linguistic Heritage the FOSS way...

Page 2: Conserving Linguistic Heritage the FOSS way...

Hello! I am Omshivaprakash

I’m a Bengaluru-based Wikimedian and a FOSS contributor.

I’m here to share my experience helping reuse/conserve the linguistic heritage of Kannada the FOSS way!

Page 3: Conserving Linguistic Heritage the FOSS way...

2013-14

Vachana Sanchaya

11th and 12th Century literature & the need of the hour...

Page 4: Conserving Linguistic Heritage the FOSS way...

“We need to be able to do research on Vachana Sahitya. We should be able to search Vachanas on the net. We need data to understand the Sahitya much better.”

- Sri OL Nagabhushana Swamy
- Sri Vasudendra

Page 5: Conserving Linguistic Heritage the FOSS way...

Challenges

▣ ANSI data available on the GoK website
▣ GoK website not being intuitive
▣ 15 large volumes of printed books + others
▣ No real tool to analyze the data at one's fingertips
▣ Hot discussions on public forums needed concordance & numerical data to debate on literature

Researchers wanted authentic data to come to a consensus via research… but how?

Page 6: Conserving Linguistic Heritage the FOSS way...

Digitize in Unicode

The idea was to get our hands on the digitized data in a reusable format & in Unicode.

Page 7: Conserving Linguistic Heritage the FOSS way...

Scrape

We found that the data was available in digital format on the GoK website http://vachanasahitya.gov.in but in ANSI format.

We pulled the data with wget and wrote a Python script to systematically extract the data and convert the text to Unicode.

ALL IN FLAT FILES
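
The extract-and-convert step could look roughly like the sketch below. It assumes the pages have already been mirrored to disk with wget, and that the legacy ("ANSI") text needs a code-by-code mapping to Unicode. The talk does not name the legacy encoding or show the script, so `TextExtractor`, `legacy_to_unicode` and especially `TOY_MAPPING` are hypothetical stand-ins, not the project's actual code or a real conversion table.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of one page, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def legacy_to_unicode(text, mapping):
    """Replace legacy codes with Unicode, trying the longest code first."""
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No code matches here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Toy mapping invented for illustration only; a real table is far larger.
TOY_MAPPING = {"k": "ಕ", "kA": "ಕಾ", "a": "ಅ"}

page = "<html><body><p>kAk</p><script>var x = 1;</script></body></html>"
converted = legacy_to_unicode(extract_text(page), TOY_MAPPING)
```

Matching the longest code first matters because legacy font encodings often spell one Unicode sequence with several bytes; the converted text can then be written out as UTF-8 flat files.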

Getting to work on data

But... it was not really enough. How does anyone take all the text in files and do research? We proposed to push this to a database and provide simple GUI tools to search the text and look at the results.
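
As a minimal sketch of that proposal, the converted text can be loaded into a database and searched with a plain substring query. The example uses Python's built-in `sqlite3` module for brevity; the table name, column names and sample rows are assumptions made for illustration and are not the schema used by the project.

```python
import sqlite3


def build_database(records, path=":memory:"):
    """Create a table of vachanas and load (author, body) records into it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE vachana ("
        "id INTEGER PRIMARY KEY, author TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO vachana (author, body) VALUES (?, ?)", records
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return (id, author, body) rows whose body contains the term."""
    # Escape LIKE wildcards so the term is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cur = conn.execute(
        "SELECT id, author, body FROM vachana "
        "WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return cur.fetchall()


# Placeholder rows, not real vachanas.
records = [
    ("Author A", "first placeholder line about water"),
    ("Author B", "second placeholder line about light"),
    ("Author A", "water again, in a third placeholder line"),
]
conn = build_database(records)
hits = search(conn, "water")
```

Passing the term as a bound parameter keeps user input out of the SQL string, which matters once the query sits behind a public search box.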

Page 8: Conserving Linguistic Heritage the FOSS way...

more challenges...

Technical difficulties

Providing the end results to a large number of people.

Helping them understand how to use tools such as MySQL Workbench, SQLite Manager, etc.

Awareness

Text input methods

SQL syntax

OS compatibility

Expanding scope

What about other research requirements?

How many queries can we write and keep sharing with linguists who are not computer-savvy?

Page 9: Conserving Linguistic Heritage the FOSS way...

An opportunity to build something

For a language that is close to our hearts, with a few like-minded people around, over a cup of coffee, during weekends, whenever we had some time to scribble through the needs of our people…

IT WAS FUN...

Page 10: Conserving Linguistic Heritage the FOSS way...

We built Vachana Sanchaya

http://vachana.sanchaya.net

Page 11: Conserving Linguistic Heritage the FOSS way...

Portal for linguistic research

Page 12: Conserving Linguistic Heritage the FOSS way...

Visualization, Discussion board, Concordance & more...

Page 13: Conserving Linguistic Heritage the FOSS way...

Enable everyone

Students, Researchers, Common Man

Page 14: Conserving Linguistic Heritage the FOSS way...

To unearth the wealth of literature

▣ by reading and searching through 21 thousand Vachanas
▣ written by 250 Vachanakaaras
▣ Researching at one's fingertips via concordance & quick visualizations
▣ Building a corpus of 2 lakh+ (200,000+) unique words
▣ Building biodata of all male & female Vachanakaaras
▣ Enabling a crowd-sourced review solution
▣ Opening up new possibilities for linguistic research across other literary works of Kannada
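
To make the concordance and unique-word ideas concrete, here is a small sketch of a word index over a set of texts. It is an assumption about how such an index might be built, not the portal's implementation: the function names are hypothetical, the sample lines are arbitrary word lists rather than real Vachanas, and splitting on whitespace is a simplification.

```python
from collections import defaultdict
import string


def tokenize(line):
    """Split on whitespace and trim ASCII punctuation from each token."""
    tokens = (tok.strip(string.punctuation) for tok in line.split())
    return [tok for tok in tokens if tok]


def build_concordance(texts):
    """Map each word to the (text id, line number, line) places it occurs."""
    index = defaultdict(list)
    for text_id, body in texts.items():
        for line_no, line in enumerate(body.splitlines(), start=1):
            # set() records a word once per line even if it repeats there.
            for word in set(tokenize(line)):
                index[word].append((text_id, line_no, line.strip()))
    return index


# Arbitrary sample words, not real vachanas.
texts = {
    1: "ಗುರು ಲಿಂಗ\nಲಿಂಗ ಜಂಗಮ",
    2: "ಜಂಗಮ, ಗುರು!",
}
index = build_concordance(texts)
unique_words = sorted(index)
```

The keys of the index are the unique-word corpus, and each entry gives the lines in which a word appears, which is the raw material for a concordance view.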

Page 15: Conserving Linguistic Heritage the FOSS way...

We reached masses across the world...

Page 16: Conserving Linguistic Heritage the FOSS way...

FOSS

All because of the FOSS tools around us and the philosophy that we believed in...

Page 17: Conserving Linguistic Heritage the FOSS way...

Rails, Nginx, Passenger, Memcached, MySQL, Python, GitLab, WordPress & more...

Only the server cost to keep it running

Localized & being adopted by other projects too...

It is being reviewed to be contributed to Wikisource & Wikipedia

Page 18: Conserving Linguistic Heritage the FOSS way...

Moving forward

Bring more literary works online

Standardize a research platform for the language

Create a timeline for centuries of heritage

Page 19: Conserving Linguistic Heritage the FOSS way...

How are we planning to do this?

Collaboration
Enable community collaboration to build research documents around our literary heritage

Engage
Engage students and others to work together on our code to build robust and futuristic tools for all types of literary works (text, poems, Old Kannada, etc.)

Evolve
Evolve over a period of time, adopting learnings from mistakes, reviews and feedback

Consult with communities
We would like to consult and learn from multiple language communities, because Vachana Sahitya has been translated into more than 15 languages

Keep tweaking
We keep tweaking the tool to make it robust enough to be used as a platform for our upcoming projects

Reaching goals
We are determined to reach our goal of building a unified search tool with a timeline for centuries of Kannada literature the FOSS way...

Page 20: Conserving Linguistic Heritage the FOSS way...

We are on Social Media - FB/Twitter/Google+

Embed us on WordPress via a plugin

We will be on Mobile Soon…

We are opening up APIs to reuse data or build tools around Kannada literature

Adding English and other translated works too....

There is a lot more to share

So, Keep in touch!!!

Page 21: Conserving Linguistic Heritage the FOSS way...

Our Team

Pavithra, Myself, OLN, Vasudendra, Devaraj

Page 22: Conserving Linguistic Heritage the FOSS way...

Thanks! Any questions?

You can find me at:
Kn/En Wiki: User:Omshivaprakash
Project Page: http://vachana.sanchaya.net
Main Project: http://kannada.sanchaya.net
@omshivaprakash | @vachanasanchaya

Page 23: Conserving Linguistic Heritage the FOSS way...

Credits

Special thanks to all the people who made and released these awesome resources for free:
▣ Team photo by Amit Mrugvadhe
▣ To my team for having made this possible
▣ Minicons by Webalys
▣ Presentation template by SlidesCarnival
▣ Photographs by Unsplash