Resources for Sanskrit and other Indian Languages-- Dr Girish Nath Jha


Special Centre for Sanskrit Studies, J.N.U., New Delhi

Current Progress in Developing Resources for Sanskrit and other Indian Languages

Girish Nath Jha
Associate Professor, Computational Linguistics, Special Centre for Sanskrit Studies, J.N.U., New Delhi – 110067
&
Mukesh and Priti Chatter Distinguished Professor of History of Science, University of Massachusetts Dartmouth

Keynote delivered at WAVES 2012, UMASSD, July 14, 2012



What is a “resource”?

Language data and corpora in standard formats for computer processing, for direct or indirect use by humans.

India is considered a “resource-poor” country because we do not have enough standard resources.



What does it mean for Sanskrit?

- electronic texts, dictionaries
- digital libraries
- parallel corpora
- search engines
- language processing tools (MT, Speech, OCR, OLHWR etc.)
- second Indology revolution in the making?



Why Sanskrit?


Language    Scripts                 Family
Hindi       Devanagari              Indo-Aryan
Sanskrit    Devanagari              Indo-Aryan
Marathi     Devanagari              Indo-Aryan
Konkani     Devanagari              Indo-Aryan
Maithili    Devanagari              Indo-Aryan
Nepali      Devanagari              Indo-Aryan
Sindhi      Devanagari              Indo-Aryan
Bodo        Devanagari              Tibeto-Burman
Dogri       Devanagari              Indo-Aryan
Santhali    Devanagari, Ol Chiki    Austro-Asiatic
Bengali     Bengali                 Indo-Aryan
Assamese    Bengali                 Indo-Aryan
Manipuri    Bengali, Meithei        Tibeto-Burman
Gujarati    Gujarati                Indo-Aryan
Kannada     Kannada                 Dravidian
Malayalam   Malayalam               Dravidian
Oriya       Oriya                   Indo-Aryan
Punjabi     Gurumukhi               Indo-Aryan
Tamil       Tamil                   Dravidian
Telugu      Telugu                  Dravidian
Urdu        Perso-Arabic            Indo-Aryan
Kashmiri    Perso-Arabic            Indo-Aryan


Indian constitution on languages 

448 articles, 12 schedules, 107 amendments (so far)

- Part III: Fundamental Rights
- Part IV A: Fundamental Duties
- Part XVII: Official Language
- Part XVII: Regional Languages
- Part XVII: Language of the Supreme Court and High Courts
- Part XVII: Special Directives

  

 


Sanskrit Commission, 1956



Sanskrit in the digital age

- Computer for Sanskrit
- Sanskrit for Computer


Major e-contents: Sanskrit Wikipedia

- Sanskrit Wikipedia (Sanskrit-medium encyclopedia): http://sa.wikipedia.org
- Sanskrit Wikisource (Sanskrit e-texts)
- Sanskrit Wiktionary (Sanskrit dictionary)
- Sanskrit Wikibooks (Sanskrit e-library)


Major e-contents: digital libraries

- DLI project (http://dli.iiit.ac.in/): 1022 Sanskrit books (IISc, CMU, NSF, ERNET, MCIT)
- NSF funded, Brown Univ (http://www.sanskritlibrary.org/)
- Clay’s project (http://www.claysanskritlibrary.org): JJC Foundation, NYU Press
- INRIA, Paris (technical texts, tools)
- IGNCA (http://ignca.nic.in/sanskrit.htm)
- J-TESS (JNU Text Encoding and Search for Sanskrit)


Major e-contents: Sanskrit e-documents

- Maharshi Mahesh Yogi (http://sanskrit.safire.com/Sanskrit.html)
- Avinash Sathaye, Sanskrit documents list (http://sanskritdocuments.org/)
- Srinivas Varkhedi, Sanskrit corpus (http://rsvidyapeetha.ac.in/)
- Oliver Hellwig (Univ of Berlin)
- Anand Mishra (http://sanskrit.sai.uni-heidelberg.de/)
- http://sanskrit.jnu.ac.in


Major e-contents

Sanskrit documents
- Tirupati Vidyapeeth
- ASR Melkote
- CDAC heritage computing group

Sanskrit blogs
- JNU students
- Others (http://sanskritlinks.blogspot.com)

Sanskrit corpora and tagset
- JNU
- LDC, Univ. of Pennsylvania
- U.Hyd


Major e-contents: static

- Himanshu Pota (http://learnsanskrit.wordpress.com/, http://www.ee.adfa.edu.au/staff/hrp/personal/sanskrit/)
- American Sanskrit Institute (http://www.americansanskrit.com/)
- Acharya, IITM (http://acharya.iitm.ac.in/sanskrit/tutor.php)
- Vasudev Bhatt (http://www.ourkarnataka.com/learnsanskrit/sanskrit_main.htm)
- Sanskrit Bharati (http://www.samskritabharati.org/newsite/index.php)
- http://sanskritbhasha.blogspot.com/


Major e-contents: dynamic

Tutorials
- Sudhir Kaicker (http://www.sanskrit-lamp.org/)
- Prof. G.V. Singh (CASTLE project of DoE)
- Peter Scharf
- Avinash Sathaye
- Sanskrit CD (Mahesh Kulkarni, CDAC Pune)

Language processing tools
- Gerard Huet
- Amba Kulkarni
- Peter Scharf
- Girish N Jha
- Anand Mishra



Editor: Girish Nath Jha


Work done at Jawaharlal Nehru University (JNU), New Delhi



Special Centre for Sanskrit Studies, JNU

- Linking traditional scholarship with modern methods
- Exploring Science & Technology in Sanskrit
- Developing language technology resources and tools for Sanskrit and other Indian languages
- Collaboration with universities
- Collaboration with industry



SATIAIT: Science And Technology In Ancient Indian Texts



Center for Indic Studies, UMASSD initiative

Due to the initiative and efforts of Prof Bal Ram Singh, we are carrying out the following activities:

- Identifying key S&T texts
- Digitizing them, providing computer help
- Translating
- Lab experiments
- Documenting…


Editors: Bal Ram Singh Girish Nath Jha Umesh Kumar Singh Diwakar Mishra



Editors: Girish Nath Jha Bal Ram Singh R P Singh Diwakar Mishra



Editors: Angela Marcantonio Girish Nath Jha


Technology Development for Indian Languages


Building Blocks of Language Technology Development

[Diagram: Language Technology, with blocks labelled Standards, Software/Tools, Localization, Awareness, Training, Technologies, Linguistic Resources, Certification]



Near-future initiatives

- Localization R & D Center (JNU, CDAC, IIT Delhi)
- NME-ICT center at JNU (MHRD, JNU)


Machine Translation

- SHMT (Dept of IT, Govt. of India)
- SaHiT (unfunded)
- Microsoft Translator Hub
  - English-Hindi (Microsoft)
  - English-Urdu (Microsoft)
  - English-Gujarati (Microsoft)
  - Sanskrit-English (unfunded)
  - English-Maithili (unfunded)



SHMT (DIT)

A consortium of 7 universities/institutes:

- University of Hyderabad
- JNU
- IIIT Hyderabad
- Tirupati Vidyapeeth
- Sanskrit Academy, Hyderabad
- Poornaprajna Vidyapeeth, Bangalore
- Rajasthan Sanskrit University, Jaipur

Duration: 3 yrs (2008-2012). The MT system is to be hosted on http://tdil-dc.in very soon.

Phase 2: 2012-2015



Indian Languages Corpora Initiative (ILCI)



 

- A consortium of Indian universities has been formed under my leadership: 17 languages so far, with the remaining 6 to join later
- Parallel tagged corpora of 100,000 sentences in all Indian languages in the tourism, health, agriculture and entertainment domains
- Funded by the TDIL program of the Ministry of C & IT
- Phase 1: 2009-12 (a consortium of 12 languages including English); corpora to be hosted on http://tdil-dc.in very soon
- Phase 2: 2012-2015



Languages & Consortia partners

- Consortium of universities
- Server-based corpora development and management (the server is called “sanskrit”)
- Limited crowdsourcing


Shallow parsing tools for Indian languages

- Under a consortium project led by Univ. of Hyderabad
- Morph analyzers for 11 Indian languages
- Duration: 2012-15


Consultancies


Online Handwriting Recognition for Devanagari-based languages - Microsoft


Indic languages tagset and annotation - Microsoft Research India


Multimodal data in 8 security-sensitive languages (Indian English, Hindi, Urdu, Tamil, Bangla, Punjabi, Pushto, Dari) - LDC, University of Pennsylvania



English-(major) Indian languages Machine Translation (English-Hindi, English-Urdu, English-Gujarati, Sanskrit-English, English-Maithili), started this summer - Microsoft



Some of the recent R&D with the help of research students


- Sanskrit Speech Synthesizer (in collaboration with Microsoft Research India; prototype by next year)
- Named Entity Recognizer for Sanskrit (prototype finished)



J-TESS : JNU Text Encoding & Search for Sanskrit


Tools

- Server-based corpora creation and annotation application called ILCIANN
- Sanskrit and other Indian languages processing tools
- Multimedia animation, e-learning tools
- Lexical resources and search
- Indian language Transliterator
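To give a rough idea of what an Indian language transliterator involves, here is a minimal sketch in Python. It is not the JNU tool; it only illustrates one simple approach, which relies on the fact that the major Indic script blocks in Unicode share a parallel layout, so a character can often be moved between scripts by shifting its code point. The names `BLOCK_START` and `transliterate` are hypothetical, and a real system would also have to handle script-specific letters and conventions that this offset trick ignores.

```python
import unicodedata

# Start of each 128-code-point Unicode block for a few Indic scripts.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift characters from the source script block into the target block.

    Characters outside the source block are copied unchanged. If the shifted
    code point is unassigned in the target script, the original character is
    kept instead (an assumption made here for simplicity).
    """
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # unicodedata.name returns the default (None) for unassigned code points.
            if unicodedata.name(candidate, None) is not None:
                out.append(candidate)
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("कमल", "devanagari", "bengali"))  # কমল
```

The fallback to the original character keeps shared punctuation such as the danda (।) intact, since it has no counterpart at the same offset in the other blocks.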



Demo http://sanskrit.jnu.ac.in



Thank you! Questions?

girishjha@gmail.com  91-11-26741308
