Source: ijrar.org/papers/IJRAR_203106.docx

TESTING AND EVALUATION SYSTEM WITH DESCRIPTIVE ANSWERS USING NATURAL LANGUAGE PROCESSING

ABSTRACT: - In most online examination systems, only multiple-choice questions are evaluated, and marks are allotted to candidates accordingly. Such systems fail when descriptive answers must be evaluated. In this study a new method is proposed to evaluate the descriptive answers given by students using Artificial Neural Networks and Natural Language Processing algorithms. In the proposed system, the evaluator creates a model answer and keywords, and this information is stored in the database. After the candidate responds on the exam page, the system automatically evaluates the result using NLP and ANN: the keywords are matched against the responded answer using the ANN, and NLP is then applied to evaluate grammatical mistakes. The combined result is stored in the database. The result achieved by this method is efficient and has a high level of accuracy.

Keywords: Testing and evaluation system, Artificial Neural Networks (ANN), Natural Language Processing (NLP), Text mining.

I. INTRODUCTION

Online testing and evaluation systems were initially developed for the evaluation of objective answers and the generation of results. It is now the need of the hour to develop an intelligent system for automatic evaluation of descriptive answers. Evaluation of objective answers is fairly easy to implement, whereas evaluation of descriptive answers is a difficult task. This task can be performed in many ways, but achieving optimum efficiency is still a challenge.

This research envisages a method of automatic evaluation of descriptive answers using NLP whose efficiency is more than 82%. The main approaches for text assessment are natural language processing, keyword analysis, and information mining. The bigger challenge in this scenario is to analyze different words with the same meaning, i.e., synonyms. A student may use different words in an answer whose meaning is the same; the system needs to understand the meaning of these answers and allot marks accordingly.

A second issue with subjective answers is length. Students give answers of whatever length is convenient for them, but analyzing and categorizing answers of different lengths requires considerable effort.

The proposed system will let the evaluator create a test with different questions and answers. It will provide an examination page where students can view the questions and write descriptive answers in their own words. The system will automatically assess the answers, allotting marks for correct sentences and deducting marks for wrong sentences or sentences with grammatical mistakes, and will prepare the result from the final marks of the answers.

Ravi Kumar Mishra1

Under the guidance of Sonal Arora2, Asst. Professor

1DPG Institute of Technology, Sector 34, Gurugram, Haryana 122001, India

2DPG Institute of Engineering and Technology, Gurugram


II. METHODOLOGY

In this study, the process starts with setting up the question papers and providing model answers and keywords to the system. The evaluator needs to prepare the test cases and train the system for the given questions. The information received by the system is stored in the database. The system provides students a specified space for writing each answer. To evaluate a descriptive answer, the system makes use of two main algorithms: natural language processing and artificial neural networks.

Natural language processing is used to analyze the answer for grammatical mistakes and to build a meaningful general picture so that the overall answer can be evaluated efficiently. It uses the following steps.

Figure 1:-Steps in Natural language Processing

Step 1: Sentence Segmentation: -

A preliminary step commonly performed on text before further processing is so-called sentence segmentation, or sentence boundary detection: the process of dividing running text into sentences. One aspect that makes this task less straightforward than it sounds is that punctuation marks such as the period can indicate either a full stop or an abbreviation.
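As a minimal illustration of this ambiguity, the sketch below splits text on end punctuation while skipping a small hand-made abbreviation list. The list and the example sentence are illustrative only, not taken from the paper's system.

```python
# Minimal sentence-boundary sketch: split on . ! ? but skip a small
# abbreviation list, showing why punctuation alone is ambiguous.
ABBREVIATIONS = {"dr.", "mr.", "e.g.", "i.e.", "etc."}

def segment(text):
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word[-1] in ".!?" and word.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(segment("Dr. Smith teaches NLP. It is fun!"))
# -> ['Dr. Smith teaches NLP.', 'It is fun!']
```

Without the abbreviation check, "Dr." would incorrectly end the first sentence.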

Step 2: Word Tokenization:-

One common task in Natural Language Processing is tokenization. "Tokens" are usually individual words, and "tokenization" is taking a text or set of texts and breaking it up into its individual words. In this way, all the words of a sentence are separated into tokens so that a part of speech can later be predicted for each token.
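A minimal tokenizer can be sketched with a regular expression; the pattern and example below are illustrative and much simpler than a production tokenizer.

```python
import re

# Minimal tokenizer sketch: lowercase word runs and punctuation marks
# each become one token.
def tokenize(sentence):
    return re.findall(r"[a-z]+|[.,!?;]", sentence.lower())

print(tokenize("The answer is correct."))
# -> ['the', 'answer', 'is', 'correct', '.']
```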

Step 3: Predicting Parts of Speech for Each Token:-

Given a sentence, this step determines the part of speech for each word, assigning word types such as verb or noun to tokens. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a noun ("the book on the table") or a verb ("to book a flight"); "set" can be a noun, verb, or adjective; and "out" can be any of at least five different parts of speech. Part-of-speech prediction is therefore particularly prone to such ambiguity.
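The "book" ambiguity can be pictured with a toy tagger: a lexicon plus one context rule. The lexicon, tags, and rule are illustrative; real taggers are trained statistically on annotated corpora.

```python
# Toy POS tagger sketch: a hand-made lexicon plus one context rule,
# showing the noun/verb ambiguity of "book" described above.
LEXICON = {"the": "DET", "a": "DET", "to": "TO", "on": "IN",
           "table": "NOUN", "flight": "NOUN", "book": "NOUN"}

def pos_tag(tokens):
    tags = []
    for i, tok in enumerate(tokens):
        tag = LEXICON.get(tok, "NOUN")
        # Context rule: right after "to", the ambiguous "book" is a verb.
        if tok == "book" and i > 0 and tokens[i - 1] == "to":
            tag = "VERB"
        tags.append((tok, tag))
    return tags

print(pos_tag(["the", "book", "on", "the", "table"]))
print(pos_tag(["to", "book", "a", "flight"]))
```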

Step 4: Text Lemmatization:-

For grammatical reasons, documents use different forms of a word, such as organize, organizes, and organizing. Additionally, there are families of derivationally related words with similar meanings, such as democracy, democratic, and democratization. In many situations it is useful for a search for one of these words to return documents that contain another word in the set; in other words, lemmatization assigns the base forms of words.
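A crude suffix-stripping sketch of the idea follows; a real lemmatizer consults a dictionary of base forms, and the suffix rules here are illustrative only.

```python
# Crude lemmatizer sketch: strip common inflectional suffixes so that
# "organize", "organizes", and "organizing" map to one base form.
def lemmatize(word):
    for suffix, repl in (("izing", "ize"), ("izes", "ize"), ("ized", "ize"),
                         ("ing", ""), ("es", "e"), ("s", "")):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

print([lemmatize(w) for w in ["organize", "organizes", "organizing"]])
# -> ['organize', 'organize', 'organize']
```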

Step 5: Identifying Stop Words:-


In natural language processing, such useless words are referred to as stop words. Text may contain stop words like 'the', 'is', and 'are'. Stop words can be filtered from the text to be processed. There is no universal list of stop words in NLP research; however, the NLTK module contains a list of stop words.
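Filtering can be sketched with a small hand-made stop list (a real system would use a fuller list such as the one shipped with NLTK):

```python
# Stop-word filtering sketch with a small illustrative stop list.
STOP_WORDS = {"the", "is", "are", "a", "an", "of", "in"}

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["the", "answer", "is", "correct"]))
# -> ['answer', 'correct']
```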

Step 6: Dependency Parsing:-

A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between "head" words and the words which modify those heads. The figure below shows a dependency parse of a short sentence. The arrow from the word moving to the word faster indicates that faster modifies moving, and the label advmod assigned to the arrow describes the exact nature of the dependency.

Figure 2:- A dependency parser analysis

Stanford NLP provides a fast transition-based parser which produces typed dependency parses of natural language sentences. The parser is powered by a neural network which accepts word embedding inputs.

Step 7: Named Entity Recognition (NER):-

Named-entity recognition is a subtask of information extraction that seeks to locate and classify named-entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages. In descriptive answer text there are particular terms that represent specific entities which are more informative and have a unique context. These named entities refer to terms that represent real-world objects like people, places, and organizations, which are often denoted by proper names. A naive approach is to find these by looking at the noun phrases in text documents.
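The naive noun-phrase idea can be sketched by flagging runs of capitalized words (skipping sentence-initial position); the example names are illustrative.

```python
# Naive NER sketch: runs of capitalized words, excluding the first word
# of the text, are flagged as candidate named entities.
def naive_entities(tokens):
    entities, current = [], []
    for i, tok in enumerate(tokens):
        if tok[0].isupper() and i > 0:
            current.append(tok)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

print(naive_entities("The exam was held in New Delhi by Ravi".split()))
# -> ['New Delhi', 'Ravi']
```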

Step 8: Coreference Resolution:-

Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for many higher-level NLP tasks that involve natural language understanding, such as document summarization, question answering, and information extraction. In a descriptive answer evaluation system it plays a vital role, as the meaning of an answer may change if coreference is not resolved correctly.
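A deliberately naive sketch of the task: resolve a pronoun to the most recent capitalized name. Real coreference systems are far more sophisticated; this only illustrates what "referring to the same entity" means, with illustrative example text.

```python
# Naive coreference sketch: "he"/"she"/"it"/"they" is replaced by the
# most recently seen capitalized word, a rough stand-in for a mention.
PRONOUNS = {"he", "she", "it", "they"}

def resolve(tokens):
    last_name, resolved = None, []
    for tok in tokens:
        if tok[0].isupper():
            last_name = tok
        if tok.lower() in PRONOUNS and last_name:
            resolved.append(last_name)
        else:
            resolved.append(tok)
    return resolved

print(resolve("Einstein published papers and he won".split()))
# -> ['Einstein', 'published', 'papers', 'and', 'Einstein', 'won']
```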

Figure 3:- Co reference Resolution Example

The other method used is the artificial neural network, which performs pattern matching of answers. The main characteristics of neural networks are their ability to learn complex nonlinear input-output relationships, their use of sequential training procedures, and their ability to adapt themselves to the data. Artificial neural networks (ANNs) provide a suite of nonlinear algorithms


for feature extraction (using hidden layers) and classification (e.g., multilayer perceptrons). Neural networks are known for massive parallelism and for pattern recognition and matching. After introducing the basic principles of ANNs, some fundamental networks are examined in detail for their ability to solve simple pattern recognition tasks. These fundamental networks, together with the principles of ANNs, lead to architectures for complex pattern recognition tasks.
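One minimal way to picture the pattern-matching role of the ANN is a single perceptron over keyword-presence features. The keywords, weights, and bias below are illustrative inventions, not the trained network from this study.

```python
# Perceptron sketch: the input is a binary keyword-presence vector and
# the output says whether the answer matches the model answer.
KEYWORDS = ["photosynthesis", "chlorophyll", "sunlight", "oxygen"]

def features(answer):
    tokens = answer.lower().split()
    return [1.0 if kw in tokens else 0.0 for kw in KEYWORDS]

def perceptron(x, weights, bias):
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

# Illustrative weights: "match" if at least two keywords are present.
weights, bias = [1.0, 1.0, 1.0, 1.0], -1.5

print(perceptron(features("chlorophyll absorbs sunlight"), weights, bias))  # -> 1
print(perceptron(features("plants are green"), weights, bias))              # -> 0
```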

III. IMPLEMENTATION

The system is implemented using the Stanford CoreNLP tool. The other technique used is pattern matching through an artificial neural network. The basic system is developed using PHP and HTML as the front end, with an Oracle database used to store the data required for evaluating the descriptive answers.

Stanford CoreNLP

Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and syntactic dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get the quotes people said, etc.

Here I have chosen Stanford CoreNLP to evaluate the descriptive answers because of the following valuable functionalities:

An integrated NLP toolkit with a broad range of grammatical analysis tools

A fast, robust annotator for arbitrary texts, widely used in production

A modern, regularly updated package, with the overall highest quality text analytics

Support for a number of major (human) languages

Available APIs for most major modern programming languages

Ability to run as a simple web service

Stanford CoreNLP’s goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. A tool pipeline can be run on a piece of plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible: with a single option you can change which tools are enabled and disabled. Stanford CoreNLP integrates many of Stanford’s NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, sentiment analysis, bootstrapped pattern learning, and the open information extraction tools. Moreover, an annotator pipeline can include additional custom or third-party annotators. CoreNLP’s analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

Figure 4:- Stanford CoreNLP tool performance on a sentence


In the evaluation part I make use of Stanford CoreNLP in order to check the grammatical mistakes in the answers written by a candidate and to evaluate them, producing the result as follows:-

The responded answer of a candidate is saved in the database in raw text format, exactly as the candidate wrote it in the space provided. The raw text then moves through the NLP stages of the text-mining process. Sentence segmentation is performed on the answer before further processing, dividing it into sentences. The preprocessed sentences are then divided into individual words, each of which behaves as a token. A newer statistical viewpoint is based on a probabilistic approach to semantics; by comparing these tokens it is possible to compare word meanings.

For each word, the part of speech is determined. Part-of-speech (POS) tags are probably the most commonly used type of syntactic information. Part-of-speech tagging is the process of labeling each word in a text as belonging to a particular part of speech, based both on its definition and on its adjoining context. A range of other syntactic and semantic procedures for text analysis, such as chunking, parsing, and semantic role labeling, depend on the results of POS taggers. Determining a word's part of speech is a classification problem, which is why POS taggers are usually constructed by applying supervised machine learning to a text corpus that has been hand-annotated with the correct POS tags.

Next, the base forms of words are assigned, which is useful when a search for one word should also match documents containing related forms. After the base forms are assigned, stop words like 'the', 'is', and 'are' are identified and filtered from the text to be processed. Finally, the preprocessed text of the answer is distributed into categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages. These specific entities are more informative and have a unique context.

All the words prepared up to this point are saved in the database. A table of synonyms and antonyms is prepared from the keywords provided for a particular answer, to check whether the candidate has used a different word with the same meaning as some keyword. It is also checked whether the candidate has responded with an opposite word, in which case marks are deducted. Now the most important work of the research takes place: test cases are applied to evaluate the answers. For each correct word, marks are allotted, and for each antonym used with a meaning opposite to the correct answer, negative marks are allotted. By combining all these transactions, the final result is prepared and stored in the database.
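The synonym/antonym scoring described above can be sketched as follows. The lookup tables, keywords, and mark values are illustrative, not the tables built by the system.

```python
# Scoring sketch: keywords and their listed synonyms earn marks, while
# an antonym of a keyword deducts marks (opposite sense in the answer).
SYNONYMS = {"large": "big", "quick": "fast"}   # variant -> keyword
ANTONYMS = {"slow": "fast"}                    # word -> keyword it opposes

def score(answer_tokens, keywords, mark=1.0):
    total = 0.0
    for tok in answer_tokens:
        tok = SYNONYMS.get(tok, tok)           # map a synonym to its keyword
        if tok in keywords:
            total += mark
        elif ANTONYMS.get(tok) in keywords:    # opposite sense: deduct marks
            total -= mark
    return total

print(score(["the", "quick", "algorithm"], {"fast"}))  # -> 1.0
print(score(["the", "slow", "algorithm"], {"fast"}))   # -> -1.0
```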

Figure 5:- Block diagram of Descriptive answer evaluation system

Candidate answer evaluation using the artificial neural network algorithm:-


In this kind of evaluation, the responded answer is compared with the keywords provided. This is normal text comparison performed on the answer given by the candidate. For all words, weightage marks are allotted according to the matching: if a match is found, positive marks are added; otherwise, negative marks are allotted. The marks finalized by this method are saved in the result table of the database.

To evaluate the answers using Stanford CoreNLP, the following steps are followed:-

Step 1:- Start.

Step 2:- The text answered by the candidate is stored in the database, in the main Answer table, in raw text format.

Step 3:- All the keywords entered by the evaluator against a particular answer are stored in the database with reference to that answer. Positive- and negative-sense words are found and stored in a similar manner. Since the positivity of a word depends on the context of the answer, this activity is repeated for every new answer and stored with reference to that particular answer only.

Step 4:- Parts-of-speech tagging is done on the sentences, and the words are classified according to the part of speech to which they belong. These words are saved in the database and then matched with the keywords provided against the answer.

Step 5:- The candidate's score is initialized, then incremented or decremented according to whether each keyword matches. The positive or negative sense is also taken into consideration to formulate the result efficiently.

Step 6:- If no match is found, or matching of all words is completed, go to Step 7.

Step 7:- The calculated score is saved in the database against the answer-candidate combination.

Step 8:- Parts of speech are checked as per the dependency vectors of the enhanced dependencies produced by the NLP tool.

Step 9:- All sentences are checked for grammatical errors. For erroneous and correct grammar in each sentence, the marks already saved in the database as per Step 7 are amended.

Step 10:- The final result is analyzed from the scores calculated so far; if more than 25% is found correct, the responded answer is declared correct, otherwise it is considered incorrect.

Step 11:- End.
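The keyword-matching and threshold steps above can be condensed into a short sketch. The keywords, one-mark-per-match scheme, and example answer are illustrative assumptions, not the system's actual configuration.

```python
# Condensed sketch of Steps 4-10: match keywords, accumulate a score,
# then apply the 25% threshold to declare the answer correct.
def evaluate(answer, keywords, negatives, max_marks):
    tokens = answer.lower().split()
    score = 0.0
    for tok in tokens:
        if tok in keywords:
            score += 1.0      # Step 5: increment on a keyword match
        elif tok in negatives:
            score -= 1.0      # Step 5: decrement on a negative-sense word
    score = max(score, 0.0)
    percent = 100.0 * score / max_marks
    # Step 10: more than 25% correct -> answer declared correct.
    return score, percent > 25.0

print(evaluate("data is stored in a relational database",
               {"relational", "database"}, set(), 4))
# -> (2.0, True)
```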

Final Result Generation

The results saved in the database by the two methods are finalized after the evaluator invokes the result-generation process. In the result table, the first column, “result_ann”, holds the value calculated by the first method, the artificial neural network algorithm, where simple text comparison is performed on the answer given by the candidate to allot marks. A second column, “result_nlp”, stores the result calculated by the second method, text mining with Stanford CoreNLP, where grammatical errors are checked. The marks achieved by the two methods are then combined with their respective weightages and saved in a separate column, “result_total”. The marks for each answer-candidate combination are saved in this final column, which can be displayed in many reports depicting the result of a candidate.
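The weighted combination can be sketched as below. Equal weights are an illustrative assumption; the actual weightages would be configured by the evaluator.

```python
# Final-result sketch: combine the ANN and NLP scores into result_total
# with weights (equal weights assumed here for illustration).
def finalize(result_ann, result_nlp, w_ann=0.5, w_nlp=0.5):
    return round(w_ann * result_ann + w_nlp * result_nlp, 3)

print(finalize(48.44, 49.15))
# -> 48.795
```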

Developed Applications Screen Shots

1. Home page:- The application developed for descriptive answer evaluation is a role-based web application. A user must have a valid role assigned in order to log in; after login, the user is redirected to the respective menu with the privileges defined for his or her role. A user can be assigned a single role at a time. The roles which can be designated to a user in this application are Administrator, Evaluator, QP Generator, Registration Manager, and Candidate. The Administrator is the super user and has all privileges. The home page, shown in the figure, is where a user can log in by providing a username and password.


Figure 6:- Home page of descriptive evaluation system

2. User Registration:- Users are managed by either the Administrator or the Registration Manager. Facilities to register, update, and delete a user for a particular exam type are available under this menu, with fields appropriate to each kind of user, i.e., candidate.

Figure 7:- User registration management

3. Question Bank Management:- This page is meant for question management. The question bank management menu appears under the QB Manager login, where all the facilities to manage questions are available: importing questions for a particular test, updating a question, disabling a question, and managing answers. The QB Manager has the sole responsibility to create the questions and provide model answers. Provision for providing model answers and preparing the keywords for a question is also catered for: the keywords, with their dependent positive and negative reference words, are entered and saved in the database.


Figure 8:- Question and Answers management

4. Scheduling of Exam:- Scheduling of an examination takes place under this menu. All the test-related information is provided here: test start date and time, test end time, test duration, exam type, number of questions, maximum score as per the questions, and passing score. The respective questions are selected from the database as per the information furnished for the test. Once the test is scheduled, a candidate registered for it can start the exam at the start time.

Figure 9:- Exam scheduling

5. Candidates Examination:- Once the examination has been scheduled, registered candidates can log into the system and start the exam. The login page is the same for all users; if the role assigned to a user is Candidate, he or she is redirected to the exam start section shown in the figure below. Test information is displayed, with a wait notice if there is time left before the examination starts. As soon as the clock reaches the exam start time, the wait notice converts to a start-exam button, and clicking it redirects the candidate to the answering page.

Figure 10:- Candidate home page to start the exam

In this page all the questions are displayed with the space to answer the questions. After writing all the answers the candidate can submit the examination. As soon as the candidate clicks on “submit exam” button, the responded answers are saved in database for further evaluation and result generation.


Figure 11:- Exam Attempt and submit page

6. Result Generation:- Once the candidate's raw answered text is saved in the database, the evaluator can start the result-generation process. Clicking this button executes the generate-result procedure in the database. This is the important work of this research: first, the artificial neural network algorithm matches the keywords and saves its result in the database; second, information extraction through NLP takes place, where the sense of the sentences is checked and evaluated, and that result is saved in the database.

Figure 12:- Result generation

7. Result table:- The result table holds the three columns described under Final Result Generation above: “result_ann” for the artificial neural network comparison, “result_nlp” for the Stanford CoreNLP text mining, and “result_total” for the finalized weighted marks of each answer-candidate combination.


Figure 13:-Result in Oracle database Table

This result is calculated by the developed system using the artificial neural network and natural language processing. After this automatic result generation, a survey was conducted in which a group of students answered the same questions on handwritten papers, which were evaluated by faculty members. The system's effectiveness can thus be demonstrated by comparing the manual staff evaluation with the automatic evaluation of the developed system. The comparative data is tabulated below, from which it can easily be made out that the system is efficient enough to automatically evaluate the answers written by students. Over the sample of 10 students, the system's efficiency is more than 93%. Hence the system is shown to evaluate the answers with great effectiveness.

SL NO   CANDIDATE NO   RESULT_ANN   RESULT_NLP   RESULT_TOTAL   MANUAL EVALUATION RESULT
1       193361         48.44        49.15        48.795         50.5
2       193362         26.7         26.88        26.79          32
3       193363         44.34        35.89        40.115         41
4       193364         43.55        26.23        34.89          38
5       193365         26.88        32.11        29.495         32
6       193366         24.55        22.89        23.72          30
7       193367         44.23        42.11        43.17          41
8       193368         34.55        33.23        33.89          32
9       193369         35.22        36.23        35.725         38
10      193370         32.56        32.44        32.5           32

Figure 14:- Comparison between the evaluation of system and manual staff

IV. CONCLUSION AND FUTURE SCOPE

The developed system efficiently evaluates descriptive answers and prepares the result with a high level of accuracy, where the candidate answers the questions with a keyboard in the space the system provides. The system may be further extended with voice recognition, allowing the candidate to respond to questions in voice format, which would be very helpful for conducting online interviews. Further, the system could let students upload handwritten answers and, with the help of deep learning and image processing, evaluate those answers as well. The system would then need to automatically analyze the format and type of answers, which can be further envisaged for interoperability across different languages.

V. ACKNOWLEDGEMENT

The author would like to thank Asst. Prof. Sonal Arora of DPG Institute of Technology, Gurugram, for the important collaboration and ideas received during this research, and for continual guidance about current innovations that could be implemented in it.

