MANHATTAN DISTANCE AND DICE SIMILARITY EVALUATION ON INDONESIAN ESSAY EXAMINATION SYSTEM

Every learning process requires an evaluation tool to measure students' level of understanding. Evaluations can take the form of multiple-choice questions, short-answer questions, or essays. Some studies indicate that essay exams are better than the other types of evaluation. Automatic essay assessment is needed to save teachers time in correcting answers. However, automatic essay assessment is still under development, with the aim of achieving better accuracy than existing assessment methods. To address this problem, this study proposes a comparative analysis of similarity methods for online essay exam assessment. The methods compared are Dice Similarity and Manhattan Distance. Both methods produce coefficient values, and the resulting system scores are compared with manual scores on the same scale. The dataset consists of 2162 answers obtained from 50 students who answered 40 questions across four categories (politics, sports, lifestyle and technology). The data obtained in this study are available at www.indonesian-ir.org to support other research. The results show that the Dice Similarity scheme is more accurate than Manhattan Distance.


I. INTRODUCTION
The rapid development of information technology has helped many people in all fields, including education. This technology overcomes the limitations of time and space in conventional learning. Many methods have also been developed on both the learning side and the supporting technology side. In the era of globalization, the increasingly rapid development of information technology has an unavoidable influence on the world of education. Global demands require education to adapt continually to technological developments in order to improve its quality, particularly by adopting information and communication technology in the learning process.
Every learning process requires an evaluation tool to measure students' level of understanding. There are many types of evaluation, ranging from multiple-choice questions (MCQs) and short-answer questions to essays. Some studies reveal that MCQs and short-answer questions are inadequate in the teaching and learning process. Essay exams, by contrast, train students to convey information verbally and require a better understanding. Essay questions can therefore measure the level of understanding more deeply [1].
Automatic essay assessment offers many benefits over traditional assessment. According to British records, teachers spend 30% of their time correcting student answers, at a cost of around 30 billion pounds a year [2]. The benefits to an educational institution of having an automatic assessment system, especially for essays, are therefore easy to imagine. At present, there are many e-learning developments for assessing multiple-choice, short-answer and essay exams. However, automatic essay assessment is still under development, with the aim of obtaining better accuracy. This is because of the number of methods available for determining how well students' answers match the answer keys provided by the teacher [3]. Unfortunately, there has not been a comparative analysis of the methods (schemes) that are widely used today.
Previous research on Indonesian essay examinations includes Indonesian Language Essay Assessment Using the SVM-LSA Method with Generic Features, which achieved an accuracy of 73%, and An Automatic Scoring Tool of Short Text Answer in Indonesian, which in application showed a standard deviation of 3-30 across the various types of data tested [4].
These studies do not compare the existing similarity measures, even though such a comparison is very important in choosing the method for an online essay examination assessment system. In addition, previous research used small datasets (classes with few respondents and few types of questions).
Based on this, this study aims to compare similarity schemes, setting Dice Similarity and Manhattan Distance alongside Cosine Similarity, Euclidean Distance and Jaccard, using 4 question fields (each with 10 questions) and 50 students. The data obtained from this research will be made available by the researchers for other research purposes. In addition, the use of stemming in text processing, especially on the results of online essay examinations, has never been compared; therefore, this study will reveal the differences in error values (the difference between manual scores and system scores) when stemming is used.

II. LITERATURE REVIEW
The literature and learning sources used in this research cover a variety of methods that support the performance of the system, including:

A. Vector Space Model
The Vector Space Model (VSM) is used to transform a digital text document into a more efficient and understandable representation so that analysis can be carried out. For example, Vd is the vector of document d. The features of the vector are the values or weights of the terms in the document [5].
To avoid vectors with many unimportant features, a word is used as a feature only if it appears in the training data at least three times, or if the word is not a stop-word. The vector space model is illustrated in Figure 1. In VSM, a collection of documents is represented as a term-document matrix (or term-frequency matrix). Each cell in the matrix corresponds to the weight of a term in the specified document. The retrieved documents are ranked by their similarity, so the vector model takes into consideration the documents that are relevant to the user's request [6]. The terms of the documents and the corresponding equation are shown in Figure 2 and Equation 1.
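The term-document matrix described above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the function name `term_document_matrix` and the two sample documents are made up for the example, and raw term counts are assumed as the weights because Equation 1 is not reproduced in this text.

```python
from collections import Counter


def term_document_matrix(documents):
    """Build a term-frequency matrix: one row per term, one column per document."""
    # Count how often each whitespace-separated term occurs in each document.
    counts = [Counter(doc.split()) for doc in documents]
    # The vocabulary is the sorted union of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical sample documents, used only for illustration.
docs = ["pria sedang menggunting kertas", "kertas kertas putih"]
vocab, tdm = term_document_matrix(docs)
for term, row in zip(vocab, tdm):
    print(term, row)
```

Each printed row is one term with its weight in every document, so a column of the matrix is the vector of one document.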

B. Dice Similarity
The Sørensen-Dice index of two sets X and Y equals twice the number of elements common to both sets divided by the sum of the number of elements in each set, that is, 2|X ∩ Y| / (|X| + |Y|), where |X| and |Y| are the cardinalities of the two sets (i.e. the number of elements in each set) [7].
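As a small sketch of this definition, the function below computes the coefficient for two collections of terms treated as sets. The name `dice_similarity`, the sample term lists, and the handling of two empty inputs are assumptions made for the illustration.

```python
def dice_similarity(x, y):
    """Sørensen-Dice coefficient: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = set(x), set(y)
    if not x and not y:
        # Assumption: two empty term sets are treated as identical.
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical term lists for an answer key and a student answer.
key_terms = ["gunting", "kertas", "pria"]
answer_terms = ["gunting", "kertas", "lelaki", "jenggot"]
print(dice_similarity(key_terms, answer_terms))
```

The result always lies between 0 (no shared terms) and 1 (identical term sets), which makes it straightforward to map onto a grading scale.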

C. Manhattan Distance
The next distance metric used in this research is Manhattan distance. Manhattan distance calculates the distance between two vectors by Equation 4. As with Euclidean distance, wAk is the weight of term k in document A and wBk is the weight of term k in document B. The resulting value can be greater than 1 [4].
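Equation 4 is not reproduced in this text, so the sketch below assumes the standard definition of Manhattan distance: the sum over all terms k of |wAk - wBk|. Raw term counts are assumed as the weights, and the function name and sample tokens are illustrative only.

```python
from collections import Counter


def manhattan_distance(tokens_a, tokens_b):
    """Sum over all terms k of |w_Ak - w_Bk|, using raw term counts as weights."""
    w_a, w_b = Counter(tokens_a), Counter(tokens_b)
    # Iterate over the union of terms; a Counter yields 0 for absent terms.
    return sum(abs(w_a[k] - w_b[k]) for k in set(w_a) | set(w_b))


# Hypothetical token lists for documents A and B.
doc_a = ["kertas", "kertas", "gunting"]
doc_b = ["kertas", "pria"]
print(manhattan_distance(doc_a, doc_b))  # 3
```

Unlike a similarity coefficient, this is a distance: 0 means identical vectors, and the value grows without an upper bound of 1 as the documents diverge.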

D. Nazief Adriani Stemming Algorithm
The Nazief-Adriani algorithm is a stemming algorithm designed specifically for Indonesian. It uses several morphological rules to remove affixes (prefixes, suffixes, etc.) from a word and then matches the result against a dictionary of root words (basic words) [8]. The main basis of this algorithm is therefore a list of root words, so the first step is to collect a list of Indonesian root words. The more complete the list, the higher the accuracy of the algorithm.
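The dictionary-lookup principle can be illustrated with a deliberately simplified sketch. This is not the Nazief-Adriani algorithm itself, which applies a larger, ordered rule set; the root-word dictionary, the affix lists and the function name `stem` below are all assumptions chosen for the example.

```python
# Toy dictionary and affix lists, assumed for illustration only.
ROOT_WORDS = {"gunting", "kertas", "jenggot", "pria", "lelaki"}
SUFFIXES = ["nya", "kan", "an", "i"]
PREFIXES = ["meng", "mem", "men", "me", "ber", "di", "ke", "se"]


def stem(word, roots=ROOT_WORDS):
    """Return the dictionary root of `word`, or `word` itself if none is found."""
    if word in roots:
        return word
    # Candidate forms: the word itself plus the word with one suffix removed.
    candidates = [word] + [word[:-len(s)] for s in SUFFIXES if word.endswith(s)]
    for candidate in candidates:
        if candidate in roots:
            return candidate
        # Try removing one prefix and look the remainder up in the dictionary.
        for prefix in PREFIXES:
            if candidate.startswith(prefix) and candidate[len(prefix):] in roots:
                return candidate[len(prefix):]
    return word


for w in ["menggunting", "kertasnya", "berjenggot", "sedang"]:
    print(w, "->", stem(w))
```

Because a stripped form is accepted only when it appears in the dictionary, a word such as "sedang" is left unchanged when no root matches, which is why the completeness of the root-word list matters.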

E. Error Rate
The results of both similarity metrics are evaluated using the percentage error and the absolute error. The percentage error shows how large the difference is between the measured value and the true value. A smaller error value indicates a more accurate system [2].
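The error equation is not reproduced in this text. The sketch below therefore assumes the common definitions, with the manual score taken as the true value: the absolute error is |manual - system|, and the percentage error is that difference divided by the manual score, times 100. The function names are illustrative.

```python
def absolute_error(manual_score, system_score):
    """Absolute difference between the manual and the system score."""
    return abs(manual_score - system_score)


def percentage_error(manual_score, system_score):
    """Absolute error relative to the manual score, in percent."""
    if manual_score == 0:
        raise ValueError("percentage error is undefined for a manual score of 0")
    return absolute_error(manual_score, system_score) / abs(manual_score) * 100


print(percentage_error(80, 60))  # 25.0
```

Under this assumed definition the percentage error is not capped at 100%: it exceeds 100% whenever the system score is more than twice the manual score.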

III. RESEARCH METHOD
The research method is divided into several phases, which are explained in detail below. The figure shows the analysis scheme of the study. The problem addressed is that teachers spend a great deal of time correcting each student's essay answers, so a solution is needed to simplify grading and reduce the time instructors spend on it.
The first phase is to create the questions and answer keys, each of which belongs to a category. The categories in the exam system are divided into 5, namely lifestyle, sports, politics, economics, and technology. In this phase, the admin and the instructor are responsible for the exam question bank and can edit the questions and answer keys in the system. In the second phase, students answer the exam questions in the online essay scoring system. Each student can choose the category of exam questions to answer, and each category contains 10 (ten) essay questions. The more questions answered correctly, the higher the student's score. In the third phase, the students' answers are scored by the system. The system scores the answers through text pre-processing followed by calculation using the Dice Similarity scheme and, for comparison, Manhattan Distance [4].
The evaluation process represents each student's answer document as a vector in which each component refers to a term, and the value of each component is the number of occurrences of that term in the document. Once a document is represented as a vector, various vector operations can be performed.
The fourth phase is the manual scoring of the students' answers according to the predetermined scoring scheme, given as absolute numbers that are then averaged [9]. The fifth phase is calculating the percentage error between the average manual score and the system score [10]. From this phase, the error value of each method is obtained. The flow of the online essay exam scoring system is shown in Figure 3.

A. Data Source
The data needed to build this system come from a dataset of 40 questions in four categories of 10 questions each (Lifestyle, Politics, Sport, Technology). In total, 2162 (two thousand one hundred and sixty-two) answer texts were collected.
The answer keys and the students' answers are then passed through text pre-processing and scored with the two methods, Manhattan Distance and Dice Similarity, to compare their accuracy.

B. Method Steps
The pre-processing stages convert text data into numerical data that can be processed. This is a very important step before the automatic scoring process begins because it can affect the accuracy of the assessment [11]. Pre-processing consists of several steps.
In this study, the pre-processing stage includes stemming, because no previous study has shown whether stemming makes the assessment more effective [3].

1) Text Pre-Processing
Preprocessing is an important task and a critical step in text mining, Natural Language Processing (NLP) and information retrieval (IR) [12]. In text mining, data preprocessing is used to extract interesting and non-trivial knowledge from unstructured text data [13]. The following steps are carried out in this system.

2) Case Folding
Case folding is a text mining step that converts uppercase letters to lowercase letters so that all letters are in the same case [14].

3) Tokenizing
Tokenization is the process of splitting a sequence of characters into individual word units (tokens) [15].

4) Stopword
This stage removes stop-words. A stop-word is a common word that appears in almost any text and therefore does not distinguish one text from another. Examples of Indonesian stop-words are "yang", "dan", "di", and "dari" [16].
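The three steps above (case folding, tokenizing and stop-word removal) can be chained as in the following minimal sketch. The function names, the sample sentence and the four-word stop-word list are assumptions for illustration; the study's own stop-word list is not given in this text.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the list used in the study.
STOP_WORDS = {"yang", "dan", "di", "dari"}


def case_fold(text):
    """Convert all letters to lowercase."""
    return text.lower()


def tokenize(text):
    """Extract runs of lowercase letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text):
    return remove_stop_words(tokenize(case_fold(text)))


print(preprocess("Buku yang dibeli dari toko di Malang"))
# ['buku', 'dibeli', 'toko', 'malang']
```

Stemming would be applied to the resulting tokens as a further step before the similarity calculation.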

C. Calculation Example
This sub-section gives a calculation example for Dice Similarity and Manhattan Distance. The example is the same as the one used in Ahmad Hafidh Ayatullah's text mining journal article [17]. Sentence A is "Lelaki berjenggot itu sedang menggunting kertasnya" and sentence B is "pria sedang menggunting kertas". The processes are described as follows:

2) Tokenizing
Table 3 describes the results of this step.
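As a rough worked illustration, the two example sentences can be run through case folding and tokenizing and then compared with both measures. This sketch makes several assumptions: stop-word removal and stemming are skipped, Dice Similarity is computed on the sets of distinct tokens, and Manhattan Distance uses raw term counts. The figures it prints therefore need not match the tables of the original calculation.

```python
import re
from collections import Counter

sentence_a = "Lelaki berjenggot itu sedang menggunting kertasnya"
sentence_b = "pria sedang menggunting kertas"


def tokens(text):
    # Case folding followed by tokenizing on runs of letters.
    return re.findall(r"[a-z]+", text.lower())


a, b = tokens(sentence_a), tokens(sentence_b)

# Dice Similarity on the sets of distinct tokens.
shared = set(a) & set(b)
dice = 2 * len(shared) / (len(set(a)) + len(set(b)))

# Manhattan Distance on raw term counts.
count_a, count_b = Counter(a), Counter(b)
manhattan = sum(abs(count_a[t] - count_b[t]) for t in set(a) | set(b))

print(sorted(shared))  # ['menggunting', 'sedang']
print(dice)            # 0.4
print(manhattan)       # 6
```

Without stemming, "kertasnya" and "kertas" count as different terms; stemming both to the same root would raise the Dice value and lower the Manhattan value for this pair.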

IV. SOFTWARE IMPLEMENTATION
The application was developed as a web application. MySQL was chosen as the database management system, and the PHP framework Laravel was used to build the application. Students can log in and then choose exam questions to answer from the available categories. Each student is required to answer 10 questions in each category they choose. The categories available in this online essay examination system are Lifestyle, Sports, Politics, and Economics. The system also has administrator and lecturer pages for managing exam questions and answer keys. Figure 4 shows the Physical Data Model (PDM) of the "dbessay" database. There are 12 tables in the database, each with a different function. The function of each table is described in Table 4.

A. Database Structure
As shown in the previous table, there are 3 main tables for the login function, namely admins, student_list, and lecturer. The stages of text pre-processing of the students' answers use 3 tables, namely answers_of_student, doc_preprocessing_answers_ofstudent, and doc_preprocessing_key_answers. The exam_questions and exam_category tables contain the question bank data and the related question types. The score table stores all grades of students who have completed the exam questions. The relationships between tables can be seen in Figure 4. The functions of the tables are:
doc_preprocessing_answers_ofstudent: stores the preprocessing results of the student answers
doc_preprocessing_key_answers: stores the preprocessing results of the answer keys
answers_of_student: stores all of the students' exam answers after they are entered
exam_category: stores the names of the exam categories
score: stores the Manhattan values
lecturer: stores the lecturer users
exam_questions: stores all essay exam questions

B. User Interface
This page is displayed as the student homepage when the application starts, before the student enters the test page. Its purpose is to introduce users to the application, which was created by Polinema using the Laravel framework. The home page is shown in Figure 5. Figure 6 shows an example of the answer page for the Lifestyle category exam questions. As stated earlier, each category has 10 questions to be answered, so the total number of questions in the system is 40. After students confirm all of their exam answers, the system automatically scores them through the stages described previously.

V. RESULT AND DISCUSSION
This research improves on and extends previous research entitled "Analysis of Aspects of Indonesian Online Language Essay Exams" by Trisna Ari Roshinta [2]. Both studies use the same data, obtained from 50 students who answered 40 questions (politics, sports, lifestyle and technology).
The previous research reported percentage error values of 52.31% for Jaccard, 332.90% for Euclidean Distance, and 59.49% for Cosine Similarity. The following is a comparison of those error rates with the results of the two methods examined in this research (the comparison method is marked by a green column).
Comparing the students' scores from the 2 methods in this research with the 3 methods of the previous research, it can be concluded that Dice Similarity obtained the smallest average error rate, 33.7%, while the highest error rate, 333%, was produced by Euclidean Distance. The error rates of the other methods are 55.4% for Manhattan Distance, 59.5% for Cosine Similarity, and 52.3% for Jaccard. All of the student scores above were produced after the stemming process, applied as a refinement of the text pre-processing stage.

VI. CONCLUSION
Based on the results of the analysis, design and implementation, it can be concluded that the Dice Similarity scheme is more effective than Manhattan Distance. The scheme with the smallest percentage error is Dice Similarity, at 33.7%. In addition, the students who took part in this study obtained varying scores, indicating that they worked on the online essay questions according to their respective abilities. Students scored well on questions with definite answers, because such answers are very likely to match between the answer key and the student's answer, even without taking synonyms into account. Among the fields used as test data, Politics had the highest average score compared with Sports, Technology and Lifestyle.