% Filename : chapter_3.tex
\chapter{Research Methodology}
This chapter discusses the steps and activities performed to accomplish the project, including the research activities and a high-level discussion of the oral reading assessment automation process.
\section{Research Activities}
\subsection{Development Framework}
The software development framework applied in this study is the agile methodology. Agile methodology, or simply agile, emphasizes iterative development and features communication and collaboration, adaptive planning, and continuous development \cite{mendix_2022}. The developers chose this framework for its flexibility and responsiveness to change, which is beneficial when user requirements or key features need to be modified. Furthermore, the agile methodology reduces development problems, as its incremental delivery and constant feedback allow the developers to foresee problems that may only occur at the later stages of development \cite{sharma2012agile}.
Figure \ref{fig:agile_method_life_cycle} shows the agile methodology life cycle followed by the developers. It is composed of the requirements, construction, release, and production stages. During the requirements stage, the deliverables for the sprint were visualized and demonstrated using user flow or unified modeling language (UML) diagrams. During the construction stage, the developers worked on the requirements to deliver a working product. This involved frontend and backend development, which are elaborated in Sections \ref{sec:frontend_dev} and \ref{sec:backend_dev}, respectively. The release stage involved functionality testing and the detection of bugs; when defects were found, the work was sent back to the construction stage before proceeding. The production stage is the final stage of the life cycle, at which point the application was deployed and made available to users.
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{figures/diagrams/development_framework.png}
\caption{Agile methodology life cycle, adapted from https://www.lucidchart.com/blog/agile-software-development-life-cycle.}
\label{fig:agile_method_life_cycle}
\end{figure}
\nocite{lucidchart}
% \textit{``The Stages of the Agile Software Development Life Cycle''},
Furthermore, the agile methodology was implemented using Scrumban, a hybrid of Scrum and Kanban. Scrum and Kanban are methods under the agile umbrella, and Scrumban combines the best features of the two frameworks, adopting Scrum's structure and predictability and Kanban's flexibility and visualization \cite{dhiman_2022}.
\subsection{Frontend development}
\label{sec:frontend_dev}
During frontend development, the developers implemented the web application's user interface and user experience (UI/UX), ensuring that the visual and interactive components were developed and optimized for the web. Specifically, this includes the development of the user interfaces for the web application's two primary user groups: the students and the teachers.
Each user group has a different user interface to cater to their specific needs. For the students’ user interface, the developers implemented a dashboard that shows the students’ current oral reading profile for the pretest and the posttest, and buttons that would redirect them to the assessment proper.
\subsection{Backend development}
\label{sec:backend_dev}
Backend development focused on the underlying functionalities of the application, such as the accurate display of data, logical page navigation, and the seamless flow of passage reading, speech recording, quiz completion, text transcription, miscue detection, and saving of results.
\titlespacing*%
{\paragraph}%
{0pt}%
{2ex}%
{1ex}%
\paragraph{Creation of database} A database was created to store and manage the necessary data of the users. Figure \ref{fig:db_schema} shows the database structure designed and implemented on the Firestore database. The data was organized into nine (9) collections, namely \textit{School}, \textit{Admin}, \textit{Section}, \textit{Teacher}, \textit{Student}, \textit{Pretest}, \textit{Posttest}, \textit{Passage}, and \textit{Policy}. Each collection contains the necessary fields. For instance, a \textit{School} document includes the following fields: \textit{School\_ID}, \textit{School\_Name}, \textit{Region\_Name}, \textit{School\_Year}, and \textit{Admin\_ID}.
\begin{figure}[!h]
\centering
\includegraphics[width=0.6\textwidth]{figures/diagrams/CORA_DB_Schema.png}
\caption{CORA's database schema.}
\label{fig:db_schema}
\end{figure}
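To make the schema concrete, a \textit{School} document with the fields listed above might take the following shape. This is a minimal sketch; the field values shown are hypothetical, not actual records.

```javascript
// Minimal sketch of a School document as stored in Firestore.
// Field names follow the schema described above; the values are hypothetical.
const schoolDoc = {
  School_ID: "SCH-001",
  School_Name: "Sample Elementary School",
  Region_Name: "Region IV-A",
  School_Year: "2022-2023",
  Admin_ID: "ADM-001",
};

// The remaining collections (Admin, Section, Teacher, Student,
// Pretest, Posttest, Passage, Policy) follow the same pattern.
console.log(Object.keys(schoolDoc).length); // 5 fields
```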
\paragraph{Integration of Whisper API}
The Whisper API was integrated into the application to transcribe the speech recording into a text transcript. The speech recording was first saved as a WAV audio file in Firebase Storage. The recording's URL was then retrieved to save the audio file to the \textit{Public} folder. Once the audio was saved, it was sent to the Whisper API for transcription. The duration of the transcription process depends on the length of the audio.
Figure \ref{fig:WhisperFunction} shows a JavaScript function to send the audio to the Whisper API for text transcription. It uses the OpenAI library to interact with the Whisper API for audio transcription. It first sets up the necessary configuration, including the API key. Then, it creates an instance of the OpenAIApi class using the provided configuration. The \textbf{transcribeAudio} function is defined as an asynchronous function that takes an \textbf{audioPath} parameter, representing the path to the audio file. Within the function, it uses the \textbf{openai.createTranscription} method to send the audio file for transcription. The method takes parameters like the audio file, the desired model (in this case, `whisper-1'), and the language (in this case, `en' for English). The function logs the transcription result to the console and returns it as the output. If any error occurs during the transcription process, it will be caught and logged to the console before being re-thrown.
\lstdefinestyle{myCustomMatlabStyle}{
language=Python,
numbers=left,
basicstyle=\footnotesize,
stepnumber=1,
numbersep=10pt,
tabsize=3,
showspaces=false,
showstringspaces=false,
frame=single
}
% ADDED REVISION
\begin{figure}[!h]
\begin{lstlisting}[style=myCustomMatlabStyle]
Import required modules: Configuration, OpenAIApi from 'openai'
Create a configuration object with the API key
Create an instance of the OpenAIApi class using the configuration
Define an asynchronous function named transcribeAudio that takes
an audioPath parameter
Try the following:
Call the openai.createTranscription method with the
audio file, model ('whisper-1'), and language ('en')
Log the transcription response data to the console
Return the transcription result
Catch any error that occurs during the transcription process:
Log the transcription error to the console
Throw the error
\end{lstlisting}
\caption{Pseudocode for text transcription using Whisper API.}
\label{fig:WhisperFunction}
\end{figure}
% ADDED REVISION
Table \ref{tab:whisperresults} shows the execution times of speech transcription using the Whisper API for the collected speech samples. It is important to note that the speed of the transcription process depends on the audio duration and on the internet connection speed of the student or teacher. The internet connection speed for the examples in Table \ref{tab:whisperresults} was 31.44 Mbps for download and 20.64 Mbps for upload, measured using Speedtest by Ookla (https://www.speedtest.net/)\nocite{speedtest}. For shorter texts, such as the passage for Grade 3, transcription is fast, with an execution time of 2.7719 seconds. For longer texts, such as the passage for Grade 5, the execution time (3.5139 seconds) is slightly slower because the audio is longer.
\begin{table}[!h]
\footnotesize
\centering
\begin{tabular}{|p{1.3cm}|p{4.7cm}|p{3cm}|p{3cm}|}
\hline
\textbf{Grade Level} & \textbf{Passage} & \textbf{Execution Time (in seconds)} & \textbf{Internet Speed Test (Mbps)} \\
\hline
3 & Let's have some fun this summer, says Leo. Let's swim in the river, says Lina. Let's get some star apples from the tree, says Leo. Let's pick flowers, says Lina. That is so much fun, says Mama. But can you help me dust the shelves too? Yes, we can mama, they say. Helping can be fun too.
& 2.7719 & Download: 31.44, Upload: 20.64\\
\hline
5 & One day, a frog sat on a lily pad, still as a rock. A fish swam by. “Hello, Mr. Frog! What are you waiting for?” “I am waiting for my lunch,” said the frog. “Oh, good luck!” said the fish and swam away. Then, a duck waddled by. “Hello, Mr. Frog! What are you waiting for?” “I am waiting for my lunch,” said the frog. “Oh, good luck!” said the duck and waddled away. Then a bug came buzzing by. “Hello, Mr. Frog! What are you doing?” asked the bug. “I’m having my lunch! Slurp!” said the frog. Mr. Frog smiled. & 3.5139 & Download: 31.44, Upload: 20.64\\
\hline
\end{tabular}
\caption{Whisper API speech transcription speed for collected speech samples.}
\label{tab:whisperresults}
\end{table}
\newpage
\subsubsection{Collection of speech samples}
Speech recordings of the passages were collected to test the accuracy of the Whisper API's text transcription and of the miscue detection algorithm. Due to the limited availability of volunteers, only a few samples were collected.
% ADD REASON FOR REVISION
One (1) sample speech recording was collected from a Grade 3 student, who read the pretest passage for Grade 3 entitled ``Summer Fun''. Another sample was collected from a Grade 5 student, who read the pretest passage for Grade 5 entitled ``Frog's Lunch''.
Table \ref{tab:whispersampletranscribe} compares the manual transcription of the speech recordings against the transcription returned by the Whisper API. For the Grade 3 and Grade 5 samples, the API returned transcriptions that were 94.73\% and 100\% accurate, respectively.
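The per-word accuracy figures above are simple ratios of correctly transcribed words to total reference words. A minimal sketch of the computation follows; the helper function is hypothetical, truncating to two decimal places as the reported figures do.

```javascript
// Per-word transcription accuracy: correctly transcribed words divided by
// total reference words, expressed as a percentage truncated to two
// decimal places. A hypothetical helper, not the application's actual code.
function transcriptionAccuracy(correctWords, totalWords) {
  return Math.floor((correctWords / totalWords) * 100 * 100) / 100;
}

console.log(transcriptionAccuracy(54, 57));   // 94.73 (Grade 3 sample)
console.log(transcriptionAccuracy(101, 101)); // 100 (Grade 5 sample)
```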
\begin{table}[!h]
\footnotesize
\centering
\begin{tabular}{|p{0.4in}|p{1.8in}|p{1.8in}|p{0.5in}|p{0.35in}|}
\hline
\textbf{Grade Level} & \textbf{Manual (raw) transcription} & \textbf{Transcribed Text} & \scriptsize{\textbf{Correctly transcribed words}} & \textbf{\%} \\
\hline
3 & Let's have some fun this summer, says Leo. Let's swim in the river, says Lina. Let's get some star apples from the tree, says Leo. Let's pick flowers, says Lina. That is so much fun, says Mama. But can you help me dust the shelves too? Yes, we can mama, they say. Helping can be fun too.
& Let's have some fun this summer, says Leo. Let's swim in the river, says Dina.
Let's get some star apples from the tree, says Leo. Let's pick flowers, says Dina.
That is so much fun, says Mama. But can you help me? That's the shelves too.
Yes, we can Mama. They say, helping can be fun too.
& 54/57
& 94.73 \\
\hline
5 &
One day, a frog sat on a lily pad.
Still as a rock, a fish swam by.
Hello, Mr. Frog, what are you waiting for?
I am waiting for my lunch, said the frog.
Oh, good luck! Said the fish and swam away.
Then a duck waddled by.
Hello, Mr. Frog, what are you waiting for?
I am waiting for my lunch, said the frog.
Oh, good luck! Said the duck and waddled away. Then a bug came buzzing by.
Hello, Mr. Frog, what are you doing?
Asked the bug. I'm having my lunch.
Slurp! Said the frog. Mr. Frog smiled.
& One day, a frog sat on a lily pad.
Still as a rock, a fish swam by.
Hello, Mr. Frog, what are you waiting for?
I am waiting for my lunch, said the frog.
Oh, good luck! Said the fish and swam away.
Then a duck waddled by.
Hello, Mr. Frog, what are you waiting for?
I am waiting for my lunch, said the frog.
Oh, good luck! Said the duck and waddled away. Then a bug came buzzing by.
Hello, Mr. Frog, what are you doing?
Asked the bug. I'm having my lunch.
Slurp! Said the frog. Mr. Frog smiled. & 101/101 & 100.00 \\
\hline
\end{tabular}
\caption{Whisper API Text Transcription for collected speech samples.} \vspace{0.25em}
\label{tab:whispersampletranscribe}
\end{table}
To add sample speech recordings for testing, the developers created two (2) recordings of the pretest passage for Grade 2 entitled ``Pam's Cat''. One is an accurate reading of the passage, and the other was read with intentional errors.
\newpage
\subsection{Tools}
\subsubsection{Software}
\paragraph{Trello}
Trello is a web-based project management tool that visualizes progress on a Kanban board. It offers a wide array of features such as the inclusion of image and file attachments on task cards, deadline management, and a powerful status-tracking system \cite{johnson_2021}.
This software tool was used for project workflow management and task tracking. The developers used Trello to visualize every deliverable into a card that contains necessary information such as the name of the assignee and due date. Furthermore, its core features are free, and the tool has a user-friendly interface and intuitive functionalities making it easy to use.
\paragraph{Figma}
Figma is a cloud-based collaborative interface design tool. This tool is primarily built to enhance design teamwork as well as provide powerful functionalities and features such as cross-platform compatibility, team communication, permission-based sharing, facilitated developer handoff (i.e., displaying of code snippets), third-party tool integration, and an intuitive interface \cite{kopf_2018}.
This software tool was used by the developers to create mockups and design the user interface of the application. The developers used this tool to work on the user interfaces of the application including the color schemes, illustrations used, and UI elements. Furthermore, Figma is free, cross-platform, and has a wide array of collaboration tools and plugins allowing the developers to work together seamlessly.
\paragraph{Visual Studio Code}
Visual Studio Code is a multi-platform source code editor that runs on a desktop. This lightweight but powerful tool offers a variety of useful functionalities and features such as intelligent code completion, streamlined debugging, rich Git integration, and a wide array of installable extensions \cite{microsoft_2021}.
This tool was used as the developers’ primary source code editor. Visual Studio Code is lightweight and offers a comprehensive working environment for software development. It supports Git operations (e.g., commit, push, pull, and merge), and seamless installation of dependencies using the npm package manager. Moreover, this tool supports several extensions or plugins for efficient software development such as Emmet abbreviations and automatic code formatting. All these features and intuitive functionalities make it a user-friendly application for developers.
\titlespacing*%
{\paragraph}%
{0pt}%
{3ex}%
{1ex}%
\paragraph{GitHub}
GitHub is a repository hosting service for Git-based software development and version control. Aside from providing the distributed version control of Git, GitHub also provides a web-based graphical user interface for a seamless Git workflow. Furthermore, GitHub also offers a variety of powerful features such as collaborative coding, security, cross-platform compatibility, and team administration \cite{github}.
GitHub makes collaborative development more efficient by hosting a remote repository for the project, which developers may access via the Internet. This service was used by the developers to manage the codebase, control and track changes, and fix minor merge conflicts. Furthermore, the necessary services are free and more intuitive to use compared to alternatives like Bitbucket.
\paragraph{Next.js}
Next.js is a framework for developing web applications with React. It enables developers to create fast and flexible websites that can be rendered both on the server and in the client's browser. Next.js provides a number of features that make it easy to build effective, SEO-friendly web applications, such as server-side rendering, automatic code splitting, and routing. The developers used this framework to develop the web application because it is accessible as an open-source framework and offers seamless API routing.
\subsection{Third-party Software Services}
\subsubsection{Application Program Interface}
\titlespacing*%
{\paragraph}%
{0pt}%
{1ex}%
{1ex}%
\paragraph{Whisper}
Whisper is an automatic speech recognition (ASR) system developed by OpenAI that provides speech recognition and transcription services (i.e., the conversion of speech into text). It was trained on 680,000 hours of supervised audio data in multiple languages and makes 50\% fewer errors than other models in zero-shot performance (i.e., it generalizes better to unseen data) \cite{openai_whisper}. The Whisper API takes an audio file (.mp3, .m4a, .wav) as input and returns a text transcript along with other metadata such as the detected language and timestamps. The cost of speech transcription using the Whisper API is \$0.006 or ₱0.33 per minute. The developers employed Whisper as their automatic speech recognition tool to convert speech recordings into text transcripts.
% ADDED REVISION
According to OpenAI (n.d.)\nocite{openai_whisper}, Whisper approaches human-level robustness and accuracy on English speech recognition. Radford et al. (2022)\nocite{radford2022robust} tested the noise robustness of Whisper and of 14 LibriSpeech-trained models by adding either white noise or pub noise to the audio, simulating a noisy environment such as a crowded restaurant or a pub. Most models performed well when there was little noise, but all of them degraded as the noise increased, eventually performing worse than Whisper. The developers also tested the noise robustness of Whisper by recording in a noisy environment, with people talking in a medium-sized room. Whisper was able to transcribe the speech correctly despite the noise. These results indicate that Whisper is robust to noise.
\titlespacing*%
{\subsubsection}%
{0pt}%
{3ex}%
{1ex}%
\subsubsection{Cloud Services}
Cloud services are software services offered to users via the Internet to help develop and build software applications. Firebase is an online platform supported by Google that offers backend cloud computing services and development platforms. This study utilized Firebase's Authentication, Firestore, and Storage services for user authentication, data storage, and file storage, respectively.
\paragraph{Firebase Authentication}
Firebase Authentication is a backend service for identifying and verifying eligible users of the application. It securely stores user login information, such as email and password, and uses it to authenticate users. It offers software development kits (SDKs) and ready-made user interfaces (UIs) for logging in. The developers utilized the available SDKs to implement the login feature of the application.
\paragraph{Firestore}
Firestore is a NoSQL cloud database accessible via Firebase. It stores and synchronizes data across clients and servers. Firestore is capable of handling hierarchical data structures and supports nested objects and collections. The developers utilized Firestore for its flexible and scalable storage for structured data.
\paragraph{Firebase Storage}
Firebase Storage is a file storage system for unstructured data such as images, audio, and videos. The service supports upload and download of files to a Google Cloud Storage bucket, where files are accessible from Firebase and Google Cloud \cite{firebase_storage}. The developers used Firebase Storage to store speech recordings before text transcription.
\section{Oral Reading Assessment Automation}
This section elaborates on the processes involved in automating the Phil-IRI Oral Reading Test in English. It includes (1) audio or speech recording, (2) speech recognition, (3) detection of miscues, (4) computation of reading rate, (5) quiz administration, and (6) generation of results.
To recognize the student as a test taker, and to avoid limiting the use of the application to school children, individuals taking the assessments are referred to as test takers in the following sections.
Figure \ref{fig:usecase} shows the tasks that the users of the application can perform. Registered test takers start by logging in and choosing a passage to read (i.e., the pretest or posttest passage). After choosing a passage, they can start reading and take a quiz. Teachers also start by logging in; once successful, they can view test takers' data per section or per individual.
\begin{figure}[!h]
\centering
\includegraphics[width=\textwidth]{figures/diagrams/UseCaseDiagram_CORA.png}
\caption{Use Case Diagram for CORA.}
\label{fig:usecase}
\end{figure}
Figure \ref{fig:sequence} shows the sequence of processes in administering the automated oral reading assessment. Registered test takers start by logging into the system and then take either the oral reading pretest or posttest. As the test taker reads the passage, their speech is recorded into an audio file. The audio file is used to obtain the text transcription necessary for detecting reading miscues and computing the reading rate. After the speech recording, the test taker takes the quiz related to the passage, and the results are then summarized.
\newpage
\begin{figure}[!h]
\centering
\includegraphics[width=\textwidth]{figures/diagrams/SequenceDiagram_AutomatedORAwithTestee.png}
\caption{Sequence diagram for using CORA.}
\label{fig:sequence}
\end{figure}
\newpage
\hfill\break
\subsection{Speech Recording}
The first step in the automated administration of oral reading test was the recording of the test taker's reading speech as they read grade-level-appropriate passages. As a test taker read the assigned passage, the system recorded their speech.
The audio recording obtained was then transcribed into text. The recording had to be monophonic (mono) in .wav format with a minimum bit rate of 128 kbps, a minimum sample rate of 44.1 kHz, and a bit depth of at least 16 bits for standard, high-quality recording. In 2020, the Department of Education purchased computers for teachers as schools shifted towards a blended education system during the coronavirus pandemic. The DepEd ensured that the equipment provided to teachers was internet-capable \cite{Bernardo_Domingo_2020}. The models and specifications of the procured computers were not specified; however, since the computers are internet-capable and suitable for a blended education system, they are expected to meet the recommended minimum requirements for audio recording.
In the actual collection of recordings, it is optimal to record audio in a noise-free and echo-less environment, preferably a closed recording studio equipped with quality recorders. A recorder equipped with a cardioid condenser microphone with a response range of approximately 80 to 15,000 Hz is recommended. Condenser microphones pick up subtleties in speech recordings, and the cardioid polar pattern picks up sound in front of the equipment\nocite{musiciansfriend_2022}, eliminating unnecessary noise from behind the microphone. A response range of 80 to 15,000 Hz covers the frequencies of typical human speech.
\titlespacing*%
{\subsection}%
{0pt}%
{3ex}%
{1ex}%
\subsection{Quiz Administration}
After the test taker reads the passage, they answer a multiple-choice quiz about the passage to assess their comprehension of the text. The project adapted the quizzes provided by Phil-IRI for every passage.
\subsection{Speech Recognition}
The audio recording obtained from the test taker's speech is sent to the Whisper API for recognition. The API returns a text transcription of the speech alongside other information, such as timestamps of individual words (in seconds). The returned transcription is used to detect miscues in the test taker's speech.
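Since per-word timestamps are available, the reading rate (in words per minute) can be derived directly from them. The following is a minimal sketch, assuming the timestamps arrive as \textit{\{word, start, end\}} objects; the field names are illustrative, and the exact response shape depends on the API options used.

```javascript
// Compute reading rate in words per minute from word-level timestamps.
// Assumes each entry has start/end times in seconds; the field names
// here are illustrative, not the API's exact response format.
function readingRate(words) {
  if (words.length === 0) return 0;
  const durationSec = words[words.length - 1].end - words[0].start;
  if (durationSec <= 0) return 0;
  return (words.length / durationSec) * 60;
}

const sample = [
  { word: "quick", start: 0.0, end: 0.4 },
  { word: "brown", start: 0.4, end: 0.8 },
  { word: "fox",   start: 0.8, end: 1.2 },
];
console.log(readingRate(sample)); // 3 words in 1.2 s -> 150 wpm
```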
\subsection{Detection of Miscues}
The developers adapted the Needleman-Wunsch alignment approach for detecting reading miscues, which compares a reference text against the speech transcription produced by automatic speech recognition. The alignment approach was chosen for the following reasons:
\begin{enumerate}
\item Miscue patterns may change depending on the speaker or the situation. Since an alignment technique does not rely on predetermined prediction models, which may struggle to capture a range of patterns, it can flexibly identify and address different sorts of errors.
\item Alignment methods take sentence alignment and context into account to provide dynamic error detection. Their versatility makes them well suited to situations where the kinds or patterns of errors may change over time or in response to different inputs.
\item Due to the lack of access to labeled training data for predictive modeling, the alignment technique is the more feasible approach. With it, sentences can be aligned and compared to find errors without depending on previously labeled data.
\end{enumerate}
Furthermore, the developers adapted the Needleman-Wunsch algorithm instead of the Smith-Waterman algorithm due to the following reasons:
\begin{enumerate}
\item The Needleman-Wunsch algorithm is particularly intended for global alignment, which aligns entire sequences from start to end. On the other hand, the Smith-Waterman method is made for local alignment, which concentrates on finding the optimal alignment within a region of interest (e.g., phrases) rather than a complete text \cite{durbin1998biological}. Since the application’s miscue detection feature aims to compare the total similarity or dissimilarity between the reference and transcribed text, the Needleman-Wunsch algorithm is more suitable.
\item The Needleman-Wunsch algorithm fills every cell of the dynamic programming matrix to ensure consistency in the alignment. This is crucial for comparing and assessing errors in two phrases, since it makes sure that all factors are taken into account consistently.
\item The Needleman-Wunsch algorithm can be faster than the Smith-Waterman algorithm, especially when comparing large sequences, and can therefore provide computational efficiency when dealing with lengthy texts or when detecting errors in large amounts of data. The Smith-Waterman method, on the other hand, conducts a more thorough search for local alignments, considering all feasible alignments within the defined region of interest. When working with lengthy texts, this comprehensive search can be computationally costly, which reduces its effectiveness \cite{durbin1998biological}.
\end{enumerate}
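As a concrete illustration of the approach, the following is a simplified word-level Needleman-Wunsch alignment in JavaScript. This is a sketch with an assumed scoring scheme (+1 for matches, -1 for mismatches and gaps), not the application's exact implementation.

```javascript
// Word-level Needleman-Wunsch global alignment (a simplified sketch of
// the approach described above, not the application's exact code).
// Scoring: match +1, mismatch -1, gap -1.
function needlemanWunsch(ref, hyp) {
  const GAP = -1, MATCH = 1, MISMATCH = -1;
  const n = ref.length, m = hyp.length;
  // Build the dynamic programming score matrix.
  const score = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 0; i <= n; i++) score[i][0] = i * GAP;
  for (let j = 0; j <= m; j++) score[0][j] = j * GAP;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const diag = score[i - 1][j - 1] + (ref[i - 1] === hyp[j - 1] ? MATCH : MISMATCH);
      score[i][j] = Math.max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP);
    }
  }
  // Trace back from the bottom-right corner to classify each position.
  const ops = [];
  let i = n, j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 &&
        score[i][j] === score[i - 1][j - 1] + (ref[i - 1] === hyp[j - 1] ? MATCH : MISMATCH)) {
      ops.unshift(ref[i - 1] === hyp[j - 1] ? "match" : "substitution");
      i--; j--;
    } else if (i > 0 && score[i][j] === score[i - 1][j] + GAP) {
      ops.unshift("deletion");  // reference word missing from the speech
      i--;
    } else {
      ops.unshift("insertion"); // extra word present in the speech
      j--;
    }
  }
  return ops;
}

const ref = ["quick", "brown", "fox", "jumps", "over", "lazy", "dog"];
const hyp = ["quick", "brown", "fox", "jumps", "ober", "lazy", "dog"];
console.log(needlemanWunsch(ref, hyp));
// ["match","match","match","match","substitution","match","match"]
```

Aligning the transcription against the reference in this way yields, for every word position, one of the three discrepancy types (substitution, insertion, deletion) discussed in the next subsection.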
% \subsubsection{Needleman-Wunsch Algorithm}
% The algorithm called the Needleman-Wunsch algorithm was used to align the passage text and the returned transcription text. The Needleman-Wunsch algorithm applies dynamic programming in bioinformatics to align protein sequences. It is sometimes referred to as the optimal matching algorithm or the global alignment technique. It compares two protein sequences, each composed of a string of letters (A, T, C, and G), and determines the optimal alignment by assigning scores to matches, mismatches, and gaps, allowing for the identification of similarities and differences between the sequences \cite{NEEDLEMAN1970443}. Matches are two letters at the current index that are the same, mismatches or substitutions are two letters at the current index that are different, and gaps are alignments where one letter aligns to a blank in the other sequence which can either be a deletion wherein the letter in the first sequence aligns to a blank in the second sequence or an insertion if otherwise. Essentially, the Needleman-Wunsch algorithm can detect substitutions, insertions, and deletions present in a given pair of sequences.
\subsubsection{Generalizing Phil-IRI's Types of Miscues}
The Phil-IRI manual identifies eight types of oral reading miscues, namely mispronunciation, omission, substitution, insertion, repetition, transposition, reversal, and self-correction (see Table \ref{table:philirimiscues}). However, the Needleman-Wunsch algorithm is only designed to detect three types of discrepancies in a given pair of sequences: substitutions, insertions, and deletions. To resolve this, the developers generalized the eight types of miscues to fit into the three types of discrepancies that the Needleman-Wunsch algorithm can detect.
The mispronunciation type of miscue is characterized by the wrong utterance of a word. When transcribed, a mispronunciation can manifest as an incorrectly spelled instance of the reference word. In Table \ref{tab:global_ex1}, the word ``over'' was pronounced ``ober'' and can be treated as a form of substitution.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & fox & jumps & over & lazy & dog \\
\hline
quick & brown & fox & jumps & \textit{ober} & lazy & dog \\
\hline
\end{tabular}
\caption{An example of mispronunciation type of miscue.}
\label{tab:global_ex1}
\end{table}
The omission type of miscue is characterized by the failure of a word to be recognized and read aloud. When transcribed, an omission manifests as a gap, since the word does not appear in the transcription. In Table \ref{tab:global_ex2}, the word ``over'' was not pronounced and can be treated as a form of deletion.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & fox & jumps & over & lazy & dog \\
\hline
quick & brown & fox & jumps & & lazy & dog \\
\hline
\end{tabular}
\caption{An example of omission type of miscue.}
\label{tab:global_ex2}
\end{table}
The substitution type of miscue is characterized by the utterance of a different word in place of the reference word. When transcribed, a substitution can manifest as a word that is phonetically similar to the reference word. In Table \ref{tab:global_ex3}, the word ``over'' was pronounced ``hover'', which can be treated as a form of substitution.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & fox & jumps & over & lazy & dog \\
\hline
quick & brown & fox & jumps & hover & lazy & dog \\
\hline
\end{tabular}
\caption{An example of substitution type of miscue.}
\label{tab:global_ex3}
\end{table}
The insertion type of miscue is characterized by the addition of a new word to a phrase or sentence. When transcribed, an insertion manifests as an extra word in the transcription. In Table \ref{tab:global_ex4}, the word ``fat'' was added, which can be treated as a form of insertion.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & fox & jumps & over & lazy & & dog \\
\hline
quick & brown & fox & jumps & over & lazy & fat & dog \\
\hline
\end{tabular}
\caption{An example of insertion type of miscue.}
\label{tab:global_ex4}
\end{table}
The repetition type of miscue is characterized by multiple utterances of the same word. When transcribed, a repetition manifests as an extra word in the transcription. In Table \ref{tab:global_ex5}, the word ``over'' was pronounced twice, which can be treated as a form of insertion.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & fox & jumps & over & & lazy & dog \\
\hline
quick & brown & fox & jumps & over & over & lazy & dog \\
\hline
\end{tabular}
\caption{An example of repetition type of miscue.}
\label{tab:global_ex5}
\end{table}
The transposition type of miscue is characterized by words being uttered in the wrong order. When transcribed, a transposition can manifest as an insertion paired with a deletion. In Table \ref{tab:global_ex6}, the word ``brown'' was pronounced before ``quick'', and the transposed words produced one insertion and one deletion. It is also possible for the algorithm to treat a transposition as two substitutions if the words involved are very similar or if that yields the optimal alignment.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & & fox & jumps & over & lazy & dog \\
\hline
& brown & quick & fox & jumps & over & lazy & dog \\
\hline
\end{tabular}
\caption{An example of transposition type of miscue.}
\label{tab:global_ex6}
\end{table}
The reversal type of miscue is characterized by a word being uttered in reverse. When transcribed, a reversal can manifest as a reverse-spelled instance of the reference word. In Table \ref{tab:global_ex7}, the word ``dog'' was pronounced ``god'', which can be treated as a form of substitution.
\begin{table}[!h]
\centering
\begin{tabular}{ | m{1cm} | m{1cm}| m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | m{1cm} | }
\hline
quick & brown & fox & jumps & over & lazy & dog \\
\hline
quick & brown & fox & jumps & over & lazy & god \\
\hline
\end{tabular}
\caption{An example of reversal type of miscue.}
\label{tab:global_ex7}
\end{table}
The self-correction type of miscue was not included, as it does not count as an error and therefore does not affect the test taker's overall miscue score. It is also important to note that this generalization is possible because the Phil-IRI manual's scoring system does not assign weights to the detected miscues (i.e., every miscue counts as exactly one error). The total number of detected miscues is used to calculate the test taker's Word Reading score, as shown in Equation \ref{eq:wordreadingscore}.
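For illustration only, and assuming a simple unweighted accuracy form for Equation \ref{eq:wordreadingscore} (the manual's actual formula is authoritative), the score computation could be sketched in Python as:

```python
# Hypothetical sketch only: assumes the Word Reading score takes the common
# accuracy form (total words - miscues) / total words * 100, where every
# detected miscue counts as exactly one error.
def word_reading_score(total_words, num_miscues):
    """Percentage of words read correctly, with unweighted miscues."""
    return (total_words - num_miscues) / total_words * 100
```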
\subsubsection{Needleman-Wunsch Algorithm as a Miscue Detector}
Each word in the passage text is aligned to its corresponding word or position in the test taker's speech transcription. In the resulting alignment, aligned word pairs are either perfect matches or substitutions. Gaps indicate either an insertion or a deletion: an insertion if a gap in the passage text is aligned to a word in the transcription, and a deletion if a word in the passage text is aligned to a gap. Each substitution, insertion, and deletion contributes one point to the miscue score.
\subsubsection{Implementation of the Needleman-Wunsch Algorithm}
The following figures present a slightly modified, word-level implementation of the Needleman-Wunsch algorithm in pseudocode.
\lstdefinestyle{myCustomMatlabStyle}{
  language=Python,
  numbers=left,
  basicstyle=\footnotesize,
  stepnumber=1,
  numbersep=10pt,
  tabsize=3,
  showspaces=false,
  showstringspaces=false,
  frame=single
}
The code in Figure \ref{fig:ArrayInit} shows the initialization of a two-dimensional array called \textbf{score} with zeros. The array has \textbf{sentence1.length + 1} rows and \textbf{sentence2.length + 1} columns. This array stores the alignment scores between the words of \textbf{sentence1} and \textbf{sentence2}.
\begin{figure}[!h]
\begin{lstlisting}[style=myCustomMatlabStyle]
score = Create 2D Array with dimensions (sentence1.length + 1) x
(sentence2.length + 1)
for i = 0 to sentence1.length do
score[i] = Create 1D Array with length (sentence2.length + 1)
and fill it with 0
end for
\end{lstlisting}
\caption{Array initialization.}
\label{fig:ArrayInit}
\end{figure}
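As a concrete illustration, this initialization step might look as follows in Python. The sample sentences are illustrative only and are not taken from the Phil-IRI manual; this is a sketch, not the project's actual source.

```python
# Word-level alignment: split each sentence into a list of words.
sentence1 = "quick brown fox jumps over lazy dog".split()
sentence2 = "quick brown fox jumps ober lazy dog".split()

# One extra row and column hold the empty-prefix base cases,
# so the matrix is (len(sentence1)+1) x (len(sentence2)+1), all zeros.
score = [[0] * (len(sentence2) + 1) for _ in range(len(sentence1) + 1)]
```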
In the code in Figure \ref{fig:ScoreMatrixInit}, the first column and the first row of the \textbf{score} array are filled with increasing values, from 1 to \textbf{sentence1.length} and from 1 to \textbf{sentence2.length}, respectively. This step initializes the base cases of the alignment algorithm, representing the cost of aligning a word with a gap (a deletion or an insertion).
\begin{figure}[!h]
\begin{lstlisting}[style=myCustomMatlabStyle]
for i = 1 to sentence1.length do
score[i][0] = i
end for
for j = 1 to sentence2.length do
score[0][j] = j
end for
\end{lstlisting}
\caption{Score matrix initialization.}
\label{fig:ScoreMatrixInit}
\end{figure}
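In Python, this base-case setup could be sketched as follows (self-contained, with the same illustrative sentences as above; variable names follow the pseudocode):

```python
sentence1 = "quick brown fox jumps over lazy dog".split()
sentence2 = "quick brown fox jumps ober lazy dog".split()
score = [[0] * (len(sentence2) + 1) for _ in range(len(sentence1) + 1)]

# Aligning the first i words of sentence1 against an empty transcription
# costs i deletions; symmetrically for insertions along the first row.
for i in range(1, len(sentence1) + 1):
    score[i][0] = i          # first column: deletion costs
for j in range(1, len(sentence2) + 1):
    score[0][j] = j          # first row: insertion costs
```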
The code in Figure \ref{fig:FillScoreMatrix} calculates the alignment scores for each position in the \textbf{score} array. It iterates through each row and column, starting from index 1, and computes the score based on three possibilities: deletion (the cost of aligning the current word of \textbf{sentence1} with a gap in \textbf{sentence2}), insertion (the cost of aligning a gap in \textbf{sentence1} with the current word of \textbf{sentence2}), and match or substitution (the cost of aligning the current words of both sentences, either as a match or as a substitution). The minimum of these three values is stored in the \textbf{score} array at each position.
\begin{figure}[!h]
\begin{lstlisting}[style=myCustomMatlabStyle]
for i = 1 to sentence1.length do
for j = 1 to sentence2.length do
deletion = score[i - 1][j] + 1
insertion = score[i][j - 1] + 1
match = score[i - 1][j - 1] + (sentence1[i - 1] equals
sentence2[j - 1] ? 0 : 1)
score[i][j] = Minimum of (deletion, insertion, match)
end for
end for
\end{lstlisting}
\caption{Filling the score matrix.}
\label{fig:FillScoreMatrix}
\end{figure}
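A Python sketch of the fill step, repeated here in self-contained form under the same illustrative setup:

```python
sentence1 = "quick brown fox jumps over lazy dog".split()
sentence2 = "quick brown fox jumps ober lazy dog".split()
n, m = len(sentence1), len(sentence2)
score = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    score[i][0] = i
for j in range(1, m + 1):
    score[0][j] = j

# Each cell keeps the cheapest of deletion, insertion, or (mis)match
# relative to its three already-computed neighbors.
for i in range(1, n + 1):
    for j in range(1, m + 1):
        deletion = score[i - 1][j] + 1
        insertion = score[i][j - 1] + 1
        match = score[i - 1][j - 1] + \
            (0 if sentence1[i - 1] == sentence2[j - 1] else 1)
        score[i][j] = min(deletion, insertion, match)

# score[n][m] now holds the total edit cost; here "ober" vs "over"
# contributes a single substitution, so the cost is 1.
```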
The code in Figure \ref{fig:Backtracking} determines the insertions, deletions, and substitutions needed to align \textbf{sentence1} with \textbf{sentence2}. It starts from the bottom-right corner of the score array and moves towards the top-left corner, tracking the minimum-cost alignment. Counters for insertions, deletions, and substitutions are updated accordingly. The process continues until reaching the top-left corner (\textbf{i} and \textbf{j} become 0), marking the alignment's end. The variables \textbf{numInsertions}, \textbf{numDeletions}, and \textbf{numSubstitutions} store the final counts.
\begin{figure}[!h]
\begin{lstlisting}[style=myCustomMatlabStyle]
i = sentence1.length
j = sentence2.length
numInsertions = 0
numDeletions = 0
numSubstitutions = 0
while i > 0 or j > 0 do
if i > 0 and score[i][j] equals score[i - 1][j] + 1 then
numDeletions = numDeletions + 1
i = i - 1
else if j > 0 and score[i][j] equals score[i][j - 1] + 1 then
numInsertions = numInsertions + 1
j = j - 1
else
if sentence1[i - 1] does not equal sentence2[j - 1] then
numSubstitutions = numSubstitutions + 1
end if
i = i - 1
j = j - 1
end if
end while
\end{lstlisting}
\caption{Backtracking to determine the number of insertions, deletions, and substitutions needed to align sentence1 with sentence2.}
\label{fig:Backtracking}
\end{figure}
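Putting the four steps together, a self-contained Python version of the whole procedure (a sketch that mirrors the pseudocode, not the project's actual source) could read:

```python
def count_miscues(sentence1, sentence2):
    """Word-level Needleman-Wunsch alignment of a reference passage
    (sentence1) against a transcription (sentence2). Returns the
    numbers of insertions, deletions, and substitutions."""
    n, m = len(sentence1), len(sentence2)

    # Score matrix with empty-prefix base cases.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i
    for j in range(1, m + 1):
        score[0][j] = j

    # Fill: minimum of deletion, insertion, and (mis)match.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if sentence1[i - 1] == sentence2[j - 1] else 1
            score[i][j] = min(score[i - 1][j] + 1,         # deletion
                              score[i][j - 1] + 1,         # insertion
                              score[i - 1][j - 1] + cost)  # match/substitution

    # Backtrack from the bottom-right corner, counting each edit.
    i, j = n, m
    ins = dels = subs = 0
    while i > 0 or j > 0:
        if i > 0 and score[i][j] == score[i - 1][j] + 1:
            dels += 1
            i -= 1
        elif j > 0 and score[i][j] == score[i][j - 1] + 1:
            ins += 1
            j -= 1
        else:
            if sentence1[i - 1] != sentence2[j - 1]:
                subs += 1
            i -= 1
            j -= 1
    return ins, dels, subs

reference = "quick brown fox jumps over lazy dog".split()
# "ober" is counted as one substitution: (0, 0, 1)
print(count_miscues(reference, "quick brown fox jumps ober lazy dog".split()))
```

The returned triple maps directly onto the generalized miscue types: an omitted passage word raises the deletion count, an extra transcribed word raises the insertion count, and a mispronounced, substituted, or reversed word raises the substitution count.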
\newpage
\subsection{Computation of Reading Rate}
The Phil-IRI manual provides a formula for computing a test taker's reading rate, shown in Equation \ref{eq:readingspeed}. The number of words refers to the total count of words in the text transcription.
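As an illustration only, assuming Equation \ref{eq:readingspeed} takes the usual words-per-minute form (the manual's formula is authoritative), the computation could be sketched as:

```python
# Hypothetical sketch: assumes reading rate is expressed in words per
# minute, i.e. number of words / reading time in seconds * 60.
def reading_rate(num_words, reading_time_seconds):
    """Words per minute, given a word count and an elapsed time."""
    return num_words * 60 / reading_time_seconds
```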
\subsection{Generation of Results}
Once all items in the quiz have been answered and submitted, the test taker's quiz score is displayed to serve as feedback for the assessment. The results of miscue detection, reading rate computation, and the overall oral reading profile, however, are visible only to test administrators or teachers, to aid their analysis of what the test taker needs in order to improve their reading comprehension.