programming_under_the_hood/RobustCh.xml at master · johnnyb/programming_under_the_hood · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
<chapter id="developingrobustprograms">
<title>Developing Robust Programs</title>
<!--

Copyright 2002 Jonathan Bartlett

Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.1 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts.  A copy of the license is included in fdl.xml

-->


<para>
This chapter deals with developing programs that are
<emphasis>robust<indexterm><primary>robust</primary></indexterm></emphasis>.  Robust
programs are able to handle error conditions<indexterm><primary>error conditions</primary></indexterm> gracefully.  They
are programs that do not crash no matter what the user does.  Building
robust programs is essential to the practice of programming.  Writing
robust programs takes discipline and work - it usually entails finding every
possible problem that can occur, and coming up with an action plan for
your program to take.
</para>

<sect1>
<title>Where Does the Time Go?</title>

<para>
Programmers schedule poorly.  In almost every programming project,
programmers will take two, four, or even eight times as long to develop
a program or function than they originally estimated.  There are many
reasons for this problem, including:
</para>

<itemizedlist>
<listitem><para>Programmers don't always schedule time for meetings or other non-coding activities that make up every day.</para></listitem>
<listitem><para>Programmers often underestimate feedback times (how long it takes to pass change requests and approvals back and forth) for projects.</para></listitem>
<listitem><para>Programmers don't always understand the full scope of what they are producing.</para></listitem>
<listitem><para>Programmers often have to estimate a schedule on a totally different kind of project than they are used to, and thus are unable to schedule accurately.</para></listitem>
<listitem><para>Programmers often underestimate the amount of time it takes to get a program fully robust.</para></listitem>
</itemizedlist>

<para>
The last item is the one we are interested in here.  <emphasis>It takes a lot
of time and effort to develop robust<indexterm><primary>robust</primary></indexterm> programs.</emphasis>  More so than
people usually guess, including experienced programmers.  Programmers get
so focused on simply solving the problem at hand that they fail to look at
the possible side issues.
</para>

<para>
In the <literal>toupper</literal> program, we
do not have any course of action if the file the user selects does not
exist.  The program will go ahead and try to work anyway.  It doesn't report
any error message so the user won't even know that they typed in the name
wrong.  Let's say that the destination file is on a network drive, and the
network temporarily fails.  The operating system is returning a
status code<indexterm><primary>status code</primary></indexterm>
to us in &eax-indexed;, but we aren't checking it.  Therefore, if a failure
occurs, the user is totally unaware.  This program is definitely not
robust.  As you can see, even in a simple program there are a lot of
things that can go wrong that a programmer must contend with.
</para>

<para>
In a large program, it gets much more problematic.  There are usually many
more possible error conditions<indexterm><primary>error conditions</primary></indexterm> than
possible successful conditions.
Therefore, you should always expect to spend the majority of your time
checking status codes, writing error handlers, and performing similar tasks
to make your program robust.  If it takes two weeks to develop a program,
it will likely take at least two more to make it
robust<indexterm><primary>robust</primary></indexterm>.  Remember that
every error message that pops up on your screen had to be programmed in by
someone.
</para>

</sect1>

<sect1>
<title>Some Tips for Developing Robust Programs</title>

<sect2>
<title>User Testing</title>

<para>
Testing<indexterm><primary>testing</primary></indexterm> is one of the most essential things a programmer does.  If you haven't
tested something, you should assume it doesn't work.  However, testing isn't
just about making sure your program works, it's about making sure your
program doesn't break.  For example, if I have a program that is only supposed
to deal with positive numbers, you need to test what happens if the user enters
a negative number.  Or a letter.  Or the number zero.  You must test what
happens if they put spaces before their numbers, spaces after their numbers,
and other little possibilities.  You need to make sure that you handle the
user's data in a way that makes sense to the user, and that you pass on that
data in a way that makes sense to the rest of your program.  When your program
finds input that doesn't make sense, it needs to perform appropriate actions.
Depending on your program, this may include ending the program, prompting
the user to re-enter values, notifying a central error log, rolling
back an operation, or ignoring it and continuing.
</para>

<para>
Not only should you test your programs, you need to have others test it as
well.  You should enlist other programmers and users of your program to
help you test your program.  If something is a problem for your users,
even if it seems okay to you, it needs to be fixed.  If the user doesn't
know how to use your program correctly, that should be treated as a bug
that needs to be fixed.
</para>

<para>
You will find that users find a lot more bugs in your program than you
ever could.  The reason is that users don't know what the computer
expects.  You know what kinds of data the computer expects, and therefore
are much more likely to enter data that makes sense to the computer.  Users
enter data that makes sense to them.  Allowing non-programmers to use
your program for testing<indexterm><primary>testing</primary></indexterm>
purposes usually gives you much more accurate results as to how robust<indexterm><primary>robust</primary></indexterm>
your program truly is.
</para>

</sect2>

<sect2>
<title>Data Testing</title>

<para>
When designing programs, each of your functions needs to be very specific
about the type and range of data that it will or won't accept.  You then
need to test these functions to make sure that they perform to specification
when handed the appropriate data.
Most important is testing <emphasis>corner cases<indexterm><primary>corner cases</primary></indexterm></emphasis> or
<emphasis>edge cases<indexterm><primary>edge cases</primary></indexterm></emphasis>.
Corner cases are the inputs that are most likely to cause problems or behave unexpectedly.
</para>

<para>
When testing numeric data, there are several corner cases you always
need to test:
</para>

<itemizedlist>
<listitem><para>The number 0</para></listitem>
<listitem><para>The number 1</para></listitem>
<listitem><para>A number within the expected range</para></listitem>
<listitem><para>A number outside the expected range</para></listitem>
<listitem><para>The first number in the expected range</para></listitem>
<listitem><para>The last number in the expected range</para></listitem>
<listitem><para>The first number below the expected range</para></listitem>
<listitem><para>The first number above the expected range</para></listitem>
</itemizedlist>

<para>
For example, if I have a program that is supposed to accept values between 5
and 200, I should test 0, 1, 4, 5, 153, 200, 201, and 255 at a minimum (153
and 255 were randomly chosen inside and outside the range, respectively).  The
same goes for any lists of data you have.  You need to test that your program
behaves as expected for lists of 0 items, 1 item, massive numbers of items, and
so on.  In addition,
you should also test any turning points you have.  For example, if you have
different code to handle people under and over age 30, for example, you would
need to test it on people of ages 29, 30, and 31 at least.
</para>

<para>
There will be some internal functions that you assume get good data because
you have checked for errors before this point.  However, while in development
you often need to check for errors anyway, as your other code may have
errors in it.  To verify the consistency and validity of data during
development, most languages have a facility to easily check assumptions
about data correctness.  In the C language there is the
 <literal>assert<indexterm><primary>assert</primary></indexterm></literal>
macro.  You can simply put in your code
<literal>assert(a > b);</literal>, and it will give an error if it
reaches that code when the condition is not true.  In addition, since such
a check is a waste of time after your code is stable, the
<literal>assert</literal> macro allows you to turn off asserts at
compile-time.   This makes sure that your functions are receiving good
data without causing unnecessary slowdowns for code released to the public.
</para>


</sect2>

<sect2>
<title>Module Testing</title>

<para>
Not only should you test your program as a whole, you need to test the
individual pieces of your program.  As you
develop your program, you should test individual functions by providing it with
data you create to make sure it responds appropriately.
</para>

<para>
In order to do this effectively, you have to develop functions whose
sole purpose is to call functions for testing.  These are called
<emphasis>drivers<indexterm><primary>drivers</primary></indexterm></emphasis>
(not to be confused with hardware drivers) .
They simply load your function, supply it with data, and check the results.
This is especially useful if you are working on pieces of an unfinished
program.  Since you can't test all of the pieces together, you can create
a driver program that will test each function individually.
</para>

<para>
Also, the code you are testing may make calls to functions not developed
yet.  In order to overcome this problem, you can write a small function
called a <emphasis>stub<indexterm><primary>stub</primary></indexterm></emphasis>
which simply returns the values that
function needs to proceed.  For example, in an e-commerce application,
I had a function called <literal>is_ready_to_checkout</literal>.  Before
I had time to actually write the function I just set it to return true
on every call so that the functions which relied on it would have an
answer.  This allowed me to test functions which relied on
<literal>is_ready_to_checkout</literal> without the function being
fully implemented.
</para>

</sect2>

</sect1>

<sect1 id="handlingerrors">
<title>Handling Errors Effectively</title>

<para>
Not only is it important to know how to test, but it is also important
to know what to do when an error is detected.
</para>

<sect2>
<title>Have an Error Code for Everything</title>

<para>
Truly robust software has a unique error code for every possible contingency.
By simply knowing the error code<indexterm><primary>error code</primary></indexterm>, you should be able to find the location
in your code where that error was signalled.
</para>

<para>
This is important because the error code is usually all the user has to go
on when reporting errors.  Therefore, it needs to be as useful as possible.
</para>

<para>
Error codes should also be accompanied by descriptive error messages.
<indexterm zone="handlingerrors"><primary>error messages</primary></indexterm>
However, only in rare circumstances should the error message try to
predict <emphasis>why</emphasis> the error occurred.  It should simply
relate what happened.  Back in 1995 I worked for an Internet Service Provider.
One of the web browsers we supported tried to guess the cause for every network
error, rather than just reporting the error.  If the computer wasn't connected
to the Internet and the user tried to connect to a website, it would say
that there was a problem with the Internet Service Provider, that the
server was down, and that the user should contact their Internet Service
Provider to correct the problem.  Nearly a quarter of our calls were
from people who had received this message, but merely needed to connect
to the Internet before trying to use their browser.  As you can see, trying
to diagnose what the problem is can lead to a lot more problems than it
fixes.  It is better to just report error codes and messages, and
have separate resources for the user to troubleshooting the application.
A troubleshooting guide, not the program itself, is an appropriate place
to list possible reasons and courses for action for each error message.
</para>

</sect2>

<sect2>
<title>Recovery Points</title>

<para>
In order to simplify error handling, it is often useful to break
your program apart into distinct units, where each unit fails and
is recovered as a whole.  For example, you could break your program
up so that reading the configuration file was a unit.  If reading the
configuration file failed at any point (opening the file, reading the
file, trying to decode the file, etc.) then the program would simply
treat it as a configuration file problem and skip to the
<emphasis>recovery point<indexterm><primary>recovery points</primary></indexterm></emphasis> for that problem.  This way you greatly reduce the number
of error-handling mechanism you need for your program, because error
recovery is done on a much more general level.
</para>

<para>
Note that even with recovery points, your error messages<indexterm><primary>error messages</primary></indexterm> need to be
specific as to what the problem was.  Recovery points are basic
units for error recovery, not for error detection.
Error detection still needs to be extremely exact, and the error
reports need exact error codes and messages.
</para>

<para>
When using recovery points<indexterm><primary>recovery points</primary></indexterm>,
you often need to include cleanup code to
handle different contingencies.  For example, in our configuration file
example, the recovery function would need to include code to check and
see if the configuration file was still open.   Depending on where the
error occurred, the file may have been left open.  The recovery function
needs to check for this condition, and any other condition that might
lead to system instability, and return the program to a consistent state.
</para>

<para>
The simplest way to handle recovery points<indexterm><primary>recovery points</primary></indexterm>
is to wrap the whole program
into a single recovery point.  You would just have a simple
error-reporting function that you can call with an error code and
a message.  The function would print them and and simply exit the
program.  This is not usually the best solution for real-world situations,
but it is a good fall-back, last resort mechanism.
</para>

</sect2>

</sect1>

<sect1>
<title>Making Our Program More Robust</title>

<para>
This section will go through making the <filename>add-year.s</filename>
program from <xref linkend="records" /> a little more robust.
</para>

<para>
Since this is a pretty simple program, we will limit ourselves to
a single recovery point that covers the whole program.  The only thing
we will do to recover is to print the error and exit.  The code to
do that is pretty simple:
</para>

<programlisting>
&error-exit-s;
</programlisting>

<para>
Enter it in a file called <filename>error-exit.s</filename>.  To call it,
you just need to push the address of an error message, and then an error
code onto the stack, and call the function.
</para>

<para>
Now let's look for potential error spots in our <literal>add-year</literal>
program.  First of all, we don't check
to see if either of our <literal>open</literal> system calls actually
complete properly.  Linux returns its status code in &eax-indexed;, so
we need to check and see if there is an error.
</para>

<programlisting>
	#Open file for reading
	movl  $SYS_OPEN, %eax
	movl  $input_file_name, %ebx
	movl  $0, %ecx
	movl  $0666, %edx
	int   $LINUX_SYSCALL

	movl  %eax, INPUT_DESCRIPTOR(%ebp)

	#This will test and see if %eax is
	#negative.  If it is not negative, it
	#will jump to continue_processing.
	#Otherwise it will handle the error
	#condition that the negative number
	#represents.
	cmpl  $0, %eax
	jl    continue_processing


	#Send the error
	.section .data
no_open_file_code:
	.ascii "0001: \0"
no_open_file_msg:
	.ascii "Can't Open Input File\0"

	.section .text
	pushl $no_open_file_msg
	pushl $no_open_file_code
	call  error_exit

continue_processing:
	#Rest of program
</programlisting>

<para>
So, after calling the system call, we check and see if we have an
error by checking to see if the result of the system call is
less than zero.  If so, we call our error reporting and exit routine.
</para>

<para>
After every system call<indexterm><primary>system call</primary></indexterm>,
function call<indexterm><primary>function call</primary></indexterm>,
or instruction<indexterm><primary>instruction</primary></indexterm>
which can have erroneous
results you should add error checking<indexterm><primary>error checking</primary></indexterm> and handling code.
</para>

<para>
To assemble and link the files, do:
</para>

<programlisting>
as add-year.s -o add-year.o
as error-exit.s -o error-exit.o
ld add-year.o write-newline.o error-exit.o read-record.o write-record.o count-chars.o -o add-year
</programlisting>

<para>
Now try to run it without the necessary files.  It now exits cleanly and
gracefully!
</para>

</sect1>

<sect1>
<title>Review</title>

<sect2>
<title>Know the Concepts</title>

<itemizedlist>
<listitem><para>What are the reasons programmer's have trouble with scheduling?</para></listitem>
<listitem><para>Find your favorite program, and try to use it in a completely wrong manner.  Open up files of the wrong type, choose invalid options, close windows that are supposed to be open, etc.  Count how many different error scenarios they had to account for.</para></listitem>
<listitem><para>What are corner cases?  Can you list examples of numeric corner cases?</para></listitem>
<listitem><para>Why is user testing so important?</para></listitem>
<listitem><para>What are stubs and drivers used for?  What's the difference between the two?</para></listitem>
<listitem><para>What are recovery points used for?</para></listitem>
<listitem><para>How many different error codes should a program have?</para></listitem>
</itemizedlist>

</sect2>

<sect2>
<title>Use the Concepts</title>

<itemizedlist>
<listitem><para>Go through the <filename>add-year.s</filename> program and add error-checking code after every system call.</para></listitem>
<listitem><para>Find one other program we have done so far, and add error-checking to that program.</para></listitem>
<listitem><para>Add a recovery mechanism for <filename>add-year.s</filename> that allows it to read from STDIN if it cannot open the standard file.</para></listitem>
</itemizedlist>

</sect2>

<sect2>
<title>Going Further</title>

<itemizedlist>
<listitem><para>What, if anything, should you do if your error-reporting function fails?  Why?</para></listitem>
<listitem><para>Try to find bugs in at least one open-source program.  File a bug report for it.</para></listitem>
<listitem><para>Try to fix the bug you found in the previous exercise.</para></listitem>
</itemizedlist>

</sect2>
</sect1>
</chapter>