Newton 2.x Q&A Category: NewtonScript

Copyright © 1997 Newton, Inc. All Rights Reserved. Newton, Newton Technology, Newton Works, the Newton, Inc. logo, the Newton Technology logo, the Light Bulb logo and MessagePad are trademarks of Newton, Inc. and may be registered in the U.S.A. and other countries. Windows is a registered trademark of Microsoft Corp. All other trademarks and company names are the intellectual property of their respective owners.


For the most recent version of the Q&As on the World Wide Web, check the URL: http://www.newton-inc.com/dev/techinfo/qa/qa.htm
This document was exported on 7/23/97.

NewtonScript


Nested Frames and Inheritance (10/9/93)

Unlike C++ and other object-oriented languages, NewtonScript does not give a nested frame the same inheritance scope as its enclosing frame. A frame stored in a slot of another frame cannot automatically see the enclosing frame's slots or methods.

This is an important design issue, because sometimes you want to enclose a frame inside a frame for name scoping or other reasons. If you do so, you have to state explicitly both the receiver of each message you send and the path to each variable.

Here's an example that shows the problems:

myEncloser := {
    importantSlot: 42,
    GetImportantSlot: func()
        return importantSlot,

    nestedSlot: {
        myInternalValue: 99,

        getTheValue: func()
            begin
            local foo;
            foo := :GetImportantSlot();            // WON'T WORK; can't find function
            foo := myEncloser:GetImportantSlot();    // MAY WORK

            importantSlot := 12;       // WON'T WORK; will create new slot in nestedSlot
            myEncloser.importantSlot := 12;        // MAY WORK
            end
    }
};

myEncloser.nestedSlot:GetTheValue();


The proper way to accomplish this is to give the nested frame a _parent or _proto slot that references the enclosing frame. Nesting the frame is then not strictly necessary, because only the _proto or _parent reference is used for lookup.
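Here is a minimal sketch of the _parent approach, using the names from the example above. It assumes the standard NewtonScript assignment rule: assigning to a variable found through _parent inheritance changes the slot where it was found, while assigning to a variable found only through _proto creates a new slot in the inheriting frame. So if you want importantSlot := 12 to update the enclosing frame, _parent is the appropriate link here. The link is added after construction because a frame constructor cannot refer to the frame it is still building.

```newtonscript
myEncloser := {
    importantSlot: 42,
    GetImportantSlot: func() importantSlot,

    nestedSlot: {
        myInternalValue: 99,

        getTheValue: func()
            begin
            // Not found in nestedSlot, so lookup continues up the _parent chain.
            local foo := :GetImportantSlot();
            // Found via _parent, so the slot in myEncloser itself is updated.
            importantSlot := 12;
            foo;
            end
    }
};

// Link the nested frame to its encloser; no nesting-based lookup is involved.
myEncloser.nestedSlot._parent := myEncloser;

myEncloser.nestedSlot:getTheValue();   // returns 42
myEncloser.importantSlot;              // now 12
```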


Symbol Hacking (11/11/93)

Q: I would like to be able to build frames dynamically and have my application create the name of the slot in the frame dynamically as well. For instance, something like this:
MyFrame:= {}; theSlotName := "Slot_1";


At this point, is there a way to use that string to create the slot MyFrame.Slot_1?

A: The function Intern takes a string and returns a symbol. There is also a mechanism called path expressions (see the NewtonScript Reference) that lets you specify an expression or variable to evaluate in order to get the slot name. You can combine the two to access the slots you want:

    MyFrame := {x: 4};
    theXSlotString := "x" ;

    MyFrame.(Intern(theXSlotString)) := 6;

    theSlotName := "Slot_1";
    MyFrame.(Intern(theSlotName)) := 7;

    // myFrame is now {x: 6, Slot_1: 7}
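The same technique works when the slot names are computed in a loop and then read back. This is a short sketch in which the frame and slot names are only illustrations. HasSlot is used here to test whether the frame itself has a slot, ignoring inheritance.

```newtonscript
// Build the slots Slot_1 .. Slot_3 from names computed at run time.
local f := {};
for i := 1 to 3 do
    f.(Intern("Slot_" & i)) := i * 10;   // & converts the integer to a string

// Read a computed slot back with the same path-expression syntax.
f.(Intern("Slot_2"));                    // -> 20
HasSlot(f, Intern("Slot_4"));            // -> nil, no such slot
```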


Check for Application Base View Slots (3/6/94)

Here's a simple function that will print out all the slots and the slot values in an application base view. This function is handy if you want to check for unnecessary slots stored in the application base view; these eat up the NewtonScript heap and eventually cause problems with external PCMCIA RAM cards.

    call func() 
    begin
        local s,v;
        local root := GetRoot();
        local base := root.|YourApp:YourSIG|; // name of app
        local prot := base._proto;

        foreach s,v in base do
        begin
            if v and v <> root AND v <> base AND v <> prot then
              begin
               Write ("Slot:" && s & ", Value: ");
               Print(v);
              end;
        end;
    end with ()

The debugging function TrueSize can also be a valuable tool to determine the heap used by your applications. See the NTK User Guide for more information about TrueSize.
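A variation on the function above can narrow the list further. This sketch reports only the slots that are present in the running base view but not in its template (base._proto), which are the slots added at run time. ListAddedSlots is a hypothetical global variable created in the inspector, and |YourApp:YourSIG| is a placeholder for your application symbol. HasSlot checks only the frame itself, not its inheritance chains. Some of the reported slots (such as _proto itself) are created by the system rather than by your code, so treat the result as a starting point.

```newtonscript
// Hypothetical inspector helper: returns the slots a view has beyond its template.
ListAddedSlots := func(base)
begin
    local s, v;
    local prot := base._proto;           // the template the view was built from
    local added := [];
    foreach s, v in base do
        if not HasSlot(prot, s) then     // slot exists only in the run-time view
            AddArraySlot(added, s);
    added;                               // array of slot-name symbols
end;

call ListAddedSlots with (GetRoot().|YourApp:YourSIG|);
```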


Performance of Exceptions vs Return Codes (6/9/94)

Q: What are the performance tradeoffs in writing code that uses try/onexception vs returning and checking error results?

A: We did a few trials to weigh the relative performance. Consider the following two functions:

    thrower: func(x) begin
        if x then
            throw('|evt.ex.msg;my.exception|, "Some error occurred");
        end;
 
    returner: func(x) begin
        if x then
            return -1;    // some random error code,
        0; // nil, true, whatever.
        end;
 

Code to throw and handle an exception:
    local s;
    for i := 1 to kIterations do
        try
            call thrower with (nil);
        onexception |evt.ex.msg;my.exception| do
            s := CurrentException().data.message;


Code to check the return value and handle an error:
    local result;
    local s;
    for i := 1 to kIterations do
        if (result := call returner with (nil)) < 0 then
            s := ErrorMessageTable[-result];



Running the above loops 1000 times took about 45 ticks for the exception loop, and about 15 ticks for the check the return value loop. From this you might conclude that exception handling is a waste of time. However, you can often write better code if you use exceptions. A large part of the time spent in the loop is setting up the exception handler. Since we commonly want to stop processing when exceptions occur, we can rewrite the function to set up the exception handler once, like this:

    local s;
    try
        for i := 1 to kIterations do
            call thrower with (nil);
    onexception |evt.ex.msg;my.exception| do
        s := CurrentException().data.message;


This code takes only 11 ticks for 1000 iterations, an improvement over the return value case, where we'd have to check the result after each call to the function and stop the loop if an error occurred.

Running the same loops, but passing TRUE instead of NIL so the "error" occurs every time was interesting. The return value loop takes about 60 ticks, mostly due to the time needed to look up the error message. The exception loop takes a whopping 850 ticks, mostly because of the overhead in the CurrentException() call.

With exceptions, you can handle the error at any level up the call chain, without having to worry about each function checking for and returning error results for every sub-function it uses. This will produce code that performs much better, and will be easier to maintain as well.
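To illustrate handling an error further up the call chain, here is a sketch in which two inner functions contain no error checks and only the outermost code catches the exception. The function names and data are illustrative, and the exception symbol is the one used in the loops above.

```newtonscript
local outcome := call func(items)
begin
    // Innermost helper: signals a problem and does nothing else.
    local checkItem := func(item)
        if not IsFrame(item) then
            throw('|evt.ex.msg;my.exception|, "Item is not a frame");

    // Middle layer: contains no error-handling code at all.
    local processAll := func(things)
        begin
        local item;
        foreach item in things do
            call checkItem with (item);
        end;

    // Outermost layer: the single place where the error is handled.
    local result := 'ok;
    try
        call processAll with (items);
    onexception |evt.ex.msg;my.exception| do
        result := 'failed;
    result;
end with ([{a: 1}, 42, {b: 2}]);

outcome;   // -> 'failed, because 42 is not a frame
```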

With exceptions, you do not have to worry about the return value for successful function completion. It is occasionally very difficult to write functions that both have a return value and generate an error code. The C/C++ solution is to pass a pointer to a variable that is modified with what should otherwise be the return value of the function, which is a technique best avoided.

As in the above example, you can attach data to exceptions, so there's no need to maintain an error code to string (or whatever) mapping table, which is another boon to maintainability. (You can still use string constants and so on to aid localization efforts. Just put the constant in the throw call.)

Finally, every time an exception occurs you have an opportunity to intercept it with the NTK inspector. This is also a boon to debugging, because you know something about what's going wrong, and you can set the breakOnThrows global to stop your code and look at why there's a problem. With result codes you have a tougher time setting break points. With a good debugger it could be argued that you can set conditional break points on the "check the return value" code, but even when you do this you'll have lost the stack frame of the function that actually had the problem. With exceptions and breakOnThrows, all the local context at the time the exception occurred is still available for you to look at, which is an immense aid.

Conclusion: Use exceptions. The only good reason not to is when the error is handled very locally and you expect it to occur often. If that's the case, you should consider rewriting the function.


NewtonScript Object Sizes (6/30/94)

These descriptions document current OS formats only; we reserve the right to extend or change the implementation in future releases.

Generic
NewtonScript objects are objects that reside either in the read-write NewtonScript memory, in pseudo-ROM memory, inside the package, or in ROM. In earlier MessagePad platforms, these objects are aligned to 8-byte boundaries. In Newton 2.0 OS, objects in the NewtonScript memory are aligned to 4-byte boundaries. Inside Newton 2.0 packages, you can optionally align objects to 4-byte boundaries (with NTK's "tighter object packing" checkbox). Alignment causes a very small amount of memory to be wasted, usually less than 2%.


The Newton Object System has four built-in primitive classes that describe an object's basic type: immediates, binary objects, arrays, and frames. The NewtonScript function PrimClassOf will return an object's primitive type.

Immediates and Magic Pointers
Immediates (integers, characters, TRUE and NIL) and magic pointers are stored in a 4-byte structure containing up to 30 bits of data and 2 bits of primitive class identification.

Referenced Objects
Binaries, arrays and frames are stored as larger separate objects and managed through references. A reference is a four-byte object. The binary objects, frames, or arrays themselves are stored separately as objects containing a so-called Object Header.

Object Header
Every referenced object has a 12-byte header that contains information concerning size, flags, class, lock count and so on. This information is implementation-specific.

Symbols
A symbol is a binary object that contains a four-byte hash value and a name, which is a null-terminated ASCII string. Each symbol uses 12 (header) + 4 (hash value) + length of name + 1 (null terminator) bytes.

Binary Objects
A binary object contains a 12-byte header plus space for the actual data (allocated in 8-byte chunks).

Strings
Strings are binary objects of class String (or a subclass of String). A string object contains a 12-byte header plus the Unicode characters plus a null termination character. Note that Unicode characters are two-byte values. Here's an example:
    "Hello World!"

This string contains 12 characters, that is, 24 bytes. In addition we have a null termination character (24 + 2 bytes) and an object header (24 + 2 + 12 bytes), so the object is 38 bytes in all. Note that we have not taken into account any possible savings if the string was compressed (using the NTK compression flags).

Rich Strings
Rich strings extend the string object class by embedding ink information within the object. Within the Unicode text, a special character, kInkChar, marks the position of an ink word. The ink data is stored after the null termination character. Ink size varies depending on stroke complexity.

Array Objects
Array objects have an object header (12 bytes) and additional four bytes per element which hold either the immediate value or a reference to a referenced object. To calculate the total space used by an array, you need to take into account the memory used by any referenced objects in the array.

Here's an example:
    [12, $a, "Hello World!", "foo"]

We have a header (12 bytes) plus four bytes per element (12 + (4 * 4) bytes). The integer and character are immediates, so no additional space is used, but we have two string objects that we refer to, so the total is (12 + (4 * 4) + 38 + 20 bytes) 86 bytes. We have not taken into account savings from compression. Note that the string objects could be referred to by other arrays and frames as well, so the 38- and 20-byte structures are stored only once per package.
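The arithmetic in the string, symbol, and array descriptions can be written as a few small functions. This is a sketch for the NTK inspector. The function names are made up, and the functions encode only the formulas given above, ignoring 8-byte chunk rounding, alignment, compression, and sharing.

```newtonscript
// Size estimates in bytes for run-time objects, per the formulas above.
local header := 12;                                             // object header
local stringSize := func(str) header + 2 * (StrLen(str) + 1);   // 2-byte chars + null
local symbolSize := func(name) header + 4 + StrLen(name) + 1;   // hash + ASCII + null
local arraySize := func(n) header + 4 * n;                      // 4 bytes per element

local sizes := [
    call stringSize with ("Hello World!"),   // 38
    call stringSize with ("foo"),            // 20
    call symbolSize with ("Slot1"),          // 22
    call arraySize with (4)];                // 28, the array [12, $a, "Hello World!", "foo"]

// The array plus the two strings it references: 28 + 38 + 20 = 86 bytes.
```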

Frame Objects
We have two kinds of frames: frames that don't have a shared map object; and frames that do have a shared map object. We take the simple case first (no shared map object).

The frame is maintained as two array-like objects. One, called the frame map, contains the slot names, and the other contains the actual slot values. A frame map has one entry per symbol, plus one additional 4-byte value.

The frame map uses a minimum of 16 bytes. If we add the frame's object header to this, the minimal size of a frame is 28 bytes. Each slot adds 8 bytes to the storage used by the frame (two array entries.) Here's an example:
    {Slot1: 42, Slot2: "hello"}

We have a minimal frame of 28 bytes, and in addition we have two slots, for a total of (28 + (2 * 8)) 44 bytes. This does not take into account the space used for each of the slot name symbols or for the string object. (The integer is an immediate, and so is stored in the array.)

Multiple similar frames (having the same slots) can share a frame map. This saves space, reducing the space used per frame (for many frames all sharing the same map) to the same as used for an array with the same number of slots. If just a few frames share the frame map, we need to take into account the amortized map size that the frames share. So the total space for N frames sharing a map is N*28 bytes of header (28 bytes per frame), plus the size of the frame map, plus the size of the values for the N frames.

Here's an example of a frame that could share a map with the previous example:
    {Slot1: 56, Slot2: "world"}

We have a header of 12 bytes. In addition, we have two slots (2 * 4), and an additional 16 bytes for the size of a map with no slots; all in all, 36 bytes. We should also take into account the shared map, which is 16 bytes, plus the space for the two symbols.

When do frames share maps?

1. When a frame is cloned, both the copy and the original frame will share the map of the original frame. A trick to make use of this is to create a common template frame, and clone this template when duplicate frames are needed.

2. Two frames created from the same frame constructor (that is, the same line of NewtonScript code) will share a frame map. This is a reason to use RelBounds to create the viewBounds frame, and it means there will be a single viewBounds frame map in the part produced.
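Here is a sketch of the cloning trick from rule 1, with example slot names. Assigning to slots the template already has does not change its set of slots. Adding a new slot to one clone would presumably give that clone a map of its own, so give the template every slot you need up front.

```newtonscript
// One template; every clone starts out sharing the template's frame map.
local template := {name: nil, phone: nil, email: nil};
local people := Array(3, nil);
for i := 0 to 2 do
begin
    local p := Clone(template);
    p.name := "Person " & i;   // assigns an existing slot; adds no new slot
    people[i] := p;
end;
```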

Note: These figures are for objects in their run-time state, ready for fast access. Objects in transit or in storage (packages) are compressed into smaller stream formats. Different formats are used (and different sizes apply) to objects stored in soups and to objects being streamed over a communications protocol.


Symbols vs Path Expressions and Equality (7/11/94)

Q: While trying to write code that tests for the existence of an index, I tried the following, which did not work. How can I compare path expressions?
if value.path = '|name.first| then ...    // WRONG

A: There are several concerns. '|name.first| is not a path expression; it is a symbol with an escaped period. A proper path expression is either 'name.first or [pathExpr: 'name, 'first]. The vertical bars make everything between them a single NewtonScript symbol.

The test value.path = 'name.first will always fail. Path expressions are deep objects (essentially arrays), and the = comparison compares references rather than contents. You will have to write your own code to compare path expressions element by element.

This code is further complicated by the fact that symbols are allowed in place of path expressions that contain only one element, but the two syntaxes produce different NewtonScript objects with different meanings. That is, 'name = [pathExpr: 'name] will always fail, as the objects are different.

A general test is probably unnecessary in most circumstances, since you will be able to make assumptions about what you are looking for. For example, here is some code that will check if a given path value from a soup index is equivalent to 'name.first:

if ClassOf(value.path) = 'pathExpr and Length(value.path) = 2
      and value.path[0] = 'name and value.path[1] = 'first then ...
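If you do need a general test, here is a sketch of a deep comparison. kPathEqual is a hypothetical name defined with DefConst at build time. It treats a symbol as equal only to the same symbol, so 'name and [pathExpr: 'name] compare as different, matching the rule above.

```newtonscript
// Hypothetical helper: compares two index paths element by element.
DefConst('kPathEqual, func(a, b)
begin
    if IsSymbol(a) or IsSymbol(b) then
        return a = b;              // symbol vs. array is always nil
    if ClassOf(a) <> 'pathExpr or ClassOf(b) <> 'pathExpr then
        return nil;
    if Length(a) <> Length(b) then
        return nil;
    for i := 0 to Length(a) - 1 do
        if a[i] <> b[i] then       // elements are symbols or integers
            return nil;
    true;
end);

// Usage in your code:
if call kPathEqual with (value.path, [pathExpr: 'name, 'first]) then ...
```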


Function Size and "Closed Over" Environment (7/18/94)

Q: I want to create several frames (for soup entries) that all share a single function, but when I try to store one of these frames to a soup, I run out of memory. Can several frames share a function and still be written to a soup? My code looks like this:
    ...
    local myFunc := func(...) ...;
    local futureSoupEntries := Array(10, nil);
    for i := 0 to 9 do
        futureSoupEntries[i] := {
            someSlots: ...,
            aFunction: myFunc,
        };
    ...

A: When a function is defined within another function, the lexically enclosing scope (locals and parameters) and message context (self) are "closed over" into the function body. When NewtonScript searches for a variable to match a symbol in a function, it first searches the local scope, then any lexically enclosing scopes, then the message context (self), then the _proto and _parent chains from the message context, then finally the global variables.

Functions constructed within another function, as in your example, will have this enclosing lexical scope, which is the locals and parameters of the function currently being executed, plus the message context (self) when the function is created. Depending on the size of this function and how it's constructed, this could be very large. (Self might be the application's base view, for example.)

A TotalClone is made during the process of adding an entry to a soup, and this includes the function body, lexical scopes, and message context bound up within any functions in the frame. All this can take up a lot of space.

If you create the function at compile time (perhaps with DefConst('kMyFunc, func(...) ...)) it will not have the lexically enclosing scope, and the message context at compile time is defined to be an empty frame, and so cloning such a function will take less space. You can use the constant kMyFunc within the initializer for the frame, and each frame will still reference the same function body. (Additionally, the symbol kMyFunc will not be included in the package, since it is only needed at compile time.)
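Here is a sketch of the compile-time approach applied to the code in the question. The function body and the someSlot slot are placeholders. DefConst is evaluated when the package is built (for example, in a project text file), and the loop runs in your application.

```newtonscript
// Build time: no enclosing locals, and an empty frame as the message context.
DefConst('kMyFunc, func(x) x * 2);   // placeholder body

// Run time: every frame references the same function object.
local futureSoupEntries := Array(10, nil);
for i := 0 to 9 do
    futureSoupEntries[i] := {
        someSlot: i,
        aFunction: kMyFunc
    };
```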

If the soup entries are only useful when your package is installed, you might consider instead replacing the function body with a symbol when you write the entry to the soup. When the entry is read from the soup, replace the symbol with the function itself, or use a _proto-based scheme instead. An entry that stores the function must contain a complete copy of it, and if you can guarantee that the function body will always be available within your application's package, storing a copy with each soup entry is unnecessarily redundant.
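One way to apply the symbol idea is to keep the functions in a table built at compile time and store only a slot name in each entry. In this sketch, kEntryFuncs, its twice slot, and the entry layout are hypothetical names, not part of any Newton API. Looking the function up by name when it is needed avoids rewriting the entry at all.

```newtonscript
// Build time: a table of the functions that entries may refer to.
DefConst('kEntryFuncs, {
    twice: func(x) x * 2,
    triple: func(x) x * 3});

// Run time: the entry stores only the function's name.
local entry := {someSlot: 1, aFunction: 'twice};
// ... add the entry to your soup as usual ...

// When the function is needed, look it up in the package's table.
call kEntryFuncs.(entry.aFunction) with (21);   // -> 42
```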


TrueSize Incorrect for Soup Entries (2/6/96)

Q: When I use TrueSize to get the size of a soup entry I get results like 24K or even 40K for the size. That can't be right. What's going on?

A: TrueSize "knows" about the underlying implementation of soup entries. A soup entry is really a special object (a fault block) that contains information about how to get an entry and can contain a cached entry frame. In the information about how to get an entry, there is a reference to the soup, and various caches in a soup contain references to the cursors, the store, and other (large) NewtonScript objects. TrueSize is reporting the space taken up by all of these objects. (Note: calling TrueSize on a soup entry will force the entry to be faulted in, even if it was not previously taking up space in the NewtonScript heap.)

The result is that TrueSize is not very useful when trying to find out how much space the cached frame for an entry is using. A good way to find the space used for a cached entry frame is to call gc(); stats(); record the result, then call EntryUndoChanges(entry); gc(); stats(). The difference between the two free space reports will be the space used by the cached frame for a given entry.

EntryUndoChanges(entry) will cause any cached frame to be removed and the entry to return to the unfaulted state. gc() then collects the space previously used by the cached entry frame.
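Here is the measurement procedure written out as a sketch for the NTK inspector. It assumes entry holds a soup entry that has already been read, so that its cached frame exists. EntryUndoChanges also discards any changes to the entry that have not been saved, so run this only on an entry you have not modified.

```newtonscript
gc(); stats();             // note the free-space figure
EntryUndoChanges(entry);   // drop the cached frame; the entry is unfaulted again
gc(); stats();             // the increase in free space is the cached frame's size
```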

If you want the TrueSize breakdown of the types of objects used, you can Clone the entry and call TrueSize on the copy. This works because the copy is not a fault block, and so it does not reference the soups/cursors/stores.


Floating Point Numbers Are Approximations (3/28/97)

Q: The functions Floor and Ceiling seem broken. For instance, Floor(12.2900 * 10000) returns 122899, not 122900. What's going on?

A: This is not a bug in Floor or Ceiling. This happens because of the way floating point numbers are stored, and the limitation is common to many real number representations. In the same way that 1/3 cannot accurately be represented in a finite number of digits in base 10 (it is .3333333333...), likewise 1/10 cannot be exactly represented as a fractional part in base 2. Because number printers typically round to a small number of significant digits, you don't normally notice this. The NTK inspector, for example, displays only 5 significant figures in floating point numbers. However, if you display the number with enough precision, you'll see the representation error, where the real is actually slightly larger or smaller than the intended value.
    FormattedNumberStr(0.1, "%.18f")  ->  "0.100000000000000010"
    FormattedNumberStr(0.3, "%.18f")  ->  "0.299999999999999990"


The functions Floor and Ceiling are strict, and do not attempt to take this error into account. In the example, 12.29 is actually 12.2899999999999990, which multiplied by 10000 is 122,899.999999999990. The largest integer less than this number (Floor) is correctly 122899.

There are usually ways to work around this problem, depending on what you are trying to accomplish. To convert a floating point number to an integer, use RIntToL, which rounds to the nearest integer avoiding the problems caused with round-off error and Floor or Ceiling. RIntToL(x) produces the same result that Floor(Round(x)) would produce.
    RIntToL(12.29*10000)  ->  122900


If you need to format a number for display, use a formatting function such as FormattedNumberStr. These functions typically round to the nearest displayable value. To display 2 decimal digits, use "%.2f":
    FormattedNumberStr(12.29, "%.2f")  ->  "12.29"


If you're working with fixed point numbers such as dollar amounts, consider using integers instead of reals. By representing the value in pennies (or mils, or whatever) you can avoid the imprecision of reals. For example, represent $29.95 as the integer 2995 or 29950, then divide by 100 or 1000 to display the number. If you do this, keep in mind that there is a maximum representable integer value, 0x1FFFFFFF or 536870911, which is sufficient to track over 5 million dollars as pennies, but can't go much over that.
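Here is a sketch of the integer approach using made-up prices. The sum is exact because only integers are added. The value is converted to a real (by dividing by 100.0) only when it is formatted for display.

```newtonscript
local prices := [2995, 1250, 99];   // $29.95, $12.50, $0.99 as pennies
local total := 0;
local p;
foreach p in prices do
    total := total + p;             // exact integer arithmetic: 4344

FormattedNumberStr(total / 100.0, "%.2f");   // "43.44"
```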

If you really need to find the greatest integer less than a certain number and can't tolerate how Floor deals with round off errors, you'll need to do some extra work keeping track of the precision of the number and the magnitude of the round off error. It's worthwhile to read a good numeric methods reference. Floating point numbers in NewtonScript are represented by IEEE 64-bit reals, which are accurate to around 15 decimal digits. The function NextAfterD provides a handy way to see how 'close together' floating point numbers are.
        FormattedNumberStr(NextAfterD(0.3, kInfinity), "%.18f");
                                                    ->  "0.300000000000000040"
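One workaround is to nudge the value up by a relative tolerance before taking the floor. This is offered as a sketch rather than a general solution. kFloorTol and the tolerance of 1.0e-12 are assumptions chosen for this example. A value that is genuinely just below an integer would also be pushed over, so the tolerance must match what you know about the precision of your data.

```newtonscript
// Hypothetical helper, not a built-in: Floor after a relative nudge.
DefConst('kFloorTol, func(x, relTol) Floor(x + Abs(x) * relTol));

Floor(12.29 * 10000);                           // -> 122899
call kFloorTol with (12.29 * 10000, 1.0e-12);   // -> 122900
```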


Real Numbers in NewtonScript (3/28/97)

Q: How are real numbers represented as floating point in NewtonScript? How accurate are they? What about infinities, NANs, and other exceptions?

A: Real numbers in NewtonScript are represented as IEEE 64-bit floating point numbers, which are accurate to about 15 decimal digits. You can read more about the IEEE floating point numbers in "Inside Macintosh: PowerPC Numerics" available online at the URL:
        http://gemma.apple.com/dev/techsupport/insidemac/PPCNumerics/PPCNumerics-2.html

The Newton floating point environment is not as rich in features as the PowerPC environment, and the PowerPC numerics document is only mentioned as a useful resource for understanding floating point issues. It in no way documents API or features of the Newton floating point environment.

Briefly, numbers are represented by 1 bit of sign ("on" is negative), 11 bits of exponent, and 52 bits of fractional part. The exponent bits are stored in excess 0x3FF; that is, 0x3FF is the representation for 0, values greater than 0x3FF are positive exponents, and values less than 0x3FF are negative exponents. The 52 bits of fractional part actually provide 53 bits of accuracy, because the leading 1 bit of a normalized number is implied rather than stored.

For example, suppose that we want to convert 9 97/128 into IEEE 64-bit format:
1) Convert to base 2:
    1001.1100001
2) Shift the number to the form 1.yyyyyy * 2^Z:
    1.0011100001 * 2^3
3) Add 0x3FF (excess 0x3FF) to the exponent and convert it to binary:
    3 + 0x3FF = 0x402 = 100 0000 0010
4) Now put the numbers together, using only the fractional part (yyyyyy) of the number from step 2:
    0 10000000010 0011100001000000000000000000000000000000000000000000
   In hex representation, this is 0x4023840000000000.
5) Just to verify, try it:
    StrHexDump(9+97/128, 16) -> "4023840000000000"
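The steps can also be checked in reverse by reading the exponent back out of the stored bytes. This sketch assumes that ExtractByte can read a real's 8-byte binary object and returns the bytes in the same order that StrHexDump prints them.

```newtonscript
local x := 9 + 97/128;
local b0 := ExtractByte(x, 0);      // sign bit + high 7 exponent bits (0x40 here)
local b1 := ExtractByte(x, 1);      // low 4 exponent bits + high 4 fraction bits (0x23 here)
local sign := 1;
if b0 >= 128 then sign := -1;       // top bit set means negative
local biased := (b0 mod 128) * 16 + b1 div 16;   // 11-bit exponent field
local decoded := {sign: sign, biasedExponent: biased, exponent: biased - 1023};
// decoded -> {sign: 1, biasedExponent: 1026, exponent: 3}, since 1026 = 0x402
```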

The IEEE standard also allows for non-normal numbers. Here are the exceptions (e is the exponent field, f is the fractional part):
    infinity    e = 0x7FF, f = 0     (+ or -, depending on sign bit)
    NaN         e = 0x7FF, f <> 0    (also overflow, error, etc.)
    zero        e = 0,     f = 0     (+ or -, depending on sign bit)
    subnormal   e = 0,     f <> 0    (less precise numbers, smaller than the smallest normal number)

Note that there is more than one not a number value. In fact, there are quite a large number. The IEEE spec assigns meaning to various NaN values, as well as defining signalling and quiet NaNs. NewtonScript does not distinguish between NaN values. One NaN is as good as another.

In NewtonScript, real numbers are 8-byte binary objects of class 'real. In addition to the NewtonScript floating point literal syntax, you can use the compile time function MakeBinaryFromHex to construct real numbers, and you must use this style for custom NaN values. The most recent platform files for Newton 2.0 and Newton 2.1 provide constants for negative zero (kNegativeZero), positive and negative infinity (kInfinity, kNegativeInfinity), and a canonical NaN (kNaN).
MakeBinaryFromHex("4023840000000000", 'real) -> 9.7578125 // = 9+97/128
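This sketch shows both directions. The DefConst lines run at build time in NTK, because MakeBinaryFromHex is a compile-time function. kMyNaN and kTwo are hypothetical names. The NaN's fractional bits are arbitrary, since, as noted above, NewtonScript treats all NaNs alike. StrHexDump then shows the stored bits at run time.

```newtonscript
// Build time: reals constructed from raw IEEE bits.
DefConst('kMyNaN, MakeBinaryFromHex("7FF0000000000001", 'real));   // a custom NaN
DefConst('kTwo, MakeBinaryFromHex("4000000000000000", 'real));     // 2.0

// Run time: inspect the stored bits, including the platform constants.
StrHexDump(kTwo, 16);            // -> "4000000000000000"
StrHexDump(kNegativeZero, 16);   // -> "8000000000000000"
```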