{\rtf1\mac\deff2 {\fonttbl{\f0\fswiss Chicago;}{\f2\froman New York;}{\f3\fswiss Geneva;}{\f4\fmodern Monaco;}{\f13\fnil Zapf Dingbats;}{\f14\fnil Bookman;}{\f15\fnil N Helvetica Narrow;}{\f16\fnil Palatino;}{\f18\fnil Zapf Chancery;}{\f20\froman Times;} {\f21\fswiss Helvetica;}{\f22\fmodern Courier;}{\f23\ftech Symbol;}{\f33\fnil Avant Garde;}{\f34\fnil New Century Schlbk;}{\f55\fnil Code 3 of 9;}{\f256\fnil TaxType;}{\f257\fnil TaxTypeCondensed;}{\f258\fnil TaxType Mono;}{\f259\fnil TaxType Pi;} {\f1904\fnil AppleIcon;}{\f2029\fnil Nadianne;}{\f2052\fnil Zeal;}{\f12899\fnil AppleGaramond LtIt;}{\f12900\fnil AppleGaramond BkIt;}{\f12901\fnil AppleGaramond BdIt;}{\f12902\fnil AppleGaramond Lt;}{\f12903\fnil AppleGaramond Bk;} {\f12904\fnil AppleGaramond Bd;}{\f14003\fnil Newt Espy Plain;}{\f14004\fnil Newt Espy Bold;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255\b lue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\ red255\green255\blue0; \red255\green255\blue255;}{\stylesheet{\s243\qj\fi180\sa240\keep\tqc\tx43 20\tqr\tx8640 \f20 \sbasedon0\snext243 footer;}{\s254\qj\sb120\sa240\keep \b\f20 \sbasedon0\snext0 heading 2;}{\s255\qj\sb240\sa240\keep \f20\fs36 \sbasedon0\snext0 heading 1;}{ \qj\fi180\sa240\keep \f20 \sbasedon222\snext0 Normal;}{\s2\fi-360\li720\sa60\keep \f20 \sbasedon0\snext2 BulletList;}{\s3\li360\sa240\keep \f22 \sbasedon0\snext0 code;}{\s4\qc\fi180\sa240\keep \f20\fs48 \sbasedon0\snext4 Title;}{\s5\qc\fi180\sa240\keep \f20\fs28 \sbasedon0\snext5 Byline;}{\s6\qc\fi180\sa240\keep \f20 \sbasedon0\snext0 Caption;}{\s7\qj\sb240\sa240\keep \f22\fs36 \sbasedon255\snext7 CodeRef;}}\margl720\margr720\margt1080\margb720\deftab360\widowctrl\ftnbj \sectd \sbknone\linemod0\linex0\cols1\endnhere \pard\plain \s4\qc\fi180\sa240\keep \f20\fs48 Lost In Space\par \pard\plain \qc\fi180\sa240\keep \f20 {\fs36 DRAFT 2\par }\pard\plain \s5\qr\fi180\sa240\keep \f20\fs28 Michael S. Engber\line Apple Computer - Newton ToolBox Group\line Copyright \'a9 1994 - Michael S. Engber\par \pard\plain \fi180\sa240\keep \f20 This article was (will be) published in the May 1994 issue of PIE Developers magazine. For information about PIE Developers, contact Creative Digital Systems at CDS.SEM@APPLELINK.APPLE.COM or 415.621.4252.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 Introduction\par \pard\plain \qj\fi180\sa240\keep \f20 This article discusses different data structures for storing data in your package. The data structures are compared with respect to both space and speed. The purpose of this article is not to give the definitive best way to store your data. It is to show y ou some possibilities and give you a starting point for your own experiments. I assume the reader has some experience using NTK to write Newton applications.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 Data In Your Package\par \pard\plain \qj\fi180\sa240\keep \f20 Data your application gathers from the user must be stored in soups. That pretty much determines the data structure you need to use for dynamic data.\par \pard \qj\fi180\sa240\keep This article primarily concerns itself with data stored in packages. This means the data is read-only since packages are stored in protected memory. There are a variety of reasons you might store data in your package (as opposed to using soups). Some examples are listed below.\par \pard\plain \s2\fi-360\li720\sa60\keep \f20 \bullet \tab The data might be static. For example, a table of state names and their abbreviations.\par \pard \s2\fi-360\li720\sa60\keep \bullet \tab Although the user might enter data dynamically, there might be a large initial set of data you need to include in your package.\par \pard \s2\fi-360\li720\sa240\keep \bullet \tab Your application might have its data split up in separate package. For example, you might have a travel guide application that keeps the data for each state in its own package. This allows the user to load only the data of interest.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 The Raw Data\par \pard\plain \qj\fi180\sa240\keep \f20 The data that forms the basis for this article is defined in the file {\f22 "Data.f"}. It's defined as an array of 300 frames as shown below. For the purposes of our tests we use {\f22 slot2} as the key for indexing, although {\f22 slot1} could work as well.\par \pard\plain \s3\li360\sa240\keep \f22 gData := [\line \{slot1: 000, slot2: "foo 000", slot3: 42, slot4: true\},\line \{slot1: 001, slot2: "foo 001", slot3: 42, slot4: true\},\line \{slot1: 002, slot2: "foo 002", slot3: 42, slot4: true\},\line \'c9\line \{ slot1: 297, slot2: "foo 297", slot3: 242, slot4: true\},\line \{slot1: 298, slot2: "foo 298", slot3: 242, slot4: true\},\line \{slot1: 299, slot2: "foo 299", slot3: 242, slot4: true\},\line ];\par \pard\plain \qj\fi180\sa240\keep \f20 The projects discussed in this article load {\f22 "Data.f"} and then use the {\f22 gData} array to create the actual data structures of interest.\par \pard \qj\fi180\sa240\keep {\f22 "Data.f"} should be easily adapted to represent the data your application.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 The Code\par \pard\plain \qj\fi180\sa240\keep \f20 This article is based on code in the "LostInSpace" folder. You should obtain a copy of this code so you can run your own experiments. The rest of this section gives a quick overview of the code.\par \pard\plain \s254\qj\sb120\sa60\keep \b\f20 SpaceWars\par \pard\plain \qj\fi180\sa60\keep \f20 This project builds the same package using one of several data structures. The difference in package size between these builds is used to compare the data structures for size.\par \pard\plain \s254\qj\sb60\sa60\keep \b\f20 TimeWars\par \pard\plain \qj\fi180\sa60\keep \f20 This project builds a package containing all the data structures. The application has various buttons you can use to run the benchmarks. The results are output using Print() so you need to run them with the NTK Inspector connected.\par \pard\plain \s254\qj\sb60\sa60\keep \b\f20 StorePart\par \pard\plain \qj\fi180\sa60\keep \f20 This folder contains a package with a store part. When you load it, a new store is made available on Newton. This store contains a soup with the data set used in my experiments.\par \pard\plain \s254\qj\sb60\sa60\keep \b\f20 Data.f\par \pard\plain \qj\fi180\sa60\keep \f20 This file defines the basic data set used in our experiments.\par \pard\plain \s254\qj\sb60\sa60\keep \b\f20 DataUtils.f\par \pard\plain \qj\fi180\sa60\keep \f20 This file defines routines for creating various data structures.\par \pard\plain \s254\qj\sb60\sa60\keep \b\f20 SearchUtils.f\par \pard\plain \qj\fi180\sa240\keep \f20 This file defines routines for searching the various data structures using binary search.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 The Data Structures\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Array of Frames - Naive\par \pard\plain \qj\fi180\sa240\keep \f20 This data structure uses {\f22 gData} with no further processing. It is included for purposes of illustrating a pitfall of NewtonScript programming, not as a serious candidate for representing your data.\par \pard \qj\fi180\sa240\keep Associated with each frame is another data structure called a map. The map is used locate the value of a slot given the slot's name. Frames with common structures can share maps, saving space. However, the way {\f22 gData} is defined in {\f22 "Data.f"} does not use any map sharing. Even though the frames have identical structure, they each have their own map.\par \pard \qj\fi180\sa240\keep In general, each frame constructor in your code has its own map. It would be possible for the compiler to detect frames with identical maps and cause them to share maps. However, this optimization is not currently performed by NTK.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Array of Frames - Map Sharing\par \pard\plain \qj\fi180\sa240\keep \f20 This data structure uses the function MakeDataFrame to generate an array, just like {\f22 gData} , except that all the frames share a common map. The easiest way to ensure your frame share maps is to make sure they're all generated by the same frame constructor. Usually this is done by using a function to create all your frames. For example:\par \pard\plain \s3\li360\sa240\keep \f22 func MakeDataFrame(s1,s2,s3,s4) \{slot1: s1, slot2: s2, slot3: s3, slot4: s4\};\par \pard\plain \qj\fi180\sa240\keep \f20 You may recall the standard functions {\f22 SetBounds} and {\f22 RelBounds} which return bounds frames. You might have thought them pointless since it's almost as easy to write:\par \pard\plain \s3\li360\sa240\keep \f22 local bounds := \{left: 0, top: 0, right: 100, bottom: 100\};\par \pard\plain \qj\fi180\sa240\keep \f20 as to write:\par \pard\plain \s3\li360\sa240\keep \f22 local bounds := SetBounds(0,0,100,100);\par \pard\plain \qj\fi180\sa240\keep \f20 Now you know that using {\f22 SetBounds} creates bounds frames that map share \endash a good reason to use this function. You should write similar functions for your own types of frames. In fact the frames returned by the run-time version of {\f22 SetBounds} use a shared map that resides in ROM.\par \pard \qj\fi180\sa240\keep An alternative way to create frames with a shared map is to make them from a Clone of a common frame. Consider the code below.\par \pard\plain \s3\li360\sa240\keep \f22 constant kDummy := '\{s1: nil, s2: nil\};\line \'c9\line local frame1 := Clone(kDummy );\line frame1.s1 := 42;\line frame1.s2 := true;\line \line local frame2 := Clone(kDummy );\line frame2.s1 := 43;\par \pard\plain \qj\fi180\sa240\keep \f20 Both {\f22 frame1} and {\f22 frame2} will share a common map. I point out this second technique only for completness. Writing functions to build your frames is a better approach. It makes for better legible and maintainabiliy. In addition, using a frame construc tor to create your frame is generally faster than calling Clone followed by multiple assignment statments.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Array of Arrays\par \pard\plain \qj\fi180\sa240\keep \f20 This data structure uses an arrays rather than frames to represent the elements of {\f22 gData}. For example, the following element of {\f22 gData}: \par \pard\plain \s3\li360\sa240\keep \f22 \{slot1: 002, slot2: "foo 002", slot3: 42, slot4: true\}\par \pard\plain \qj\fi180\sa240\keep \f20 is represented as:\par \pard\plain \s3\li360\sa240\keep \f22 [002, "foo 002", 42, true]\par \pard\plain \qj\fi180\sa240\keep \f20 Using frames makes for a more legible data structure. We want to measure the price we pay for this legibility.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Frame of Frames\par \pard\plain \qj\fi180\sa240\keep \f20 This data structure replaces the {\f22 gData} array with a frame containing a slot{\f22 }for each element of {\f22 gData}. The {\f22 slot2} values are used as the slot names. For example, consider the following portion of {\f22 gData}.\par \pard\plain \s3\li360\sa240\keep \f22 [\line \'c9\line \{slot1: 297, slot2: "foo 297", slot3: 242, slot4: true\},\line \{slot1: 298, slot2: "foo 298", slot3: 242, slot4: true\},\line \'c9\line ]\par \pard\plain \qj\fi180\sa240\keep \f20 The corresponding part of the frame-of-frames data structure is:\par \pard\plain \s3\li360\sa240\keep \f22 \{\line \'c9\line |foo 297|: \{slot1: 297, slot3: 242, slot4: true\},\line |foo 298|: \{slot1: 298, slot3: 242, slot4: true\},\line \'c9\line \}\par \pard\plain \qj\fi180\sa240\keep \f20 Using a frame instead of as an array lets us utilize of NewtonScript's slot lookup operation to retrieve our data. To retrieve the data for the {\f22 "foo 297"} entry we can use code like:\par \pard\plain \s3\li360\sa240\keep \f22 frameOfFrames.(Intern("foo 297"))\par \pard\plain \qj\fi180\sa240\keep \f20 Notice that we use the Intern function to turn the string key value into a symbol. Slot names are symbols, not strings. It is important to note that characters in symbols (and therefore, slot names) are restricted to 7-bit ASCII values. This means if you w ish your slot names to contain special characters you will need to transform them to 7-bit ASCII in some way. It should not be difficult to come up with a suitable encoding scheme.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Binary Objects\par \pard\plain \qj\fi180\sa240\keep \f20 This data structure encodes the entire {\f22 gData} array as a single binary object. Each element of {\f22 gData} takes up 15 bytes of the object as follows:\par \pard \qc\fi180\sa240\keep {{\pict\macpict\picw357\pich95 055a009b003000fa0195001102ff0c00ffffffff00300000009b00000195000000fa00000 000000000a0008200a10064000a53504e5403e80001000000a10064000e53504e540cd000 9b003000fa019500a10064000a53504e540bb8000b000000a0008c00a10064000a53504e5 40bb8000400020001000a009b003000fa01 95003000d6003700e9016300a10064000a53504e540bb800020002002200d60088001200a 10064000a53504e540bb800020002002200d6014f001200a10064000a53504e540bb80002 0002002200d60127001200a10064000a53504e540bb8000c000000a0008d00a10064000a5 3504e540bb80001000000a10064001a5350 4e540c2600e6003000fa004400040002ffffffffffffffff00a10064000a53504e540c940 000000000a10096000c050000000200000000000000002c000a001607436f757269657200 030016000d000a002e000400000000002b35f3013000a0009700a10064000a53504e540bb 80001000000a10064001a53504e540c2600 e6008100fa009500040002ffffffffffffffff00a10064000a53504e540c940000000000a 10096000c05000000020000000000000000295101340000a0009700a10064000a53504e54 0bb80001000000a10064001a53504e540c2600e6011d00fa013300040002fffffffffffff fff00a10064000a53504e540c9400000000 00a10096000c05000000020000000000000000299c02313200a0009700a10064000a53504 e540bb80001000000a10064001a53504e540c2600e6014500fa015b00040002ffffffffff ffffff00a10064000a53504e540c940000000000a10096000c05000000020000000000000 000292802313400a0009700a10064000a53 504e540bb80001000000a10064001a53504e540c2600bc004700da007c00040002fffffff fffffffff00a10064000a53504e540c940000000000a10096000c06000000020000000000 0000002c000c00150948656c76657469636100030015000d0009002800c8005806736c6f7 4310d00002800d3004c0b34206279746520 6c6f6e6700a0009700a10064000a53504e540bb80001000000a10064001a53504e540c260 0bc00c300da00fe00040002ffffffffffffffff00a10064000a53504e540c940000000000 a10096000c060000000200000000000000002800c800d706736c6f74320d00002800d300c 80d38206279746520737472696e6700a000 9700a10064002253504e540cee00020010005600000026000000560000000000000000000 00000000000a10064000a53504e540bb80003000200a10064000e53504e540c9e00b7012e 00d6013b00a0008c002200b7012e08140071001e00c9013100d6013b00cb013600cd01310 0d6013b00c9013b00cb013600a0008d00a1 0064000a53504e540bb80003000200a10064000e53504e540c9e00b9016500d6015900a00 08c002200b90165f9120071001e00c9015900d6016300cb015e00c9015900d6015900cd01 6300cb015e00a0008d00a10064000853504e540cee000000a10064000a53504e540bb8000 1000000a10064001a53504e540c26009b01 0d00b9014500040002ffffffffffffffff00a10064000a53504e540c940000000000a1009 6000c060000000200000000000000002800a7011f06736c6f74330d00002800b201120b32 206279746520776f726400a0009700a10064000a53504e540bb80001000000a10064001a5 3504e540c26009b015000b9019500040002 ffffffffffffffff00a10064000a53504e540c940000000000a10096000c0600000002000 00000000000002800a7016906736c6f74340d00002800b201550e31206279746520626f6f 6c65616e0000a0009700a10064000653504e5403e900a0008300ff}}\par \pard\plain \s6\qc\fi180\sa240\keep \f20 Figure 1 - Layout of Binary Object Data Structure\par \pard\plain \qj\fi180\sa240\keep \f20 The binary object is accessed using the various NewtonScript StuffXXX and ExtractXXX functions.\par An alternative way to use binary data from NTK is to import resources using GetNamedResource. GetNamedRe source returns a binary object made from the specified resource. If you already have a Macintosh application (or ResEdit TMPL) that will generate your data as resources you can use the resources directly if that's more convenient.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Soups\par \pard\plain \qj\fi180\sa240\keep \f20 One of my experiments creates a soup containing an entry for each element of {\f22 gData} . Creating a soup in the user store does not qualify as method for including data in your package. However, it is interesting to see how soups compare to the other data structures.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Store Parts\par \pard\plain \qj\fi180\sa240\keep \f20 Store parts allow you to include a store as part of your package. The store part I made contains a soup with an entry for each element of {\f22 gData} . Soups in store parts do qualify as a method for including data in your package. However, NTK does not currently support the creation of store parts. You will not be able to perform this part of the experiment with your own data set. The store part was cr eated with experimental tools.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 Space Results\par \pard\plain \qj\fi180\sa240\keep \f20 The space results were obtained by repeatedly compiling the SpaceWar s project using different data structures. There is a constant defined in "Project Data" which can be adjusted to determine which data structure is used.\par \pard\plain \s3\li360\sa240\keep \f22 constant kDataStructure := 'ArrayOfFramesCommonMap;\par \pard\plain \qj\fi180\sa240\keep \f20 The size of the {\f22 "SpaceWars.pkg"} file on the Macintosh is the uncompressed size of the package. The compressed sized was determined by using the {\f22 UsedSize} message on the store before and after loading the package. The {\f22 UsedSize} message returns an approximate value. Don't be alarmed if you find discrepancies in compressed size in your own experiments \endash even across different downloads of the same package.\par The soup experiment does not involve data that is part of the package so there is no entry in the uncompressed column. If you build using one of the array-of-frames options and open the SpaceWars application in the extras drawer you'll see a "Make Soup" bu tton. Pressing this button creates a soup and adds an entry for each element of {\f22 gData}. The value in the table for the soup corresponds to the difference in the store size before and after the soup is created.\par The store part experiment simply uses a pre-built package containing a store part with a soup in. The values in the table correspond to the uncompressed size of this package and the difference in {\f22 UsedSized} before and after loading the package. \par \trowd \trgaph80\trleft2080 \clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrr\brdrhair \clshdng0\cellx5400\clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrdot \clbrdrr\brdrhair \clshdng0\cellx9000\pard \qj\sb40\sa40\keep\intbl \cell \pard \qc\sb40\sa40\keep\intbl Package Size (bytes)\cell \pard \intbl \row \trowd \trgaph80\trleft2080 \clbrdrl\brdrhair \clbrdrb\brdrdb \clbrdrr\brdrhair \clshdng0\cellx5400\clbrdrl\brdrhair \clbrdrb\brdrdb \clbrdrr\brdrhair \clshdng0\cellx7200\clbrdrl \brdrhair \clbrdrb\brdrdb \clbrdrr\brdrhair \clshdng0\cellx9000\pard \qc\sb40\sa40\keep\intbl Data Structure\cell uncompressed\cell compressed\cell \pard \intbl \row \trowd \trgaph80\trleft2080 \clbrdrt\brdrdb \clbrdrl\brdrs \clbrdrb\brdrhair \clbrdrr \brdrhair \clshdng0\cellx5400\clbrdrt\brdrdb \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx7200\clbrdrt\brdrdb \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrs \clshdng0\cellx9000\pard \qj\sb40\sa40\keep\intbl array of frames (naive)\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 35538\tab \cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 11884\tab \cell \pard \intbl \row \trowd \trgaph80\trleft2080 \clbrdrt\brdrhair \clbrdrl\brdrs \clbrdrb\brdrhair \clbrdrr \brdrhair \clshdng0\cellx5400\clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx7200\clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrs \clshdng0\cellx9000\pard \qj\sb40\sa40\keep\intbl array of frames (map sharing)\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 25970\tab \cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 8932\tab \cell \pard \intbl \row \pard \qj\sb40\sa40\keep\intbl array of arrays\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 25866\tab \cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 8860\tab \cell \pard \intbl \row \pard \qj\sb40\sa40\keep\intbl frame of frames\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 22386\tab \cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 9176\tab \cell \pard \intbl \row \pard \qj\sb40\sa40\keep\intbl binary object\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 9962\tab \cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 5328\tab \cell \pard \intbl \row \pard \qj\sb40\sa40\keep\intbl soup\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl N/A\cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 37260\tab \cell \pard \intbl \row \trowd \trgaph80\trleft2080 \clbrdrt\brdrhair \clbrdrl\brdrs \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx5400\clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx7200\clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrs \clshdng0\cellx9000\pard \qj\sb40\sa40\keep\intbl store part\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 35012\tab \cell \pard \qc\sb40\sa40\keep\tqdec\tx1080\intbl 14900\tab \cell \pard \intbl \row \pard\plain \s6\qc\fi180\sa240\keep \f20 Table 1 - Results of Space Experiment\par \pard\plain \qj\fi180\sa240\keep \f20 We can see that binary-object wins for the smallest size. Some of you might have noticed that we represent our string data in the binary object as plain ASCII rather than unicode. Even so, this can only account for about 2.5K (= 300 entries x 8 bytes/entry) \endash and even less when compressed. There is no question that encoding your data into a binary object is the most compact representation.\par The results also show that map sharing makes a substantial difference in data size. This difference becomes even more pronounced as the number of slots in your frames increases.\par Another observation worth making is that using arrays instead of frames to represent the elements of {\f22 gData} did not yield much of a space savings \endash as long as we took advantage of map sharing.\par You might also have noticed that the soup in the store part compresses better than the regular soup. This is because store parts are read-only and are co mpressed just like any other part. Regular soups, on the other hand, are compressed on an entry-by-entry basis using a different compression scheme.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 Time Results\par \pard\plain \qj\fi180\sa240\keep \f20 The benchmark used for timing data access was retrieving each of the elements of the data set using the {\f22 slot2} value as the key.\par Below is the basic benchmark code used for all the non-soup data structures (minus some error checking code).\par \pard\plain \s3\li360\sa240\keep \f22 {\fs20 func()\line begin\line local iTicks := Ticks();\line local key;\line foreach key in kBenchmarkKeys do\line begin\line <*** data structure dependent line ***>\line //error checking code omitted\line end;\line iTicks := Ticks() - iTicks;\line Print("ArrayOfFramesBenchmark:" && iTicks);\line end}\par \pard\plain \qj\fi180\sa240\keep \f20 The magic lines of code ({\f22\fs20 <*** data structure dependent line ***>}) for each of the data structures are shown below.\par \pard\plain \s2\fi-360\li720\sa60\keep \f20 {\f22\fs20 local result := call kBSearchArrayOfFramesFn with (kArrayOfFramesData,key);\par local result := call kBSearchArrayOfArraysFn with (kArrayOfArraysData,key);\par local result := kFrameOfFramesData.(Intern(key));\par }\pard \s2\fi-360\li720\sa240\keep {\f22\fs20 local result := call kBSearchBinaryObjFn with (kBinObjData,key);\par }\pard\plain \qj\fi180\sa240\keep \f20 The data is kept sorted by {\f22 slot2} allowing binary search to be used for the array-of-frames, array-of-arrays, and binary-object data structures. The code for the frame-of-frames binary search is shown below. It is a standard binary search algorithm.\par \pard\plain \s3\li360\sa240\keep \f22 {\fs20 func BSearchArrayOfFrames(array,key)\line begin\line local lPos := 0;\line local rPos := Length(array) - 1;\line local mPos;\line \line while lPos <= rPos do\line begin\line mPos := (lPos + rPos) DIV 2;\line mVal := array[mPos].slot2;\line if mVal > key then\line rPos := mPos - 1;\line else\line lPos := mPos + 1;\line end;\line \line if rPos>=0 AND StrEqual(array[rPos].slot2,key) then rPos;\line end;\par }\pard\plain \qj\fi180\sa240\keep \f20 The code for searching the array-of-array is very similar.\par \pard\plain \s3\li360\sa240\keep \f22 {\fs20 func BSearchArrayOfArrays(array,key)\line begin\line local lPos := 0;\line local rPos := Length(array) - 1;\line local mPos;\line \line while lPos <= rPos do\line begin\line mPos := (lPos + rPos) DIV 2;\line mVal := array[mPos][1];\line if mVal > key then\line rPos := mPos - 1;\line else\line lPos := mPos + 1;\line end;\line \line if rPos>=0 AND StrEqual(array[rPos][1],key) then rPos;\line end;\par }\pard\plain \qj\fi180\sa240\keep \f20 The code for searching the binary-object is shown below.\par \pard\plain \s3\li360\sa240\keep \f22 {\fs20 func BSearchBinaryObj(binObj,key)\line begin\line local lPos := 0;\line local rPos := (Length(binObj) DIV kDatumSize) - 1;\line local mPos;\line \line while lPos <= rPos do\line begin\line mPos := (lPos + rPos) DIV 2;\line mVal := ExtractCString(binObj,mPos*kDatumSize + kSlot2Offset);\line if mVal > key then\line rPos := mPos - 1;\line else\line lPos := mPos + 1;\line end;\line \line if rPos>=0 AND StrEqual(ExtractCString(binObj,rPos*kDatumSize + kSlot2Offset),key) then rPos*kDatumSize;\line end;\par }\pard\plain \qj\fi180\sa240\keep \f20 The frame-of-frames uses NewtonScript's built-in slot lookup to retrieve values rather than an explicit binary search. The current implementation of NewtonScript looks up slots in large frames using binary search on slot-name hash values.\par The soup benchmark code (minus some error checking) is shown below.\par \pard\plain \s3\li360\sa240\keep \f22 {\fs20 func(soup)\line begin\line local cursor := Query(soup,'\{type: index, indexPath: slot2\});\line local iTicks := Ticks();\line local key;\line foreach key in kBenchmarkKeys do\line begin \line local result := cursor:GotoKey(key).slot2;\line //error checking code omitted\line end;\line iTicks := Ticks() - iTicks;\line Print("SoupBenchmark:" && iTicks);\line end\line );}\par \pard\plain \qj\fi180\sa240\keep \f20 Table 2 summarizes the results from the benchmarks.\par \trowd \trgaph80\trleft3700 \clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrdb \clbrdrr\brdrhair \clshdng0\cellx5580\clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrdb \clbrdrr\brdrhair \clshdng0\cellx7380\pard \qc\sb40\sa40\keep\intbl Data Structure \cell Benchmark Time (ticks)\cell \pard \intbl \row \trowd \trgaph80\trleft3700 \clbrdrt\brdrdb \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx5580\clbrdrt\brdrdb \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0 \cellx7380\pard \sb40\sa40\keep\intbl array of frames\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 755\tab \cell \pard \intbl \row \trowd \trgaph80\trleft3700 \clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx5580 \clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx7380\pard \sb40\sa40\keep\intbl array of arrays\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 725\tab \cell \pard \intbl \row \pard \sb40\sa40\keep\intbl frame of frames\line 1st trial\line subsequent trials\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl \line 97\tab \line 60\tab \cell \pard \intbl \row \pard \sb40\sa40\keep\intbl binary object\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 1001\tab \cell \pard \intbl \row \pard \sb40\sa40\keep\intbl soup\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 420\tab \cell \pard \intbl \row \trowd \trgaph80\trleft3700 \clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx5580 \clbrdrt\brdrhair \clbrdrl\brdrhair \clbrdrb\brdrhair \clbrdrr\brdrhair \clshdng0\cellx7380\pard \sb40\sa40\keep\intbl store part\cell \pard \sb40\sa40\keep\tqdec\tx1080\intbl 343\tab \cell \pard \intbl \row \pard\plain \s6\qc\fi180\sa240\keep \f20 Tab le 2 - Results of Time Experiment\par \pard\plain \qj\fi180\sa240\keep \f20 We can see that frame-of-frames wins for the fastest lookup. Repeated lookups in the frame-of-frames get faster because Newton caches frame maps and because the the slot name symbols have already been created. {\f22 Intern} first checks for an existing symbol to return before creating a new one.\par Compression did not make any appreciable difference in the timing results. This is probably because the packages are small enough that they can fit entirely in working memory (w here they're paged-in and uncompressed). This is a big hint for those of you running your own experiments. Run your tests on a representative quantity of data \endash not just representative values.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 Discussion of Results\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Array of Frames\par \pard\plain \qj\fi180\sa240\keep \f20 Clearly, you should be sure to take advantage of map sharing. The difference it can make in your package size can be substantial.\par An array-of-frames is a basic general purpose data structure. If speed of access is a concern you should consider the frame-of-frames as an alternative.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Array of Arrays\par \pard\plain \qj\fi180\sa240\keep \f20 The array-of-arrays has no real advantage over the array-of-frames. It has the disadvantage of being a less legible data structure. Unless your data is more naturally represented as an array, I don't see any reason to prefer an array-of-array over an array -of-frames.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Frame of Frames\par \pard\plain \qj\fi180\sa240\keep \f20 A frame-of-frames is a significantly faster alternative to an array-of-frames. The only disadvantage is that if your key values contain special characters you will have to deal with the complexity of encodin g them in 7-bit ASCII \endash not too big a disadvantage.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Binary Objects\par \pard\plain \qj\fi180\sa240\keep \f20 If space is of paramount importance, then you should definitely consider using binary-objects. They are definitely the most complex and least legible data structure we considered in this article. Although, writing a library of setter and getter methods sho uld help with this problem.\par Extracting your data from binary-objects entails overhead, especially in the case of strings. This is reflected in the slower lookup times. Remember that extract ing strings from a binary-object requires allocating a new object in the NewtonScript heap. If you encode arrays or frames within your binary objects, these too will require creating objects in the NewtonScript heap when they're extracted. Contrast this wi th the other data structures we've discussed. They allow you to reference their sub-parts directly (but, remember, they're still read-only).\par Note that the in ROM versions of some of the ExtractXXX and StuffXXX functions have problems. Especially, StuffPString, which will trash the NewtonScript heap. See the PIE DTS Q&A documents for more information.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Soups\par \pard\plain \qj\fi180\sa240\keep \f20 Regular soups are not an option for including data in your package. If you were considering providing data via soups (by writing code to build the soups) you can see that they're not the most space-efficient data structure to use.\par On the other hand, soups provide great flexibility. They can be modified. They can have multiple indexes. Indexes can be added and removed dynamically. Data access is faster than any of the other data structures except the frame-of-frames.\par \pard\plain \s254\qj\sb120\sa240\keep \b\f20 Store Parts\par \pard\plain \qj\fi180\sa240\keep \f20 Pretty much the same things can be said about store parts as for regular soups. Their primary advantage over soups is they offer better compression, but they're still not as space efficient as the other data structures we considered.\par However, until NTK supports the creation of store parts they're not an option.\par \pard\plain \s255\qj\sb240\sa240\keep \f20\fs36 Conclusions\par \pard\plain \qj\fi180\sa240\keep \f20 As you can see, there are a wide variety of trade-offs you can make when deciding how to represent your data. I'm sure the speed-freaks among you already have your sites set on a frames-of-frames and the bit-heads are already laying out binary-objects. However, I advise doing realistic experiments with the actual data data before finally choosing a data structure. I also advise designing your project so it's easy to switch among different data structures at any stage of development.\par }