
Basic idea is to use e4 as a workspace(s) integration bus.  The goal
is to break up monolithic applications into components which perform
single tasks well.  This has many benefits:
- choice of best application for task
- ability to use different application components on different
- ability to split workspace over multiple machines
- ability to have common services for groups of users

Fundamental Concepts

The system is built on some fundamental concepts:

    objects are typed.  the most useful typing scheme, at this stage,
    seems to be the existing MIME type framework, as reused by Xdnd,
    the Windows registry, etc.

    objects are useless without actions.  there exist many actions in
    the normal desktop metaphor-style environment.  edesk uses a
    common, standardised set of actions to complete its types.

    a user is a person.  users have unique identities within the edesk
    system.

    a device is a computer.  a device MUST be networked, at least some
    of the time, in order to participate in an edesk system.  in
    addition to networking, it may possess one or more facilities for
    input from user, output to user, storage, computation, or special
    purpose I/O (e.g. a Zip drive).

    some devices may be single or limited purpose, for example,
    consider an audio speaker capable of receiving an MP3 stream and
    "rendering" it into sound, or an X terminal.  others will be more
    general-purpose, like a desktop PC.

    a workspace is a physical place within which a single user is
    working.  users may serially use many workspaces, but at any point
    in time, may be present only in one.

    a workspace is composed of devices, and the set of devices
    supporting a particular workspace may change.

    mobile devices may individually support a workspace (when the user
    is on the bus), and then later form part of a different workspace
    (when the user is at her desk).

    a workspace may continue to operate on behalf of a user when the
    user is not present.

    some devices may support multiple workspaces simultaneously
    (consider a large screen in a lunch room).

    user interactions may use different forms: graphical, textual,
    auditory, hardcopy, keyboard, gestural, etc.  the selection of a
    particular combination of modalities to support an action on a
    typed object can be made dynamically.

    you might choose a text-based browser to display some pages, and a
    graphical browser for others.  you might choose to have your email
    read to you sometimes, and presented as text at other times.

    some factors influencing this decision are obviously the devices
    available within the current workspace, and the modality of the
    triggering component.  but other factors could also be used to
    select an appropriate mechanism for a particular
Using these components, the applications forming the familiar desktop
computing environment can be restructured into a distributed
collection of collaborating components, able to span multiple devices
and to more-or-less seamlessly follow the user as she moves between
workspaces.
Instead of (necessarily) building monolithic applications, each
combination of type and action may be supported by a single
component.  The selection of this component is the result of a runtime
negotiation between contenders, decided on the basis of type, action,
workspace, current modality, explicit user direction and possibly
other factors.
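As a rough illustration of that selection step, the sketch below filters a pool of contenders by type, action, workspace and mode, lets an explicit user choice win, and otherwise falls back to a preference score. The `Contender` record, its `preference` tie-breaker and the `select` function are hypothetical; the notes leave the actual negotiation to the events described under Event Formats.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contender:
    """A component offering to handle some type/action pairs (hypothetical model)."""
    name: str
    types: set
    actions: set
    modes: set
    workspaces: set = field(default_factory=set)  # empty set means "any workspace"
    preference: int = 0  # hypothetical tie-breaker: higher wins


def select(contenders, mime_type, action, workspace, mode,
           user_choice: Optional[str] = None) -> Optional[Contender]:
    """Pick one component for (type, action) in the given workspace and mode."""
    eligible = [
        c for c in contenders
        if mime_type in c.types
        and action in c.actions
        and mode in c.modes
        and (not c.workspaces or workspace in c.workspaces)
    ]
    # Explicit user direction overrides every other factor.
    for c in eligible:
        if c.name == user_choice:
            return c
    # Otherwise take the most preferred eligible contender, if any.
    return max(eligible, key=lambda c: c.preference, default=None)
```

Filtering on mode is what lets the same "view" of `text/html` go to a text browser in one workspace and a graphical one in another.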

This concept is not entirely novel.  The basis for GNOME (the GNU
Network Object Model Environment) is similar, using CORBA as the
communicating framework.  However, they have focussed on the
development of Microsoft-competitive applications, rather than a model
of a networked object environment.

Similarly, the ToolTalk event bus is part of The Open Group's CDE
(Common Desktop Environment) and provides a means for applications to
communicate.  It was never widely adopted by application writers.


The general "action" event drives the workspace.  It has three
fundamental parameters:

Action
  This is the type of action being requested.  Relatively few
  distinct actions are needed to run a normal desktop.  The current
  list is:
    Every type can have a default action, much like the default click
    in a traditional GUI environment.  This will normally display
    text, fetch an URL, compile C code, etc.

    Create a new instance of the specified type.  This is like the
    MIME "compose" action.

    Like create, but using another instance of the same type as the
    initiator of the creation.  This is useful in mail and tickertape
    (so far, any others?).

    how does this compare to the (common) operation of cloning an
    existing file to create a new one?

    Modify the specified instance of the specified type.

    Display an object, in its current form.  The use of different
    modalities is the key to this action: it can mean show on screen,
    read via speech synthesis, print on paper, etc.

    Change the supplied object (and its dependents?) into an object of
    the specified target format?  Could be useful for HTML,
    PostScript, C code, graphics files, etc.

    Useful for objects that represent encoded instructions in some
    form: executable binaries, shell, Perl or Python scripts.  Even
    things like HTML source can be "interpret"ed.

    For references (only?)

    ??? FTP?

Type
  This is the type of the object upon which you wish to perform the
  action.  It is represented using MIME types.

Object
  This is the object upon which the action is to be performed.

  Note that it might be necessary to distinguish between objects
  contained within the event, and objects located elsewhere, but
  referred to by the event.  Many tools will need to be able to work
  indirectly on objects, especially those on a filesystem.

  On the other hand, the tools might not always share a filesystem.
  Use of global handles, such as an HTTP URL, would allow an
  indirection via a fetcher.  Perhaps as a `Location' attribute?
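A minimal sketch of that indirection, assuming events are plain name/value maps (as Elvin notifications are) and that a fetcher can be chosen by URL scheme. The `fetchers` table, the `resolve_object` name, and the rule that an inline `Object` takes precedence over `Location` are all assumptions, not part of the notes.

```python
from urllib.parse import urlparse


def resolve_object(event, fetchers):
    """Return an event's object bytes, fetching via Location if not inline.

    `fetchers` maps URL schemes to callables taking a URL and returning
    bytes; each stands in for a fetcher component on the bus (hypothetical).
    """
    if "Object" in event:
        return event["Object"]  # assumption: inline data wins over Location
    location = event.get("Location")
    if location is None:
        raise ValueError("event carries neither Object nor Location")
    scheme = urlparse(location).scheme
    try:
        fetch = fetchers[scheme]
    except KeyError:
        raise LookupError(f"no fetcher for scheme {scheme!r}") from None
    return fetch(location)
```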



there is a general issue with Elvin, nicely illustrated by an HTTP
caching service:

when something emits a request to fetch an URL, both the fetcher and
the cache see the event.  both could start to respond, but that is
inefficient.  the question is: how do we maintain an ecology of
interacting apps with a need to prioritise responses?

the example is nicely illustrative: what if there is no cache
available?  what if the cache is there, but doesn't have the requested
item?  what if the cache wasn't there initially, but starts later (or
vice versa)?

where the example falls down is that there is only a single-level choice
(cache or fetcher).  it's possible that there are many layered
options, and that the layering could be configured either by the user
or by service provisioning.

an alternative example would be a set of bots listening to a
tickertape channel.  deciding which should respond to a query is a
similar problem.
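One possible answer, sketched here under the assumption that responders can be ordered into layers, is to offer each request to one layer at a time from the top down, letting a layer decline (for instance on a cache miss) so the request falls through to the next. This serialises the choice instead of letting the cache and fetcher both start work, and matches the top-down stacking that the Level and Affinity fields in the event formats describe. The `dispatch` function and the decline-by-returning-`None` convention are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A responder returns a result, or None to decline (e.g. on a cache miss).
Responder = Callable[[str], Optional[bytes]]


def dispatch(url: str, layers: List[Tuple[str, Responder]]):
    """Offer a request to layers from the top (index 0) down.

    Returns (layer_name, result) for the first layer that accepts,
    or (None, None) if every layer declines.
    """
    for name, responder in layers:
        result = responder(url)
        if result is not None:
            return name, result
    return None, None
```

A missing cache is just an absent layer, and a cache that starts later can be inserted into the list; the sketch does not settle how the layer order itself is agreed.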


Event Formats

Taking the fundamental concepts and interaction patterns and wrapping
them into event formats, we get:

When a tool is started, it needs to query the workspace to see whether
and how it is needed.  This should be done for each type that the tool
supports.
  edesk.elvin.org : 1000
            Event : "query"
             User : "user@example.com"
        Workspace : "office"
             Type : "text/plain"
          Actions : "|view|"
            Modes : "|gui|"
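The notification above can be modelled as a plain name/value map. The sketch below builds one query per supported type; it assumes that multi-valued fields such as `Actions` and `Modes` are bar-delimited strings with a leading and trailing bar, which is how the example reads. The bars plausibly let a subscriber test for `"|view|"` as a substring without matching a longer action name such as `preview`, though the notes do not say so. All function names are hypothetical.

```python
def encode_list(values):
    """Encode values as a bar-delimited string: ["view", "edit"] -> "|view|edit|"."""
    return "|" + "|".join(values) + "|"


def decode_list(text):
    """Inverse of encode_list: "|view|edit|" -> ["view", "edit"]."""
    return [v for v in text.strip("|").split("|") if v]


def make_queries(user, workspace, types, actions, modes):
    """Build one "query" notification per MIME type the tool supports."""
    return [
        {
            "edesk.elvin.org": 1000,
            "Event": "query",
            "User": user,
            "Workspace": workspace,
            "Type": mime_type,
            "Actions": encode_list(actions),
            "Modes": encode_list(modes),
        }
        for mime_type in types
    ]
```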

Tools respond to a query with a proposal of their role in the
workspace for that type.  Each action and modality requires a separate
proposal.
The "Level" is the currently assigned level for that tool, if any,
while "Affinity" is the tool's preference: "top", "bottom" or "middle"
with an optional "+" suffix.  This is used to automatically layer
tools in a stack from user (top) down.  The trailing "+" indicates a
willingness to stack away from the specified position by a few levels
if required.
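The layering rule could be implemented along the lines below. This is a sketch under several assumptions the notes leave open: level 1 is the top of the stack (consistent with the "top" proposal carrying Level 1), "middle" means the centre level, "a few levels" is taken as `SLACK = 2`, and tools without "+" are placed first so flexible tools move out of their way. When two inflexible tools want the same level, the function returns `None`; that is where another round, or the arbitrator, would come in.

```python
SLACK = 2  # assumption: how far a "+" tool may move from its preferred level


def preferred_level(affinity, n):
    """Map an Affinity value to a level in 1..n, with 1 at the top (nearest the user)."""
    base = affinity.rstrip("+")
    return {"top": 1, "middle": (n + 1) // 2, "bottom": n}[base]


def layer(tools):
    """Assign distinct levels to tools given {name: affinity}.

    Returns {name: level}, or None if the affinities conflict.
    """
    n = len(tools)
    levels, taken = {}, set()
    # Inflexible tools first (False sorts before True); ties keep input order.
    for name in sorted(tools, key=lambda t: tools[t].endswith("+")):
        affinity = tools[name]
        want = preferred_level(affinity, n)
        slack = SLACK if affinity.endswith("+") else 0
        # Free levels within reach, nearest to the preferred level first.
        free = sorted(
            (lvl for lvl in range(1, n + 1)
             if abs(lvl - want) <= slack and lvl not in taken),
            key=lambda lvl: (abs(lvl - want), lvl),
        )
        if not free:
            return None  # conflict: needs another round or arbitration
        levels[name] = free[0]
        taken.add(free[0])
    return levels
```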

The "Round" field is used to record the number of attempts to propose
a layering.  If the number of rounds exceeds a maximum value, the
negotiation fails.  If a central arbitrator is present, it might
monitor the negotiation, and require human input at a specified point

  edesk.elvin.org : 1000
            Event : "propose"
             User : "user@example.com"
        Workspace : "office"
             Type : "text/plain"
           Action : "view"
             Mode : "gui"
            Level : 1
         Affinity : "top"
            Round : 1
             Name : "less/xterm/edeskd"
              Cid : "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"
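The Round rule might be applied as follows: a tool whose proposal conflicted copies its previous proposal, changes the Level, and increments Round, giving up once a limit is reached. `MAX_ROUNDS`, `NegotiationFailed` and `repropose` are hypothetical names; the notes only say there is "a maximum value".

```python
MAX_ROUNDS = 5  # hypothetical limit on negotiation rounds


class NegotiationFailed(Exception):
    """Raised when layering has not settled within MAX_ROUNDS."""


def repropose(previous, new_level):
    """Build the next "propose" notification after a conflicting round."""
    if previous["Round"] >= MAX_ROUNDS:
        raise NegotiationFailed(
            f"{previous['Name']}: no agreement after {previous['Round']} rounds")
    proposal = dict(previous)  # keeps User, Workspace, Type, Action, Mode, Cid
    proposal["Level"] = new_level
    proposal["Round"] = previous["Round"] + 1
    return proposal
```

An arbitrator such as edeskd could watch for a tool giving up and ask the user instead.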

For each workspace, an edeskd may act as an arbitrator for tools.
When a conflict arises, edeskd might ask the user to resolve it, and
then assign tools to a particular layering.

  edesk.elvin.org : 1000
            Event : "assign"
             User : "user@example.com"
        Workspace : "office"
             Type : "text/plain"
           Action : "view"
             Mode : "gui"
            Level : 3
             Name : "less/xterm/edeskd"
              Cid : "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"

The basic request for an action on an object.  The object may be
present inline in the "Object" field, or indirectly via the "Location"
field.
  edesk.elvin.org : 1000
            Event : "request"
             User : "user@example.com"
        Workspace : "office"
              Xid : "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"
           Action : "do"
             Type : "text/uri-list"
             Mode : "gui"
            Level : 1
           Object : [68 74 74 70 3a 2f 2f 63 6e 6e 2e 63 6f 6d]
         Location : "webnfs://fileserver/home/user/file"
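A sketch of building such a request, again treating the notification as a name/value map. Two details are assumptions: the Xid is generated as 40 random hex digits simply to match the shape of the example, and Object is carried as raw bytes. `make_request` is a hypothetical name.

```python
import secrets


def make_request(user, workspace, action, mime_type, mode, level,
                 obj=None, location=None):
    """Build a "request" notification with an inline object and/or a Location."""
    if obj is None and location is None:
        raise ValueError("need an inline object or a Location")
    request = {
        "edesk.elvin.org": 1000,
        "Event": "request",
        "User": user,
        "Workspace": workspace,
        # Assumption: any unique 40-hex-digit id will do, like the example's.
        "Xid": secrets.token_hex(20),
        "Action": action,
        "Type": mime_type,
        "Mode": mode,
        "Level": level,
    }
    if obj is not None:
        request["Object"] = obj  # opaque bytes
    if location is not None:
        request["Location"] = location
    return request
```

The inline object in the example is just the ASCII text of a URL: `" ".join(f"{b:02x}" for b in b"http://cnn.com")` gives `68 74 74 70 3a 2f 2f 63 6e 6e 2e 63 6f 6d`.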

When a tool chooses to respond to a request, it must inform the
workspace that it is doing so.  At minimum, it should send one
notification, indicating that the request is complete.  If the request
is time-consuming, it must respond immediately to claim the request,
may optionally send partial progress notifications while it works, and
must finally send a completion notification.

At this stage, result codes are taken directly from the HTTP
protocol.  This will be clarified.

  edesk.elvin.org : 1000
            Event : "progress"
             User : "user@example.com"
        Workspace : "office"
              Xid : "f1d2d2f924e986ac86fdf7b36c94bcdf32beec15"
 Percent-Complete : 35
      Result-Code : 0
      Result-Text : "Fetching ...  35%"
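The claim, progress and completion sequence might look like the generator below. It assumes every notification uses the "progress" event, that a claim is simply a progress event at 0%, and that completion carries HTTP's 200 as its Result-Code while intermediate events use 0 as in the example; the notes say the result codes are still to be clarified. `progress_events` is a hypothetical name.

```python
def progress_events(user, workspace, xid, steps):
    """Yield the notifications a tool might emit while servicing one request."""
    def event(percent, code, text):
        return {
            "edesk.elvin.org": 1000,
            "Event": "progress",
            "User": user,
            "Workspace": workspace,
            "Xid": xid,
            "Percent-Complete": percent,
            "Result-Code": code,
            "Result-Text": text,
        }

    yield event(0, 0, "Claimed")  # immediate claim (assumed to be 0%)
    for i in range(1, steps):  # optional partial progress
        percent = i * 100 // steps
        yield event(percent, 0, f"Fetching ...  {percent}%")
    yield event(100, 200, "OK")  # completion; 200 is an assumption
```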

Some Example Apps

This section explains the architecture (and re-architecture) of some
example applications using the edesk model.

Web Browser

A web browser often consists of multiple components:
- HTML viewer
- HTTP client (URL fetcher, with cache?)
- bookmark manager
- hypertext history viewer
- cookie jar

The plan is to split the browser into these components, connected by
Elvin.  The viewer must accept commands to display a file of HTML, and
generate events when a hyperlink is clicked.

The HTTP fetcher needs to listen for requests to fetch URLs, and
notify their arrival.  Some feedback on progress is also important.
Download progress need not be shown by the "viewer" app; e.g. there
could be a separate progress-bar app which consumes progress events.
This would have the benefit that individual apps would not each have
to implement their own.

The bookmark editor should accept notifications of URLs, and should
emit requests to display an URL.
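A bookmark component in this style needs very little code. The sketch below is hypothetical: it records URLs it is told about and, when one is chosen, builds a "do" request on a `text/uri-list` object modelled on the request example in Event Formats, leaving display to whichever viewer wins the negotiation. Xid, Mode and Level are omitted for brevity.

```python
class BookmarkEditor:
    """Hypothetical bookmark component for the split-up browser."""

    def __init__(self, user, workspace):
        self.user = user
        self.workspace = workspace
        self.urls = []  # bookmarks in the order they were added

    def on_url_notification(self, url):
        """Record a URL announced on the bus, ignoring duplicates."""
        if url not in self.urls:
            self.urls.append(url)

    def open(self, index):
        """Build a request asking some other tool to display bookmark `index`."""
        return {
            "edesk.elvin.org": 1000,
            "Event": "request",
            "User": self.user,
            "Workspace": self.workspace,
            "Action": "do",
            "Type": "text/uri-list",
            "Object": self.urls[index].encode("utf-8"),
        }
```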

The history viewer is similar to the bookmark editor.

The cookie jar will need to respond to requests from the fetcher so
that the latter can provide the cookie data to the remote HTTPD.  This
assumes that the fetcher is the only component with an outgoing


Tickertape could consist of a 

  the scroller should scroll pixmaps according to various config
  parameters, and report mouse/keyboard events relative to those

  this would require events to: add/replace/remove/list/get pixmaps,
  set/get scrolling config, report UI events, and some mgmt events
  (startup, shutdown, etc).
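The scroller's event handling could be organised as a single dispatch on an operation field, as sketched here. The `Op`, `Id`, `Pixmap` and `Config` field names and the `speed` parameter are invented for illustration, and pixmaps are treated as opaque bytes; UI reporting and management events are left out.

```python
class Scroller:
    """Hypothetical scroller state, driven entirely by bus events."""

    def __init__(self):
        self.pixmaps = {}            # pixmap id -> opaque image data
        self.config = {"speed": 1}  # hypothetical scrolling parameter

    def handle(self, event):
        """Apply one event; return a value for "list", "get" and "get-config"."""
        op, pid = event["Op"], event.get("Id")
        if op == "add":
            if pid in self.pixmaps:
                raise KeyError(f"pixmap {pid!r} already exists")
            self.pixmaps[pid] = event["Pixmap"]
        elif op == "replace":
            if pid not in self.pixmaps:
                raise KeyError(f"no pixmap {pid!r} to replace")
            self.pixmaps[pid] = event["Pixmap"]
        elif op == "remove":
            self.pixmaps.pop(pid, None)
        elif op == "list":
            return sorted(self.pixmaps)
        elif op == "get":
            return self.pixmaps.get(pid)
        elif op == "set-config":
            self.config.update(event["Config"])
        elif op == "get-config":
            return dict(self.config)
        else:
            raise ValueError(f"unknown operation {op!r}")
        return None
```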

history window
  much like the scroller, but showing the threaded history.

chat message composer
  compose Tickertape-format chat event.  much like the current chat

subscription editor
  a subscription editor would provide a means to construct
  subscription expressions, perhaps with some graphical help?  or a
  query by example?  or ... a text window ;-)

  it would talk to the message renderer.

message renderer
  would subscribe using the subscription expressions from the editor,
  and generate pixmap events for the scroller/history.

Progress Meter

This is a general-purpose application.  It consists of a number of
"progress bar" widgets, each of which can be identified by name,
colour, etc, based on the owning event stream.  Once a task is
completed, some completion action could also be performed.
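A minimal sketch of such a meter, assuming it consumes the "progress" notifications described under Event Formats, keys each bar by the request's Xid, and treats 100% as completion. The text rendering, the `width` parameter and the `on_complete` callback are illustrative choices, not part of the notes.

```python
class ProgressMeter:
    """Hypothetical general-purpose meter: one bar per task, keyed by Xid."""

    def __init__(self, on_complete=None, width=20):
        self.bars = {}  # xid -> (label, percent)
        self.on_complete = on_complete
        self.width = width

    def consume(self, event):
        """Update bars from a notification; ignore anything but progress events."""
        if event.get("Event") != "progress":
            return
        xid = event["Xid"]
        percent = max(0, min(100, event.get("Percent-Complete", 0)))
        if percent == 100:
            self.bars.pop(xid, None)
            if self.on_complete is not None:
                self.on_complete(xid)  # the "completion action"
        else:
            self.bars[xid] = (event.get("Result-Text", xid), percent)

    def render(self):
        """Return one text bar per unfinished task."""
        lines = []
        for label, percent in self.bars.values():
            filled = percent * self.width // 100
            lines.append(f"[{'#' * filled}{'.' * (self.width - filled)}] {label}")
        return lines
```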

Imagine having all your downloads, cvs updates, compilations, expense
requests, SATS, general workflows being monitored in a single place,
so you don't have to return to check on their progress.

It could even handle event streams that don't report percent-complete,
but simply record ongoing actions, like software releases, web page
changes, etc.

In some ways this is a generalisation of buildmon, with bits of
tickertape also.


Nesting of workspaces
  it'd be nice to have a single URL fetcher or HTTP cache, for
  example, that worked for an entire site.  how do we scope the
  requests so that this can work?

  should such user-agnostic tools ignore the Workspace parameter when
  looking for requests?  probably?
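If the answer is yes, the difference between a per-workspace tool and a site-wide one could be as small as which fields it filters on. In this hypothetical sketch a tool passes `None` for any field it does not care about, so a shared HTTP cache would ignore both User and Workspace. It does not address how requests from many workspaces would then be prioritised.

```python
def wants(event, user=None, workspace=None):
    """Decide whether a tool should consider an event (hypothetical filter).

    Per-workspace tools pass their user and workspace; a site-wide,
    user-agnostic tool passes neither and so ignores both fields.
    """
    if event.get("Event") != "request":
        return False
    if user is not None and event.get("User") != user:
        return False
    if workspace is not None and event.get("Workspace") != workspace:
        return False
    return True
```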


26 nov 1999 : modified : $Date: 2002/09/07 08:02:04 $