Re: temporal database generation

From: Tomi NA (hefes..mail.com)
Date: Sun Aug 27 2006 - 18:58:25 EDT

    On 8/26/06, Mike Kienenberger <mkienen..mail.com> wrote:
    > On 8/26/06, Eric Lazarus <ericllazaru..ahoo.com> wrote:
    > > I guess I was thinking about it done in Tomi's way but if this way can ensure that no data is ever lost, that is pretty powerful.
    > >
    > > What are the advantages of doing it Tomi's way?
    > >
    > > He can do efficient queries to see what the state of the database was at any time in the past? Is that right?
    > >
    > > He could efficiently materialize objects as they existed at any time in the past.
    >
    > Yeah, the only real difficulty in doing this is handling join queries.
    > Each join also has to eliminate non-relevant data.
    >
    > I actually did a lot of this "by hand" for a WebObjects project
    > (Imperial Wars) a long time ago. Time was measured in discrete units
    > (turns), but it worked pretty much the same way.
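
    The join difficulty mentioned above can be sketched roughly as
    follows. This is only an illustration under assumptions: the tables
    (player_version, fleet_version), their columns and the half-open
    turn_from/turn_to convention are all hypothetical, and SQLite is used
    just because it ships with Python. The point is that every table in
    the join needs its own "as of" filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical versioned tables: a row is valid for
    -- turn_from <= turn < turn_to; turn_to NULL means "still current".
    CREATE TABLE player_version (
        player_id INTEGER, name TEXT,
        turn_from INTEGER, turn_to INTEGER
    );
    CREATE TABLE fleet_version (
        fleet_id INTEGER, player_id INTEGER, ships INTEGER,
        turn_from INTEGER, turn_to INTEGER
    );
    INSERT INTO player_version VALUES (1, 'Red', 1, 5), (1, 'Crimson', 5, NULL);
    INSERT INTO fleet_version VALUES (10, 1, 3, 1, 4), (10, 1, 7, 4, NULL);
""")

# Joining on the key alone pairs every fleet version with every player version.
NAIVE_JOIN = """
    SELECT p.name, f.ships
    FROM fleet_version f
    JOIN player_version p ON p.player_id = f.player_id
"""

# The temporal join repeats the validity filter for each joined table.
AS_OF_JOIN = """
    SELECT p.name, f.ships
    FROM fleet_version f
    JOIN player_version p
      ON p.player_id = f.player_id
     AND p.turn_from <= :turn AND (p.turn_to IS NULL OR p.turn_to > :turn)
    WHERE f.turn_from <= :turn AND (f.turn_to IS NULL OR f.turn_to > :turn)
    ORDER BY f.fleet_id
"""

def fleets_at(conn, turn):
    """Return (player name, ships) pairs as they stood at the given turn."""
    return conn.execute(AS_OF_JOIN, {"turn": turn}).fetchall()

naive_rows = conn.execute(NAIVE_JOIN).fetchall()
```

    With this sample data the naive join yields four rows (every
    combination of versions), while fleets_at(conn, 3) yields the single
    pair that was actually valid at turn 3.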

    The two comments Eric made above describe exactly what I'd like to
    have at my disposal when I start designing a new system. I'd like to
    make my reasons clear.
    Experience has shown me that a significant percentage of the systems my
    company deploys have to keep some sort of history. Furthermore, in
    almost all systems, we need a way to find out what happened to the
    system in a specific period of time.
    Having the possibility to set the "system time" at runtime (I even
    considered the possibility of making it a user-level setting) would
    allow me the following:
    1) do a detailed "playback" of all system events
    2) run comparative analyses of system objects at different moments,
    i.e. have a ready-made time dimension embedded in my system on which
    to build data warehouses
    3) troubleshoot: when something fails (due to one reason or another),
    it would be a lot easier to find the cause if the data wasn't somehow
    overwritten. Obviously, if the problem was with the temporal
    mechanisms themselves, this wouldn't work, but that's one of the
    reasons I'd like to automate construction of such a database.
    4) security: no one can do anything within the system without there
    being a clear trace of his/her actions in the system. Sure, log files
    are fine, but they're much harder to automatically analyze in order to
    find e.g. patterns of behaviour of the offending user etc.
    5) inherently temporal applications: I'm working on an app right now
    whose key interface element will be a time slider, positioning the
    user into a given moment in time so that he can access data from that
    exact moment - what he does with that data is not important, but it is
    important that he be able to get at it, at runtime and without having
    to call me to retrieve the state of his database from 5 years ago
    6) meeting regulatory requirements: sometimes, for some government
    systems and the like, the application *has* to work in a non-overwrite
    mode. There simply is no overwriting anything.
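
    To make the "system time" idea and the non-overwrite mode a bit more
    concrete, here is a minimal sketch, not a proposal for how Cayenne
    should do it. Everything in it is an assumption made for
    illustration: the customer_version table, its columns, the helper
    names and the use of SQLite. Rows are only ever appended, and a read
    takes the system time as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: one row per state an object has ever had.
conn.execute("""
    CREATE TABLE customer_version (
        customer_id INTEGER NOT NULL,
        name        TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL   -- ISO-8601 timestamp, sorts as text
    )
""")

def save(conn, customer_id, name, now):
    """Record a new state; earlier rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO customer_version (customer_id, name, valid_from) "
        "VALUES (?, ?, ?)",
        (customer_id, name, now),
    )

def name_as_of(conn, customer_id, system_time):
    """Return the name as it was at system_time, or None if not yet known."""
    row = conn.execute(
        "SELECT name FROM customer_version "
        "WHERE customer_id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (customer_id, system_time),
    ).fetchone()
    return row[0] if row else None

save(conn, 1, "Acme Ltd", "2001-01-01T00:00:00")
save(conn, 1, "Acme Corp", "2004-06-01T00:00:00")

# "Moving the time slider" is just a different system_time argument.
then = name_as_of(conn, 1, "2002-03-15T00:00:00")
now = name_as_of(conn, 1, "2006-08-27T00:00:00")
```

    A deletion would need its own marker row in a scheme like this,
    which the sketch leaves out.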

    Now, an audit trail is a useful tool for addressing point 3 and possibly 4.
    1, 2, 5 and 6, however, (I feel) need a more robust approach to making
    "time" an integral part of all stored data.
    As a side note, when I first heard about PostgreSQL's capability to do
    PITR (Point-In-Time-Recovery), I was fascinated, but then I learned
    you have to take the database offline, initialize it in a certain way
    and so on, so it's not nearly as useful as I hoped it would be. The
    server itself is probably too low a level to implement meaningful time
    dimensions without severely limiting the types of applications you
    could run on it, but still, the idea fascinated me. It's basically a
    niche in the ORM problem space (as I see it) that hasn't been
    addressed yet: I feel it'd be a nice addition to the already existing
    arsenal of features Cayenne has to offer.

    As an alternative, I was thinking about generating a trigger system in
    the database that would perform most of what I have in mind here, but
    I'm not sure how it'd get along with Cayenne, and I'd still have to
    design the hairy temporal database model by hand, instead of being
    able to focus on the nature of the data, so I'm not really sure where
    to go from here.
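
    For what it's worth, the trigger alternative could be generated
    along these lines. This is a rough sketch under assumptions, not a
    tested design: it uses SQLite's trigger syntax (other databases
    differ), the history_ddl helper and the _history table layout are
    made up for the example, and it says nothing about how Cayenne would
    interact with such triggers.

```python
import sqlite3

def history_ddl(table, columns):
    """Generate DDL that keeps every overwritten or deleted row of `table`.

    Hypothetical helper: `table` and `columns` are trusted identifiers,
    not user input, since they are pasted straight into the SQL text.
    """
    cols = ", ".join(columns)
    old_cols = ", ".join(f"OLD.{c}" for c in columns)
    return f"""
        CREATE TABLE {table}_history ({cols}, replaced_at TEXT, operation TEXT);

        CREATE TRIGGER {table}_keep_update BEFORE UPDATE ON {table}
        BEGIN
            INSERT INTO {table}_history ({cols}, replaced_at, operation)
            VALUES ({old_cols}, datetime('now'), 'UPDATE');
        END;

        CREATE TRIGGER {table}_keep_delete BEFORE DELETE ON {table}
        BEGIN
            INSERT INTO {table}_history ({cols}, replaced_at, operation)
            VALUES ({old_cols}, datetime('now'), 'DELETE');
        END;
    """

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executescript(history_ddl("account", ["id", "balance"]))

# Ordinary writes; the triggers preserve the old state behind the scenes.
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 80 WHERE id = 1")
conn.execute("DELETE FROM account WHERE id = 1")

history = conn.execute(
    "SELECT id, balance, operation FROM account_history ORDER BY rowid"
).fetchall()
```

    Here history ends up holding the two states that would otherwise
    have been lost, while the application code only ever touches the
    plain account table. The temporal model itself still has to be
    designed, which is the part I'd like to see automated.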
