I have seen several discussions on implementing temporal databases. I am working on an application that needs something very similar, but slightly different from what I have seen discussed on the list. I would like a "repository" database that allows me to provide versioning and allow collaboration. I would like to structure it conceptually similarly to SVN or CVS. I would like to use some of Cayenne's abilities to accomplish this.
If anyone has some spare cycles to help, I am looking to hire a qualified Cayenne developer for some part-time work. Follow this link to job posting 266326 on Odesk.com. http://www.odesk.com/console/j/opening.php?key=8bd7d42fda97bfde&tot=1&pos=0
(By the way, I think Odesk does a nice job providing a good contracting environment and have had a good experience so far.)
If you are just curious about what I am thinking, I have included a long-winded explanation of my initial ideas. If you are at all interested in this or have tried similar things, I would like to know what you may think.
James Jensen
... ... ... ... ...
(Understand that this is a first pass, and is intended as a starting point)
The databases that make up this system should behave similar to a repository such as SVN or CVS. There is one "central" repository database, and multiple client databases. A client can only connect to one repository, and a reasonable mechanism should be in place to prevent normal users from connecting a client to a different repository than the one for which it was originally created.
All modifications by users are made to the client database. There will usually be one per laptop/user, although occasionally there could be more than one user. Each client application should have a password login process that identifies the user, and selects the correct local client db that is used for that user. (The application should also have a roll based security system to restrict/allow specific activities, like sales, installer, scheduler, admin., etc) For each workstation, a user must have a separate user db. I.e. Each client is identified by user and workstation id. When a user logs into a new workstation, that user must create a new client from the repository, and cannot directly access data on a different workstation that has not been synced. Information never flows directly from client to client.
Modifications to the repository are made by a client "syncing" with the repository. Like SVN, the client must first update any and all repository changes back to the local client by merging and choosing all changes. When all merges are complete and client has updated to the latest repository version, it is allowed to commit its changes to the repository. The commit process should be atomic, and the first one wins. I.e. If two commits from different clients start at the same time, the first to get the lock, checks repository version, does its update, which causes the repository version to advance, the second client's commit will fail because of the version mismatch, so the client will need to update again and then attempt to commit again.
The repository is intended to be a fairly passive part of the system. It consists of a database and some locking mechanism to prevent concurrent access. It is intended that the repository be some sort of web service (Like Cayenne makes possible). Connections from client to repository being made via web. The repository is only really visible through the clients. All application logic is managed by the client.
Versioning:
The repository behavior is enabled by adding a few extra columns to most tables in the database, in addition to the ID column. Rows will be fully identified by ID, BIRTH_VER, RETIRED_VER, and CLIENT_ID. (Assuming Cayenne is used as the ORM, ID will actually be created by some custom code to combine the standard auto-increment ID plus the CLIENT_ID added as some upper bits to ensure that ID is always unique across the repository without requiring that the CLIENT_ID column always be used for all queries.)
The "current" version is a concept that applies to the entire database. Each table does not maintain it's own notion of current version. Even and odd versions are used to differentiate between synced data and client modified data. Even indicates data synced with the repository. Odd versions have been modified since the last sync.
When a record is created, BIRTH_VER is set to current version (some odd client ver) and RETIRED_VER is set to max-value or some other really big value called NOT_RETIRED that logically represents "beyond" the current version.
When a record is deleted, a check is made to see if that retired version already exists, if it does, then the deleted data is just copied over the data with the same retired version. If a retired record doesn't exist yet with that version, retired is set to current version . When a record is modified, the original record is treated like a delete (some odd number), and the record is cloned with same id, with the death set to NOT_RETIRED. The birth could be set to current, but it would work even if it were not, queries would just be harder.
It is possible to "save" a version by simply incrementing the current version number to the next odd number, thus protecting all existing retired data from further changes and a new layer of history has been started. Also queries for the current snap shot just have to test for retired == NOT_RETIRED.
When a client is updated. It uses its last updated even version as a base line and looks for any newer versions. If they exist, entries from tables of that version are copied to the client (maybe to tmp tables). Merging is not as simple as text merges, and requires special code that takes into account the specific tables being merged and presents appropriate choices to the user. As pieces of the merge are made they are give a new version number. When a successful set of merge data has been chosen/created, then the data is accepted as a new odd version and the last updated version is changed to the even version of the repository. To summarize, an update consists of first downloading a tmp version of latest modified records from the latest repository state, attempt to merge, if a successful merge is created, it can be accepted, or the merge process can just be rolled back a version and thrown away.
When a client wants to commit, if the "updated" version matches and no new odd versions have been created, then the client data is copied over to the repository, but with any odd version numbers moved to the "next" even version. I.e. Only records with even versions makeup the "official" repository. By committing, a version is being saved in the repository. (The original odd version records can also be copied to the repository and replicated to other clients if it is desired to give clients the opportunity to see all the versions used to merge into the repository. However, this expands the amount of data stored even further.)
Thanks for your interest and patience if you made it this far.
This archive was generated by hypermail 2.0.0 : Sat Aug 04 2007 - 16:43:15 EDT