The SCM Lounge: Change set under the microscope

RTC Source Control users deal with hundreds or thousands of change sets every day. Let's pause for a moment and look at what a change set looks like under a microscope.

Change sets are everywhere. Wherever you point your microscope in SCM, you'll quickly find them under the covers:

In the Work Item editor, as links to the code that was modified to fix the work item
In the Pending Changes view, as the currency you exchange with your team to collaborate
In the History view of your workspace or team stream, neatly organized in sequence
In the Build Results page, showing what changed in the code since the previous build
In a Source Editor, when you use the annotate feature

To the naked eye, a change set looks like an innocent blue triangle.

When you create a change set, it remembers you forever, so anyone who finds it will know its author. It also records the date and time it was created.

A change set describes its purpose with a comment, and its mission with a bidirectional link to a work item. That link lets you trace back why the change set is there and who approved it or gave feedback on it.
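The outer shell described so far can be pictured as a small record. The sketch below is a hypothetical Python model, not RTC's actual API: the class names `ChangeSet` and `WorkItem`, their fields, and the `link` helper are assumptions chosen to mirror the properties listed above (author, creation time, comment, and a link that can be followed from either side).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class WorkItem:
    # Hypothetical work item that lists the change sets implementing it.
    work_item_id: int
    summary: str
    change_set_ids: List[str] = field(default_factory=list)

@dataclass
class ChangeSet:
    # Hypothetical outer shell of a change set: who, when, why and what for.
    change_set_id: str
    author: str
    comment: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    work_item_id: Optional[int] = None

def link(cs: ChangeSet, wi: WorkItem) -> None:
    """Record the link on both sides so it can be followed in either direction."""
    cs.work_item_id = wi.work_item_id
    if cs.change_set_id not in wi.change_set_ids:
        wi.change_set_ids.append(cs.change_set_id)

wi = WorkItem(1234, "NPE when opening the editor")
cs = ChangeSet("cs-1", author="alice", comment="Guard against null input")
link(cs, wi)
print(cs.work_item_id, wi.change_set_ids)  # 1234 ['cs-1']
```

Storing the link on both records is one simple way to make it bidirectional; a real server could just as well keep a single link table and query it from either end.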

A change set is tied to a component. If you have permission to see the component, you can see the change set. Otherwise you won't. A change set is designed to flow between workspaces and streams, within the same RTC server or between different RTC servers.
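Because visibility follows the component, a read-permission check can be reduced to a filter on the component each change set belongs to. This is a minimal sketch under that assumption; the `readable_components` set and the `visible_change_sets` function are hypothetical stand-ins for whatever permission model the server actually applies.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class ChangeSet:
    change_set_id: str
    component: str  # every change set belongs to exactly one component

def visible_change_sets(change_sets: Iterable[ChangeSet],
                        readable_components: Set[str]) -> List[ChangeSet]:
    """Keep only the change sets whose component the user is allowed to read."""
    return [cs for cs in change_sets if cs.component in readable_components]

history = [
    ChangeSet("cs-1", "UI"),
    ChangeSet("cs-2", "Server"),
    ChangeSet("cs-3", "UI"),
]
print([cs.change_set_id for cs in visible_change_sets(history, {"UI"})])  # ['cs-1', 'cs-3']
```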

A change set knows who it is: it is distinct from every other change set out there in the world. This matters because it is what makes RTC Source Control a modern distributed SCM. Each change set has a unique ID, so your change set will never be mixed up with mine, even if we work on different servers.
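Unique IDs are what let two servers exchange change sets without a central counter. The sketch below assumes random UUIDs from Python's standard `uuid` module as the ID scheme (RTC's actual ID format is not described here), and the `merge_histories` helper is hypothetical. It shows that a change set that has flowed to a second server is recognized as the same one rather than duplicated.

```python
import uuid
from typing import Dict, Iterable, Tuple

def new_change_set_id() -> str:
    # A random (version 4) UUID is generated locally, with no coordination between servers.
    return str(uuid.uuid4())

def merge_histories(*histories: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Union of (ID, comment) pairs from several servers, de-duplicated by ID."""
    merged: Dict[str, str] = {}
    for history in histories:
        for cs_id, comment in history:
            merged.setdefault(cs_id, comment)
    return merged

shared = new_change_set_id()  # a change set that flowed from server A to server B
server_a = [(shared, "Fix NPE"), (new_change_set_id(), "Add logging")]
server_b = [(shared, "Fix NPE"), (new_change_set_id(), "Update docs")]
print(len(merge_histories(server_a, server_b)))  # 3: the shared change set counts once
```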

Now it's time to zoom into the core of a change set; so far we have only looked at its outer shell. A change set holds a list of files and folders, each with its expected starting state (required to apply the change set) and its final state (after the change set is applied). The list becomes immutable once the author seals the change set by completing or delivering it. From then on, the change set will forever describe how to take a set of resources from one particular state to another.
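The pairing of starting and final states can be sketched as follows. This is a hypothetical model, not RTC's internal representation: a configuration is assumed to be a plain mapping from item ID to state ID, and `seal()` stands in for completing or delivering. Applying fails unless every item is in its expected starting state, and a sealed change set rejects further edits.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Change:
    item_id: str
    before: Optional[str]  # required starting state (None: item must not exist yet)
    after: Optional[str]   # final state (None: item is deleted)

class ChangeSet:
    def __init__(self) -> None:
        self._changes: Dict[str, Change] = {}
        self.sealed = False

    def record(self, change: Change) -> None:
        if self.sealed:
            raise PermissionError("change set is sealed and cannot be modified")
        self._changes[change.item_id] = change

    def seal(self) -> None:
        # Hypothetical stand-in for completing or delivering the change set.
        self.sealed = True

    @property
    def changes(self) -> Tuple[Change, ...]:
        return tuple(self._changes.values())

def apply(config: Dict[str, str], cs: ChangeSet) -> Dict[str, str]:
    """Return a new configuration (item ID -> state ID) with cs applied."""
    for ch in cs.changes:
        if config.get(ch.item_id) != ch.before:
            raise ValueError(f"{ch.item_id}: expected {ch.before}, found {config.get(ch.item_id)}")
    result = dict(config)
    for ch in cs.changes:
        if ch.after is None:
            result.pop(ch.item_id, None)
        else:
            result[ch.item_id] = ch.after
    return result

cs = ChangeSet()
cs.record(Change("A.txt", before="s1", after="s2"))
cs.seal()
print(apply({"A.txt": "s1"}, cs))  # {'A.txt': 's2'}
```

Checking every starting state before touching anything keeps the apply all-or-nothing: a configuration that is not in the expected state is left unchanged.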

There are things you won't see in the change set itself. The content of the changed files isn't part of it. The change set is designed to be a very lean entity, so histories can grow to huge numbers of change sets without hurting the database. A change set contains only the IDs that identify the starting and final states of the modified files and folders. The content corresponding to these states is stored efficiently, once, in the database (keyed by a hash). A change set is a bunch of IDs, not a huge text patch.
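The split between a lean change set and shared content storage resembles content-addressed storage. The sketch below assumes SHA-256 as the hash (the post does not say which hash RTC uses), and for brevity it uses the content hash directly as the state ID, which is a simplification. `ContentStore` is a hypothetical class. The change set keeps only state keys, and identical content is stored once.

```python
import hashlib
from typing import Dict

class ContentStore:
    """Hypothetical store: each distinct content is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)

store = ContentStore()
v1 = store.put(b"hello\n")
v2 = store.put(b"hello, world\n")
v1_again = store.put(b"hello\n")  # same content, same key, no new blob

# The change set records only (starting state, final state) keys, never the bytes.
change_set = {"A.txt": (v1, v2)}
print(len(store), v1 == v1_again)  # 2 True
```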

Another thing you won't see is path names. The change set stores the names of the files and folders that are modified (or moved, renamed, and so on). To stay efficient and portable, however, it stores only the ID of each modified item's parent folder, not a lengthy path. A change set describes a change in a mathematically precise way, using IDs. Of course, humans need to see path strings at some point, e.g. src/com/my/project/A.txt. The UI resolves these paths when needed, in the context (configuration) in which the change set is used: your workspace, a stream, and so on. For most SCM operations only IDs matter, because IDs are reliable, efficient and unique. Paths only matter when a person, such as a reviewer, is looking at the change set.
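Resolving a path from IDs amounts to walking up parent-folder links inside a given configuration. In this hypothetical sketch, a configuration is a mapping from item ID to (name, parent folder ID), and all IDs and names are invented for illustration. The same rename, recorded only as (item ID, new name, parent ID), resolves to different paths in different configurations.

```python
from typing import Dict, Optional, Tuple

# Hypothetical configuration: item ID -> (name, parent folder ID); the root has no parent.
Config = Dict[str, Tuple[str, Optional[str]]]

def resolve_path(config: Config, item_id: str) -> str:
    """Build a path string by walking parent IDs up to the root folder."""
    parts = []
    current: Optional[str] = item_id
    while current is not None:
        name, parent = config[current]
        if parent is not None:  # the root folder contributes no path segment
            parts.append(name)
        current = parent
    return "/".join(reversed(parts))

workspace: Config = {
    "root": ("", None),
    "f1": ("src", "root"),
    "f2": ("com", "f1"),
    "f3": ("my", "f2"),
    "f4": ("project", "f3"),
    "i9": ("A.txt", "f4"),
}
print(resolve_path(workspace, "i9"))  # src/com/my/project/A.txt

# A rename stored the ID way: no path, just the item, its new name and its parent ID.
rename = {"item": "i9", "name": "B.txt", "parent": "f4"}
stream = dict(workspace, f4=("proj", "f3"))  # in this stream the folder has another name
stream[rename["item"]] = (rename["name"], rename["parent"])
print(resolve_path(stream, "i9"))  # src/com/my/proj/B.txt
```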

Sometimes a bacterium runs into another bacterium and consumes it. Likewise, some change sets are the result of merging other change sets. For each file or folder merged, the change set tracks the starting state, the merged-in state(s) and the final (merged) state. So a change set knows it represents a merge between conflicting states, and the UI can decorate it specially to draw the user's attention (hey, did my teammate override my change?).
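For each item, a merge adds one or more merged-in states between the starting and final states. The sketch below is a hypothetical Python model of that bookkeeping; `ItemChange`, `merged` and `needs_attention` are illustrative names, not RTC API. A change set that contains at least one merged item is the kind a UI could flag for review.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class ItemChange:
    item_id: str
    before: str                   # starting state
    after: str                    # final (merged) state
    merged: Tuple[str, ...] = ()  # merged-in states; empty for an ordinary change

    @property
    def is_merge(self) -> bool:
        return bool(self.merged)

@dataclass
class ChangeSet:
    comment: str
    changes: List[ItemChange] = field(default_factory=list)

    def needs_attention(self) -> bool:
        # A UI could decorate change sets that merged conflicting states.
        return any(ch.is_merge for ch in self.changes)

plain = ChangeSet("Fix typo", [ItemChange("A.txt", "s1", "s2")])
merge = ChangeSet("Merge with teammate", [ItemChange("A.txt", "s2", "s4", merged=("s3",))])
print(plain.needs_attention(), merge.needs_attention())  # False True
```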

This also allows SCM to display a merge graph, which shows how the states recorded in the change sets relate to each other across the history of a file in a particular stream or workspace.
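Because every change set records which state each item starts from, which states it merges in, and which state it ends with, a file's version graph can be rebuilt from those records alone. The sketch assumes a simple tuple format for a file's history entries; the format, the `build_graph` function and the state names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical history entry for one file: (change set ID, before, merged-in states, after).
Entry = Tuple[str, Optional[str], Tuple[str, ...], str]

def build_graph(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Map each state to the states it directly derives from."""
    parents: Dict[str, List[str]] = {}
    for _cs_id, before, merged, after in entries:
        preds = [] if before is None else [before]
        preds.extend(merged)  # a merge result has more than one predecessor
        parents.setdefault(after, []).extend(preds)
    return parents

history = [
    ("cs-1", None, (), "s1"),       # file created
    ("cs-2", "s1", (), "s2"),       # my change
    ("cs-3", "s1", (), "s3"),       # teammate's concurrent change
    ("cs-4", "s2", ("s3",), "s4"),  # merge of s2 and s3
]
graph = build_graph(history)
merge_points = [state for state, preds in graph.items() if len(preds) > 1]
print(graph)         # {'s1': [], 's2': ['s1'], 's3': ['s1'], 's4': ['s2', 's3']}
print(merge_points)  # ['s4']
```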

So far we've used regular magnification. But we can push further, though most users never go this deep. A change set also knows a lot about how its author reached the final state of each file or folder it modifies. In the early stages of a change set, its author checks in local changes made to files and folders. The same file may be modified and checked in to the same change set many times. The last check-in defines the final version, the one that will be applied with the change set. But deep down, the change set also remembers each intermediate version that was checked in before it was sealed. Since 4.0, these versions are available to users through the File History View / Check-in History pane.
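The difference between the final state and the remembered check-ins can be modelled as an append-only list per item. This hypothetical sketch (class and method names are illustrative, not RTC API) applies only the last check-in while keeping the earlier ones available for a check-in history view, and refuses new check-ins once the change set is sealed.

```python
from typing import Dict, List

class ChangeSet:
    """Hypothetical model: every check-in is kept, the last one is the final state."""

    def __init__(self) -> None:
        self._checkins: Dict[str, List[str]] = {}
        self.sealed = False

    def check_in(self, item_id: str, state_id: str) -> None:
        if self.sealed:
            raise PermissionError("cannot check in to a sealed change set")
        self._checkins.setdefault(item_id, []).append(state_id)

    def seal(self) -> None:
        self.sealed = True

    def final_state(self, item_id: str) -> str:
        # Only the last check-in is applied when the change set is accepted.
        return self._checkins[item_id][-1]

    def checkin_history(self, item_id: str) -> List[str]:
        # Intermediate states are still remembered after sealing.
        return list(self._checkins[item_id])

cs = ChangeSet()
for state in ("v1", "v2", "v3"):  # the author checks in the same file three times
    cs.check_in("A.java", state)
cs.seal()
print(cs.final_state("A.java"), cs.checkin_history("A.java"))  # v3 ['v1', 'v2', 'v3']
```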

That's how far we will take our journey inside a change set today. It's the building block of SCM, and a pretty solid one.

Thursday, September 20, 2012