I know there is a large subset that reads this blog and/or follows me on twitter and/or is on the DDD list, but this is for those who are not in those sets.
In hope of bringing the DDD list back to a focus on DDD, I created a second list for Event Sourcing/CQRS/etc. type discussions, as I am sure many of the people on the DDD list feel spammed with the ideas. The list already has some great discussions on it and I think it will become a great resource for those looking to learn the ideas.
I was just reading Patrick Smacchia's great post on High Test Coverage and I have to say I agree nearly 100%.
I have a presentation up from a few years ago that covers some of the relations between TDD and DbC http://www.infoq.com/presentations/TDD-in-a-DbC-World-Greg-Young that might be worth watching for some people checking out these ideas.
As some of you may know, though I did not announce it on my blog, I ran an online course yesterday for the first time.
The class was a full day and covered the evolution involved with CQRS and Event Sourcing. It was an interesting experience as it was my first time utilizing the medium. Here are some thoughts from my perspective on how it went.
The good.
The not-so-good
Overall I kind of question the benefit of doing things in the online format. I think I will do it one or two more times, but I feel that once I get the format nailed down I would probably be better off just recording the 8 hour session and putting it up for download etc. The main reason for this is that the medium is much more uni-directional and lacks large amounts of interaction. Having questions is a good thing but it's not nearly as interactive as, say, a classroom setting where I can just whip out a pen and start drawing on the whiteboard to answer the question. Going with the recording, I think a venue where people could ask questions would generally suffice for most of the interactions that happen live.
The other day I was talking with Kyle Baley on twitter and I gave an example of how it is possible to store "highly critical" data only in memory. I mentioned to him a similar problem, here it is.
The basics of the problem are that:
we need to support 10000 concurrent queries, we can reject any past that immediately
we are given a high-latency but extremely high-bandwidth network that guarantees packet ordering
we are only allowed to store 10 of the 100000 objects of the dataset in memory per instance
Feel free to email me with your potential solutions to this problem (or put them in comments).
Some questions:
What did I not specify that we need for the solution to work? Is there anything that would make it easier?
What is the latency of a query in your solution?
How do we handle fail-over situations?
What is the total general purpose memory needed on our machine?
How would we scale to supporting 10,000,000 concurrent queries?
What happens if we need to support 1,000,000 objects?
What happens when we can change data not just query it?
Can we implement a unique constraint on say the id of our data? If so how?
Have fun guys :)
Writing this in the Web UI which kind of sucks because I have not found a good blogging utility in unix yet. I kinda miss livewriter.
One of the biggest advantages of using Event Sourcing with CQRS is that it drastically changes how we even think about our problems. Let's go through an example.
When I have an inventory item that is in a deactivated state and I deactivate it, it should throw an alreadydeactivatedexception
Or let's try another one:
When a gold customer makes a purchase they should receive a 15% discount on their purchase.
Basically all of our use cases can boil down to "given some state, when some action, expect something"
Making statements like these leaves off many things. How do we know that an inventory item can even get into a deactivated state? If it can't is there any purpose in testing what happens if it is? When we represent current state we talk about any possible state the object could have, not the finite number of states an object can get into. What happens if we never implement a deactivate method?
We also are coupling our use cases to the concept of current state. Unfortunately our representation of current state has a funny way of changing over time.
What happens when we use Event Sourcing?
When an inventory item that has been deactivated is deactivated an alreadydeactivatedexception should be thrown
When a customer that has been marked as a gold customer purchases an item they should receive a 15% discount
These use cases are describing series of behavior not current state! We know that we are testing relevant items because we are giving series of behaviors that the object can do, as such the state that we end up with is necessarily a state the object could actually be in.
What is more important is that our use cases are being structured much differently. Instead of describing things as ...
"given some state, when some action, expect something"
we are describing things as:
"given the system has done these behaviors, when some action, expect something"
The difference between these two is HUGE!!!
1) The first method requires specialization for every type of object (eg: state representation) whereas the second one is a generic one for all use cases.
Consider:
Given: A series of events
When: A command
Expect: A series of events
Since events and commands are simple messages this can be genericized in < 30 lines of code for all objects.
2) The first one couples our use case language to our internal domain structure whereas the second one does not; it describes the actions in terms of a calculus of themselves. This is to me the most important part, as we can freely change our internal structures without worry and our tests will remain just as relevant.
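To make the "generic for all use cases" point concrete, here is a rough sketch of what such a harness could look like. Everything in it is an assumption for illustration: the IAggregate interface, the Specification class, and the record types are hypothetical names, and it assumes C# 9 records so that events compare by value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical contract: replay past events, then handle one command.
public interface IAggregate {
    void Apply(object pastEvent);                // rebuild state from history
    IEnumerable<object> Handle(object command);  // return the newly produced events
}

public static class Specification {
    // Given a series of events, when a command, expect a series of events.
    public static void Run<T>(IEnumerable<object> given, object when, IEnumerable<object> expect)
        where T : IAggregate, new() {
        var aggregate = new T();
        foreach (var e in given) aggregate.Apply(e);
        var produced = aggregate.Handle(when).ToList();
        if (!produced.SequenceEqual(expect))
            throw new Exception("Expected [" + string.Join(", ", expect) +
                                "] but got [" + string.Join(", ", produced) + "]");
    }
}

// Illustrative messages; records give value equality for the comparison above.
public record InventoryItemCreated(Guid Id);
public record InventoryItemDeactivated(Guid Id);
public record DeactivateInventoryItem(Guid Id);

public class AlreadyDeactivatedException : Exception { }

public class InventoryItem : IAggregate {
    private bool deactivated;

    public void Apply(object pastEvent) {
        if (pastEvent is InventoryItemDeactivated) deactivated = true;
    }

    public IEnumerable<object> Handle(object command) {
        if (command is DeactivateInventoryItem c) {
            if (deactivated) throw new AlreadyDeactivatedException();
            return new object[] { new InventoryItemDeactivated(c.Id) };
        }
        throw new NotSupportedException(command.GetType().Name);
    }
}

public static class Examples {
    public static void DeactivateAnActiveItem() {
        var id = Guid.NewGuid();
        Specification.Run<InventoryItem>(
            given: new object[] { new InventoryItemCreated(id) },
            when: new DeactivateInventoryItem(id),
            expect: new object[] { new InventoryItemDeactivated(id) });
    }
}
```

Note that the harness itself never looks at the internal state of InventoryItem; it only sees messages going in and messages coming out, which is why the same Run method would work for any other aggregate.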
Sorry for the run by posting but I hope some people gain something out of this.
Couldn't sleep on the train from Oslo to Bergen so I decided to start writing a bit. Correction, the train is stuck now in a tiny little town about two hours from Bergen. I have to say, it is a beautiful spot to be stuck.
Many people are talking about Command Handlers and how they work when they are really just CRUD handling very simple operations. Likely there is little or no validation in them and they are simply passing through information.
A perfect example of this might be the name of a customer in a CRM. There are no invariants of the customer object that need the name (there could be invariants imagined such as all customers who have a name of 'greg' get a 15% discount when buying things but let's imagine for a minute that such invariants do not exist in this case). Many people have been suggesting to just use the command handler as a pass-through to then publish the event (transaction script like). The code would look something like this (simplified).
class ChangeCustomerName : Handles<ChangeCustomerNameCommand> {
    public void Handle(ChangeCustomerNameCommand cmd) {
        // someBasicLogic is a placeholder for whatever simple validation applies
        if (someBasicLogic) throw new Exception();
        DomainEvents.Publish(new CustomerNameChangedEvent(cmd.FirstName, cmd.LastName));
    }
}
A more stereotypical version of this might look something like.
class ChangeCustomerName : Handles<ChangeCustomerNameCommand> {
    private CustomerRepository repository;

    public ChangeCustomerName(CustomerRepository repo) {
        repository = repo;
    }

    public void Handle(ChangeCustomerNameCommand cmd) {
        // use the field; the constructor parameter is not in scope here
        var customer = repository.FetchById(cmd.Id);
        customer.ChangeNameTo(cmd.FirstName, cmd.LastName);
    }
}
with customer then creating the event.
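For illustration, a minimal sketch of what that customer object might look like. The DomainEvents class here is only a stand-in for the static publisher used in the first handler, the validation rule is made up, and adding the customer id to the event is my assumption (the simplified handler above only passes the names).

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the event; the id parameter is an addition for illustration.
public record CustomerNameChangedEvent(Guid CustomerId, string FirstName, string LastName);

// Hypothetical stand-in for the static publisher; it just collects events in memory.
public static class DomainEvents {
    public static readonly List<object> Published = new List<object>();
    public static void Publish(object e) => Published.Add(e);
}

public class Customer {
    private readonly Guid id;

    public Customer(Guid id) { this.id = id; }

    public void ChangeNameTo(string firstName, string lastName) {
        // the same basic checks that would otherwise sit in the command handler
        if (string.IsNullOrWhiteSpace(firstName) || string.IsNullOrWhiteSpace(lastName))
            throw new ArgumentException("A customer needs a first and a last name.");

        // the customer, not the handler, creates the event
        DomainEvents.Publish(new CustomerNameChangedEvent(id, firstName, lastName));
    }
}
```

The object is thin, but "Customer" and "changing a customer's name" now exist as named concepts in the model.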
There is nothing wrong with either of these solutions, both have their merit. Unfortunately the Ubiquitous Language becomes extremely interesting in the first case as there is no concept of a Customer in the domain model. There is a concept that the customer's name can be changed and that when it is there is a CustomerNameChangedEvent but there is no explicit concept of what a customer is within the domain model.
This is perfectly ok if DDD is not being applied (and there are many places where doing this style of transaction script command handler can be very useful). If DDD is being applied though this is probably not a good pattern to be following. Very little code is being saved (the same new of the event is just being put on the customer object instead of in the command handler) and a concept within the domain is being lost. This becomes especially true if the other object already exists within the domain model as there are other behaviors associated with it.
If it is the case that there are just a few operations on an Aggregate and they don't access current state for invariant protection the domain object will probably only contain an aggregate id and the basic if statements that are present in the command handlers otherwise. These thin little objects still have benefit though as they are defining and making explicit the aggregate boundary as well as giving vocabulary to what the aggregate is and the behaviors contained within its boundary.
It is also however a great example of how Domain Driven Design is not a pre-requisite to using CQRS and even events. For a great many systems the first command handler will be a much better choice than the second command handler (especially for line of business or simple web systems). It can also be a good idea to use a hybrid approach where although many things exist in a given context most of them are simple and not specifically modeled and the domain model itself is focused only on those cases where a domain model makes sense.
Started talking with some people on twitter about this today. This thread represents a bounty for anyone who ports resharper or a functional equivalent to monodevelop. I understand it is a LOT of work, therefore I am willing to personally pledge $500 to the person who does it.
Some stipulations. I would want to see the product come out and then become OSS so the community can bring it forward or I would want jetbrains to commit to supporting the product for a period of time.
What are you willing to pledge? Let's give them an idea of how much money could be made on the project. Within 3 minutes we were up to $1200 on twitter. Post here or even better use the hash tag #pledgetoresharperformonodevelop (I like long tags) on twitter.
I will be doing a series of webcasts starting tomorrow at 12:00 EST
I figured this timing is the best because it allows people from Europe to the Pacific to have it within "working" hours.
The first webcast will be "The Ubiquitous Language is not ubiquitous". To join the meeting come to https://global.gotomeeting.com/join/153015861. It will also be recorded and placed online (viddler maybe?) to be posted on the blog after.
I will be doing one of these every week or two until further notice. Feel free to drop ideas here in comments for future webcasts
I have been reading through some of the academic literature on TDD and figured it might be useful to put up a quick little bibliography of what I am reading through.
White paper on Unit Testing R.Venkat Rajendran http://www.mobilein.com/WhitePaperonUnitTesting.pdf
Realizing quality improvement through test driven development http://research.microsoft.com/en-us/projects/esm/nagappan_tdd.pdf
Actually quite a few here http://research.microsoft.com/en-us/projects/esm/
Efficiency and Effectiveness Measures To Help Guide the Business of Software Testing http://www.scribd.com/doc/2060575/1999-BenchmarkQA-Whitepaper-Efficiency-and-Effectiveness-Measures-To-Help-Guide-the-Business-of-Software-Testing
Effective Unit Test Minimization http://se.ethz.ch/people/leitner/publications/min_leitner_ase_2007.pdf
On the Effectiveness of the Test-First Approach to Programming
Test-Driven Development- Concepts, Taxonomy, and Future Direction
A structured experiment of test-driven development
Idea Paper v.02 - An Analysis of Test-Driven Development
Quite a few here http://collaboration.csc.ncsu.edu/laurie/publications.html#Testing
Have some more important things to read through? Put them up!
Very often people attempting to introduce eventual consistency into a system run into problems from the business side. A very large part of the reason for this is that they use the word consistent or consistency when talking with domain experts / business stakeholders. A quick look up of the word consistent helps show where the confusion comes in.
S: (n) consistency (logical coherence and accordance with the facts) "a rambling argument that lacked any consistency"
S: (n) consistency ((logic) an attribute of a logical system that is so constituted that none of the propositions deducible from the axioms contradict one another)
Business users hear “Consistency” and they tend to think it means that the data will be wrong. That the data will be incoherent and contradictory. This is not actually the case. Instead try using the word “stale” or “old”, in discussions when the word stale is used the business people tend to realize that it just means that someone could have changed the data, that they may not have the latest copy of it.
If you can get this point to be understood the discussion about introducing eventual consistency becomes a fairly simple one.
You can quantify mathematically the “cost” of eventual consistency, the cost can generally be defined by how many more concurrency problems are experienced. If no concurrency problem is experienced then the end user view of the data is essentially identical for most use cases. It is important to note though that although this is one way of thinking about cost there are other aspects including complexity for the development team etc.
Unless you are using pessimistic locking, all data is stale and optimistic concurrency failures are possible. There is some period of time that it takes to build the DTOs, put them on the wire and for the client to receive them and draw them on the screen. There is also a period of time for a change to come from the client back up to the server. In all of these periods of time the data could change, causing an optimistic concurrency failure. Let’s go with some numbers.
Get data from database – 10 ms
Build DTOs – 1 ms
Get data to client – 100 ms
Show on screen – 50ms
Send back to server – 100 ms
Server validation of request – 1 ms
So we can quickly add these together and know that any request the server is processing is operating on data that is 262 ms stale. Of course we have left out the largest thing: the user! The human brain has roughly a 190 ms reaction time to visual stimulus, and that’s just to realize the data has been shown on the screen; it is assumed the user is actually changing something as well. Do you measure the amount of time users take on various screens? Are you thinking it might be a good idea? Let’s go with a relatively quick time for the sake of discussion: a mean time of 60 seconds on a given screen. This gets added in as well, so the total is now 60.262 s.
Let’s imagine that we also tracked the number of optimistic concurrency failures. Hint: this is another value you should be tracking. We could relatively easily define an equation that represented the probability of a concurrency failure given the period of time. Most data sets will follow a normal distribution … Let’s assume that we get one (an example of where we may not would be if we had a periodical update at 62 seconds … thus P(t) approaches 1 at t = 60).
If we were to add in 5 seconds of eventual consistency assuming a normal distribution of changes we would end up with 65.262 seconds.
So we would have increased probability = P(65.262) – P(60.262).
Now for the last step. Let’s estimate the cost of an optimistic concurrency failure. It’s a user; they have to redo something because they failed. We can come up with a rough estimate of the cost. The cost to the business from eventual consistency can at this point be estimated. It’s important to note that for some transactions you may say “the value is high so we will never give a consistency error”: say for orders over $1000, where it is profitable to handle the problem later, even in a manual fashion, you accept the order no matter what. This is actually a very valuable insight to reach. You know how often the case is being run over a period of time, you estimated the cost of the failure, and you know the increased probability of a failure due to n seconds of eventual consistency.
Estimated Cost = Number of Times * Increased Probability * Cost per time
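As a worked example with purely hypothetical numbers (none of these are measurements): suppose the use case runs 20,000 times a month, tracking gives P(60.262) = 0.010 and P(65.262) = 0.011, and a failure costs roughly $2 of a user's time.

```latex
\begin{align*}
\Delta P &= P(65.262) - P(60.262) = 0.011 - 0.010 = 0.001 \\
\text{Estimated Cost} &= 20{,}000 \times 0.001 \times \$2 = \$40 \text{ per month}
\end{align*}
```

With numbers like these, the 5 seconds of eventual consistency would cost about $40 a month in redone work, which can then be weighed against what is gained technically.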
This is a simple and effective way to help make decisions with eventual consistency. What is the cost in terms of user productivity and experience and what will you gain technically by introducing it? How will it affect your availability and partitionability?
I hope also that people will see the value in tracking metrics like how long users stay on screens and the number of consistency errors reported … These metrics can help improve user experience drastically.
An event is something that has happened in the past.
All events should be represented as verbs in the past tense such as CustomerRelocated, CargoShipped, or InventoryLossageRecorded. For those who speak French, it should be Passé Composé, they are things that have completed in the past. There are interesting examples in the English language where one may be tempted to use nouns as opposed to verbs in the past tense, an example of this would be “Earthquake” or “Capsize”, as a congressman recently worried about Guam, but avoid the temptation to use names like this for Domain Events and stick with the usage of verbs in the past tense when creating Domain Events. These nouns tend to match up with “Transaction Objects” discussed later from Streamlined Object Modelling. It is imperative that events always be verbs in the past tense as they are part of the Ubiquitous Language.
Consider the differences in the Ubiquitous Language when we discuss the side effects from relocating a customer. The event makes the concept explicit, whereas previously the changes that would occur within an aggregate or between multiple aggregates were left as an implicit concept that needed to be explored and defined. As an example, in most systems the fact that a side effect occurred is simply found by a tool such as Hibernate or Entity Framework; if there is a change to the side effects of a use case, it is an implicit concept. The introduction of the event makes the concept explicit and part of the Ubiquitous Language; relocating a customer does not just change some stuff, relocating a customer produces a CustomerRelocatedEvent which is explicitly defined within the language.
In terms of code, an event is simply a data holding structure as can be seen in Listing 1.
Listing 1 A Simple Event
public class InventoryItemDeactivatedEvent {
    public readonly Guid InventoryItemId;
    public readonly string Comment;

    public InventoryItemDeactivatedEvent(Guid id, string comment) {
        InventoryItemId = id;
        Comment = comment;
    }
}
The code listing looks very similar to the code listing that was provided for a Command; the main differences exist in terms of significance and intent.
Other Definitions and Discussion
There is a related concept to a Domain Event in this description that is defined in Streamlined Object Modeling (SOM). Many people use the term “Domain Event” in SOM when discussing “The Event Principle”:
Model the event of people interacting at a place with a thing with a transaction object. Model a point-in-time interaction as a transaction with a single timestamp; model a time-interval interaction as a transaction with multiple timestamps. (Jill Nicola, 2001, p. 23)
Although many people use the terminology of a Domain Event to describe this concept, the term does not have the same definition as a Domain Event in the context of this document. SOM uses another term for the concept that better describes what the object is: a Transaction. The concept of a transaction object is an important one in a domain and absolutely deserves to have a name. An example of such a transaction might be a player swinging a bat. This is an action that occurred at a given point in time and should be modeled as such in the domain; it is not however the same as a Domain Event.
This also differs from Martin Fowler’s definition of what a Domain Event is.
Example: I go to Babur’s for a meal on Tuesday, and pay by credit card. This might be modeled as an event, whose type is “Make Purchase”, whose subject is my credit card, and whose occurred date is Tuesday. If Babur’s uses an old manual system and doesn’t transmit the transaction until Friday, then the noticed date would be Friday. (Fowler)
Further along
By funneling inputs of a system into streams of Domain Events you can keep a record of all the inputs to a system. This helps you to organize your processing logic, and also allows you to keep an audit log of the system (Fowler)
The astute reader may pick up on the fact that what Martin is actually describing here is a Command as was discussed previously when discussing Task Based UIs. The language of “Make Purchase” is wrong. A purchase was made. It makes far more sense to introduce a PurchaseMade event. Martin did actually make a purchase at the location, they did actually charge his credit card, and he likely ate and enjoyed his food. All of these things are in the past tense.
An example such as the sales example given here also tends to lead towards a secondary problem when applied within a system. The problem is that the domain may be responsible for filling in parts of the event. Consider a system where the sale is processed by the domain itself, how much is the sales tax? Often the domain would be calculating this as part of its calculations. This leads to a dual definition of the event, there is the event as is sent from the client without the sales tax then the domain would receive that and add in the sales tax, it causes the event to have multiple definitions, as well as forcing mutability on some attributes. One can bypass this by having dual events (one for the client with just what it provides and another for the domain including what it has enriched the event from the client with) but this is basically the command event model and the linguistic problems still exist.
A further example of the linguistic problems involved can be shown in error conditions. How should the domain handle the fact that a client told it to do something that it cannot? This condition can exist for many reasons but let’s imagine a simple one of the client simply not having enough information to be able to source the event in a known correct way. Linguistically the command/event separation makes much more sense here as the command arrives in the imperative “Place Sale” while the event is in the past tense “SaleCompleted”. It is quite natural for the domain to reject a client attempting to “Place a sale”, it is not natural for the domain to tell the client that something in the past tense no longer happened. Consider the discussion with a domain expert, does the domain have a time machine? Parallel realities are far too complex and costly to model in most business systems.
These are exactly the problems that have led to the separation of the concepts of Commands and Events. This separation makes the language much clearer and, although subtle, it tends to lead developers towards a clearer understanding of context based solely on the language being used. Anytime one ends up with dual definitions of a concept there is a weight placed on the developer to recognize and distinguish context; this weight can translate into both ramp up time for new developers on a project and another thing a member of the team needs to “remember”. Anytime a team member needs to remember something to distinguish context there is a higher probability that it will be overlooked or mistaken for another context. Being explicit in the language and avoiding dual definitions helps make things clearer for domain experts, developers, and anyone who may be consuming the API.
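The sale example can be sketched roughly as follows. The message shapes, the SalesDomain class, the rejection rule and the flat tax rate are all hypothetical and only there to show the split: the command carries what the client knows and can be rejected, the event carries what actually happened including what the domain calculated.

```csharp
using System;

// Command: imperative, sent by the client, may be rejected.
public record PlaceSale(Guid SaleId, decimal Amount);

// Event: past tense, produced by the domain, immutable, includes the calculated tax.
public record SaleCompleted(Guid SaleId, decimal Amount, decimal SalesTax);

public class SaleRejectedException : Exception {
    public SaleRejectedException(string reason) : base(reason) { }
}

public static class SalesDomain {
    // Made-up flat rate, for illustration only.
    private const decimal TaxRate = 0.25m;

    public static SaleCompleted Handle(PlaceSale command) {
        // Rejecting a request to "place a sale" reads naturally;
        // nothing in the past tense has to be taken back.
        if (command.Amount <= 0m)
            throw new SaleRejectedException("Sale amount must be positive.");

        // The domain fills in the tax, so the event has a single definition.
        return new SaleCompleted(command.SaleId, command.Amount, command.Amount * TaxRate);
    }
}
```

Because the tax only ever appears on SaleCompleted, neither message needs mutable attributes or a second definition.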
Works Cited
Fowler, M. (n.d.). Domain Event. Retrieved from EAA Dev: http://martinfowler.com/eaaDev/DomainEvent.html
Jill Nicola, M. M. (2001). Streamlined Object Modeling. Prentice Hall.