Post-mortem incident analysis

After a priority 1 (P1) incident in production, we conduct a post-mortem analysis.

Some best practices (ground rules)

We try to look at the issue holistically, as a situation from which we can learn more than expected.

  • Seeking a single root cause for an outage → Addressing the multiple systemic factors that contributed to the outage
  • Prevention (only) → Breaking down TTR (time to resolve) into its components
  • Blameful post-mortems → Blameless post-mortems
  • Surface-level post-incident reviews → Strategic-level understanding and improvement

Mean Time to Recover (MTTR) Metric

MTTR = MTTD (mean time to detect) + MTTI (mean time to isolate) + MTTF (mean time to fix)
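To make the formula concrete, here is a minimal Python sketch. It assumes each incident's detect, isolate and fix phases are recorded as durations in minutes; the `Incident` class, the `mttr_breakdown` function and the sample numbers are hypothetical, for illustration only. Because the mean of a sum equals the sum of the means, averaging each phase and adding the results gives the same MTTR as averaging the per-incident recovery times.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    """Hypothetical record of one incident's phase durations, in minutes."""
    detect: float   # from start of impact until the issue is detected
    isolate: float  # from detection until the faulty component is isolated
    fix: float      # from isolation until service is restored

    @property
    def ttr(self) -> float:
        # Time to recover for this single incident
        return self.detect + self.isolate + self.fix


def mttr_breakdown(incidents: list[Incident]) -> dict[str, float]:
    """Return MTTD, MTTI, MTTF and MTTR averaged over the given incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    breakdown = {
        "MTTD": mean(i.detect for i in incidents),
        "MTTI": mean(i.isolate for i in incidents),
        "MTTF": mean(i.fix for i in incidents),
    }
    # Mean of the per-incident totals; equals MTTD + MTTI + MTTF
    breakdown["MTTR"] = mean(i.ttr for i in incidents)
    return breakdown


# Hypothetical sample data: three P1 incidents
incidents = [Incident(10, 25, 40), Incident(5, 15, 20), Incident(30, 50, 90)]
print(mttr_breakdown(incidents))
```

Keeping the three phases separate is what makes it possible to break down TTR into its components: it shows whether the time is going into detection, isolation or the fix itself.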

Spotify Engineering Culture

These two videos helped me understand how agility can be implemented in complex software development environments.

They share a lot of knowledge about agility, and many of the topics are related to SAFe.

Video #1 is more technically oriented: it covers agility, squads, continuous deployment, how teams are organized, events…

Video #2 is more culturally oriented: it covers the relationship with failure, innovation, hacking time, hacking events, innovation vs. predictability, and chaos vs. bureaucracy.

Video #2 also contains a really interesting explanation of how to deal with chaos and bureaucracy (starting at minute 9:00), which is represented by this picture:


SAFe (Scaled Agile Framework)

Time to learn, time to read: I am now working on a deal where the customer is implementing their own version of SAFe.

There is a lot of documentation and knowledge about all of this, and learning about it has been very interesting.

Something I do not like is that it is a closed community. I understand that organizations need ways to finance themselves, but imposing paid training is an obstacle to the spread of these practices.

Organizational debt, DevOps and Open Source

A debt is a debt: it is something you take on in order to gain other capabilities or value in a market, product, behavior… Every company has this type of debt, with more or less exposure.

This debt can be an excess of organization (typically in a big company) or a lack of organization (common in a start-up). Companies take on debt to be more competitive: faster execution, lower price per product,…

Organizational Debt: The interest companies pay when their structure and policies stay fixed and/or accumulate as the world changes.


The worst problem with organizational debt is that the people running the company do not know it exists. The second problem is that they know about it and kick the can down the road, increasing the debt. The third big problem is a lack of action. If any of these apply to a company, it is basically because it is run by incompetent people, and they will disappear sooner or later.

Apart from that, the typical “debts” companies acquire are:

  1. Obsolescence: use of old-fashioned processes and methodologies relative to market conditions.
  2. Accumulation: too many roles, too many review processes, too many people… again relative to market conditions.
  3. Bad behavior of leaders and employees: no one trusts anyone, there is fear, and people act defensively.

For instance, DevOps is a culture supported by automation tools; if it is well implemented, it gives us the opportunity to remove both organizational and technical debt:

  • Organizational debt: because some processes become more agile. The challenge here is that the organization has to adopt these processes and be able to adapt to them (this is the hard side of DevOps).
  • Technical debt: because automation tools replace obsolete tools and manual scripts that slow down execution.

To me, DevOps has more positive impact on organizational debt than on technical debt.

Open Source is also contributing significantly to debt reduction, especially technical debt. Other organizations build solutions that you can use in your organization at low cost. If your organization is agile enough to adopt them, you can reduce your exposure to these technology gaps. If you do not have this ability, you will have to buy a service/product (increasing financial debt) or, even worse, build it yourself.

MVC pattern + DevOps + agile = SaaS developer

This equation is simple: these are transferable skills that can be used across different platforms.

MVC pattern + DevOps + agile = SaaS developer

In the market, companies working with SalesForce, ServiceNow, Workday… are all looking for developers. You cannot learn how to develop ServiceNow code through traditional training channels: they just don’t exist.

So the skills that these companies are looking for are:

  • Developers with a thorough understanding of the MVC pattern.
  • Developers who understand the cultural aspects of DevOps, the principles of automation in the SDLC, and software quality.
  • Developers with a thorough understanding of some of the agile methodologies.

By hiring for these skills, you minimize the learning curve for becoming a developer on your own platform. Such developers can become experts in just a few months.

For the moment, the only academy that certifies these developers is the SalesForce academy. So many of its certified developers are being hired by other companies to work on ServiceNow. It’s simple: they only have to get familiar with the library of objects.

I have just noticed that AWS offers its own DevOps training, classifying DevOps as a methodology, which is wrong!

Waterfall vs. Agile

And vice versa, because in some software areas Agile is becoming the primary methodology for building applications and software components: release cycles keep getting shorter, and short delivery cycles are what the market demands.


They are different things in themselves, so we could write a thousand lists about what is good in one and better than the other. To me both are required, and knowing them well is a necessary skill for everyone in the industry. This is key to the maturity of organizations and their capacity to deliver to standard. Some years ago a software developer was required to understand all the patterns; now they also have to be skilled in Scrum, agile…

For IT start-ups, agile is in their DNA. After a week attending start-up events, I noticed they do not pay any attention to agile: they assume it is simply the way to execute, as basic as breathing.

RAD or Waterfall?

Dynamic Systems Development Method (DSDM) is a method based on the assumption that nothing is perfect the first time, but that 80% of the solution can be produced in 20% of the time it would take to produce the whole solution.

It is highly iterative: all steps can be revisited, so the current step needs to be completed only enough to move on to the next one. It builds on the best practices of traditional waterfall development approaches and includes the controlled use of new techniques, without allowing RAD to revert to hacking out code and documenting it on the back of a cigarette packet!

The DSDM Consortium, launched in January 1994, is creating an industry-standard, public domain method for Rapid Application Development (RAD). It has over 50 members including corporate users such as: British Airways, Allied Domecq, American Express and J P Morgan Investment Management and software houses/tools vendors such as: Logica, Data Sciences, Cognos, Sapiens.
In the early 1990s a new term ‘Rapid Application Development’ (or RAD) was launched upon an unsuspecting IT industry. RAD was intended to be different from the classical, sequential (or ‘Waterfall’) methods for application development.
RAD grew as a movement in a very unstructured way; there was no commonly agreed definition of a RAD process, and many different vendors and consultants came up with their own interpretation and approach.
  • By 1993 there was momentum in the marketplace with a growing number of tools vendors developing or repositioning their products to meet a growing demand from their customers for RAD technology.
  • In 1994 the DSDM Consortium was born with over 50 organisations interested in improving the software development process.
  • In 1995 version 2 was published.
  • In 1997 version 3 was published.


This question was raised in 1990, but apparently it has only become popular in the last few years. 🙂

Delays due to lack of alignment with project objectives

Some months ago we were reviewing a project that was suffering continuous delays, for a variety of reasons, and was not delivering builds as expected.

I sat with the PM to review the schedule and the different modules to deliver.

We threw the schedule in the trash and started a new one. We don’t need to deliver to the client every month, but we do need to ensure delivery of the final product at the end of October, so we defined one sprint per month.

We defined the product backlog in terms of functional modules, not following the architectural structure that was supposed to be already built. We fitted the modules into the sprints using the original hour estimates that the SMEs had provided for the project: the project absorbs the existing delay, but no more; any new delay has to be absorbed by the team.

We also left some free time to complete the common part of the application, so as not to compromise the quality of the architecture, and assigned a single person as responsible for these activities.
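The planning described above can be sketched roughly in Python. This is a minimal illustration, not the exact method we used: it assumes a fixed capacity of hours per monthly sprint, a fixed number of hours reserved each sprint for the common architecture work, and a backlog already in priority order. The `Module` class, `plan_sprints` function, module names and hour figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical functional module from the product backlog."""
    name: str
    estimated_hours: int  # original SME estimate


def plan_sprints(modules: list[Module], hours_per_sprint: int,
                 reserve_hours: int) -> list[list[str]]:
    """Assign modules, in backlog order, to consecutive monthly sprints.

    reserve_hours are kept free in every sprint for the common
    architecture work. A sprint is closed as soon as the next module
    would exceed the usable capacity.
    """
    usable = hours_per_sprint - reserve_hours
    sprints: list[list[str]] = []
    current: list[str] = []
    used = 0
    for module in modules:
        if module.estimated_hours > usable:
            raise ValueError(f"{module.name} does not fit in one sprint; split it")
        if used + module.estimated_hours > usable:
            sprints.append(current)  # close the current sprint
            current, used = [], 0
        current.append(module.name)
        used += module.estimated_hours
    if current:
        sprints.append(current)
    return sprints


# Hypothetical backlog, in priority order
backlog = [
    Module("Login", 120),
    Module("Catalog", 200),
    Module("Orders", 150),
    Module("Billing", 180),
    Module("Reports", 90),
]
print(plan_sprints(backlog, hours_per_sprint=400, reserve_hours=40))
```

The greedy, in-order assignment is a deliberate choice: it keeps the backlog priorities intact instead of reshuffling modules to pack sprints more tightly, and a module too big for one sprint is flagged so it can be split.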

I asked the junior PM how he was dispatching the work to be done and gathering the work done, and we changed the way he was doing it: it was not being done properly, and he was not getting a commitment to delivery from his team. No problem, man: we established a weekly meeting to review the coming week and a daily meeting to review the work for the current week (review problems, consider minor changes, share progress…).

In the first meeting we presented the new approach to executing the project, explained all the aspects mentioned above, and assigned an owner to each module. The main objective: make each owner responsible for delivering their module.

2 months later…

We have a small new delay on the backlog, but it’s something we will recover soon. The important thing is that people are aligned with the product to be delivered and the team has a real development velocity.

The junior PM is the happiest person in the world: he has seen the efficiency of his work increase.

Does agile work? I don’t know. What I know is that people need to be aligned with the objectives, and that you need to give the team a realistic way to deliver, based on the real world.

If you don’t ask a person to move a mountain of sand in one go, why do you ask it of a software solution?