Considerations for the project scope of a Machine Learning project

Let’s start with the basic questions:

  • What is the business problem to be solved?
  • What is the situation AS-IS?
  • What are the current pain points you are facing?
  • How are you addressing these pain points today?
  • Have the causes of the problem been identified?
  • What is the impact of the problem?
  • What is the desired situation TO-BE?
  • What are the expectations for the solution to be built?
  • Do we have enough data?
  • Has any analysis already been done on that data?
  • How difficult will it be to access the data?
  • What is your tolerance for errors?

Data: the ability to build ML models depends on the quantity, quality and availability of data, so questions about data are where you are going to invest the most time.


I usually define the project scope using the “POLDAT” framework (Processes, Organizations, Location, Data, Applications, Technology), determining for each dimension what is in scope and what is out of scope:

[Table: standard “project scope” table with two columns, “In scope” and “Out of scope”]

Doing this lets you ask many questions, pin down many variables in detail and make the approval of the project scope easier.

As every project manager knows, scope creep is a hidden enemy we all want to avoid.

Pay attention to:

Processes: Define the business processes that will consume the deliverables of the ML solution (decisions, reports, forecasts, etc.).

Organizations: Determine the organization, roles and people benefiting from or consuming the ML solution.

Location: This becomes relevant when the ML solution needs to be highly available, or has to reach specific locations where communications are not always available. In manufacturing this can be quite complex.

Data: The data section is the most complicated one to define. You will probably receive a more or less clear picture of the data sources and architecture, but it is unusual to get clean or accurate data. As a result, the “data preparation” part of the project requires more time and more participation from other stakeholders.

Data quality: Try to define explicit criteria for the quality of the data you are going to consume. I have seen situations where an ML project started by assuming that data from a given source had a high degree of quality, and that assumption turned out not to hold, which led to the cancellation of the project.
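To make this more concrete, here is a minimal Python sketch of how such quality criteria could be written down and checked against a data sample before the scope is approved. Everything in it is an assumption for illustration only: the class name QualityCriteria, the function names, the thresholds and the sample fields are hypothetical, and the real criteria should be agreed with the data owners.

```python
from dataclasses import dataclass


@dataclass
class QualityCriteria:
    """Hypothetical acceptance thresholds agreed in the project scope."""
    max_missing_ratio: float = 0.05    # at most 5% missing values per field
    max_duplicate_ratio: float = 0.01  # at most 1% duplicated records
    min_rows: int = 1000               # minimum number of records


def missing_ratio(rows, field):
    """Share of rows where `field` is absent, None or an empty string."""
    if not rows:
        return 1.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def duplicate_ratio(rows, key_fields):
    """Share of rows whose key repeats the key of an earlier row."""
    if not rows:
        return 0.0
    keys = [tuple(row.get(f) for f in key_fields) for row in rows]
    return 1 - len(set(keys)) / len(keys)


def check_quality(rows, fields, key_fields, criteria):
    """Return a list of violations; an empty list means the criteria are met."""
    violations = []
    if len(rows) < criteria.min_rows:
        violations.append(
            f"only {len(rows)} rows, expected at least {criteria.min_rows}"
        )
    for field in fields:
        ratio = missing_ratio(rows, field)
        if ratio > criteria.max_missing_ratio:
            violations.append(f"{field}: {ratio:.0%} missing")
    dup = duplicate_ratio(rows, key_fields)
    if dup > criteria.max_duplicate_ratio:
        violations.append(f"{dup:.0%} duplicated records")
    return violations


# Illustrative sample: sensor readings with one missing value and one repeated id.
sample = [
    {"id": 1, "temp": 20.0},
    {"id": 2, "temp": None},
    {"id": 2, "temp": 21.0},
    {"id": 3, "temp": 22.0},
]
criteria = QualityCriteria(max_missing_ratio=0.2, max_duplicate_ratio=0.1, min_rows=3)
for problem in check_quality(sample, ["id", "temp"], ["id"], criteria):
    print(problem)
```

Running a check like this on a sample from each source during scoping turns the quality assumption into something you can verify early, instead of discovering the problem halfway through the project.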

Applications: You have to define which application will request decisions from the ML solution, or consume data or results from it. Related to this, define which changes that application needs. How those changes will be approached is something to review later.

Technology: In many ML projects the main technology is clear. What is not always clear is the technology, or the availability, of the data integration solution that will move data from current operations to the ML solution. Here there are not just technology hurdles, but also security, availability and operations questions that need to be defined.

Is there anything else that should be taken into account?
