

How To Build a Benchmark for Your Models

I’ve been working as a data science consultant for the past three years, and I’ve had the opportunity to work on multiple projects across various industries. Yet, I noticed one common denominator among most of the clients I worked with:

They rarely have a clear idea of the project objective.

This is one of the main obstacles data scientists face, especially now that Gen AI is taking over every domain.

But let’s suppose that after some back and forth, the objective becomes clear. We managed to pin down a specific question to answer. For example:

I want to classify my customers into two groups according to their probability of churning: “high likelihood to churn” and “low likelihood to churn”

Well, now what? Easy, let’s start building some models!

Wrong!

If having a clear objective is rare, having a reliable benchmark is even rarer.

In my opinion, one of the most important steps in delivering a data science project is defining and agreeing on a set of benchmarks with the client.

In this blog post, I’ll explain:

What a benchmark is,

Why it is important to have a benchmark,

How I would build one using an example scenario and

Some potential drawbacks to keep in mind

What is a benchmark?

A benchmark is a standardized way to evaluate the performance of a model. It provides a reference point against which new models can be compared.

A benchmark needs two key components to be considered complete:

A set of metrics to evaluate the performance

A set of simple models to use as baselines

The concept at its core is simple: every time I develop a new model I compare it against both previous versions and the baseline models. This ensures improvements are real and tracked.

It is essential to understand that this baseline shouldn’t be model or dataset-specific, but rather business-case-specific. It should be a general benchmark for a given business case.

If I encounter a new dataset, with the same business objective, this benchmark should be a reliable reference point.

Why building a benchmark is important

Now that we’ve defined what a benchmark is, let’s dive into why I believe it’s worth spending an extra project week on the development of a strong benchmark.

Without a Benchmark you’re aiming for perfection — If you are working without a clear reference point, any result loses meaning. “My model has an MAE of 30,000.” Is that good? IDK! Maybe with a simple mean you would get an MAE of 25,000. By comparing your model to a baseline, you can measure both performance and improvement.

Improves Communication with Clients — Clients and business teams might not immediately understand the standard output of a model. However, by engaging them with simple baselines from the start, it becomes easier to demonstrate improvements later. In many cases, benchmarks come directly from the business in different shapes or forms.

Helps in Model Selection — A benchmark gives a starting point to compare multiple models fairly. Without it, you might waste time testing models that aren’t worth considering.

Model Drift Detection and Monitoring — Models can degrade over time. By having a benchmark you might be able to intercept drifts early by comparing new model outputs against past benchmarks and baselines.

Consistency Between Different Datasets — Datasets evolve. By having a fixed set of metrics and models you ensure that performance comparisons remain valid over time.

With a clear benchmark, every step in the model development will provide immediate feedback, making the whole process more intentional and data-driven.

How I would build a benchmark

I hope I’ve convinced you of the importance of having a benchmark. Now, let’s actually build one.

Let’s start from the business question we presented at the very beginning of this blog post:

I want to classify my customers into two groups according to their probability of churning: “high likelihood to churn” and “low likelihood to churn”

For simplicity, I’ll assume no additional business constraints, but in real-world scenarios, constraints often exist.

For this example, I am using this dataset (CC0: Public Domain). The data contains some attributes from a company’s customer base (e.g., age, sex, number of products, …) along with their churn status.

Now that we have something to work on let’s build the benchmark:

1. Defining the metrics

We are dealing with a churn use case; in particular, this is a binary classification problem. Thus, the main metrics that we could use are:

Precision — Percentage of correctly predicted churners among all predicted churners

Recall — Percentage of actual churners correctly identified

F1 score — Balances precision and recall

True Positives, False Positives, True Negatives, and False Negatives

These are some of the “simple” metrics that could be used to evaluate the output of a model.
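The original post doesn’t show how the count-based metrics are implemented, but since every metric is later passed to the benchmark runner as a plain function, a minimal sketch could look like the following. This is my own illustration, assuming 0/1 NumPy label arrays and a shared (y_true, y_pred) signature:

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score  # standard metrics, used later when running the benchmark

# Count-based helpers with the same (y_true, y_pred) signature as the sklearn
# metrics, so they can all be passed to the benchmark runner defined later.
def tp(y_true, y_pred):
    return np.sum(np.logical_and(y_pred == 1, y_true == 1))

def tn(y_true, y_pred):
    return np.sum(np.logical_and(y_pred == 0, y_true == 0))

def fp(y_true, y_pred):
    return np.sum(np.logical_and(y_pred == 1, y_true == 0))

def fn(y_true, y_pred):
    return np.sum(np.logical_and(y_pred == 0, y_true == 1))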

However, this is not an exhaustive list, and standard metrics aren’t always enough. In many use cases, it might be useful to build custom metrics.

Let’s assume that in our business case the customers labeled as “high likelihood to churn” are offered a discount. This creates:

A cost ($250) when offering the discount to a non-churning customer

A profit ($1,000) when retaining a churning customer

Following this definition, we can build a custom metric that will be crucial in our scenario:

# Defining the business case-specific reference metric
def financial_gain(y_true, y_pred):
    # Cost of discounts offered to customers who would not have churned (false positives)
    loss_from_fp = np.sum(np.logical_and(y_pred == 1, y_true == 0)) * 250
    # Profit from retaining customers who would have churned (true positives)
    gain_from_tp = np.sum(np.logical_and(y_pred == 1, y_true == 1)) * 1000
    return gain_from_tp - loss_from_fp
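As a quick sanity check (my own toy example, assuming 0/1 NumPy arrays), one true positive and one false positive should net $750:

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0])
print(financial_gain(y_true, y_pred))  # 1 TP * 1000 - 1 FP * 250 = 750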

When you are building business-driven metrics, these are usually the most relevant ones. Such metrics could take any shape or form: financial goals, minimum requirements, percentage of coverage, and more.

2. Defining the benchmarks

Now that we’ve defined our metrics, we can define a set of baseline models to be used as a reference.

In this phase, you should define a list of simple-to-implement models in their simplest possible setup. There is no reason at this stage to spend time and resources on the optimization of these models; my mindset is:

If I had 15 minutes, how would I implement this model?

In later phases, you can add more baseline models as the project proceeds.

In this case, I will use the following models:

Random Model — Assigns labels randomly

Majority Model — Always predicts the most frequent class

Simple XGB

Simple KNN

import numpy as np
import xgboost as xgb
from sklearn.neighbors import KNeighborsClassifier

class BinaryMean():
    # Random model: draws labels with probability equal to the training churn rate
    @staticmethod
    def run_benchmark(df_train, df_test):
        np.random.seed(21)
        return np.random.choice(a=[1, 0], size=len(df_test), p=[df_train['y'].mean(), 1 - df_train['y'].mean()])

class SimpleXbg():
    # XGBoost classifier with default settings on the numeric features
    @staticmethod
    def run_benchmark(df_train, df_test):
        model = xgb.XGBClassifier()
        model.fit(df_train.select_dtypes(include=np.number).drop(columns='y'), df_train['y'])
        return model.predict(df_test.select_dtypes(include=np.number).drop(columns='y'))

class MajorityClass():
    # Always predicts the most frequent class in the training set
    @staticmethod
    def run_benchmark(df_train, df_test):
        majority_class = df_train['y'].mode()[0]
        return np.full(len(df_test), majority_class)

class SimpleKNN():
    # k-nearest neighbours with default settings on the numeric features
    @staticmethod
    def run_benchmark(df_train, df_test):
        model = KNeighborsClassifier()
        model.fit(df_train.select_dtypes(include=np.number).drop(columns='y'), df_train['y'])
        return model.predict(df_test.select_dtypes(include=np.number).drop(columns='y'))

Again, as in the case of the metrics, we can build custom benchmarks.

Let’s assume that in our business case the marketing team contacts every client who is:

Over 50 y/o, and

No longer an active member

Following this rule we can build this model:

# Defining the business case-specific benchmark
class BusinessBenchmark():
    # Rule-based baseline: flag inactive customers aged 50+ as likely churners
    @staticmethod
    def run_benchmark(df_train, df_test):
        df = df_test.copy()
        df.loc[:, 'y_hat'] = 0
        df.loc[(df['IsActiveMember'] == 0) & (df['Age'] >= 50), 'y_hat'] = 1
        return df['y_hat']

Running the benchmark

To run the benchmark I will use the following class. The entry point is the method compare_pred_with_benchmark(), which, given a prediction, runs all the benchmark models and calculates all the metrics.

import numpy as np

class ChurnBinaryBenchmark():
    def __init__(
            self,
            metrics = [],
            benchmark_models = [],
        ):
        self.metrics = metrics
        self.benchmark_models = benchmark_models

    def compare_pred_with_benchmark(
            self,
            df_train,
            df_test,
            my_predictions,
        ):
        # Metrics for the model's own predictions
        output_metrics = {
            'Prediction': self._calculate_metrics(df_test['y'], my_predictions)
        }
        dct_benchmarks = {}

        # Run every baseline model and score it with the same metrics
        for model in self.benchmark_models:
            dct_benchmarks[model.__name__] = model.run_benchmark(df_train=df_train, df_test=df_test)
            output_metrics[f'Benchmark - {model.__name__}'] = self._calculate_metrics(df_test['y'], dct_benchmarks[model.__name__])

        return output_metrics

    def _calculate_metrics(self, y_true, y_pred):
        return {getattr(func, '__name__', 'Unknown'): func(y_true=y_true, y_pred=y_pred) for func in self.metrics}

Now all we need is a prediction. For this example, I did some quick feature engineering and hyperparameter tuning.
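The post doesn’t include this step, so here is a minimal sketch of one way it could be done, assuming the df_train/df_test DataFrames used above. Restricting to numeric features and the specific XGBoost parameter grid are my own illustrative choices, not the author’s actual setup:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Simple "feature engineering": keep the numeric columns, as the baselines do
X_train = df_train.select_dtypes(include=np.number).drop(columns='y')
X_test = df_test.select_dtypes(include=np.number).drop(columns='y')

# Small, illustrative hyperparameter grid
param_grid = {'max_depth': [3, 5], 'n_estimators': [100, 300], 'learning_rate': [0.05, 0.1]}
search = GridSearchCV(xgb.XGBClassifier(), param_grid, scoring='f1', cv=3)
search.fit(X_train, df_train['y'])

preds = search.best_estimator_.predict(X_test)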

The last step is just to run the benchmark:

import pandas as pd

binary_benchmark = ChurnBinaryBenchmark(
    metrics=[f1_score, precision_score, recall_score, tp, tn, fp, fn, financial_gain],
    benchmark_models=[BinaryMean, SimpleXbg, MajorityClass, SimpleKNN, BusinessBenchmark],
)

res = binary_benchmark.compare_pred_with_benchmark(
    df_train=df_train,
    df_test=df_test,
    my_predictions=preds,
)

pd.DataFrame(res)

Benchmark metrics comparison | Image by Author

This generates a comparison table of all models across all metrics. Using this table, it is possible to draw concrete conclusions on the model’s predictions and make informed decisions on the following steps of the process.

Some drawbacks

As we’ve seen there are plenty of reasons why it is useful to have a benchmark. However, even though benchmarks are incredibly useful, there are some pitfalls to watch out for:

Non-Informative Benchmark — When the metrics or models are poorly defined the marginal impact of having a benchmark decreases. Always define meaningful baselines.

Misinterpretation by Stakeholders — Communication with the client is essential; state clearly what each metric measures. The best model might not be the best on every defined metric.

Overfitting to the Benchmark — You might end up creating features that are too specific: they beat the benchmark but don’t generalize well to new data. Don’t focus on beating the benchmark; focus on building the best possible solution to the problem.

Change of Objective — The defined objectives might change due to miscommunication or changes in plans. Keep your benchmark flexible so it can adapt when needed.

Final thoughts

Benchmarks provide clarity, ensure improvements are measurable, and create a shared reference point between data scientists and clients. They help avoid the trap of assuming a model is performing well without proof and ensure that every iteration brings real value.

They also act as a communication tool, making it easier to explain progress to clients. Instead of just presenting numbers, you can show clear comparisons that highlight improvements.

Here you can find a notebook with a full implementation from this blog post.