๐ŸƒExperiment 3 - Customized Optimization for A Regression Problem

It was a Sunday in early autumn, and the air was filled with a delightful mix of fragrant scents. Lady H. was enjoying afternoon tea on her balcony as her assistants arrived one by one to discuss garden matters, each bringing a fresh aroma. Everything seemed ordinary until Diana walked in.

"Diana! Youโ€™re wearing perfume today!? I thought youโ€™d never do that!" Lady H. exclaimed, surprised, as Diana was known for her sensitive nose and rarely wore perfume. With a broad smile and eyes sparkling, Diana replied, "The Garden Market gave us a 99% discount on the latest summer fragrance. Iโ€™ve never seen prices that low, so I decided to try itโ€”and it brings me so much joy!"

"Oh, Iโ€™m glad to hear that! ... Wait a minute, why is the Garden Market giving away perfume?" Lady H. asked, puzzled. Diana explained, "Remember how we harvested three times more summer flowers this year? The Garden Market produced far more perfume than they could sell, so they had to have a clearance sale." Lady H. thought for a moment, then said, "I see. Hmm... we should do some sales forecasting before manufacturing next time."

Forecasting the Garden Market's sales is a regression problem 😉.

Baseline Forecast

To start, Lady H. used the default LGBM to perform a baseline forecast, achieving an R2 score of 0.884 in 34.3 seconds. This is a solid result; as previously noted, the closer the R2 score is to 1 on the test data, the better.
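For reference, a minimal sketch of what such a default-LGBM baseline could look like (the split variables X_train, X_test, y_train, y_test and the random seed are assumptions, not Lady H.'s exact code):

```python
import lightgbm as lgb
from sklearn.metrics import r2_score

# Default LGBM regressor as the baseline, no hyperparameter tuning
baseline = lgb.LGBMRegressor(random_state=42)
baseline.fit(X_train, y_train)

# The closer the R2 score on the test data is to 1, the better
print("Baseline R2:", r2_score(y_test, baseline.predict(X_test)))
```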

🌻 Look into Sales Baseline details >>

FLAML Customized

In this experiment, Lady H. wanted to test not only a customized learner but also a customized objective function. The learner is still LGBM.

The plot below shows the distributions of the target (Sales) in the training and testing data.
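A quick sketch of how such a distribution comparison could be drawn, assuming the raw data lives in train_df and test_df DataFrames with a Sales column:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Overlay the target distributions of the training vs. testing data
fig, ax = plt.subplots(figsize=(8, 4))
sns.kdeplot(x=train_df["Sales"], label="train", ax=ax)
sns.kdeplot(x=test_df["Sales"], label="test", ax=ax)
ax.set_xlabel("Sales")
ax.legend()
plt.show()
```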

The two datasets have nearly identical distributions, and the shape resembles a combination of two normal distributions, which suggests the target could be modeled with a Gaussian Mixture Model (GMM). Interestingly, however, Lady H. achieved better performance by using "poisson" as the objective, which assumes the target follows a Poisson distribution.
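A rough sketch of how a custom FLAML learner with a fixed "poisson" objective could be set up (the class and learner names, the import path, and the 300-second budget are assumptions based on a recent FLAML version, not Lady H.'s exact code):

```python
from flaml import AutoML
from flaml.automl.model import LGBMEstimator  # older FLAML versions: flaml.model
from sklearn.metrics import r2_score

class PoissonLGBM(LGBMEstimator):
    """LGBM learner whose objective is fixed to 'poisson'."""
    def __init__(self, task="regression", **config):
        super().__init__(task=task, objective="poisson", **config)

automl = AutoML()
automl.add_learner(learner_name="poisson_lgbm", learner_class=PoissonLGBM)
automl.fit(
    X_train, y_train,
    task="regression",
    metric="r2",
    estimator_list=["poisson_lgbm"],
    time_budget=300,  # seconds
)
print("Test R2:", r2_score(y_test, automl.predict(X_test)))
```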

With the "poisson" objective, she achieved a 0.982 R2 testing score in 300 seconds, an improvement over the baseline result.

After that, she wanted to try out a self-written objective function. The challenge is that FLAML relies on the estimators' built-in configuration, while a customized LGBM objective function requires users to specify grad and hess 🤔

  • grad is the value of the first-order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.

  • hess is the value of the second-order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.

If you have any tips on how to compute grad and hess well, feel free to share them here; Lady H. will be more than happy to learn. In any case, she found the formulas for some loss functions, such as the "fair loss" shown in the code below.
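As an illustration, here is a minimal sketch of a fair-loss objective in the form LightGBM's sklearn API expects (the function name and the constant c=1.0 are assumptions; grad and hess follow the standard fair-loss formulas):

```python
import numpy as np

def fair_loss_objective(y_true, y_pred, c=1.0):
    """Fair loss: c^2 * (|x|/c - log(1 + |x|/c)), with residual x = y_pred - y_true.

    Returns the first-order (grad) and second-order (hess) derivatives
    of the loss with respect to y_pred, as LightGBM expects.
    """
    x = y_pred - y_true
    grad = c * x / (np.abs(x) + c)
    hess = c ** 2 / (np.abs(x) + c) ** 2
    return grad, hess

# With the sklearn API, the custom objective can be passed directly, e.g.:
# model = lgb.LGBMRegressor(objective=fair_loss_objective)
```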

The customized objective function achieved a 0.964 R2 testing score in 300 seconds, a bit lower than the "poisson" objective.

🌻 Look into FLAML experiment details >>

Optuna Customized

The objective function for Optuna is similar to the one in the previous Optuna experiment. Lady H. aims to answer the question left unresolved there: whether the Optuna pruner performs better without using cross-validation. As shown in the code below, users need to call LightGBMPruningCallback() to create the pruning callback used with LGBM:
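A sketch of what such an objective might look like without cross-validation (the search space, number of trials, pruner choice, and the X_train/X_valid hold-out split are assumptions):

```python
import lightgbm as lgb
import optuna
from optuna.integration import LightGBMPruningCallback
from sklearn.metrics import r2_score

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    # Report the validation l2 metric after each boosting round so that
    # unpromising trials can be pruned early
    pruning_callback = LightGBMPruningCallback(trial, "l2")
    model = lgb.LGBMRegressor(n_estimators=500, **params)
    model.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],  # a single hold-out split, no cross-validation
        eval_metric="l2",
        callbacks=[pruning_callback],
    )
    return r2_score(y_valid, model.predict(X_valid))

study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=100)
```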

🌻 Look into Optuna experiment details >>

Without applying cross-validation, the overall time cost is definitely reduced, and the output shows that pruning was taking effect!

Table 1.6 summarizes the performance in this experiment:

Looking at all these experiments, FLAML appears to be better overall. However, that doesn't mean Optuna is worse in every aspect.

In the code, you may have noticed that Lady H. generated some visualizations, which provide more insight into Optuna's hyperparameter tuning. For example (the calls that generate these plots are sketched after the list):

  • Parameter importance plot shows an overall view of the parameters' impact on the model's validation performance:

  • Slice plot shows the relationship between each hyperparameter, the objective value, and the number of trials:

  • Contour plot looks into each pair of hyperparameters:

  • Intermediate plot can interactively show you the intermediate values of each trial:
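A minimal sketch of the Optuna calls behind these plots (assuming the study object from the tuning run above and the plotly-based optuna.visualization module):

```python
import optuna.visualization as vis

vis.plot_param_importances(study).show()    # parameter importance plot
vis.plot_slice(study).show()                # slice plot
vis.plot_contour(study).show()              # contour plot
vis.plot_intermediate_values(study).show()  # intermediate values per trial
```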

Compared with FLAML, Optuna offers a better user experience for deep learning, and in the next experiment we will bring you to deep learning HPO!
