Conversation
gpleiss left a comment:
Generally good; a few changes
| stereographic projection onto a unit sphere, and :math:`(b_0, b_1)` are learned
| mixture weights (via softmax, so :math:`b_0 + b_1 = 1`).
|
| This kernel was proposed in `We Still Don't Understand High-Dimensional Bayesian Optimization`.
Nit: make this into a link, rather than having the arXiv link separate.
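e.g. with the reST external-link syntax (the actual arXiv URL is not shown in this excerpt, so it is left as a placeholder):

    This kernel was proposed in `We Still Don't Understand High-Dimensional
    Bayesian Optimization <https://arxiv.org/abs/XXXX.XXXXX>`_.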
| self.bounds = bounds
|
| # Learned mixture coefficients: softmax([raw_coeffs]) -> [constant, linear]
| self.raw_coeffs = nn.Parameter(torch.zeros(*self.batch_shape, 2))
We should have getters and setters for the non-raw values, so that they can be initialized appropriately (e.g. self.coeffs = <blah>)
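A minimal sketch of what that could look like, assuming `coeffs` is defined as `softmax(raw_coeffs)` as in the diff above (this follows the usual GPyTorch property/`initialize` pattern; `log` is a right-inverse of softmax up to normalization, so the setter assumes positive weights):

    @property
    def coeffs(self) -> torch.Tensor:
        # Mixture weights [constant, linear]; softmax keeps them positive and summing to 1.
        return torch.softmax(self.raw_coeffs, dim=-1)

    @coeffs.setter
    def coeffs(self, value: torch.Tensor) -> None:
        if not torch.is_tensor(value):
            value = torch.as_tensor(value).to(self.raw_coeffs)
        # softmax(value.log()) recovers value / value.sum(-1, keepdim=True),
        # so passing normalized weights initializes coeffs to exactly that value.
        self.initialize(raw_coeffs=value.log())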
| self.raw_coeffs = nn.Parameter(torch.zeros(*self.batch_shape, 2))
|
| # Global lengthscale: sigmoid(raw_glob_ls) * max_sq_norm
| self.raw_glob_ls = nn.Parameter(torch.zeros(*self.batch_shape, 1))
| ard_num_dims: int | None = None,
| lengthscale_prior: Prior | None = None,
| lengthscale_constraint: Interval | None = None,
| normalize_lengthscale: bool = False,
Maybe we should set it to True? And we can add a comment that it was set to False in the original paper.
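Just to illustrate the suggested change to the signature above (a sketch, not the final wording):

    normalize_lengthscale: bool = True,  # Note: set to False in the original paper.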
| return torch.ones(x1.shape[:-1], dtype=x1.dtype, device=x1.device)
|
| if self.normalize_lengthscale:  # Enforce L2 norm = 1
|     lengthscale = torch.softmax(self.lengthscale, dim=-1).sqrt()
Perhaps it would be better if we just divided the lengthscale by its norm, so as not to distort the geometry of the lengthscale.
I tried this on a small example (the SVM benchmark in high-dim BO): dividing the lengthscale by its norm is about 10x slower than using the softmax, and performance is not necessarily better. This was just a small local test with a few iterations (20-30); I have not yet double-checked it on the cluster with more problems/seeds/iterations. What do you think: is it worth investigating further, or do you already have a strong preference for which reparameterization to go with?
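For reference, the two reparameterizations under discussion, sketched side by side (assuming `self.lengthscale` is the positive, constraint-transformed lengthscale as in the diff above):

    if self.normalize_lengthscale:
        # Current approach: softmax over dimensions, then sqrt, so the squared
        # lengthscales sum to 1 (unit L2 norm after the sqrt).
        lengthscale = torch.softmax(self.lengthscale, dim=-1).sqrt()
        # Reviewer suggestion: divide by the L2 norm instead, which also yields a
        # unit-norm vector but preserves the ratios between per-dimension lengthscales.
        # lengthscale = self.lengthscale / self.lengthscale.norm(dim=-1, keepdim=True)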
| projected = project_onto_unit_sphere(x)
| self.assertEqual(projected.shape, torch.Size([10, 6]))
| norms = projected.norm(dim=-1)
| self.assertAllClose(norms, torch.ones(10), rtol=1e-5, atol=1e-5)
Can you add a test checking that inputs already on the unit sphere are mapped (effectively) to themselves, to make sure we implemented the inverse stereographic projection correctly?
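A minimal sketch of such a test, assuming `project_onto_unit_sphere` keeps the input dimensionality (as in the shape assertion above) and should act as (approximately) the identity on points that already have unit norm:

    def test_identity_on_unit_sphere(self):
        """Points already on the unit sphere should map to themselves (up to tolerance)."""
        x = torch.randn(10, 6)
        x = x / x.norm(dim=-1, keepdim=True)  # place the inputs on the unit sphere
        projected = project_onto_unit_sphere(x)
        self.assertAllClose(projected, x, rtol=1e-5, atol=1e-5)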
|     kernel = SphericalLinearKernel(bounds=UNIT_BOUNDS_3D, lengthscale_prior=NormalPrior(0, 1))
|     pickle.loads(pickle.dumps(kernel))
|
| def test_consistency_square_vs_rectangular(self):
| def test_pickle_with_prior(self):
|     """Kernel with prior should survive pickle round-trip."""
|     kernel = SphericalLinearKernel(bounds=UNIT_BOUNDS_3D, lengthscale_prior=NormalPrior(0, 1))
|     pickle.loads(pickle.dumps(kernel))
| bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]])
| kernel = SphericalLinearKernel(bounds=bounds)
| loaded = pickle.loads(pickle.dumps(kernel))
| self.assertAllClose(loaded.bounds, bounds)
| """Should accept valid priors and reject invalid ones.""" | ||
| SphericalLinearKernel(bounds=UNIT_BOUNDS_3D, lengthscale_prior=None) | ||
| SphericalLinearKernel(bounds=UNIT_BOUNDS_3D, lengthscale_prior=NormalPrior(0, 1)) | ||
| self.assertRaises(TypeError, SphericalLinearKernel, UNIT_BOUNDS_3D, lengthscale_prior=1) |
|
I tried to address all of your comments as best I could, and left a longer reply for the reparameterization question. When playing around with this model for the …