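grad = (1 / m) * sum((h - y) .* X) can be used instead of X' * (h - y) / m. Note that the sum form returns a 1 x n row vector, which is the transpose of the n x 1 column vector that X' * (h - y) / m gives.
Maybe J can also be optimized this way.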
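
Here is a rough Octave sketch comparing the two gradient forms, plus the same kind of rewrite for J. It assumes a logistic (sigmoid) hypothesis, since the post does not say which model h comes from. The toy values of X, y and theta are made up for illustration.

```octave
% Hypothetical toy data: m = 4 examples, n = 3 features (first column is the intercept).
X = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.9];
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];
m = length(y);

% Assumption: logistic hypothesis, h is m x 1.
h = 1 ./ (1 + exp(-X * theta));

% Gradient, two ways.
grad_sum = (1 / m) * sum((h - y) .* X);  % 1 x n row vector (relies on broadcasting)
grad_mat = X' * (h - y) / m;             % n x 1 column vector

% Cost, two ways: element-wise sum vs. inner products.
J_sum = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
J_mat = (-y' * log(h) - (1 - y)' * log(1 - h)) / m;

% Both differences should be at floating-point rounding level.
disp(max(abs(grad_sum' - grad_mat)));
disp(abs(J_sum - J_mat));
```

The element-wise product (h - y) .* X depends on automatic broadcasting, which Octave supports. Older MATLAB releases may need bsxfun for that line, so the X' * (h - y) form is the more portable of the two.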