Hashing of strings does not yield the same hash across JVM and CLR. #239

@parth-io

Currently, Clojure.Runtime hashes strings via Murmur3.HashString (see source here).

This implementation makes string hashing deterministic on the CLR (Clojure.Runtime commit here and ClojureCLR issue here), as it is on the JVM, but the resulting hash values differ between the two platforms.

The initial commits to ClojureCLR followed the same approach as Clojure's implementation:

JVM commit                 ClojureCLR commit
clojure/clojure@dff9600    clojure/clojure-clr@87aeea3
clojure/clojure@50c9095    clojure/clojure-clr@1df02fe

But this was reverted in clojure/clojure-clr@50e0538 because .NET's string.GetHashCode is not deterministic: its result is not guaranteed to be stable across processes or runtime versions.

A fix that ensures both determinism and JVM compatibility could be to once again follow Clojure's implementation, in this case by using the same string hash algorithm as the JVM. Note that this would not have the caching that motivated the switch in clojure/clojure@50c9095:

diff --git a/Clojure/Lib/Util.cs b/Clojure/Lib/Util.cs
index 06341660..c32a2a48 100644
--- a/Clojure/Lib/Util.cs
+++ b/Clojure/Lib/Util.cs
@@ -44,11 +44,25 @@ namespace clojure.lang

             String s = o as string;
             if (s != null)
-                return Murmur3.HashString(s);
+                return Murmur3.HashInt(JavaStringHashCode(s));

             return o.GetHashCode();
         }

+        /// <summary>
+        /// Java's String.hashCode(): s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
+        /// Used to make hasheq for strings match JVM Clojure.
+        /// </summary>
+        private static int JavaStringHashCode(string s)
+        {
+            int h = 0;
+            for (int i = 0; i < s.Length; i++)
+            {
+                h = unchecked(31 * h + s[i]);
+            }
+            return h;
+        }
+
         [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming", "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "dohasheq")]
         private static int dohasheq(IHashEq ihe)
         {
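
To produce reference values for testing the proposed JavaStringHashCode, the same polynomial can be checked against the real String.hashCode() on the JVM. The following is only a sketch: the class name JavaStringHashCheck and the method polyHash are hypothetical and not part of Clojure or ClojureCLR. It relies on the fact that both Java and .NET strings are sequences of UTF-16 code units, so iterating s.charAt(i) in Java and s[i] in C# should visit the same values, including for surrogate pairs.

```java
// Hypothetical helper, not part of Clojure or ClojureCLR.
class JavaStringHashCheck {
    // Same recurrence as the proposed C# method: h = 31 * h + c,
    // evaluated over UTF-16 code units with wrapping 32-bit arithmetic.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Java int arithmetic wraps on overflow
        }
        return h;
    }

    public static void main(String[] args) {
        // Includes the empty string, non-ASCII text, and a surrogate pair.
        String[] samples = {"", "a", "abc", "hello world", "na\u00EFve", "\uD83D\uDE00"};
        for (String s : samples) {
            boolean matches = polyHash(s) == s.hashCode();
            System.out.println(polyHash(s) + " matches String.hashCode(): " + matches);
        }
    }
}
```

The values printed here are the inputs that JVM Clojure passes on to Murmur3.hashInt for strings, so a C# test could compare JavaStringHashCode against them before comparing the final hasheq results.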
