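Both JA3 and JA4 output a hash of the original fingerprint rather than the fingerprint itself. JA3 is an MD5 digest of the full parameter string. JA4 keeps a short human-readable prefix (protocol, TLS version, cipher and extension counts, and so on) but replaces the cipher suite and extension lists with truncated SHA-256 hashes.
A hash function takes an input (such as text or data) and produces a fixed-size string of characters that acts as a digital fingerprint of that input. Two properties matter here: a hash is one-way, so you can't reconstruct the original input from the hash value, and even a tiny change in the input produces a completely different hash output.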
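To make this concrete, here is a minimal Python sketch of the JA3 hashing step. A JA3 fingerprint is a comma-separated string of TLS fields (version, ciphers, extensions, curves, point formats), with the values in each field joined by dashes, and JA3 reduces it to an MD5 digest. The sample strings below are illustrative, not captured from a real client.

```python
import hashlib

def ja3_hash(fingerprint: str) -> str:
    """Return the 32-character hex MD5 digest of a JA3 fingerprint string."""
    return hashlib.md5(fingerprint.encode("ascii")).hexdigest()

# Illustrative JA3 strings: TLSVersion,Ciphers,Extensions,Curves,PointFormats
original = "771,4865-4866-4867,0-23-65281-10-11,29-23-24,0"
# Same client, but with one extra cipher suite (4868) offered
modified = "771,4865-4866-4867-4868,0-23-65281-10-11,29-23-24,0"

h1 = ja3_hash(original)
h2 = ja3_hash(modified)

print(len(h1), len(h2))  # 32 32: fixed-size output regardless of input length
print(h1 == h2)          # False: a one-value change yields an unrelated digest
```

Nothing in `h2` hints that the two inputs differ by a single cipher suite, which is exactly the property discussed under the drawbacks below.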
JA3 and JA4 use hashes for three main practical reasons:
- Storage Efficiency: A hash provides a compact, fixed-length identifier. Instead of storing all TLS parameters (which can be quite lengthy), systems can store just the hash value. This is particularly important for high-volume logging and database operations.
- Quick Comparisons: Hash comparisons are computationally simpler and faster than comparing full sets of TLS parameters. When processing large volumes of network traffic, this performance difference becomes significant.
- Easy Sharing: Hashes provide a standardized way to share fingerprints across teams and organizations. Instead of needing to agree on a complex format for sharing the raw parameters, teams can simply exchange hash values that identify specific client behaviors.
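A short sketch of why this is convenient in practice: once fingerprints are fixed-length strings, a dictionary of known values gives constant-time lookups, and the list itself can be shared as plain text. The labels and fingerprint strings are hypothetical, and the hashes are computed inline only to keep the example self-contained; a shared feed would normally contain the hash strings directly.

```python
import hashlib

def ja3_hash(fingerprint: str) -> str:
    return hashlib.md5(fingerprint.encode("ascii")).hexdigest()

# Hypothetical shared intelligence: JA3 hash -> client label
known_clients = {
    ja3_hash("771,4865-4866-4867,0-23-65281-10-11,29-23-24,0"): "example-browser",
    ja3_hash("771,49195-49199,0-10-11,23-24,0"): "example-scraper",
}

def classify(fingerprint: str) -> str:
    # Average O(1) dictionary lookup keyed on a 32-character string
    return known_clients.get(ja3_hash(fingerprint), "unknown")

print(classify("771,49195-49199,0-10-11,23-24,0"))        # example-scraper
print(classify("771,49195-49199-52393,0-10-11,23-24,0"))  # unknown
```

The second lookup also shows the flip side: adding one cipher suite makes a near-identical client completely unrecognisable to a hash list.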
Drawbacks of hashing
The core issue with hashing in JA3/JA4 is that it creates an information bottleneck that makes analysis more difficult. When TLS parameters are hashed into a single value, you lose the ability to:
- Understand which specific parameters contributed to a fingerprint match or mismatch. This makes debugging and tuning more challenging since you can't easily identify which TLS features are causing classification issues.
- Perform partial matching or similarity analysis between fingerprints. Two clients might be very similar except for a single parameter, but the hash makes them appear completely different.
- Create flexible detection rules that target specific TLS behaviors. Instead of being able to write rules like "match clients that support these specific cipher suites," you're forced to maintain lists of complete hash values.
- Evolve the fingerprinting approach without breaking existing detections. Adding new TLS parameters to the fingerprint calculation changes all the hash values, requiring complete recalibration of detection systems.
- Reverse engineer what client software might have generated a particular fingerprint, since the original parameters are obscured.
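The first two drawbacks are easy to see with a small sketch. It parses two JA3-style strings into their five fields, reports which fields differ, and computes a Jaccard similarity over the offered cipher suites, while the hashes of the same pair can only report "equal" or "not equal". The field names and the choice of Jaccard similarity are illustrative, not part of JA3 or JA4; note that Jaccard ignores ordering, which JA3 itself does not.

```python
import hashlib

FIELDS = ("version", "ciphers", "extensions", "curves", "point_formats")

def parse_ja3(fingerprint: str) -> dict[str, list[int]]:
    """Split a JA3 string into named fields of integer values."""
    parts = fingerprint.split(",")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return {
        name: [int(v) for v in part.split("-") if v]  # empty field -> []
        for name, part in zip(FIELDS, parts)
    }

def differing_fields(a: str, b: str) -> list[str]:
    """Name every field whose values differ between two fingerprints."""
    fa, fb = parse_ja3(a), parse_ja3(b)
    return [name for name in FIELDS if fa[name] != fb[name]]

def jaccard(xs: list[int], ys: list[int]) -> float:
    """Set overlap: |intersection| / |union| (1.0 if both are empty)."""
    sx, sy = set(xs), set(ys)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 1.0

a = "771,4865-4866-4867,0-23-65281-10-11,29-23-24,0"
b = "771,4865-4866-4867-4868,0-23-65281-10-11,29-23-24,0"

print(differing_fields(a, b))                                     # ['ciphers']
print(jaccard(parse_ja3(a)["ciphers"], parse_ja3(b)["ciphers"]))  # 0.75
print(hashlib.md5(a.encode()).hexdigest() == hashlib.md5(b.encode()).hexdigest())  # False
```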
Peakhour keeps the full TLS fingerprint rather than a hash. This gives full flexibility when matching and blocking on individual parameters, and new parameters can be added down the track without invalidating existing rules.
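As a rough illustration of what keeping the full fingerprint allows, here is a sketch assuming a simple in-memory record. It is not Peakhour's implementation: the `TLSFingerprint` class, its fields, and the `supports_all_ciphers` rule helper are hypothetical. Rules target individual fields, and a field added later (here `alpn`) gets a default value, so records and rules written before it existed keep working.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TLSFingerprint:
    version: int
    ciphers: tuple[int, ...]
    extensions: tuple[int, ...]
    # Hypothetical field added later; the default keeps older records valid
    alpn: tuple[str, ...] = ()

def supports_all_ciphers(fp: TLSFingerprint, required: set[int]) -> bool:
    """Rule: match clients that offer every cipher suite in `required`."""
    return required.issubset(fp.ciphers)

legacy = TLSFingerprint(771, (49195, 49199), (0, 10, 11))
modern = TLSFingerprint(771, (4865, 4866, 49195), (0, 16, 43), alpn=("h2", "http/1.1"))

tls13_rule = {4865, 4866}  # hypothetical rule: offers these TLS 1.3 suites
print([supports_all_ciphers(fp, tls13_rule) for fp in (legacy, modern)])  # [False, True]
print("h2" in modern.alpn, "h2" in legacy.alpn)  # True False
```

With hashes, the same rule would instead need an up-to-date list of every complete fingerprint that happens to include those suites.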