Wednesday, March 9, 2011

Hashing -- Hash Function

A hash function is a well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer, which can serve as an index into an array (cf. associative array). The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

Hash functions are mostly used to speed up table lookups or data-comparison tasks, such as finding items in a database, detecting duplicate or similar records in a large file, finding similar stretches in DNA sequences, and so on.

A hash function may map two or more keys to the same hash value. In many applications it is desirable to minimize the occurrence of these collisions, which means the hash function must map keys to hash values as evenly as possible. Depending on the application, other properties may be required as well. Although the idea was conceived in the 1950s, the design of good hash functions is still a topic of active research.
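To make collisions concrete, here is a minimal sketch in Python (toy_hash is a deliberately weak, made-up function that just sums character codes) showing that any two anagrams collide:

```python
def toy_hash(s, buckets=16):
    """Sum the character codes and reduce modulo the table size."""
    return sum(ord(c) for c in s) % buckets

# Addition is order-blind, so anagrams always land in the same bucket.
print(toy_hash("listen"), toy_hash("silent"))  # same value twice
print(toy_hash("cat"), toy_hash("act"))        # same value twice
```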

Hash functions are related to (and often confused with) checksums, check digits, fingerprints, randomization functions, error-correcting codes, and cryptographic hash functions. Although these concepts overlap to some extent, each has its own uses and requirements and has been designed and optimized differently. The HashKeeper database maintained by the American National Drug Intelligence Center, for example, is more aptly described as a catalog of file fingerprints than of hash values.

Hash functions are, by definition and implementation, pseudo-random number generators (PRNGs). From this generalization it follows that the performance of hash functions, and comparisons between hash functions, can be analyzed by treating each hash function as a PRNG.

Techniques such as Poisson distribution analysis can be used to analyze the collision rates of different hash functions for different sets of data. In general, there is a theoretical hash function, known as the perfect hash function, for any specific set of data. A perfect hash function, by definition, produces no collisions: no repeating hash values arise from different elements of the set. In reality it is very difficult to find a perfect hash function for arbitrary data, and the practical applications of perfect hashing and its variant, minimal perfect hashing, are quite limited. In practice it is generally accepted that a "perfect" hash function is the one that produces the fewest collisions for a particular set of data.
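As a rough illustration of this kind of analysis, here is a sketch in Python (the use of MD5 via hashlib and the randomly generated keys are illustrative choices) that compares the observed fraction of empty buckets against the Poisson prediction e^(-n/m) for n keys thrown into m buckets; a hash that behaves like a PRNG should track the prediction closely:

```python
import hashlib
import math
import random
import string

def bucket_of(s, m):
    """Bucket index derived from the MD5 digest of the string."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % m

m, n = 1024, 1024  # table size and number of random keys
keys = {''.join(random.choices(string.ascii_lowercase, k=8)) for _ in range(n)}
used = {bucket_of(k, m) for k in keys}

observed_empty = 1 - len(used) / m
predicted_empty = math.exp(-len(keys) / m)  # Poisson P(bucket receives 0 keys)
print(f"empty buckets: {observed_empty:.3f} observed, {predicted_empty:.3f} predicted")
```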

The problem is that there are so many permutations of data types, some highly random, others containing a high degree of patterning, that it is hard to generalize a hash function for all data types, or even for specific data types. All one can do is find, through trial and error, the hash function that best suits one's needs. Some dimensions along which to analyze and select hash functions are:

*
Data Distribution

This is a measure of how well the hash function distributes the hash values of elements within a set of data. Analyzing this measure requires knowing the number of collisions that occur with the data set, meaning non-unique hash values. If chaining is used to resolve collisions, the average chain length (which in theory is the average collision count per bucket) should be analyzed, as should the amount of clustering of hash values within ranges; a short measurement sketch appears after this list.
*
Hash Function Efficiency

This is a measure of how efficiently the hash function produces hash values for elements within a set of data. When algorithms that contain hash functions are analyzed, it is generally assumed that hash functions have a complexity of O(1); that is why look-ups in a hash table are said to be of "average O(1) complexity", whereas look-ups in associative containers such as maps (usually implemented as red-black trees) are of O(log n) complexity.
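Here is a minimal sketch of the chain-length measurement mentioned above, in Python (the built-in hash is a stand-in for whatever hash function is under test, and the word list is illustrative):

```python
from collections import Counter

def chain_stats(keys, m, h=hash):
    """Distribute keys into m buckets using h and report the average
    and maximum chain length over the non-empty buckets."""
    loads = Counter(h(k) % m for k in keys)
    chains = list(loads.values())
    return sum(chains) / len(chains), max(chains)

words = ["apple", "banana", "cherry", "date", "fig", "grape", "kiwi"]
avg, worst = chain_stats(words, m=8)
print(f"average chain length: {avg:.2f}, longest chain: {worst}")
```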

In theory, a hash function should be a very quick, stable, and deterministic operation. A hash function may not always lend itself to O(1) complexity; however, the linear traversal through the string of data being hashed is generally fast, and the fact that hash functions are typically applied to primary keys, which by definition are supposed to be much smaller associative identifiers for larger blocks of data, implies that the whole operation should be quick and, to some degree, stable.

The hash functions in this article are known as simple hash functions. They are typically used for data hashing (string hashing), to create keys for associative containers such as hash tables. These hash functions are not cryptographically secure: they can easily be reversed, and many different combinations of data can easily be found that produce identical hash values.

Here is a simple example of hashing for password matching: the system stores a salted hash of the password rather than the password itself, and compares hashes at login.
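A minimal sketch in Python (using the standard hashlib and secrets modules; production systems should prefer a slow, dedicated scheme such as bcrypt or scrypt):

```python
import hashlib
import secrets

def hash_password(password, salt=None):
    """Return (salt, digest) for the password; the salt defeats rainbow tables."""
    salt = salt or secrets.token_hex(16)
    digest = hashlib.sha256((salt + password).encode()).hexdigest()
    return salt, digest

def check_password(password, salt, stored_digest):
    """Re-hash the candidate with the stored salt; constant-time compare."""
    return secrets.compare_digest(hash_password(password, salt)[1], stored_digest)

salt, digest = hash_password("s3cret")        # stored at registration
print(check_password("s3cret", salt, digest))  # True
print(check_password("guess", salt, digest))   # False
```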

Different types of Hashing

Hashing, as a tool to associate one set or bulk of data with an identifier, has many different applications in the real world. Below are some common uses of hash functions.

*
String Hashing

Used in the area of data storage and access, mainly for indexing data and as a structural back end to associative containers (i.e., hash tables). A short sketch follows.
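For illustration, here is the well-known djb2 string hash in Python (one general-purpose string hash among many; the table size below is arbitrary):

```python
def djb2(s):
    """djb2 string hash (Daniel J. Bernstein): h = h * 33 + c."""
    h = 5381
    for c in s:
        h = (h * 33 + ord(c)) & 0xFFFFFFFF  # keep the value in 32 bits
    return h

table_size = 1024
print(djb2("hello") % table_size)  # bucket index for a hash table
```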
*
Cryptographic Hashing

Used for data/user verification and authentication. A strong cryptographic hash function has the property that it is very difficult to reverse the hash result and thereby reproduce the original piece of data. Cryptographic hash functions are used to hash users' passwords, so that the hash of the password, rather than the password itself, is stored on the system. They can also be seen as irreversible compression functions, able to represent a large quantity of data with a single identifier; this makes them useful for detecting whether data has been tampered with, and as the value one signs to prove the authenticity of a document via other cryptographic means. A file-integrity sketch follows.
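A minimal tamper-detection sketch in Python (using the standard hashlib; the file name document.txt is a placeholder):

```python
import hashlib

def file_digest(path):
    """SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against a digest recorded earlier to detect tampering.
expected = file_digest("document.txt")  # placeholder path, recorded earlier
print(file_digest("document.txt") == expected)
```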
*
Geometric Hashing

This form of hashing is used in the field of computer vision for the detection of classified objects in arbitrary scenes.

The process initially involves selecting a region or object of interest. From there, using affine-invariant feature detection algorithms such as the Harris corner detector (HCD), the Scale-Invariant Feature Transform (SIFT), or Speeded-Up Robust Features (SURF), a set of affine features is extracted that is deemed to represent the object or region. This set is sometimes called a macro-feature or a constellation of features. Depending on the nature of the detected features and the type of object or region being classified, it may still be possible to match two constellations of features even if there are minor disparities (such as missing or outlier features) between the two sets. The constellations are then said to be the classified set of features.

A hash value is computed from the constellation of features. This is typically done by first defining a space in which the hash values are intended to reside; the hash value in this case is a multidimensional value normalized for the defined space. Coupled with the process of computing the hash value, another process is needed that determines the distance between two hash values: a distance measure is required, rather than a deterministic equality operator, because of possible disparities between the constellations that went into calculating the hash values. Also, owing to the non-linear nature of such spaces, a simple Euclidean distance metric is essentially ineffective; as a result, automatically determining a distance metric for a particular space has become an active field of research in academia.

Examples of geometric hashing include the classification of various kinds of vehicles, for the purpose of re-detection in arbitrary scenes. The level of detection can range from simply detecting a vehicle, to a particular model of vehicle, to one specific vehicle. A small sketch of the core idea follows.
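Here is a minimal 2D sketch of the core idea in Python (it assumes point features have already been extracted, and the grid size is illustrative). Every ordered pair of points defines a basis, the remaining points are expressed in that basis and quantized, and the resulting keys are invariant to translation, rotation, and uniform scale:

```python
import itertools

def geometric_hash_keys(points, grid=0.25):
    """Quantized coordinates of every point expressed in every ordered
    basis pair; invariant to translation, rotation and uniform scale."""
    keys = set()
    for (ax, ay), (bx, by) in itertools.permutations(points, 2):
        ux, uy = bx - ax, by - ay          # basis vector from a to b
        norm = ux * ux + uy * uy
        if norm == 0:
            continue
        for px, py in points:
            if (px, py) in ((ax, ay), (bx, by)):
                continue
            dx, dy = px - ax, py - ay
            s = (dx * ux + dy * uy) / norm   # component along the basis
            t = (-dx * uy + dy * ux) / norm  # component perpendicular to it
            keys.add((round(s / grid), round(t / grid)))
    return keys

model = [(0, 0), (1, 0), (0, 1), (1, 1)]
scene = [(2, 2), (4, 2), (2, 4), (4, 4)]  # same shape, translated and scaled
print(geometric_hash_keys(model) == geometric_hash_keys(scene))  # True
```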
*
Bloom Filters

A Bloom filter allows the "state of existence" of a large set of possible values to be represented with a much smaller piece of memory than the total size of the values. In computer science this is known as a membership query, and it is a core concept of associative containers.

The Bloom filter achieves this through the use of multiple distinct hash functions, and by allowing the result of a membership query for a particular value to carry a certain amount of error. The Bloom filter guarantees that a membership query will never return a false negative; however, there may be false positives. The false-positive probability can be controlled by varying the size of the table used for the Bloom filter and by varying the number of hash functions.

Subsequent research in the area of hash functions, hash tables, and Bloom filters by Mitzenmacher et al. suggests that for most practical uses of such constructs, the entropy of the data being hashed contributes to the entropy of the hash functions. This leads to theoretical results concluding that an optimal Bloom filter (one that provides the lowest false-positive probability for a given table size, or vice versa) with a user-defined false-positive probability can be constructed from at most two distinct pairwise-independent hash functions, greatly increasing the efficiency of membership queries.

Bloom filters are commonly found in applications such as spell checkers, string-matching algorithms, network packet analysis tools, and web/Internet caches. A minimal sketch follows.
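Here is a minimal Bloom filter sketch in Python (the table size, the number of hash functions, and the salted-SHA-256 indexing scheme are all illustrative choices):

```python
import hashlib

class BloomFilter:
    """Bit array plus k hash functions; membership queries may return
    false positives but never false negatives."""

    def __init__(self, size=1024, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k indices by salting one digest with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("apple")
print("apple" in bf)  # True (guaranteed: no false negatives)
print("pear" in bf)   # False with high probability
```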
