Metadata-Version: 2.1
Name: fuzzymap
Version: 1.0.4
Summary: Python dictionary with a FUZZY key-matching opportunity
Home-page: https://github.com/pysnippet/fuzzymap
Author: Artyom Vancyan
Author-email: artyom@pysnippet.org
Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# FuzzyMap <img src="https://github.com/pysnippet.png" align="right" height="64" />

[![PyPI](https://img.shields.io/pypi/v/fuzzymap.svg)](https://pypi.org/project/fuzzymap/)
[![License](https://img.shields.io/pypi/l/fuzzymap.svg)](https://github.com/pysnippet/fuzzymap/blob/master/LICENSE)

## What is FuzzyMap?

`FuzzyMap` is a Python dictionary with fuzzy key matching. A lookup returns the value of the exact key if such a key
exists. Otherwise, it returns the value of the most similar key that satisfies the given ratio. The same mechanism
applies when adding a new key or replacing an existing one. If the key is not found and no existing key matches it by
the given ratio, the lookup returns `None`.
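
The lookup rule described above can be sketched with the standard library alone. This is a minimal illustration, not
fuzzymap's actual implementation: the class name `FuzzyDictSketch`, the helper `_closest_key`, the `0.6` threshold, and
the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


class FuzzyDictSketch(dict):
    """Illustrative stand-in; NOT the real fuzzymap.FuzzyMap."""

    ratio = 0.6  # assumed similarity threshold in the range [0, 1]

    def _closest_key(self, key):
        # An exact key always wins.
        if dict.__contains__(self, key):
            return key
        # Otherwise pick the most similar existing key (string keys assumed).
        best_key, best_score = None, 0.0
        for candidate in self.keys():
            score = SequenceMatcher(None, key, candidate).ratio()
            if score > best_score:
                best_key, best_score = candidate, score
        # Accept the best candidate only if it satisfies the threshold.
        return best_key if best_score >= self.ratio else None

    def __getitem__(self, key):
        match = self._closest_key(key)
        return None if match is None else dict.__getitem__(self, match)

    def __setitem__(self, key, value):
        # Reuse a sufficiently similar key if one exists, else add a new key.
        match = self._closest_key(key)
        dict.__setitem__(self, key if match is None else match, value)


colors = FuzzyDictSketch({"light blue": 1, "dark red": 2})
print(colors["light blu"])  # value stored under "light blue"
print(colors["zzz"])        # no key is similar enough
```

Subclassing `dict` keeps the rest of the mapping interface intact in this sketch. The real library may score similarity
differently, so a pair of keys that matches here is not guaranteed to match in `FuzzyMap`, and vice versa.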

## How does it work?

Suppose you have scraped data from multiple sources that share no unique identifier, and you want to compare the values
of the items that refer to the same entity. There is usually a field whose value is nearly the same at each source, and
you can use that field to identify the corresponding items in the other sources' data.

## Let's look at the following example

There is a live data parser that collects the coefficients (odds) of football matches from several bookmakers at once,
then calculates and logs the existing forks (arbitrage opportunities). Many bookmakers alter team names, so the names
cannot be compared directly with those on other sites.

```python
from fuzzymap import FuzzyMap

src1 = {
    'Rapid Wien - First Vienna': {'w1': 1.93, 'x': 2.32, 'w2': 7.44},
    'Al Bourj - Al Nejmeh': {'w1': 26, 'x': 11.5, 'w2': 1.05},
    # hundreds of other teams' data
}

src2 = FuzzyMap({
    'Bourj FC - Nejmeh SC Beirut': {'w1': 32, 'x': 12, 'w2': 1.05},
    'SK Rapid Wien - First Vienna FC': {'w1': 1.97, 'x': 2.3, 'w2': 8.2},
    # hundreds of other teams' data
})

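for team, coefs1 in src1.items():
    # Fuzzy lookup: `team` does not have to be an exact key of `src2`.
    # `coefs2` is None when no key of `src2` is similar enough.
    coefs2 = src2[team]

    # For 'Rapid Wien - First Vienna':
    # coefs1 = {'w1': 1.93, 'x': 2.32, 'w2': 7.44}
    # coefs2 = {'w1': 1.97, 'x': 2.3, 'w2': 8.2}
    handle_fork(coefs1, coefs2)  # user-defined handler, not part of fuzzymap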
```

A human can easily tell that "Rapid Wien - First Vienna" and "SK Rapid Wien - First Vienna FC" refer to the same
match. In the above example, `src2` is defined as a `FuzzyMap`, which makes its keys fuzzy-matchable, so we can get the
item corresponding to each key of `src1`. The graph below demonstrates the associations of `FuzzyMap` keys.

<p align="center">
  <img src="https://user-images.githubusercontent.com/44609997/205437148-4fb3d7bd-1fe9-4ce8-8321-d7aef9488e37.svg" height="400" />
</p>
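
To make the association between keys more concrete, the snippet below ranks the `src2` keys from the example by their
similarity to one `src1` key. It is only a sketch: `difflib.SequenceMatcher` is an assumed stand-in for whatever
similarity measure `FuzzyMap` uses internally, and the helper name `similarity` is hypothetical.

```python
from difflib import SequenceMatcher

query = 'Rapid Wien - First Vienna'
candidates = [
    'Bourj FC - Nejmeh SC Beirut',
    'SK Rapid Wien - First Vienna FC',
]


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, a, b).ratio()


# Order the candidate keys from most to least similar to the query.
ranked = sorted(candidates, key=lambda c: similarity(query, c), reverse=True)
best = ranked[0]
print(best)  # SK Rapid Wien - First Vienna FC
```

A score alone does not decide a match; it still has to satisfy the configured ratio, so less similar spellings of the
same match may be rejected under a strict threshold.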

## License

Copyright (C) 2022 Artyom Vancyan. [GPLv2](LICENSE)
