Metadata-Version: 2.1
Name: SyncAlgo
Version: 0.1.2
Summary: Example package for sync
Author-email: Weiwen Zhao <ywr1774@gmail.com>
License: Copyright (c) 2018 The Python Packaging Authority
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/KKiriri/Sync-Package
Project-URL: Bug Tracker, https://github.com/KKiriri/Sync-Package/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: <3.9,>=3.5
Description-Content-Type: text/markdown
License-File: LICENSE

# Arima Sync Package
This is a brief user guide for the Sync library. <br />
Please make sure the directories are set up properly before calling the APIs.

### Directories

- data: contains the input "data_file", which is a csv file.
- output: the generated individual-level dataframe is saved here in csv format, with the name "individual_<model_name>.csv".
- src: contains the main scripts.
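
As a minimal sketch of checking this layout before calling the APIs, the snippet below uses only the standard library. The helper names `check_layout` and `output_file` are hypothetical and are not part of the Sync package; only the folder names and the output file name pattern come from this guide.

```python
from pathlib import Path

# Folder names taken from the "Directories" list in this guide.
EXPECTED_DIRS = ["data", "output", "src"]


def check_layout(root):
    """Return the expected sub-directories that are missing under `root`."""
    return [name for name in EXPECTED_DIRS if not (Path(root) / name).is_dir()]


def output_file(root, model_name):
    """Build the output path following the "individual_<model_name>.csv" pattern."""
    return Path(root) / "output" / "individual_{}.csv".format(model_name)


if __name__ == "__main__":
    missing = check_layout(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    print("Output will be written to:", output_file(".", "my_model"))
```

Running the check first gives a clear message about a missing folder instead of a file error later in the run.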

### Scripts

- utils.py: a collection of basic file operations used by several scripts
- utils_function.py: a collection of basic mathematical operations used by several scripts
- functions.py: the key algorithm functions of SynC
- paths.py: stores all path-related variables

### How to generate data
- There are three groups of input parameters for generating the individual data.
- 1. "data_file" contains the aggregation-level feature data.
- 2. "init_features" and "feature_list" are each a list of lists. Every inner list should represent one category of feature, e.g. [['AG_20_to_40','AG_40_to_60','AG_60_to_80','AG_80_to_100'], ['Gender']]. Here the age feature is split into four mutually exclusive columns, so they are grouped in one list. The gender feature has only one column, a binary option representing "male" or "female" (gender at the demographic level), so it forms a list of its own.
- 3. "agg_units" is used when syncing the individual data. It should be a single csv file containing two columns. The first column contains the aggregation name, for instance the postal code "A0A1A0". The second column must be an integer giving the number of units in that aggregation.
  For example, "A0A1A0, 65" means that the population of postal code area A0A1A0 is 65.

- Other parameters:
- "model_name" and "path" are user-defined parameters. "model_name" specifies the name of the trained model. "path" is the location of the input files "data_file" and "agg_units".
- "cumulative": set to True to train and sync cumulatively, or to False for non-cumulative mode. Cumulative means the features in "feature_list" are trained and generated in order, and each subsequent feature is trained on the data of all previous features.
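
The ordering implied by "cumulative" can be illustrated with a small sketch. This is not the package's implementation: `flatten` and `training_plan` are hypothetical helpers, the "Income" column is made up, and two behaviours are assumptions rather than facts from this guide, namely that the columns in "init_features" always act as predictors and that non-cumulative mode uses only those initial columns.

```python
def flatten(groups):
    """Flatten a list of column-name lists into a single list."""
    return [col for group in groups for col in group]


def training_plan(init_features, feature_list, cumulative=True):
    """Return one (predictor_columns, target_columns) pair per group in feature_list."""
    base = flatten(init_features)
    known = list(base)
    plan = []
    for group in feature_list:
        # Assumption: non-cumulative mode conditions on the initial features only.
        predictors = list(known) if cumulative else list(base)
        plan.append((predictors, list(group)))
        # In cumulative mode, later groups also see the columns generated so far.
        known.extend(group)
    return plan


if __name__ == "__main__":
    init = [["AG_20_to_40", "AG_40_to_60"]]
    later = [["Gender"], ["Income"]]  # "Income" is an illustrative column name
    for predictors, targets in training_plan(init, later, cumulative=True):
        print(targets, "<-", predictors)
```

In this sketch the order of "feature_list" matters only in cumulative mode, which is why the groups should be listed in the order you want them generated.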

- Please refer to run.py for an example of calling the Sync APIs.
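
To make the input formats described in this section concrete, here is a small standard-library sketch. It does not call any Sync API. The split of columns between "init_features" and "feature_list", the second postal code, and the helper `read_agg_units` are illustrative assumptions, as is the choice to treat the "agg_units" file as having no header row.

```python
import csv
import io

# Each inner list is one category of feature; mutually exclusive columns share a list.
init_features = [["AG_20_to_40", "AG_40_to_60", "AG_60_to_80", "AG_80_to_100"]]
feature_list = [["Gender"]]

# Hypothetical contents of an "agg_units" file: aggregation name, integer unit count.
AGG_UNITS_CSV = "A0A1A0, 65\nA0A1B0, 120\n"


def read_agg_units(handle):
    """Parse rows of (aggregation name, integer count) into a dict."""
    units = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        # int() tolerates the space after the comma and rejects non-integer counts.
        units[row[0].strip()] = int(row[1])
    return units


agg_units = read_agg_units(io.StringIO(AGG_UNITS_CSV))

if __name__ == "__main__":
    print(agg_units)
```

Parsing the counts with `int` surfaces a non-integer second column early, which matches the requirement that the unit count be an integer.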

### Note
- 1. Make sure the input "data_file" contains all of the data columns you want to train on and predict. Avoid null values in the csv file so that the model produces proper results.
- 2. Specify "init_features" and "feature_list" using the column names in the csv file.
- 3. The algorithm does not currently support Python 3.9/3.10.
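
The first two notes can be checked before training with a short script. The sketch below is one possible check, not part of the Sync package: `validate_data_file` is a hypothetical helper, the sample data is made up, and only empty cells are treated as null (markers such as "NA" or "NaN" are not detected).

```python
import csv
import io


def validate_data_file(handle, init_features, feature_list):
    """Return a list of problems: missing columns and empty cells in requested columns."""
    wanted = [col for group in init_features + feature_list for col in group]
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = ["missing column: " + col for col in wanted if col not in header]
    present = [col for col in wanted if col in header]
    for row_no, row in enumerate(reader, start=1):
        for col in present:
            value = row[col]
            # DictReader yields None when a row has fewer fields than the header.
            if value is None or value.strip() == "":
                problems.append("empty value in {!r} on data row {}".format(col, row_no))
    return problems


if __name__ == "__main__":
    sample = "AG_20_to_40,Gender\n0.3,0.5\n,0.4\n"  # made-up data with one empty cell
    for problem in validate_data_file(io.StringIO(sample), [["AG_20_to_40"]], [["Gender"]]):
        print(problem)
```

To check a real file, pass an open file object instead of the `io.StringIO` sample.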
