# Arima Sync Package
This is a brief user guide for the Sync library. <br />
Please make sure the directories are properly set up before calling the APIs.

### Directories

- data: contains the input "data_file", which is a csv file.
- output: the generated individual-level dataframe is saved here in csv format, with the name "individual_<model_name>.csv".
- src: where the main scripts are
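
As a quick way to confirm the layout above before calling the APIs, a minimal sketch such as the following could be used. The helper name `missing_dirs` is illustrative and is not part of the Sync package; only the three folder names come from this guide.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_DIRS = ("data", "output", "src")


def missing_dirs(project_root):
    """Return the expected folders that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the project root; any folder it reports can be created by hand before generating data.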

### Scripts

- utils.py: a collection of basic file operations used by several scripts
- utils_function.py: a collection of basic mathematical operations used by several scripts
- functions.py: the key algorithm functions of SynC
- paths.py: stores all path-related variables

### How to generate data
- Generating the individual data requires three inputs.
- 1. "data_file" contains the aggregation-level feature data.
- 2. "init_features" and "feature_list" are lists of lists. Each inner list should represent one category of feature, e.g. `[['AG_20_to_40','AG_40_to_60','AG_60_to_80','AG_80_to_100'], ['Gender']]`. Here the age feature is split into four mutually exclusive columns, so they are grouped in one list. The gender feature has a single column representing "male" or "female" as a binary option (gender at the demographic level), so it forms a list of its own.
- 3. "agg_units" is used when syncing the individual data. It should be a single csv file with two columns. The first column contains the aggregation name, for instance the postal code "A0A1A0". The second column must be an integer giving the number of units in that aggregation.
For example, "A0A1A0, 65" means that the population of postal code area A0A1A0 is 65.

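To make the two-column "agg_units" format concrete, here is a small sketch that parses such a file with the standard library and checks that the second column is an integer. The function `read_agg_units` is a hypothetical helper, not a Sync API, and the sketch assumes the file has no header row; the second sample row is made up for illustration.

```python
import csv
import io


def read_agg_units(csv_file):
    """Parse rows of (aggregation name, integer count) into a dict."""
    units = {}
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"row {row_number}: expected 2 columns, got {len(row)}"
            )
        name, count = row[0].strip(), row[1].strip()
        units[name] = int(count)  # raises ValueError if not an integer
    return units


# The first row comes from this guide; the second is invented.
sample = io.StringIO("A0A1A0, 65\nA0A1B0, 120\n")
agg_units = read_agg_units(sample)
print(agg_units["A0A1A0"])  # prints 65
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path_to_agg_units, newline="")`.
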
- Other parameters:
- "model_name" and "path" are user-defined parameters. "model_name" specifies the name of the trained model. "path" is the location of the input files "data_file" and "agg_units".
- "cumulative": set to True to train and sync cumulatively, or False for non-cumulative. In cumulative mode, the features in "feature_list" are trained and generated in order, and each feature is trained on the data of all previous features.

- Please refer to run.py for an example of how to call the Sync APIs.
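
The effect of the "cumulative" option on training order can be sketched as follows. This is only an illustration of the description above, not the package's implementation: `training_plan` is a hypothetical helper, the column names in "feature_list" are invented, and the non-cumulative branch assumes that each feature group is trained on "init_features" alone, which this guide does not state explicitly.

```python
def flatten(groups):
    """Turn a list of lists of column names into one flat list."""
    return [col for group in groups for col in group]


def training_plan(init_features, feature_list, cumulative=True):
    """Return (predictor_columns, target_columns) pairs in training order."""
    known = flatten(init_features)
    plan = []
    for group in feature_list:
        plan.append((list(known), list(group)))
        if cumulative:
            # Later groups also see the columns generated so far.
            known.extend(group)
    return plan


init_features = [
    ["AG_20_to_40", "AG_40_to_60", "AG_60_to_80", "AG_80_to_100"],
    ["Gender"],
]
# Hypothetical column names, for illustration only.
feature_list = [["Income_low", "Income_high"], ["Owns_home"]]

for predictors, targets in training_plan(init_features, feature_list, True):
    print(targets, "trained on", len(predictors), "columns")
```

With `cumulative=True` the second group is trained on the initial columns plus the first group's columns; with `cumulative=False` (under the assumption stated above) every group sees only the initial columns.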

### Note
- 1. Make sure the input "data_file" contains every data column you want to train on and predict. Avoid null values in the csv file so that the model produces proper results.
- 2. Specify "init_features" and "feature_list" using the column names in the csv file.
- 3. Currently the algorithm does not support Python 3.9/3.10.
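
The first two notes can be checked before training with a short script along these lines. It is a sketch using only the standard library; `check_data_file` is a hypothetical helper rather than a Sync function, and the sample rows and the "Income" column are made up to show one missing column and one empty cell.

```python
import csv
import io


def check_data_file(csv_file, init_features, feature_list):
    """Return (missing_columns, empty_cells) for the requested features.

    empty_cells is a list of (line_number, column_name) pairs.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    wanted = [col for group in init_features + feature_list for col in group]
    missing = [col for col in wanted if col not in header]
    present = [col for col in wanted if col in header]
    empty_cells = []
    for row in reader:
        for col in present:
            value = row[col]
            # Short rows yield None; blank cells yield an empty string.
            if value is None or value.strip() == "":
                empty_cells.append((reader.line_num, col))
    return missing, empty_cells


sample = io.StringIO(
    "AG_20_to_40,AG_40_to_60,AG_60_to_80,AG_80_to_100,Gender\n"
    "0.3,0.3,0.2,0.2,0.5\n"
    "0.4,,0.2,0.1,0.6\n"
)
init_features = [
    ["AG_20_to_40", "AG_40_to_60", "AG_60_to_80", "AG_80_to_100"],
    ["Gender"],
]
feature_list = [["Income"]]  # hypothetical column, absent from the sample

missing, empty_cells = check_data_file(sample, init_features, feature_list)
print("missing columns:", missing)
print("empty cells:", empty_cells)
```

Both results should be empty lists for a file that satisfies notes 1 and 2.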