How to run the Python script
python TS2DEC.py -h
python TS2DEC.py --dataset DATASET_NAME --constr_sample CONSTR_SAMPLE_NUMBER --n_samples NUM_OF_LABELED_EXAMPLES --run_id ID_OF_THE_RUN
example:
python TS2DEC.py --dataset optdigits --constr_sample 2 --n_samples 10 --run_id 3
This command runs the TS2DEC method. It looks for the data.npy and class.npy files in the optdigits folder (optdigits/data.npy and optdigits/class.npy) and for the background knowledge file optdigits/constraints/2_10.npy, and it writes two files: optdigits/TS2DEC/2_10_3.npy and optdigits/TS2DEC/features_2_10_3.npy.
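The naming pattern in the example above can be sketched as a small helper. This is only an illustration inferred from the file names listed in this README: the function expected_paths and its method argument are hypothetical and are not part of TS2DEC.py, whose actual path handling may differ.

```python
from pathlib import PurePosixPath


def expected_paths(dataset, constr_sample, n_samples, run_id, method="TS2DEC"):
    """Return the input and output paths implied by the command-line arguments.

    Hypothetical helper: the layout is inferred from the example in this
    README, not taken from the script itself.
    """
    root = PurePosixPath(dataset)
    stem = f"{constr_sample}_{n_samples}"  # e.g. "2_10"
    return {
        "data": root / "data.npy",
        "labels": root / "class.npy",
        "constraints": root / "constraints" / f"{stem}.npy",
        "assignments": root / method / f"{stem}_{run_id}.npy",
        "features": root / method / f"features_{stem}_{run_id}.npy",
    }


# Same arguments as the example command above.
paths = expected_paths("optdigits", constr_sample=2, n_samples=10, run_id=3)
for name, path in paths.items():
    print(name, path)
```

PurePosixPath is used so that the printed paths use forward slashes on every platform, matching the paths quoted in this README.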
The first file (optdigits/TS2DEC/2_10_3.npy) contains the cluster assignment of the examples in the dataset. It has one column and as many rows as there are examples; each value is the identifier of the cluster the example was assigned to.
The second file (optdigits/TS2DEC/features_2_10_3.npy) contains the embedding representation generated by the encoder of TS2DEC. It has as many rows as there are examples and 10 columns, since the bottleneck layer has a dimensionality of 10.
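As a rough sketch of how the two output files might be inspected, the snippet below checks the shapes described above and counts the examples per cluster. The function summarize_outputs is a hypothetical helper, and the arrays are small synthetic stand-ins so the snippet runs without the real files; in practice they would come from np.load on the two paths.

```python
import numpy as np


def summarize_outputs(assignments, features, n_dims=10):
    """Check the two outputs against the layout described above.

    Hypothetical helper, not part of TS2DEC. Returns {cluster_id: n_examples}.
    """
    assignments = np.asarray(assignments).reshape(-1)  # one cluster id per example
    features = np.asarray(features)
    if features.ndim != 2 or features.shape[1] != n_dims:
        raise ValueError(f"expected an (n_examples, {n_dims}) embedding matrix")
    if features.shape[0] != assignments.shape[0]:
        raise ValueError("assignments and features need one row per example")
    cluster_ids, counts = np.unique(assignments, return_counts=True)
    return dict(zip(cluster_ids.tolist(), counts.tolist()))


# Synthetic stand-ins for
#   np.load("optdigits/TS2DEC/2_10_3.npy")
#   np.load("optdigits/TS2DEC/features_2_10_3.npy")
demo_assignments = np.array([[0], [1], [1], [2]])
demo_features = np.zeros((4, 10))
cluster_sizes = summarize_outputs(demo_assignments, demo_features)
print(cluster_sizes)
```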
Folder Structure
For each benchmark (fMNIST, USPS, Reuters and Optdigits) we provide the data we used:
- data.npy contains the examples in a relational representation. For instance, for the fMNIST dataset, data.npy is a numpy array of shape (70000, 784).
- class.npy contains the class associated with each element of data.npy, matched by position. For instance, for the fMNIST dataset, class.npy is a numpy array of shape (70000,) with 10 possible values (0-9).
We also provide the background knowledge we used to build the constraints. Each benchmark folder contains a subfolder "constraints" that contains 50 files: 10 different samples for each of 5 levels of supervision (5, 10, 15, 20, 25). The level of supervision is the number of labeled examples available per class.
For instance, for the Reuters dataset, the constraints folder contains the file 3_15.npy, which has a shape of (60, 2). The Reuters benchmark has 4 classes and, with 15 examples per class, this gives a total of 60 examples. The first column of 3_15.npy gives the position identifier of the example (its position in data.npy and class.npy), while the second column gives its class identifier.
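The layout of a constraints file can be checked against class.npy with a few lines of numpy. The sketch below assumes that the class column of a constraints file agrees with class.npy at the listed positions, which the description above implies but does not state outright. The function check_constraints is hypothetical, and the arrays are synthetic stand-ins (4 classes, 2 labeled examples per class) rather than the real Reuters files.

```python
import numpy as np


def check_constraints(constraints, labels, n_per_class):
    """Validate an (n_labeled, 2) array of [position, class] rows.

    Hypothetical helper, not part of TS2DEC. Returns (positions, classes).
    """
    constraints = np.asarray(constraints)
    labels = np.asarray(labels)
    if constraints.ndim != 2 or constraints.shape[1] != 2:
        raise ValueError("expected an (n_labeled, 2) array")
    positions = constraints[:, 0].astype(int)  # row index into data.npy / class.npy
    classes = constraints[:, 1].astype(int)    # class identifier of that example
    # Assumption: the listed class matches class.npy at the same position.
    if not np.array_equal(labels[positions], classes):
        raise ValueError("constraint classes disagree with the label array")
    # Every class should contribute exactly n_per_class labeled examples.
    _, counts = np.unique(classes, return_counts=True)
    if not np.all(counts == n_per_class):
        raise ValueError("unexpected number of labeled examples per class")
    return positions, classes


# Synthetic stand-ins for np.load("reuters/class.npy") and
# np.load("reuters/constraints/3_15.npy").
demo_labels = np.tile(np.arange(4), 3)         # 12 examples, classes 0..3
demo_positions = np.arange(8)                  # first 8 examples are labeled
demo_constraints = np.column_stack([demo_positions, demo_labels[demo_positions]])
positions, classes = check_constraints(demo_constraints, demo_labels, n_per_class=2)
print(demo_constraints.shape)
```

With the real Reuters file 3_15.npy, the same call would use n_per_class=15, and the constraints array would have shape (60, 2).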
Dependencies
Keras (>= 2.2.2)
Scikit-learn (>= 0.20.0)
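One way to check the installed versions against these minimums is sketched below, using only the standard library. The helper names are hypothetical, the distribution names "keras" and "scikit-learn" are assumed to be the names the packages are installed under, and the comparison handles plain dotted numeric versions only.

```python
import re
from importlib import metadata


def version_tuple(text):
    """Turn "2.2.2" into (2, 2, 2), keeping only leading numeric components."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_dependencies(minimums):
    """Map each package to True, False, or None (None means not installed)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Minimum versions listed above; the distribution names are assumptions.
MINIMUMS = {"keras": "2.2.2", "scikit-learn": "0.20.0"}
print(check_dependencies(MINIMUMS))
```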
The current implementation of TS2DEC is based on the Keras implementation of the Deep Embedded Clustering (DEC) algorithm (Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. ICML 2016). Author: Xifeng Guo, 2017.1.30. Availability: https://github.com/XifengGuo/DEC-keras