diff --git a/README.md b/README.md
index 140306f6907172adc603c5de59d0a31592d96980..887871b0cea1bc1e9a262763cc7a055290d51818 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,11 @@
 # Populate Data Lake
 The populating is done by using two scripts writing in python and R. These two languages have to be used because the functionalities offered by their libraries were of an unequal level of quality. Indeed, python offers an excellent library to interact with hdfs, while R has interesting modules to manage iso19115 metadata. In order to reduce the complexity generated by the concomitant use of these two languages, the R script has been encapsulated inside the python script. Thus, the administrator only needs to run the python script.
 
+# Run the script
+```shell
+python3 src/main.py
+```
+
 ## Python : collect data and insert to Data Lake on data zone : HDFS Cluster
 ### Prerequisites
 Has to be run on python 3.6 with requirements found in python-requirements.txt