Commit 964ffb4b authored by Decoupes Remy
reamde

Showing with 5 additions and 0 deletions
# Populate Data Lake
The data lake is populated by two scripts, one written in Python and one in R. Both languages are needed because their libraries are of unequal quality for the tasks involved: Python offers an excellent library for interacting with HDFS, while R has useful modules for managing ISO 19115 metadata. To reduce the complexity of using the two languages side by side, the R script is encapsulated inside the Python script, so the administrator only needs to run the Python script.
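
As an illustration of how an R script can be encapsulated inside a Python script, here is a minimal sketch that launches R through the `Rscript` command-line front end with `subprocess`. This is an assumption about the mechanism, not necessarily how `src/main.py` does it. The function names and the `src/metadata.R` path are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_rscript_command(script: Path, args: Sequence[str] = ()) -> List[str]:
    """Return the argv list used to launch an R script through Rscript."""
    # --vanilla starts R without loading or saving any user workspace.
    return ["Rscript", "--vanilla", str(script), *args]


def run_r_script(script: Path, args: Sequence[str] = ()) -> str:
    """Run the R script and return its standard output; raise on failure."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    result = subprocess.run(
        build_rscript_command(script, args),
        check=True,                # raise CalledProcessError on a non-zero exit
        stdout=subprocess.PIPE,    # explicit pipes keep this Python 3.6 compatible
        stderr=subprocess.PIPE,
        universal_newlines=True,   # decode output as text
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical path: the real R script name is not given in this README.
    print(build_rscript_command(Path("src/metadata.R")))
```

With this layout, the Python entry point calls `run_r_script` at the step where the metadata is handled, so the administrator still runs a single command.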
# Run the script
```shell
python3 src/main.py
```
## Python: collect data and insert it into the Data Lake data zone (HDFS cluster)
### Prerequisites
The script must be run with Python 3.6, with the requirements listed in `python-requirements.txt` installed.
......