{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "initial-thousand",
   "metadata": {},
   "source": [
    "# TP : Automatiser la publication de métadonnées sous R \n",
    "Ce notebook a pour objectif d'illustrer la publication automatisée de métadonnées. \n",
    "Les gestionnaires de données et d'IDG sont amené à gérer des volumes important de jeux de données. Selon [A. Maulpoix et al 2016](https://halshs.archives-ouvertes.fr/halshs-01302130/), les IDG Régionales possèdent en moyenne 3500 fiches et les nationales, 7000. Même si la majorité proviennent de différents moissonnages, il n'en reste pas moins que le remplissage manuel de fiches restantes reste une tâche très fastidieuses.\n",
    "\n",
    "Alors des initiatives d'automatisation ont vu le jour dans différents langage de programmation : \n",
    "\n",
    "+ Java : [Apache SIS](https://sis.apache.org/)\n",
    "+ Python : [GeoAPI](http://www.geoapi.org/snapshot/python/index.html)\n",
    "+ R : [Geoflow](https://github.com/eblondel/geoflow)\n",
    "+ ...\n",
    "\n",
    "Nous vous proposons ici d'illustrer cette pratique via le langage R et les librairies Geoflow"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "heavy-series",
   "metadata": {},
   "source": [
    "**1. Chargement des librairies nécessaire au programme**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fc7fbb9-5fbb-4d0b-ab40-07dbc282b244",
   "metadata": {},
   "outputs": [],
   "source": [
    "install.packages(\"devtools\")\n",
    "install.packages(\"XML\")\n",
    "install.packages(\"uuid\")\n",
    "install.packages(\"geometa\")\n",
    "install.packages(\"geonapi\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "daily-penalty",
   "outputs": [],
   "source": [
    "library(geonapi)\n",
    "library(geometa)\n",
    "\n",
    "library(stringr)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "wrapped-gross",
   "metadata": {},
   "source": [
    "**2. Connexion au geonetwork**\n",
    "\n",
    "Veuillez paramétrer correctement : l'adresse du serveur GeoNetwork et ses informations de connexion.\n",
    "Vous pouvez aussi changer le niveau de remontées de log à \"Debug\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "billion-delivery",
   "metadata": {},
   "outputs": [],
   "source": [
    "gn <- GNManager$new(\n",
    "  url = \"http://idg-test.interne.teledetection.fr:8181/geonetwork\",\n",
    "  version = \"3.7\",\n",
    "  user = \"silat\",\n",
    "  pwd = \"***\",\n",
    "  logger = \"DEBUG\"\n",
    ")"
   ]
  },
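  {
   "cell_type": "markdown",
   "id": "sketch-credentials-env",
   "metadata": {},
   "source": [
    "To keep credentials out of the notebook, the connection settings could be read from environment variables. The cell below is only a sketch: the variable names `GN_URL`, `GN_USER` and `GN_PWD` and the fallback URL are assumptions made for illustration, not names defined by GeoNetwork or geonapi."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-credentials-env-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical environment variable names; adapt them to your setup\n",
    "gn_url  <- Sys.getenv(\"GN_URL\", unset = \"http://localhost:8080/geonetwork\")\n",
    "gn_user <- Sys.getenv(\"GN_USER\")\n",
    "gn_pwd  <- Sys.getenv(\"GN_PWD\")\n",
    "\n",
    "# Sys.getenv() returns an empty string for variables that are not set\n",
    "if (gn_user == \"\" || gn_pwd == \"\") {\n",
    "  message(\"GN_USER and/or GN_PWD are not set; fill them in before connecting.\")\n",
    "}\n",
    "# These values can then be passed to GNManager$new() as url, user and pwd"
   ]
  },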
  {
   "cell_type": "markdown",
   "id": "integrated-eating",
   "metadata": {},
   "source": [
    "**3. Lecture des tableurs**\n",
    "\n",
    "Les 2 tableurs contiennent :\n",
    "+ Les informations de descriptions des jeux de données\n",
    "+ Un petit annuaire de contact"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "wooden-laundry",
   "metadata": {},
   "outputs": [],
   "source": [
    "working_dir = getwd()\n",
    "\n",
    "datasets <- read.csv(file=paste(working_dir, \"tableurs_informations_datasets/datasets_description.csv\", sep = \"/\"))\n",
    "contacts <- read.csv(file=paste(working_dir, \"tableurs_informations_datasets/contacts.csv\", sep = \"/\"))\n"
   ]
  },
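  {
   "cell_type": "markdown",
   "id": "sketch-check-columns",
   "metadata": {},
   "source": [
    "Before building the records, it can help to check that both spreadsheets expose the columns that the loop in step 4 reads. The cell below is a minimal sketch: `check_columns` is a hypothetical helper written for this notebook, and the column lists are simply those referenced by the code in step 4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-check-columns-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: stop with a readable message if a column is missing\n",
    "check_columns <- function(df, required, label) {\n",
    "  missing_cols <- setdiff(required, names(df))\n",
    "  if (length(missing_cols) > 0) {\n",
    "    stop(label, \" is missing column(s): \", paste(missing_cols, collapse = \", \"))\n",
    "  }\n",
    "  invisible(TRUE)\n",
    "}\n",
    "\n",
    "# Columns read by the loop in step 4\n",
    "check_columns(\n",
    "  datasets,\n",
    "  c(\"n\", \"titre\", \"uuid\", \"hierarchyLevel\", \"abstract\", \"topic\",\n",
    "    \"keywords\", \"PI\", \"web.access\", \"licence\", \"thumbnail\"),\n",
    "  \"datasets_description.csv\"\n",
    ")\n",
    "check_columns(contacts, c(\"name\", \"firstname\", \"organisation\", \"mail\"), \"contacts.csv\")"
   ]
  },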
  {
   "cell_type": "markdown",
   "id": "negative-producer",
   "metadata": {},
   "source": [
    "**4. Parcourir les lignes du fichier de descriptions des jeux de données et créer la fiche de métadonnées correspondantes**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bound-throw",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Parcourir les jeux de données (datasets) et créer les MD\n",
    "for (dataset in datasets$n) {\n",
    "  print(paste0(\"Working on: \", datasets$titre[dataset]))\n",
    "  metadata_id <- datasets$uuid[dataset]\n",
    "  \n",
    "  ##Création métadonnée\n",
    "  md = ISOMetadata$new()\n",
    "  metadata_id=paste(metadata_id)\n",
    "  md$setFileIdentifier(metadata_id)\n",
    "  md$setCharacterSet(\"utf8\")\n",
    "  md$setMetadataStandardName(\"ISO 19115:2003/19139\")\n",
    "  md$setLanguage(\"fra\")\n",
    "  md$setDateStamp(Sys.time())\n",
    "  md$setHierarchyLevel(datasets$hierarchyLevel[dataset])\n",
    "  \n",
    "  ##Creation identification\n",
    "  ident <- ISODataIdentification$new()\n",
    "  ident$setAbstract(paste(datasets$abstract[dataset]))\n",
    "  ident$setLanguage(\"fra\")\n",
    "  for (topic in unlist(strsplit(paste(datasets$topic[dataset]), \", \"))){\n",
    "    ident$addTopicCategory(topic)\n",
    "  }\n",
    "  \n",
    "  ## keywords\n",
    "  ### General Keywords\n",
    "  dynamic_keywords <- ISOKeywords$new()\n",
    "  for (kw in unlist(strsplit(paste(datasets$keywords[dataset]), \", \"))){\n",
    "    dynamic_keywords$addKeyword(kw)\n",
    "  }\n",
    "  ident$addKeywords(dynamic_keywords)\n",
    "    \n",
    "  ##Contacts PI\n",
    "  for (pi in unlist(strsplit(paste(datasets$PI[dataset]), \", \"))){\n",
    "    rp <- ISOResponsibleParty$new()\n",
    "    ct <- contacts$name==pi\n",
    "    contact=contacts[ct,]\n",
    "    labo <- strsplit(as.character(contact$organisation), \"/\")[[1]][1]\n",
    "    rp$setOrganisationName(labo)\n",
    "    rp$setIndividualName(paste(as.character(contact$name),as.character(contact$firstname),sep=\" \"))\n",
    "    rp$setRole(\"principalInvestigator\")\n",
    "    isocontact <- ISOContact$new()\n",
    "    address <- ISOAddress$new()\n",
    "    address$setEmail(as.character(contact$mail))\n",
    "    isocontact$setAddress(address)\n",
    "    rp$setContactInfo(isocontact)\n",
    "    ident$addPointOfContact(rp) # Ajout contact pour la ressource : le PI\n",
    "  }\n",
    "    \n",
    "  #add link to data access\n",
    "  distrib <- ISODistribution$new()\n",
    "  dto <- ISODigitalTransferOptions$new()\n",
    "  for (link in unlist(strsplit(paste(datasets$web.access[dataset]), \", \"))){\n",
    "    # Remove paranthesis\n",
    "    tuple <- gsub('\\\\(',\"\",link)\n",
    "    tuple <- gsub('\\\\)',\"\",tuple)\n",
    "    newURL <- ISOOnlineResource$new()\n",
    "    newURL$setName(paste0(strsplit(paste(tuple), \" @ \")[[1]][1],\" :\"))\n",
    "    newURL$setLinkage(strsplit(paste(tuple), \" @ \")[[1]][2])\n",
    "    newURL$setProtocol(\"WWW:LINK-1.0-http--link\")\n",
    "    dto$addOnlineResource(newURL)\n",
    "  }\n",
    "  distrib$setDigitalTransferOptions(dto)\n",
    "  md$setDistributionInfo(distrib)\n",
    "\n",
    "  #adding legal constraint(s)\n",
    "  if(nchar(as.character(datasets$licence[dataset])) !=0) {\n",
    "    lc <- ISOLegalConstraints$new()\n",
    "    lc$addUseLimitation(datasets$licence[dataset])\n",
    "    ident$setResourceConstraints(lc)\n",
    "  }\n",
    "  # Titre et identification\n",
    "  ct <- ISOCitation$new()\n",
    "  ct$setTitle(paste(datasets$titre[dataset]))\n",
    "  isoid=ISOMetaIdentifier$new(code = datasets$uuid[dataset])\n",
    "  ct$addIdentifier(isoid)\n",
    "  ident$setCitation(ct)\n",
    "  ## Aperçu / thumbnail\n",
    "  for(thumbnail in unlist(strsplit(paste(datasets$thumbnail[dataset]), \", \"))){\n",
    "  go <- ISOBrowseGraphic$new(\n",
    "    fileName = thumbnail,\n",
    "    fileDescription = \"thumbnail\",\n",
    "    fileType = \"image/png\"\n",
    "  )\n",
    "  ident$addGraphicOverview(go)\n",
    "  }\n",
    "  md$addIdentificationInfo(ident)\n",
    " \n",
    "  # Conversion en iso19139 et sauvegarde du fichier XML\n",
    "  md$encode(inspire = FALSE)\n",
    "  nom_fichier = str_replace_all(datasets$titre[dataset], \" \", \"_\") # nous enlevons les espaces\n",
    "  nom_fichier = str_replace_all(nom_fichier, \"/\", \"_\") # nous enlevons les slashs\n",
    "  nom_fichier = paste(nom_fichier, \"xml\", sep=\".\") # nous ajoutons l'extension de fichier XML\n",
    "  chemin_fichier = paste(\"xml_generated\", nom_fichier, sep=\"/\") # nous cronstruisons le chemin vers le fichier\n",
    "  md$save(chemin_fichier)\n",
   ]
  },
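  {
   "cell_type": "markdown",
   "id": "sketch-web-access-format",
   "metadata": {},
   "source": [
    "The loop above expects multi-valued cells to be separated by a comma followed by a space, and each entry of `web.access` to follow a `(name @ url)` pattern. The cell below replays that parsing on a made-up value; the string and the example.org URLs are hypothetical and only serve to show the convention."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-web-access-format-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical cell value following the \"(name @ url)\" convention\n",
    "web_access <- \"(Download @ https://example.org/data.zip), (Map viewer @ https://example.org/viewer)\"\n",
    "\n",
    "for (link in unlist(strsplit(web_access, \", \"))) {\n",
    "  tuple <- gsub(\"[()]\", \"\", link)       # drop the parentheses\n",
    "  parts <- strsplit(tuple, \" @ \")[[1]]  # split the name from the URL\n",
    "  cat(\"name:\", parts[1], \"| url:\", parts[2], \"\\n\")\n",
    "}"
   ]
  },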
  {
   "cell_type": "markdown",
   "id": "nuclear-switzerland",
   "metadata": {},
   "source": [
    "**5. Upload ou mise à jour des fiches de métadonnées**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "possible-pickup",
   "metadata": {},
   "outputs": [],
   "source": [
    "require(XML)\n",
    "filenames = list.files(\"xml_generated\", pattern=\"*.xml\") \n",
    "for (file in filenames) {# nous parcourons l'ensemble des fichiers xml\n",
    "    chemin_fichier = paste(\"xml_generated\", file, sep=\"/\")\n",
    "    xml = xmlParse(chemin_fichier)\n",
    "    md = ISOMetadata$new(xml = xml) # création de l'objet ISOMetadata\n",
    "    created = gn$insertMetadata( # insertion dans GeoNetwork\n",
    "      xml = md$encode(),\n",
    "      group = \"1\",\n",
Decoupes Remy's avatar
Decoupes Remy committed
    "    )\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Éditer les Méta-Données",
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
Decoupes Remy's avatar
Decoupes Remy committed
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}