{
"cells": [
{
"cell_type": "markdown",
"id": "initial-thousand",
"metadata": {},
"source": [
"# Practical exercise: Automating metadata publication with R \n",
"This notebook illustrates the automated publication of metadata. \n",
"Data and SDI managers have to handle large volumes of datasets. According to [A. Maulpoix et al. 2016](https://halshs.archives-ouvertes.fr/halshs-01302130/), regional SDIs hold on average 3,500 metadata records, and national ones 7,000. Even though most of these records come from various harvesting processes, filling in the remaining ones by hand is still a very tedious task.\n",
"\n",
"Automation initiatives have therefore emerged in several programming languages: \n",
"\n",
"+ Java: [Apache SIS](https://sis.apache.org/)\n",
"+ Python: [GeoAPI](http://www.geoapi.org/snapshot/python/index.html)\n",
"+ R: [Geoflow](https://github.com/eblondel/geoflow)\n",
"+ ...\n",
"\n",
"Here we illustrate this practice with the R language and the geometa and geonapi libraries, on which Geoflow is built."
]
},
{
"cell_type": "markdown",
"id": "heavy-series",
"metadata": {},
"source": [
"**1. Loading the libraries required by the program**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fc7fbb9-5fbb-4d0b-ab40-07dbc282b244",
"metadata": {},
"outputs": [],
"source": [
"install.packages(\"devtools\")\n",
"install.packages(\"XML\")\n",
"install.packages(\"uuid\")\n",
"install.packages(\"stringr\")\n",
"install.packages(\"geometa\")\n",
"install.packages(\"geonapi\")"
]
},
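{
"cell_type": "markdown",
"id": "github-install-note",
"metadata": {},
"source": [
"As a sketch: if the CRAN releases lag behind, the same libraries can also be installed from their GitHub repositories with `devtools` (presumably why `devtools` is installed above). The repository paths below are the official `eblondel` ones; adapt as needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "github-install-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional: development versions from GitHub (requires devtools)\n",
"devtools::install_github(\"eblondel/geometa\")\n",
"devtools::install_github(\"eblondel/geonapi\")"
]
},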
{
"cell_type": "code",
"execution_count": null,
"id": "library-loading",
"metadata": {},
"outputs": [],
"source": [
"library(geonapi)\n",
"library(geometa)\n",
"library(XML)     # xmlParse, used when re-reading the generated files\n",
"library(stringr) # str_replace_all, used to build the file names"
]
},
{
"cell_type": "markdown",
"id": "wrapped-gross",
"metadata": {},
"source": [
"**2. Connecting to GeoNetwork**\n",
"\n",
"Please configure the GeoNetwork server address and its connection credentials.\n",
"You can also raise the logging level to \"DEBUG\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "billion-delivery",
"metadata": {},
"outputs": [],
"source": [
"gn <- GNManager$new(\n",
" url = \"http://idg-test.interne.teledetection.fr:8181/geonetwork\",\n",
" version = \"3.7\",\n",
" user = \"silat\",\n",
" pwd = \"***\",\n",
" logger = \"DEBUG\"\n",
")"
]
},
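{
"cell_type": "markdown",
"id": "group-check-note",
"metadata": {},
"source": [
"A minimal sanity check, assuming the connection above succeeded: `GNManager$getGroups()` lists the groups defined on the server. The group id shown here is the value expected later by `insertMetadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "group-check-code",
"metadata": {},
"outputs": [],
"source": [
"# List the GeoNetwork groups to find the id used at upload time\n",
"groups <- gn$getGroups()\n",
"print(groups)"
]
},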
{
"cell_type": "markdown",
"id": "integrated-eating",
"metadata": {},
"source": [
"**3. Reading the spreadsheets**\n",
"\n",
"The 2 spreadsheets contain:\n",
"+ The description information of the datasets\n",
"+ A small contact directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "wooden-laundry",
"metadata": {},
"outputs": [],
"source": [
"working_dir = getwd()\n",
"\n",
"datasets <- read.csv(file=paste(working_dir, \"tableurs_informations_datasets/datasets_description.csv\", sep = \"/\"))\n",
"contacts <- read.csv(file=paste(working_dir, \"tableurs_informations_datasets/contacts.csv\", sep = \"/\"))\n"
]
},
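{
"cell_type": "markdown",
"id": "tables-check-note",
"metadata": {},
"source": [
"A quick look at the two tables before generating anything (base R only); the columns inspected are the ones used by the rest of the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "tables-check-code",
"metadata": {},
"outputs": [],
"source": [
"# Inspect the structure and first rows of both tables\n",
"str(datasets)\n",
"head(contacts)"
]
},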
{
"cell_type": "markdown",
"id": "negative-producer",
"metadata": {},
"source": [
"**4. Loop over the rows of the dataset description file and create the corresponding metadata records**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "metadata-creation-loop",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Loop over the datasets and create the metadata records\n",
"for (dataset in seq_len(nrow(datasets))) { # iterate over row indices\n",
" print(paste0(\"Working on: \", datasets$titre[dataset]))\n",
" metadata_id <- datasets$uuid[dataset]\n",
" \n",
" ## Create the metadata record\n",
" md = ISOMetadata$new()\n",
" metadata_id=paste(metadata_id)\n",
" md$setFileIdentifier(metadata_id)\n",
" md$setCharacterSet(\"utf8\")\n",
" md$setMetadataStandardName(\"ISO 19115:2003/19139\")\n",
" md$setLanguage(\"fra\")\n",
" md$setDateStamp(Sys.time())\n",
" md$setHierarchyLevel(datasets$hierarchyLevel[dataset])\n",
" \n",
" ## Create the identification section\n",
" ident <- ISODataIdentification$new()\n",
" ident$setAbstract(paste(datasets$abstract[dataset]))\n",
" ident$setLanguage(\"fra\")\n",
" for (topic in unlist(strsplit(paste(datasets$topic[dataset]), \", \"))){\n",
" ident$addTopicCategory(topic)\n",
" }\n",
" \n",
" ## keywords\n",
" ### General Keywords\n",
" dynamic_keywords <- ISOKeywords$new()\n",
" for (kw in unlist(strsplit(paste(datasets$keywords[dataset]), \", \"))){\n",
" dynamic_keywords$addKeyword(kw)\n",
" }\n",
" ident$addKeywords(dynamic_keywords)\n",
" \n",
" ## PI contacts\n",
" for (pi in unlist(strsplit(paste(datasets$PI[dataset]), \", \"))){\n",
" rp <- ISOResponsibleParty$new()\n",
" ct <- contacts$name==pi\n",
" contact=contacts[ct,]\n",
" labo <- strsplit(as.character(contact$organisation), \"/\")[[1]][1]\n",
" rp$setOrganisationName(labo)\n",
" rp$setIndividualName(paste(as.character(contact$name),as.character(contact$firstname),sep=\" \"))\n",
" rp$setRole(\"principalInvestigator\")\n",
" isocontact <- ISOContact$new()\n",
" address <- ISOAddress$new()\n",
" address$setEmail(as.character(contact$mail))\n",
" isocontact$setAddress(address)\n",
" rp$setContactInfo(isocontact)\n",
" ident$addPointOfContact(rp) # add the resource contact: the PI\n",
" }\n",
" \n",
" #add link to data access\n",
" distrib <- ISODistribution$new()\n",
" dto <- ISODigitalTransferOptions$new()\n",
" for (link in unlist(strsplit(paste(datasets$web.access[dataset]), \", \"))){\n",
" # Remove parentheses\n",
" tuple <- gsub('\\\\(',\"\",link)\n",
" tuple <- gsub('\\\\)',\"\",tuple)\n",
" newURL <- ISOOnlineResource$new()\n",
" newURL$setName(paste0(strsplit(paste(tuple), \" @ \")[[1]][1],\" :\"))\n",
" newURL$setLinkage(strsplit(paste(tuple), \" @ \")[[1]][2])\n",
" newURL$setProtocol(\"WWW:LINK-1.0-http--link\")\n",
" dto$addOnlineResource(newURL)\n",
" }\n",
" distrib$setDigitalTransferOptions(dto)\n",
" md$setDistributionInfo(distrib)\n",
"\n",
" #adding legal constraint(s)\n",
" if(nchar(as.character(datasets$licence[dataset])) !=0) {\n",
" lc <- ISOLegalConstraints$new()\n",
" lc$addUseLimitation(datasets$licence[dataset])\n",
" ident$setResourceConstraints(lc)\n",
" }\n",
" # Title and identifier\n",
" ct <- ISOCitation$new()\n",
" ct$setTitle(paste(datasets$titre[dataset]))\n",
" isoid = ISOMetaIdentifier$new(code = datasets$uuid[dataset])\n",
" ct$addIdentifier(isoid) # attach the identifier to the citation\n",
" ident$setCitation(ct)\n",
" ## Graphic overview / thumbnail\n",
" for(thumbnail in unlist(strsplit(paste(datasets$thumbnail[dataset]), \", \"))){\n",
" go <- ISOBrowseGraphic$new(\n",
" fileName = thumbnail,\n",
" fileDescription = \"thumbnail\",\n",
" fileType = \"image/png\"\n",
" )\n",
" ident$addGraphicOverview(go)\n",
" }\n",
" md$addIdentificationInfo(ident)\n",
" \n",
" # Convert to ISO 19139 and save the XML file\n",
" md$encode(inspire = FALSE)\n",
" nom_fichier = str_replace_all(datasets$titre[dataset], \" \", \"_\") # remove spaces\n",
" nom_fichier = str_replace_all(nom_fichier, \"/\", \"_\") # remove slashes\n",
" nom_fichier = paste(nom_fichier, \"xml\", sep=\".\") # add the XML file extension\n",
" chemin_fichier = paste(\"xml_generated\", nom_fichier, sep=\"/\") # build the file path\n",
" md$save(chemin_fichier)\n",
"}"
]
},
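{
"cell_type": "markdown",
"id": "xml-check-note",
"metadata": {},
"source": [
"Before uploading, a quick check that the loop above actually wrote the ISO 19139 files to `xml_generated/`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "xml-check-code",
"metadata": {},
"outputs": [],
"source": [
"# List the XML files produced by the metadata-creation loop\n",
"list.files(\"xml_generated\", pattern = \"\\\\.xml$\")"
]
},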
{
"cell_type": "markdown",
"id": "nuclear-switzerland",
"metadata": {},
"source": [
"**5. Uploading or updating the metadata records**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "possible-pickup",
"metadata": {},
"outputs": [],
"source": [
"filenames = list.files(\"xml_generated\", pattern=\"\\\\.xml$\") \n",
"for (file in filenames) { # loop over all the XML files\n",
" chemin_fichier = paste(\"xml_generated\", file, sep=\"/\")\n",
" xml = xmlParse(chemin_fichier)\n",
" md = ISOMetadata$new(xml = xml) # build the ISOMetadata object\n",
" created = gn$insertMetadata( # insert into GeoNetwork\n",
" xml = md$encode(),\n",
" group = \"1\",\n",
" category = \"datasets\"\n",
" )\n",
"}"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r"
}
},
"nbformat": 4,
"nbformat_minor": 5
}