Saturday, January 14, 2017

API v3 HAL, import to HAL.

See the "Guideline doc file" at the end (iframe .docx), which is the clearest and is in English.

Note the 5 export formats.



Intro

APIs (Application Programming Interfaces) are interfaces that allow machine-to-machine communication. This page gives access to the documentation of the HAL v3.0 APIs:
https://api.archives-ouvertes.fr/docs
There are 3 APIs:
  • SWORD import
    Deposit into HAL via the SWORD API.
  • HAL OAI-PMH server.
  • Search HAL resources via an API.
We will focus on the first one, the SWORD import API.

Import

Internal resources



GitHub HAL

HAL
HAL is an open archive where authors can deposit scholarly documents from all academic fields
 JavaScript   Updated on 10 Nov 2016 (as of Jan 2017)

Episciences.org
The main idea of the Episciences.org project is to provide a technical platform of peer-reviewing
 Updated on 21 Apr 2015 (as of Jan 2017)

Sciencesconf.org
Sciencesconf.org is a Web platform available to all organizers of scientific conferences.
 Updated on 12 Mar 2015 (as of Jan 2017)


SWORD deposit API

Documentation of the SWORD deposit process on HAL.

Intro

The SWORD deposit API enables automatic import of documents into the HAL open archive (hal.archives-ouvertes.fr).

For imports, the SWORD protocol (Simple Web-service Offering Repository Deposit) was chosen, an international exchange standard founded by JISC (Joint Information Systems Committee).
http://swordapp.org/about/
https://www.jisc.ac.uk/about

The SWORD protocol defines a set of Web services based on the Atom Publishing Protocol (APP), RFC 5023.
https://www.ccsd.cnrs.fr/fr/2013/11/les-api-dans-hal-v3/
http://swordapp.org/

The version of SWORD/APP implemented in HAL closely follows version 2.0 of the protocol.
SWORD version 2 is a new initiative to update the SWORD v1 standard in order to allow it to cope not only with the traditional ‘fire and forget’ deposit scenario, but also to facilitate the functions needed to support the whole deposit lifecycle of scholarly works.

The Atom Publishing Protocol (AtomPub) is an application-level protocol for publishing and editing Web resources.  The protocol is based on HTTP transfer of Atom-formatted representations.  The Atom format is documented in the Atom Syndication Format.
https://tools.ietf.org/html/rfc5023
The Internet Engineering Task Force (IETF®)
https://www.ietf.org/

Using SWORD

Depositing is a two-stage process within APP and SWORD. First, a request from an authenticated user is sent to the implementation for what APP calls the 'service document'; this returns details of the collections that the user is allowed to deposit to within the repository. At this point, the user may deposit their file into the chosen collection. Various things may prevent success, for example a lack of authentication credentials, an unacceptable file format or a corrupt MD5 checksum. The repository will send a response indicating the success, or otherwise, of the deposit.

Before using SWORD on HAL

Depositing via the SWORD protocol requires a valid depositor account.

Authentication is performed with the HTTP Basic method.

The HAL SWORD API lets you specify, in the On-Behalf-Of HTTP header, the identifier(s)/uid(s) of the account(s) on whose behalf the action is performed. The ";" character is used as a separator.

The metadata transmitted via this protocol is the metadata generally used in HAL, so you need to be familiar with how a deposit works in HAL. In particular, each deposit must be categorized under HAL's scientific disciplines and filed under one of the document types, etc. The reference lists used in HAL are available here:
https://api.archives-ouvertes.fr/docs/ref/
See below the structure of the document types: Fields of reference.

In case of errors, incomprehensible headers or incorrect content, the API uses the standard SWORD errors introduced in the section "Error handling" below.

Making a deposit

The entry point (BaseURL) of the SWORD API on HAL is
  https://api.archives-ouvertes.fr/sword

A test version is available:
https://api-preprod.archives-ouvertes.fr/sword
to test your developments in a HAL environment very close to production.

The servicedocument object

This service describes the characteristics of the HAL SWORD API. In particular, it lets you retrieve the XML formats accepted for record import.

The URL of this service is
  https://api.archives-ouvertes.fr/sword/servicedocument

Each HAL portal is defined as a collection in the SWORD sense. The deposit URL for the HAL portal is thus: https://api.archives-ouvertes.fr/sword/hal
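
For example, you can fetch the service document with curl (a quick sketch reusing the test account test_ws:test and the preprod server from the examples later in this post):

curl -v -u test_ws:test https://api-preprod.archives-ouvertes.fr/sword/servicedocument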

The metadata format

The metadata format to use for SWORD import into HAL is based on the TEI format.
http://www.tei-c.org/index.xml
The URL to use in the Packaging HTTP header is
http://purl.org/net/sword-types/AOfr

The XML schema is available at https://api.archives-ouvertes.fr/documents/aofr-sword.xsd.

The full set of possible metadata can be found here: the complete XML format with integrated documentation
https://api.archives-ouvertes.fr/documents/all.xml

See the end of this post for more details.

Examples

Examples by document type, which you can test yourself, are available on the CCSD Github repository:
https://github.com/CCSDForge/HAL/tree/master/Sword

The deposit

Depositing via this API consists of transferring (HTTP POST request):
  • either an XML file alone,
  • or a ZIP archive containing the XML descriptive record of the resource plus one or more attached files corresponding to the full text and referenced in the XML file.

The XML format is announced to the server via the Packaging HTTP header. The Content-Type header must be set to tell the server the content type (application/zip or text/xml). For a ZIP archive, the name of the XML file is given in the Content-Disposition HTTP header. You can ensure the integrity of the uploaded content by providing its MD5 signature in the Content-MD5 HTTP header.

HAL offers 4 specific HTTP headers for deposit:
  • Export-To-Arxiv: whether the deposit should also be transferred to the arXiv archive [true or false, default: false]
  • Export-To-PMC: whether the deposit should also be transferred to the PubMed Central archive [true or false, default: false]
  • Hide-For-RePEc: hides the deposit from the repository exposed to RePEc in the HAL archive [true or false, default: false]
  • Hide-In-OAI: hides the deposit from the OAI-PMH repository and the Sitemap [true or false, default: false]

https://en.wikipedia.org/wiki/PubMed_Central
https://fr.wikipedia.org/wiki/ArXiv

For a record-only deposit without a document (the reference to a file, target="DOC.pdf", must be removed), the success response is "202 Accepted".
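
Putting the pieces together, a full deposit request could look like this sketch (the archive name depot.zip and the account test_ws:test are reused from the update examples below; the On-Behalf-Of value is illustrative, and a Content-MD5 header can be added the same way for an integrity check):

curl -v -u test_ws:test https://api-preprod.archives-ouvertes.fr/sword/hal -X POST -H "Packaging:http://purl.org/net/sword-types/AOfr" -H "Content-Type:application/zip" -H "Content-Disposition: attachment; filename=meta.xml" -H "Export-To-Arxiv: false" -H "On-Behalf-Of: uid1;uid2" --data-binary @depot.zip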


A Web form for depositing your XML or ZIP file is available at https://api.archives-ouvertes.fr/sword/upload/


Updates

Updating a document

Once a deposit has been accepted into the HAL archive, its content can be replaced by depositing a new version. Via the SWORD API, use an HTTP PUT request on the URL https://api.archives-ouvertes.fr/sword/%resource identifier%, including in the HTTP body either a ZIP file (file + XML) or an XML file in the format announced in the Packaging header. The Content-Type header must be set to tell the server the content type. For a ZIP archive, the name of the XML file is given in the Content-Disposition HTTP header. You can ensure the integrity of the uploaded content by providing its MD5 signature in the Content-MD5 HTTP header.

Example:

PUT https://api.archives-ouvertes.fr/sword/hal-00000002 HTTP/1.1
Authorization: Basic ZGFmZnk6c2VjZXJldA==
Content-Type: application/zip
Packaging: http://purl.org/net/sword-types/AOfr
Content-Disposition: attachment; filename=meta.xml

ZIP file content
...
On success, the server responds with "201 Created" and a body describing a SWORD entry.

For a record-only deposit, without a document, the success response is "202 Accepted".
curl -v -u test_ws:test https://api-preprod.archives-ouvertes.fr/sword/hal-00000010 -H "Packaging:http://purl.org/net/sword-types/AOfr" -X PUT -H "Content-Type:application/zip" -H "Content-Disposition: attachment; filename=comm.xml" --data-binary @depot.zip

Updating the metadata

Once a deposit has been accepted into the HAL archive, its descriptive metadata can be corrected/modified. Via the SWORD API, use an HTTP PUT request on the URL https://api.archives-ouvertes.fr/sword/%resource identifier%v%version%, including in the HTTP body the XML in the format announced in the Packaging header. You can ensure the integrity of the uploaded content by providing its MD5 signature in the Content-MD5 HTTP header.

Example:

PUT https://api.archives-ouvertes.fr/sword/hal-00000002v3 HTTP/1.1
Authorization: Basic ZGFmZnk6c2VjZXJldA==
On-Behalf-Of: test
Content-Type: text/xml
Packaging: http://purl.org/net/sword-types/AOfr

<?xml version="1.0"?>
<TEI>
...

On success, the server responds with "200 OK" and a body describing a SWORD entry.

curl -X PUT -d @comm.xml -v -u test_ws:test https://api-preprod.archives-ouvertes.fr/sword/hal-00000050 -H "Packaging:http://purl.org/net/sword-types/AOfr" -H "Content-Type:text/xml"

Deletion

Resources can be deleted from HAL via the SWORD API, for record-only deposits (without a document) or deposits not yet accepted, with an HTTP DELETE request on the URL https://api.archives-ouvertes.fr/sword/%resource identifier%v%version%

Example:

DELETE https://api.archives-ouvertes.fr/sword/hal-00000002 HTTP/1.1
Authorization: Basic ZGFmZnk6c2VjZXJldA==
On success, the server responds with "204 No Content".

curl -X DELETE -v -u test_ws:test https://api-preprod.archives-ouvertes.fr/sword/hal-00000010

The status of a deposit

An HTTP GET request on the URL https://api.archives-ouvertes.fr/sword/%resource identifier%v%version% returns the status of a resource.

The returned status is one of:

  • accept: the resource is online
  • replace: the resource has a newer version
  • verify: the deposit is under validation
  • update: modification(s) were requested during validation
  • delete: the resource was refused

The comment field gives the reason(s) for the refusal or for the requested modification.

If the logged-in user is the depositor or one of the owners of the deposit, the deposit password is also returned (the password attribute of document).

Example: GET https://api.archives-ouvertes.fr/sword/hal-00000001v1
        Return schema:
        <document id="hal-00000001" version="1">
            <status>accept|replace|verify|update|delete</status>
            <comment></comment>
        </document>

curl -X GET -v -u test_ws:test https://api-preprod.archives-ouvertes.fr/sword/hal-00000002

Error handling

Error handling is described in Section 12 of the SWORD specification, which introduces an element called sword:error.
http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#errordocuments
This message provides more information than the plain 4xx and/or 5xx HTTP status codes.

The body of the error response is a classic Atom node whose root element is sword:error, with an href attribute containing a URI that identifies the error, a title field giving an error code, and a human-readable error message in the summary element and/or sword:verboseDescription.

The HTTP status for all errors not defined in the SWORD specifications is "400 Bad Request".

The SWORD response

On success of the POST (deposit) or PUT (modification or new version) verbs, the server returns an XML document: an Atom Entry Document.
http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html#depositreceipt

The identifier, the password and the version of the deposited resource are available in this response.

Complete example response:
        <?xml version="1.0" encoding="utf-8"?>
        <entry xmlns="http://www.w3.org/2005/Atom" xmlns:sword="http://purl.org/net/sword/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:hal="http://hal.archives-ouvertes.fr/">
            <title>Accepted media deposit to HAL</title>
            <id>hal-00000001</id>
            <hal:password>XXXXXXXX</hal:password>
            <hal:version>1</hal:version>
            <updated>2015-03-27T15:04:31+01:00</updated>
            <summary>A media deposit was stored in the HAL workspace</summary>
            <sword:treatment>stored in HAL workspace</sword:treatment>
            <sword:userAgent>HAL SWORD API Server</sword:userAgent>
            <source>
            <generator uri="https://api.archives-ouvertes.fr/sword" version="1.0">hal@ccsd.cnrs.fr</generator>
            </source>
            <link rel="alternate" href="https://hal.archives-ouvertes.fr/hal-00000001"/>
        </entry>

How to make an XML import?

See the .docx below; here is just the intro:

  1. Before using this service, you must already have an account on HAL or create a new one.
  2. Create your XML files with an XML editor. Consult the section “1.2. Help for the XML file construction”. Beware: you have to create one XML file for each publication.
  3. You can make 2 different types of uploads: a deposit without any attached file (a record, "notice") or a deposit with attached file(s). Consult the section “1.3.2. General elements: <editionStmt>”
  4. Validate your XML file against the XML schema of HAL in an XML editor, or on the Linux command line by copying the XML schema of HAL to your disk and launching the following command (see also the sketch after this list):
    xmllint --noout TEI_COUV_complet_Sword.xml --schema aofr-sword.xsd
  5. If you want, you can import your files via a Web API.
  6. Connect with your HAL login and password.
  7. Import your articles by transferring your XML files one by one.
  8. After your submission, there is a validation step.
  9. After validation, and if no modification is required, your document is online.
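
For step 4, a minimal sketch of the two commands (the schema URL is the one given earlier in this post; the XML file name is the one from the example above):

curl -O https://api.archives-ouvertes.fr/documents/aofr-sword.xsd
xmllint --noout TEI_COUV_complet_Sword.xml --schema aofr-sword.xsd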


Guideline doc file

https://github.com/CCSDForge/HAL/blob/master/Sword/SWORD_import_HAL.doc





Structure of the HAL TEI XML file

TEI for HAL import/export
The standard format chosen for import/export in HAL.
Breakdown and explanation of HAL's TEI file:
http://aramis.resinfo.org/wiki/lib/exe/fetch.php?media=ateliers:aramis-hal-v3-le-format-tei_25_02_2015.pdf

Conversion between many formats

http://www.tei-c.org/oxgarage/#
OxGarage is a web (RESTful) service to manage the transformation of documents between a variety of formats. The majority of transformations use the Text Encoding Initiative format as a pivot format.
https://github.com/TEIC/Oxgarage

The TEI Special Interest Group on Scholarly Publishing, approved and created in June 2009, focuses on the use of TEI in original scholarly publication, as the medium of authorship and archival format from which print, Web, ebook, and other formats may be derived.
http://www.tei-c.org/Activities/SIG/Publishing/index.xml

Tools for the TEI formats

The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation.
http://www.tei-c.org/index.xml

Example:
http://www.tei-c.org/Activities/Projects/ta01.xml
and see the raw XML structure:
http://www.tei-c.org/Activities/Projects/ta01.xml?style=raw

Tools

http://www.tei-c.org/Tools/




Mendeley Data gives DOIs. FORCE11 joint declaration of data citation principles (final)


Mendeley Data is a place where researchers can upload and share their research data for free. Datasets can be shared privately amongst individuals, as well as published to share with the world. Sharing research data is great for science as it enables data reuse and supports reproducibility of studies. It’s also a fantastic way to gain exposure for your research outputs, as every dataset has a DOI and can be cited.

When you publish your data with our service, you choose a licence to publish it under, from a range of Creative Commons and open software licences. This means you retain control of the data, and choose the terms under which others may consume and reuse it. You may delete your dataset at any time, by contacting us.

Your data is stored on Amazon S3 servers, in Germany, where it benefits from redundancy and multiple backups. Our service has been extensively penetration tested and received certification. In addition, we partner with DANS (Data Archiving and Network Services - an industry-leading scientific data archive service), to preserve your data over the long term. This means your dataset will be discoverable in perpetuity, via the DOI it is issued on publication. If you have any further questions, please contact us.

Datasets must be:

  • scientific in nature
  • research data, rather than the research article which may have resulted from the research

Datasets must not:

  • have already been published (and therefore must not already have a DOI)
  • contain executable files or archives that are not accompanied by individually detailed file descriptions
  • contain copyrighted content (audio, video, images, etc.)
  • contain sensitive information (for example, but not limited to: patient details, dates of birth, etc.)
All services provided by Mendeley Data - storing, posting and accessing data - are free-to-use. In future, we may introduce a freemium model - for instance charging for storing and posting data, above a certain dataset size threshold. This will not affect existing datasets, which will continue to be stored for free. We will offer paid-for versions of our service to institutions.


https://data.mendeley.com/

https://data.mendeley.com/faq

FORCE11

https://www.force11.org/group/joint-declaration-data-citation-principles-final
(2014)

Interoperability and Flexibility

Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Ten search engines for researchers that go beyond general search engines (Google)


Open access search engines

CORE

What is it?
An experimental service, allowing keyword and semantic search of over 10 million open access articles.

Key feature: If you find an article you like, CORE will find similar ones by analysing the text of that article.


Aggregating the world’s open access research papers

We offer seamless access to millions of open access research papers, enrich the collected data for text-mining and provide unique services to the research community.

https://core.ac.uk/

an example:
https://core.ac.uk/display/8507685
(my PLOS-ONE)

BASE

What is it?
BASE is one of the world's most voluminous search engines especially for academic open access web resources from over 2,000 sources.

Key features: Allows you to search intellectually selected resources and their bibliographic data, including those from the so-called ‘deep web’, which are ignored by commercial search engines. There are several options for sorting the results list and you can browse by Dewey Decimal Classification and document type.


Library catalogues

Copac

What is it?
A Jisc service allowing you to look through the catalogues of over 70 major UK and Irish libraries.

Key features: Good for locating books and other material held in research collections in the UK; especially useful for the humanities.


France: sudoc


Web Scale Discovery services

What is it?
Many university libraries have one of these services working behind the scenes; they index a vast range of academic resources and provide sophisticated search tools.

Key features: The search includes journal articles, e-books, reviews, legal documents and more that are harvested from primary and secondary publishers, aggregators and open-access repositories.

Zetoc

What is it?
One of the world’s most comprehensive research databases, this Jisc service gives you access to over 28,000 journals and more than 52 million article citations and conference papers through the British Library’s electronic table of contents.

Key features: Researchers can get email alerts of the table of contents in journals, keeping them up to date with the latest literature in their field.

Europeana

What is it?
This is a meta-catalogue of cultural heritage collections from a range of Europe's leading galleries, libraries, archives and museums. The catalogue includes books and manuscripts, photos and paintings, television and film, sculpture and crafts, diaries and maps, sheet music and recordings.

Features: You can download your resource, print it, use it, save it, share it and play with it.

Social web

Twitter

What is it?
Harness the power of social discovery and particularly the #icanhazpdf hashtag for locating PDFs that you do not have access to through your institution.

Features: Tweet an article you need using this hashtag and someone will point you to a copy that you can access.


Ref.
https://www.jisc.ac.uk/blog/ten-search-engines-for-researchers-that-go-beyond-google-11-jul-2013?from=promo

JISC: article processing charge in UK; Institution as e-textbook publisher


We’re working to develop services, provide support, and influence policy in order to enable UK higher education to realise the rewards of open access (OA). Read our introductory guide and find out more about our role in open access.
https://www.jisc.ac.uk/content/open-access

 APC (article processing charge) data
https://www.jisc.ac.uk/monitor-UK

About Monitor UK

Monitor UK presents APC (article processing charge) data, from across the UK, in a number of simple reports. This enables institutions and funders to explore and evaluate UK cost and compliance data relating to open access publishing.

The reports made available provide business intelligence not otherwise available, which can be sliced and diced by publisher and institution and filtered by date range in an easy-to-use web interface.

Learn more about the benefits, and register for access, on the Monitor website.
http://monitor.jisc.ac.uk/uk

Institution as e-textbook publisher

Jisc Collections is funding four project teams from UK higher-education institutions to investigate the viability of publishing their own e-textbooks.

The fundamental question they seek to address is: Will the institution as e-textbook creator help students by providing a more affordable higher education, and promote a better, more sustainable information environment for libraries, students and faculty?

Four project teams have been selected to carry out this work: The University of Liverpool, The University of Nottingham, The University of the Highlands & Islands with Edinburgh Napier University and University College London.

The programme started in April 2014 and is scheduled to finish in September 2018, when all books should have been published for a minimum of two years. This website aims to provide the background and details of the work each team is carrying out from the menu on the left hand side.

https://www.jisc-collections.ac.uk/Institution-as-E-textbook-Publisher/

https://www.jisc-collections.ac.uk/Institution-as-E-textbook-Publisher/Programme/Participating-institutions/

Friday, January 13, 2017

DOI, ARK and EZID (a service of CDL, $500/yr)



EZID is a service of the California Digital Library.

Create and manage long-term globally unique IDs for data and other sources using EZID

$500/yr for DOIs and ARKs (1 million identifiers/yr)

DOIs
Recommended for final products, published materials, and citation. Use for objects under good long-term management.

ARKs
Recommended for projects requiring more flexibility, early or unpublished versions of material.



DataCite (INIST, CERN). Metadata Schema. JATS, DOI. Both human- and machine-readable metadata. Embed the metadata using JSON and a script tag; the latter approach is easier to implement, as all metadata are in a single place, and the JSON can be embedded dynamically via a script.

Introduction

DataCite Metadata Schema
Documentation for the Publication and Citation of Research Data

https://www.datacite.org/

Members of the Metadata Working Group

Madeleine de Smaele, TU Delft (co‐chair of working group)
Joan Starr, California Digital Library (co‐chair of working group)
Jan Ashton, British Library

Amy Barton, Purdue University Library
Tina Bradford, NRC/CISTI (New)
Anne Ciolek‐Figiel, Inist‐CNRS
Stefanie Dietiker, ETH Zurich (New)
Jannean Elliott, DOE/OSTI
Berrit Genat, TIB
Karoline Harzenetter, GESIS
Barbara Hirschmann, ETH Zurich (Departing)
Stefan Jakobsson, SND (New)
Jean‐Yves Mailloux, NRC/CISTI (Departing)
Lars Holm Nielsen, CERN (Departing)
Mohamed Yahia, Inist‐CNRS
Frauke Ziedorn, TIB (On leave, Metadata Supervisor)

resources

mainly a schema
https://schema.datacite.org/
The DataCite Metadata Schema is a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions.

DataCite Metadata Working Group. (2016). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 4.0. DataCite e.V. http://doi.org/10.5438/0013

many examples

https://schema.datacite.org/meta/kernel-4.0/

DOI

DataCite does not allocate DOIs directly; this activity is undertaken by many of DataCite’s members, who act as DOI allocating agents. DataCite members enable data owners, stewards, or archives to assign persistent identifiers to research data. The list below provides details and contact information for all of DataCite’s members.
INIST CNRS, CERN, CDL...
https://www.datacite.org/members.html

source code

The DataCite assets server.
https://github.com/datacite/segugio
This repository holds the official metadata schemas from DataCite as required by the DataCite Metadata Store.
https://github.com/datacite/schema

Uses the middleman static site generator
https://middlemanapp.com/

Using Schema.org for DOI Registration

a "post" of the DataCite blog : https://doi.org/10.5438/0000-00CC

Three weeks ago we started assigning DOIs to every post on this blog, https://blog.datacite.org (Fenner, 2016c). The process we implemented uses a new command line utility and integrates well with our publishing workflow, with (almost) no extra effort compared to how we published blog posts before.

Given that DataCite is a DOI registration agency, we obviously are careful about following best practices for assigning DOIs. DataCite focusses on DOIs for research data, but many of the general principles can also apply to blog posts. And we have learned a few things already.

Using schema.org metadata embedded in landing pages

Our initial implementation collected the metadata required for DOI registration in a way that is specific to a particular type of blogging software, so-called static site generators. While popular, this leaves out a large number of blogs, for example every blog hosted by Wordpress, by far the most popular blogging platform. We have now relaunched our blog to collect metadata differently, generic enough to work for any blog, but also well aligned with best practices for DOIs.

Our practice is that every DOI should resolve to a landing page, and that landing page should provide both human- and machine-readable metadata.

Machine-readable metadata can be embedded into web pages in a number of ways. Traditionally this was done using HTML meta tags; more recent approaches to embedding metadata in HTML include microdata, microformats and RDFa. An alternative approach is to embed the metadata using JSON and a script tag. The latter approach is easier to implement, as all metadata are in a single place, and the JSON can be embedded dynamically via a script.

As for the vocabulary, the DataCite Metadata Schema has never been widely used for metadata embedded in web pages. Dublin Core Metadata (“Dublin Core Metadata Element Set, Version 1.1,” 2012) are often used for metadata in HTML meta tags. Schema.org is an initiative started in 2011 with many of the same goals as Dublin Core, namely to create, maintain, and promote schemas for structured data on the Internet.
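
To illustrate that approach, a minimal JSON-LD script tag using schema.org vocabulary might look like the sketch below (all values are placeholders, not taken from the DataCite blog):

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "@id": "https://doi.org/10.5438/0000-00CC",
  "headline": "Example blog post title",
  "author": {"@type": "Person", "name": "Jane Doe"},
  "datePublished": "2016-12-15"
}
</script>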

(...)

DOI minting workflow

Publishing a blog post with embedded schema.org metadata, which is then used to mint a DOI and register DOI metadata, changes the DOI minting workflow for this blog. Although the publication workflow of a blog is much simpler than for peer-reviewed content, there are still three distinct phases:

  • post is drafted by author
  • post is shared for feedback with staff (and possibly others)
  • post is published

Blog posts in JATS XML

Blog posts are web pages and the landing page for the DOI also contains the fulltext of the post. But there are good reasons to make a blog post also available in downloadable form, most importantly to facilitate reuse, and for archiving. Journal Article Tag Suite (JATS) is an XML standard for tagging journal articles, used by the PubMed Central full-text archive of biomedical literature and by an increasing number of scholarly publishers.

JATS is an appropriate format for the blog posts of this blog, and starting this week all of our posts are also available in JATS XML format. You can see the download URL in the schema.org markup (the JATS for this post is here), we will add a more visible link to all posts once some minor tagging issues are resolved. We will also start registering the download URL with the DataCite MDS as media, making the JATS XML available to DOI content negotiation, and thus direct download. This should facilitate reuse by others, e.g. aggregation of content from multiple sources and display of content in different formats. This blog uses the Creative Commons Attribution license, allowing the copying, redistribution and remixing of the material in any medium or format for any purpose.
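
As a sketch of content negotiation, you can ask doi.org for DataCite XML metadata directly (using the DOI of the quoted post; application/vnd.datacite.datacite+xml is the standard DataCite media type):

curl -LH "Accept: application/vnd.datacite.datacite+xml" https://doi.org/10.5438/0000-00CC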






"Renovated" careers in Higher Education and Research. January 2017


4 tracks (filières) and 8 corps concerned

- Researchers track: chargés de recherche (research scientists).
- Teacher-researchers track: maîtres de conférences and equivalents; maîtres de conférences in the health disciplines.
- Engineers track (ITA-ITRF): assistant engineers, ingénieurs d'études, and ingénieurs de recherche of the two research-and-training tracks and of the public scientific and technological institutions.
- Libraries track: librarians, library curators

http://www.education.gouv.fr/cid111672/des-carrieres-renovees-et-mieux-remunerees-dans-l-enseignement-superieur-et-la-recherche.html

press kit:
http://cache.media.enseignementsup-recherche.gouv.fr/file/PPCR/93/8/ppcr_livret_694938.pdf


Thursday, January 12, 2017

API IMPORT in Zenodo, Zenodo Github. Research data repository and open access archives. ORCID and DataCite Metadata


Zenodo is a research data repository. It was created by OpenAIRE and CERN to provide a place for researchers to deposit datasets.
https://home.cern/about/updates/2013/05/cern-and-openaireplus-launch-european-research-repository

some examples

An example of a .zip with many PDFs

https://zenodo.org/record/168580#.WIeylGrNzdQ

7 blocks:

  1. Title
  2. Author
  3. Abstract
  4. Acknowledgments
  5. Frame with the pdf or zip...
  6. Files
  7. References




DOI





many export solutions

  1. BibTeX Export
  2. Citation Style Language JSON Export
  3. DataCite XML Export
  4. Dublin Core Export
  5. JSON Export
  6. MARC21 XML Export
  7. a link to Mendeley:
    https://www.mendeley.com/sign/in/?acw=&utt=


If you select JSON, for example, you get the result directly in the window:


another example: the COAR community

Publications and outputs from or related to the Confederation of Open Access Repositories (COAR). Topics on open access repositories, interoperability, usage data, vocabularies, training, licenses and more.
https://zenodo.org/communities/coar

another example, an article

https://zenodo.org/communities/2249-0205/?page=1&size=20
and a Google search gives this in 2nd position:

a web service

Zenodo, a CERN service, is an open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science.

DOI

Zenodo assigns all publicly available uploads a Digital Object Identifier (DOI) to make the upload easily and uniquely citeable. Zenodo further supports harvesting of all content via the OAI-PMH protocol.
Withdrawal of data and revocation of DOIs:
Content not considered to fall under the scope of the repository will be removed and associated DOIs issued by Zenodo revoked. Please signal promptly, ideally no later than 24 hours from upload, any suspected policy violation. Alternatively, content found to already have an external DOI will have the Zenodo DOI invalidated and the record updated to indicate the original external DOI. User access may be revoked on violation of Terms of Use.

The DOI comes from DataCite, not CrossRef, so you cannot use CrossRef's services.
http://stephane-mottin.blogspot.fr/2017/02/tous-les-doi-noffrent-pas-des-services.html

log

You can log in with:
  • your ORCID id/password
  • your GitHub username/password
  • an email/password

Upload

What can I upload?

All research outputs from all fields of science are welcome. In the upload form you can choose between types of files: publications (book, book section, conference paper, journal article, patent, preprint, report, thesis, technical note, working paper, etc.), posters, presentations, datasets, images (figures, plots, drawings, diagrams, photos), software, videos/audio and interactive materials such as lessons. We do check every piece of content being uploaded to ensure it is research related.

In the "description" field, which has a rich text editor, you cannot even copy/paste HTML, for example from a PLOS article. You cannot even insert a link.
You can enter an equation in TeX, for example in the form (between {}):
x = {-b \pm \sqrt{b^2-4ac} \over 2a}

community

Zenodo allows you to create your own collection and accept or reject uploads submitted to it. Creating a space for your next workshop or project has never been easier. Plus, everything is citeable and discoverable!

Want your own community?
It's easy. Just click the button to get started.
  • Curate — accept/reject what goes in your community collection.
  • Export — your community collection is automatically exported via OAI-PMH
  • Upload — get custom upload link to send to people
We currently accept up to 50GB per dataset (you can have multiple datasets); there is no size limit on communities.

Metadata types and sources

All metadata is stored internally in MARC according to the schema defined in http://inveniosoftware.org/wiki/Project/OpenAIREplus/DevelopmentRecordMarkup.
Metadata is exported in several standard formats such as MARCXML, Dublin Core, and DataCite Metadata Schema according to OpenAIRE Guidelines.
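
As a sketch, harvesting a community over OAI-PMH could look like this (zenodo.org/oai2d is Zenodo's OAI-PMH endpoint; the set name user-coar for the COAR community shown above is an assumption):

curl "https://zenodo.org/oai2d?verb=ListRecords&metadataPrefix=oai_dc&set=user-coar"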


Open source

Powered by Invenio
Zenodo is a small layer on top of Invenio (http://github.com/inveniosoftware/invenio), a free software suite enabling you to run your own digital library or document repository on the web.

code:
https://github.com/zenodo/zenodo


GitHub

Zenodo has integration with GitHub to make code hosted in GitHub citable.
  • Select the repository you want to preserve, and toggle the switch below to turn on automatic preservation of your software.
  • Go to GitHub and create a release. Zenodo will automatically download a .zip-ball of each new release and register a DOI.
  • After your first release, a DOI badge that you can include in GitHub README will appear next to your repository below.

https://zenodo.org/account/settings/github/

---

Ref.

https://en.wikipedia.org/wiki/Zenodo
https://en.wikipedia.org/wiki/Category:Open-access_archives

IMPORT in zenodo

resources

Invenio

Zenodo is a small layer on top of Invenio <http://github.com/inveniosoftware/invenio>, a free software suite enabling you to run your own digital library or document repository on the web.

Invenio is a free software suite enabling you to run your own digital library or document repository on the web. The technology offered by the software covers all aspects of digital library management, from document ingestion through classification, indexing, and curation up to document dissemination. Invenio complies with standards such as the Open Archives Initiative and uses MARC 21 as its underlying bibliographic format. The flexibility and performance of Invenio make it a comprehensive solution for management of document repositories of moderate to large sizes.

Invenio has been originally developed at CERN to run the CERN document server, managing over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, and more. Invenio is nowadays co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL, SLAC and is being used by many more scientific institutions worldwide.

zenodo interface

For a manual upload, there are 11 categories of fields:

  1. Upload type
    1. Book section
    2. ... Journal article, etc.
  2. Basic Info
    1. Date
    2. Title
    3. Authors (one by one)!!!
    4. Description (text only (with math formulas), no links!!!)
    5. Keywords
    6. Additional notes, for example a table of contents
  3. License
    1. Open
    2. CC 4.0; you must add its category
  4. Communities
    1. integrations (for example)
  5. Funding
    1. CNRS (for example)
  6. Related/alternate identifiers
    1. ISSN, ISBN, URL
  7. Contributors, for example the series editor
  8. Reference
  9. Journal
  10. c
  11. Book
    1. Publisher
    2. Place
    3. ISBN
    4. Book Title
    5. Page (of this book)

zenodo API

The process

an example:
Similar to figshare, Zenodo can store your data and give you a DOI to make it citable.
We have started to deposit all Brain Catalogue’s data at Zenodo, and soon you should be able to cite your favourite brains in your works.
Initially, we uploaded the data manually, but that became tedious very soon. Luckily, Zenodo has a very simple-to-use and well-documented API. In just 3 lines of code using curl you can easily deposit a data file and make it citable (full information is available at https://zenodo.org/dev).

Before starting anything you need to obtain a token, which is a random alphanumeric string that identifies your queries. You only need to do this once. With your token safely stored (I keep it in the $token variable), data uploading takes just 3 steps:

1. Create a new deposit and obtain a deposit ID:

curl -i -H "Content-Type: application/json" -X POST --data '{"metadata":{"access_right": "open","creators": [{"affiliation": "Brain Catalogue", "name": "Toro, Roberto"}],"description": "Brain MRI","keywords": ["MRI", "Brain"],"license": "cc-by-nc-4.0", "title": "Brain MRI", "upload_type": "dataset"}}' https://zenodo.org/api/deposit/depositions/?access_token=$token |tee zenodo.json

Zenodo responds with a json file, which here I’m saving to zenodo.json. Now you can use awk to parse that file and recover the deposit id. I do that like this:
zid=$(cat zenodo.json|tr , '\n'|awk '/"id"/{printf"%i",$2}')
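
If jq is installed, an alternative (not from the original post) that parses the JSON properly rather than splitting on commas:

zid=$(jq -r .id zenodo.json)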

With your deposit ID in hand, you are ready to upload your data file

2. Upload data file:

curl -i -F name=MRI.nii.gz -F file=@/path/to/the/data/file/MRI.nii.gz https://zenodo.org/api/deposit/depositions/$zid/files?access_token=$token

The server will respond with an HTTP 100 'Continue' message, and depending on the size of your file you'll have to wait some time. Once the upload is finished you are ready to

3. Publish your dataset:

curl -i -X POST https://zenodo.org/api/deposit/depositions/$zid/actions/publish?access_token=$token

And that’s it. You can now go to Zenodo and view the web page for your data


Ref.
http://siphonophore.org/blog/2016/01/16/at-brain-catalogue-we-love-zenodo/

---
A bug in JSON object
https://github.com/zenodo/zenodo/issues/865
in the online API Documentation for developers (https://zenodo.org/dev),
Resources > Representations > Deposition metadata > subjects,
the example JSON object for subjects is:
[{"term": "Astronomy",
"id": "http://id.loc.gov/authorities/subjects/sh85009003",
"scheme": "url"}]
but id is not supported and the JSON is rejected: the field must be named 'identifier'.
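
So the object actually accepted should be the same example with the field renamed:

[{"term": "Astronomy",
"identifier": "http://id.loc.gov/authorities/subjects/sh85009003",
"scheme": "url"}]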

resources

http://developers.zenodo.org/ 
(The Zenodo REST API documentation uses Slate.)

less useful: https://zenodo.readthedocs.io/



Ref. https://indico.cern.ch/event/533421/contributions/2330179/attachments/1378438/2094268/kumasi2016-practical-exercises-rest-api.pdf

blog zenodo

http://blog.zenodo.org/
Zenodo docs have landed!
by  Krzysztof Nowak on January 23, 2017

wiki zenodo

https://github.com/zenodo/zenodo/wiki/What's-new%3F

YAML Github
Zenodio is a Python package we’re building to interact with Zenodo. For our various doc/technote/publishing projects we want to use YAML files (embedded in a Git repository, for example) to maintain deposition metadata so that the upload process itself can be automated.
The zenodio.metadata sub package provides a Python representation of Zenodo metadata (but not File or Zenodo deposition metadata).
Zenodio is a simple Python interface for getting data into and out of Zenodo, the digital archive developed by CERN. Zenodo is an awesome tool for scientists to archive the products of research, including datasets, codes, and documents. Zenodio adds a layer of mechanization to Zenodo, allowing you to grab metadata about records in a Zenodo collection, or upload new artifacts to Zenodo with a smart Python API.
We’re still designing the upload API, but metadata harvesting is ready to go.
Zenodio is built by SQuaRE for the Large Synoptic Survey Telescope.
https://github.com/lsst-sqre/zenodio/tree/metadata_api
http://zenodio.lsst.io/en/latest/
https://jira.lsstcorp.org/browse/DM-4852

Differences between ORCID and DataCite (DOI) Metadata

THOR is a 30 month project funded by the European Commission under the Horizon 2020 programme. It will establish seamless integration between articles, data, and researchers across the research lifecycle. This will create a wealth of open resources and foster a sustainable international e-infrastructure.

Differences between ORCID and DataCite Metadata
One of the first tasks for DataCite in the European Commission-funded THOR project, which started in June 2015, was to contribute to a comparison of the ORCID and DataCite metadata standards. Together with ORCID, CERN, the British Library and Dryad we looked at how contributors, organizations and artefacts - and the relations between them - are described in the respective metadata schemata, and how they are implemented in two example data repositories, Archaeology Data Service and Dryad Digital Repository. The focus of our work was on identifying major gaps. Our report was finished and made publicly available in September 2015. The key findings are on these topics:
  • Common Approach to Personal Names
  • Standardized Contributor Roles
  • Standardized Relation Types
  • Metadata for Organisations
  • Persistent Identifiers for Projects
  • Harmonization of ORCID and DataCite Metadata

https://project-thor.readme.io/docs/differences-between-orcid-and-datacite-metadata

This document identifies gaps in existing PID infrastructures, with a focus on ORCID and DataCite Metadata and links between contributors, organizations and artefacts. What prevents us from establishing interoperability and overcoming barriers between PID platforms for contributors, artefacts and organisations, and research solutions for federated attribution, claiming, publishing and direct data access? It goes on to propose strategies to overcome these gaps:
https://zenodo.org/record/30799#.WIi5DmrNzdQ

PLOS format


pasted as text, with an image URL (for the formulas):
In biophotonics, the light absorption in a tissue is usually modeled by the Helmholtz equation with two constant parameters, the scattering coefficient and the absorption coefficient. This classic approximation of “haemoglobin diluted everywhere” (constant absorption coefficient) corresponds to the classical homogenization approach. The paper discusses the limitations of this approach. The scattering coefficient is supposed to be constant (equal to one) while the absorption coefficient is equal to zero everywhere except for a periodic set of thin parallel strips simulating the blood vessels, where it is a large parameter  The problem contains two other parameters which are small: , the ratio of the distance between the axes of vessels to the characteristic macroscopic size, and , the ratio of the thickness of thin vessels and the period. We construct asymptotic expansion in two cases: and and prove that in the first case the classical homogenization (averaging) of the differential equation is true while in the second case it is wrong. This result may be applied in the biomedical optics, for instance, in the modeling of the skin and cosmetics.

with a plain copy/paste:
In biophotonics, the light absorption in a tissue is usually modeled by the Helmholtz equation with two constant parameters, the scattering coefficient and the absorption coefficient. This classic approximation of “haemoglobin diluted everywhere” (constant absorption coefficient) corresponds to the classical homogenization approach. The paper discusses the limitations of this approach. The scattering coefficient is supposed to be constant (equal to one) while the absorption coefficient is equal to zero everywhere except for a periodic set of thin parallel strips simulating the blood vessels, where it is a large parameter  The problem contains two other parameters which are small: , the ratio of the distance between the axes of vessels to the characteristic macroscopic size, and , the ratio of the thickness of thin vessels and the period. We construct asymptotic expansion in two cases:  and  and prove that in the first case the classical homogenization (averaging) of the differential equation is true while in the second case it is wrong. This result may be applied in the biomedical optics, for instance, in the modeling of the skin and cosmetics.

"Your browser's security settings prevent the editor from accessing the clipboard data directly"


When you paste your clipboard into a page in your browser, for example into a rich text editor, you sometimes get this alert:
"Your browser's security settings prevent the editor from accessing the clipboard data directly. You must paste it again in this window."

By default, JavaScript is not allowed to read or set your clipboard data, for security and privacy reasons. This is because website scripts can erase and replace what you currently have in your clipboard (data loss issue) and they can read whatever you have in your clipboard (security and privacy issue); as such, you should grant access with caution. There are, however, instances when you might want to bypass this restriction for certain sites. Rich text editors (such as implementations of Mozilla's Midas) often require access to the clipboard to use copy/paste functions. Other sites may copy useful information to the clipboard for the user to paste elsewhere.

http://kb.mozillazine.org/Granting_JavaScript_access_to_the_clipboard
http://kb.mailchimp.com/fr/campaigns/design/prevent-formatting-problems-with-paste-from-rich-text

Wednesday, January 11, 2017

Sort / reverse an array in Excel


Goal:
copy/paste rows in Excel while reversing the sequence.
For example: I have in column A:
aaa
bbb
ccc
ddd
eee

and I would like to have in column B:
eee
ddd
ccc
bbb
aaa

Answer:

If the original values are already sorted in some order (ascending or descending), just use Sort.
If that is not the case, you have to improvise a little:
- take or add a helper column and set its cells to numeric format,
- enter 1, then (cell+1), and fill the cells down,
- sort this helper column in reverse order: the values of the column to reverse will then be in reverse order. (A formula alternative follows this list.)
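
A formula-based alternative (a sketch, assuming the values sit in A1:A5): enter this in B1 and fill down to B5:

=INDEX($A$1:$A$5, COUNTA($A$1:$A$5)-ROW()+1)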

Transpose an array (Excel)

with the TRANSPOSE function

https://support.office.com/en-us/article/TRANSPOSE-function-ed039415-ed8a-4a81-93e9-4b6dfac76027
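
For example (a quick sketch): select a destination range with the transposed shape, type the formula below, and confirm with Ctrl+Shift+Enter to enter it as an array formula:

=TRANSPOSE(A1:B5)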

with Copy / Paste Special

  1. Select a cell range and choose Edit→Copy.
  2. Select a destination cell.
  3. Choose Edit→Paste Special.
  4. Select the Transpose check box and then click OK.

https://support.office.com/en-us/article/Transpose-rotate-data-from-rows-to-columns-or-vice-versa-3419f2e3-beab-4318-aae5-d0f862209744

Keyboard shortcuts for Google Sheets


https://support.google.com/docs/answer/181110?p=spreadsheets_shortcuts&visit_id=1-636197418403265415-4222612324&rd=1

To open a list of keyboard shortcuts in Google Sheets, press Ctrl + / (Windows, Chrome OS) or ⌘ + / (Mac).

You can also use menu access keys. Open any application menu using the keyboard, then type the underlined letter for the item you'd like to select. For example, to open the Insert menu on a Mac, press Ctrl + Option + I. To select "Image," type the underlined letter i.

Monday, January 9, 2017

Lodel, from Cléo CNRS (openEdition.org)


Lodel is electronic publishing software. It publishes online articles produced in a word processor.

Lodel is electronic publishing software that is simple to use and adaptable to particular needs. It belongs to the family of content management systems (CMS) and specializes in publishing long, complex texts in a highly structured editorial environment.

Document template for Microsoft Word
https://github.com/OpenEdition/lodel/wiki/Mod%C3%A8le-de-document-pour-Microsoft-Word
This version of the document template introduces Word macros that automate the correction and cleaning of texts. These macros save a lot of time when preparing texts.

Lodel is free software. It is developed 99.9% by Cléo, which produces the OpenEdition platforms (including Revues.org and OpenEdition Books). The code is shared under the GPLv2 license on GitHub, but our team is small and has changed a lot in recent years. We have little time to answer the questions asked on the mailing list; we thank in passing all the subscribers who take the time to propose answers.

Lodel is poorly documented. Reading the code and the error logs is often the only way to solve problems. To read the error logs you need access to them, which is not always the case on shared hosting. Lodel is not well "packaged": at Cléo we run it on Debian, but we have not tested other installation environments, and in other configurations adaptations may be necessary. Lodel has existed for a long time; its base dates from the first half of the 2000s and it has evolved on that base. It is dated, in particular in its design and in some of the libraries used (pear, for example).

So, to summarize: getting started is not simple, and you need IT skills and perseverance to use it...


Installation

Note that a pre-installed version of Lodel (and OTX, the Word/Office to XML/TEI conversion application), as a Debian Linux virtual machine image, can be downloaded from: http://lodel.org/downloads/vms/

Prerequisites

  • Use your own Linux server; Lodel is not usable on shared hosting.
  • An HTTP server (nginx, apache) with PHP
  • A MySQL server
  • to be used with OTX, max_allowed_packet and key_buffer must be set very high (16 M)

Procedure

  • Preferably clone the latest tagged version (see the sketch after this list).
  • Point the virtual host at the root of the Lodel installation.
  • The HTTP server user must have read permission on all the files.
  • Create a database and a user with modification rights on that database.
  • Go to the configured address with a web browser and follow the instructions.
  • You will have to temporarily grant write permission on the folder of a site instance.
  • Check that, inside a site's folder, the HTTP server user has write permission on the folders: upload, docannexe, docannexe/file, docannexe/image, lodel/sources, lodel/icons
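
A minimal sketch of the first steps on the server (the tag name is illustrative; pick the latest one shown by git tag):

git clone https://github.com/OpenEdition/lodel/
cd lodel
git tag            # list the available tagged versions
git checkout <latest-tag>
chmod -R a+r .     # give the HTTP server user read access to all files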


https://lodel.org/
http://cleo.openedition.org/


https://github.com/OpenEdition/lodel/

Document-oriented database management systems (JSON: CouchDB) (XML: native XML databases), rather than rows and columns



JSON

Apache CouchDB is a document-oriented database management system, written in Erlang and distributed under the Apache license.

Designed for the Web, it is part of the NoSQL movement and was designed to be distributed across multiple servers.

Design
Instead of being organized in rows and columns, a CouchDB database is a collection of JSON documents.
Moreover, CouchDB includes an HTTP server for running queries, and it returns its data as JSON.
You can therefore query a CouchDB server directly with a web browser, or run queries with JavaScript.

The main operations are MAP and REDUCE (see the MapReduce article, https://fr.wikipedia.org/wiki/MapReduce). These operations are useful when the database is distributed; they are subject to commutativity, associativity and idempotence constraints.
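
For example, a map/reduce view is defined in a design document and queried over HTTP (a minimal sketch, assuming a local CouchDB at localhost:5984 with a database named mydb):

curl -X PUT http://localhost:5984/mydb/_design/example -d '{"views": {"by_type": {"map": "function(doc) { emit(doc.type, 1); }", "reduce": "_count"}}}'
curl 'http://localhost:5984/mydb/_design/example/_view/by_type?group=true'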

https://fr.wikipedia.org/wiki/CouchDB
https://en.wikipedia.org/wiki/CouchDB (more complete)

http://couchdb.apache.org/

XML

A native XML database (NXD) is a database that relies on the data model provided by XML. It typically uses XML query languages such as XPath or XQuery.

Indexing in an XML database requires indexing not only the content of the elements but also the structure and the relations between elements, so that XPath queries like /foo/bar use the index.


There are a number of reasons to directly specify data in XML or other document formats such as JSON. For XML in particular, they include:

  • An enterprise may have a lot of XML in an existing standard format
  • Data may need to be exposed or ingested as XML, so using another format such as relational forces double-modeling of the data
  • XML is very well suited to sparse data, deeply nested data and mixed content (such as text with embedded markup tags)
  • XML is human readable whereas relational tables require expertise to access
  • Metadata is often available as XML
  • Semantic web data is available as RDF/XML

Steve O'Connell gives one reason for the use of XML in databases: the increasingly common use of XML for data transport, which has meant that "data is extracted from databases and put into XML documents and vice-versa". It may prove more efficient (in terms of conversion costs) and easier to store the data in XML format. In content-based applications, the ability of the native XML database also minimizes the need for extraction or entry of metadata to support searching and navigation.

https://en.wikipedia.org/wiki/XML_database
https://fr.wikipedia.org/wiki/BaseX

Glossa supporters are encouraging colleagues not just to submit to Glossa, but also to abandon Lingua (Elsevier), which they now call “zombie Lingua.”


Academics Want You to Read Their Work for Free (or small article processing charge)
https://en.wikipedia.org/wiki/Article_processing_charge


https://en.wikipedia.org/wiki/Lingua_(journal)
Lingua: An International Review of General Linguistics is a peer-reviewed academic journal of general linguistics that was established in 1949 and is published by Elsevier.
https://www.journals.elsevier.com/lingua

The editorial board of the former Lingua continues publishing their journal. As Elsevier insists it holds the rights to the journal title “Lingua”, the original editorial board now publishes under the new name Glossa, in association with Ubiquity Press.
http://www.glossa-journal.org/
https://en.wikipedia.org/wiki/Ubiquity_Press

---

In 2012, more than 12,000 researchers vowed to boycott Elsevier for supporting the Research Works Act (RWA), a bill that would have made it illegal for federal grants to require grantees to publish the work in open-access journals. Members of the academic community saw this as a move to protect big publishers’ business interests while restricting open-access options. More recently, Elsevier was hit with another wave of negative publicity for issuing takedown notices to scientists sharing copies of their published research on their personal websites and on Academia.edu, a social-networking site for academics.

Now, Glossa supporters are encouraging colleagues not just to submit to Glossa, but also to abandon Lingua, which they now call “zombie Lingua.”

“Glossa is the new Lingua—same [editorial] processes, same team, same editorial board, same editors. Only the name changes,” says Rooryck. On blogs and online message boards, Glossa supporters have been rallying their colleagues to refrain from submitting, reviewing, or editing papers for Lingua. Scores of authors are moving their Lingua submissions to Glossa; Rooryck says that thus far, between regular submissions and a Lingua special issue, authors have pulled around 100 papers from Lingua and transferred them to Glossa.

Harry Whitaker, the interim editor-in-chief of Lingua, disapproves of the Glossa editorial board’s approach. “What’s the point of trying to tear down Lingua?” he asks. “It doesn’t add anything to whatever luster Glossa may acquire.”

Whitaker, who founded two other Elsevier journals and has a combined 50 years of editorial experience with the company, came into his new position after he heard about the former Lingua board’s actions and contacted Elsevier to express his dismay. “I disagreed with just about everything they were doing,” he said. He came out of retirement to sign a new contract with Elsevier in early January, and has since recruited several interim editors. He says that he and his editorial staff have received a fair amount of animosity from Glossa supporters.

But Whitaker stands firmly in favor of for-profit publishing; noting that publishers’ profits allow them to invest in new projects. (Elsevier gave Whitaker funds to found two new journals—Brain and Cognition and Brain and Language.) Plus, he says, profits ensure longevity. “That’s one of the many reasons I support the idea of a publisher that makes money,” he says. “Lingua will be here when I retire, and Lingua will be here when I die.”

The fate of Cognition, meanwhile, remains to be seen. Barner and Snedeker plan to submit their petition to Elsevier on Wednesday. “The battle has been taken from a very small region—linguistics—to a much larger one,” says Rooryck. Barner and Snedeker are staying silent about their long-term plans, but their request sends a clear message to publishers: Scientists are ready for change.

http://www.theatlantic.com/science/archive/2016/01/elsevier-academic-publishing-petition/427059/