Protein Ontology Linked Open Data

PRO Dataset Description

VoID

High-quality, consistent metadata is essential to the successful discovery, exchange, and querying of a Linked Dataset. The Protein Ontology Linked Dataset is accompanied by a full VoID metadata description that complies with the W3C HCLS specification. The full VoID description is at void.ttl.

Summary Level Description

This level provides a description of a dataset that is independent of a specific version or format.

  • Prefix

    @prefix : <#> .
    @prefix void: <http://rdfs.org/ns/void#> .                        # Describing Linked Datasets with the VoID Vocabulary
    @prefix void-ext: <http://ldf.fi/void-ext#> .                     # Extensions to the Vocabulary of Interlinked Datasets (VoID)
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .      # RDF Syntax
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .           # RDF Schema
    @prefix owl: <http://www.w3.org/2002/07/owl#> .                   # OWL ontology
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .                # XML Schema
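    @prefix dcterms: <http://purl.org/dc/terms/> .                    # Dublin Core Metadata Terms
    @prefix dct: <http://purl.org/dc/terms/> .                        # Dublin Core Metadata Terms (short prefix used in the examples below)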
    @prefix dctypes: <http://purl.org/dc/dcmitype/> .                 # Dublin Core Metadata Types
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .                      # Friend-of-a-Friend
    @prefix cito: <http://purl.org/spar/cito/> .                      # Citation Typing Ontology
    @prefix dcat: <http://www.w3.org/ns/dcat#> .                      # Data Catalog
    @prefix freq: <http://purl.org/cld/freq/> .                       # Collection Description Frequency Vocabulary
    @prefix idot: <http://identifiers.org/idot/> .                    # Identifiers.org vocabulary
    @prefix lexvo: <http://lexvo.org/ontology#> .                     # Lexical Vocabulary
    @prefix pav: <http://purl.org/pav/> .                             # Provenance Authoring and Versioning ontology
    @prefix prov: <http://www.w3.org/ns/prov#> .                      # PROV Ontology
    @prefix schemaorg: <http://schema.org/>  .                        # schema.org vocabulary
    @prefix sd: <http://www.w3.org/ns/sparql-service-description#> .  # SPARQL 1.1 Service Description
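    @prefix sio: <http://semanticscience.org/resource/> .             # Semanticscience Integrated Ontology (SIO)
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .            # Simple Knowledge Organization System (used by the linkset)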

  • Dataset Identification and Declaration of Type

    All summary and version level descriptions are typed as "dctypes:Dataset". The distribution level description can also be typed as "dctypes:Dataset" but needs to be additionally typed as a "dcat:Distribution". RDF formatted datasets are typed as an instance of a "void:Dataset".

    :pro
        rdf:type dctypes:Dataset .

  • Title

    The "dct:title" defines a human-readable title for the dataset. Alternative or older titles can be specified using "dct:alternative".

    :pro
        dct:title "PRO"@en ; 
        dct:alternative "Protein Ontology"@en . 
    

  • Description

    The "dct:description" describes the contents of the dataset.

    :pro
        dct:description """PRO describes the relationships of proteins and protein evolutionary classes, delineates the multiple protein forms of a gene locus (ontology for protein forms), protein complexes, and interconnects existing ontologies. Further information is available at http://www.proteininformationresource.org/pro/."""@en .
    

  • Publisher

    The "dct:publisher" states the person, organisation, or service that is responsible for publishing the dataset.

    :pro
        dct:publisher [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] .
    

  • Change Frequency

    The "dct:accrualPeriodicity" specifies the update frequency of the dataset, using a value from the Dublin Core Collection Description Frequency vocabulary (DCFreq).

    :pro
        dct:accrualPeriodicity freq:quarterly .
    

  • Webpage and Logo

    The "foaf:page" links to a human-readable web page. "schemaorg:logo" links to an image file containing the logo for the dataset.

    :pro
        foaf:page <https://proconsortium.org/> ;
        schemaorg:logo <https://proconsortium.org/PROlogofinalEDGclear.png> .
    

  • Keywords

    The "dcat:keyword" provides the keywords and topics of coverage for the dataset.

    :pro
        dcat:keyword "Biomedical ontology"^^xsd:string, "Protein ontology"^^xsd:string, "Community annotation"^^xsd:string, "Protein"^^xsd:string .
    

  • Licensing and Rights

    The "dct:license" states the license under which the dataset is published.

    :pro
        dct:license <https://creativecommons.org/licenses/by/4.0/> ;
        dct:rights """The PRotein Ontology is licensed under CC BY 4.0. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially. You must give appropriate credit (by using the original ontology IRI for the whole ontology or original term IRIs for individual terms), provide a link to the license, and indicate if any changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.""" .
    

  • Language

    The "dct:language" declares the languages in which the dataset is published.

    :pro
        dct:language <http://lexvo.org/id/iso639-3/eng> .
    

  • References

    The "cito:citesAsAuthority" from the Citation Typing Ontology (CiTO) links to publications about the dataset.

    :pro
        cito:citesAsAuthority <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210558/> .
    

  • Preferred and Alternative Prefixes

    The "idot:preferredPrefix" specifies the short names for the dataset. The "idot:alternatePrefix" specifies the alternate short names for the dataset.

    :pro
        idot:preferredPrefix "pro" ;
        idot:alternatePrefix "prodb" .
    

Version Level Description

This level captures version-specific characteristics of a dataset.

  • Versioning

    The "dct:isVersionOf" relates the version level description to the summary level description.

    :pro61_0
        dct:isVersionOf :pro .
    

    The "pav:version" declares the version identifier.

    :pro61_0
        pav:version "61_0"^^xsd:string .
    

    The "pav:previousVersion" links to the previous version of the dataset.

    :pro61_0
        pav:previousVersion :pro60_0 .
    

    The "pav:hasCurrentVersion" declares the current version of the dataset.

    :pro
        pav:hasCurrentVersion :pro61_0 .
    

  • Dates of Creation and Issuance

    The "dct:created" states the date the dataset was generated. "dct:issued" states the date the dataset was made public.

    :pro61_0
        dct:created "2020-08-18"^^xsd:date ;
        dct:issued "2020-08-18"^^xsd:date .
    

  • Authorship, Creation, Curation

    The "dct:creator" states the individual or organization responsible for creating the dataset. The value should be an IRI that can be resolved for more information. The PAV ontology can be used for fine-grained attribution of a creation event. For example, "pav:authoredBy" states the URIs of the authors and "pav:authoredOn" the date of authorship; "pav:createdBy" states the URIs of the creators and "pav:createdOn" the date of creation; "pav:curatedBy" states the URIs of the curators and "pav:curatedOn" the date of curation.

    :pro61_0
        dct:creator [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
        pav:authoredBy [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
        pav:authoredOn "2020-08-18"^^xsd:date ;
        pav:createdBy [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
        pav:createdOn "2020-08-18"^^xsd:date ;
        pav:curatedBy [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
        pav:curatedOn "2020-08-18"^^xsd:date .
    

Distribution Level Description

This level captures metadata about a specific form and version of a dataset.

  • Distributions and Formats

    The version level description should link to the distribution level descriptions, which represent the files in their different formats.

    :pro61_0
        dcat:distribution :pro61_0rdf .
    
    :pro61_0rdf
        a dctypes:Dataset, dcat:Distribution, void:Dataset ;
        dct:format "application/rdf+xml" .
    

  • Vocabularies

    The "void:vocabulary" from VoID describes the RDFS vocabularies or OWL ontologies that represent the data.

    :pro61_0rdf
        void:vocabulary <http://www.w3.org/ns/dcat#>, <http://purl.org/dc/terms/> .
    

  • Conformance

    The "dct:conformsTo" indicates a particular format or standard the dataset conforms to.

    :pro61_0rdf
        dct:conformsTo <http://www.w3.org/2001/sw/hcls/notes/hcls-dataset/> .
    

  • Subsets

    The "dct:hasPart" indicates the parts of the dataset.

    :prolod61_0rdf
        dct:hasPart :pro61_0rdf, :paf61_0rdf ;
        void:subset :pro61_0rdf, :paf61_0rdf .
    

  • File Locations

    The "dcat:downloadURL" declares the distribution file. The "dcat:byteSize" declares the size of the distribution file. For RDF resources, the "void:dataDump" declares the distribution file. The "dcat:accessURL" specifies a directory containing the files of interest.

    # Summary level declaration
    :pro
        dcat:accessURL <https://lod.proconsortium.org/releases/latest/> .
    
    # Version level declaration 
    :pro61_0
        dcat:accessURL <https://lod.proconsortium.org/releases/release_61.0/> ;
        dcat:downloadURL <https://lod.proconsortium.org/releases/release_61.0/pro.owl> ;
        dcat:landingPage <https://lod.proconsortium.org/release.html> .
    
    # Distribution level declaration
    :pro61_0rdf
        dcat:accessURL <https://lod.proconsortium.org/releases/release_61.0/> ;
        dcat:downloadURL <https://lod.proconsortium.org/releases/release_61.0/pro.owl> ;
        void:dataDump <https://lod.proconsortium.org/releases/release_61.0/pro.owl> .
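
The relationship between a version identifier and its file locations can be sketched in Python. This is a minimal illustration, not part of the PRO tooling: the helper name `release_urls` is hypothetical, and it assumes that the directory naming seen in the distribution level declaration above (version "61_0" stored under release_61.0) holds for other releases as well.

```python
# Base URL taken from the declarations above.
RELEASE_BASE = "https://lod.proconsortium.org/releases/"


def release_urls(version: str, filename: str = "pro.owl") -> dict:
    """Return the accessURL and downloadURL for a pav:version such as "61_0".

    Assumption: the directory name is "release_" followed by the version
    with its underscore replaced by a dot, as in release_61.0.
    """
    access_url = f"{RELEASE_BASE}release_{version.replace('_', '.')}/"
    return {"accessURL": access_url, "downloadURL": access_url + filename}


urls = release_urls("61_0")
print(urls["accessURL"])
print(urls["downloadURL"])
```

For RDF distributions the same download URL would also serve as the "void:dataDump" value.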
    

  • SPARQL Query Endpoint

    The "void:sparqlEndpoint" specifies a SPARQL endpoint at the summary level.

    :pro
        void:sparqlEndpoint <https://sparql.proconsortium.org/virtuoso/sparql> .
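
As a rough sketch of how a client might use the endpoint declared above, the following Python snippet prepares a SPARQL query request using only the standard library. The query is illustrative and assumes the example term carries an rdfs:label; the helper name `build_request` is hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the void:sparqlEndpoint declaration above.
ENDPOINT = "https://sparql.proconsortium.org/virtuoso/sparql"

# Illustrative query: fetch the label of one PRO term (assumes a label exists).
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://purl.obolibrary.org/obo/PR_000000002> rdfs:label ?label
}
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request that passes the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results through content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute it; whether the endpoint honours the requested result format is not stated in this description.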
    

  • Dataset Documentation

    The documentation for the dataset is made available by using the "dcat:landingPage".

    :prolod61_0
        dcat:landingPage <https://lod.proconsortium.org/dataset.html> .
    

  • Identifier, Resource, and Access Patterns

    The "idot:identifierPattern" gives a regular expression that matches the identifiers of items or records in the dataset. It is stated in the distribution level description.

    :pro61_0rdf
        idot:identifierPattern "PR_\\d+"^^xsd:string .
    

    The "void:uriRegexPattern" denotes a superset of data item URIs in the dataset using a regular expression pattern for distribution level description.

    :pro61_0rdf
        void:uriRegexPattern "http://purl.obolibrary.org/obo/PR_\\d+" .
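
The two patterns above can be tried out with a short Python sketch. The function names are hypothetical, and the use of a full match (rather than a substring search) is an assumption about how the patterns are meant to be applied. Note that the Turtle literal "PR_\\d+" unescapes to the regular expression PR_\d+.

```python
import re

# Patterns copied from the distribution level description above,
# after undoing the Turtle string escaping (\\d becomes \d).
IDENTIFIER_PATTERN = re.compile(r"PR_\d+")
URI_PATTERN = re.compile(r"http://purl.obolibrary.org/obo/PR_\d+")


def is_pro_identifier(value: str) -> bool:
    """True if the whole string matches the idot:identifierPattern."""
    return IDENTIFIER_PATTERN.fullmatch(value) is not None


def is_pro_uri(value: str) -> bool:
    """True if the whole string matches the void:uriRegexPattern."""
    return URI_PATTERN.fullmatch(value) is not None


print(is_pro_identifier("PR_000000002"))
print(is_pro_identifier("PR_Q96PN6"))
print(is_pro_uri("http://purl.obolibrary.org/obo/PR_000000002"))
```

Identifiers such as PR_Q96PN6, which appear in the PRO-UniProt linkset, do not match this digit-only pattern.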
    

  • Example Identifier and Resource

    The "idot:exampleIdentifier" provides an example identifier for distribution level descriptions.

    :pro61_0rdf
        idot:exampleIdentifier "PR_000000002"^^xsd:string .
    

    The "void:exampleResource" provides an example resource for distribution level descriptions.

    :pro61_0rdf
        void:exampleResource <https://sparql.proconsortium.org/virtuoso/describe/?uri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPR_000000002> .
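
The example resource above is the describe service URL with the term URI percent-encoded into its query string. A minimal Python sketch of that construction follows; the helper name `describe_url` is hypothetical, and the base URLs are copied from the declarations in this section.

```python
from urllib.parse import quote

DESCRIBE_BASE = "https://sparql.proconsortium.org/virtuoso/describe/"
OBO_BASE = "http://purl.obolibrary.org/obo/"


def describe_url(identifier: str) -> str:
    """Build the describe URL for a PRO identifier such as PR_000000002."""
    term_uri = OBO_BASE + identifier
    # safe="" also encodes ":" and "/", matching the example resource above.
    return f"{DESCRIBE_BASE}?uri={quote(term_uri, safe='')}"


example = describe_url("PR_000000002")
print(example)
```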
    


Linked Dataset

Linksets identify the content that links instances in one dataset with instances in another dataset. A separate linkset is created for each link predicate relating a particular pair of datasets. A linkset should be declared to be a subset of the dataset that publishes it. The linkset itself MUST be declared to be of type void:Linkset and provide the same metadata as an RDF distribution. Some metadata properties are specific to a linkset: the source and target of the links (the two datasets being linked, given by void:subjectsTarget and void:objectsTarget) and the predicate used in the links (void:linkPredicate). The relevant statistic for a linkset is the number of triples it contains, which can be reported using the void:triples property.

PRO-UniProt Linkset

:pro61_0-uniprotkb-exactMatch-linkset
#Linkset specific Metadata
	a void:Linkset ;
	void:subjectsTarget :pro61_0;
	void:objectsTarget <http://purl.uniprot.org/void#UniProtDataset_2019_11> ;
	void:linkPredicate skos:exactMatch ;
#Metadata for a RDF distribution
	a dctypes:Dataset, dcat:Distribution, void:Dataset ;
	dcterms:format "text/turtle" ;
	dcterms:title "PRO Version 61_0 to UniProtKB ExactMatch Linkset"@en ;
	dcterms:description """A linkset connecting PRO Version 61_0 to UniProtKB with skos:exactMatch linkPredicate"""@en ;
	dcterms:created "2020-08-18"^^xsd:date ;
	dcterms:issued "2020-08-18"^^xsd:date ;
	dcterms:creator [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
	pav:authoredBy [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
	pav:authoredOn "2020-08-18"^^xsd:date ;
	pav:createdBy [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
	pav:createdOn "2020-08-18"^^xsd:date ;
	pav:curatedBy [ foaf:page <https://proconsortium.org/pro_cst.shtml> ] ;
	pav:curatedOn "2020-08-18"^^xsd:date ;
	dcterms:license <https://creativecommons.org/licenses/by/4.0/> ;
	dcterms:rights """The PRotein Ontology is licensed under CC BY 4.0. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially. You must give appropriate credit (by using the original ontology IRI for the whole ontology or original term IRIs for individual terms), provide a link to the license, and indicate if any changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.""" ;
	dcterms:language <http://lexvo.org/id/iso639-3/eng> ;
	void:vocabulary <http://www.w3.org/ns/dcat#>, <http://purl.org/dc/terms/> ;
	dcterms:conformsTo <http://www.w3.org/2001/sw/hcls/notes/hcls-dataset/> ;
#Identifiers
	idot:identifierPattern "PR_\\w+"^^xsd:string ;
	void:uriRegexPattern "http://purl.obolibrary.org/obo/PR_\\w+" ;
	idot:accessIdentifierPattern  "https://sparql.proconsortium.org/virtuoso/describe/?url=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FPR_\\w+" ;
	idot:exampleIdentifier "PR_Q96PN6"^^xsd:string ;
#Provenance and Change
	pav:version "61_0"^^xsd:string ;
	pav:previousVersion :pro60_0-uniprotkb-exactMatch-linkset ;
	dcterms:isVersionOf :pro-uniprotkb-exactMatch-linkset ;
#Availability/Distributions
	dcat:distribution :paf61_0rdf-uniprotkb-exactMatch-linkset ;
	dcat:downloadURL <https://lod.proconsortium.org/releases/release_61.0/pro-uniprotkb-exactMatch-linkset.ttl> ;
	dcat:landingPage <https://lod.proconsortium.org/release.html> ;
	void:sparqlEndpoint <https://sparql.proconsortium.org/virtuoso/sparql> .
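
To illustrate what the content of such a linkset and its void:triples statistic might look like, here is a small Python sketch. It rests on two assumptions that are not stated in this description: that a PRO term PR_<accession> is linked to the UniProtKB entry with the same accession, and that UniProtKB entries use URIs of the form http://purl.uniprot.org/uniprot/<accession>. The function names are hypothetical.

```python
PRO_BASE = "http://purl.obolibrary.org/obo/"
UNIPROT_BASE = "http://purl.uniprot.org/uniprot/"  # assumed target URI form


def linkset_triples(accessions):
    """Return one skos:exactMatch triple (Turtle syntax) per accession.

    Assumption: PR_<accession> corresponds to UniProtKB <accession>.
    """
    return [
        f"<{PRO_BASE}PR_{acc}> skos:exactMatch <{UNIPROT_BASE}{acc}> ."
        for acc in accessions
    ]


def triples_statement(linkset: str, triples) -> str:
    """Return the void:triples statement reporting the size of the linkset."""
    return f"{linkset} void:triples {len(triples)} ."


triples = linkset_triples(["Q96PN6"])
print(triples[0])
print(triples_statement(":pro61_0-uniprotkb-exactMatch-linkset", triples))
```

The single accession used here comes from the idot:exampleIdentifier above; a real linkset would hold one such triple per mapped term.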