OD Standards Definition

An open data standard is a set of specifications (or requirements) for how a set of data should be made publicly available. Generally, open data standards describe data about a particular subject, for example service requests (Open311) or building permits (BLDS). Like the data they describe, open data standards are generally developed “in the open”, meaning that anyone who is interested has a way to contribute. There are several types of standards, three of which are explained below:
  • Schematic: Schematic standards define the structure of the data to be published. This includes the names, descriptions, and data types of data fields or columns. Schematic standards also may include how one dataset is related to another.
  • Semantic: Semantic standards define the terminology or language used in the published data. For example, U.S. FBI NIBRS / Uniform Crime Reporting semantics include the terms “Arson” and “Robbery,” with a detailed definition of what each term means. (Note: NIBRS also includes schematic standards.)
  • Atomic: Atomic standards define how basic elements of data must be represented when there is an opportunity for confusion. Atomic standards may represent individual data values or a combination of data values. For example, a date and time should be formatted as “2017-01-01T13:00:00Z” (the ISO 8601 representation for “January 1, 2017 at 1 p.m. Coordinated Universal Time”). As another example, a spatial location should be represented as “+37.5665,+126.9780” (the WGS84 coordinates for Seoul, South Korea).
Some open data standards combine two or more of the above types where it makes sense to do so. For example, BLDS recommends publishing a data element named PermitClassMapped which may contain one of two values: “Residential” or “Non-Residential”. BLDS further defines “Residential” as ‘Residential (Single Family/Duplex)’ and “Non-Residential” as ‘Non-Residential (Commercial, Industrial, Institutional).’
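
The atomic and semantic examples above can be sketched in a few lines of Python. This is only an illustration, not part of any standard: the function names and the four-decimal coordinate precision are assumptions made for the example, and the vocabulary is the two-value PermitClassMapped list quoted above.

```python
from datetime import datetime, timezone

# Semantic vocabulary: the two PermitClassMapped values quoted in the text.
PERMIT_CLASS_MAPPED = {"Residential", "Non-Residential"}


def format_timestamp(moment: datetime) -> str:
    """Render a time-zone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("timestamp must carry a time zone")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def format_location(latitude: float, longitude: float) -> str:
    """Render WGS84 coordinates as signed decimal degrees (hypothetical 4-digit precision)."""
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        raise ValueError("coordinates out of range")
    return f"{latitude:+.4f},{longitude:+.4f}"


def check_permit_class(value: str) -> bool:
    """Return True when the value belongs to the semantic vocabulary."""
    return value in PERMIT_CLASS_MAPPED


if __name__ == "__main__":
    print(format_timestamp(datetime(2017, 1, 1, 13, 0, tzinfo=timezone.utc)))
    print(format_location(37.5665, 126.9780))
    print(check_permit_class("Residential"))
```

A publisher could run checks of this kind before release so that every value follows one representation and one vocabulary.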

Example

Open data standards allow data to be more easily consumed and repurposed for valuable ends. Consider the General Transit Feed Specification (GTFS). Through collaborations with public and private organizations, a standard for public transportation data was developed. Since its establishment, several municipalities across North America have adopted the standard. GTFS has made it easier for citizens to know what public transportation to take to get to their destination on time.

Metrics

  • Open License

    What qualifies a standard as being “open” is debated. However, openness may be inferred when the standard is published under an open license. Open licenses state that anyone has the right to repurpose and share the material with few or no restrictions. Examples of open licenses include public domain licenses, the UK Open Government Licence v3.0, Creative Commons licenses, and Open Data Commons licenses (World Bank, Open Data Essentials).

  • Transferable to Other Jurisdictions

    Standards differ in how easy they are to implement. For example, publishing to a standard in CSV format requires minimal resources and technical knowledge. More complex and sophisticated approaches, such as RDF and SOAP, are harder for municipal bodies to implement, and they often prove unmanageable for municipal actors that lack resources and technical background. Standards that handle dynamic data through APIs (queried, for example, with cURL) exemplify more complex ways of publishing city datasets.

  • Stakeholder Participation

    Stakeholders for a standard include civil society, government, and the private sector. An open standard should aim to include all types of stakeholders in its conception and maintenance. Types of stakeholder participation can be inferred based on the types of publisher reputations.

  • Consensus-Based Governance

    Standardization implies an ongoing dialogue between producers and consumers of data. It is important to note that consensus-based governance does not mean that every input is accepted whenever a majority agrees. Instead, it indicates a process that is willing to address any request pertaining to the standard’s statement of purpose. A charter that makes decisions about the standard’s evolution transparent supports a consensus-based approach. Consensus-based governance can be inferred from the presence of a mailing list or an active working group for the standard.

  • Extensions

    This metric indicates the flexibility of a standard’s implementation. The extensions available for a standard provide insight into how it is being implemented and enhanced for specific purposes.

  • Human Readable

    Human readability requires that data or information be presented in a medium that people can easily understand. The standard should therefore encode the data using easily identifiable text. There are also semantic considerations for human-readable standards: for example, data encoded as text may be open to a variety of interpretations.

  • Machine Readable

    Acceptable machine-readable structures include XML, RSS feed, CSV, RDF, JSON, TXT, XLS(X), and KML formats. Formats that are not machine readable include PDF, HTML, DOC(X), anything scanned, anything faxed, and anything typed in an email (Suszan, 2014). Standards ought to complement techniques that provide both human- and machine-readable structures for the data. Publishing data as machine readable includes (1) establishing standard vocabularies, (2) enriching the HTML resources with metadata, semantics, and identifiers, and (3) implementing simple, manageable, and stable URIs (Bennett and Harvey, 2009). Data tables, according to the standard’s specification, should be normalized so that they can be incorporated into a relational database.

  • Requires Up-To-Date Data

    This metric varies depending on the domain of the data. Some domains require formats that handle data in real time, while others may only require the standard to specify that data be updated quarterly or annually. For example, standards that handle transit and road construction data would require a web feed format to deliver updates as developments occur, whereas budget datasets only require a quarterly or yearly update. In practice, many municipal publishers still publish data in static files.

  • Takes into Account Associated Metadata for the Dataset

    This metric checks whether the standard’s schema requires metadata. A “yes” for this metric indicates the presence of both descriptive and structural metadata for the primary data. Each standard should make readily available the time and date of the data’s creation, the author, the location of the data on the network, and information about any standard applied to the raw data. Metadata should have embedded permanent and/or discoverable URIs and should use electronic citations of the data in the form of hyperlinks (Bennett and Harvey, 2009).
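
As a rough illustration of the Machine Readable and metadata metrics above, the sketch below writes records as CSV and reports which metadata fields are missing. The key names in REQUIRED_METADATA and the permit_id column are hypothetical and chosen only for this example. They loosely mirror the items listed above (creation time, author, location, applied standard) and are not taken from any published standard.

```python
import csv
import io

# Hypothetical metadata keys mirroring the items listed in the text.
REQUIRED_METADATA = ("created", "author", "location", "standard")


def rows_to_csv(rows, fieldnames) -> str:
    """Serialize a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def missing_metadata(metadata: dict) -> list:
    """List the required metadata keys that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not metadata.get(key)]


if __name__ == "__main__":
    rows = [{"permit_id": "B-001", "PermitClassMapped": "Residential"}]
    print(rows_to_csv(rows, ["permit_id", "PermitClassMapped"]))
    print(missing_metadata({"created": "2017-01-01T13:00:00Z", "author": "City Clerk"}))
```

An empty list from missing_metadata would correspond to a “yes” for the metadata metric under these assumed keys.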

Categories

Originally defined by Jury Konga, the categories have evolved to accommodate a growing inventory of open data standards.