Scholarly Communication

Copyright and Data Curation

Digital technologies have engendered new research methodologies that can render mass collections or assemblages of things as data and analyze them as such. Things such as images, the millions of books on Google Books, or commercial databases of scholarly research articles that were originally created to be viewed or read can now be mined for data, coded, and analyzed statistically.

These new technologies and research methods, like many technologies before them, raise copyright questions. In addition, the advent of open data policies from the U.S. government, foundations, and other grant funders has raised questions from researchers about who owns data; what legal protections, if any, exist for data; and how other researchers may use such data. These questions arise throughout the life cycle of data, from its creation to its archiving and its possible licensing for use by other researchers.

Data and its curation clearly raise other legal issues as well, including privacy, cybersecurity, trade secrets, and patent law. In the context of copyright law, data implicates two issues: the subject matter of copyright, or what is copyrightable, and ownership, or who owns the copyright in copyrightable intellectual property.

Data v. Databases

By data, I mean the raw content that is assembled, collected, or generated and then subjected to statistical analysis and interpretation. Illustrations or representations of the analyzed data in tables, charts, or graphs present related but separate copyright issues.

By databases, I mean the organization of the data: how data elements relate to one another and how they are arranged in a structured set, typically stored in a computer and made accessible and manipulable by means of software applications.

Copyrightability of Data and Databases

U.S. copyright law has very little to say, at least directly, about either data or databases. Instead, it provides a framework for establishing the subject matter of copyright (what is copyrightable) and who owns copyrightable intellectual property once it has been created. It then protects that intellectual property through specifically enumerated exclusive rights granted to copyright owners.

Under the law, copyright protection is granted to “…original works of authorship fixed in any tangible medium of expression…” Much data will not be copyrightable because it fails the first requirement for copyright protection, namely, originality. While many sources of data, such as images or texts in a database, are of course copyrightable, the data generated from those sources, like data sets generally, does not constitute an “original work of authorship” as described by the Copyright Act and litigated in numerous cases.

This might not make sense to a lot of researchers: if a researcher designs a study, runs experiments or conducts surveys, and collects and compiles the data, isn’t that work original, and isn’t the researcher its author? Yes, in a certain sense, but not in the sense that matters for copyright. Copyright is intended to incentivize the publication and distribution of creative works. Facts and data aren’t considered original works of authorship because they are not “created” so much as “engendered” by, or the result of, a researcher’s methods. They are discovered and compiled, and copyright does not reward that effort.

Moreover, data is typically factual or informational, and U.S. copyright does not protect facts or information. It is not possible to copyright facts, ideas, procedures, processes, methods, systems, concepts, formulas, algorithms, principles or discoveries, although such things might be protectable by patent law.

Similarly, while U.S. copyright law does protect compilations, Congress has not seen fit to extend copyright protection to databases themselves. There could nevertheless be a thin layer of copyright protection in a database, premised on choices regarding what data to include in the database, the organization of the data, or defining the relationships between different data elements. Such creative decisions potentially meet the requirements for copyrightability and copyright protection.

Ownership and Protection of Data and Databases

Because databases and data content vary in their copyrightability, and because copyright protects only copyrightable works, different strategies are required to manage the ownership and protection of data and databases. Copyright can govern the use of databases and of data content that qualifies as “an original work of authorship.” Other mechanisms must be relied on to regulate access to and use of data and databases more broadly: typically access controls based on authentication, together with contracts and licensing agreements that restrict the extraction and reuse of the data or other contents of a database.

Data Curation and Licensing

Ideally, repository collections of data will provide information regarding the terms of use for the database and its data content. The Open Data Commons group (http://opendatacommons.org) has developed three standard licenses based on copyright and contract principles. They are:
1. Public Domain Dedication and License (PDDL): This dedicates the database and its content to the public domain, free for everyone to use as they see fit.
2. Attribution License (ODC-By): Users are free to use the database and its content in new and different ways, provided they give attribution to the source of the data and/or the database.
3. Open Database License (ODC-ODbL): ODbL stipulates that any subsequent use of the database must provide attribution, that an unrestricted version of the new product must always be accessible, and that any new products made using ODbL material must be distributed under the same terms. It is the most restrictive of all ODC licenses.
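
For curators who record license terms alongside a data set's metadata, the three licenses above can be captured as a small lookup table. The following is only an illustrative sketch in Python: the class name, field names, and the `obligations` helper are hypothetical, and the true/false values paraphrase the summaries in the list above rather than the licenses' legal text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Obligations a reuser takes on, as summarized in the list above."""
    name: str
    attribution: bool   # must credit the source of the data and/or database
    keep_open: bool     # an unrestricted version of the new product must stay accessible
    share_alike: bool   # new products must be distributed under the same terms


# Keys are the short identifiers from the list above. The values are a
# paraphrase for illustration (an assumption), not the licenses' legal text.
ODC_LICENSES = {
    "PDDL": LicenseTerms("Public Domain Dedication and License", False, False, False),
    "ODC-By": LicenseTerms("Attribution License", True, False, False),
    "ODC-ODbL": LicenseTerms("Open Database License", True, True, True),
}


def obligations(license_id: str) -> list[str]:
    """Return the reuse obligations for one license as plain-language strings."""
    terms = ODC_LICENSES[license_id]
    duties = []
    if terms.attribution:
        duties.append("attribute the source")
    if terms.keep_open:
        duties.append("keep an unrestricted version accessible")
    if terms.share_alike:
        duties.append("distribute new products under the same terms")
    return duties


if __name__ == "__main__":
    for license_id in ODC_LICENSES:
        print(license_id, "->", obligations(license_id) or ["no obligations"])
```

A table like this is a convenience for sorting and filtering repository records; anyone actually reusing a database should still read the full license.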


2 Comments

  • Naz Pantaloni says:

    Thanks, Gail,

    We are interested and I think many readers of this blog at IU would appreciate being aware, if they aren’t already, of the work of the Research Data Alliance. I am sharing a link to your website for those who wish to pursue it: https://www.rd-alliance.org/.

  • Gail Clement says:

    Helpful to see your thoughts! You and your colleagues might find some interest in the extensive work of the Research Data Alliance’s Group on Legal Interoperability of Research Data, where lawyers, researchers, funding agencies, librarians, and open science enthusiasts have collaborated to devise Principles and Guidelines. We’d be grateful for any feedback you might have on our reports and presentations, ideas for areas needing further work, and hope you might consider joining the group!
