With its expansive reach, health data covers many subject areas, from analytics and biomedical research to mobile health applications, policy, and the organizations that oversee it. Even for a veteran of the health data sphere, the terminology can be obscure or confusing. The Health Datapedia strives to clarify some of these commonly used terms.
Accountable Care Organizations (ACOs): A network of doctors, hospitals or other healthcare providers that share financial and medical responsibility for providing coordinated care to patients in hopes of limiting unnecessary spending
AHRQ: Agency for Healthcare Research and Quality
All-Payer Claims Databases (APCDs): Databases created by state mandate that typically include data derived from medical claims, pharmacy claims, eligibility files, provider files, and dental claims from private and public payers. In states without a legislative mandate, there may be voluntary reporting of these data[17]
Anonymize: The process of removing all personal identifying information from data
Application Programming Interface (API): An interface implemented by a software program that enables it to interact with other software. APIs are important in the healthcare space because they allow third-party programmers (or innovators) to adapt existing legacy systems to evolving new systems
Armed Forces Health Longitudinal Technology Application (AHLTA): The current Department of Defense electronic medical records system
Attribution License: A license that requires an original source of licensed material to be cited
Authentication: Determining the identity of a principal
Authorization: Determining the rights of an authenticated principal
Big Data: The definition of “big data” is constantly evolving but it is often characterized by three “Vs”—volume, velocity and variety
Biobank: A large repository of tissue samples collected from patients over the course of a research study
Blue Button: A broad initiative across healthcare payers that seeks to empower patients by enabling them to access their own medical information in electronic form
Business Associate: Under HIPAA, a business associate is a person to whom a “covered entity” discloses PHI so that the business associate can carry out, assist with, or perform a function or activity on behalf of the covered entity. Business associates can include lawyers, auditors, data storage companies that maintain PHI, vendors of PHRs, and subcontractors that create, receive, maintain or transmit PHI on behalf of the business associate
CDC: Centers for Disease Control and Prevention
Clinical Data Research Networks (CDRNs): Networks that conduct randomized comparative effectiveness studies using data from clinical practice in a large, defined population
Clinical Decision Support: Providing relevant information to a clinician at the right time and place in order to provide ideal healthcare
Clinical Document Architecture (CDA): A markup standard developed by the organization Health Level 7 (HL7) that defines the structure of clinical documents, including discharge summaries and progress notes. Clinical documents can include images, text and other forms of multimedia. CDA is based on XML
Cloud-based: Technology that allows software to be run and data to be stored on remote servers[1]
Connectivity: The ability of communities to connect to the Internet
CMS: Centers for Medicare and Medicaid Services
Common Rule: A federal rule of ethics that governs biomedical and behavioral research involving human subjects in the U.S.
Comparative effectiveness research: Research that informs clinical decisions by comparing evidence on the benefits, harms and effectiveness of various medical treatments
Covered Entit(ies): Includes a healthcare provider that conducts certain financial and administrative transactions electronically (e.g., billing, eligibility and funds transfers), a health plan or a healthcare clearinghouse. Should an individual, organization or agency meet this definition, it must comply with HIPAA’s privacy and security requirements
Data Access Protocol: A system that grants outsiders access to a database without overloading it.
Database rights: A right that prevents third parties from extracting and reusing content from a database. Common in many European jurisdictions.
Data-centric: A focus on specific data relevant to a given task
Data element access services: Services associated with crawling, indexing, security, identity, authentication, authorization and privacy
Data element indexing: A process and infrastructure for locating data elements
Data evangelist: A person who champions the benefits and use of data
Data Minimization: The concept that companies should limit the data they collect and retain, and dispose of it once they no longer need it[16]
Data mining: The computational process of discovering patterns in large data sets
Data provenance: The history of the data’s origin, ownership, use and subsequent modifications
Dataset(s): A collection of related sets of information that are comprised of separate elements but can be manipulated as a unit by a computer
Data segmentation: The electronic labeling or tagging of a patient’s health information in a way that allows patients or providers to share portions, but not all, of a patient record.
Data warehouse: A system used in computing for reporting and data analysis. Data is integrated from one or more sources to create a central repository of data
De-identified data: Data that lacks a patient’s identifying information but that can still be used to provide information back to the patient under specified circumstances
Digital breadcrumbs: Small, often unconnected pieces of information that are produced by mobile phones, the Internet and other trackable daily activities such as shopping or commuting[2]
Digital signature: A cryptographic method for verifying that data came from the person who signed it and has not been altered since it was signed
Electronic Health Record (EHR): An electronic record of a patient’s health-related information. An EHR may contain information from clinical visits, lab and imaging studies or other information pertaining to a patient’s medical history
Electronic Medical Record (EMR): An electronic record of a patient’s medical and clinical record gathered in one provider’s office
Electronic Protected Health Information (ePHI): Information in electronic form that concerns the health status, provision or payment for healthcare and that can be linked back to a specific individual
Encryption: The technology of making data indecipherable to anyone except the person with the corresponding “key”
Enterprise Architecture: Defined in the JASON Report as the way a specific enterprise’s business processes are organized
FDA: Food and Drug Administration
Fast Healthcare Interoperability Resources (FHIR): The latest standard developed by the HL7 organization. FHIR defines a set of “resources,” granular clinical concepts that can be maintained independently or aggregated into complex documents. This offers some flexibility in addressing interoperability problems.
Genome: Genetic material of an organism
Genotype: Genetic makeup of a specific human being
Health Data Enclaves: A “secure environment that allows for remote access to confidential data. The environment is firewalled from outside intrusion, only accessible to authorized users and all information outflow and inflow is controlled and monitored by experienced confidentiality officers.”[3] A health data enclave could not only help aggregate siloed health data but allow stakeholders to utilize powerful analytic tools to assess large health datasets in a secure, HIPAA-compliant setting
Health Information Exchange (HIE): The exchange of electronic health information across organizations within a community, region or hospital system
Health Information Technology (HIT): Technologies that manage and transmit health information for use by providers, payers, consumers and other stakeholders
HHS: Department of Health and Human Services
Health Insurance Portability and Accountability Act (HIPAA): Enacted by Congress in 1996, HIPAA was intended to guarantee the availability and renewability of health insurance coverage as well as limit restrictions on pre-existing conditions. HIPAA also contained tax provisions relating to health insurance and provisions requiring HHS to issue standards that would facilitate the electronic transmission of health information without compromising patient privacy
HIPAA Privacy Rule: Establishes national standards to protect patients’ medical records and other personal health information. The Rule applies to health plans, clearinghouses and healthcare providers that conduct certain healthcare transactions electronically. The Rule also requires that proper safeguards are taken to protect the privacy of personal health information and sets limits and conditions on the use and disclosure of such information without patient authorization. The Rule also enables patients to have rights over their health information including the ability to examine and obtain a copy of their own medical records and to request corrections
HIPAA Security Rule: Establishes national standards to protect an individual’s electronic personal health information that is created, used, received or maintained by a covered entity. The Rule also requires appropriate administrative, physical and technical safeguards to ensure the confidentiality, integrity and security of electronic protected health information. Examples where the Security Rule would apply include electronic storage media, computer memory devices, and media used to exchange information already in electronic storage media (including email and internet)
HITECH Act: The Health Information Technology for Economic and Clinical Health (HITECH) Act. Enacted as part of the American Recovery and Reinvestment Act, the HITECH Act authorized at least $20 billion for the adoption and use of interoperable EHRs
HL7: A set of international standards developed to enable the transfer of clinical and administrative data between hospital information systems
Human Genome Project (HGP): An international research project dedicated to determining the sequence of chemical base pairs that make up human DNA and mapping all of the genes of the human genome. The Project was declared complete in April 2003
ICD-10: The tenth revision of the International Statistical Classification of Diseases and Related Health Problems, a medical classification list maintained by the World Health Organization (WHO). ICD-10 contains, among other things, codes for diseases, symptoms, abnormal findings, complaints and injuries
Information Asset Registers (IARs): Registers set up to capture and organize metadata about the vast quantities of information held by government departments and agencies. A comprehensive IAR includes, among other items, databases, old sets of files, recent electronic files, collections of statistics, and research[4]
Integration engines: An application of a universal exchange language that can assist data exchanges between personal health records and other types of EHRs in a cloud
Internet of Things: The ability of everyday objects to connect to the Internet and to send and receive data
Interoperability: The extent to which systems and devices can exchange and interpret data in such a way that it can be understood by the user[5]
IOM: Institute of Medicine
Institutional Review Board (IRB): A committee that is tasked with reviewing, monitoring and approving research on human subjects, including research on information gathered from a patient’s medical records[6]
JASON Report: A report authored by the MITRE Corporation in November 2013 that suggested the current lack of interoperability among EHRs is a substantial impediment to the free exchange of health data and the development of a robust health data infrastructure. The Report also found that the problem of interoperability can only be solved by establishing a comprehensive, transparent and overarching software architecture for health information
Key: A piece of data that can unlock and make readable cryptographically protected information
Learning Healthcare System: Harnessing patient data and analytics to learn about the best treatment for each patient and feeding that knowledge back to providers and other stakeholders to create cycles of continuous improvement[7]
Legacy system: An old method, technology or computer system that is used instead of available upgraded versions
Limited Data Sets: Protected health information (PHI) that is less regulated by HIPAA because it excludes direct identifiers of a patient, with certain exceptions such as city, state, or zip code
Machine-readable: Data or metadata that is in a format so that it can be read by a computer
Markup Language(s): Languages designed for the processing, definition and presentation of text. A markup language specifies code within a text file for formatting (including layout and style). XML is a common example of a markup language
Meaningful Use: Using certified EHR technology to improve quality, safety and efficiency in healthcare, engage patients and caregivers, improve care coordination and population and public health, as well as maintain the privacy and security of a patient’s health information[8]
Metadata: Information that characterizes data, including contextual information
Metadata tag: A tag accompanying each piece of data that describes the attributes, provenance and required security protections of that piece of information
mHealth: The practice of medicine and public health that is supported by mobile devices
Middleware: Software that extracts and reformats data elements from existing clinical systems
National Patient-Centered Clinical Research Network (PCORNet): Created in December 2013 and funded by PCORI, PCORNet consists of 11 clinical data research networks and 18 patient-powered research networks that are dedicated to improving comparative effectiveness research by integrating data from these various networks and creating networks for conducting clinical outcomes research
NIH: National Institutes of Health
NIH Big Data to Knowledge (BD2K): An initiative by the NIH that is focused on empowering biomedical scientists to capitalize on the big data that is generated by the research community. BD2K is focused on developing methods, standards, tools and software that will improve the use of big data in the biomedical research community by supporting a number of initiatives including research and training in data science
NIST: National Institute of Standards and Technology
ONC: Office of the National Coordinator for Health Information Technology
OpenFDA: An initiative by the FDA that provides public APIs. OpenFDA is also exploring the possibility of offering downloadable access to a number of datasets, including adverse events, drug product labeling and recall enforcement reports
Open Data: Data that can be used for any purpose
Open Data Commons: A type of data that is available to all. Such data can include an obligation of acknowledging the source of the data. However, the data may be private, commercial or government controlled[9]
Open Health Data: Publicly available data that can be accessed, downloaded or utilized without further requirements or stipulations of use by the data holder
Open Government Data: Open data that is produced by the government. Generally, it is data that is gathered during the course of business activities and does not identify individuals or breach commercial sensitivity
Open Standards: Commonly understood to mean technical standards that are free from licensing restrictions. Can also denote standards that are developed in a vendor-neutral manner
Patient-Centered Outcomes Research Institute (PCORI): Authorized by the Affordable Care Act, PCORI is an independent nonprofit organization dedicated to funding comparative clinical effectiveness research. PCORI’s main goal is to assist patients, clinicians, payers and policymakers in making better informed decisions about healthcare as well as improving delivery of care and outcomes
Patient-centric: Healthcare organized around the needs and requirements of a patient
Patient-Generated Data: Health-related data that is created, collected, or recorded by a patient and/or their designee to help address a health concern[10]
Patient-powered Research Networks (PPRN): Networks governed and operated by patients and/or caregivers, that are focused on a particular condition and are interested in sharing health information and participating in research
Personal Health Record (PHR): An electronic record of health information that is maintained or managed by the patient
Personal/Proprietary Data: Data that is controlled by an individual, commercial entity or non-government institution that has the desire or legal right to restrict access to and use of the data[11]
Personalization: Tailoring medical care to meet the unique characteristics of an individual patient
Personally Identifiable Information (PII): Any information that can be used to identify, contact or locate an individual either by itself or in combination with other accessible sources
Population Health: “The health outcomes of a group of individuals, including the distribution of such outcomes within the group”[12]
Post-marketing surveillance: A system that identifies adverse events that did not arise during the drug or device approval process
Predictive analytics: A variety of statistical techniques including modeling, machine learning, and data mining that analyze current and historical data to make forecasts about future, unknown events in real-time
Primary care medical home (PCMH): A care model where the patient and primary care physician are the center of a virtual organization which is paid a fee to coordinate the care a patient receives from specialists and other providers
Protected Health Information (PHI): Information concerning the health status, provision or payment for healthcare that can be linked back to a specific individual
Quantified Self: A movement dedicated to acquiring self-knowledge through self-tracking using technology
Randomized clinical trials: A clinical trial where participants are randomly assigned to different forms of treatment
Regional Extension Centers (RECs): Established as part of the American Recovery and Reinvestment Act and overseen by the Office of the National Coordinator for Health IT. RECs were established to help critical access hospitals with IT implementation
Regional Health Information Organizations (RHIO): An organization that brings together healthcare stakeholders within a defined geographic area and governs health information exchange among them to improve health and the delivery of care within that community. (Also known as a Health Information Exchange Organization (HIO))
Semantics: The clinical or operational meaning of data
Service-oriented architecture: An approach to health IT that uses software policies, practices and frameworks to allow a user to access sets of “services” on another party’s computers and data
Share-alike License: A license that requires users of a work to provide the content under the same or similar conditions as the original.[13]
Social-sharing paradox: Consumers sharing data but expecting others to protect their privacy
Software Architecture: Referred to in the JASON Report as the collective components of a software system that interact in specified ways and across specified interfaces to ensure specified functionality[14]
Standardized health records: Health records that follow a standardized format and can be accessed by all necessary parties
Syndromic surveillance: A type of surveillance using health-related data that precedes diagnosis and signals a sufficient probability of a case or outbreak to warrant a public health response
Syntax: The formatting of data that are exchanged, as well as the details of the exchange protocols, including privacy protection
Tab-separated values: A common text file format for sharing tabular data. The format is very simple and machine readable
Tagged data elements: Data accompanied by metadata describing the attributes and privacy protections of the data
Trust network: A form of data sharing that combines a computer network that keeps track of user permissions for each piece of personal data with a legal contract that specifies what can and cannot be done with the data and the potential ramifications if these permissions are breached[15]
Two-factor authentication: The use of two of the following to determine the identity of a principal: physical credentials (e.g.—smart cards), biometrics (e.g.—fingerprints), or a secret (e.g.—password)
Universal exchange language: A common language and format in which all electronic health systems can exchange data
Usability: The ease with which clinicians can learn to use EHRs, capture data from clinical encounters and in turn make use of the data to improve the delivery of care
Value-based purchasing: The idea that buyers (e.g.—patients) should hold healthcare providers accountable for the cost and quality of care
VistA: An integrated system of software applications that supports patient care at the Veterans Health Administration
Web API: An API that is devised to work over the Internet
XML: A set of rules for encoding documents in machine readable format
[1] President’s Council of Advisors on Science and Technology. (2010). Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward. Washington, DC.
[2] Heitmueller et al. (2014). Developing Public Policy to Advance the Use of Big Data in Health Care. Health Affairs, 33(9), 1523.
[16] Federal Trade Commission. (2015). Internet of Things: Privacy & Security in a Connected World.