Information on changes to pseudonyms

The pseudonymisation of datasets processed in FIONA is being renewed. During a transition period, identifiers protected with the old method will be replaced with identifiers protected with the new method. Research Services will centrally re-protect all centrally updated ready-made datasets. Research projects must themselves re-pseudonymise any other datasets stored in their project folders if those datasets need to be linked with data protected with the new identifiers.

The new pseudonymisation solution enhances data security and enables the protection of a wider range of identifiers than before.

The transition period for the change runs until 31 December 2026.

What needs to be done with the datasets?

Standard datasets that are updated regularly will be re-protected centrally by Statistics Finland’s Research Services during the transition period.

For other datasets that have been manually delivered to research projects, the re-protection must be done by the research projects themselves. These datasets include:

  • customised datasets
  • one-time-purchase standard datasets
  • extracts made from standard datasets
  • external datasets
  • researchers’ working files

Identifiers need to be re-pseudonymised only if the data must be linked to information protected with a new identifier.

Link tables for changing identifiers

Link tables enabling the conversion of old identifiers to new ones have been imported into FIONA. The link tables and instructions can be found in the FIONA folder D:\keys.

Re-protection is carried out on the W-drive. Once the re-protection is completed, please email Research Services and request that the updated dataset be transferred to the D-drive.

For the re-pseudonymisation of personal identifiers, researchers have been provided with the shnro–hid_e link table. It allows old pseudonymous identifiers to be converted into new ones. The link table is still being updated, as approximately 700 hid_e identifiers correspond to two different shnro identifiers. For the time being, it is recommended to search for the hid_e identifier based on the shnro identifier (rather than the other way around), as this usually results in only one matching pair.
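
The recommended lookup direction can be sketched as follows. This is a minimal illustration, not the official procedure: it assumes that shnro holds the old pseudonym and hid_e the new one, and that both the link table and the dataset have been read into lists of dictionaries. The sample values and the function names build_lookup and repseudonymise are hypothetical.

```python
from collections import defaultdict

# Hypothetical link-table rows (assumed: shnro = old pseudonym, hid_e = new pseudonym).
LINK_ROWS = [
    {"shnro": "A001", "hid_e": "N900"},
    {"shnro": "A002", "hid_e": "N901"},
    {"shnro": "A003", "hid_e": "N901"},  # two shnro values share one hid_e
]


def build_lookup(link_rows):
    """Map each shnro to the set of hid_e values paired with it."""
    lookup = defaultdict(set)
    for row in link_rows:
        lookup[row["shnro"]].add(row["hid_e"])
    return dict(lookup)


def repseudonymise(records, link_rows):
    """Replace shnro with hid_e where exactly one match exists.

    Returns (converted, unresolved). unresolved holds the records whose
    shnro has no match, or more than one match, in the link table.
    """
    lookup = build_lookup(link_rows)
    converted, unresolved = [], []
    for rec in records:
        candidates = lookup.get(rec["shnro"], set())
        if len(candidates) == 1:
            new_rec = {k: v for k, v in rec.items() if k != "shnro"}
            new_rec["hid_e"] = next(iter(candidates))
            converted.append(new_rec)
        else:
            unresolved.append(rec)
    return converted, unresolved


# Hypothetical project data keyed by the old pseudonym.
records = [
    {"shnro": "A001", "income": 100},
    {"shnro": "A003", "income": 250},
    {"shnro": "A999", "income": 75},  # not present in the link table
]
converted, unresolved = repseudonymise(records, LINK_ROWS)
```

Looking up by shnro keeps the conversion unambiguous even when two shnro values share one hid_e. Records that cannot be resolved are set aside for manual review rather than silently dropped.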

The link table will remain available during the transition period, until 31 December 2026.

Problems in combining old and new identifiers

It has been observed that the old protection method has, in some cases, assigned the same pseudonym to different values of a variable. In practice, this issue primarily affects rare pseudonyms used in individual projects.

The replacement of pseudonyms has also weakened compatibility in the widely used protected business ID variable (syrtun). Problems have emerged where the business ID variable contains values other than a business ID: in such cases, the link between the old and new pseudonym may not work properly for atypical identifiers. This occurs, for example, with self-employed persons whose identifier, particularly in older datasets, is a personal identity code instead of a business ID.
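
Before relying on a converted identifier variable, a project may want to check how well its values are covered by the relevant link table. The sketch below is one possible diagnostic under stated assumptions: the link table is available as (old, new) pairs, and the function name link_coverage and all sample values are hypothetical.

```python
from collections import Counter, defaultdict


def link_coverage(values, link_pairs):
    """Count distinct identifier values by how many new pseudonyms they map to."""
    targets = defaultdict(set)
    for old, new in link_pairs:
        targets[old].add(new)

    summary = Counter()
    for value in set(values):
        n_targets = len(targets.get(value, ()))
        if n_targets == 0:
            summary["unmatched"] += 1   # no counterpart in the link table
        elif n_targets == 1:
            summary["matched"] += 1     # safe to convert
        else:
            summary["ambiguous"] += 1   # several candidates, needs review
    return summary


# Hypothetical (old pseudonym, new pseudonym) pairs and dataset values.
link_pairs = [("old_1", "new_a"), ("old_2", "new_b"), ("old_2", "new_c")]
syrtun_values = ["old_1", "old_1", "old_2", "old_3"]
summary = link_coverage(syrtun_values, link_pairs)
```

A high share of unmatched or ambiguous values would suggest that the variable contains atypical identifiers of the kind described above.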

An inconsistency has been identified in how the pseudonymised OID variables in the EDUC_TYHR, EDUC_ESIPERUS, and EDUC_OPISK ready-made datasets were converted to the new identifier. This inconsistency affects the compatibility of these datasets’ OID variables with the OID variables of other datasets.

The issue related to business identifiers particularly affects ready‑made datasets containing business IDs. Research Services is reviewing the datasets and aims to implement corrective actions as soon as possible. Additional information will be provided on this webpage.

Impact on replicability of research results

Differences between the old and new pseudonymisation methods may lead to situations where data protected with the two methods cannot produce fully identical research results. This is especially important for projects that have a research article currently in the publication process.

If datasets protected with the old pseudonyms do not need to be linked with data protected using the new identifiers, there is no need to adopt the new pseudonyms. In cases where transitioning to the new identifiers is necessary, but datasets protected with the old method are still required, an exception arrangement can be made. In such cases, Research Services can provide a separate folder in FIONA where the necessary code, auxiliary files, and, if needed, datasets can be stored.

If a project needs to retain datasets protected with the old identifier — for example, to ensure the reproducibility of results for a study currently in the publication process — it must send a notification to Research Services by email no later than 30 June 2026. The notification must include the following information:

  • title of the publication
  • the W‑drive folder containing the files necessary for result replication
  • the current stage of the publication process
  • the date until which replication is needed

The project commits to deleting the contents of the folder once the stated deadline has passed.

Storage capacity

Re‑pseudonymising datasets consumes disk space, and this may cause problems especially if a project uses large datasets that need to be re‑protected.

Insufficient storage space understandably makes it difficult to replace pseudonymous identifiers. Research Services is currently exploring options for increasing available storage.

At the moment, the issue can be worked around by splitting the dataset into smaller parts and transferring the files from the W‑drive to the D‑drive in several batches.
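
Splitting a dataset into parts could look like the sketch below. It assumes the dataset is a CSV file with a header row. The function name split_csv and the part-file naming scheme are hypothetical, and other file formats would need a different reader.

```python
import csv
from itertools import islice
from pathlib import Path


def split_csv(source, out_dir, rows_per_part):
    """Write source into numbered part files, each repeating the header row."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return parts
        part_no = 1
        while True:
            # Read at most rows_per_part rows without loading the whole file.
            chunk = list(islice(reader, rows_per_part))
            if not chunk:
                break
            part_path = out_dir / f"{source.stem}_part{part_no:03d}.csv"
            with part_path.open("w", newline="", encoding="utf-8") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)
                writer.writerows(chunk)
            parts.append(part_path)
            part_no += 1
    return parts
```

Because only one chunk is held in memory at a time, the same approach works for files larger than the available memory. Each part can then be re-protected and submitted for transfer separately.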

Timeline

The deadline for switching to the new identifiers has been extended until the end of 2026. All research project datasets must be re‑protected by 31 December 2026.

The deadline does not apply to projects that are no longer expanding. If no new datasets are added to a project, the existing datasets may continue to be used with the old identifiers.

New datasets delivered to projects will be pseudonymised using the new identifiers.

Contact information

Research Services
tutkijapalvelut@stat.fi