Information on changes to pseudonyms

Page updated 15.5.2026

The pseudonymisation of datasets processed in FIONA is being renewed. During a transition period, identifiers protected with the old method will be replaced with identifiers protected using the new method.

Research Services will centrally re-protect all centrally updated ready-made datasets. Research projects must themselves re-pseudonymise any other datasets stored in their project folders if those datasets need to be linked with data protected using the new identifiers. Statistics Finland provides projects with additional storage space for the replacement of pseudonymous identifiers.

The new pseudonymisation solution enhances data security and enables the protection of a wider range of identifiers than before.

The transition period for the change runs until 31 December 2026.

Which datasets need to be re‑pseudonymised?

The standard datasets that are updated centrally will be re-protected by Statistics Finland’s Research Services during the transition period.

For other datasets that have been manually delivered to research projects, the re-protection must be done by the research projects themselves. These datasets include:

customised datasets
one-time-purchase standard datasets
extracts made from standard datasets
external datasets
researchers’ working files

Identifiers need to be re-pseudonymised only if the data must be linked to information protected with a new identifier.

How is the pseudonymisation carried out?

Link tables enabling the conversion of old identifiers to new ones have been imported into FIONA. The link tables and instructions can be found in the FIONA folder D:\keys.

Identifiers are re-protected by creating a copy of the original dataset in which the old identifiers have been replaced with new identifiers retrieved from the link table. Once the new version of the dataset has been created and its contents have been verified as correct, ask Research Services by email to transfer the updated dataset to the D drive. The new version then replaces the previous version protected with the old identifiers, which is deleted in the process. In connection with the transfer, Research Services only performs a visual check that the contents correspond, so responsibility for quality assurance lies with the user.
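The copy-and-replace step described above can be sketched roughly as follows. This is a minimal Python illustration, not the official procedure. The column names `old_id` and `new_id`, the function name, and the in-memory row format are all assumptions made for the example. The actual layout of the link tables is described in the instructions in D:\keys.

```python
def replace_identifiers(rows, link, old_col="old_id", new_col="new_id"):
    """Build a copy of the data in which the old identifier is replaced.

    rows: list of dicts, one per record (hypothetical layout)
    link: dict mapping old identifier -> new identifier
    Returns (converted, unmatched); rows whose old identifier is not found
    in the link table are returned unchanged in `unmatched` for inspection.
    """
    converted, unmatched = [], []
    for row in rows:
        old_value = row[old_col]
        if old_value in link:
            # Copy every column except the old identifier, then add the new one.
            new_row = {k: v for k, v in row.items() if k != old_col}
            new_row[new_col] = link[old_value]
            converted.append(new_row)
        else:
            unmatched.append(dict(row))
    return converted, unmatched


# Made-up example values; real pseudonyms look different.
link = {"A1": "N7", "B2": "N9"}
rows = [
    {"old_id": "A1", "year": 2019},
    {"old_id": "B2", "year": 2020},
    {"old_id": "C3", "year": 2020},
]
converted, unmatched = replace_identifiers(rows, link)
```

The original rows are left untouched, which matches the idea of working on a copy. Keeping unmatched rows separate, rather than silently dropping them, makes it easier to see whether the link did not work for some identifiers.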

Free additional storage space available for replacing pseudonyms

Research Services will establish a new temporary storage drive in FIONA to support the replacement of pseudonymous identifiers.

The drive will be labelled X: and will be taken into use on Monday 29 June 2026 for those projects that have indicated a need for additional storage space.

The X drive will be available until the end of 2026, at which point the transition period for replacing pseudonymous identifiers will also end.

Deployment of the storage space

Project-specific folders will be created on the X drive. Read and write access will be restricted to users of the respective project.

Folders will be created only upon request.
Requests should be sent to: tutkijapalvelut@stat.fi
The message must include the FIONA project identifier as well as a request for additional storage space for pseudonym replacement.

Requests can be submitted immediately. The first batch of requests will be forwarded to FIONA administration around 4 June, but requests may also be submitted after this date.

Purpose of the X drive

The X drive is intended exclusively for the replacement of pseudonymous identifiers. It must not be used for other work or for general data storage.

Instructions for replacing pseudonymous identifiers

Retrieve the new identifiers for the data stored on the D drive:
- source: the link table in the folder D:\keys
- the old identifier is used as the key

Create a new dataset:
- save the output file resulting from the extraction to the X drive

Carefully verify the dataset:
- the new identifiers are included
- the old identifiers have been removed

The researcher is responsible for the correctness of the dataset, so this verification must be performed with particular care.
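The two verification points listed above (new identifiers included, old identifiers removed) could be checked programmatically along these lines. This is only a sketch under assumptions: the column names `old_id` and `new_id` are hypothetical, and the row-count comparison is an extra sanity check added for illustration, not a stated requirement.

```python
def verify_replacement(original_rows, new_rows, old_col="old_id", new_col="new_id"):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Extra sanity check (assumption): the conversion should not add or lose rows.
    if len(original_rows) != len(new_rows):
        problems.append(
            f"row count changed: {len(original_rows)} -> {len(new_rows)}"
        )
    for i, row in enumerate(new_rows):
        # The old identifiers must have been removed.
        if old_col in row:
            problems.append(f"row {i}: old identifier column still present")
        # The new identifiers must be included and non-empty.
        if row.get(new_col) in (None, ""):
            problems.append(f"row {i}: new identifier missing")
    return problems


original = [{"old_id": "A1", "year": 2019}, {"old_id": "B2", "year": 2020}]
good_copy = [{"new_id": "N7", "year": 2019}, {"new_id": "N9", "year": 2020}]
bad_copy = [{"new_id": "N7", "old_id": "A1", "year": 2019}, {"new_id": "", "year": 2020}]

problems_good = verify_replacement(original, good_copy)
problems_bad = verify_replacement(original, bad_copy)
```

An automated check like this supplements a careful manual review of the contents but does not replace it.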

Use and release of storage space

The X drive capacity is shared between projects. To ensure smooth service for all users:

Verify the quality of the new dataset version immediately
Then request Research Services to transfer the dataset to the D drive
Once the dataset has been transferred, delete it promptly from the X drive

Each project may use at most 500 GB of storage on the X drive at any one time.
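A project can estimate how much of its allowance is in use by summing the file sizes in its folder. The sketch below uses only the Python standard library. It assumes that the 500 GB limit is counted in decimal gigabytes, which is not stated here, and the function names are illustrative. The demonstration runs on a temporary folder, because the naming of the project folders on the X drive is not given.

```python
import os
import tempfile

LIMIT_BYTES = 500 * 10**9  # assumption: "500 GB" read as decimal gigabytes


def folder_size_bytes(path):
    """Sum the sizes of all regular files under path, including subfolders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def is_within_limit(path, limit=LIMIT_BYTES):
    """True if the folder's total size does not exceed the limit."""
    return folder_size_bytes(path) <= limit


# Self-contained demonstration on a throwaway folder; in FIONA the path
# would be the project's own folder on the X drive.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "sample.dat"), "wb") as handle:
        handle.write(b"x" * 1024)
    demo_size = folder_size_bytes(demo_dir)
    demo_ok = is_within_limit(demo_dir)
```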

Practical tips and observations on linking pseudonymous identifiers

It has been observed that the old protection method has, in some cases, assigned the same pseudonym to different values of a variable. In practice, this issue primarily affects rare pseudonyms used in individual projects.
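One way to look for such cases before linking is to check whether any old pseudonym maps to more than one new pseudonym. The sketch below assumes that the issue appears in the link table as an old identifier paired with several different new identifiers. That is an interpretation, not something confirmed here, and the pair-based input format and function name are hypothetical.

```python
from collections import defaultdict


def find_ambiguous_old_ids(link_pairs):
    """Return old identifiers that are paired with more than one new identifier.

    link_pairs: iterable of (old_id, new_id) tuples read from a link table.
    """
    targets = defaultdict(set)
    for old_id, new_id in link_pairs:
        targets[old_id].add(new_id)
    return {old: sorted(news) for old, news in targets.items() if len(news) > 1}


# Made-up example values: "B2" is paired with two different new identifiers.
pairs = [("A1", "N7"), ("B2", "N9"), ("B2", "N4"), ("A1", "N7")]
ambiguous = find_ambiguous_old_ids(pairs)
```

Rows that carry an ambiguous old pseudonym cannot be converted unambiguously and are worth reviewing separately.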

Problems in combining old and new identifiers

The replacement of pseudonyms has also weakened compatibility in the widely used protected business ID variable (syrtun). Problems arise when the business ID variable contains values other than a business ID: in such cases, the link between the old and new pseudonym may not work properly. This occurs, for example, with self-employed persons whose identifier is a personal identity code instead of a business ID, particularly in older datasets.

An inconsistency has been identified in how the old pseudonymised identifiers of the OID variables were converted to new identifiers in the EDUC_TYHR, EDUC_ESIPERUS, and EDUC_OPISK ready-made datasets. It affects the compatibility of these datasets’ OID variables with the OID variables of other datasets.

The issue related to business identifiers particularly affects ready‑made datasets containing business IDs. Research Services is reviewing the datasets and aims to implement corrective actions as soon as possible. Additional information will be provided on this webpage.

Business identifiers

Correcting the linking of business identifiers

Correction of the linking problems related to business identifiers has started with the FIRM_FSS ready-made dataset.

Incorrect values have ended up in the variable yrtun_s in cases where syrtun contained something other than a business ID, such as a personal identity code. In the future, the different kinds of values contained in the syrtun variable will be separated into their own variables, so that the protected business ID and hid_e are stored separately in the dataset.

With regard to the FIRM_FSS ready-made dataset, the issue particularly affects older years. For the years 1986–1998, the variable syrtun2 can be used, as it contains the business ID on those rows where syrtun contains a personal identity code. If the values of syrtun and syrtun2 differ, syrtun2 can be used to link to the variable yrtun_s.
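The workaround for the years 1986–1998 can be read as a simple rule for choosing the linking key on each row. The sketch below is one possible reading, not a confirmed procedure. The variable names syrtun and syrtun2 come from the text, while the year column name `year`, the function name, and the dict-based row format are assumptions for illustration.

```python
def business_id_link_key(row, year_col="year"):
    """Pick the value to use when linking a row to yrtun_s.

    For 1986-1998, use syrtun2 when it is present and differs from syrtun
    (i.e. syrtun presumably holds a personal identity code on that row);
    otherwise fall back to syrtun.
    """
    syrtun = row.get("syrtun")
    syrtun2 = row.get("syrtun2")
    in_affected_years = 1986 <= row[year_col] <= 1998
    if in_affected_years and syrtun2 not in (None, "") and syrtun2 != syrtun:
        return syrtun2
    return syrtun


# Made-up pseudonym values for illustration only.
rows = [
    {"year": 1990, "syrtun": "P111", "syrtun2": "B222"},  # values differ
    {"year": 1990, "syrtun": "B333", "syrtun2": "B333"},  # values agree
    {"year": 2005, "syrtun": "B444", "syrtun2": "B555"},  # outside 1986-1998
    {"year": 1995, "syrtun": "B666", "syrtun2": ""},      # syrtun2 empty
]
keys = [business_id_link_key(r) for r in rows]
```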

Changes to identifiers in ready-made datasets

Research Services is re-protecting identifiers in the ready-made datasets. Most of the datasets have already been re-protected, but some are still being processed.

Datasets for which re‑protection is still in progress:

EDUC_ESIPERUS_K, EDUC_OPISK_K, EDUC_TYHR_K, EDUC_HAREK, EDUC_VIRTA
FOLK_MUUTTO_MAANOSA, FOLK_MUUTTO_SUOMI_MUU, FOLK_VL_7085, FOLK_TKT
FLOWN
TRAFI_ajoneuvo, TRAFI_omistaja

Datasets for which re‑protection has been completed:

EDUC_ESIPERUS, EDUC_OPISK, EDUC_TREK, EDUC_TYHR, EDUC_YTL
FIRM_BANKR, FIRM_BASE, FIRM_COMMOD, FIRM_CPI, FIRM_DEMOG, FIRM_EMPENT, FIRM_EMPEST, FIRM_ENTER, FIRM_ESTAB, FIRM_FAMBUS, FIRM_FSS, FIRM_GROUP, FIRM_GVC, FIRM_ICT, FIRM_IFATS, FIRM_OFATS, FIRM_PAT, FIRM_PROD, FIRM_RDINNO, FIRM_SUBSID, FIRM_TRADE, FIRM_TRANSP, FIRM_VAT
FOLK_ASKUN, FOLK_ASLII, FOLK_ENHEN, FOLK_JAKSOT (ELAKE, TYOTTOMAT, SIJOITETUT, TYONHAKIJAT, TKT, TYOSUHDE), FOLK_LAPS, FOLK_MUUTTO, FOLK_PERH, FOLK_PERUS, FOLK_TULO, FOLK_TUTK, FOLK_VAEN
INFRA_SIJAINTI
KEHA_URA, TEM_TYOKUNTO, TEM_TYONHAKIJA, TEM_TYONHAKU, TEM_TYOPAIKKA
MIGR_OLESK
PRH_BOARD
SES_BASE, SES_HAR
TAX_BENEFIT, TAX_INCOMES, TAX_SUMINCOMES, TAX_HELPPO, TAX_XPER
TULLI_COMMOD, TULLI_ENTER

Impact on replicability of research results

Differences between the old and new pseudonymisation methods may lead to situations where data protected with the two methods cannot produce fully identical research results. This is especially important for projects that have a research article currently in the publication process.

If datasets protected with the old pseudonyms do not need to be linked with data protected using the new identifiers, there is no need to adopt the new pseudonyms. In cases where transitioning to the new identifiers is necessary, but datasets protected with the old method are still required, an exception arrangement can be made. In such cases, Research Services can provide a separate folder in FIONA where the necessary code, auxiliary files, and, if needed, datasets can be stored.

If a project needs to retain datasets protected with the old identifiers, for example to ensure the reproducibility of results for a study currently in the publication process, it must notify Research Services by email no later than 30 June 2026. The notification must include the following information:

title of the publication
the W‑drive folder containing the files necessary for result replication
the current stage of the publication process
the date until which replication is needed

The project commits to deleting the contents of the folder once the stated deadline has passed.

Storage capacity

Re‑pseudonymising datasets consumes disk space, and this may cause problems especially if a project uses large datasets that need to be re‑protected.

Research Services will provide free additional storage space to projects from the summer until the end of 2026. The additional space can be requested by notifying Research Services of the need.

Timeline

The deadline for switching to the new identifiers has been extended until the end of 2026. All research project datasets must be re‑protected by 31 December 2026.

The deadline does not apply to projects that are no longer expanding. If no new datasets are added to a project, the existing datasets may continue to be used with the old identifiers.

New datasets delivered to projects will be pseudonymised using the new identifiers.

Contact information

Research Services

tutkijapalvelut@stat.fi