It must not be possible to identify data concerning an individual person or enterprise from files or tables. Data with a disclosure risk must be protected by planning the contents of outputs so that they are acceptable in terms of data protection, for example by using sufficiently coarse classifications.
Cells with a disclosure risk can be protected in the table:
- by changing the structure of the table
- by suppressing individual cell values or entire rows
- by changing cell values, for example, by rounding
- by replacing the original cell value with an approximate random number.
When selecting the protection method of a table, efforts should be made to find a method that sufficiently protects the table but retains the features important for its intended use as well as possible.
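As an illustration of protecting cell values by rounding, the sketch below rounds each count to the nearest multiple of a base. The base of 5, the function name and the example counts are assumptions made for this illustration, not values prescribed by this guide.

```python
def round_to_base(value: int, base: int = 5) -> int:
    """Round a non-negative count to the nearest multiple of `base`.

    The default base of 5 is an illustrative assumption.
    """
    if base <= 0:
        raise ValueError("base must be positive")
    return base * ((value + base // 2) // base)


# Hypothetical frequency table: category -> count
counts = {"A": 1, "B": 7, "C": 12, "D": 23}
rounded = {category: round_to_base(n) for category, n in counts.items()}
print(rounded)
```

Rounded cells do not necessarily add up to the rounded total, which is one of the features to weigh when choosing between protection methods.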
Changing the structure of the table means controlling the number of variables or changing the classification. By changing the classification, the cells with a risk of disclosure are removed from the table by combining the categories containing them with other categories in the table. In practice, changing the classification usually means that the whole classification becomes less detailed.
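Combining categories can be sketched as follows. The age classes, the counts and the mapping from detailed to coarser categories are hypothetical and serve only to show how a small cell disappears once its category is merged with a neighbouring one.

```python
from collections import defaultdict


def merge_categories(counts, mapping):
    """Aggregate counts into the coarser categories given by `mapping`.

    Categories missing from `mapping` are kept unchanged.
    """
    merged = defaultdict(int)
    for category, n in counts.items():
        merged[mapping.get(category, category)] += n
    return dict(merged)


# Hypothetical age classification with one small cell ("18-24")
counts = {"0-17": 40, "18-24": 2, "25-34": 35, "35-64": 80}
mapping = {"18-24": "18-34", "25-34": "18-34"}

coarse = merge_categories(counts, mapping)
print(coarse)
```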
Cell suppression consists of primary suppression of the cells with a risk of disclosure and secondary suppression. Secondary suppression ensures that the values of primarily suppressed cells cannot be derived from the row or column totals of the table. Suppression can also be applied to entire rows. If a row total of the table is based on only a small number of statistical units (fewer than the threshold value in use), the whole row is suppressed regardless of the number of statistical units in its individual cells.
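The following is a minimal sketch of these suppression rules for a single row whose total is published next to its cells. It rests on several assumptions that do not come from this guide: the threshold is passed in as a parameter, empty cells are treated as not requiring protection, and secondary suppression is reduced to hiding one additional cell in the same row. It does not protect against disclosure through column totals, which requires considering the whole table.

```python
SUPPRESSED = None  # marker used here for a suppressed cell


def suppress_row(cells, threshold):
    """Return a copy of `cells` with risky values replaced by SUPPRESSED."""
    # Row-level suppression: too few units in the row as a whole.
    if sum(cells) < threshold:
        return [SUPPRESSED] * len(cells)

    # Primary suppression: non-empty cells below the threshold.
    # Treating empty cells as safe is an assumption of this sketch.
    out = [SUPPRESSED if 0 < n < threshold else n for n in cells]

    # Secondary suppression: a single hidden cell could be recovered as
    # "row total minus visible cells", so hide the smallest visible
    # non-empty cell as well.
    hidden = [i for i, v in enumerate(out) if v is SUPPRESSED]
    if len(hidden) == 1:
        candidates = [i for i, v in enumerate(out) if v is not SUPPRESSED and v > 0]
        if candidates:
            out[min(candidates, key=lambda i: cells[i])] = SUPPRESSED
    return out


# Hypothetical rows, with an illustrative threshold of 3
print(suppress_row([2, 15, 30], threshold=3))
print(suppress_row([1, 0, 1], threshold=3))
print(suppress_row([10, 20, 30], threshold=3))
```

Even after secondary suppression, the sum of the hidden cells in a row can still be worked out from the row total, so the choice of the additional cell matters in practice.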
Further information about the data protection methods can be found in the materials of the Remote access to research data online guide.
Exporting research outputs and code from remote access use
Research outputs and other material may be exported from the system only through a checking process. The checking ensures that no individuals or enterprises can be identified from the published data.
The number and size of files containing outputs must be kept reasonable. In practice, a reasonable number means only a few individual files that are meant to be published, not dozens of different versions or long log files intended for comparison by a group of writers, for example.
Output checking process
To export outputs from remote access use, open the Output check icon on the FIONA desktop. Fill in the form that opens, describing the material you intend to export from the remote access system. After this, the outputs are either checked manually in advance or, if you are subject to random checks, sent directly to your email.
All outputs of new users are checked in advance. The probability of being subject to random checks rises as the user accumulates consecutive approved preliminary checks. To avoid unnecessary rejections, all additional information that affects the checking of the outputs should be written on the output checking form.
If the output checking request contains breaches or errors, or if the requested data are not described in sufficient detail, the Research Services may reject the request. The user's streak of approved requests is then broken, and the user is returned from the random checking procedure to preliminary checks. Serious and repeated breaches lead to removal from the random checking procedure and to other measures resulting from data protection breaches.
If you are unsure about the data protection of an output, please contact the Research Services before asking for the output to be exported from the system. Contacting them and having the data protection of the output assessed in advance do not affect how quickly you can become subject to random checks, but may reduce the number of unclear situations.
Outputs based on Findata material
If you intend to request outputs from FIONA that are based on datasets used under a Findata user licence, the Findata form must be appended to the output checking request. The form can be found in FIONA and on Findata's website.
Fill in the following on the form:
- information about the Findata user licence under which the datasets are in FIONA
- basic information about the outputs that are to be exported from FIONA.
Statistics Finland and Findata cooperate in monitoring the output checking requests, so the requested outputs from FIONA do not need to be delivered afterwards to Findata for checking, but it is sufficient to fill in the form properly.
Data protection of outputs to be published
The researcher is responsible for the data protection of their research outputs. The researcher must ensure that the outputs requested to be exported from the remote access system do not contain unit-level data or any possibility of disclosing data concerning an individual observation.
Discretion should be exercised when exporting outputs from the remote access system. Only outputs intended to be published should be exported from the system. The contents of the tables and graphs should be in the same format in which they are meant to be published. Outputs that cannot be published for data protection reasons cannot be exported from the system.
The outputs must show the numbers of observations used in calculating tables, images, key figures, etc. If you intend to export data that deviate from the data protection guidelines but from which data concerning individuals cannot be disclosed, the Research Services must be contacted before making the output request.
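One way to meet the requirement of showing observation counts is to compute them together with each key figure, as in the sketch below. The records, the variable names and the function are invented for illustration; the point is only that every reported mean is accompanied by the number of observations behind it.

```python
from statistics import mean


def summarise(records, group_key, value_key):
    """Return (group, n_obs, mean) rows so each key figure carries its count."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec[value_key])
    return [(g, len(vals), mean(vals)) for g, vals in sorted(groups.items())]


# Hypothetical unit-level records, generated only for this example
records = (
    [{"region": "North", "income": 30000 + 100 * i} for i in range(12)]
    + [{"region": "South", "income": 28000 + 50 * i} for i in range(20)]
)

for group, n_obs, avg in summarise(records, "region", "income"):
    print(f"{group}: N = {n_obs}, mean income = {avg}")
```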