The income distribution statistics describe the structure and distribution of households' and household-dwelling units' income by population group and region in Finland. The statistics are compiled annually and their data content is based on international recommendations (OECD (2013) OECD Framework for Statistics on the Distribution of Household Income, Consumption and Wealth. OECD Publishing; UNECE (2011) Canberra Group Handbook on Household Income Statistics, Second edition 2011). The income distribution statistics comprehensively describe households' disposable monetary income, which is the primary income concept of the statistics. The other main income concepts are factor income (wages and salaries, entrepreneurial income and property income), current transfers received and current transfers paid. Several indicators are produced based on the statistics, the main ones being the Gini coefficient describing income differentials, the average and median of households' and household-dwelling units' income, and the relative at-risk-of-poverty rate.
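As an illustration of the indicators named above, the following minimal sketch computes a Gini coefficient and a relative at-risk-of-poverty rate from a list of incomes. It is not the production method of the statistics: the function names are hypothetical, the data are unweighted, and the threshold of 60 per cent of the median is an assumption (it is the convention commonly used in EU statistics, but the text above does not state it).

```python
from statistics import median


def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means fully equal)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i = 1..n is the rank of x_i in ascending order.
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


def at_risk_of_poverty_rate(incomes, share_of_median=0.6):
    """Share of units whose income is below a fraction of the median.

    share_of_median=0.6 is an assumed default, not taken from the text.
    """
    threshold = share_of_median * median(incomes)
    return sum(1 for x in incomes if x < threshold) / len(incomes)
```

In practice both indicators would be computed from weighted, equivalised incomes; the unweighted form is used here only to keep the arithmetic visible.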

The total data on income distribution are entirely register-based data on the income of persons and household-dwelling units covering the entire population and they enable compilation of statistics according to detailed classifications, especially regionally. They are the primary national data source for describing income differentials by population group and regional income. The time series data of the total data have been compiled in a comparable manner starting from 1995.

The sample data are internationally comparable and they are available from 1966 onwards. The sample data are formed in compliance with the Regulation on ESS EU-SILC statistics (Regulation (EC) No 1177/2003 of the European Parliament and of the Council). Due to limitations related to representativeness, the sample data are not suitable for detailed comparisons of income distribution between regions or population groups. On the other hand, the sample data include classifications and background data that are not available in the total data. The most important of these classifications is socio-economic group.

### Statistical population

The target population of the income distribution statistics are private households and their members, i.e. the dwelling population in Finland at the end of the statistical reference year (31 December).

The frame population includes all private households and their members living permanently in Finland at the end of the statistical reference year (31.12., survey year – 1).

The household-dwelling population is formed by all persons living permanently in dwellings. A little over two per cent of the entire population are excluded from the statistics. They include persons without a postal address, the institutional population (e.g. long-term residents of old people's homes, care institutions, prisons or hospitals), persons permanently resident abroad and persons temporarily resident in Finland. Conscripts are regarded as part of the population in these statistics.

### Statistical unit

The statistical units of the income distribution statistics are a private household (a household-dwelling unit and a common housekeeping unit), person and consumption units.

The definition of a household differs between the total data and sample data of the income distribution statistics. In the total data, the household is a household-dwelling unit. A household-dwelling unit is formed of persons living permanently in the same dwelling or at the same address. The household-dwelling unit is used in all register-based statistics of Statistics Finland. In the sample data of the income distribution statistics, the household is defined based on common housekeeping with the help of data collected with interviews. A household is formed of all those persons who live together and have meals together or otherwise use their income together. In recent years, the two household definitions have produced the same household for around 94 to 95 per cent of persons in the population.

### Unit of measure

The units of measure in the income distribution statistics are euros, %, numbers of households, persons and consumption units.
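Consumption units make the income of households of different sizes comparable. The sketch below shows one common way to derive them. It assumes the OECD-modified equivalence scale (1.0 for the first member, 0.5 for each additional member aged 14 or over, 0.3 for each child under 14); this scale is not stated in the text above, and the function names are hypothetical.

```python
def consumption_units(ages):
    """Number of consumption units for a household, given members' ages.

    Assumes the OECD-modified scale; the oldest member is treated as the
    first member with weight 1.0.
    """
    ordered = sorted(ages, reverse=True)
    if not ordered:
        raise ValueError("household has no members")
    units = 1.0
    for age in ordered[1:]:
        units += 0.5 if age >= 14 else 0.3
    return units


def equivalised_income(household_income, ages):
    """Household income divided by the household's consumption units."""
    return household_income / consumption_units(ages)
```

For example, under this assumed scale a household of two adults and two children under 14 has 2.1 consumption units.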

### Base period

The base year for the real values of monetary data in the income distribution statistics is the latest statistical reference year.
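Converting nominal amounts to real values at the prices of the base year can be sketched as follows. The use of a price index as deflator and all names here are assumptions for illustration; the text above does not specify which index is used.

```python
def to_real_values(nominal_by_year, price_index_by_year, base_year):
    """Express nominal amounts at the price level of base_year.

    real = nominal * index[base_year] / index[year]
    """
    base = price_index_by_year[base_year]
    return {
        year: value * base / price_index_by_year[year]
        for year, value in nominal_by_year.items()
    }
```

Because the base year is the latest statistical reference year, the real values of earlier years change with every annual release, while the figures for the latest year equal their nominal values.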

### Reference period

The data of the income distribution statistics describe data for the statistical reference year, which is the whole calendar year, and for the end of the statistical reference year (31 December).

### Reference area

Regional classifications corresponding to the EU's uniform NUTS classification of regional units (NUTS2, or classification of major regions, NUTS3 or classification of regions), sub-regional unit and municipality are used in the income distribution statistics.

### Sector coverage

The income distribution statistics cover private households in Finland.

The total data on income distribution cover the population in household-dwelling units in Finland.

### Time coverage

The time series data of the total data of the income distribution statistics cover comparable data from year 1995 onwards.

The sample data of the income distribution statistics are available annually from 1986 onwards. The data published for the years 1966, 1971, 1976, and 1981 are based on the Household Budget Survey.

### Frequency of dissemination

The data of the income distribution statistics are disseminated yearly. Possible revisions are made to the time series in connection with annual releases.

## Accuracy, reliability and timeliness

### Overall accuracy

Only administrative register data are used as data sources for the total data of the income distribution statistics, so the quality of the statistics depends on the quality of the source data and the error related to the processing of the data. The quality of data sources is good in statistics compilation based on a register system.

The sample data of the income distribution statistics are based on a representative sample survey. Most of the data derive from administrative data sources, and some are collected by interviewing households. In addition to sampling error, the sources of error are coverage, measurement, non-response and processing errors.

The main sources of error in the sample data of the income distribution statistics are related to non-response. Unit non-response is corrected with weighting based on the sampling design (two-phase sampling design). The design weights are first corrected by stratum with the inverse inclusion probabilities of sample persons. After this, the response-corrected weights are scaled to the number of households and the weights are calibrated to correspond with the population’s key known demographic distributions and income sums in the total data. The error caused by item non-response is minor in the sample data and mostly concerns interest income subject to withholding tax, which is one of the few income items collected in the interview. The item non-response is corrected by imputation.
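The weighting steps described above can be sketched roughly as follows. This is a simplified illustration, not the production procedure: the non-response correction is done as a ratio adjustment within each stratum, and the calibration is reduced to scaling weights to known totals in a single classification, whereas the actual calibration uses several demographic distributions and income sums. All function and field names are hypothetical.

```python
from collections import defaultdict


def response_corrected_weights(sample):
    """Design weights adjusted for unit non-response within each stratum.

    sample: list of dicts with keys 'stratum', 'inclusion_prob', 'responded'.
    Assumes every stratum has at least one respondent.
    Non-respondents get weight 0.0.
    """
    sampled = defaultdict(float)
    responded = defaultdict(float)
    for unit in sample:
        design_weight = 1.0 / unit["inclusion_prob"]
        sampled[unit["stratum"]] += design_weight
        if unit["responded"]:
            responded[unit["stratum"]] += design_weight

    weights = []
    for unit in sample:
        if unit["responded"]:
            stratum = unit["stratum"]
            design_weight = 1.0 / unit["inclusion_prob"]
            # Respondents also carry the weight of the stratum's non-respondents.
            weights.append(design_weight * sampled[stratum] / responded[stratum])
        else:
            weights.append(0.0)
    return weights


def calibrate_to_totals(weights, groups, known_totals):
    """Scale weights so that they sum to a known population total per group."""
    current = defaultdict(float)
    for weight, group in zip(weights, groups):
        current[group] += weight
    return [
        weight * known_totals[group] / current[group] if weight > 0 else 0.0
        for weight, group in zip(weights, groups)
    ]
```

After the first step the weights of respondents in a stratum sum to the same total as the design weights of the whole sample in that stratum, which is the sense in which the non-response is "corrected".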

In addition to non-response and random variation, the quality of the results of the income distribution statistics is also affected by coverage errors (the frame population differs from the target population) and measurement errors (the measured value of the result variable differs from its actual value). These error sources are minor in both datasets (total and sample data).

Some of the error sources in the income distribution statistics can cause systematic errors. Systematic errors are estimated by comparing the estimates with the data concerning the entire population available from the total data and other registers and with corresponding data from other statistics. As regards population data, the quality of the total data is examined, for example, in the quality description of Statistics Finland's statistics on dwellings and housing conditions. The coverage of income data in the total data is good relative to the income concept used (disposable monetary income). The data do not include income items that are entirely excluded from registers or that are not considered to be income. The coverage and quality of income data are studied by comparing total data with other statistical sources, such as the statistics of the Tax Administration, the Social Insurance Institution, the Finnish Centre for Pensions and the Finnish Institute for Health and Welfare, and data on the household sector in Statistics Finland's national accounts. Comparisons are conducted every year and more detailed information on them can be requested from Statistics Finland.

In the sample data of the income distribution statistics, the bias and accuracy of estimates are estimated with the help of standard errors of the data.
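As a rough illustration of how a standard error can be attached to a weighted estimate, the sketch below uses a linearisation approximation for a weighted mean. It treats the sample as a single-stage with-replacement design and ignores stratification and calibration, so it is not the variance estimator used in the statistics; the function name is hypothetical.

```python
from math import sqrt


def weighted_mean_with_se(values, weights):
    """Weighted mean and an approximate standard error.

    Assumption: single-stage sampling with replacement, no strata.
    Uses the linearised residuals z_i = w_i * (y_i - mean) / sum(w).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_weight
    residuals = [w * (y - mean) / total_weight for w, y in zip(weights, values)]
    variance = n / (n - 1) * sum(z * z for z in residuals)
    return mean, sqrt(variance)
```

With equal weights this reduces to the familiar standard error of a simple mean, that is, the sample standard deviation divided by the square root of the sample size.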

### Timeliness

The data for the statistical reference year are released as final data based on the income distribution statistics approximately 11 to 12 months from the end of the statistical reference year.

### Time lag - first results / TP1

Preliminary data on income distribution statistics are not released.

### Punctuality

The data are supplied to users punctually in accordance with the release date stated in the release calendar, around 11 to 12 months from the end of the statistical reference year.

### Data revision

The time series data of the income distribution statistics are revised for the statistical reference year and retrospectively for the time series data if the effect of the corrections on key result data is statistically significant and data sources are available for the revision. The time series data of the statistics can also be updated with extended data content, such as classifications.

### Data revision - practice

The preliminary data of the income distribution statistics are revised for the statistical reference year if the data sources used for the statistics are updated, or if there is a need for revision due to errors or deficiencies detected before the final data are published.

Methodological changes to the statistical reference year and the revisions to time series data they cause are planned in advance. The time series is revised if the effect on key result data of the statistics is statistically significant.

### Non-sampling error

Besides sampling errors, other sources of error in the sample data of the income distribution statistics are coverage, measurement, non-response and processing errors.

The main sources of error in the sample data of the income distribution statistics are related to non-response. Unit non-response is corrected with weighting based on the sampling design (two-phase sampling design). The design weights are first corrected by stratum with the inverse inclusion probabilities of sample persons. After this, the response-corrected weights are scaled to the number of households and the weights are calibrated to correspond with the population’s key known demographic distributions and income sums in the total data. The error caused by item non-response is minor in the sample data and mostly concerns interest income subject to withholding tax, which is one of the few income items collected in the interview. The item non-response is corrected by imputation.

The coverage error of the income distribution statistics is minor. Likewise, the processing error of the statistics compiled annually with an established production process is estimated to be relatively small.

### Coverage error

The frame for the total data of the income distribution statistics is the total data based on Statistics Finland's population and dwelling data resource of 31 December. The sources of error in the data have been checked and the quality is good.

The sampling frame for the sample data of the income distribution statistics consists of total data based on the Population Information System of the Digital and Population Data Services Agency and Statistics Finland's population and dwelling data resource. The reference period of the population of the sample data is 31 December. The sampling frame is formed before the end of the statistical reference year, as a result of which the sampling frame contains slight errors. The sample is checked against the updated total data before the data collection and again in the interviews, and persons who did not belong to the target population of the statistics in the reference period (31 December), so-called over-coverage, are removed from it. Persons temporarily absent from the household, e.g. persons residing abroad for more than a year, are also excluded from the sample accepted in interviews if their household resident in Finland considers that the person was not part of the household during the reference period. The number of persons missing from the sampling frame because their data reach the registers with a delay is small as well.

The population for the statistical reference year is revised approximately three months after the reference period in the data of Statistics Finland's statistics on household-dwelling units and in the total data of the income distribution statistics. These data are used in the calibration of the sample data of the income distribution statistics, which makes the sample data correspond to the population.

### Over-coverage rate / A2

The over-coverage of the sample data of the income distribution statistics was 1.1 per cent of the gross sample (unweighted) in 2022.

### Measurement error

The data of the income distribution statistics are compiled in an integrated manner according to the work stages of the established production process. Changes, for example in data sources or production systems, are tested and possible error sources are checked when forming the data. The measurement error is minor in statistics compilation based on a register system.

In the sample data of the income distribution statistics, the measurement error is primarily connected to data collected with interviews, which are affected by response-related error sources arising from both the respondent and the interviewer. The error is estimated to be random for a majority of the data. Measurement errors in the data collection are prevented with interviewer training and instructions for data collection, as well as questionnaire design and testing. Automatic checks (outlier and logical consistency checks) are included in the form. The data obtained from the data collection are checked and errors are corrected in the statistics.

### Non-response error

The unit non-response of the income distribution statistics is corrected with weighting, which aims to remove non-response error.

### Unit non-response rate / A4

The unit non-response of the income distribution statistics was 29.3 per cent of the entire net sample in 2022 (unweighted data). A rotating four-panel design is used in the statistics. Net non-response by panel was 49.8 per cent in the first survey round, 18.6 per cent in the second survey round, 12.7 per cent in the third survey round and 8.3 per cent in the fourth survey round in 2022.
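The relationship between the overall rate and the rates by survey round can be sketched as follows: the overall unweighted rate is computed from the pooled counts, so it is a size-weighted average of the panel rates rather than their simple mean. The function names and the counts used when calling them are hypothetical; the actual panel sizes are not given in the text.

```python
def non_response_rate(net_sample, respondents):
    """Unweighted unit non-response rate for one panel or for the pooled sample."""
    return 1 - respondents / net_sample


def overall_non_response_rate(panels):
    """Pooled rate over panels.

    panels: list of dicts with keys 'net_sample' and 'respondents'.
    """
    net_sample = sum(panel["net_sample"] for panel in panels)
    respondents = sum(panel["respondents"] for panel in panels)
    return non_response_rate(net_sample, respondents)
```

Because the first survey round has both the largest sample and the highest non-response, it pulls the pooled rate well above the rates of the later rounds.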

The unit non-response of the income distribution statistics is corrected with weighting.

### Item non-response rate / A5

In the sample data of the income distribution statistics, the data collected with interviews contain item non-response in interest income subject to withholding tax and in housing expenditure items. Missing data are corrected by imputation. A stochastic donor method, in which a missing value is replaced with the value of a randomly selected respondent, is used as the imputation method. The sub-populations within which donors are selected are formed from the stratum and from variables selected on the basis of exploratory analyses.
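A donor-based stochastic imputation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the production implementation: donors are drawn with equal probability within each class, missing values are represented by `None`, and the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def donor_impute(records, variable, class_keys, seed=0):
    """Fill missing values of `variable` with a randomly drawn donor value.

    records: list of dicts; a missing value is represented by None.
    class_keys: names of the fields that define the imputation classes
                (for example the stratum and selected auxiliary variables).
    Records in a class without any donor are left as missing.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for record in records:
        if record[variable] is not None:
            donors[tuple(record[key] for key in class_keys)].append(record[variable])

    imputed = []
    for record in records:
        completed = dict(record)  # leave the input records unchanged
        if completed[variable] is None:
            pool = donors.get(tuple(record[key] for key in class_keys))
            if pool:
                completed[variable] = rng.choice(pool)
        imputed.append(completed)
    return imputed
```

Drawing the donor at random, instead of using a class mean, preserves the variation of the imputed variable within each class.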

### Processing error

Data processing errors in the income distribution statistics are minor. The data are processed in the established production process by work phase.

### Model assumption error

The sampling design and estimation of the sample data of the income distribution statistics are based on established methods. Design-based estimation is used, for which the data selection is model-assisted.

## Comparability

### Comparability - geographical

The total data of the income distribution statistics describe household-dwelling units' income exhaustively according to the following regional classifications: the EU's uniform NUTS classification of regional units (NUTS2, or classification of major regions, NUTS3 or classification of regions), sub-regional unit and municipality.

The sample data of the income distribution statistics are based on a nationally representative sample survey. Within Finland, the sample data are comparable between regions according to NUTS2 (the classification of major regions used in the statistics) and by municipality group, and internationally by country according to the NUTS2 classification, taking into account the difference in the income concept. The income of the sample data in the income distribution statistics corresponds, apart from small exceptions, to the data published by Eurostat and the OECD. One such exception is fringe benefits included in wages and salaries, which are included in full in income in the national statistics, but not in EU-SILC (EU Statistics on Income and Living Conditions).

### Comparability - over time

The time series data from the total dataset of the income distribution statistics are available for the years 1995 onwards. The time series formed based on the total dataset of the income distribution statistics is not completely comparable between the years 1995–2009 and 2010–.

Time series data from the sample dataset are available from 1986 onwards. In the time series data, efforts have been made to take into account the most significant changes in the formation of incomes. The data for the years 1986–1992 and 1993– are not entirely comparable due to the tax reform of 1993. In the time series data, the information for the years 1966, 1971, 1976, and 1981 is based on the Household Budget Survey.

In the total data of the income distribution statistics, new income items were added to the income nomenclature starting from the statistical reference year 2010. The new income items are child maintenance allowance, child support received, tax-free grants and daily allowances of conscripts. Paid child support was included in current transfers paid as a tax-like payment for persons who have claimed deductions for maintenance payments in taxation. Child support received is derived from the tax deduction data of payers of maintenance payments. The formation of rehabilitation grants was also specified by removing the share of a person’s rehabilitation grant that is transferred directly to the employer.

The temporal comparability of the income concepts of the income distribution statistics is also made more difficult by the 2005 dividend tax reform, in which the system of corporation tax credit was abandoned. Before the reform, corporation tax credit was counted as income in dividend, factor and gross income. Because the corporation tax credit was also included in current transfers paid, the reform does not affect the comparability of disposable income. The changes caused by the tax reform have been revised in the time series data of the income distribution statistics for 1993 to 2004. These changes have the same effect on temporal comparisons of income data produced on the basis of both total and sample data.
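The reason why the 2005 change in dividend taxation leaves disposable income unaffected can be written out as a short identity. The notation is introduced here for illustration only, and the decomposition assumes the usual definitions in which gross income is factor income plus current transfers received and disposable income is gross income minus current transfers paid. Let $F$ be factor income and $P$ current transfers paid, both excluding the corporation tax credit, let $R$ be current transfers received and let $c$ be the corporation tax credit:

$$
\begin{aligned}
D_{\text{before}} &= \underbrace{(F + c) + R}_{\text{gross income}} - \underbrace{(P + c)}_{\text{current transfers paid}} = F + R - P, \\
D_{\text{after}} &= F + R - P .
\end{aligned}
$$

The credit $c$ raises factor and gross income and current transfers paid by the same amount, so it cancels out of disposable income, while factor and gross income themselves are not comparable across the change without the revision of the time series.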

The imputed dwelling income from owner-occupied dwellings formed from the sample is still produced as a separate income component and it is still included in households' disposable income (but not in monetary income). In the 2006 statistics, the calculation method of housing income was renewed by taking into account, on the one hand, uniform practices with Statistics Finland's other statistics (especially the Household Budget Survey and national accounts) and on the other hand, the requirements of the regulation concerning the ESS EU-SILC statistics. The main changes are related to revisions in gross rent strata calculated for sample households with the stratum method and handling of depreciations related to owner-occupied dwellings. In the strata, gross rent based on the statistical grouping of municipalities has been replaced with municipality-specific data and, in larger municipalities, with sub-area data (since 2012, the gross rent by sub-area has been used for more municipalities than before). Stratum-specific gross rent values are still based on the average rents of new and old tenancies of non-subsidised dwellings in Statistics Finland’s rent statistics but the rent values of strata with low numbers of observations have been revised with the selling prices of old dwellings in housing companies. Depreciations are not subtracted from the dwelling income of those living in detached houses. Dwelling income calculated using the new method has been updated retrospectively in the statistics’ time series data starting from 1993 so the income concepts are comparable in this respect when the time series data of the income distribution statistics are used as the data source.

Starting from the income distribution statistics for 2006, sample-based current transfers received between households have no longer been included in money or other gifts received by households. The reason for this is coherence with the income concept of the ESS EU-SILC statistics. The new calculation method decreases the households’ average disposable income by around EUR 150 per household.

The data collection method of the survey was changed in 2021. At that time, online responding was introduced as a data collection method alongside telephone interviews. Online responding was offered to households with only one member at the time of sampling. In 2022, online responding was also available for larger households. Of the survey sample obtained, approximately 38 per cent of the responses were collected with a web form. The web responders represent around 1,238,000 households in the whole population, that is, around 42 per cent of all households.

In 2021, data based on the Incomes Register were used as partial substitute for previous interview data in forming the data describing a person's employment and other economic activity. In addition, data describing activity were collected from the respondents with a renewed data collection form. Due to changes in the data source and data collection form, the input data used in the classification of socio-economic group have changed. However, with consideration to ordinary uncertainties related to sample statistics, the time series of socio-economic groups can be considered comparable with previous years.

### Length of comparable time series / CC2

The time series data of the sample data of the income distribution statistics are comparable from 1987 to 2022, altogether 35 years. In addition, some data are comparable for 1966, 1971, 1976, 1981 and 1986. The comparability of the sample data time series of the income distribution statistics is good for 1993 to 2018 and for the main income items relatively good from 1993 backwards to earlier statistical reference years.

### Coherence - cross domain

The total data of the income distribution statistics are consistent with Statistics Finland's statistics based on total data. The statistical data of the sample data and the statistics on living conditions and households’ assets have been formed in an integrated manner by means of data collected with interviews and total data in the Survey on income and living conditions.

Besides the income distribution statistics, Statistics Finland's statistics on households’ assets and households’ consumption and the national accounts also contain income concepts.

There are no considerable conceptual differences between the sample data of the income distribution statistics and households’ consumption. Both follow the definition of disposable income that is accordant with international recommendations. The housing costs in the income distribution statistics and the consumption expenditure of housing in the households’ consumption are congruent. The data of the households’ consumption contain all consumption expenditure related to the housing costs of the household’s actual dwellings and free-time residences (incl. imputed consumption). The statistics use the gross rent principle and the Classification of Individual Consumption by Purpose (COICOP-HBS). In addition to the above-mentioned factors, the data of the statistics may differ for reasons related to sampling and production methods.

When comparing the income sums of the income distribution statistics for the whole country with the items of the national accounts’ income and use of income accounts, the differences in defining the sector, in certain definitions, and in the compilation methods of the statistics should be noted. Due to the differences, the figures of the national accounts and income distribution statistics on, for example, annual changes in households’ disposable income may differ considerably from one another.

In current transfers received in the income distribution statistics, social benefits are divided into target/main groups according to the ESSPROS classification (the European System of Integrated Social Protection Statistics). The classification is consistent with the Finnish Institute for Health and Welfare's statistics on social protection expenditure and financing, and it is used in the EU-SILC statistics. In the income distribution statistics, social assistance is included in other social security. In the Finnish Institute for Health and Welfare's statistics on social protection expenditure and financing, social assistance granted for housing expenditure is included in other income security benefits for housing, and social assistance granted for health expenses is included in other income security benefits during periods of illness as of the statistical reference year 2022. Student benefits are not included at all in the ESSPROS statistics.

### Coherence - sub-annual and annual statistics

The income distribution statistics are annual statistics.

### Coherence - national accounts

The income distribution statistics describe the income and current transfers of the household sector and are thus an extension of the household sector’s income and use of income accounts of the national accounts. When comparing the income sums of the income distribution statistics for the whole country with the items of the national accounts’ income and use of income accounts, the differences in defining the sector, in certain definitions, and in the compilation methods of the statistics should be noted. Due to the differences, the figures of the national accounts and income distribution statistics on, for example, annual changes in households’ disposable income may differ considerably from one another.

In the national accounts, disposable income includes imputed rent for owner-occupied housing, while the main income concept in the income distribution statistics (disposable monetary income) does not include imputed rent.

There are significant conceptual differences in property income. The national accounts do not include holding gains, but they do include taxes paid on taxable realised capital gains. The income distribution statistics (total data) include realised capital gains (capital gains minus losses) as property income and the taxes paid on them as current transfers paid.

There are also significant methodological and conceptual differences in entrepreneurial income.

### Coherence - internal

The content of the statistics is uniform, except for the effects of differences arising from definitional differences in the data on household and income, and the effects of special sources of error included in the sample data.

The income data of the total data and sample data are otherwise the same, but the sample statistics contain income data missing from registers that are collected with interviews (interest income, certain current transfers between households).

## Source data and data collections

### Source data

*Source data*

The total data of the income distribution statistics are statistical data covering the entire household-dwelling population, which are compiled on the individual level from several administrative files and registers. Thus, the statistics contain detailed data on the income of all household-dwelling units and persons belonging to them.

The following administrative and statistical registers have been used in the compilation of the total data:

- The Population Information System of the Digital and Population Data Services Agency and Statistics Finland's population and dwelling data resource
- The Tax Administration's tax database
- The Social Insurance Institution of Finland's pension and benefit database (health insurance compensation and rehabilitation register, registers of child maintenance allowances, financial aid for students and housing allowances)
- Data on preventive and supplementary income support collected by the Finnish Institute for Health and Welfare (THL) from municipalities
- The register of pension contingency of the Finnish Centre for Pensions
- Statistics Finland’s Register of Completed Education and Degrees
- The State Treasury's database on the military injuries indemnity system
- The Financial Supervisory Authority's data (earnings-related unemployment allowances)
- Statistics Finland's Business Register
- The Employment Fund’s (formerly the Education Fund) data

The sample data of the income distribution statistics are based on a representative sample survey. The basic sample data of the income distribution statistics are compiled by combining the data collected from households by interviews with the register data of the total data for those sample households whose interview was accepted. A majority of the classification data on households and the income data that are not available from registers have been collected by interviews in the Survey on income and living conditions.

*Frame*

The target population of the total data is Finland's dwelling population at the end of the statistical reference year (31 December). The household-dwelling population is formed by all persons living permanently in dwellings. A little over two per cent of the entire population are excluded from the statistics. They include persons registered as permanently resident at institutions (e.g. long-term residents of old people's homes, care institutions, prisons or hospitals), homeless persons, persons residing abroad and persons registered as unknown.

The total data are compiled by combining administrative and register data sources to persons on the basis of personal identity codes. The income of a household-dwelling unit is formed by adding up the income of persons belonging to the same household-dwelling unit.
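The linking and aggregation described above can be sketched as follows. This is a simplified illustration with hypothetical names and data structures: each register is represented as a mapping from a personal identity code to an income amount, and the population as a mapping from a personal identity code to a household-dwelling unit identifier.

```python
from collections import defaultdict


def household_dwelling_unit_income(unit_of_person, income_registers):
    """Sum person-level income from several registers by household-dwelling unit.

    unit_of_person: {person_id: unit_id} for the household-dwelling population.
    income_registers: iterable of {person_id: amount} mappings, one per source.
    Register rows for persons outside the population are ignored.
    """
    totals = defaultdict(float)
    for register in income_registers:
        for person_id, amount in register.items():
            unit_id = unit_of_person.get(person_id)
            if unit_id is not None:
                totals[unit_id] += amount
    # Units whose members have no income in any register still appear, with 0.
    for unit_id in unit_of_person.values():
        totals.setdefault(unit_id, 0.0)
    return dict(totals)
```

Keeping units with zero income in the result matters for distributional indicators, because dropping them would bias averages and medians upwards.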

*Sample frame*

The target population and reference period (the last day of the statistical reference year) of the sample data are the same as in the total data. The sampling frame consists of total data based on the Population Information System of the Digital and Population Data Services Agency and Statistics Finland's population and dwelling data resource. The reference period of the population of the sample data is 31 December. The sampling frame is formed before the end of the statistical reference year, as a result of which the sampling frame contains slight errors (see Coverage error). The sample is checked from the total data updated before data collection and the error caused by over-coverage is corrected before the sample is drawn.

The sampling frame is used for different purposes of sampling, in the sample data of the income distribution statistics for forming household-dwelling units and sampling categories.

The population information system of the Digital and Population Data Services Agency is generally exhaustive and up-to-date as concerns persons. Data on population changes are updated in real time. Statistics Finland uses the data weekly for its personal and dwelling databases, which are used for statistical purposes, e.g. monthly for the publication of preliminary statistics on the population by municipality and sex.

*Sample data of the income distribution statistics*

The sample survey of the income distribution statistics follows a rotating panel design of four years. Each panel comprises four survey rounds.
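As a rough illustration of the rotation, the sketch below lists which panels are interviewed in a given year and in which survey round. It assumes that a new panel enters every year and stays for four consecutive rounds; the function name `panels_in_year` is hypothetical.

```python
def panels_in_year(year, rounds=4):
    """Return {panel entry year: survey round} for the panels interviewed in `year`.

    Assumes one new panel enters each year and is interviewed in
    `rounds` consecutive years before rotating out.
    """
    return {year - survey_round + 1: survey_round for survey_round in range(1, rounds + 1)}


schedule_2022 = panels_in_year(2022)
```

Under this assumption, each year's sample combines one new panel with three panels that have been interviewed before.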

The sampling design is stratified sampling. A random sample of persons (5,500), together with their household-dwelling units, is drawn with non-proportional allocation from the strata formed in the overall frame. The frame covers the target population almost without errors (see Sample frame). Until the statistical reference year 2020, the draw was two-phased and the person sample (5,500 or 5,000) was drawn from a so-called master sample. The master sample, which consisted of 50,000 persons (exceptionally 100,000 in 2016, 2019 and 2020), was drawn in the first phase by systematic sampling from the overall frame.

The strata used are 12 socio-economic groups. The socio-economic groups are formed based on taxable income, usually according to the income type that yields the highest income for the household (household-dwelling unit) and according to income level (entrepreneurs, for example, are an exception).
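A minimal sketch of stratified sampling with non-proportional allocation is given below. The two strata, their sizes and the allocation are hypothetical (the statistics use 12 socio-economic strata), and simple random sampling within each stratum is an assumption.

```python
import random


def stratified_sample(frame, allocation, seed=0):
    """Draw a simple random sample of fixed size within each stratum.

    frame: list of (person_id, stratum) pairs.
    allocation: {stratum: sample size n_h}; it need not be proportional
    to the stratum sizes N_h.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for person_id, stratum in frame:
        by_stratum.setdefault(stratum, []).append(person_id)

    sample = []
    for stratum, n_h in allocation.items():
        units = by_stratum[stratum]
        incl_prob = n_h / len(units)  # inclusion probability n_h / N_h
        for person_id in rng.sample(units, n_h):
            sample.append({"person": person_id, "stratum": stratum, "incl_prob": incl_prob})
    return sample


# Hypothetical frame: a small stratum is over-sampled relative to its size.
frame = [(f"E{i}", "entrepreneur") for i in range(100)]
frame += [(f"W{i}", "wage_earner") for i in range(900)]
sample = stratified_sample(frame, {"entrepreneur": 20, "wage_earner": 30})
```

Because the inclusion probability differs between strata, each sampled person later needs a weight equal to the inverse of that probability so that estimates still represent the whole frame.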

In 2019, a draw of an additional sample of 500 persons was included in the first survey round of the sample. Since 2020, the sample size of the first survey round has been 5,500 persons and their household-dwelling units.

As a result of the panel design, an additional sample of 16-year-olds is selected for the second to fourth survey rounds following the sampling of the second phase.

The sample persons (and their household-dwelling units) refer to the population registered as permanently resident in Finland on 31 December. The sample unit is a person aged 16 or over.

### Data collection

The total data of the income distribution statistics are statistical data covering the entire household-dwelling population, which are compiled on the individual level from several administrative files and registers. Thus, the statistics contain detailed data on the income of all household-dwelling units and persons belonging to them.

The data collected through the interviews of the statistics on living conditions are gathered mainly with interviewer-administered computer-assisted telephone interviews (CATI) and web interviews (CAWI). Only a small share of the interviews (around one to two per cent) are conducted as computer-assisted personal interviews (CAPI).

### Frequency of data collection

The basic data for the income distribution statistics are collected annually.

### Common units proportion / A3

In the sample data of the income distribution statistics, the share of units included both in the data collection and in administrative sources was around 100 per cent of the sample persons and persons belonging to their households.

### Cost and burden

In Statistics Finland's income distribution statistics, a considerable cost burden is caused by data collected from households with interviews. These data are not available with other methods or there are no administrative data sources available for forming them. The response burden is related to the interview data collection.

Statistics Finland's Data Collection Department is responsible for the interviews. The interviews are computer-assisted and conducted with the Blaise questionnaire software, mainly as telephone interviews. The interview language is Finnish, Swedish or English, depending on the interviewee’s choice (since the statistical reference year 2014). In 2022, the average duration of an interview was roughly 34 minutes.

## Methods

### Data compilation

In the sample data of the income distribution statistics, households and persons are assigned a weighting coefficient with which their data are grossed up to represent the population. First, design weights are formed for households based on the selection probability of the sample person. A non-response correction is then applied to the accepted sample by inverse inclusion probability. The weights corrected for non-response are calibrated with the CALMAR macro to match the key known distributions of the population from the total data. The procedure aims to reduce the bias caused by the selectivity of non-response and to produce estimates of the main income variables that are as accurate as possible.
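The first two weighting steps can be sketched as follows. The design weight is the inverse of the inclusion probability. The non-response step shown here is a simple weighting-class adjustment within strata, which is an assumption: the exact correction model used in production is not described in this document. All figures are hypothetical.

```python
def design_weight(incl_prob):
    """Design weight: inverse of the inclusion probability."""
    return 1.0 / incl_prob


def nonresponse_adjust(units):
    """Scale respondents' design weights so that, within each stratum,
    they also carry the design weight of the non-respondents."""
    total, responded = {}, {}
    for u in units:
        w = design_weight(u["incl_prob"])
        total[u["stratum"]] = total.get(u["stratum"], 0.0) + w
        if u["responded"]:
            responded[u["stratum"]] = responded.get(u["stratum"], 0.0) + w

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["stratum"]] / responded[u["stratum"]]
            adjusted.append({**u, "weight": design_weight(u["incl_prob"]) * factor})
    return adjusted


# Hypothetical sample: one of the two units in stratum A did not respond.
units = [
    {"id": 1, "stratum": "A", "incl_prob": 0.01, "responded": True},
    {"id": 2, "stratum": "A", "incl_prob": 0.01, "responded": False},
    {"id": 3, "stratum": "B", "incl_prob": 0.002, "responded": True},
    {"id": 4, "stratum": "B", "incl_prob": 0.002, "responded": True},
]
weighted = nonresponse_adjust(units)
estimated_population = sum(u["weight"] for u in weighted)
```

The sum of the adjusted weights over respondents equals the sum of the design weights over the whole sample, so the estimated population size is preserved after non-respondents are dropped.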

In the calibration of the weights for the 2022 material, the following data were used:

- area (division of regions, with Helsinki and the rest of the Greater Helsinki region treated separately; statistical grouping of municipalities)
- size of household-dwelling unit
- age and gender groups of members
- level of education of persons aged 16 or over
- total sums of the main income items: wages and salaries, entrepreneurial and property income, unemployment allowances (basic unemployment allowance and labour market allowance, earnings-related share), pensions, interest on housing and student loans
- number of income recipients (earnings-related unemployment allowance, wage and salary income, pension income)
- number of persons belonging to low-income household-dwelling units in the household-dwelling population in the total data on income distribution (register-based income concept)
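The idea of calibrating weights to known population distributions can be illustrated with raking (iterative proportional fitting) on two categorical margins. This is a simplified stand-in, not the CALMAR macro itself, and it does not cover calibration to income totals. The categories, starting weights and population totals below are hypothetical.

```python
def rake(weights, categories, margins, iterations=50, tol=1e-9):
    """Adjust weights until the weighted counts match the known margins.

    weights: list of starting weights (e.g. non-response corrected weights).
    categories: list of dicts {variable: level}, one per respondent.
    margins: {variable: {level: known population total}}.
    """
    w = list(weights)
    for _ in range(iterations):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each level of this variable.
            sums = {level: 0.0 for level in targets}
            for wi, cat in zip(w, categories):
                sums[cat[var]] += wi
            # Scale the weights so that this margin is matched exactly.
            for i, cat in enumerate(categories):
                factor = targets[cat[var]] / sums[cat[var]]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return w


categories = [
    {"area": "Helsinki", "size": "1"},
    {"area": "Helsinki", "size": "2+"},
    {"area": "Other", "size": "1"},
    {"area": "Other", "size": "2+"},
]
margins = {
    "area": {"Helsinki": 300.0, "Other": 700.0},
    "size": {"1": 400.0, "2+": 600.0},
}
calibrated = rake([200.0, 300.0, 250.0, 250.0], categories, margins)
```

After calibration, the weighted counts reproduce both margins at once, which is how known register distributions are carried over to the sample estimates.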

Of the calibration data, the number of persons belonging to low-income household-dwelling units was applied in the statistical reference year 2015 and the level of education in the statistical reference year 2016 to correct the increased bias caused by higher non-response. The effect on the educational distribution of persons aged 16 or over was significant: the number of persons with only comprehensive school or no education data grew and that of persons with university degrees decreased. By contrast, changes in median income and annual changes in population groups were small. The income relations between population groups did not change. The calibration change did not affect the comparability of key indicators.

The sum of the weighting coefficients of the sample households that responded acceptably is an estimate of the total number of households in the population at the end of the statistical reference year. Starting from the statistical reference year 2021, register data sources based on total data are used in the estimation of the total number of households. These are estimated to describe the number and structure of households more accurately than before. Prior to that, a so-called master sample drawn from the population at the end of the statistical reference year was primarily used in the estimation. Due to the new method, the total number of households at the end of the year differs slightly more from the number of household-dwelling units.

### Data validation

The correctness of the total data of the income distribution statistics is ensured by checking the validity and consistency of the data drawn from different sources for the derived classifications and variables. Checks are also performed on the sample data once the total data have been combined with the sample.

As regards population data, the quality of the total data is examined, for example, in the quality description of Statistics Finland's statistics on dwellings and housing conditions. The coverage of income data in the total data is good relative to the used income concept (disposable monetary income). The data do not include income items that are entirely excluded from registers or that are not considered to be income. The coverage and quality of income data are studied by comparing total data with other statistical sources, such as the statistics of the Tax Administration, the Social Insurance Institution, the Finnish Centre for Pensions and the National Institute for Health and Welfare, and data on the household sector in Statistics Finland's national accounts. Comparisons are conducted regularly every year and more detailed information on them can be requested from Statistics Finland.

The main source of error in the sample data is unit non-response, which is corrected with weighting based on the sampling design. Besides non-response and random variation, the quality of the results is also affected by coverage errors (the frame population differs from the target population) and measurement errors (the measured value of the result variable differs from its actual value). Only a small proportion of income items are collected with interviews (e.g. interest income subject to withholding tax). The electronic data collection form contains plausibility and logical consistency checks. After data collection, the data are checked and edited at unit level as necessary, mainly with automatic procedures that follow standard rules. Item non-response is imputed with the hot deck method. Some of these error sources can cause systematic errors. Systematic errors are estimated by comparing the estimates with data on the entire population available from the total data and other registers, and with corresponding data from other statistics. Comparisons are conducted annually and information on them can be requested from Statistics Finland.
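A minimal sketch of hot deck imputation for item non-response follows. It assumes a random hot deck within imputation classes, where a missing value is replaced by the value of a randomly chosen responding donor from the same class. The variant used in production is not specified here, and the variable and class names are hypothetical.

```python
import random


def hot_deck_impute(records, variable, class_key, seed=0):
    """Fill missing values of `variable` with a donor value from the same class.

    Records whose class has no donor are left missing.
    """
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[variable] is not None:
            donors.setdefault(r[class_key], []).append(r[variable])

    completed = []
    for r in records:
        pool = donors.get(r[class_key])
        if r[variable] is None and pool:
            completed.append({**r, variable: rng.choice(pool), "imputed": True})
        else:
            completed.append({**r, "imputed": False})
    return completed


records = [
    {"id": 1, "group": "pensioner", "interest_income": 120.0},
    {"id": 2, "group": "pensioner", "interest_income": None},
    {"id": 3, "group": "wage_earner", "interest_income": 40.0},
    {"id": 4, "group": "wage_earner", "interest_income": 60.0},
    {"id": 5, "group": "wage_earner", "interest_income": None},
]
completed = hot_deck_impute(records, "interest_income", "group")
```

Keeping an `imputed` flag on each record makes it possible to report the share of imputed values and to exclude them from sensitivity checks.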

### Documentation on methodology

The data content of the sample data of the income distribution statistics is based on the ESS EU-SILC statistics (Statistics on Income and Living Conditions; Regulations (EC) No 1177/2003 and (EU) 2019/1700 of the European Parliament and of the Council).

The income data used in the classifications are based on data formed for the needs of the income distribution statistics. These income data follow the international recommendations of income distribution statistics: OECD (2013) OECD Framework for Statistics on the Distribution of Household Income, Consumption and Wealth, OECD Publishing; UNECE (2011) Canberra Group Handbook on Household Income Statistics, Second Edition 2011.

## Principles and outlines

### Legal acts and other agreements

The compilation of statistics is guided by the Statistics Act. The Statistics Act contains provisions on collection of data, processing of data and the obligation to provide data. Besides the Statistics Act, the Data Protection Act and the Act on the Openness of Government Activities are applied to processing of data when producing statistics.

Statistics Finland compiles statistics in line with the EU’s regulations applicable to statistics, which steer the statistical agencies of all EU Member States.

Further information: Statistical legislation

The data content of the sample data of the income distribution statistics is based on Regulations (EC) No 1177/2003 and (EU) 2019/1700 of the European Parliament and of the Council concerning statistics on income and living conditions (EU-SILC).

### Confidentiality - policy

The data protection of data collected for statistical purposes is guaranteed in accordance with the requirements of the Statistics Act (280/2004), the Act on the Openness of Government Activities (621/1999), the EU's General Data Protection Regulation (EU) 2016/679 and the Data Protection Act (1050/2018). The data materials are protected at all stages of processing with the necessary physical and technical solutions. Statistics Finland has compiled detailed directions and instructions for confidential processing of the data. Employees have access only to the data essential for their duties. The premises where unit-level data are processed are not accessible to outsiders. Members of the personnel have signed a pledge of secrecy upon entering the service. Violation of data protection is punishable.

Further information: Data protection | Statistics Finland (stat.fi)

### Confidentiality - data treatment

The processing of the data is limited by user licences to the producers of the statistics. All persons employed by Statistics Finland have signed a pledge of secrecy in which they have undertaken to keep secret the data prescribed as confidential by virtue of the Statistics Act or the Act on the Openness of Government Activities.

The compilation of statistics is steered by the Statistics Act (280/2004). Alongside the Statistics Act, the EU’s General Data Protection Regulation EU 2016/679 and the national Data Protection Act are applied to the processing of personal data. Confidentiality of data collected for statistical purposes is decreed in the Act on the Openness of Government Activities (621/1999).

Sample data of the income distribution statistics are combined with the service set of Statistics Finland's income distribution statistics. The service data do not contain direct identifiers. To ensure data protection, the values of income variables which make identification easier are made less detailed.

Sample data of the income distribution statistics and statistical data on which the statistics on living conditions are based are released to Eurostat, the Statistical Office of the European Union, for the EU-SILC statistics (EU-SILC, Statistics on Income and Living Conditions). The statistical data do not contain direct identifiers. In addition, protection measures common to the countries and, where necessary, nation-specific measures, are applied to the data. Eurostat releases data from the EU-SILC statistics for research use upon application. Researchers handling the data sign a pledge of secrecy.

Statistical protection methods are described, for example, in the Handbook on Statistical Disclosure Control (2010).

### Release policy

Statistics Finland publishes new statistical data at 8 am on weekdays in its web service. The release times of statistics are given in advance in the release calendar available in the web service. The data are public after they have been updated in the web service.

Further information: Publication principles for statistics at Statistics Finland

### Data sharing

In addition to Statistics Finland's own publications, regional data from the total data of the income distribution statistics are published as tabulated data in the statistics and indicator databank SOTKAnet maintained by the National Institute for Health and Welfare (THL).

The income data of the income distribution statistics are used for Statistics Finland's statistics on living conditions. The sample data of the income distribution statistics and the statistics on living conditions are based on the same sample data. The data are used for the international ESS EU-SILC statistics (EU-SILC, Statistics on income and living conditions). Eurostat, the Statistical Office of the European Union, is responsible for compiling statistics on the EU-SILC and for the release of its statistical data for research use. Research use requires an application for licence to use statistical data.

In addition, sample data from the income distribution statistics are supplied to the OECD (OECD IDD) and, at set intervals, to the international database of the Luxembourg Income Study (LIS). Both publish internationally comparable data on their statistical pages.

### Other

Data on the income distribution statistics are available as chargeable special compilations, such as table data, through Statistics Finland's research services. Data collected for statistical purposes must be kept confidential by virtue of Section 24 of the Act on the Openness of Government Activities (621/1999). The response data are only used for statistical purposes. The research data are protected in accordance with the data protection regulations of Statistics Finland and responses given by individual households cannot be distinguished from the statistical tables.

### Accessibility and clarity

Statistical data are published as database tables in the StatFin database. The database is the primary publishing site of data, and new data are updated first there. When releasing statistical data, existing database tables can be updated with new data or completely new database tables can be published.

In addition to statistical data published in the StatFin database, a release on the key data is usually published in the web service. If the release contains data concerning several reference periods (e.g. monthly and annual data), a review bringing together these data is published in the web service. Database tables updated at the time of publication are listed both in the release and in the review. In some cases, statistical data can also be published as mere database releases in the StatFin database. No release or review is published in connection with these database releases.

Releases and database tables are published in three languages: Finnish, Swedish and English. The Swedish and English versions of releases may have more limited content than the Finnish version.

Information about changes in the publication schedules of releases and database tables and about corrections is given as change releases in the web service.

### Micro-data access

A service data set is compiled annually based on the sample data of the income distribution statistics, and it is released as anonymised unit-level micro data (so-called service data) for scientific research use, statistical surveys and microsimulation through Statistics Finland's research services. The use of service data is subject to licence. The application must contain the purpose for which the data will be used, a research plan and the signed pledges of secrecy from the persons participating in the research. The service data are chargeable.

Data collected for statistical purposes must be kept confidential by virtue of Section 24 of the Act on the Openness of Government Activities (621/1999). The response data are only used for statistical purposes. The research data are protected in accordance with the data protection regulations of Statistics Finland and responses given by individual households cannot be distinguished from the statistical tables. According to Section 13 of the Statistics Act (280/2004), Statistics Finland may, on the basis of a separate application for licence to use statistical data, release data for scientific studies and statistical surveys without data enabling direct identification. The Statistics Act prohibits the use of data collected for statistical purposes in an investigation, surveillance, legal proceedings, administrative decision-making or other similar handling of a matter concerning the enterprise.

National data containing sample data of the income distribution statistics and data of the statistics on living conditions are released to Eurostat, the Statistical Office of the European Union, for the international, comparative ESS EU-SILC micro data. Eurostat releases anonymised micro data (EU-SILC Users' Database) for scientific research use based on an application for licence to use statistical data. The data obtained through Eurostat include data from countries conducting the EU-SILC survey. Finland’s data are available through Eurostat with a longer time lag than from Statistics Finland. Further information about the ESS EU-SILC micro data is available on Eurostat's web pages.

### Data revision - policy

Revisions, i.e. improvements in the accuracy of statistical data already published, are a normal feature of statistical production and result in improved quality of statistics. The principle is that statistical data are based on the best available data and information concerning the statistical phenomenon. In addition, revisions are communicated as transparently as possible in advance. Advance communication ensures that users can prepare for the data revisions.

Data in statistical releases are often revised because the source data have been supplemented. The new, revised statistical figure is then based on a wider information basis and describes the phenomenon more accurately than before.

Revisions of statistical data may also be caused by the calculation method used, such as annual benchmarking or updating of weight structures. Changes of base years and used classifications may also cause revisions to data.

The preliminary data of the income distribution statistics are revised for the statistical reference year if the data sources used for the statistics are updated, or if errors or deficiencies that require revision are detected before the final data are published.

Methodological changes to the statistical reference year and the revisions to time series data they cause are planned in advance. The time series is revised if the effect on key result data of the statistics is statistically significant.

### Relevance

The relevance of the income distribution statistics is evaluated based on feedback received from users, monitoring of the use of statistical data (StatFin tables) and separate data requests.

### Quality documentation

The quality documentation of the income distribution statistics complies with the guidelines of Statistics Finland's Official Statistics of Finland (OSF).

### Quality assessment

Quality assessment, see OSF quality criteria and recommendation on quality description.

### Quality assurance

Further information: Quality management | Statistics Finland (stat.fi)

### User access

Data are released to all users at the same time. Before release, statistical data may be handled at Statistics Finland, and information on them given, only by persons who are involved in the production of the statistics concerned or who need the data in their own work.

Further information: Publication principles for statistics

Unless otherwise separately stated in connection with the product, data or service concerned, Statistics Finland is the producer of the data and the owner of the copyright. The terms of use for statistical data.

