Statistics Explained

Archive:Morbidity statistics methodology pilot studies - examples

This article has been archived. For information on Health statistics please see Health


This article presents the second chapter of the report 'Morbidity statistics in the EU' (available as a downloadable PDF) on a possible methodology for collecting and compiling European Union (EU) morbidity statistics, based on pilot studies on diagnosis-specific morbidity statistics conducted by sixteen EU Member States from 2005 to 2011. The report summarises the results of the Eurostat 'Morbidity Task Force'[1], which was set up to analyse the results of the pilot studies and to formulate recommendations on the feasibility of a regular morbidity statistics data collection, focusing on possible sources and best national estimates of the incidence and prevalence of a selected list of diseases. The report was presented to the Working Group on Public Health Statistics in December 2013.

The article provides an overview of the main findings, problematic aspects and proposed solutions for moving towards a Eurostat morbidity data collection. It does so by presenting some typical situations that the Eurostat 'Morbidity Task Force' faced when analysing potential sources and estimates with regard to their accessibility, usefulness, overall quality and comparability.

Another introductory article on morbidity statistics discusses the current demand for health indicators in the context of the EU policies, the reasons why morbidity statistics may improve the different dimensions of health and the steps and methods needed to establish a routine data collection of morbidity statistics.

Full article

Current ‘experiments’ and promising developments

The European Statistical System (ESS) is undergoing a process of modernisation, and one of the pillars of this is the improved use of administrative sources. From this standpoint the pilots performed in 16 Member States (MS) show a reasonably complete range of the methodological challenges that need to be addressed in order to establish an EU data collection on morbidity.

The results provided by the pilot countries show that, in some cases, the best national estimates for the requested incidence and prevalence indicators on the morbidity short list can potentially be obtained by linking individual data from different registers at population level. In other cases, however, data are available only from the existing national source on a specific disease. Finally, in several cases the estimates show that the methodologies followed permit a preliminary comparison across EU Member States.

Some results achieved by the pilots and lessons learned

This article gives a selection of examples[2] to illustrate the results obtained and to demonstrate the potential for regular reporting according to the short list. These examples also outline some of the issues with the methodology, described in the guidelines, for collecting diagnosis data in the Member States and reporting them to Eurostat. For each disease chosen, a particular aspect is highlighted.

The detailed results of the analysis are presented in Annex 1. Annex 8, page 87 shows the summary of pilot data and age-standardised estimates reported (range and ratio).
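Annex 8 summarises the reported age-standardised estimates as a range and a ratio across countries. As a rough illustration of what these two summary measures involve, the sketch below applies direct age standardisation (weighting each country's age-specific rates by a common standard population) and then takes the lowest rate, the highest rate and their ratio. It is a minimal sketch, not the Task Force's procedure: the function names are hypothetical, and the weights are assumed to come from whichever standard population is agreed, which this article does not specify.

```python
from typing import Mapping, Sequence


def age_standardised_rate(cases: Sequence[int], population: Sequence[int],
                          std_weights: Sequence[float], per: float = 100_000) -> float:
    """Directly age-standardised rate per `per` persons.

    Each age-specific rate (cases / population) is weighted by the share of
    that age group in a standard population, so that countries with
    different age structures can be compared.
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("cases, population and std_weights must have equal length")
    total_weight = sum(std_weights)
    weighted_sum = sum(c / p * w for c, p, w in zip(cases, population, std_weights))
    return weighted_sum / total_weight * per


def range_and_ratio(rates: Mapping[str, float]) -> tuple[float, float, float]:
    """Return (lowest rate, highest rate, highest / lowest) across countries."""
    low, high = min(rates.values()), max(rates.values())
    return low, high, high / low
```

With hypothetical inputs of 10 and 20 cases in two age groups of 1 000 persons each and equal weights, the age-specific rates are 1 000 and 2 000 per 100 000 and the standardised rate is 1 500 per 100 000.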

The importance of unambiguous diagnosis and definition: the case of schizophrenia

The detailed analysis of the pilot studies shows the importance of clear, unambiguous diagnostic criteria for the selected disease, and of suitable national sources, for obtaining reasonably comparable estimates across countries. This is the case for schizophrenia (F20-F29), for which nine countries provided estimates of age-standardised period prevalence rates. Only two countries (Cyprus, Malta) have no data for schizophrenia. Eight countries (the Czech Republic (crude rates only), Germany (not shown[3]), Estonia, Finland, Hungary, Lithuania, Poland and Romania) used health insurance data; of these, Poland and Finland combined insurance and hospital data, and Romania used DRG-based data. Five countries (Austria, Latvia, the Netherlands, Slovenia and Slovakia) used sources based on health statistics; three of them provided best national estimates. Latvia is an interesting case because a specific ‘Register of patients of the State Mental Health Agency’ exists and gives results of the same order of magnitude as those derived from insurance datasets. However, the full comparability of these figures is not guaranteed, because some of the identified sources are episode-based (Hungary) while others are explicitly person-based (Poland).
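The difference between episode-based and person-based sources can be made concrete with a small sketch. Assuming a hypothetical extract of contacts for one reference year, each a (person_id, ICD-10 code) pair, counting every matching contact gives an episode-based figure, while counting distinct persons gives a person-based one, so the same data yield different numerators depending on the type of source. The record layout and the function name are illustrative assumptions.

```python
# ICD-10 block F20-F29 (schizophrenia, schizotypal and delusional disorders),
# matched on the three-character category of each code.
SCHIZOPHRENIA_BLOCK = {f"F{n}" for n in range(20, 30)}


def count_cases(contacts, person_based=True):
    """Count cases in a list of (person_id, icd10_code) contacts for one year.

    person_based=True  -> each person counted once, however many contacts
    person_based=False -> each matching contact counted (episode-based)
    """
    hits = [pid for pid, code in contacts
            if code[:3].upper() in SCHIZOPHRENIA_BLOCK]
    return len(set(hits)) if person_based else len(hits)
```

A person with three recorded contacts in the year counts once in a person-based source and three times in an episode-based one.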

Figure 1: Schizophrenia, 2005 (period prevalence)

Why is this example important?

It shows that comparable estimates can be obtained:

  • 1. when there is a clear case definition and clear diagnostic criteria;
  • 2. when suitable sources are available, even if they are of different types and are used alone (a disease-specific register for Latvia; insurance data for Estonia, Hungary, Lithuania and Poland; ambulatory care provider data for Slovakia) or in combination (Finland: linkage of the Hospital Discharge Register for health institutions with Social Insurance Institution data on disability allowances). On the other hand, purely administrative data such as hospital in-patient data (Slovenia and Romania) do not provide realistic estimates for schizophrenia.
  • 3. In the case of schizophrenia, treatment requires specific medicines whose prescriptions are recorded in health insurance data; these data are therefore a suitable source, as identified by some pilot countries in their national contexts.
  • 4. As the prevalence of schizophrenia is broadly similar across populations (around 1 %), the systems in place in several countries seem to provide similar information.

How to decide whether a source is good enough: the case of Parkinson’s disease

Where a relevant source on prescriptions exists, best estimates can probably be provided for selected diseases if the pharmacological category of the drug used for treatment is completely disease-specific. An example is Parkinson’s disease in Belgium. In practice, however, this approach showed that these drugs are also used for (misdiagnosed) cases of parkinsonism, which inflates the estimates. This problem was identified thanks to the availability of data from specific Belgian studies with confirmed diagnoses of Parkinson’s disease. A common problem for the pilot countries was that, when only one data source was available, it could not be validated against any known ‘gold standard’, so no evidence-based decision could be made on rejecting, accepting or integrating it. In future, such decisions should be based on epidemiological knowledge and agreed criteria.

On the other hand, even in cases where more than one source was evaluated, as was done by Germany, a final consensus on which one was the best source in order to provide best national estimates was not achieved.

These issues reinforce the need to further develop methodologies that consolidate the valuable experience gained so far. The Parkinson’s disease example below shows that Member States are in the best position to assess the quality of each data source.

Figure 2: Parkinson’s disease, 2005 (period prevalence)

Why is this example important?

It shows that:

  • 1. pilot countries made efforts to check whether sources that seemed reliable at the beginning of the exercise actually were.
  • 2. The availability of a comparative source from an ad hoc study allowed an over-reporting problem to be identified in one of the pilot countries. It is therefore important to use a multi-source approach that can validate the final estimates obtained from the source(s) considered most reliable; this can in turn lead to a statistically based identification of the best national source(s) for a specific disease.
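Where individual records from a candidate source can be linked to a study with confirmed diagnoses (the kind of reference data that revealed the Belgian over-reporting), standard validation measures show how far the candidate source can be trusted. The sketch below computes sensitivity, positive predictive value and the ratio of candidate to confirmed cases. It assumes, hypothetically, that both sets of person identifiers cover the same population and period; this article does not describe how the Belgian comparison was actually made, and the function name is illustrative.

```python
import math


def validate_case_definition(candidate_ids: set, confirmed_ids: set) -> dict:
    """Compare cases found by a candidate source with confirmed diagnoses.

    candidate_ids: persons flagged by the candidate source (e.g. drug purchases)
    confirmed_ids: persons with a confirmed diagnosis in the reference study
    """
    tp = len(candidate_ids & confirmed_ids)   # flagged and confirmed
    fp = len(candidate_ids - confirmed_ids)   # flagged but not confirmed
    fn = len(confirmed_ids - candidate_ids)   # confirmed but missed
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        "ppv": tp / (tp + fp) if tp + fp else math.nan,
        # a value above 1 suggests over-reporting by the candidate source
        "candidate_to_confirmed": (len(candidate_ids) / len(confirmed_ids)
                                   if confirmed_ids else math.nan),
    }
```

A low positive predictive value combined with a candidate-to-confirmed ratio well above 1 is the pattern one would expect if the drugs are also prescribed for other forms of parkinsonism.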

The good news from the registers of pathologies: the case of incidence of malignant neoplasms

Cancer registers, which have been established in almost all countries, are the most commonly used source for estimating cancer incidence. All of the pilot countries could provide the requested estimates for malignant neoplasms, as this is a very well-established data collection followed and disseminated by the International Agency for Research on Cancer (IARC). Although the registration of new cancer cases is usually mandatory, the main threats to the validity of cancer incidence data may be incomplete reporting of new cases by health professionals and the failure to account for cancer cases that are only identified after death.

Figure 3: All malignant neoplasms (cancer), 2005 (incidence by person)

However, the reasons for selecting one source over another are sometimes unclear. In Poland, for example, cancer registry figures are consistently lower than those from other sources, i.e. hospital data. Preference almost always goes to the cancer registry because its diagnoses are confirmed, while hospital data may be less precise. In this case, however, the cancer register has questionable coverage: under-reporting, estimated by comparison with indicators for other countries at a similar level of economic development, was around 17 % at the time of the pilot exercise. It is caused by doctors not complying with the obligation to register incident cancer cases, and only an estimate of the total number of all cancers is published in Poland.
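An under-reporting estimate such as the 17 % quoted for the Polish register can be turned into a corrected total, but only once it is clear what the percentage refers to, and this article does not say. The sketch below assumes that 17 % is the share of true incident cases missing from the register, so that the register captures 83 % and the corrected count is registered / (1 − 0.17). The function name is hypothetical.

```python
def correct_for_under_reporting(registered_cases: float, under_reporting: float) -> float:
    """Estimate true incident cases from a register count.

    Assumption: `under_reporting` is the share of TRUE cases missing from the
    register (completeness = 1 - under_reporting). If it were instead defined
    relative to the registered count, the correction would be
    registered_cases * (1 + under_reporting).
    """
    if not 0 <= under_reporting < 1:
        raise ValueError("under_reporting must be in [0, 1)")
    return registered_cases / (1 - under_reporting)
```

Under this assumption, a hypothetical 83 000 registered cases would correspond to about 100 000 true cases; under the alternative definition the estimate would be 97 110. Because a single national factor of this kind applies to all cancers together, a corrected figure is only defensible for the total, which may be why only the total is published.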

Why is this example important?

It shows that:

  • 1. Even for estimates derived from the same kind of source, i.e. disease-specific registers, the coverage and comparability issues are far from being solved.
  • 2. As the data collection and sources are well established, the question remains: are the observed differences real?

The limitations of the registers of pathologies: the case of prevalence for malignant neoplasms

Cancer registers were the most commonly used source for estimating overall cancer prevalence. However, some limitations in estimating prevalence are evident across the pilot countries. The age-standardised rates for Poland and Slovenia from the national registers seem to underestimate prevalence, as do those for Cyprus, which provided data from the European Health Interview Survey (EHIS). Estonia used national insurance data as its source. The main difficulties in working with prevalence measures stem from the different definitions used by the pilot countries with respect to the period considered, the cases to be included, the follow-up procedures that ensure recovered persons and deaths are not counted, and so on. These limitations often also apply to register-based prevalence data for other curable diseases.
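The effect of follow-up procedures on period prevalence can be illustrated with a sketch. It assumes hypothetical register records that carry a diagnosis date and, where known, a date of death or documented recovery. A person counts as prevalent in a period if diagnosed by the end of the period and not known to have died or recovered before it began; ignoring end dates, as happens when a register has no follow-up, inflates the count. The record structure and names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class RegisterRecord:
    person_id: str
    diagnosis_date: date
    end_date: Optional[date] = None  # death or documented recovery; None if not recorded


def period_prevalent_persons(records: Iterable[RegisterRecord], start: date, end: date,
                             use_end_dates: bool = True) -> set:
    """Persons counted as prevalent at any time between start and end (inclusive)."""
    persons = set()
    for r in records:
        if r.diagnosis_date > end:
            continue  # diagnosed after the period: not prevalent in it
        if use_end_dates and r.end_date is not None and r.end_date < start:
            continue  # died or recovered before the period began
        persons.add(r.person_id)
    return persons
```

A person who recovered in 2004 is excluded from 2005 prevalence only if the register records that recovery; otherwise they are counted.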

Figure 4: All malignant neoplasms (cancer), 2005 (period prevalence)

Why is this example important?

It shows that:

  • 1. cancer prevalence data differ between countries because of variations in the definitions and estimation methods applied.
  • 2. The issue of coverage is of course as relevant for prevalence as it is for incidence.
  • 3. Sources other than registers may also be available: Estonia chose insurance data for estimating total cancer prevalence.
  • 4. In some cases prevalence may be overestimated because appropriate procedures for excluding cases that should not be counted as prevalent in a specified period are lacking (including cancer registries' inability to access other sources).
  • 5. Epidemiological registers should follow people with a disease until relapse or death, but generally no information is collected on whether and when the person fully recovered.

Diseases of primary importance for Public Health: the case of acute myocardial infarction (AMI)

Only 7 of the 16 participating countries were able to provide age-standardised AMI incidence estimates, and 10 provided prevalence estimates, even though AMI is a health problem of paramount importance in terms of frequency, seriousness, social and economic costs, amenability to medical intervention and priority ranking by policy makers and the community. Although a similar level of response might be encouraging for some other diseases, expectations of gaining a more complete insight into this disease were only partly met.

Notable difficulties faced during the pilots included differences in definition (based on International Classification of Diseases (ICD) coding or on diagnostic criteria), changes in definition and the diversity of diagnostic assays, all of which may have heavily affected AMI incidence rates and AMI case-fatality rates. With those aged 65 years or over projected to account for 20 % of the adult population in developed regions of the world by 2025, the burden of AMI will be felt even more acutely in the years to come.

Since case-fatality rates vary both by hospital and by sex, it appears that medical practice varies as well[4].

AMI mortality in non-hospitalised cases is very high in the first two hours, so causes of death statistics and hospitalisation data need to be combined to obtain incidence figures.

Most non-fatal AMI cases are referred to hospital for treatment. Combining hospital data on non-fatal cases with causes of death (CoD) data might therefore be possible for a large proportion of EU Member States; this approach could, however, face difficulties because discharge data in several countries are episode-based rather than person-based. As shown in the figure below on incidence by person, three pilot countries (Belgium, Finland, Poland) could link or merge two different sources, resulting in quite similar figures.
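A person-based incidence count that combines the two sources can be sketched as a set union, provided both sources carry a common personal identifier. The sketch assumes hypothetical sets of person IDs already restricted to AMI events in the reference year (hospital discharges in one, causes of death in the other); people who were admitted and then died appear in both sets and are counted once. It cannot be applied to episode-based discharge data, which is exactly the difficulty noted above. The function name is illustrative.

```python
def ami_incidence_by_person(hospital_ids: set, death_ids: set) -> dict:
    """Combine hospital and cause-of-death records into a person-based count.

    Both inputs are assumed to hold person IDs with an AMI event in the same
    reference year.
    """
    return {
        "hospitalised_survivors": len(hospital_ids - death_ids),
        "deaths_without_admission": len(death_ids - hospital_ids),
        "admitted_then_died": len(hospital_ids & death_ids),
        "persons": len(hospital_ids | death_ids),  # each person counted once
    }
```

Simply adding the two source totals would count every admitted-then-died person twice; the union avoids that.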

Figure 5: Acute myocardial infarction, 2005 (incidence by person)

In general, incidence by person seems to have been the most difficult indicator to estimate: as with AMI, the pilot countries had problems with definitions for diseases with ‘attacks’ (such as asthma and Chronic Obstructive Pulmonary Disease).

Ten countries provided age-standardised AMI period prevalence rates. For Cyprus and Malta, data from self-reported surveys (Health Interview Survey (HIS)) were used, probably leading to a substantial overestimation. The graph below shows reasonably comparable figures for the other countries.

Figure 6: Acute myocardial infarction, 2005 (period prevalence)

Why is this example important?

It shows that:

  • 1. Even for a disease which represents one of the major causes of death in Europe, we still base our knowledge on partial and not fully comparable data. This information gap must be filled in the forthcoming years.
  • 2. Based on the EUROCISS project, the indicator should be calculated as ‘Age-standardised attack rate by sex’, with linkage between data on hospital care and causes of death, or with disease-specific register data (ECHI indicator 24). However, the pilot results for incidence by person suggest allowing different kinds of sources to be included.
  • 3. If EHIS data are excluded from this example, the prevalence estimates show relatively similar values; more in-depth analysis is nevertheless required to assess the real comparability of the estimates.

Estimates derived from hospital discharge data: the case of femur fracture

This injury occurs frequently, particularly among older age groups, and can be fatal. It is important to measure the incidence of fracture of the femur, as it creates a significant burden on the health care system in terms of hospitalisation, rehabilitation, long-term consequences and ongoing care, and can have high delayed fatality rates. In addition, femur fracture can be seen as a proxy for accidental falls in the elderly.

Figure 7: Fracture of femur, 2005 (incidence by episode)

Why is this example important?

It shows that:

  • 1. Although five countries (Belgium, Finland, The Netherlands, Slovenia, Slovakia) used the same source (hospital data) for femur fracture incidence, the estimates show a high level of variation. As femur fracture is an injury which needs to be treated in hospital, this source is expected to be the best one.
  • 2. The registration and coding of diseases in hospital discharge data need to be investigated, as there could be data quality issues (as reported by the Netherlands).
  • 3. Estimates derived from reimbursement-driven sources (Estonia, Lithuania) are reasonably similar to those derived from non-reimbursement-driven insurance sources.
  • 4. Countries have different data collection and coding practices regarding the number of primary and secondary diagnoses, and the pilot countries may have used only primary diagnoses, or a limited number of secondary diagnoses, to identify cases for the short list of diseases.
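Point 4 can be illustrated with a sketch of case identification from discharge records. It assumes a hypothetical record layout with one primary diagnosis and a list of secondary ICD-10 diagnoses; femur fracture (ICD-10 category S72) is searched for in the primary diagnosis plus the first `max_secondary` secondary ones, and each qualifying discharge counts as one episode. Changing `max_secondary` alone changes the count, which is one way in which countries using ‘the same source’ can diverge.

```python
def count_femur_fracture_episodes(discharges, max_secondary=0):
    """Count discharges with a femur fracture code (ICD-10 S72).

    discharges: iterable of dicts with a 'primary' code and an optional
    'secondary' list of codes (hypothetical layout).
    max_secondary: how many secondary diagnoses to search (0 = primary only).
    """
    count = 0
    for d in discharges:
        codes = [d["primary"]] + list(d.get("secondary", []))[:max_secondary]
        if any(code.upper().startswith("S72") for code in codes):
            count += 1  # one discharge = one episode, even for the same person
    return count
```

Transfers between hospitals, if recorded as separate discharges, would also be counted twice under this episode-based rule.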

Should we use data from the Health Interview surveys for producing morbidity statistics?

Despite the clear instructions in the guidelines that only diseases diagnosed and reported by physicians should be used, some pilot countries were confronted by the fact that HIS (or EHIS) data were the only available sources for providing information on some of the selected diseases.

The Task Force reiterates that health interview surveys are not recommended for compiling morbidity statistics, for a number of reasons: the information collected by such surveys is highly subjective; respondents may interpret specific medical questions differently and could confuse symptoms with diseases; response rates can be low; and surveys can be limited to specific diseases and may exclude younger age groups. In addition, institutionalised people are often excluded from such surveys, leading to underestimated prevalence figures.

The evidence from the pilot studies shows that, for countries that provided only HIS/EHIS data, the levels of the estimates are not comparable with those of the other pilot countries. The indicator on eating disorders is one example: the self-reported estimates provided by Malta are clearly many times higher than those provided by other, diagnosis-based sources.

In general, diagnosis-based sources are expected to substantially underestimate this condition (as well as migraine, visual impairments, hearing loss, back pain and other conditions for which contact with health care providers is rarely required, or which physicians do not report as the most relevant diagnosis).

Figure 8: Eating disorders, 2005 (period prevalence)

On the other hand, the migraine example highlights that HIS data can be a relevant source when the self-reported disease may correspond to symptoms of different underlying aetiology. For migraine and other headache syndromes, physicians will most likely code the underlying condition, resulting in lower estimates for this group of diseases.

Figure 9: Migraine and other headache syndromes, 2005 (period prevalence)

Why are these examples important?

They show that:

  • 1. Self-reported health status usually differs from morbidity estimates, generally (but not always) showing higher rates for the specified diseases;
  • 2. The two examples highlight the importance of correctly addressing the question at population level: what proportion of the population really suffers from migraine or headache? Even in the presence of a correctly diagnosed condition, the perceived quality of health status may differ, and identifying the appropriate source(s) is crucial.
  • 3. Striking differences in the levels of the provided estimates show that considering just EHIS data could significantly impact the allocation of economic resources for tackling some diseases. On the other hand, for such conditions diagnosis-based data are not good enough for getting accurate estimates from the identified sources, and further analyses and methodological improvements should be worked out.
  • 4. Including co-morbidities would be of particular importance, as these conditions and diseases are often reported as secondary diagnoses only.

What we could gain from low-prevalence diseases: the case of multiple sclerosis

Multiple sclerosis is a relevant case because the estimates, derived from several different sources, show a consistently higher prevalence in females, in accordance with the scientific literature.

Figure 10: Multiple sclerosis, 2005 (period prevalence)

Why is this example important?

It shows that:

  • 1. With the possible exception of hospital data, different population-based sources could be used for estimating multiple sclerosis prevalence.
  • 2. Higher prevalence in females than in males is consistently observed in all of the reporting countries. Under-coverage or similar biases cannot be ruled out, but the results confirm a general pattern known from the literature, on which some methodological refinements could be built.
  • 3. The estimates computed from the pilot studies show that the proposed approach could be a considerable improvement on traditional tools such as HIS surveys, particularly for a disease with such a relatively low prevalence. For low-prevalence neuro-degenerative diseases such as Parkinson’s disease or multiple sclerosis (or even epilepsy), surveys require very large sample sizes, with considerably higher data collection costs.
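The sample-size argument in point 3 follows from the usual formula for estimating a proportion p from a simple random sample with relative precision r at about 95 % confidence (z ≈ 1.96): n = z²(1 − p) / (r² p). The sketch below evaluates it. It ignores design effects and non-response, both of which would increase n further, and the parameter values used are hypothetical.

```python
import math


def survey_sample_size(prevalence: float, relative_precision: float, z: float = 1.96) -> int:
    """Simple-random-sample size so that the confidence-interval half-width
    equals relative_precision * prevalence (normal approximation)."""
    if not 0 < prevalence < 1 or relative_precision <= 0:
        raise ValueError("need 0 < prevalence < 1 and relative_precision > 0")
    margin = relative_precision * prevalence  # absolute half-width
    return math.ceil(z ** 2 * prevalence * (1 - prevalence) / margin ** 2)
```

For a prevalence of 1 % and a relative precision of ±20 %, this gives 9 508 respondents; for 0.1 % it gives 95 944, roughly ten times as many.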

When data linkage seems to be the solution: the case of dementia (including Alzheimer’s disease)

Prevalence estimates for dementia, including Alzheimer’s disease, were requested in the morbidity short list. The importance of linkage is clearly seen in the following example from the Finnish pilot study. The numbers of patients with dementia (including Alzheimer’s disease) were similar in the hospital discharge register covering health and social welfare institutions (79 656) and in the disability allowance register (80 612), but merging these data sources by ID number gave a significantly higher number of people with dementia (134 284), increasing the single-source estimates by 69 % and 67 % respectively. As the graph below shows, the estimates for Finland are much higher than those of the other pilot countries, regardless of the type of source used.
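The Finnish figures can be decomposed with simple set arithmetic. Assuming that each register count refers to distinct persons and that the merged figure is the union of the two registers linked by ID number, the number of people found in both registers follows from |A ∪ B| = |A| + |B| − |A ∩ B|. The overlap is therefore derived here, not reported in the pilot results, and the function name is illustrative.

```python
def linkage_gain(n_a: int, n_b: int, n_union: int) -> dict:
    """Decompose a deterministic (ID-based) linkage of two person-level sources.

    Assumes n_a and n_b count distinct persons and n_union is their union.
    """
    overlap = n_a + n_b - n_union  # persons present in both sources
    return {
        "in_both": overlap,
        "only_a": n_a - overlap,
        "only_b": n_b - overlap,
        "gain_vs_a_pct": 100 * (n_union / n_a - 1),
        "gain_vs_b_pct": 100 * (n_union / n_b - 1),
    }


# Finnish dementia pilot: hospital discharge register (a) vs disability allowance register (b)
finland = linkage_gain(79_656, 80_612, 134_284)
```

Under that assumption, 25 984 people appear in both registers, 53 672 only in the hospital discharge register and 54 628 only in the disability allowance register; the gains round to the 69 % and 67 % quoted in the text.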

However, linkage of individual records is often not feasible in other countries, because of legal restrictions, high costs or time-consuming linkage processes. In some countries, such linkage can be done using a unique personal identity code (deterministic record linkage). Alternatively, probabilistic record linkage can be performed using available variables such as the sex, birth date, name and address of the registered person. This option may introduce a minor bias, but success rates of 95 % or more are often reached.
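Probabilistic record linkage is often implemented with the Fellegi-Sunter approach; this article does not say which method the pilot countries used, so the sketch below is a swapped-in illustration. Each compared variable contributes log2(m/u) when the values agree and log2((1 − m)/(1 − u)) when they disagree, where m is the probability of agreement for true matches and u the probability for non-matches; the summed weight is then compared with two thresholds. All m/u values and thresholds below are hypothetical; in practice they would be estimated from the data (for example with the EM algorithm).

```python
import math

# Hypothetical m/u probabilities per variable: (P(agree | match), P(agree | non-match)).
FIELD_PROBS = {
    "sex": (0.99, 0.5),
    "birth_date": (0.97, 1 / 20_000),
    "surname": (0.95, 0.01),
    "address": (0.85, 0.02),
}


def match_weight(rec_a: dict, rec_b: dict, probs=FIELD_PROBS) -> float:
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: the field contributes no evidence
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds: link, non-link, or send to clerical review."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"
```

Record pairs whose weights fall near the thresholds are the ones that can be wrongly linked or missed, which is one plausible source of the minor bias mentioned above.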

Figure 11: Dementia (incl. Alzheimer’s disease), 2005 (period prevalence)

Why is this example important?

It shows that:

  • 1. The possibility of linking relevant data sources indicates high potential for the establishment of an EU regular data collection on morbidity and for its success and reliability.
  • 2. Other similar approaches, such as merging of aggregated data should be further investigated, as these could be feasible in other EU countries.

Data linkage: always as promising as it seems? The case of osteoporosis

A certain level of under-registration of osteoporosis can be expected if a single source is used, as not all patients receive treatment for this condition and some may purchase medication privately for their own use. It is also possible that only serious cases of osteoporosis are traditionally registered, as this condition may be considered a co-morbidity.

The data from Finland were derived from the linkage of the hospital discharge register and the disability allowance register. As osteoporosis is often a symptom-free disease that is diagnosed only after a major fracture, register data are not a reliable source for this disease.

The linkage carried out by Finland using data on the special reimbursement of medicines and on disability allowances included people fulfilling two criteria:

  • 1. being entitled to receive the benefit; and
  • 2. having purchased a relevant drug related to the corresponding special reimbursement right.

Finland carried out similar exercises to produce estimates for diabetes, depression and other affective disorders, Parkinson’s disease, multiple sclerosis, epilepsy, glaucoma, hypertension, ischaemic heart diseases, heart failure, asthma, Chronic Obstructive Pulmonary Disease (COPD), rheumatoid arthritis, musculoskeletal and connective tissue disorders (including osteoporosis) and renal failure.

It is expected that:

  • 1. some other countries may have used different criteria (for example eligibility for reimbursement only), and
  • 2. the lists of pharmaceuticals recognised as eligible for reimbursement may differ among countries, thus introducing variations in the way estimates are calculated resulting in potential inconsistencies that could affect comparability.
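
The comparability concern in point 1 can be made concrete by applying alternative case definitions to the same hypothetical sets of person IDs. Reading the Finnish definition as requiring both criteria (entitlement and a relevant purchase) is an interpretation of the text above; "entitled_only" corresponds to the eligibility-only criterion that some other countries may have used, and "either" is included only for contrast. The function name and rule labels are illustrative.

```python
def count_reimbursement_cases(entitled: set, purchasers: set, rule: str = "both") -> int:
    """Count prevalent cases under alternative (hypothetical) case definitions.

    entitled:   persons entitled to special reimbursement for the condition
    purchasers: persons who purchased a drug covered by that reimbursement right
    """
    if rule == "both":           # entitled AND purchased
        return len(entitled & purchasers)
    if rule == "entitled_only":  # eligibility for reimbursement only
        return len(entitled)
    if rule == "either":         # entitled OR purchased (for contrast)
        return len(entitled | purchasers)
    raise ValueError(f"unknown rule: {rule!r}")
```

Different national lists of reimbursable drugs (point 2) would change the set of purchasers itself, shifting every one of these counts.
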

Figure 12: Osteoporosis, 2005 (period prevalence)

This example also shows high age-standardised rates for Hungarian females. The source used in Hungary is reimbursement-driven insurance data, and the estimates based on this source approach those derived from the EHIS data.

Why is this example important?

Because it shows that:

  • 1. Statistical processing or combinations of sources are not always the ‘perfect’ solution for morbidity statistics.
  • 2. Diseases for which ‘high-severity’ prevalence is low but which are quite common in the population (as the EHIS data seem to suggest) would require ad hoc methods or dedicated sources in order to be correctly detected and estimated.
  • 3. For diseases mainly treated by (primary) ambulatory care services, countries appear to have limited access to this information; the Netherlands was the only pilot country that provided estimates from the theoretically best source.

When episodes count: the case of Tuberculosis

Most of the countries that participated in the pilot have a disease-specific register on tuberculosis, as in most countries reporting tuberculosis is part of a compulsory notification or surveillance system. Multiple episodes of tuberculosis may occur during the same year, and the results of the pilots show that this is indeed the case, especially for males and in particular in Estonia, Hungary, Latvia, Lithuania and Poland. The main purpose of collecting incidence by episode for a disease is to assess the burden on the health system and its capability in terms of preventive measures and efficacy of treatment. One possible limitation of estimating incidence by episode for TB lies in the different definitions the pilot countries may have adopted: relapsed or recurrent episodes (and reinfections) should be counted on the basis of laboratory confirmation and clear personal identification. Moreover, people with a continuing episode of TB that requires a change of treatment should be considered prevalent cases, not incident ones.
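The counting rules in the preceding paragraph can be sketched as follows. The sketch assumes hypothetical notification records with a person ID (None when the person cannot be identified), a notification type and a laboratory-confirmation flag. As an added assumption, new notifications are always counted; a relapse or recurrence counts as a new episode only if it is laboratory-confirmed and attributable to an identified person; a treatment change within a continuing episode is treated as prevalent, not incident. The categories and names are illustrative, not the pilot definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class TBNotification:
    person_id: Optional[str]  # None if the person cannot be clearly identified
    kind: str                 # hypothetical categories: "new", "relapse", "treatment_change"
    lab_confirmed: bool


def incidence_by_episode(notifications: Iterable[TBNotification]) -> Tuple[int, int]:
    """Return (incident episodes, distinct identified persons with an episode)."""
    episodes = 0
    persons = set()
    for n in notifications:
        if n.kind == "new":
            counted = True  # assumption: first notifications always count
        elif n.kind == "relapse":
            # relapses/recurrences need lab confirmation and a clear personal ID
            counted = n.lab_confirmed and n.person_id is not None
        else:
            counted = False  # e.g. treatment change: continuing (prevalent) episode
        if counted:
            episodes += 1
            if n.person_id is not None:
                persons.add(n.person_id)
    return episodes, len(persons)
```

The gap between the two returned numbers reflects the multiple episodes per person that the pilots observed, especially for males and in the countries listed above.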

Figure 13: Tuberculosis, 2005 (incidence by episode)

Why is this example important?

It shows that:

  • 1. Sources other than disease registers can also be explored for selected infectious diseases.
  • 2. Incidence by episode may be meaningful, provided that the definition of ‘episode’ is sufficiently clear (and distinguished from ‘re-infection’) and that collecting data according to this definition is feasible in the pilot countries.

About this article

This article and the companion article Morbidity statistics methodology pilot studies - introduction are extracted from the report on Morbidity Statistics in the EU prepared by the Morbidity Task Force. The full report and annexes were published in April 2014 in the Statistical Working papers collection.

The article Morbidity statistics methodology pilot studies - introduction (Chapter 1 in the report) gives an overview of the current demand for health indicators in the context of EU policies. It also explains how morbidity statistics may improve the different dimensions of health, and describes the steps and methods followed to establish the routine data collection of morbidity statistics.

The article Morbidity statistics methodology pilot studies - examples (Chapter 2 in the report) provides an overview of the main findings, problematic aspects and proposed solutions for moving towards a Eurostat morbidity data collection. To present the most relevant aspects in a readable format, it was decided to present some typical situations that the Task Force faced when analysing the sources and estimates with regard to their accessibility, usefulness, overall quality and comparability. Case studies dealing with the quality of the identified sources and estimates are presented as questions to make the report easier to follow.

Direct access to

Tables
Health status (t_hlth_state)

Database
Health status (hlth_state)
Healthy Life Years (hlth_hly)
Self-perceived health and well-being (hlth_sph)
Self-reported chronic morbidity (hlth_srcm)

<notes>

Notes

  1. Authors and members of the Task Force ‘Morbidity’: Monica Pace (Eurostat, Directorate Social statistics and Seconded National Expert from the Italian National Institute of Statistics), Hartmut Buchow (Eurostat, Directorate Social statistics), Margarida Domingues de Carvalho (Eurostat, Directorate Social statistics), Willem Aelvoet (Belgian Federal Public Service Health), Jacques Bonte (Private Expert), Gráinne Cosgrove (Irish Department of Health), Rita Gaidelyte (Lithuanian Institute of Hygiene), Mika Gissler (Finnish National Institute for Health and Welfare), Georgeta-Marinela Istrate (Romania National Institute for Statistics), Merike Rätsep (Estonian National Institute for Health Development), Ieva Strele (Riga Stradins University, Riga, Latvia), Bogdan Wojtyniak (National Institute of Public Health-National Institute of Hygiene, Warsaw, Poland). Former members of the Task Force ‘Morbidity’: Prof. Howard Meltzer (Department of Health Sciences, College of Medicine, Biological Sciences and Psychology, University of Leicester, UK (member until September 2012)), Anne Fagot-Campagna (French Institute for Health Surveillance (member until March 2012)), Jean-Marc Schaeffer (Eurostat, Directorate ‘Social Statistics’ (member until April 2012)).
  2. The sources presented in the following graphs have been grouped according to criteria agreed within the task force: for the description of the different categories of sources, please refer to Table 36, page 87 in Annex 1.
  3. The data for Germany are not shown in this report, based on a specific request made by the German Institutions that conducted the pilots.
  4. Schiele F, et al. Eur Heart J 2005; 26(9):873-880.

<notes>