EFPIA publication parsing

Our database covers 11 European countries.

For 6 of them we have complete data:

  • Portugal, Denmark, Romania and Belgium, because the data comes from a public database from which we extracted everything,
  • Ireland and the United Kingdom, because the EFPIA data is centralised on national platforms, where it can be downloaded or scraped.

For the other 5 countries (Germany, Italy, Spain, Sweden and Switzerland), we rely on the PDF files disclosed by each pharma company, which we listed manually and read automatically with a Python program we wrote (see tech doc). It was not possible to get all the data for these countries, because some publications were not found, were difficult to parse, or were published as a dedicated website rather than a PDF. To minimise this problem, we put extra effort into the top 20 companies[1].
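The "average of the rank" used to pick the top 20 companies can be illustrated with a minimal Python sketch. It is not the program referred to in the tech doc. The function names, the input layout, and the choice to rank companies by total disclosed amount within each country are assumptions made for the example, and the figures in the usage note are made up.

```python
from collections import defaultdict


def average_rank(amounts_by_country):
    """Return {company: average rank} across the countries where it appears.

    amounts_by_country is a hypothetical layout:
    {country: {company: total disclosed amount}}.
    Rank 1 is the largest amount in a country (assumed ranking criterion).
    """
    ranks = defaultdict(list)
    for amounts in amounts_by_country.values():
        # Order the companies of this country from largest to smallest amount.
        ordered = sorted(amounts, key=amounts.get, reverse=True)
        for position, company in enumerate(ordered, start=1):
            ranks[company].append(position)
    # Average only over the countries where the company has data.
    return {company: sum(r) / len(r) for company, r in ranks.items()}


def top_companies(amounts_by_country, n=20):
    """Return the n companies with the lowest (best) average rank."""
    averages = average_rank(amounts_by_country)
    # Ties on the average are broken alphabetically to keep the output stable.
    return sorted(averages, key=lambda c: (averages[c], c))[:n]
```

For example, with the made-up input {"PT": {"A": 300, "B": 200, "C": 100}, "DK": {"B": 50, "A": 40}}, company A is ranked 1 and 2, company B is ranked 2 and 1, and company C is ranked 3 in the only country where it appears. A company missing from a country is simply left out of that country's ranking, which is one possible reading of "in the countries where we had data".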

The results of the parsing for these top 20 companies are as follows:


[1] Ranked by the average of the rank that each company had in the countries where we had data.