Data Format

To standardise the data coming from different kinds of transparency documents in Europe, we developed a data format. This format is open source and can be seen on our GitLab repo.

We had two objectives when designing this format:

  • Versatility: we should be able to represent any link of interest
  • Traceability: every piece of information is linked to the source in which it was first published

The main object of this format is a link, which represents a link of interest between two entities: the source and the recipient. Some links have no recipient entity because they concern multiple, undisclosed recipients (the R&D line of the EFPIA disclosures, for example). Each link belongs to a publication, which holds a URL to the source of the data. Entities can have relations with other entities (entity_relation), which lets us represent, for example, that Bayer Germany is linked to Bayer Spain.
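The objects described above can be sketched roughly as follows. This is only an illustration, not the actual schema published in the repo: every class and field name here (Entity, Publication, Link, EntityRelation, country, kind, and so on) is an assumption made for the example, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A company, organisation or person (field names are hypothetical)."""
    name: str
    country: Optional[str] = None


@dataclass(frozen=True)
class Publication:
    """The document a link was first published in."""
    url: str  # URL to the source of the data


@dataclass(frozen=True)
class Link:
    """A link of interest between a source and, optionally, a recipient."""
    source: Entity
    publication: Publication
    # None when the link concerns multiple, undisclosed recipients
    recipient: Optional[Entity] = None
    # None when the original data carries no category information
    category: Optional[str] = None


@dataclass(frozen=True)
class EntityRelation:
    """A relation between two entities, e.g. two branches of one group."""
    from_entity: Entity
    to_entity: Entity
    kind: str = "related"  # hypothetical label for the relation type


# Example data (illustrative only)
bayer_de = Entity("Bayer Germany", country="DE")
bayer_es = Entity("Bayer Spain", country="ES")
relation = EntityRelation(bayer_de, bayer_es)

publication = Publication(url="https://example.org/disclosure")  # placeholder URL
# An R&D-style line: a source and a publication, but no recipient entity
rd_link = Link(source=bayer_de, publication=publication)
```

Making the recipient optional, rather than inventing a placeholder entity, keeps aggregated lines such as R&D totals distinguishable from links to a named recipient.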


We aggregate data from distinct sources, each of which has its own categorisation logic. We tried to stick as closely as possible to the categorisation used in the US Open Payments database, as described here. Nevertheless, we were forced to introduce new categories. Also, the data for some countries has no category information at all.
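One way to picture this harmonisation is a lookup from each source's own category label to a shared one. The sketch below is an assumption-laden illustration, not the mapping actually used: the source keys, the raw labels, the shared category names and the "other" fallback are all hypothetical.

```python
from typing import Optional

# Hypothetical mapping: (source key, raw label) -> shared category name.
CATEGORY_MAP = {
    ("open_payments", "Consulting Fee"): "consulting_fee",
    ("efpia", "Fees for service and consultancy"): "consulting_fee",
    ("efpia", "Research and development"): "research_and_development",
}


def normalise_category(source: str, raw: Optional[str]) -> Optional[str]:
    """Map a source-specific label to a shared category.

    Returns None when the source provides no category information,
    and the hypothetical fallback "other" when the label is unmapped.
    """
    if raw is None or not raw.strip():
        return None
    return CATEGORY_MAP.get((source, raw.strip()), "other")
```

Keeping "no category in the source" (None) separate from "category present but unmapped" preserves the distinction between countries that publish no category information and labels that simply have no equivalent yet.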