With the preliminaries covered in the previous articles in this series, it’s time to get down to the modelling itself. For that, I’m going to present a set of use cases and then work out the model.
We are going to work from the perspective of banks and financial institutions that need to assess the risk posed by their counterparties. One aspect of that assessment is the risk, associated with a particular legal form, of not getting your money back. Specifically, we will look at world-wide counterparty risk on legal forms for an international bank.
Note: all images were generated using dbdiagram.io.
Conventions
In this model we will use “abbreviation” a lot. You could make the case that legal forms have acronyms rather than abbreviations. An acronym is an abbreviation that is formed from the initial letters of other words and pronounced as a word (e.g. NASA, FIFA). However, as I am not sure that all legal form abbreviations are acronyms, we will use “abbreviation” for this model. If you feel that it is more appropriate to use “acronym”, you should do so.
Simple case: one country with one language
The simple case is as follows: we have a stakeholder (see previous article) that needs some information about a legal person. I won’t get into the details of legal persons that may or may not exist; at this point we’ll just assume we have one. We want to register the legal form of the legal person, but:
- We are in the same country
- We have only one language
- We understand the risks that we run for this specific legal form
- We recognize a legal form by the abbreviation, like “LLC” or “Inc”.
This means we can restrict ourselves to only registering the following information:
The basic information required is the standard name for the legal form. For some legal forms, legal entities are required to communicate their legal form in their name by adding the standard abbreviation for that legal form. They are not allowed to communicate a misleading abbreviation: you cannot call yourself a limited liability company by adding Ltd. to your company name if you are not in fact registered as such. So the official abbreviation you are required to carry in the name is also helpful to register. Since we only deal with one language, we use the abbreviation as our code and save ourselves an attribute. This will come back to haunt us, of course, as shortcuts usually do.
Do note that, depending on the specific rules, regulations and customs of the registrar, the legal form abbreviation may be part of the full official name of a legal person, or it may sit outside the official name and be used as a prefix or postfix to the name in all official communications. We will ignore this distinction from here on.
Also note that there are legal forms without abbreviation, and potentially there are legal forms without abbreviation or even a full name. We will ignore this for now, but we’ll get back to it later.
We can also add the registrar where the legal person was registered, but this is not part of the legal form itself. We could set this on the relationship between Legal Person and Legal Form, but that would imply you could have multiple Legal Forms for the same company with different registrars, which is unlikely to be valid, so care must be taken to restrict the relationship accordingly. What you should probably do is add the information about the registrar to the Legal Person itself, because you create the legal person by registering it at a registrar.
This gives rise to the following diagram:
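As a rough sketch of the same simple model in code, here is one way it could look in SQLite (via Python). All table and column names are my own assumptions for illustration, as are the example company and registrar:

```python
import sqlite3

# All table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE legal_form (
    abbreviation TEXT PRIMARY KEY,  -- doubles as the code while we have one language
    name         TEXT NOT NULL      -- standard name of the legal form
);
CREATE TABLE legal_person (
    legal_person_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    registrar       TEXT NOT NULL,  -- stored on the person, not on the legal form
    legal_form      TEXT NOT NULL REFERENCES legal_form (abbreviation)
);
""")

conn.execute("INSERT INTO legal_form VALUES ('LLC', 'Limited Liability Company')")
conn.execute(
    "INSERT INTO legal_person (name, registrar, legal_form) VALUES (?, ?, ?)",
    ("Example Widgets", "Example Registrar", "LLC"),
)

# Each legal person carries exactly one legal form through a single foreign key.
row = conn.execute("""
    SELECT p.name, f.abbreviation, f.name
    FROM legal_person p
    JOIN legal_form f ON f.abbreviation = p.legal_form
""").fetchone()
print(row)
```

Putting a single foreign key on the legal person is one way to restrict the relationship to one legal form per legal person, and it keeps the registrar off the legal form entirely.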
Next step: multiple languages
If there are multiple official languages in the country, you will have abbreviations for the legal form in multiple languages. They may be different, but the worst case is having two equal abbreviations that actually mean something different depending on the language. Fortunately very few countries let that happen. Still, it’s a theoretical possibility we need to cater for in our model.
Most of the time a legal person will probably carry only one abbreviation appended to the original name (if required), not several. However, in correspondence in other languages one may want to point out the legal form in a language the reader will understand. Having the abbreviation and full name of the legal form available in each official language is very helpful in those cases. This is why we now split the Legal Form Code into a surrogate key and an actual code, so that the Legal Form can have an Abbreviation and Name for each language.
We choose ISO 639-1, the two-letter codes of the ISO 639 standard, for our languages. It covers the most widely used languages on the planet. But if you are dealing with “legal forms through the ages”, you should probably use the full ISO 639 standard. Since this is a bit out of scope, I won’t go into further detail; it would likely take a separate article to do it full justice.
This brings our diagram up to the following standard:
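A minimal sketch of this step, again assuming SQLite via Python and hypothetical table and column names. The code 'BE-PLC' is made up; NV and SA are the Dutch and French abbreviations for the same Belgian legal form. Keying the names on legal form plus language is my reading of the model, not a quote from the diagram:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE legal_form (
    legal_form_id   INTEGER PRIMARY KEY,  -- surrogate key
    legal_form_code TEXT NOT NULL UNIQUE  -- the actual code, no longer the abbreviation
);
CREATE TABLE legal_form_name (
    legal_form_id INTEGER NOT NULL REFERENCES legal_form (legal_form_id),
    language_code TEXT NOT NULL CHECK (length(language_code) = 2),  -- ISO 639-1
    abbreviation  TEXT NOT NULL,  -- forms without abbreviation are ignored for now
    name          TEXT NOT NULL,
    PRIMARY KEY (legal_form_id, language_code)
);
""")

# 'BE-PLC' is a hypothetical code value.
form_id = conn.execute(
    "INSERT INTO legal_form (legal_form_code) VALUES ('BE-PLC')"
).lastrowid
conn.executemany(
    "INSERT INTO legal_form_name VALUES (?, ?, ?, ?)",
    [
        (form_id, "nl", "NV", "Naamloze vennootschap"),
        (form_id, "fr", "SA", "Société anonyme"),
    ],
)

def abbreviation_for(legal_form_id, language_code):
    """Return the abbreviation in the given language, or None if there is none."""
    row = conn.execute(
        "SELECT abbreviation FROM legal_form_name "
        "WHERE legal_form_id = ? AND language_code = ?",
        (legal_form_id, language_code),
    ).fetchone()
    return row[0] if row else None

print(abbreviation_for(form_id, "nl"), abbreviation_for(form_id, "fr"))
```

Because the language is part of the key, two languages may use the same abbreviation for different legal forms without colliding.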
While the Legal Form entity looks a bit empty for now, any information that is relevant to this legal form, regardless of the name or language, can be placed into it. An example might be a flag that indicates transparent personal liability, which means that a given legal form allows a creditor to recover losses from the personal finances of the owners of the legal person. This is often the case for the “self-employed” legal form. You can have various kinds of transparency, for instance from a tax perspective, or from a risk-assessment perspective where you need to know if you can recover losses from (legal or private) persons that have ownership.
Transliterations: language and script
We have already included translations by incorporating language codes in the key for the Legal Form Name. But now we also want to deal with the situation where we can have multiple scripts within the same language.
If you work for a single country with a single script that also happens to use only the characters of the Latin alphabet, good for you: you can skip this bit. But even when you stick to European countries, that’s just not the situation. Several nations in Eastern Europe use Cyrillic scripts. For business purposes these are often transliterated into other scripts, which can give rise to interpretation issues. There are many official and unofficial ways to transliterate from Arabic to Latin, for instance. This gives rise to many misunderstandings. Therefore, we want to maintain both the original and the transliterated versions of the name and abbreviation.
Usually there is just one script per country, but there are exceptions to that rule, like Mongolia, where two different scripts are in use and neither of them is Latin. Some nations in Eastern Europe, such as Serbia and Bulgaria, also use two scripts, but one of them is Latin. And then there are the African languages that can be written in a Latin script (with some difficulty), but also have modern counterparts that use a different script, such as Dinka.
There is an ISO standard for script codes: ISO 15924. It contains a lot of script codes, but quite a few are of only historical interest – which could still be relevant for historical research into old legal forms.
However, since most cases for our current use case (which is to register legal forms that are actually in use, world-wide) can be dealt with by choosing an original script and a script that everything is transliterated to (for us, Latin), we are not going to add the additional complexity for one single remote use case. That is a trade-off we make here, and you should document such choices when modelling your data. So we will just store the original name and acronym or abbreviation in the original script in separate fields, and leave it at that for now.
At this point, our diagram should look like this:
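A sketch of what storing both versions could look like, assuming SQLite via Python and hypothetical column names. The Russian limited liability company serves as the example; the Latin spelling of the full name is just one of several possible transliterations, which is exactly the problem described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legal_form_name (
    legal_form_id         INTEGER NOT NULL,
    language_code         TEXT NOT NULL,  -- ISO 639-1
    abbreviation          TEXT,           -- transliterated to Latin
    name                  TEXT,           -- transliterated to Latin
    original_abbreviation TEXT,           -- as written in the original script
    original_name         TEXT,           -- as written in the original script
    PRIMARY KEY (legal_form_id, language_code)
);
""")

LATIN_OOO = "OOO"                    # three Latin capital O's
CYRILLIC_OOO = "\u041e\u041e\u041e"  # three Cyrillic capital O's, visually identical

conn.execute(
    "INSERT INTO legal_form_name VALUES (?, ?, ?, ?, ?, ?)",
    (
        1, "ru",
        LATIN_OOO, "Obshchestvo s ogranichennoy otvetstvennostyu",
        CYRILLIC_OOO, "Общество с ограниченной ответственностью",
    ),
)

def find_legal_form(abbreviation):
    """Look up a legal form id by its Latin or original-script abbreviation."""
    row = conn.execute(
        "SELECT legal_form_id FROM legal_form_name "
        "WHERE abbreviation = ? OR original_abbreviation = ?",
        (abbreviation, abbreviation),
    ).fetchone()
    return row[0] if row else None

print(LATIN_OOO == CYRILLIC_OOO)  # the two strings look alike but are not equal
print(find_legal_form(LATIN_OOO), find_legal_form(CYRILLIC_OOO))
```

Python strings and SQLite TEXT columns are Unicode-capable; in other databases the original-script fields may need an explicit Unicode datatype.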
However, we’re not done yet.
Legal basis for legal form
Every legal form has to have a basis in law. The problem (which we’ll get to in a later stage) is that you need the concept of jurisdiction to get it entirely correct, but for now it’s enough to say that every legal form refers to one or more paragraphs in a law. It should be referenced because you can have legal forms that have no name, but do have a legal basis. The legal basis can be stored as a URL if you can refer to a stable website, or as a copy of the text. It is, however, language-specific. So we cannot add it to the Legal Form entity, and we also cannot add it to the Legal Form Name entity.
We now have two options. Either we rename “Legal Form Name” to something that indicates it’s the language-specific legal form information, or we add another entity to hold the information about the “Legal Form Legal Basis”. But there is a problem with adding another entity: you could potentially run into the scenario that you have a legal basis for one given combination of legal form and language, and legal form names for another combination. Even if your fields are mandatory, you can still get null values if you combine the two entities into a coherent view. If you want to make sure you always have both at the same time, you must combine them into one entity.
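The null problem with two separate entities can be illustrated with a small sketch, assuming SQLite via Python. The table names, column names and the URL are hypothetical. Every column is mandatory, yet the combined view still produces a null:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legal_form_name (
    legal_form_id INTEGER NOT NULL,
    language_code TEXT NOT NULL,
    name          TEXT NOT NULL,
    PRIMARY KEY (legal_form_id, language_code)
);
CREATE TABLE legal_form_legal_basis (
    legal_form_id INTEGER NOT NULL,
    language_code TEXT NOT NULL,
    legal_basis   TEXT NOT NULL,  -- URL or copy of the legal text
    PRIMARY KEY (legal_form_id, language_code)
);
-- A name in Dutch, but a legal basis in French, for the same legal form.
INSERT INTO legal_form_name VALUES (1, 'nl', 'Naamloze vennootschap');
INSERT INTO legal_form_legal_basis VALUES (1, 'fr', 'https://example.org/law/fr');
""")

# Combine both entities per legal form and language.
rows = conn.execute("""
    SELECT b.legal_form_id, b.language_code, b.legal_basis, n.name
    FROM legal_form_legal_basis b
    LEFT JOIN legal_form_name n
      ON n.legal_form_id = b.legal_form_id
     AND n.language_code = b.language_code
""").fetchall()
print(rows)  # the name comes back as None for the French row
```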
However, in our case the fields CAN be empty. It is possible to have a legal form without abbreviation, such as “sole entrepreneurship”. It’s also possible (but highly unusual) to have a legal form with a legal basis, but no actual name. This happened in the AnaCredit regulation, where the European Central Bank had a different opinion on legal forms from the Dutch Central Bank and created new legal forms that had no name, based on different paragraphs in Dutch law.
This means we should be able to accommodate both scenarios, and in that case you can either leave all fields optional or separate the entities.
You could introduce a separate entity here, “Legal Form per Language”. But since we still need to accommodate various legal frameworks, we shall leave that for later. So we end up with the following model for now:
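A sketch of the “all fields optional” variant, assuming SQLite via Python with hypothetical names and URLs, and leaving out the original-script fields for brevity. The CHECK constraint is my own addition and not part of the model: it only guards against rows that carry neither a name nor a legal basis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legal_form_name (
    legal_form_id INTEGER NOT NULL,
    language_code TEXT NOT NULL,  -- ISO 639-1
    abbreviation  TEXT,           -- optional
    name          TEXT,           -- optional
    legal_basis   TEXT,           -- optional: URL or copy of the legal text
    PRIMARY KEY (legal_form_id, language_code),
    -- Assumption, not part of the article's model: reject rows that carry nothing.
    CHECK (name IS NOT NULL OR legal_basis IS NOT NULL)
);
""")

conn.executemany(
    "INSERT INTO legal_form_name VALUES (?, ?, ?, ?, ?)",
    [
        (1, "nl", None, "Eenmanszaak", "https://example.org/law/a"),  # no abbreviation
        (2, "nl", None, None, "https://example.org/law/b"),           # no name at all
    ],
)

def is_rejected(row):
    """Return True if the database refuses to store the given row."""
    try:
        conn.execute("INSERT INTO legal_form_name VALUES (?, ?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

nameless = conn.execute(
    "SELECT COUNT(*) FROM legal_form_name WHERE name IS NULL"
).fetchone()[0]
print(nameless, is_rejected((3, "nl", None, None, None)))
```

With everything optional, the database can no longer guarantee completeness on its own, so any rule about which combinations are acceptable has to be stated explicitly, as the constraint does here.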
In our next chapter we shall discuss change over time, and the concept of jurisdictions. Legal forms relate to language, but are only valid within a certain geographic boundary. That leads to some fairly nasty complications.
Edits:
11-jun-2024: I slightly changed the text for transliteration (clarified the difference with translation) and updated the image to reflect the fact that the original script usually needs a Unicode datatype.