[topicmapmail] PSIs - alternatives
Thompson, Miles
MThompson at creditsights.com
Tue Jun 20 23:32:17 EDT 2006
On 6/20/06, Patrick Durusau <patrick at durusau.net> wrote:
> Reinventing the notion of a universal name
> set that everyone will use, with or without repositories, is simply a
> non-starter. Even if it is prefaced with the "sacred" prefix http://.
Alex wrote:
> The library world has three very different mammoth naming conventions,
> and neither of them work except on a rather high level, as with
> anything else.
Real-world examples are good, I think.
Here's an example from finance, where there is also a whole raft of different naming conventions, all aimed at uniquely identifying companies. For example:
- Tickers (in various exchanges)
- CUSIPs (assigned under the American Bankers Association's CUSIP system)
- Official "Full Registered Names"
- Unique identifiers used internally, and sometimes published, by major banks or trading platforms (e.g. Reuters, Bloomberg, or Lehman Brothers identifiers)
- The unique identifiers used internally in various database systems and never published beyond them...
To give a flavor of things, let's talk about tickers for a second.
Tickers as company identifiers are really useful for humans. Real human beings like to use them to get quick access to a particular company; being human, they know the context and tolerate all sorts of weird context-sensitive rules. For us computer people, however, they are a total pain in the neck:
- Tickers need to be scoped by exchange: "IBM" on the NYSE does not necessarily refer to the same entity as "IBM" on NASDAQ (though in practice it would 99% of the time).
- Ticker assignments can change over time. For instance, when companies are delisted or go bankrupt, the ticker that represents them actually changes, with a dot and a letter added to the end ("IBM.D"). After a period of time, the original ticker ("IBM") can sometimes be assigned to another company with no relation to the former owner.
- Sometimes the "same company" may have multiple tickers on the same exchange, because of a stock split or similar. (Thank goodness the reverse is not true: a ticker, on a given exchange, on a given day, always represents only one company.)
All the other naming schemes suffer from the same general limitations (yes, even CUSIPs) to a greater or lesser extent.
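To make those ticker rules concrete, here is a minimal Python sketch of resolving a ticker to a company: the lookup key has to include the exchange and the date, not just the ticker string. The `TickerAssignment` record, the `resolve` function, the ticker "XYZ", and the company IDs are all hypothetical, made up for illustration; this is not how any particular exchange or data vendor models it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TickerAssignment:
    exchange: str               # tickers only mean something within one exchange
    ticker: str
    company_id: str             # hypothetical internal company identifier
    start: date                 # first day the assignment holds (inclusive)
    end: Optional[date] = None  # last day it holds (inclusive); None = still current


def resolve(assignments, exchange, ticker, on):
    """Return the company a ticker denotes on `exchange` on day `on`, or None."""
    matches = [
        a.company_id
        for a in assignments
        if a.exchange == exchange
        and a.ticker == ticker
        and a.start <= on
        and (a.end is None or on <= a.end)
    ]
    # Per exchange, per day, a ticker should denote at most one company.
    if len(matches) > 1:
        raise ValueError(f"ambiguous ticker {exchange}:{ticker} on {on}")
    return matches[0] if matches else None


history = [
    TickerAssignment("NYSE", "XYZ", "company-A", date(1990, 1, 1), date(2003, 6, 30)),
    TickerAssignment("NYSE", "XYZ.D", "company-A", date(2003, 7, 1)),  # suffix after delisting
    TickerAssignment("NYSE", "XYZ", "company-B", date(2005, 1, 1)),    # ticker later reused
    TickerAssignment("NASDAQ", "XYZ", "company-C", date(1995, 1, 1)),  # same string, other exchange
]
print(resolve(history, "NYSE", "XYZ", date(2000, 1, 1)))     # company-A
print(resolve(history, "NYSE", "XYZ", date(2006, 6, 20)))    # company-B
print(resolve(history, "NYSE", "XYZ.D", date(2006, 6, 20)))  # company-A
print(resolve(history, "NASDAQ", "XYZ", date(2006, 6, 20)))  # company-C
```

Even in this toy version, the string "XYZ" alone is never enough to identify a company; only the (exchange, ticker, day) triple is.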
--
So, given that, let me talk about what is good and what is bad about the way PSIs work for us in practice.
PSIs are really good in practice because:
- The whole concept very nicely separates out identity from subject, which helps a lot in making sense of things. In particular, merging lets us define a separate range of PSIs for each naming scheme and handle the identity-matching problem, i.e. stating that "CUSIP:124323" refers to the same entity as "NYSE Ticker:IBM", separately from the data import or export process per se.
- It doesn't force us to match everything to yet another 'definitive' set of identifiers. The schemes can line up imperfectly: sometimes they match and sometimes they don't. (I'm not sure how to put that mathematically.)
- The lack of a central repository is actually really good for us, because it means I wouldn't feel too scared to come up with our own complete set of Published Subject Indicators to represent what, say, the NYSE ticker "IBM" means. With a centralised repository, my company would never be so presumptuous, so we would be waiting for the organisation in question (i.e. the NYSE) to get the whole TM PSI idea before we could use them. Because it is distributed and non-definitive, I don't have to worry about any of that and can just get on with it.
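The identity-matching point can be sketched in a few lines of Python. This is a minimal sketch assuming the Topic Maps rule that topics sharing a subject identifier are merged; it ignores names, occurrences, and associations, and the short strings such as "cusip:124323" are hypothetical stand-ins for real PSI URIs. The third topic plays the role of a separately maintained mapping stating that two identifiers name the same subject.

```python
def merge_topics(topics):
    """Merge topics (given as sets of subject identifiers) that share any identifier.

    Returns a list of disjoint identifier sets, one per merged topic.
    """
    merged = []  # invariant: the sets in `merged` are pairwise disjoint
    for topic in topics:
        current = set(topic)
        untouched = []
        for existing in merged:
            if existing & current:  # a shared identifier means the same subject
                current |= existing
            else:
                untouched.append(existing)
        untouched.append(current)
        merged = untouched
    return merged


topics = [
    {"cusip:124323"},                     # topic from one data feed
    {"nyse-ticker:IBM"},                  # topic from another feed
    {"cusip:124323", "nyse-ticker:IBM"},  # mapping topic: both identify one subject
    {"nasdaq-ticker:IBM"},                # no mapping asserted, so it stays separate
]
print(sorted(sorted(t) for t in merge_topics(topics)))
# [['cusip:124323', 'nyse-ticker:IBM'], ['nasdaq-ticker:IBM']]
```

The mapping lives in its own topic rather than in either feed's import code, and identifiers with no mapping simply stay unmerged, which is the "sometimes they match and sometimes they don't" situation.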
PSIs (and actually I'm talking about merging and subject identity generally here) really suck because:
- In one context we want to merge the meaning of two PSIs. For example, we might decide that the tickers IBM.1 and IBM.2 (assuming a stock split) really refer to the same company, so in the context of talking about who the CEO is they should be merged. In another context, however, such as making market assertions about the price-to-earnings ratio of 'IBM the company' (because I'm afraid that's how people think about it), we need those two PSIs to represent two different subjects. It's kind of annoying how, with TMs, things are either merged or not. (I realise we could keep a bunch of separate topic maps and merge or separate them depending on the current discourse, but in practice the cost of the merge operation makes that seem like a bridge too far, and you end up resorting to associations between topics that basically say "this is the same company as that, in a certain context".)
- Similar problems arise with companies and their subsidiaries, which you sometimes want to analyse 'together' and sometimes 'apart'.
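The workaround of associations meaning "same company, in a certain context" can be sketched as computing equivalence classes per context instead of merging once and for all. This is a hypothetical Python illustration, not a Topic Maps API: each `same_as` assertion carries a set of context names (loosely analogous to scoping the association), and a union-find groups identifiers using only the assertions that apply to the requested context. The context names and the identifiers "ParentCo" and "SubCo" are made up.

```python
from collections import defaultdict


def equivalence_classes(identifiers, same_as, context):
    """Group identifiers that count as the same company in `context`.

    same_as: iterable of (id_a, id_b, contexts) tuples, where `contexts` is the
    set of context names in which id_a and id_b denote the same subject.
    """
    parent = {i: i for i in identifiers}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Apply only the equivalences asserted for the requested context.
    for a, b, contexts in same_as:
        if context in contexts:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for i in list(parent):
        groups[find(i)].add(i)
    return sorted(sorted(g) for g in groups.values())


ids = ["IBM.1", "IBM.2", "ParentCo", "SubCo"]
same_as = [
    ("IBM.1", "IBM.2", {"management"}),  # one company when asking who the CEO is
    ("ParentCo", "SubCo", {"group"}),    # analysed together only at group level
]
print(equivalence_classes(ids, same_as, "management"))
# [['IBM.1', 'IBM.2'], ['ParentCo'], ['SubCo']]
print(equivalence_classes(ids, same_as, "valuation"))
# [['IBM.1'], ['IBM.2'], ['ParentCo'], ['SubCo']]
```

Because the grouping is recomputed for each query, nothing has to be un-merged when the context changes; the trade-off is that every lookup pays for the grouping instead of paying once at merge time.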
---
So yeah... and of course that's just companies, which you would think would be very tractable. Don't even get me started on things like industry classifications. (I should be glad I'm not a librarian, I think.)
==
Hopefully this is a helpful addition to the discourse.
Miles Thompson
sentient percipient