Registry schema cleanup

Ian Wadham iandw.au at gmail.com
Tue Nov 18 15:31:40 PST 2014


Hi guys,

On 19/11/2014, at 9:34 AM, Joshua Root wrote:
> On 2014-11-19 06:48 , Clemens Lang wrote:
>> I'd like to clean up the database schema of the registry. We have a couple
>> of fields in there that are currently unused, as well as a few possible
>> indices that could improve performance and at least one index that is
>> actively harmful.
>> 
>> In detail:
>> 
>> I'd like to drop the `url' column from the ports table. All ports I have
>> installed have this set to NULL and I couldn't find a use-case where it
>> was being used. I tried installing a port from a URL, but that doesn't
>> set the field either.
>> Even worse, the `url' column is part of a unique index `UNIQUE(url,
>> epoch, version, revision, variants)', which is bad, because if `url' is
>> always NULL, that means you can never install two ports with the same
>> tuple of (epoch, version, revision, variants). I'm surprised we haven't
>> hit this yet.
> 
> SQLite considers NULLs to be distinct in this case.
> <https://www.sqlite.org/nulls.html>

FWIW I think the operative phrase on that page is:
"-- I have about decided that SQL's treatment of NULLs is capricious and cannot be
deduced by logic.  It must be discovered by experiment."

For this kind of reason, fellow relational database designers/programmers and I have
always eschewed the use of nulls (wherever I have worked, since about the 80s).

You cannot know for sure what they are going to do nor how they are going to perform
resource-wise. An index with its leading term being null smells distinctly fishy to me - a
change in internal implementation could change the performance, perhaps radically.

It has always been the case, in my experience and AFAICR, that you can use a
prescribed default value, such as "", 0 or -1, in place of a null (missing/unknown value)
and thus get completely predictable actions and performance. And it is not often
(with apologies to the purists) that you really need to distinguish between null and
empty values. Even then there is usually a simple, practical way around the problem.

My 2c.

Cheers, Ian W.



More information about the macports-dev mailing list