Always use a Surrogate Primary Key
Many years ago, when I was still in possession of my hair, I entered a subcontract to develop a system to replace paper-based case files for a quango in South Africa.
The main contract was held by a large development shop that specialised in a related, but different field. They had given one of their systems analysts the task of analysing the system (duh!) but had no spare capacity to code it, so they farmed it out to me.
It was brilliant, I thought. Here’s a tight, clean spec, generated by someone who knows what he’s doing. There will be no messing around, I can just cut the code, take the money and run.
Can you hear the Jaws soundtrack yet?
Turns out the tight, clean spec was written by a guy I’d never meet or even get to speak to. And I had questions, because in those days I still believed in fourth normal form. I mean, I still do, but I now understand that 4NF is even less popular than my political inclinations, which are embraced by roughly 1 in 8,000,000,000 people.
The core of the system was the case table, everything else hung off the case table. And the case table had a composite primary key a mile long, that included an incident date (not the system date, a manually input date.)
At this point, even I could hear the Jaws soundtrack. I jumped on the phone to the main contractor and told him I thought this was a bad idea.
“What happens if they need to change the incident date? If they do that, all the dependent data will be orphaned,” I said.
“They never will. By the time a case is opened, the incident will have been checked with the police, two different lawyers, hospitals and doctors. That date is cast in stone.”
“OK, but what about human error?”
“That will be reviewed and corrected immediately. Look, just write it like this, I haven’t got time to debate it.”
“OK.”
Three months later, the system was complete, the training was done, documentation delivered, and I was mooching around the client’s office, flirting with one of the staff and ostensibly delivering the first day of “go-live hypercare”.
There was another lady in the office who was recognised as the guru in how the manual system worked, and she called me over to her desk.
“I have a problem: I changed the incident date on this claim, and all the information related to the case disappeared!”
lolwut?
“Why did you change that, I thought it was cast in stone, and had been checked by seven different parties before being recorded?”
“That’s as may be,” she said, pointing at a stack of about a dozen thick files, “but this pile of cases here are today’s cases that need their incident date changing. There are about this many every day.”
So, the moral of the story is, you never know what will happen in the future, don’t use referenced composite primary keys because you can orphan dependent data.
And even if you don’t have dependent data, you never know what will happen in the future, you may add dependent data, so don’t use composite primary keys.
TL;DR Always inject a surrogate primary key. Use secondary indexes for access.
Post Script: I subsequently found out from the resident expert that she’d never been asked for any input.