@jespah,
Against the wails and gnashing of teeth from the admin office, i obtained permission to redo the entire name/address database so that every word was a separate field in the db. That was when i got the services of the student workers, who were now actually doing useful work. Previously they had been folding fliers and stuffing envelopes, something which volunteers could do (most of them little old ladies for whom, i suspect, this was their best opportunity to get out of the house). Just in re-doing the data base to separate everything into individual fields, we dropped the address list from more than 85,000 entries to 73,000.
When the Director came to me with the notorious letter, i was able to tell him that we had already eliminated almost 12,000 duplications. It was then that he gave me exclusive use of the student workers' time. If you wanted to employ one of the student workers, you had to ask me, and i required that you justify the use of their time.
Managing a data base is not something where the tech geeks can really help you, other than to assure that you know how to use a db. Our first step was to make an alphabetic listing of all street type fields. So, if one of the fields read "Av.e" rather than "Ave.," it popped right up. You could then give it a global command to convert all entries reading "Av.e" to "Ave." Then i gave the db a global command to delete all periods in the street type field (it took me one go-through with "Ave." to realize the utility of that). We then took on "street" and its abbreviations, "circle" and its abbreviations, etc.
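For anyone trying the same clean-up today, the street type pass could be sketched roughly as below. This is only an illustration in Python, not the db commands used at the time; the sample rows, the field name `street_type` and the helper names are all made up for the example.

```python
from collections import Counter

# Hypothetical sample rows; the field names are assumptions for illustration.
rows = [
    {"number": "12", "street": "Elm", "street_type": "Ave."},
    {"number": "12", "street": "Elm", "street_type": "Av.e"},
    {"number": "40", "street": "Oak", "street_type": "Ave"},
    {"number": "7", "street": "Pine", "street_type": "St."},
]

def street_type_listing(rows):
    """Alphabetic listing of every distinct street type, with a count of each."""
    counts = Counter(r["street_type"] for r in rows)
    return sorted(counts.items())

def global_replace(rows, old, new):
    """Convert every street type reading exactly `old` to `new`."""
    for r in rows:
        if r["street_type"] == old:
            r["street_type"] = new

def strip_periods(rows):
    """Delete all periods in the street type field."""
    for r in rows:
        r["street_type"] = r["street_type"].replace(".", "")

listing_before = street_type_listing(rows)   # oddities like "Av.e" show up here
global_replace(rows, "Av.e", "Ave.")
strip_periods(rows)
listing_after = street_type_listing(rows)
print(listing_before)
print(listing_after)
```

Stripping the periods in one pass is what makes the separate "Av.e" to "Ave." fix almost unnecessary, which is the point made above about realizing its utility after one go-through.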
But you couldn't make field entries in the master db unique. So, we then exported all the address field information to a "dump" db (one i would erase when it was no longer needed) in which the field information ("number," "street," "street type," "city" and "zip code") would be combined in a single field, which had to be unique. Then the work gets tedious. Having alphabetized that list, we then identified apartment buildings and condos using the Haines and the Criss-Cross guides, and either added the correct apartment number if we knew it, or arbitrarily assigned one in a special field which was not exported to the printing list for addresses. (So, if you lived in Apartment 15 of the building, that would get exported to the printing list--if we didn't know your apartment or condo number, that would go in a special number field which did not get exported to the printing list.)
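The "dump" db trick, combining the address fields into one field and requiring that field to be unique, can be approximated with a throwaway SQLite table. This is a sketch under assumptions: the table name `address_dump`, the column name `combined` and the sample addresses are invented, and the original software was certainly not SQLite.

```python
import sqlite3

# Hypothetical address rows: number, street, street type, city, zip code.
addresses = [
    ("12", "Elm", "Ave", "Springfield", "12345"),
    ("12", "Elm", "Ave", "Springfield", "12345"),  # exact duplicate
    ("40", "Oak", "St", "Springfield", "12345"),
]

# An in-memory database stands in for the temporary "dump" db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_dump (combined TEXT UNIQUE)")

rejected = []
for parts in addresses:
    combined = " ".join(parts)  # all address fields in a single field
    try:
        conn.execute("INSERT INTO address_dump (combined) VALUES (?)", (combined,))
    except sqlite3.IntegrityError:
        # The UNIQUE constraint refuses a second copy of the same address.
        rejected.append(combined)

# Alphabetized list of the addresses that survived.
kept = [row[0] for row in conn.execute(
    "SELECT combined FROM address_dump ORDER BY combined")]
print(kept)
print(rejected)
```

The constraint only catches exact matches, which is why the street type clean-up has to come first and why apartment numbers still need the manual pass described above.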
You also obviously can't make the name fields unique, so, once again, we exported the name fields to a new dump db, where they were combined with the address list, and required those fields to be unique--so more than one Jespah Hamster would get eliminated after the address list had been cleaned up. Then the tedious work would begin, where we would look for things like Jespah Hamtser, or Jepsah Hamster, etc. There were always two student workers in the office with me, and sometimes three. We used eight IBM PC XTs, and the master lists were on my (then newest and most powerful) IBM PC AT. It took us just over five months to overhaul the address list, and we reduced it from just over 85,000 entries to just under 54,000 entries.
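The hunt for near-misses like "Jespah Hamtser" was done by eye, but today a first pass could be handed to a similarity score. The sketch below uses Python's standard `difflib`; this is a swapped-in technique, not what was done at the time, and the 0.8 threshold and the sample names are arbitrary assumptions. It only produces a list of suspects for a person to check.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical names; "Mary Smith" is included as an unrelated entry.
names = ["Jespah Hamster", "Jespah Hamtser", "Jepsah Hamster", "Mary Smith"]

def similar_pairs(names, threshold=0.8):
    """Return pairs of names whose similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs

suspects = similar_pairs(names)
for a, b, score in suspects:
    print(a, "<->", b, score)
```

Comparing every pair gets slow on a list of tens of thousands of names, so in practice one would probably compare only names that share an address.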
The Director's secretary and her "extra help" student workers (kind of a cross between a temp employee and a student worker) were really pissed when they were required to submit their work to the master db, but it kept us from recreating the duplications. They really hated it that they would get beeped at by the computer if they typed "Ave." (because of the period), but it was just one more step in preventing duplications.
Shortly thereafter i transferred to the family shelter--and man was i ever glad to get out of that office. I never want to run a data base again in my life.