Cleaning up your Contacts with Contact Cleaner

Posted in Open Source, Zimbra Web Client by John Holder on February 8th, 2008

Anyone who’s had an iPhone, BlackBerry, or Windows Mobile phone knows that, depending on how many accounts you have with different e-mail providers, you can end up with 5 different contact entries for one person. Raja, an engineer here at Zimbra, wrote this cool Contact Cleaner Zimlet, which is in Zimbra 5.0. I asked him to write up a blog post and make a video. Enjoy!


What is the 'Contact Cleaner' Zimlet?


It's a Zimlet that deletes and merges duplicate contacts. It goes beyond identifying contacts whose fields all match (duplicates with a perfect match): it is smart enough to identify "most-likely" duplicates, where most of the fields (firstname, lastname, email, etc.) are the same but a few differ.

Why might you end up with duplicate contacts?
As you know, Zimbra's address book can sync contacts from various sources, such as BlackBerry, iSync, and Outlook. That flexibility has a downside: if any of these syncs breaks and fails to recognize pre-existing contacts, you can end up with duplicate contacts.
This kind of failure usually happens when you upgrade one of the pieces of software that take part in syncing, such as BlackBerry, the Zimbra Outlook connector, Outlook itself, iSyncConnector, MacOS, or the Zimbra server.
Apart from this, you might manually import a .csv file from a colleague or friend, or drag and drop contacts from shared contacts, and end up with duplicates.
Finally, since virtually everyone uses multiple email addresses, the same person often ends up spread across several contact entries.


How does the Zimlet solve the problem?

This Zimlet scans the address book for contacts that are duplicates, or most probably duplicates, of another contact. It then classifies all such contacts into 3 broad categories: duplicates with a perfect match (actual duplicates), duplicates with a partial match (most-likely duplicates), and duplicates with conflicts (50/50 chance that it's a duplicate).
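To make the three categories concrete, here is a minimal Python sketch of how a pair of contacts could be sorted into them. It is not the Zimlet's actual code. The names `classify` and `KEY_FIELDS`, the rule that first name, last name, and primary email must agree before two contacts are even compared, and the exact (case-sensitive) string comparison are all assumptions made for illustration.

```python
from typing import Dict

Contact = Dict[str, str]

# Hypothetical rule: two contacts are only candidates if these fields agree.
KEY_FIELDS = ("firstname", "lastname", "email")


def _value(contact: Contact, field: str) -> str:
    """Return a field's value without surrounding whitespace ('' if absent)."""
    return contact.get(field, "").strip()


def classify(a: Contact, b: Contact) -> str:
    """Classify a pair as 'perfect', 'partial', 'conflict', or 'distinct'."""
    if any(_value(a, f) != _value(b, f) for f in KEY_FIELDS):
        return "distinct"
    fields = set(a) | set(b)
    differing = [f for f in fields if _value(a, f) != _value(b, f)]
    if not differing:
        return "perfect"
    # A difference is a conflict only when both sides hold a non-empty value.
    if any(_value(a, f) and _value(b, f) for f in differing):
        return "conflict"
    # Otherwise one side is simply missing information the other side has.
    return "partial"
```

An empty field and a missing field are treated the same way here, so a contact that merely lacks `email3` does not count as conflicting with one that has it.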

1. Duplicates with a perfect match (the simplest form; most duplicates usually fall into this category)

These are the duplicates where every field matches. Obviously, these are safe to delete. In this case, Contact Cleaner simply keeps one of the duplicates and moves all the others to Trash.

E.g.

Duplicate1:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com

Duplicate2:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com

Merged or Resulting:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
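
The perfect-match case can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Zimlet itself is written: `fingerprint` and `split_perfect_duplicates` are hypothetical names, and the sketch assumes that empty fields are ignored and that the first contact seen in each group is the one that is kept.

```python
from typing import Dict, List, Tuple

Contact = Dict[str, str]


def fingerprint(contact: Contact) -> Tuple[Tuple[str, str], ...]:
    """Build a hashable key from every non-empty field, ignoring field order."""
    return tuple(sorted((k, v.strip()) for k, v in contact.items() if v.strip()))


def split_perfect_duplicates(
    contacts: List[Contact],
) -> Tuple[List[Contact], List[Contact]]:
    """Return (kept, trashed); the first contact of each identical group is kept."""
    seen = set()
    kept: List[Contact] = []
    trashed: List[Contact] = []
    for contact in contacts:
        key = fingerprint(contact)
        if key in seen:
            trashed.append(contact)  # an exact copy was already kept
        else:
            seen.add(key)
            kept.append(contact)
    return kept, trashed
```

Returning the trashed contacts instead of discarding them mirrors the behaviour described above, where duplicates are moved to Trash rather than deleted outright.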

2. Duplicates with a partial match: most-likely duplicates (not all fields match)

These are the duplicates where one of the contacts has some extra information (for example, an email2 value that the other lacks). These are usually safe to merge, and the merged or resulting contact will be a superset of both duplicates.
E.g.

Duplicate1:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Email3:

Duplicate2:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2:
Email3: john.doe@foo.com

Merged or Resulting:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Email3: john.doe@foo.com
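
A superset merge like the one above could look roughly like this in Python. Again, this is a sketch under assumptions rather than the Zimlet's real implementation: `merge_partial` is a hypothetical name, empty strings are treated as "no value", and the function refuses to merge when both contacts hold different non-empty values for the same field.

```python
from typing import Dict

Contact = Dict[str, str]


def merge_partial(a: Contact, b: Contact) -> Contact:
    """Merge two partial-match duplicates into a superset of both."""
    merged: Contact = {}
    # Walk a's fields first, then any fields that only b has.
    for field in list(a) + [f for f in b if f not in a]:
        va = a.get(field, "").strip()
        vb = b.get(field, "").strip()
        if va and vb and va != vb:
            # Both sides disagree: this is a conflict, not a partial match.
            raise ValueError(f"conflict in field {field!r}")
        merged[field] = va or vb
    return merged


duplicate1 = {"firstname": "John", "lastname": "Doe",
              "email": "john@foo.com", "email2": "John@joe.com", "email3": ""}
duplicate2 = {"firstname": "John", "lastname": "Doe",
              "email": "john@foo.com", "email2": "", "email3": "john.doe@foo.com"}
merged = merge_partial(duplicate1, duplicate2)
```

With the two example contacts, `merged` ends up carrying both `email2` from the first duplicate and `email3` from the second.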

3. Duplicates with conflicts (50/50 chance that it's a duplicate; automatic merging would lose data, so these need the user's attention):

These are contacts where the duplicates have different values in the same field (although they had enough matching fields to be considered duplicates).
E.g. you have one contact with 4 fields (firstname, lastname, email, email2) and another contact with the same 4 fields (firstname, lastname, email, email2).
If email2 has two different values in this case, the contacts become duplicates with conflicts.

E.g. duplicates with 3 conflicting fields (email2, phone, and city)

Duplicate1:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: John@joe.com
Phone: 650-123-4567
City: San Mateo

Duplicate2:
Firstname: John
Lastname: Doe
Email: john@foo.com
Email2: john.doe@foo.com
Phone: 888-888-8888
City: Sunnyvale

… in this case, you have 3 options to fix the duplicates:

OPTION 3.1: Ignore merging:

You might have two people with the same first name and last name, or you might decide not to merge for some other reason. Selecting this option skips the merge.

OPTION 3.2 (Automatic): Add conflicting email info to the 'email2' and 'email3' fields and the rest of the conflicts to the 'Notes' section:

In this case, the Zimlet treats one of the duplicate contacts' conflicting info as the original and automatically adds all the other conflicting info (say, a conflicting phone number, a conflicting city, etc.) to the Notes section.

Secondly, special consideration is given to conflicting email fields: the Zimlet tries to squeeze the conflicting emails into an empty email field (say, the email3 field). if we have email1@foo.com and email2@foo.com
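
As a rough illustration of Option 3.2, the Python sketch below keeps the first contact's values as the original, moves conflicting email addresses into empty email fields, and appends every other conflicting value to the notes. The function name `merge_with_conflicts`, the `field: value` format used in the notes, and the fallback of writing an email to the notes when no email field is free are assumptions made for this example, not documented Zimlet behaviour.

```python
from typing import Dict, List

Contact = Dict[str, str]

EMAIL_FIELDS = ("email", "email2", "email3")


def merge_with_conflicts(original: Contact, other: Contact) -> Contact:
    """Merge `other` into `original`, keeping original values on conflict."""
    merged = {field: value.strip() for field, value in original.items()}
    leftovers: List[str] = []  # conflicting values that found no free field

    # Emails: each new address goes into the first empty email field.
    known = {merged.get(f, "") for f in EMAIL_FIELDS}
    for field in EMAIL_FIELDS:
        address = other.get(field, "").strip()
        if not address or address in known:
            continue
        free = next((f for f in EMAIL_FIELDS if not merged.get(f, "")), None)
        if free is None:
            leftovers.append(f"{field}: {address}")  # assumed fallback
        else:
            merged[free] = address
        known.add(address)

    # Other fields: fill gaps, and record conflicting values for the notes.
    for field, value in other.items():
        value = value.strip()
        if field in EMAIL_FIELDS or not value:
            continue
        if not merged.get(field, ""):
            merged[field] = value
        elif merged[field] != value:
            leftovers.append(f"{field}: {value}")

    if leftovers:
        existing = merged.get("notes", "")
        merged["notes"] = "\n".join(([existing] if existing else []) + leftovers)
    return merged
```

Applied to the two conflicting contacts from the example in section 3, with Duplicate1 as the original, this sketch would place john.doe@foo.com in the empty email3 field and record the second phone number and city in the notes.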

