Testing Senzing's Entity Resolution Workbench
Jeff left IBM in 2016 to start a new venture called Senzing. Senzing has built the first real-time AI software product for Entity Resolution (ER), a space in which Jeff is the world's #1 expert. Senzing's new offering has huge implications in the post-GDPR world and has the potential to increase trust in blockchain networks. Jeff recently gave a keynote at the IBM Think conference where he described what Senzing does and its potential applications (including as part of IBM Blockchain). I strongly recommend watching it.
When I spoke with Jeff yesterday, he asked that I give Senzing's ER workbench a try and provide feedback. So that is what I did earlier today. Here are my first impressions.
First impressions and questions for Jeff
- Currently Senzing only runs on Windows. When will it be offered on other operating systems (especially macOS)?
- Why do I need to download the workbench? Could there be a cloud-based version instead?
- I found the workbench very easy to use. The instructions were clear and the steps to get from start to finish were intuitive.
- I uploaded a CSV file of all my Google contacts. I could not believe I had 2,451 contacts in my Google contact list! Clearly I have a lot of spring cleaning to do.
- The CSV file upload process was straightforward and quick. On that point though, the workbench currently only works with CSV files. Any plans to directly connect to other data sources?
- The ER process is very quick. After uploading your data, ER is a one-click process. Very cool.
- The user interface could use an upgrade.
Results of the ER process
- The workbench identified 32 duplicates, 4 possible duplicates and 6 possibly related entities.
- The results had a lot more detail than Google Contacts' duplicates function provides.
- Interestingly, of the 6 possibly related entities, entities 5 and 6 both related to my wife. I was a little surprised that the workbench did not merge the two and give me 5 possibly related entities instead of 6.
- Apart from this, it was really interesting to see how the workbench linked different entities.
Single Search Function
- The Single Search Function (SSF) is very cool. I only tried it with the name field since it was the most intuitive one for me to try it with.
- One potential bug(?) I noticed is that you have to type the full name of a contact for the SSF to work correctly. Partial-name searches (just a first or last name) came back with "0 results found".
- Also, I wish there was an option to merge the various contacts. While this may not be the focus of the workbench, sometimes it is useful if you want to clean up an address book. For example, in Google contacts, after it displays the duplicates, it gives you the option of merging all contacts. That gives the process a logical end point IMHO.
Compared to Google Contacts' duplicates function
- It is probably unfair to compare Senzing's workbench to Google Contacts' duplicates function, but I did it, so I might as well write about it.
- Google Contacts identified 8 duplicates (Senzing identified 32 duplicates, 4 possible duplicates and 6 possibly related entities).
- Google's results were not nearly as sophisticated as those from Senzing in terms of the information provided.
- Also, Google got several duplicates wrong. Some were clearly not related. For example, for one contact, I had an old phone number and a new phone number saved. Even though the person who now owned the "old" phone number was clearly different from my friend (based on a Google update they had posted about where they were and a new photograph), Google suggested they might be the same person. Senzing did not.
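To make the phone number example above concrete, here is a minimal Python sketch of why matching on a single shared attribute can produce a false duplicate. It is not how Google Contacts or Senzing actually work; the `Contact` class, both matching rules, and the sample names and numbers are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    # Hypothetical, simplified contact record (illustration only).
    name: str
    phones: frozenset


def normalize_phone(raw):
    """Keep digits only so formatting differences do not block a match."""
    return "".join(ch for ch in raw if ch.isdigit())


def make_contact(name, *phones):
    return Contact(name, frozenset(normalize_phone(p) for p in phones))


def shares_phone(a, b):
    return bool(a.phones & b.phones)


def name_tokens(name):
    return set(name.lower().split())


def phone_only_match(a, b):
    # Naive rule (an assumption, not Google's real logic):
    # any shared phone number counts as a duplicate.
    return shares_phone(a, b)


def corroborated_match(a, b):
    # Stricter rule (an assumption, not Senzing's real logic):
    # a shared phone number must be backed by at least one shared name token.
    return shares_phone(a, b) and bool(name_tokens(a.name) & name_tokens(b.name))


# My friend, saved with both an old and a new number (made-up data).
friend = make_contact("Alex Rivera", "+1 (555) 010-0199", "+1 (555) 010-0142")
# The stranger who now owns the old number.
new_owner = make_contact("Sam Patel", "15550100199")
# A genuine second entry for the same friend.
friend_again = make_contact("Alex R.", "1 555 010 0142")

print(phone_only_match(friend, new_owner))       # True: false positive
print(corroborated_match(friend, new_owner))     # False: no name overlap
print(corroborated_match(friend, friend_again))  # True: phone and name agree
```

A real entity resolution engine weighs far more evidence than this, but the sketch shows why a recycled phone number on its own is weak proof that two records are the same person.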