Monday, July 26, 2010

On the role of intelligent classification

Last week, as I savored the opportunity to deliver a grueling day-long sales training session, a colleague sent me an interesting article about the future of automated document review. I've discussed automation here before, but in terms of email classification and capture rather than litigation review. Fundamentally, though, the question at hand is the same: can computer-based artificial intelligence make an accurate "enough" decision about the importance and relevance of a given document?

It's worth appreciating that, like document review, email classification and capture is not really about achieving 100% accuracy. Unless everything is saved (which, as we learned last week, remains wholly unnecessary), there will always be some gap between what was retained or produced and a complete collection of every relevant piece of information. Rather, what is ultimately most critical is the capacity to demonstrate good faith and honest effort in doing what is deemed appropriate.

The article considers a simple study that pitted two groups of human reviewers against an array of automated document review technologies and compared the results. In this particular exercise, the human and computer review "teams" were to consider a set of 5,000 documents from a historical legal matter and try their best to make decisions consistent with those of the original legal review team. Interestingly, both human and machine achieved about 75% accuracy in terms of making the "correct" decision, with the automated tools faring slightly better. The author offers some worthwhile thoughts on factors that may well have affected the human reviewers' capacity, but that is not the element in which I am especially interested.

What interests me is the article's closing suggestion that, instead of replacing human reviewers, automated tools will perhaps increasingly adopt a supplementary position, providing an overlay that helps to reduce the total number of attorneys required for at least first-pass review. I've long taken the same stance on intelligent classification capabilities as an element of an email management solution. Studies like this demonstrate that it is in fact plausible for artificial intelligence to make reasonably accurate decisions about what information is necessary to keep. I don't think that premise has ever really been in question (at least, not for those who are careful not to underestimate how quickly technology can evolve and improve), but let's remember that the goal of an email management system is twofold. It is not just about capturing and retaining business-relevant content. Even more so, it is about disposing of transitory, non-business-relevant content in a timely manner.

When we consider this goal, the importance of human involvement becomes more apparent. Despite being a liability in terms of corporate litigation risk, email is still fundamentally the lifeblood of organizational productivity, and users need this information to perform their jobs effectively. Furthermore, if an automated tool is making determinations not just about what to keep but also about what to dispose of, the question of whether 75% accuracy is "good enough" becomes much grimmer. When an automated tool inevitably decides that some valuable email is not valuable and automatically purges it from a user's mailbox, the ensuing anger and frustration on the part of the mailbox owner should not come as much of a surprise. And of course, at that moment it's inevitable that the user will do two things: 1) begin plotting countermeasures to circumvent the system so that this does not happen again and 2) warn his or her peers that the same thing could happen to them at any moment, stirring up unrest.
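To put rough numbers on why 75% feels so different for disposal than for retention, here is a back-of-the-envelope sketch in Python. The mailbox size and the share of valuable messages are made-up figures for illustration, and the sketch assumes the 75% accuracy applies uniformly to every message, which the study itself does not claim.

```python
def expected_wrongful_purges(total_messages: int,
                             valuable_share: float,
                             accuracy: float) -> float:
    """Expected number of valuable messages an auto-disposal tool would purge.

    Assumes (hypothetically) that the tool misjudges each valuable
    message independently with probability (1 - accuracy).
    """
    valuable = total_messages * valuable_share
    return valuable * (1.0 - accuracy)


# Hypothetical mailbox: 1,000 messages, 20% of them genuinely valuable.
lost = expected_wrongful_purges(1000, 0.20, 0.75)
print(f"Valuable messages wrongly purged: {lost:.0f}")
```

Under those assumed figures, a quarter of the valuable messages would be gone, and it only takes one of them to turn a user against the system.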

Thus we reach the same conclusion as the author of this article: rather than outright replacing their human counterparts, automated document review tools should be considered for their ability to overlay the process and make life both easier and more efficient. Looking to the future of email management technologies, I remain dubious about the viability of fully automated, intelligent classification capabilities. This skepticism is rooted not so much in doubts that the technology will "work" as in appreciating that, unless machines also somehow replace users as the creators and consumers of information, it's unimaginable that a system can truly know what one needs to do one's job effectively at a given point in time. In that regard, it's kind of like appreciating time as a fourth dimension. While the meaning of information may remain largely static, its importance as a business artifact may change markedly over the course of weeks. I may know that what is innocuous at face value today may become hugely important to me tomorrow, based on knowledge that lives outside the confines of the literal letter of a discrete email message.

Rather, I like to think in terms of "suggested classification," where intelligent, analytics-based tools are employed to consider messages as they accumulate within the mailbox, but users fundamentally remain in control. They consider recommendations from the system that make manual decision-making easier, yet ultimately retain the power to determine what is and is not of value.
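For the technically inclined, the idea can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the keyword matching stands in for whatever analytics a real product would use, and every name here (`Suggestion`, `suggest`, `final_decision`) is hypothetical rather than taken from any actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Suggestion:
    label: str         # "keep" or "dispose"
    confidence: float  # 0.0 to 1.0, a made-up score for illustration


def suggest(subject: str) -> Suggestion:
    """Toy stand-in for an analytics engine: simple keyword matching."""
    business_terms = ("contract", "invoice", "proposal")
    if any(term in subject.lower() for term in business_terms):
        return Suggestion("keep", 0.9)
    return Suggestion("dispose", 0.6)


def final_decision(suggestion: Suggestion,
                   user_choice: Optional[str] = None) -> str:
    """The user's choice always wins; without one, nothing is purged."""
    if user_choice is not None:
        return user_choice
    # A "dispose" suggestion is only a recommendation awaiting the user.
    return "keep" if suggestion.label == "keep" else "pending review"


print(final_decision(suggest("Signed contract attached")))
print(final_decision(suggest("Lunch on Friday?")))
print(final_decision(suggest("Lunch on Friday?"), user_choice="keep"))
```

The design choice that matters is in `final_decision`: the system may recommend disposal, but it never acts on that recommendation until the mailbox owner weighs in.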
