Taking the Cost and Risk Out of the Inbox: July 2010

Monday, July 26, 2010

On the role of intelligent classification

Last week, as I savored the opportunity to deliver a grueling day-long sales training session, a colleague sent me an interesting article about the future of automated document review. I've discussed the notion of automation here before, though in the context of email classification and capture rather than litigation review. Fundamentally, however, the question at hand is the same: can computer-based artificial intelligence make an accurate "enough" decision about the importance and relevance of a given document?

It's worth appreciating that, like document review, email classification and capture is not really about achieving 100% accuracy. Unless everything is saved (which, as we learned last week, remains wholly unnecessary), there will always be some discrepancy between what was retained or produced and the complete collection of every relevant piece of information. What ultimately matters most is the ability to demonstrate good faith and an honest effort to do what is deemed appropriate.

The article considers a simple study that pitted two groups of human reviewers against an array of automated document review technologies and compared the results. In this exercise, the human and computer review "teams" each considered a set of 5,000 documents from a historical legal matter and tried their best to make decisions consistent with those of the original legal review team. Interestingly, both human and machine achieved roughly 75% accuracy in making the "correct" decision, with the automated tools faring slightly better. The author offers some worthwhile thoughts on factors that likely affected the human reviewers' performance, but that is not the element that interests me most.

Instead, the article concludes by suggesting that, rather than replacing human reviewers, automated tools will increasingly play a supplementary role, providing an overlay that reduces the total number of attorneys required for at least first-pass review. I've long taken the same stance on intelligent classification as an element of an email management solution. Studies like this demonstrate that it is in fact plausible for artificial intelligence to make reasonably accurate decisions about what information needs to be kept. I don't think that premise has ever really been in question (at least not for those careful not to underestimate how quickly technology can evolve and improve), but let's remember that the goal of an email management system is twofold. It is not just about capturing and retaining business-relevant content; even more so, it is about disposing of transitory, non-business-relevant content in a timely manner.

When we consider this goal, the importance of human involvement becomes more apparent. Despite being a liability in terms of corporate litigation risk, email is still the lifeblood of organizational productivity, and users need this information to do their jobs effectively. Furthermore, if an automated tool is determining not just what to keep but also what to dispose of, the question of whether 75% accuracy is "good enough" becomes much grimmer. When an automated tool inevitably decides that a valuable email is not valuable and automatically purges it from a user's mailbox, the ensuing anger and frustration of the mailbox owner should come as little surprise. And at that moment, the user will almost certainly do two things: 1) begin plotting countermeasures to circumvent the system so that this never happens again, and 2) warn his or her peers that the same thing could happen to them at any moment, stirring up unrest.

Thus I reach the same conclusion as the article's author: rather than replacing their human counterparts outright, automated document review tools should be valued for their ability to overlay the process and make it both easier and more efficient. Looking to the future of email management technologies, I remain dubious about the viability of fully automated, intelligent classification. This skepticism is rooted not so much in doubt that the technology will "work" as in the recognition that, unless machines somehow also replace users as the creators and consumers of information, it's unimaginable that a system could truly know what one needs to do one's job effectively at a given point in time. In that regard, it's a bit like treating time as a fourth dimension. While the meaning of information may remain largely static, its importance as a business artifact can change markedly over the course of weeks. I may know that something innocuous at face value today will become hugely important to me tomorrow, based on knowledge that lives outside the literal text of a discrete email message.

Instead, I like to think in terms of "suggested classification", where intelligent, analytics-based tools evaluate messages as they accumulate in the mailbox, but users remain fundamentally in control. Users consider the system's recommendations, which make manual decision-making easier, yet ultimately retain the power to determine what is and is not of value.
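To make the idea concrete, here is a minimal Python sketch of suggested classification. It assumes a toy keyword table standing in for a real analytics engine; the categories, keywords, and function names are hypothetical and not drawn from any product. The key property is that the system only recommends: the user's decision is final, and nothing is filed or deleted automatically.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword -> category rules standing in for an analytics engine.
KEYWORD_RULES = {
    "contract": "Legal/Contracts",
    "invoice": "Finance/Invoices",
    "project": "Projects",
}


@dataclass
class Suggestion:
    category: Optional[str]  # None means the system has no recommendation
    reason: str              # shown to the user so the hint is explainable


def suggest_category(subject: str, body: str) -> Suggestion:
    """Recommend a category; this function never files or deletes anything."""
    text = f"{subject} {body}".lower()
    for keyword, category in KEYWORD_RULES.items():
        if keyword in text:
            return Suggestion(category, f"matched keyword '{keyword}'")
    return Suggestion(None, "no rule matched")


def final_category(suggestion: Suggestion, accepted: bool,
                   user_choice: Optional[str] = None) -> Optional[str]:
    """The user decides: accept the hint, pick another category, or leave it unclassified (None)."""
    return suggestion.category if accepted else user_choice


hint = suggest_category("Signed contract attached", "Please countersign.")
print(hint.category, "-", hint.reason)
print(final_category(hint, accepted=False, user_choice="Projects"))
```

Because `final_category` takes the user's answer as input, a wrong suggestion costs the user a click rather than a lost message.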

Thursday, July 15, 2010

An intervention for compulsive hoarding

During most of my sessions, I spend a few minutes outlining the "guiding principles" that served as the blueprint for our Open Text Email Management product. In effect, these principles also describe the business change such a project seeks to enable. On one hand, email management is about identifying and retaining business value and business record email. But perhaps more important, it is also about automating and safeguarding the deletion of unimportant, low-value email, because this is ultimately the content that places the greatest monetary burden on the litigation budget, unnecessarily and markedly driving up the costs of identification, collection, preservation, and review.

Not surprisingly, then, an email management project must ultimately endeavor to dispose of anything deemed unnecessary to keep, but companies often struggle with the obvious consequence. What if we are accused of failing to preserve evidence? How can we possibly know whether something that appears mundane and irrelevant today could take on critical importance many years from now?

From the perspective of a records management solution provider, one of the most compelling aspects of the late 2006 amendments to the Federal Rules of Civil Procedure was Rule 37(e), the so-called Safe Harbor provision. In short, 37(e) provides that, absent exceptional circumstances, an organization should not be sanctioned for failing to provide electronically stored information lost as a result of the routine, good-faith operation of an electronic information system. I concluded a post earlier this week by reiterating the importance of acting in good faith, and, interestingly, today a case crossed my inbox that considers this very topic.

It seems that in this particular matter, Olson v. Sax, the defendant terminated an employee it believed to be stealing; in turn, the employee filed a wrongful termination lawsuit, claiming her dismissal was rife with discrimination. Her alleged misdeeds had reportedly been recorded on videotape back in July 2008, and, not surprisingly, the tape itself was requested during the ensuing discovery. Even less surprisingly, the defendant responded that the tape had since been erased, because its policy was to record over surveillance footage after a month. In the dispute that followed, the defendant pointed to Rule 37(e), arguing that 1) the data was not erased maliciously, and recording over surveillance footage after such a period was its standard practice (in fact, the practice was not only used consistently across all of the defendant’s retail stores but was also an accepted industry standard), and 2) it was unaware of any possible pending litigation until receiving a letter from the plaintiff’s counsel some seven months later, in February 2009.

Most interestingly, while the court held that the defendant should indeed have anticipated the possibility of litigation as far back as August 2008 (and thus had a duty to preserve the associated evidence), it also held that sanctions were not appropriate, given that there was no indication the defendant had acted in bad faith.

From my perspective, the primary benefit of 37(e) is that it not only protects organizations from undeserved penalties but also frees them from the burden of a retention policy driven purely by paranoia. Without this rule (and, of course, an ability to see into the future), organizations would have no choice but to save everything forever. This particular opinion is crucial, as it reiterates that organizations need not meticulously save every shred of paper or every kilobyte of data on the off chance that it may be relevant to some litigation in years to come. Rather, establishing a sound retention policy, enforcing it consistently, and ultimately acting with good intentions are fundamental safeguards against accusations of nefarious evidence destruction.

Wednesday, July 14, 2010

The contingency volume

An idea I’ve discussed throughout this blog, and in virtually every session I present, is that of "transitory" email: email with no long-term business relevance, which companies consequently endeavor to purge in a timely manner. Yesterday I considered the premise of the "exception driven model", which holds that email is transitory by default unless some overt action is taken to dictate otherwise. Such an approach is, of course, aggressive, and it may leave organizations concerned that their policy will appear too lax in ensuring that potentially discoverable messages are in fact retained.

Let's remember, however, that we must balance such a concern against the principle that no organization is mandated to save everything forever. At some point, information must be disposed of; what is important is that this disposal is performed in a consistent and responsible manner.

In recent months, I have noticed that companies looking to alleviate the concern that an exception-driven approach is too aggressive are interested in establishing, and relying on, what I am beginning to call a "contingency volume". The basic premise of the contingency volume is that it is designed to capture 100% of all inbound, outbound, and internal email within the organization's email environment. That email is then saved for a strict period of time, regardless of what other action is performed against it (that is, regardless of whether it is saved as a record or allowed to be deleted under the transitory item policy).

The contingency volume may exist in parallel with a "managed email volume". The retention period on this contingency archive may be, for instance, 13 months, which assures the organization that, no matter what, it has a complete picture of all email correspondence from the past 13 months, all of it readily accessible and available to preserve in the event of litigation. Meanwhile, messages contained in the contingency volume may also have been ingested into the managed email volume as formal corporate records, or they may have been deleted from user inboxes, either by outright user action or automatically.

In this example, once 13 months elapse, messages that have reached this age are purged from the contingency volume (unless, of course, they are on litigation hold). From that point forward, any messages that still exist in the environment are stored in the managed email volume. In all cases, these messages will also have an explicit, and often much more granular, retention policy enforced against them; for instance, content associated with a project may be retained for 3 years following the conclusion of that project. Fundamentally, the managed email volume is the long-term repository of record for corporate email; it provides a comprehensive view of everything that was deemed business relevant and appropriate to keep. The contingency volume, meanwhile, gives organizations a comfortable "fall back": the capacity to demonstrate that they acted in good faith and took extra, reasonable measures to ensure email retention, regardless of any other governing factor, be it user discretion or machine-based analysis.
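The interplay between the two volumes can be sketched in Python as a pair of independent date rules. This is a minimal illustration, assuming the 13-month contingency period and the 3-years-after-project rule from the example above; the function names are hypothetical, and a real system would take hold status from its own legal hold process.

```python
import calendar
from datetime import date

CONTINGENCY_MONTHS = 13  # example period from the text; set per policy


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day for shorter months (e.g. Jan 31 -> Feb 28)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def contingency_purge_due(received: date, today: date, on_legal_hold: bool) -> bool:
    """Purge from the contingency volume once the period has elapsed, unless on hold."""
    if on_legal_hold:
        return False
    return today >= add_months(received, CONTINGENCY_MONTHS)


def project_record_expiry(project_end: date, years_after: int = 3) -> date:
    """Managed-volume rule: keep project records N years after the project concludes."""
    return add_months(project_end, 12 * years_after)
```

The two rules never consult each other: a message deleted from a user's inbox after a week still sits in the contingency volume until its 13 months are up, while a project record filed in the managed volume can outlive its contingency copy by years.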

Tuesday, July 13, 2010

Exceptional retention strategies

I have the good fortune to spend a significant amount of time with customers and prospects discussing the fundamental philosophies around email archiving and retention – setting aside technology and considering the underlying principles that ultimately drive the most granular functional details. An interesting principle I have noticed emerge in recent months is what I characterize as an "exception driven model" for email classification.

First and foremost, it’s worth reiterating that some clients adopt a strategy of saving "generously", that is, capturing and retaining most, if not all, of the email communications that pass through the mail server. Of course, not every organization adopts such a cautious approach; many endeavor to capture and save only those messages that are business relevant or business records, at least during the normal course of business operations. For organizations such as these, a critical element of the strategy is determining which messages are actually worth saving, and then capturing and retaining them. The sheer volume of daily email often makes it impractical to consider each and every message for retention, particularly if user thought and action are part of the process.

The "exception driven model" is all about setting an expectation for how information is treated by default. Organizations that adopt an exception driven model assert that messages are transitory (that is, not appropriate for retention) by default, and that an explicit action dictating otherwise must occur before any message is saved. If a user-directed approach to classification is employed, it may be the act of assigning a message’s category that triggers its capture. If a more automated approach is taken, whatever criteria trigger autoclassification must first be met before the message is captured and retained. Otherwise, all content in a mailbox is transitory by default, will not be automatically archived, and will most likely be purged from the mailbox after some predetermined and relatively short period, such as 90 days.

This sort of exception driven model is particularly compelling in user-directed scenarios, as one cannot reasonably expect users to manually handle a hundred, or even several hundred, discrete items per day. Limiting classification and retention to exceptions only, supplemented by a simple decision tree that ultimately dictates actual relevance, is increasingly becoming a cornerstone of many of the email retention strategies I observe.

In short, such a strategy is characterized by the notion that emails are transitory by default and, without measured and overt action to dictate otherwise, will be purged from the mail environment in a timely manner. Of course, such a practice is only appropriate during the normal course of business operations. In any scenario where information is potentially discoverable or otherwise likely to receive greater scrutiny, it is crucial to suspend any such actions and implement a legal hold process. A company should not be expected to save everything forever, but at the end of the day, acting transparently, consistently, and with demonstrable good faith will go a long way toward determining how one’s retention policies are judged.
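A minimal Python sketch of the exception-driven model follows. It assumes a 90-day transitory window and uses a simple boolean flag to stand in for whatever autoclassification criteria an organization actually applies; the `Message` fields and the `disposition` function are hypothetical. An overt action (a user-assigned category or a matching rule) is the only path to retention, and a legal hold suspends purging entirely.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

TRANSITORY_DAYS = 90  # example default window from the text


@dataclass
class Message:
    received: date
    user_category: Optional[str] = None  # set when a user files the message
    matches_auto_rule: bool = False      # hypothetical autoclassification result
    on_legal_hold: bool = False


def disposition(msg: Message, today: date) -> str:
    """Exception-driven model: transitory unless an overt action says otherwise."""
    if msg.on_legal_hold:
        return "hold"             # normal-course purging is suspended
    if msg.user_category is not None or msg.matches_auto_rule:
        return "archive"          # the exception: capture and retain
    if today - msg.received >= timedelta(days=TRANSITORY_DAYS):
        return "purge"            # the default: transitory and past its window
    return "keep-in-mailbox"      # still within the transitory window
```

Checking the hold first mirrors the caveat that the whole practice applies only during the normal course of business.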

Monday, July 12, 2010

On email classification and flexibility

As the date of my last post makes obvious, I am not a blogger by nature, but I do appreciate the opportunity to share my thoughts and opinions on email archiving and retention. With some time having passed since my last post, my perspective has evolved somewhat, particularly with regard to email classification and, more specifically, its day-to-day implementation.

I’ve long been a proponent of user-directed classification, and I think the arguments of the customers I’ve spent so much time with (who in turn have refined my perspectives) still hold true. Custodial models of email classification typically rest on two fundamental tenets: that the creators and consumers of information are best equipped to make accurate decisions about its importance, and that the contextual, elusive nature of correspondence makes it doubtful that a purely machine-directed approach could ever be wholly sufficient.

At the same time, the argument of content volume is difficult to deny: who truly has time to consider every single message that crosses his or her inbox and make some measured decision about its relevance? Furthermore, some functional roles are clearly better suited to such a task than others. Knowledge workers like myself may find it second nature to manage information proactively, but other roles within a company are bound to have less inclination or opportunity.

Consequently, I suspect companies will increasingly look for flexibility and a middle ground in email classification, combining measured decision-making by users with the capacity to apply broad retention rules to information by default. Such broad rules will often be simplistic and based on reliable characteristics, such as the geographic location or functional role of the mailbox owner. This baseline approach to retention may be a necessity for certain roles within organizations, and companies may find that it combines effectively with a more granular, user-directed approach to email classification.
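One way to picture this middle ground is a lookup that starts from a baseline keyed on the mailbox owner's location and role, then lets a deliberate user classification extend it. The sketch below is hypothetical throughout: the regions, roles, categories, and retention periods are invented for illustration, and taking the longer of the two periods is just one possible design choice.

```python
from typing import Optional

# Hypothetical baseline rules keyed on simple, reliable mailbox attributes.
BASELINE_DAYS = {
    ("EU", "trader"): 5 * 365,
    ("US", "trader"): 3 * 365,
}
DEFAULT_DAYS = 90  # everyone else: transitory by default

# Hypothetical user-selectable categories with more granular retention.
CATEGORY_DAYS = {
    "Contracts": 7 * 365,
    "Projects": 3 * 365,
}


def retention_days(region: str, role: str, user_category: Optional[str] = None) -> int:
    """Combine a location/role baseline with an optional user-directed category."""
    baseline = BASELINE_DAYS.get((region, role), DEFAULT_DAYS)
    if user_category in CATEGORY_DAYS:
        # A deliberate classification adds context; keep the longer of the two periods.
        return max(baseline, CATEGORY_DAYS[user_category])
    return baseline
```

Keeping the default short means that only specific roles or explicit user decisions lengthen retention, which keeps every retained message tied to some reason for keeping it.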

But companies should always take care to avoid retaining too "generously" and saving content without context. The pitfalls of arbitrary, indiscriminate retention can be devastating: terabytes of email can accumulate both quickly and relentlessly, and without contextual retention it can become difficult, if not impossible, ever to trigger disposition.
