Data Diving

Joel Henry (second from left) helps demonstrate software donated by Symantec to UM. Henry’s own software helps search troves of legal textual data.

UM employee launches company that searches massive amounts of legal information

By Jacob Baynham

In 2009, when computer science Professor Joel Henry went to the UM administration to ask if he could use his sabbatical to begin a law degree, then-Provost Royce Engstrom had two questions: “Are you sure you want to do that?” and “What does your wife think?”

Henry already had his doctorate in computer science. He had teaching and administration duties at UM. He had, in short, enough on his plate. But he was intrigued by the intersection of computer science and the law. And this was a man who once rowed an aluminum boat 61 miles around Yellowstone Lake during a weeklong camping trip so that he could see its wildest and least-visited places.

“As an engineer,” he says, laughing a little masochistically, “I like challenges.”

Two and a half years later, Henry had his law degree and still had his marriage, too. Well before he finished law school, he could see plenty of opportunities to combine his computer science skills with law. Once he passed the bar exam, he set to work exploring the field that had interested him most – electronic discovery. 

The intersection between technology and the judicial process is a messy one, mostly because technology usually moves faster than the law. “As a society and a legal community, we struggle with how to apply the law into this digital world,” Henry says. So he thought he’d help the law catch up. 

Electronic discovery concerns the retrieval of digital information – emails, Word documents, PDFs, texts, social media posts and such – for the purposes of a court case. Suppose a company was accused of wrongdoing and a lawyer needed to investigate the conduct of six of its employees from the past year. The emails of those employees would amount to dozens of gigabytes of data.

“No human can ever read every single one of those in any time that meets a legal schedule,” Henry says. “Plus, that would be a terrible thing to sentence someone to.”

To save time, Henry turned to his expertise in software development. “A computer doesn’t get tired at 4:30 on Thursday afternoon or drink coffee at 8 in the morning to get themselves awake,” he says. But for all their speed and efficiency, computers still need to be told what to look for in vast amounts of textual data.

And therein lies the problem. A simple keyword search, like the ones we run on the Internet every day, wouldn’t be powerful enough to find all the data relevant to a case. Type “mole” into Google, for example, and you’ll get recipes for mole sauce, information on the furry burrowing mammal, dermatology references and statistics on the soccer player Jamie Mole.

Henry set to work creating a software algorithm that was smart enough to determine the meanings of words based on their context. It needed to search for concepts, not just words. It was a lot more complicated than Control + F.

Henry spent more than a year perfecting the algorithm. When he had it, he filed for a patent and started a company – Agile Data Solutions. His partners include a former chief information officer and one of Henry’s law professors, Sam Panarella. The software, called START (Smart Technology Assisted Review Tools), has three components: First it collects all the necessary data on the server, then it searches for relevant content, and finally it organizes the information into a single accessible file.

Henry’s company isn’t the only one out there offering a smart way to search troves of textual data. The industry leaders use a technique called machine learning. In this process, for every 100,000 emails, a user picks 5,000 at random and marks them relevant or irrelevant. The computer uses that information to sort the rest of the emails and picks out another 5,000 so a user can check its accuracy. These cycles continue until the computer learns to home in on the right information. The method is laborious and expensive, costing up to $100,000 per project.
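The review cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which models the industry leaders use, so a simple word-vote score stands in for the real classifier, and the names review_rounds, reviewer, batch_size and target are hypothetical. The human reviewer is represented by a function that returns True for a relevant email.

```python
import re
from collections import Counter


def tokens(text):
    """Split an email into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(labeled):
    """Stand-in model: each word gets +1 per relevant email, -1 per irrelevant one."""
    weights = Counter()
    for text, relevant in labeled:
        for word in set(tokens(text)):
            weights[word] += 1 if relevant else -1
    return weights


def predict(weights, text):
    """Call an email relevant when its words vote positive overall."""
    return sum(weights[word] for word in set(tokens(text))) > 0


def review_rounds(emails, reviewer, batch_size, target=0.95, rng=None):
    """Run human-review rounds until the model agrees with the reviewer.

    Returns the agreement rate of each checking round and the model's
    predictions for the emails no human looked at.
    """
    pool = list(emails)
    if rng is not None:
        rng.shuffle(pool)  # "picks 5,000 at random"

    # Round one: the reviewer marks a first batch and the model learns from it.
    batch, pool = pool[:batch_size], pool[batch_size:]
    labeled = [(text, reviewer(text)) for text in batch]
    weights = train(labeled)

    history = []
    while pool:
        # Later rounds: the reviewer checks the model on a fresh batch.
        batch, pool = pool[:batch_size], pool[batch_size:]
        truth = [(text, reviewer(text)) for text in batch]
        agreement = sum(predict(weights, t) == r for t, r in truth) / len(truth)
        history.append(agreement)
        labeled.extend(truth)
        weights = train(labeled)
        if agreement >= target:
            break

    predictions = {text: predict(weights, text) for text in pool}
    return history, predictions
```

Passing rng=random.Random(0) shuffles the pool before the first batch is drawn. Every checking round costs the reviewer another full batch of reading, which is where the labor and expense of this approach come from.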

Henry’s technique, on the other hand, is more linear. It works like this: a user reads an email and marks it relevant or irrelevant. The software then immediately marks every email like it in the same way by comparing their content. For example, if one email says “Let’s meet for a beer after work,” and another says “See you at Draught Works at 5,” the software is smart enough to understand that they have an equivalent meaning. It even works with acronyms and abbreviations. For every email the user marks, the computer marks many more on its own, greatly speeding up the review process.
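The mark-one, mark-many idea can be sketched as follows. This is not Henry’s patented algorithm, whose details the article does not give. As an assumption for illustration, similarity here is plain word overlap (cosine similarity over word counts), and the names propagate, labels and threshold are hypothetical. Word overlap alone could not connect “a beer after work” with “Draught Works at 5”; that requires the context-aware matching the article attributes to the software.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn an email into lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate(emails, labels, index, label, threshold=0.5):
    """Record the user's label for emails[index], then copy it to every
    still-unmarked email that is similar enough.

    labels maps email position to its label and is updated in place.
    Returns the positions the software marked on its own.
    """
    labels[index] = label
    target = vectorize(emails[index])
    marked = []
    for i, text in enumerate(emails):
        if i in labels:
            continue  # never overwrite an existing mark
        if cosine(target, vectorize(text)) >= threshold:
            labels[i] = label
            marked.append(i)
    return marked
```

Each call represents one human decision. The threshold controls how aggressively that decision spreads: a lower value marks more emails per click but raises the risk of mislabeling borderline ones.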

“I tell our clients the computer is making them into a super user,” Henry says. “They start marking emails, and it’s as if they’re marking hundreds of them, thousands at the same time.”

The software is also small enough to be installed on a laptop or desktop at a law firm. So far Henry has sold it to law firms in Missoula; Portland, Ore.; Seattle; and Spokane, Wash. He also sells it to businesses with in-house legal counsel.

One of the best examples of the software’s potential came earlier this year, when Henry was helping a government institution facing a wrongful termination lawsuit. Henry helped the legal team search more than 160,000 emails. In two and a half hours, they found three emails that resolved the case. The supervisor had made a mistake and wrongly terminated the employee. The institution settled the case out of court, avoiding a lengthy, expensive trial.

Henry presents in UM’s new Cyber Innovation Laboratory.

Henry thinks there will be an increasing demand in the future for smart searches of big data. “This is an exploding problem,” he says. “It’s going to be a problem that the legal field is really going to grapple with for years to come. We all generate enormous amounts of data every single solitary day, whether it’s email, text messages, Facebook posts, blog posts or tweets.”

That demand is already putting a strain on Henry’s company. Unlike many other kinds of sales, selling software requires constant user training, technical support and tweaks in programming. Every time Henry installs the software and trains another user, he has another person who might call him later with questions or requests for adaptations of the software. Henry currently employs a staff of 10 – all current or former UM students. They don’t have an office; instead they meet in an online meeting space, and their main business line is directed to the cell phone of whoever is on duty.

“In today’s world with software development, I don’t really need to have my people all in the same office,” says Henry, who also now serves as UM’s information technology legal adviser. “It’s a neat new world in terms of running a business.”

Henry and his staff currently are helping a local law firm prepare a case by searching 30 gigabytes of contract documents – more than 234,000 pages in all. Of all that information, 80 to 90 percent will be irrelevant. But hiding somewhere in it will be the handful of pages that could make or break the case. And hunting them down is an exciting problem for a man like Henry, who likes a good challenge.

For more information email
