What wins a case? This is one of the fundamental questions of the legal profession, and it is the question we are trying to answer at LawBot. We are building a matrix that plots English legal case data against dozens of features. We are training a model on these feature vectors to identify correlations between individual features and winning or losing. We then want to provide a service that populates a user's feature vector by interacting with the user in a chatbot format. The service makes a prediction about the user's likelihood of winning the case and matches the user with an appropriate lawyer, who is sent an automated summary based on the feature vector. This is the LawBot pipeline.
Through an iterative process of training a model on reasonable assumptions and readjusting those assumptions based on the model's performance, we aim to predict case outcomes correctly 95% of the time.
1. A machine learning approach to reveal structure in the data
This approach shows how strongly individual words or phrases correlate with winning or losing a case. We train a model on two groups of cases: cases that were won and cases that were lost. We begin by removing stopwords from the data, then tokenize it using the Natural Language Toolkit (NLTK) and lemmatize it with NLTK's WordNetLemmatizer. We load the individual tokens into a mapping of all words, giving each a unique identifier, then load the tokens into vectors and assign each vector a label: 1 for winning, 0 for losing.
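A minimal sketch of this preprocessing step, with a toy tokenizer and stopword list standing in for NLTK's word_tokenize, stopword corpus, and WordNetLemmatizer (the case texts below are invented for illustration):

```python
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in"}  # toy list; NLTK's is far larger

def tokenize(text):
    """Toy tokenizer: lowercase, split on whitespace, strip punctuation."""
    return [t.strip(".,;:") for t in text.lower().split()]

def preprocess(cases):
    """cases: list of (text, label) pairs, label 1 = won, 0 = lost."""
    vocab = {}       # word -> unique integer identifier
    tokenized = []
    labels = []
    for text, label in cases:
        tokens = [t for t in tokenize(text) if t not in STOPWORDS]
        tokenized.append(tokens)
        labels.append(label)
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    # Second pass: bag-of-words count vectors over the full vocabulary.
    vectors = []
    for tokens in tokenized:
        vec = [0] * len(vocab)
        for tok in tokens:
            vec[vocab[tok]] += 1
        vectors.append(vec)
    return vocab, vectors, labels

cases = [("The claimant proved negligence", 1),
         ("The claim was struck out", 0)]
vocab, X, y = preprocess(cases)
```

In the real pipeline the lemmatization step collapses inflected forms ("claims", "claimed") onto one vocabulary entry before the identifiers are assigned.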
We import sklearn's linear model and train a logistic regression on a sample of the data. This gives us a classification rate to guide further iterations. We then iterate through the entire word mapping and look up each word's model coefficient. Whenever a word's coefficient magnitude exceeds our selected threshold value, we write the word out to a local file.
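The coefficient-inspection step might look roughly like this; the vocabulary, training vectors, and threshold below are illustrative stand-ins, not our real data:

```python
from sklearn.linear_model import LogisticRegression

# Toy word mapping and bag-of-words vectors; label 1 = won, 0 = lost.
vocab = {"negligence": 0, "dismissed": 1, "settled": 2}
X = [[2, 0, 1], [1, 0, 0], [0, 3, 1], [0, 2, 0]]
y = [1, 1, 0, 0]

model = LogisticRegression()
model.fit(X, y)
print("classification rate:", model.score(X, y))

THRESHOLD = 0.1  # tuning parameter, adjusted per iteration
# Collect every word whose coefficient magnitude exceeds the threshold:
# large positive coefficients push towards "won", large negative towards "lost".
strong = [(word, model.coef_[0][idx]) for word, idx in vocab.items()
          if abs(model.coef_[0][idx]) > THRESHOLD]

with open("strong_words.txt", "w") as f:
    for word, coef in strong:
        f.write(f"{word}\t{coef:.3f}\n")
```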
The output file contains words that correlate strongly with either winning or losing a case. These words provide a starting point for making assumptions about which features are important for winning or losing cases.
This approach follows the same paradigm as the UCL paper published last year. While we think it is a good start towards understanding the problem of winning or losing cases, it runs into significant problems for the purposes of our service, which we detail below.
2. Using analytical legal skills to make assumptions
Computers can't (and shouldn't) do all the work for us. Our analytical legal skills are another source of assumptions. Here it is important to keep in mind that it is not necessary for the program to understand the details of litigation. The program only needs to be able to correctly identify specific features in a data set and then make predictions based on those features. So the assumptions we make using our analytical skills should not be concerned with legal technicalities but with simpler features that we can train a program to recognize.
One assumption our analytical legal skills should not lead us to make is: "When the defendant has breached the duty of care, it is more likely that the applicant will win." The assumption is self-evidently true. But while it is possible to train a computer to recognize breaches of duties of care within the strict confines of English case law data, it is very difficult to do so in the free-flowing conversation we envisage our users having with the chatbot.
Our analytical assumptions must therefore be simpler. One assumption we currently hold (and may soon discard) is that emotional content is positively correlated with winning a case. Another is that the age of the applicant has an impact on winning or losing a case.
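One way an assumption like the emotional-content one could be operationalised is a simple lexicon score; the word list below is a toy stand-in for a real emotion lexicon, and the scoring scheme is only a sketch:

```python
# Hypothetical emotion lexicon; a real one would contain hundreds of entries.
EMOTION_WORDS = {"distress", "suffering", "anguish", "fear", "humiliation"}

def emotion_score(text):
    """Emotion-laden words per 1,000 tokens - a candidate 'emotional content' feature."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,;:") in EMOTION_WORDS)
    return 1000.0 * hits / len(tokens)
```

A score like this is easy to extract both from case law and from a chat transcript, which is exactly the property we want our features to have.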
These two approaches allow us to make a number of assumptions that we can then test. The machine learning and the analytical approach reinforce one another.
Once we have made our assumptions we must test them. To do this we build a large matrix recording whether each of the features we think are important is present in each document. We populate each feature vector using a program that scans the documents for patterns indicating the presence of specific features.
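The scanning program can be sketched as a set of patterns, one per candidate feature; the feature names and regular expressions here are illustrative, not our real feature set:

```python
import re

# Each candidate feature is a name plus a pattern that signals its presence.
FEATURES = {
    "duty_of_care":   re.compile(r"duty of care", re.IGNORECASE),
    "emotional":      re.compile(r"\b(distress|suffering|anguish)\b", re.IGNORECASE),
    "minor_claimant": re.compile(r"\b(child|minor|under 18)\b", re.IGNORECASE),
}

def feature_vector(document):
    """Return a 0/1 vector: 1 if the feature's pattern occurs in the document."""
    return [1 if pattern.search(document) else 0 for pattern in FEATURES.values()]

def build_matrix(documents):
    """One row per document, one column per feature."""
    return [feature_vector(doc) for doc in documents]

docs = ["The defendant owed a duty of care to the child.",
        "The claimant suffered no distress."]
matrix = build_matrix(docs)
```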
Once the matrix is fully populated we train different models on a sample of the data. We use sklearn to see which model works best and which features are important. Based on these results, our assumptions are either supported or rejected. We then go back to the previous stage and make better assumptions based on the results.
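A sketch of this comparison step with sklearn, using a stand-in feature matrix (the models compared and the data below are illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the populated feature matrix: rows are cases, columns are
# 0/1 features; label 1 = won, 0 = lost.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1],
     [0, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

# Score each candidate model on held-out data via cross-validation.
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=2)
    print(type(model).__name__, scores.mean())

# Feature importance for the linear model: one coefficient per feature.
clf = LogisticRegression().fit(X, y)
importance = clf.coef_[0]
```

A large coefficient (in magnitude) for a feature supports the assumption behind it; a near-zero one suggests the assumption should be dropped.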
Through this iterative process we aim to arrive at a model with a high Matthews correlation coefficient (MCC) that also scores well on other binary classification metrics.
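sklearn computes the MCC directly from true and predicted labels; the labels below are invented for illustration:

```python
from sklearn.metrics import matthews_corrcoef

# MCC uses all four cells of the confusion matrix, so unlike raw accuracy
# it stays honest when won and lost cases are imbalanced.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]  # one miss in each direction
mcc = matthews_corrcoef(y_true, y_pred)  # 1.0 is perfect, 0.0 is chance-level
```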
Our feature vector approach is commonly used in commercial recommender systems for music or on dating sites. Much of the natural language processing research in the legal world takes other approaches: last year's UCL paper emphasised term frequency, an approach that many NLP systems, such as spam classifiers, use successfully.
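For contrast, a term-frequency classifier of the spam-filter kind can be sketched as follows; the documents and labels are invented for illustration, and this is not the exact setup of the UCL paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy case summaries; label 1 = won, 0 = lost.
docs = ["claim dismissed costs awarded",
        "negligence proved damages awarded",
        "appeal dismissed",
        "duty of care breached damages"]
labels = [0, 1, 0, 1]

# Weight each term by its frequency (tf-idf) and fit a Naive Bayes model,
# the standard pairing in spam classification.
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)

pred = clf.predict(vec.transform(["damages awarded for negligence"]))
```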
We think this approach is not the best one for the service we want to provide. This is because training a model using a term frequency approach requires a lot of data of the kind we are trying to make predictions about. This works well for emails because there are a lot of openly available email databases and because the data about which predictions are being made (real emails) is similar to the data used to train the model.
But the data we are trying to make predictions about is conversational interaction with a chatbot about legal problems, and there are no existing data sources for this type of data. We therefore work around the data problem by identifying the same key features in both the data we are making predictions about and the data we are training the model on. On the user side we will do this through open-ended descriptions of the problem followed by specific questions about particular features (OkCupid is a great example of these question-answer interactions). As mentioned above, we use another program to identify these features in our own data to train the model.
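The question-answer phase could populate the feature vector along these lines; the questions and feature names below are hypothetical stand-ins for our real ones:

```python
# Each chatbot question maps onto one feature of the vector.
QUESTIONS = {
    "Did the other party owe you a duty of care?": "duty_of_care",
    "Are you under 18?": "minor_claimant",
    "Did the incident cause you emotional distress?": "emotional",
}

def populate_vector(answers):
    """answers: mapping of question -> 'yes'/'no' collected in the chat session.
    Unanswered questions default to 0."""
    return {feature: 1 if answers.get(question, "").lower() == "yes" else 0
            for question, feature in QUESTIONS.items()}

session = {"Did the other party owe you a duty of care?": "Yes",
           "Are you under 18?": "no"}
vector = populate_vector(session)
```

The resulting vector has the same shape as the rows of the training matrix, so the trained model can score it directly.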
We are also considering automatically populating parts of the user's feature vector by using the data stored on their social media accounts. This would proceed with the user's consent using the relevant APIs.
The result, then, is a prediction about the likelihood of the user winning their case and a feature vector with detailed information about the user and their case. These data-rich profiles can then be used to refer the user to a lawyer specializing in this type of case. The lawyer would receive a summary of the user and their case. This commercialization route would fundamentally disrupt the gateways to legal representation in England.
We want to launch our service on Facebook over the summer. Currently, we are engaged in finding the right features and training a powerful model. We will be posting periodic updates on our website. We also need to study for our law school exams :(
For questions or contact: firstname.lastname@example.org.