Paradigm shifts: random thoughts on predictive coding, data privacy, IBM, neuroscience and other stuff as we close out the year

Fifty years ago, Thomas Kuhn, then a professor at the University of California, Berkeley, released a thin volume entitled The Structure of Scientific Revolutions. I recently attended an MIT/Harvard symposium on the effects of the book’s publication.

Kuhn challenged the traditional view of science as an accumulation of objective facts toward an ever more truthful understanding of nature. Instead, he argued, what scientists discover depends to a large extent on the sorts of questions they ask, which in turn depend in part on scientists’ philosophical commitments. Sometimes, the dominant scientific way of looking at the world becomes obviously riddled with problems; this can provoke radical and irreversible scientific revolutions that Kuhn dubbed:

“paradigm shifts”

Yes, introducing a term that has been much used and abused.

Paradigm shifts interrupt the linear progression of knowledge by changing how scientists view the world, the questions they ask of it, and the tools they use to understand it. Since scientists’ worldview after a paradigm shift is so radically different from the one that came before, the two cannot be compared according to a mutual conception of reality. Kuhn concluded that the path of science through these revolutions is not necessarily toward truth but merely away from previous error.

And oh, the technology we have today. Except … the proliferation of technology has dramatically infiltrated all aspects of modern life. In many ways the world is becoming so dynamic and complex that technological capabilities are overwhelming human capabilities to optimally interact with and leverage those technologies. It seemingly takes only a tiny group of engineers to create technology that can shape the entire future of human existence with incredible speed.

From my own perspective, well, like … wow. When I travel now I no longer carry a laptop. I travel only with my iPad and use IBM SmartCloud Desktop to access all my at-home desktop applications from any location, although I also carry a mini Seagate 500GB GoFlex with my critical documents, which I can wirelessly transfer to my iPad. I do have everything loaded up in Microsoft’s SkyDrive but find the GoFlex handier. I also have the new collapsible Apple Bluetooth keyboard, which comes in useful when working on a train or in a hotel room.

And I use IBM Content Navigator, which allows our staff to access, manage and work with any of our content directly from mobile devices, tablets, desktops and laptops, any time, anywhere. It offers content collaboration, production imaging, and other kinds of core ECM capabilities to capture, activate, socialize, analyze and govern content. Very cool. Very productive.

Now this past year: some paradigm shifts “in progress”, yes? Some random thoughts:

PREDICTIVE CODING

IT’S A LOVE FEST! Sharon Nelson’s description (click here). As she says “PC/CAR/TAR by any other name has become a love fest at technology and e-discovery conferences”. She always has the perfect turn of phrase. I especially liked her reference to “Keeping Up With the Kardashians” when she discussed the imbroglio earlier this year involving Recommind, Judge Peck and Ralph Losey.

Although my favorite comment came from a senior litigation partner from [Mega Law Firm] at ILTA this summer who said: “This ain’t law. It is backroom IT process. These e-discovery judges only seem to have time to write 50-page opinions defending themselves and attending sponsored tech shows. Are you going to see Denny Chin [judge in the Google book digitization case] jump off the bench and join Google at a conference on the wonders of digitization, or Bill Alsup [judge in that mega copyright case involving Oracle/Google] jump off the bench and join Google at a conference on the copyrightability of APIs? No. That is real law. This is all vendor run, vendor driven”.

Ouch. Doubt if he was invited to the e-discovery practice group Xmas party at his firm. And a response to his diatribe requires another post.

Personal note: I like Ralph. I entered the e-discovery area only in 2002 (I think maybe before we actually used the term but before the philosophical debates on whether you need to use a hyphen) and his blog was the first I read (and keep on reading) to get up to speed. My God, it seems like only yesterday, sitting on my grandfather’s knee, as Grandpa read to me from Ralph’s book “The Child’s Guide to Ediscovery”. Ralph … being a techie … had this amazing pop-up of the EDRM. Imagine a child’s delight as George Socha and Tom Gelbmann sang their way through the model. Their doo-wop song “Dedup Dedup” still rings in my ear. Yes, it is an old book (George has a full head of hair).

Yep, as Sharon says, the TAR market is a gold rush and thars lots of gold in them thar hills.

Ok, in the grand world of technology this is pretty primitive stuff. It ain’t the Large Hadron Collider at CERN and it ain’t the new technology that allows a paralyzed woman’s thoughts to guide her robot arm and it ain’t the sophisticated technology behind 3D printing.

But, hey. Humans are not exactly known for their predictive skills if one believes Daniel Kahneman’s argument in Thinking, Fast and Slow or Nassim Nicholas Taleb’s assertions in his new book Antifragile: Things That Gain from Disorder. Document review processes including advanced computer analytics can produce more accurate results than reviews using only keyword search and human review. Proven.

Why are lawyers reluctant to embrace it? Use it? My neuroscience professors tell me lawyers have an “overresponsive” amygdala. Our brains are wired to look for negative information. The amygdala is the danger center. Our senses are routed through it before they get to the cortex. When we heard a rustle in the branches, we thought “tiger”, not “wind”. That’s why, in the news, if it bleeds it leads. We are a tough creature who has traveled here by a very long road. Our nature has been shaped by many millions of years of struggle, fear, and pain.

I will discuss this at greater length next month in my paper “Contract attorneys, technology assisted review … and chocolate cake(?): the neuroscience of the document review room” (I am consulting Sharon for a more pithy title).

For a very good overview on TAR here is a video I did with David Horrigan of 451 Research earlier this year at LegalTech:

And to really learn this stuff go to Rob Robinson’s Complex Discovery website (click here) where he has amassed scores of links on TAR.

And lest we forget, the visual representation of all this data has gone through a number of phases, with its goals switching back and forth between analysis and presentation over time. There has been some great work done by Exterro, FTI Technology, and StoredIQ which will be the subject of a new post early next year.

ARTIFICIAL INTELLIGENCE

Boy, AI was all over the blogosphere this past year. It popped up at every neuroscience event I attended. And it still all remains a bit of a wonder.

The mind is synchronized, but no one knows exactly why. There is a brilliant series on AI, neuroscience and the mind running across Wired Magazine, the MIT AI Labs blog and the Cambridge University neuroscience sites which I will post next year. But in a nutshell:

Many computer scientists take it on faith that one day machines will become conscious. Led (somewhat) by Ray Kurzweil, proponents of the so-called “strong-AI school” believe that a sufficient number of digitally simulated neurons, running at a high enough speed, can awaken into awareness.

The human brain contains more cells than there are stars in our galaxy, and the most important cells are neurons, which are nerve cells responsible for transmitting and processing electro-chemical signals at up to 320 km/h. This chemical signalling occurs through synapses (specialised connections with other cells, like wires in a computer). Each cell can receive input from thousands of others, so a typical neuron can have up to ten thousand synapses, i.e., can communicate with up to ten thousand other neurons, muscle cells, and glands. Estimates suggest that adult humans have approximately 100 billion neurons in their brain, but unlike most cells, neurons don’t undergo cell division, so if they’re damaged they don’t grow back, except, apparently, in the hippocampus (associated with memory) and the olfactory bulb (associated with sense of smell). The process by which this occurs is unclear, and this image was taken during a project to determine how neurons are born; it actually depicts newborn nerve cells in an adult mouse’s brain.

New York University neurologist E. Roy John has established that the hallmark of consciousness is a regular electrical oscillation, or gamma wave, readily detected by electrodes attached to the scalp. Wolf Singer and his colleagues at the Max Planck Institute for Brain Research in Frankfurt, Germany, confirmed that brain cells flicker in time with the gamma wave. This flickering takes place among widely dispersed neurons throughout the brain with no apparent spatial pattern. What keeps these ever-shifting, widely distributed groups of cells in sync? Neurochemical reactions take place too slowly to explain the phenomenon. This mystery alone seems to demand a wholesale rethinking of AI’s underpinnings.

And on a recent trip to the Swiss Artificial Intelligence Lab (IDSIA), I saw the team’s research work into artificial neural networks (NNs), which have won scores of international awards. They were the first to achieve human-competitive performance on various benchmark data sets. There is a very good post on this in Ray Kurzweil’s blog (click here) and Ray’s blog is really the “go to” blog to follow all of these AI developments. Given he is soon to become an employee of Google (click here) I can only hope the blog continues.

My bookshelves … and iPads … runneth over with this material so next year I shall publish an index on some of the key/most relevant material … call it “AI/neuroscience for dummies” … on my Tumblr blog.

Suffice it to say companies are into AI big time. For just one tiny example look at social media marketer Salorix. It has a product it calls “Amplify” which is a machine learning program that focuses on social networks. The purpose of Amplify is simple: it searches social media for conversations related to your business. But the program goes beyond just looking for people talking about you. It looks for conversations that are relevant to the products and services you provide – giving you an opportunity to market your product. The aim? To allow brands to build preapproved messages and target people who would be interested in them.

Amplify is built around different types of product verticals, in recognition of the way people talk about things. People talk different ways about different products. The way they talk about cars is different from the way they talk about insurance or electronics. Every industry is modeled to make it easier to figure out the tone of conversations and even identify sarcasm. Context is crucial, too. When people are talking about apples and oranges, the program figures out whether they’re talking about fruit or about tech companies. So … Amplify’s big kicker is its ability to identify influential people taking part in conversations where a brand’s product is relevant – and, for example, send them a targeted tweet. The program even learns about what those influential people are interested in hearing or talking about, so that someone who cares about the style of their car but not its gas mileage isn’t going to get tweets about how fuel efficient a car is.

AUTONOMY AND HP

Briefly … ever so briefly … let me say I never bought into the rap of ediscovery pundits who, at the time, saw the deal as a “game changer” and claimed that H-P had now become “next generation”. I wrote that in my blog, which you can access here, plus a follow-up last month which you can access here.

IBM AND STOREDIQ

No big surprise. A good fit. StoredIQ has always pitched itself as a Big Data play, its solutions giving businesses control over the growing volumes of unstructured data spread out over PCs, storage systems, and other hardware, both for management and regulatory purposes.

The deal comes as IBM tries to grow its storage software ops in the face of sputtering hardware sales. The acquisition will strengthen its information lifecycle governance business and thereby help companies to efficiently use and govern their unstructured data and mitigate unnecessary cost and risk.

StoredIQ has more than 120 customers across several domains including financial services, healthcare, government, manufacturing and other sectors.

The companies know each other quite well. As an IBM PartnerWorld member, StoredIQ has leveraged that partnership to develop and market information management solutions in the IBM commercial marketplace. The StoredIQ solution is IBM “Ready for Tivoli” and IBM Information Archive certified. StoredIQ supports the complete retention and litigation hold capabilities of IBM’s archive class storage systems.

And it all reminds me of a conversation I had several years ago with Nick Patience … then a founder/principal of 451 Research, now with Recommind … who said that as the ediscovery/information ecosystem matured it would be natural for the goliaths to enter, if not dominate. It is a place they need to be. Just look at what Google is doing on search, both legal and BI (pick up the current issue of Wired, which devotes a huge part of the mag to Google search).

For IBM, a brilliant move. IBM has a unique business model that is very difficult to replicate. It offers the broadest platform for Big Data analytics. Its success in offering integrated hardware/software/services solutions makes it a lot like Apple. And for investors (I have held the stock for years) the fact that 60% of its profits come from recurring streams provides predictability.

Full disclosure: IBM was my first IP client when I worked on Wall Street and I have stayed in touch, especially at conferences like MWC. And a great annual read is the IBM 5 in 5 … IBM’s list of innovations that Big Blue bets have the potential to change the way we work, live and interact over the next five years.

DATA PRIVACY

Privacy. I remember those days. People still think it exists. How quaint. And it is not just that the U.S. government has abolished it (read here and here) but that the technology has simply made it impossible. The selling and trading of data is a multi-billion dollar industry. There is too much money at stake for participants to allow that to change. I became very attuned to data as the new “asset class” when I attended Davos last year.

I am sure you have seen this:

But in today’s hyper-connected, always-on society, the whole notion of isolation is quickly becoming obsolete. That organizations can tell whether or not someone is pregnant based on their buying habits is well-covered territory. In the U.S., the idea that organizations are always watching and learning from their customers is now just a part of life. Just read this recent Wall Street Journal article (click here).

Big data makes it possible for analytics run against aggregated customer data to be reverse engineered to reveal personally identifiable information. The process of sifting through and analyzing structured and unstructured data to gain new insights about customers is hardly new. Big data represents the evolution of the technology that allows you to perform these tasks in real time across massively distributed platforms and retain the data far longer.

And the longer the data is retained, the greater the risk of private or personally identifiable information – names, social security numbers, addresses, driver’s license numbers – being leaked or stolen.

But go to any data privacy conference these days and all you have is lawyers and policy wonks discussing the political, legal, sociological, and psychological issues. There is never a technical/scientific speaker to discuss how technology has simply eroded our privacy … some of it willingly. In Europe the privacy and data protection commissioners and mandarins are slowly realizing their ambitious plans need to be reworked. One EU exception: ENISA, which is the European Network and Information Security Agency. I attended one of their conferences this year and they discussed “the reality” of what can and cannot be done given their attendance at numerous scientific events dedicated to privacy and privacy-eroding technologies.

And when it comes to the cloud … yikes! Best story I have is from an RSA Conference earlier this year. The speaker was the head of a state’s law enforcement coordinating unit talking about the use of the cloud to monitor/track the location of state police cars and local police cars so as to best respond to “situations”. Great idea. But a problem. Turns out the system was easily hackable, allowing just about anybody to know exactly where the police were deployed. Oops. Just maybe helpful to criminals? Yes, cloud security is getting better but as this speaker said “the rush to get this cloud technology out the door, the impetus to secure very lucrative contracts … well, we need more due diligence”.

BIG DATA

I bought Rick Smolan’s new book The Human Face of Big Data which looks at how humanity is impacted by the unparalleled ways we can now collect, analyze, and use data. I downloaded the app, subscribed to the tracking feature, etc. It is great fun.

Without a doubt, more things can be quantified than ever before. The myriad ways that benefits society are only hinted at in Smolan’s book. With the wealth of data we can now collect and analyze in increasingly sophisticated ways, we have only scratched the surface as to the vast number of advances we might find.

However, in any era with rapid technological change, it’s easy to start slipping into what has been termed “technological determinism,” to start speaking of the technology as if it drives culture and humanity, rather than thinking of technology as a tool.

Its benefits in the health/medicine field are dramatic as I recently noted in a post.

But there is a significant gulf between collecting Big Data and being able to confidently act upon it. In that gulf you’ll find data scientists developing and refining models, identifying questions, and communicating the insights and predictions that emerge. And they are in short supply, although at a recent General Counsel roundtable in Brussels attendees spoke about “repurposing IT personnel” to fill the void.

Effectively conveying these conclusions requires more than simply plotting points on a graph or map: it’s a narrative process. Every year I go to Le Web (becoming quite the venue to attend for ediscovery vendors these last 2 years) and the emphasis among data scientists was that data science is about creating narratives. It is about creating analogies, about using complex data to tell stories, be it in a legal context, a business intelligence context, or a medical context. The value of data science lies in untangling subjective areas where ambiguity reigns. As DJ Patil (Chief Data Scientist in Residence at Greylock Partners) said at Le Web: “everyone right now is regimented into this idea that a data scientist is a statistician and a math person, very cold, very regimented. Subjective areas are where data science shines. It allows us to ask questions. Data allows you to ask questions; it facilitates a conversation. The point is to have a debate.”

I will have more in my Le Web wrap up next month, but if you know the work of Hans Rosling and Eric Rodenbeck you know his drift. And get hold of the IDC/EMC Digital Universe study Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. It is chock full of … well, data … on the explosion of digital information, driven by social data, sensor data, and the Internet of Things.

There is a good section on how, as the data deluge grows, so does the skills gap. This year’s Digital Universe study projects that by 2020, the number of servers will grow 10x and information managed by enterprise data centers will grow 14x, yet the number of IT professionals will grow by less than 1.5x, creating a huge technology skills gap. As Paul Barth showed in the Harvard Business Review article There’s No Panacea for the Big Data Talent Gap, it continues to be difficult to find data scientists.
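To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. It uses only the three multipliers quoted above; the function name and the "load per professional" framing are my own illustration, not something taken from the study. Since the study says IT headcount grows by less than 1.5x, the ratios below are lower bounds.

```python
# Growth multipliers quoted above from the Digital Universe study (projected by 2020).
SERVER_GROWTH = 10.0     # number of servers
INFO_GROWTH = 14.0       # information managed by enterprise data centers
STAFF_GROWTH_MAX = 1.5   # IT professionals: "less than 1.5x", so this is an upper bound


def load_per_professional(workload_growth: float, staff_growth: float) -> float:
    """Return how much the workload per IT professional grows (hypothetical helper)."""
    if staff_growth <= 0:
        raise ValueError("staff_growth must be positive")
    return workload_growth / staff_growth


# Dividing by the upper bound on staff growth gives a lower bound on the ratio.
servers_per_pro = load_per_professional(SERVER_GROWTH, STAFF_GROWTH_MAX)
info_per_pro = load_per_professional(INFO_GROWTH, STAFF_GROWTH_MAX)

print(f"Servers per IT professional: at least {servers_per_pro:.1f}x")
print(f"Information per IT professional: at least {info_per_pro:.1f}x")
```

In other words, on these figures each IT professional would be looking after roughly seven times as many servers and more than nine times as much information as today.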

So a Merry Christmas and Happy New Year (I know, so socially incorrect!!) and my sincere thanks to our clients, sponsors, supporters and friends over the past year. Have a happy and safe holiday season.

Gregory P. Bufithis, Esq. Founder/CEO

The Project Counsel Group
