We're delighted to welcome Vincent Spinella-Mamo to the IPfolio editorial desk today. Vince earned a Doctor of Philosophy in Applied Physics from Georgetown University, followed by a J.D. from George Washington University. He’s currently managing IP for one of Silicon Valley’s most intriguing stealth startups.
After his well-received presentation this summer at one of our IPforward dinners, we invited him to adapt it for our blog audience. We’re excited to share with you Vince’s tips on patenting algorithms and some of his most successful claiming strategies.
Neural Networks Inside Machine Learning
Constantly learning about new technologies is one of my favorite parts of life as a patent attorney. A recent focus has been learning about inventions that incorporate neural networks, which are a key foundation of machine learning. For the unfamiliar, neural networks are a subset of machine learning algorithms that are biologically inspired by neuronal connections in the brain.
As illustrated above, a neural network node attempts to “activate” in a manner similar to the way a neuron fires to transmit information. A number of inputs (numerical values) may be introduced to a node and, based on the weighted sum of those inputs, the node may be “activated” using an activation function. The activation function generally maps the weighted sum to a bounded output (often between 0 and 1) and may incorporate a threshold so that the node effectively outputs a 0 or a 1. Examples of activation functions include the sigmoid function, the Rectified Linear Unit (ReLU), and the hyperbolic tangent.
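To make the weighted-sum-plus-activation idea concrete, here is a minimal Python sketch of a single sigmoid node. The input values, weights, and bias are arbitrary numbers chosen for illustration:

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real value into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def activate(inputs: list, weights: list, bias: float) -> float:
    """A node's output: the activation function applied to the weighted sum of its inputs."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Three inputs, each with its own weight; the node "activates" toward 1
# as the weighted sum grows, and toward 0 as it shrinks.
print(round(activate([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.0), 3))  # 0.525
```

Swapping `sigmoid` for ReLU or hyperbolic tangent changes only the final mapping, not the weighted-sum step.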
During training, input with a known expected output is passed through the network, and an error is calculated between the expected output and the network’s actual output. These errors are then “back-propagated” through the network so that the weights at every node are adjusted to produce the correct output. Once trained, the network may be used to provide predictions on new input data, a phase referred to as “inference” mode. By leveraging the highly parallel computing of Graphics Processing Units (GPUs), trained neural networks may provide an output in mere milliseconds.
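As a sketch of the train-then-infer cycle described above, the following toy example trains a single sigmoid node by gradient descent on a dataset with known expected outputs (the logical OR function). It is a deliberate simplification: real backpropagation adjusts weights across many layers, not one node.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy training set with known expected outputs: the logical OR function.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, lr = [0.0, 0.0], 0.0, 1.0

# Training: compute the error against the expected output, adjust the weights.
for _ in range(2000):
    for inputs, expected in samples:
        out = sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)
        grad = (out - expected) * out * (1 - out)   # error scaled by the sigmoid's slope
        weights = [w - lr * grad * i for w, i in zip(weights, inputs)]
        bias -= lr * grad

# Inference: the trained weights are frozen and applied to new input.
def predict(x):
    return sigmoid(sum(i * w for i, w in zip(x, weights)) + bias)

print(predict([1, 1]) > 0.5, predict([0, 0]) < 0.5)  # True True
```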
Architecture refers to the manner in which nodes are connected together. At a very high level (see illustration below), inputs are introduced to a neural network at one or more input nodes (e.g., pixel data from an image) to provide outputs (e.g., a prediction of the classification of an object in the image). The input and output layers may be separated by one or more hidden layers. The connections between the layers and the number of nodes per layer vary based on the problem. (BTW, a great guide to the various connection schemes can be found here.)
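As a rough sketch of how architecture choices play out, the layer sizes below (hypothetical numbers, chosen only for illustration) describe a small fully connected network for image classification; each pair of adjacent layers contributes a block of weighted connections:

```python
# Hypothetical fully connected architecture: 784 input nodes (a 28x28-pixel
# image), two hidden layers, and 10 output nodes (one per classification).
architecture = [784, 128, 64, 10]

def num_connections(arch):
    """Count the weighted connections between each pair of adjacent layers."""
    return sum(a * b for a, b in zip(arch, arch[1:]))

print(num_connections(architecture))  # 109184
```

Varying the middle entries changes the hidden layers without touching the input/output contract, which is exactly the knob that gets tuned per problem.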
Algorithms such as these are solving problems at scale in a variety of industries, from character recognition to stock trading and self-driving cars. Consequently, they’re valuable inventions worth protecting. Based on my experience drafting machine learning patents and analyzing others in the space, here are some considerations to keep in mind:
§ 101 Considerations
In preparing and prosecuting patent applications relating to these neural networks, Alice is a vital starting point. To recap, Alice Corp. v. CLS Bank International (or simply Alice) is a 2014 Supreme Court case, which reminds us that patent eligibility rests on whether the claimed invention is directed to an abstract idea and, if so, to what extent additional limitations steer the claimed invention back into patent-eligible waters. Generally, the eligibility test is applied in two steps:
Step 1 – Is the claim at issue directed to a judicial exception, such as an abstract idea?
Step 2 – Do the claims contain an element or combination of elements to ensure that the patent in practice amounts to significantly more than a patent upon the [ineligible concept] itself?
Since the Supreme Court did not provide much guidance, applications of this “rule” at the USPTO have been highly varied. After the decision in Alice, the number of Office Actions citing rejections under § 101 skyrocketed. Despite multiple guidelines from the USPTO, Examiners continue to reject claims that even hint at software under § 101.
Fortunately, there have been multiple cases since Alice that have provided techniques for overcoming these rejections. Here are some relevant quotations from post-Alice cases that are applicable to machine learning:
Enfish, LLC v. Microsoft Corporation – [Finding eligibility under Step 1]
“The specification’s disparagement of conventional data structures, combined with language describing the ‘present invention’ as including the features that make up a self-referential table, confirm that our characterization of the invention for purposes of the § 101 analysis has not been deceived by the ‘draftsman’s art.’” Further, “the claims are directed to a specific implementation of a solution to a problem in the software arts.” (Emphasis added). Additionally, “Claims ‘purporting to improve the functioning of the computer itself’ or ‘improving an existing technological process’ might not succumb to the abstract idea exception.”
McRO, Inc. v. Bandai Namco Games America, Inc. – [Finding eligibility under Step 1]
“We therefore look to whether the claims in these patents focus on a specific means or method that improves the relevant technology or are directed to a result or effect that itself is the abstract idea and merely invoke generic processes and machinery.” Further, that “[t]he specific structure of the claimed rules would prevent broad preemption…” (Emphasis added)
Bascom Global Internet Services, Inc. v. AT&T Mobility Corp. – [Finding eligibility under Step 2]
“The claims do not merely recite the abstract idea…Nor do the claims preempt all ways of filtering content…[T]he patent describes how its particular arrangement of elements is a technical improvement over prior art.” (Emphasis added)
Amdocs (Israel) Limited v. Openet Telecom, Inc. – [Finding eligibility under Step 2]
Found eligibility when the claim was “tied to a specific structure”, “narrowly drawn not to preempt,” and “purposefully arrange[d] the components in a distributed architecture to achieve a technological solution to a technological problem.” (Emphasis added)
Generally, these holdings have common elements of requiring a technical solution to a technical problem and avoidance of preemption. For those seeking additional arguments and resources for relevant case law post-Alice, Fenwick publishes a great tool, which can be found here.
Drafting the Specification in Light of Alice
"I try to focus the application on the specific technical obstacles and the technical solutions to such obstacles."
Of course, Alice is not unique to neural networks and machine learning algorithms. However, I’ve cited the above cases to emphasize elements which inform how I draft the specification for neural network-based applications. Generally, I try to focus the application on the specific technical obstacles and the technical solutions to such obstacles. Also, because neural networks running on GPUs can provide predictions in milliseconds once trained, the use of such networks, in and of itself, improves the functionality of a computer.
In addition to the speed increases, various neural networks have outperformed more traditional algorithms with respect to prediction accuracy (several examples in the computer vision space include semantic segmentation, classification, and depth estimation from two-dimensional images). Both the increase in speed and accuracy provide an avenue to disparage other methods. To that end, I’ve applied the following strategies with respect to the specification:
- Enumerate various technical methods for accomplishing a technical solution, e.g., by noting that such a technical solution may not be possible without the use of a neural network
- Describe how technical solutions provide improvements to a computer (e.g. “such a neural network may increase performance of a computer by requiring less memory, providing faster response times, etc.”)
- To the extent possible, disparage other methods
Drafting the Claims in Light of Alice
Arguably, the easier of the two prongs of Alice is the first. In my opinion, it’s easier to explain to an Examiner that the claim is not directed to an abstract idea, than to argue that it has “significantly more.” To that end, I’ve incorporated several strategies in claim drafting with Alice in mind:
- Include limitations of tangible sources of data (e.g. physical sensors, data derived from physical sensors, etc.)
- Include limitations to control of tangible objects (e.g. control signals to an autonomous vehicle)
- Add sufficient technical limitations
§ 112 Considerations
As if the ambiguity in Alice weren’t enough, the other important case to consider when patenting algorithms and drafting neural network applications is Williamson v. Citrix Online, LLC. In Williamson, the CAFC overturned the strong presumption that claims lacking the explicit term “means for” should not be interpreted under § 112, ¶ 6. Importantly, the court found that § 112, ¶ 6 will apply if the challenger demonstrates that the claim term fails to “recite sufficiently definite structure” or else recites “function without reciting sufficient structure for performing that function.”
Particularly with respect to the claim in Williamson, the court noted that, “[w]hile portions of the claim do describe certain inputs and outputs at a very high level…the claim does not describe how [they] interact with other components…” From that perspective, care should be given when claiming inventions related to neural networks. I’ve listed several strategies I use for ensuring proper structure below.
Claiming Strategies: Tight Structure
In a recent Law360 article, it was suggested that, to provide the structural limitations needed to avoid interpretation under § 112, ¶ 6, one could particularly define the nodes themselves, as well as the connections between the nodes in a network. I refer to this as a “tight structure” claiming strategy, as it explicitly defines memory locations and connections. The article provides the following claim as an example:
1. A classification system, comprising:
a processor configured to execute instructions programmed using a predefined set of machine codes; and
an ANN comprising:
first, second and third input nodes, wherein each input node includes a memory location for storing an input value;
first, second, third and fourth hidden nodes, wherein each hidden node is connected to each input node and includes computational instructions, implemented in machine codes of the processor, for computing first, second, third and fourth output numbers, respectively; and
first and second output nodes, wherein the first output node includes a memory location for storing a first output signal indicative of a first classification, and the second output node includes a memory location for storing a second output signal indicative of a second classification.
While such a claiming strategy should avoid any issues under § 112, the claim itself is very narrow. In my experience, most engineers and developers I speak with state that there is no singular way to determine the connections between layers and, in some extreme cases, which layer to use and in what order. From this perspective, such a tight claiming structure would render the claim easily designed around. Despite these shortcomings, there may be other reasons to prefer a tight structure for claiming a neural network. For example, if a simplistic neural network (i.e., one with few layers and/or connections) provides a unique advantage, it may be desirable to have an independent claim which recites the nodes and connections.
Claiming Strategies: Loose Structure
As opposed to the tight claiming strategy, I’ve preferred to use what I refer to as a loose structure. While the tight structure generally contemplates enumerating specific nodes and the connections therebetween, the loose structure contemplates enumerating certain required layers and, generally, a relative configuration of the layers with respect to one another. Such a claiming strategy should overcome any § 112 issues while preserving the flexibility of the claim to cover various architectures.
I actually find the loose structure preferable over the tight structure for many reasons. First and foremost, the loose structure covers various architectures. Further, as neural networks grow in complexity (sometimes with over one hundred layers), a tight structure may be virtually impossible to claim.
An example of the “loose structure” for claiming a neural network is illustrated in the following patent application from Google (US 2015/0340034).
1. A method comprising:
receiving an audio input;
processing the audio input using an acoustic model to generate a respective phoneme score for each of a plurality of phoneme label sequences;
processing one or more of the phoneme scores using an inverse pronunciation model to generate a respective grapheme score for each of a plurality of grapheme label sequences; and
processing one or more of the grapheme scores using a language model to generate a respective text label score for each of a plurality of text label sequences, wherein each text label score reflects a likelihood that the audio input is represented by the corresponding text label sequence.
2. The method of claim 1, wherein the acoustic model comprises one or more long short term memory (LSTM) blocks that receive the audio input and generate an output from the input and one or more connectionist temporal classification (CTC) layers that receive the output from the LSTM blocks and transform the output into the phoneme scores.
Such an overall network architecture can be seen in the figure above. Particularly, the claim indicates several required layers and a general configuration of the layers, but does not explicitly claim connections between nodes themselves. Importantly, the claim contemplates multiple LSTM blocks that can be arranged in series or parallel before sending the output to the CTC layers. These additional implementations would not be covered by the claim language of the tight structure, as above.
Claiming Strategies: Dependent Limitation
Another technique with which I’ve had success applies when the neural network itself is immaterial to the novel aspect of the invention. For example, maybe there’s a particular sensor or hardware configuration that uses a neural network as one component but does not derive its novelty from the neural network per se. In such cases, I’ve found that it suffices to merely reference a broad machine learning algorithm in an independent claim and specify that it is an artificial neural network in a dependent claim. An example of such a claiming practice is illustrated in Microsoft’s issued patent below (Patent No. US 7,499,588):
1. An optical character recognition system that facilitates text recognition on a low resolution image, comprising:
at least one processor that executes:
- a layout analysis component that determines a set of text lines in the low resolution image, the layout analysis component further segments each text line in the set of lines into individual text words, wherein the layout analysis employs at least two linear filters at each location of the low resolution image;
- a character recognition component that segments the individual text words into one or more character portions and provides an observation on the most probable character for each of the one or more character portions; and
- a word recognizer that employs dynamic programming mechanisms to ascertain words based upon a series of observations from the character recognition component.
11. The system of claim 1, the character recognition component employs a convolutional neural network to provide the observation.
Learning about these neural networks and prosecuting their related applications has been challenging and fun. I hope the techniques above are of some use to those patenting algorithms and drafting neural network applications.
For a solid non-expert education in machine learning, I highly recommend the Coursera course taught by Andrew Ng, an adjunct professor at Stanford who was formerly head of Baidu’s AI Group. I found the instruction and material very easy to follow. What I learned provided much insight into these algorithms.