Custom AI classification in practice – webinar Q&A follow-up
During our recent Custom AI Classifier webinar with Global Prior Art, we had an incredibly engaged audience who asked a wide range of insightful questions throughout the session. While we managed to address many questions during the webinar, we weren’t able to get to them all. To ensure everyone has access to the information, we’ve compiled a comprehensive Q&A with detailed answers to the remaining questions. A big thank you to everyone who joined and contributed to making the session so interactive!

Q: How does IPRally's AI classifier approach differ from official classification systems?
A: The similarities end with the name. IPC/CPC classifications are assigned by human examiners at patent offices and are subject to human error. They may also not align with how you define a specific technical field.
IPRally's AI classifier offers a fully customizable approach. It allows you to train a classifier using your own historical classification data, enabling the system to view the technology space from your perspective. With this flexibility, you can apply your own taxonomy, independently of official classification systems, language or technical jargon. Powered by IPRally's Graph AI, the classifier focuses on the technology, its elements, and their relationships, ensuring it aligns with the patterns in your training data. Ultimately, you define the classes that matter most to you.
Q: What are best practices for training data?
A: To train a classifier, you must specify the types of patents you want it to recognize and those that do not match your class. This requires a curated collection of patents, where relevant documents are tagged to represent the desired technology, while negative examples help define what falls outside the scope.
A strong training set consists of representative documents that closely match the target technology, with minimal outliers. This helps the classifier assign technology tags systematically and accurately within the specified area.
The more data that is fed into the classifier, the better it becomes at predicting tags for new patents. As a general guideline, a good training set should include at least 200 documents, with each class containing between 200 and 1,000 samples. However, the ideal size may vary depending on the provided material.
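As a purely illustrative sketch (the tag names, counts, and check below are hypothetical and not part of IPRally's product or API), here is how you might sanity-check a curated training set against those rule-of-thumb sizes before training:

```python
# Hypothetical sanity check of a curated training set against the
# rule-of-thumb sizes mentioned above (200-1,000 samples per class).
# The tags and counts are invented for illustration; this is not IPRally code.

training_set = {
    "optical sensors": {"positive": 240, "negative": 310},
    "haptic feedback": {"positive": 150, "negative": 420},
}

for tag, counts in training_set.items():
    for label, n in counts.items():
        if not 200 <= n <= 1000:
            print(f"'{tag}': {n} {label} samples; aim for 200-1,000 per class.")
```

A quick check like this makes it obvious which classes still need more curated examples before you invest time in training.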
Once the training set is established and the classifier is created, you can still make adjustments. You can modify the classification sensitivity or add and remove documents from the training set as needed. Additionally, you can enable automatic re-training, which updates the classifier each time you confirm or reject predicted tags. If this is done within a search case, the documents are automatically added to the collection.
Q: How is the classifier’s efficiency measured?
A: After training, IPRally allows you to adjust the classifier’s sensitivity, determining how generous or strict it should be in assigning tags. This lets users tailor the classifier’s behavior to the specific needs of the project, whether they want a more inclusive classifier or one that predicts more confidently but less broadly.
The F1 score – the harmonic mean of Precision and Recall, both explained below – measures the classifier’s performance. A low F1 score suggests random or unreliable predictions, while a high score indicates greater accuracy.
You can improve the classifier by refining the training set – adding more relevant documents and removing outliers that might introduce noise or cause incorrect predictions.
Precision and Recall evaluate how well the classifier performs on the training set. Precision indicates the proportion of predicted tags that are actually correct. Recall measures the proportion of actual tags that the classifier successfully identifies based on the current settings.
Increasing classification sensitivity improves the chances of retrieving relevant results, but it may also introduce more noise. Finding the right balance is key to optimizing performance.
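To make these metrics concrete, here is a minimal, self-contained sketch (with made-up labels and confidence scores; this is not IPRally code, and the threshold values are arbitrary) showing how Precision, Recall, and F1 are computed for one tag, and how raising the sensitivity threshold trades Recall for Precision:

```python
# Illustrative computation of Precision, Recall, and F1 for a single tag.
# Labels and scores are invented for the example; not IPRally output.

def metrics(labels, scores, threshold):
    """Precision, Recall, and F1 when the tag is assigned at score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))      # correct tags
    fp = sum(p and not l for p, l in zip(predicted, labels))  # noise
    fn = sum(l and not p for p, l in zip(predicted, labels))  # missed tags
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

labels = [True, True, True, False, False, False]  # ground truth: does the tag apply?
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]           # classifier confidence

for t in (0.35, 0.65):  # a generous vs. a strict sensitivity setting
    p, r, f1 = metrics(labels, scores, t)
    print(f"threshold={t}: Precision={p:.2f} Recall={r:.2f} F1={f1:.2f}")
```

At the generous threshold the classifier catches every relevant document but admits noise (Recall 1.00, Precision 0.75); at the strict one the predictions are cleaner but one relevant document is missed (Precision 1.00, Recall 0.67). That is exactly the balance the sensitivity setting lets you tune.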
Q: Does the AI classifier analyze specific sections of a patent, such as claims, or the entire document?
A: You can classify documents based on either the full specification or the claims only, depending on your requirements. If you choose claims only, the classifier focuses on matching document features specifically mentioned in the claims rather than considering the entire document content.
Q: What challenges arise when training an AI classifier in emerging technology areas with limited prior art?
A: Training an AI classifier in a domain with limited prior art presents challenges due to the lack of sufficient examples for training. AI classifiers rely on patterns within the data, so without a substantial number of relevant patents, they may struggle to accurately predict the right tags. In these cases, manual tagging and expert input become essential to label the initial dataset.
For technology areas with minimal prior art, it's advised to combine expert insights with any available data and manually label at least 10 positive and 10 negative examples for each tag. This is crucial for training a classifier that delivers meaningful results, as the classifier needs to learn from real, human-labeled examples.
Q: What distinguishes a database lookup from a manual review in patent analysis?
A: A database lookup offers an automated, broad view of a patent space by retrieving patent counts based on search criteria. However, it often doesn't thoroughly assess the relevance of each result, potentially leading to inaccuracies. In contrast, a manual review is a more comprehensive process, where each patent is examined to ensure its relevance and accuracy. This step ensures that only the most pertinent patents are selected for further analysis.
While database lookups can give a quick overview, manual review ensures precision by applying subject-matter expertise to each document, making it especially important in large or complex patent landscapes. For instance, when using tools like IPRally's AI classifier, a manual review can be incorporated to refine the search and verify predicted results from the AI classifier (with tags being confirmed or rejected). This human oversight ensures a high-quality dataset for further research.
Q: Can non-patent literature (NPL) be utilized in training the AI classifier?
A: No, NPL documents are not used directly to train the classifier, but they can still inform its learning process. For example, NPL can help build knowledge graphs that represent technological relationships, and the information captured in such graphs can then be fed to the classifier as training material. This allows the AI to better understand how different patents relate to one another and to the broader technological landscape, improving its accuracy and predictive capabilities.
Q: How prevalent are trade secrets in the semiconductor industry, and how do they impact patent landscape analyses?
A: In industries like semiconductors, trade secrets are often used as a critical component of an organization's IP strategy. Trade secrets are kept confidential and are not disclosed in the same way as patents. Therefore, trade secrets cannot be directly analyzed or included in a patent landscape, making it difficult to get a full picture of a company’s innovations.
However, organizations can leverage their internal knowledge of trade secrets to inform their patent and technology research. IPRally’s AI classifier can help analyze available patent data, but human insight and expertise are crucial in adding context where trade secrets might be influencing the market or specific technological advancements.
Q: In what scenarios might AI classifiers face difficulties?
A: AI classifiers may struggle in low-volume prior art areas or highly specialized technologies with few publicly available patents. The classifier’s performance will depend heavily on the quality and quantity of labeled training data. In such cases, it may be beneficial to combine multiple classifiers to cover different technological areas and cross-check the results.
For example, if a technology has a smaller set of patents, the classifier may not have enough data to reliably predict accurate tags, especially if the technology is niche or emerging. It's essential to continuously update and retrain the classifier as more data becomes available to improve its performance.
Q: Can the system classify pharmaceutical patent applications?
A: Yes, the IPRally system can classify pharmaceutical patents based on specific technology tags like compound patents, synthesis, crystal forms, and more. However, the accuracy of classification depends on the quality and representativeness of the training dataset. For optimal results, it is advised to provide a well-curated set of patents that cover the different aspects of pharmaceutical technologies, as this will allow the classifier to categorize new applications more effectively.
Q: How do you ensure accurate assignment of patents to applicants, especially for large corporations with multiple subsidiaries like Samsung?
A: For large corporations, especially those with multiple subsidiaries, accurate patent assignment requires meticulous tracking of patent ownership. At IPRally, we use corporate trees and detailed background research to map out the relationships between a company and its various subsidiaries or related entities. This ensures that the IP assets are attributed correctly and that no relevant patents are overlooked. The AI classifier can assist by flagging relevant patents, but human oversight remains critical for accurate attribution.
Q: Where can I find the webinar recording?
A: You can find the on-demand webinar here.
Q: Can I try IPRally?
A: Yes, we’re happy to offer you 3 days of free access to IPRally, including the AI classifier. No costs, no commitments – just the opportunity to see how it can change the way you search and classify.