Dattaraj Rao, Persistent Systems
In recent years, knowledge graphs have become an important tool for organizing and accessing large volumes of enterprise data in diverse industries — from healthcare to industrial, to banking and insurance, to retail and more.
A knowledge graph is a graph-based database that represents knowledge in a structured and semantically rich format. It can be generated by extracting entities and relationships from structured or unstructured data, such as text from documents. A key requirement for maintaining data quality in a knowledge graph is to base it on a standard ontology. Adopting a standardized ontology, however, often carries the cost of incorporating that ontology into the software development cycle.
Organizations can take a systematic approach to generating a knowledge graph by first ingesting a standard ontology (like insurance risk) and using a large language model (LLM) like GPT-3 to create a script to generate and populate a graph database.
The second step is to use an LLM as an intermediate layer to take natural language text inputs and create queries on the graph to return knowledge. The creation and search queries can be customized to the platform in which the graph is stored — such as Neo4j, AWS Neptune or Azure Cosmos DB.
Combining ontology and natural language techniques
The approach outlined here combines ontology-driven and natural language-driven techniques to build a knowledge graph that can be easily queried and updated without extensive engineering efforts to build bespoke software. Below we provide an example of an insurance company, but the approach is universal.
The insurance industry faces many challenges, including the need to manage large amounts of data efficiently and effectively. Knowledge graphs provide a way to organize and access this data in a structured and semantically rich format. A graph consists of nodes, edges and properties: nodes represent entities, edges represent relationships between entities, and properties represent attributes of entities and relationships.
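As a rough illustration of the node, edge and property model described above, here is a minimal in-memory sketch in Python. The class names, the `COVERS` relationship type and the sample values are assumptions made for illustration; they are not taken from any particular ontology or graph product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                                       # entity type, e.g. "Policy"
    properties: dict = field(default_factory=dict)   # attributes of the entity


@dataclass
class Edge:
    source: str                                      # id of the start node
    target: str                                      # id of the end node
    type: str                                        # relationship type
    properties: dict = field(default_factory=dict)   # attributes of the relationship


# Hypothetical sample data: one policy that covers one risk.
policy = Node("P-1", "Policy", {"holder": "Acme Corp"})
risk = Node("R-1", "Risk", {"rating": "high"})
covers = Edge(policy.id, risk.id, "COVERS", {"since": 2023})

nodes = {n.id: n for n in (policy, risk)}
edges = [covers]


def neighbors(node_id: str) -> list:
    """Return the nodes reachable from node_id over one outgoing edge."""
    return [nodes[e.target] for e in edges if e.source == node_id]


print([n.label for n in neighbors("P-1")])  # prints ['Risk']
```

A graph database stores the same three kinds of information, but adds indexing, persistence and a query language on top.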
There are several benefits to using a knowledge graph in the insurance industry. First, it provides a way to organize and access data that is easy to query and update. Second, it provides a way to represent knowledge in a structured and semantically rich format, which makes it easier to analyze and interpret. Finally, it provides a way to integrate data from different sources, including structured and unstructured data.
Below is a four-step approach. Let’s review each step in detail.
Step 1: Studying the ontology and identifying entities and relations
The first step in generating a knowledge graph is to study the relevant ontology and identify the entities and relationships that matter for the domain. An ontology is a formal representation of the knowledge in a domain, including the concepts, relations and constraints that define the domain. The insurance risk ontology, for example, defines the concepts and relationships that are relevant to the insurance domain, such as policy, risk and premium.
The ontology can be studied using various techniques including manual inspection and automated methods. Manual inspection involves reading the ontology documentation and identifying the relevant entities and relationships. Automated methods use natural language processing (NLP) techniques to extract the entities and relationships from the ontology documentation.
Once the relevant entities and relationships have been identified, they can be organized into a schema for the knowledge graph. The schema defines the structure of the graph, including the types of nodes and edges that will be used to represent the entities and relationships.
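One way to picture such a schema is as a small, declarative data structure. The sketch below is an assumption about how it might be written down in Python; the relationship names (`COVERS`, `PRICED_BY`, `CHARGED_ON`) and the property names are hypothetical and would come from the ontology in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    label: str               # entity type from the ontology
    key: str                 # property that identifies a node uniquely
    properties: tuple = ()   # other expected properties


@dataclass(frozen=True)
class EdgeType:
    type: str                # relationship type
    source: str              # label of the start node
    target: str              # label of the end node


# Hypothetical schema for the policy / risk / premium example.
SCHEMA = {
    "nodes": [
        NodeType("Policy", "id", ("holder",)),
        NodeType("Risk", "id", ("rating",)),
        NodeType("Premium", "id", ("amount",)),
    ],
    "edges": [
        EdgeType("COVERS", "Policy", "Risk"),
        EdgeType("PRICED_BY", "Risk", "Premium"),
        EdgeType("CHARGED_ON", "Premium", "Policy"),
    ],
}


def undeclared_edge_types(schema: dict) -> list:
    """Return edge types whose endpoints refer to labels missing from the schema."""
    labels = {n.label for n in schema["nodes"]}
    return [e.type for e in schema["edges"]
            if e.source not in labels or e.target not in labels]


print(undeclared_edge_types(SCHEMA))  # prints []
```

Keeping the schema in one place like this makes it easy to reuse, both when writing the prompt and when checking what the LLM returns.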
Step 2: Building a text prompt for the LLM to generate a schema and database for the ontology
The second step in generating a knowledge graph involves building a text prompt for the LLM to generate a schema and database for the ontology. The text prompt is a natural language description of the ontology and the desired schema and database structure. It serves as input to the LLM, which generates the Cypher query for creating and populating the graph database.
The text prompt should include a description of the ontology, the entities and relationships that were identified in step 1, and the desired schema and database structure. The description should be in natural language and should be easy for the LLM to understand. The text prompt should also include any constraints or requirements for the schema and database, such as data types, unique keys and foreign keys.
For example, a text prompt for the insurance risk ontology might look like this:
“Create a graph database for the insurance risk ontology. Each policy should have a unique ID and should be associated with one or more risks. Each risk should have a unique ID and should be associated with one or more premiums. Each premium should have a unique ID and should be associated with one or more policies and risks. The database should also include constraints to ensure data integrity, such as unique keys and foreign keys.”
Once the text prompt is ready, it can be used as input to the LLM to generate the Cypher query for creating and populating the graph database.
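A prompt like the one above can also be assembled from the entities and relationships identified in step 1, which keeps it in sync with the schema. The following is a minimal sketch under that assumption; the function name, its parameters and the exact wording of the generated sentences are illustrative only, and the call to the LLM itself is left out because it depends on the client library in use.

```python
# Entities mapped to their unique key property, and the associations between them.
ENTITIES = {"Policy": "id", "Risk": "id", "Premium": "id"}
RELATIONS = [("Policy", "Risk"), ("Risk", "Premium"), ("Premium", "Policy")]


def build_prompt(ontology_name: str, entities: dict, relations: list,
                 target: str = "Neo4j Cypher") -> str:
    """Assemble a natural language prompt describing the desired graph database."""
    lines = [
        f"Create a graph database for the {ontology_name} ontology.",
        f"Return only {target} statements.",
    ]
    for label, key in entities.items():
        lines.append(f"Each {label} node must have a unique '{key}' property.")
    for src, dst in relations:
        lines.append(f"Each {src} should be associated with one or more {dst} nodes.")
    lines.append("Include uniqueness constraints to ensure data integrity.")
    return "\n".join(lines)


prompt = build_prompt("insurance risk", ENTITIES, RELATIONS)
print(prompt)
```

Naming the target platform in the prompt is one simple way to steer the LLM toward the right query dialect when the graph is stored somewhere other than Neo4j.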
Step 3: Creating the query to generate data
The third step in generating a knowledge graph involves creating the Cypher query to generate data for the graph database. The query is generated using the text prompt that was created in step 2 and is used to create and populate the graph database with relevant data.
Cypher is a declarative query language used to create and query graph databases. It includes commands to create nodes and the relationships (edges) between them, as well as commands to query the data in the graph.
The text prompt created in step 2 serves as input to the LLM, which generates the Cypher query based on the desired schema and database structure. The LLM uses NLP techniques to understand the text prompt and generate the query.
The query should include commands to create nodes for each entity in the ontology and edges to represent the relationships between them. For example, in the insurance risk ontology, the query might include commands to create nodes for policies, risks and premiums, and edges to represent the relationships between them.
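The query should also include constraints to ensure data integrity, such as uniqueness constraints on IDs. (In a graph database, the role a foreign key plays in a relational schema is filled by an explicit relationship between nodes.) This will help to ensure that the data in the graph is consistent and accurate.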
Once the query is generated, it can be executed to create and populate the graph database with relevant data.
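To give a sense of what the generated script might contain, the sketch below builds the three kinds of statements by hand in Python. It assumes Neo4j 5-style Cypher syntax (`CREATE CONSTRAINT ... REQUIRE ... IS UNIQUE`); the helper names, constraint names and the `COVERS` relationship type are hypothetical, and an LLM's actual output will vary.

```python
def constraint_stmt(label: str, key: str) -> str:
    """Uniqueness constraint on a node key (Neo4j 5-style syntax assumed)."""
    name = f"{label.lower()}_{key}_unique"
    return (f"CREATE CONSTRAINT {name} IF NOT EXISTS "
            f"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE")


def node_stmt(label: str, key: str) -> str:
    """Create the node if it is missing, then set its remaining properties."""
    return f"MERGE (n:{label} {{{key}: ${key}}}) SET n += $props"


def edge_stmt(src_label: str, rel_type: str, dst_label: str, key: str = "id") -> str:
    """Link two existing nodes; the relationship takes the place of a foreign key."""
    return (f"MATCH (a:{src_label} {{{key}: $src}}), (b:{dst_label} {{{key}: $dst}}) "
            f"MERGE (a)-[:{rel_type}]->(b)")


script = [
    constraint_stmt("Policy", "id"),
    constraint_stmt("Risk", "id"),
    node_stmt("Policy", "id"),
    node_stmt("Risk", "id"),
    edge_stmt("Policy", "COVERS", "Risk"),
]
print(";\n".join(script) + ";")
```

Using `MERGE` rather than `CREATE` is a design choice here: it lets the same script be run more than once without duplicating nodes or relationships.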
Step 4: Ingesting the query and creating a knowledge graph
The final step in generating a knowledge graph involves ingesting the Cypher query and creating a graph database. The query is generated using the text prompt created in step 2 and executed to create and populate the graph database with relevant data.
The database can then be used to query the data and extract knowledge. The graph database is created using a graph database management system (DBMS) like Neo4j. The Cypher query generated in step 3 is ingested into the DBMS, which creates the nodes and edges in the graph database.
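The ingestion step can be sketched as a small loop that splits the generated script into statements and hands each one to the database. In the sketch below the `run` argument is an assumption: it stands in for whatever executes a single statement, for example the `run` method of a session from the official Neo4j Python driver. Here it is replaced by a list's `append` so the example runs without a database. The sample script and its `COVERS` relationship are hypothetical.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Naive split on ';' (assumes no semicolons inside string literals)."""
    return [s.strip() for s in script.split(";") if s.strip()]


def ingest(script: str, run: Callable[[str], object]) -> int:
    """Execute each statement in order and return how many were run."""
    count = 0
    for statement in split_statements(script):
        run(statement)
        count += 1
    return count


# Hypothetical script of the kind an LLM might return in step 3.
SCRIPT = """
CREATE CONSTRAINT policy_id_unique IF NOT EXISTS FOR (n:Policy) REQUIRE n.id IS UNIQUE;
MERGE (p:Policy {id: 'P-1'});
MERGE (r:Risk {id: 'R-1', rating: 'high'});
MATCH (p:Policy {id: 'P-1'}), (r:Risk {id: 'R-1'}) MERGE (p)-[:COVERS]->(r);
"""

executed: List[str] = []
total = ingest(SCRIPT, executed.append)  # swap in a real session's run method
print(total)  # prints 4
```

Running the constraint statements first, as the script order does here, means later `MERGE` statements are checked against them.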
Once the database is created, it can be queried using Cypher commands to extract knowledge. The LLM can also be used as an intermediate layer to take natural language text inputs and create Cypher queries on the graph to return knowledge. For example, a user might input a question like “Which policies have a high-risk rating?” and the LLM can generate a Cypher query to extract the relevant data from the graph.
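The intermediate layer described above can be sketched as a function that wraps the user's question in a prompt containing the graph schema and passes it to an LLM. In this sketch the `llm` argument is a hypothetical callable that takes a prompt and returns text; `fake_llm` is a stand-in so the example runs on its own. The schema summary and the keyword-based read-only check are assumptions added for illustration, and the check is deliberately crude.

```python
import re
from typing import Callable

# Hypothetical one-line summary of the graph schema given to the LLM.
SCHEMA_TEXT = "(:Policy {id})-[:COVERS]->(:Risk {id, rating})"

# Crude guard: reject anything that looks like a write operation.
WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def question_to_cypher(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a read-only Cypher query that answers the question."""
    prompt = (
        "You translate questions into Cypher for this graph schema:\n"
        f"{SCHEMA_TEXT}\n"
        "Return a single read-only Cypher query and nothing else.\n"
        f"Question: {question}"
    )
    query = llm(prompt).strip()
    if WRITE_KEYWORDS.search(query):
        raise ValueError("expected a read-only query")
    return query


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned answer."""
    return "MATCH (p:Policy)-[:COVERS]->(r:Risk {rating: 'high'}) RETURN p.id"


print(question_to_cypher("Which policies have a high-risk rating?", fake_llm))
```

Including the schema in the prompt gives the LLM the labels and relationship types it needs, and checking the result before execution keeps a user's question from modifying the graph.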
The knowledge graph can also be updated as new data becomes available. The Cypher query can be modified to include new nodes and edges, and the updated query can be ingested into the graph database to add the new data.
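For updates, one common pattern is a parameterized batch upsert, where new records are passed as a list and merged into the graph in a single statement. The sketch below builds such a query and its parameters in Python; the function name and the sample records are hypothetical, and the label is interpolated into the query text, so it should come from the trusted schema rather than from user input.

```python
from typing import List, Tuple


def upsert_batch(label: str, rows: List[dict], key: str = "id") -> Tuple[str, dict]:
    """Build a Cypher upsert for a batch of records plus its parameters."""
    query = (
        f"UNWIND $rows AS row "
        f"MERGE (n:{label} {{{key}: row.{key}}}) "
        f"SET n += row.props"
    )
    params = {
        "rows": [
            {key: r[key], "props": {k: v for k, v in r.items() if k != key}}
            for r in rows
        ]
    }
    return query, params


# Hypothetical new data arriving after the initial load.
new_policies = [
    {"id": "P-2", "holder": "Globex"},
    {"id": "P-3", "holder": "Initech"},
]
query, params = upsert_batch("Policy", new_policies)
print(query)
```

Because `MERGE` matches on the key before creating anything, re-sending a record that already exists updates its properties instead of adding a duplicate node.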
Advantages of this approach
Ingesting a standard ontology such as the insurance risk ontology provides a framework for standardizing how knowledge is represented in the graph. This keeps the graph semantically consistent, makes it easier to integrate data from multiple sources, and ensures that the data is comparable and meaningful.
Using GPT-3 to generate Cypher queries for creating and populating the graph database is an efficient way to automate the process. This reduces the time and resources required to build the graph and, when the prompt spells out the schema and constraints, helps produce queries that are syntactically and semantically consistent with the ontology.
Using an LLM as an intermediate layer to take natural language text inputs and create Cypher queries on the graph to return knowledge makes querying the graph more intuitive and user-friendly. This reduces the need for users to have a deep understanding of the graph structure and query language.
Traditionally, developing a knowledge graph has involved custom software development, which can be time-consuming and expensive. With this approach, organizations can leverage existing ontologies and NLP tools to generate the query, reducing the need for custom software development.
Another advantage of this approach is the ability to update the knowledge graph as new data becomes available. The Cypher query can be modified to include new nodes and edges, and the updated query can be ingested into the graph database to add the new data. This makes it easier to maintain the knowledge graph and ensure that it remains up-to-date and relevant.
Dattaraj Rao is chief data scientist at Persistent.