Introduction
Imagine asking your database to find "sentences about staying healthy on flights" and getting results that understand you're talking about hydration, movement, and energy management—without explicitly mentioning those exact words. This is the power of semantic search with vector embeddings, and Oracle 26ai makes it surprisingly straightforward to implement.
In this tutorial, I'll show you how to build a complete semantic search system using Oracle 26ai's AI Vector Search capabilities, integrated with Claude Desktop through the Model Context Protocol (MCP). We'll generate random sentences, create vector embeddings, and perform intelligent similarity searches that go far beyond simple keyword matching.
What We'll Build
By the end of this tutorial, we'll have:
• A working Oracle 26ai database connection via SQLcl MCP
• An ONNX embedding model loaded and ready to use
• A table with 20 sentences across four domains: Travel, Technology, Health, and Countries
• Vector embeddings for semantic search
• The ability to find similar sentences using natural language queries
Prerequisites
• Oracle 26ai database (Free tier works great!)
• Claude Desktop app
• SQLcl with MCP support
• The all-MiniLM-L12-v2 ONNX model in your DATA_PUMP_DIR (Free tier does not support DBMS_CLOUD package; therefore, I uploaded the model to the database directory)
Step 1: Configure MCP for SQLcl
First, we need to connect Claude Desktop to the Oracle Database using the MCP SQLcl server. This allows Claude to directly interact with your database through natural language.
MCP Configuration
Add this configuration to your Claude Desktop config file:
Location:
Windows: %APPDATA%\Claude\claude_desktop_config.json
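The original post doesn't show the file contents, so here is a minimal sketch of what the entry typically looks like. It assumes a recent SQLcl release that supports the `-mcp` startup flag; adjust the command to the full path of your SQLcl binary if it isn't on your PATH:

```json
{
  "mcpServers": {
    "sqlcl": {
      "command": "sql",
      "args": ["-mcp"]
    }
  }
}
```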
Important Notes:
- Ensure SQLcl is in your system PATH
- Restart Claude Desktop after configuration
Creating a Named Connection
Before using MCP, create a saved connection in SQLcl:
sql /nolog
SQL> conn username@connection_string
SQL> save connection 26AIFreePDB-Dinusha
Once configured, Claude can connect to your database using: "connect to 26AIFreePDB-Dinusha"
Step 2: The Complete Project Requirements
Here's exactly what we'll build:
Project Goal:
Create 20 random sentences in the Oracle 26ai database and create an embedding for each record. Use the database connection – 26AIFreePDB-Dinusha.
A) Load the ONNX model ALL_MINILM_L12_V2 from DATA_PUMP_DIR directory.
B) Create table sentences_20 (id, sentence, embedding vector). Generate sentences with meaning related to Travel, Technology, Health, and Countries.
C) Insert 20 random sentences (generated via PL/SQL word bank + DBMS_RANDOM)
D) Update embeddings for all rows using VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING sentence AS data)
Find similar sentences to the following:
1. While backpacking through Portugal, I took the early train to Porto and kept my itinerary flexible so I could chase sunsets along the coast.
2. To stay healthy on long flights, I drink extra water, walk the aisle every hour, and pack snacks that won't spike my energy.
3. A vector database paired with an AI model can turn messy text into embeddings that make search feel instant and surprisingly accurate.
4. Japan and Canada both balance modern cities with vast nature, but the local customs and food make each trip feel completely different.
Step 3: Implementation
The requirements above are the prompt I gave Claude in the chat. Below is the sequence of actions Claude carried out via MCP.
A) Load the ONNX Model
This loads the pre-trained all-MiniLM-L12-v2 model, which generates 384-dimensional vector embeddings optimized for semantic similarity.
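Claude generated the load call through MCP; a sketch of the equivalent manual step is shown below. The file name `all_MiniLM_L12_v2.onnx` is an assumption — use whatever name your uploaded model file has in DATA_PUMP_DIR:

```sql
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'DATA_PUMP_DIR',
    file_name  => 'all_MiniLM_L12_v2.onnx',  -- assumed file name; match yours
    model_name => 'ALL_MINILM_L12_V2'
  );
END;
/
```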
B) Create the Table
The VECTOR datatype is new in Oracle 26ai and specifically designed for storing high-dimensional embeddings.
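A DDL sketch consistent with the requirements in Step 2; the identity column is my assumption, chosen so the insert in step C can supply only the sentence:

```sql
CREATE TABLE sentences_20 (
  id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  sentence  VARCHAR2(4000),
  embedding VECTOR(384, FLOAT32)  -- all-MiniLM-L12-v2 produces 384 dimensions
);
```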
C) Generate Random Sentences with PL/SQL
Here's where it gets interesting. The model used PL/SQL word banks to generate meaningful sentences:
Sample code is:
DECLARE
  TYPE t_strings IS TABLE OF VARCHAR2(200);

  -- Travel vocabulary
  travel_subjects t_strings := t_strings(
    'Backpackers', 'Solo travelers', 'Digital nomads', 'Adventure seekers'
  );
  travel_verbs t_strings := t_strings(
    'explore', 'discover', 'navigate', 'photograph', 'experience'
  );
  travel_objects t_strings := t_strings(
    'ancient temples', 'coastal villages', 'mountain ranges', 'bustling markets'
  );

  -- Technology vocabulary
  tech_subjects t_strings := t_strings(
    'Machine learning models', 'Neural networks', 'Vector databases'
  );
  tech_verbs t_strings := t_strings(
    'accelerate', 'optimize', 'process', 'generate', 'transform'
  );
  tech_objects t_strings := t_strings(
    'semantic embeddings', 'natural language queries', 'real-time data'
  );

  -- Health vocabulary
  health_subjects t_strings := t_strings(
    'Regular exercise', 'Balanced nutrition', 'Quality sleep'
  );
  health_verbs t_strings := t_strings(
    'improves', 'strengthens', 'reduces', 'enhances', 'supports'
  );
  health_objects t_strings := t_strings(
    'cardiovascular health', 'immune function', 'mental clarity'
  );

  -- Countries
  countries t_strings := t_strings(
    'Japan', 'Canada', 'Portugal', 'Iceland', 'Norway', 'Thailand'
  );

  sentence_templates t_strings := t_strings(
    '{travel_subject} {travel_verb} {travel_object} in {country}.',
    '{tech_subject} {tech_verb} {tech_object} to enable faster results.',
    '{health_subject} {health_verb} {health_object} when practiced consistently.'
  );

  v_sentence VARCHAR2(4000);

  -- Pick a random element: DBMS_RANDOM.VALUE(1, n + 1) returns [1, n + 1),
  -- so TRUNC yields a valid collection index between 1 and n
  FUNCTION pick(arr t_strings) RETURN VARCHAR2 IS
  BEGIN
    RETURN arr(TRUNC(DBMS_RANDOM.VALUE(1, arr.COUNT + 1)));
  END;

  -- Choose a template, then fill in every placeholder it contains;
  -- REPLACE is a no-op for placeholders absent from the chosen template
  FUNCTION generate_sentence RETURN VARCHAR2 IS
    template VARCHAR2(500);
    result   VARCHAR2(4000);
  BEGIN
    template := pick(sentence_templates);
    result   := template;
    result := REPLACE(result, '{travel_subject}', pick(travel_subjects));
    result := REPLACE(result, '{travel_verb}',    pick(travel_verbs));
    result := REPLACE(result, '{travel_object}',  pick(travel_objects));
    result := REPLACE(result, '{tech_subject}',   pick(tech_subjects));
    result := REPLACE(result, '{tech_verb}',      pick(tech_verbs));
    result := REPLACE(result, '{tech_object}',    pick(tech_objects));
    result := REPLACE(result, '{health_subject}', pick(health_subjects));
    result := REPLACE(result, '{health_verb}',    pick(health_verbs));
    result := REPLACE(result, '{health_object}',  pick(health_objects));
    result := REPLACE(result, '{country}',        pick(countries));
    RETURN result;
  END;
BEGIN
  FOR i IN 1..20 LOOP
    v_sentence := generate_sentence();
    INSERT INTO sentences_20 (sentence) VALUES (v_sentence);
  END LOOP;
  COMMIT;
  DBMS_OUTPUT.PUT_LINE('✓ Inserted 20 random sentences');
END;
/
This approach creates grammatically correct sentences with semantic meaning across four domains.
D) Generate Vector Embeddings
The VECTOR_EMBEDDING() function uses our loaded ONNX model to generate embeddings for each sentence. The ONNX model converts each sentence into a numeric vector where meaning is encoded in position and direction. Similar sentences land closer together in vector space, enabling semantic search.
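Based on the requirement in step D, the update is a single statement that embeds every row in place:

```sql
UPDATE sentences_20
SET    embedding = VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING sentence AS data);
COMMIT;
```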
Step 4: Performing Semantic Searches
Now comes the exciting part—searching by meaning rather than keywords. Here's how Claude searched for sentences similar to our four queries:
Query 1: Backpacking through Portugal
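Following the same pattern as the other three queries, the search for Query 1 looks like this:

```sql
SELECT ROUND(VECTOR_DISTANCE(
         embedding,
         VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'While backpacking through Portugal, I took the early train to Porto and kept my itinerary flexible so I could chase sunsets along the coast.' AS data),
         COSINE
       ), 4) AS distance,
       id,
       sentence
FROM   sentences_20
ORDER  BY distance
FETCH FIRST 3 ROWS ONLY;
```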
Query 2: Staying healthy on long flights
SELECT ROUND(VECTOR_DISTANCE(
         embedding,
         VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'To stay healthy on long flights, I drink extra water, walk the aisle every hour, and pack snacks that won''t spike my energy.' AS data),
         COSINE
       ), 4) AS distance,
       id,
       sentence
FROM   sentences_20
ORDER  BY distance
FETCH FIRST 3 ROWS ONLY;
Response
"DISTANCE","ID","SENTENCE"
0,5346,3,"To boosts metabolic rate, experts recommend Regular exercise along with regular checkups."
0,6426,5,"Hydration combined with proper rest can significantly boosts overall muscle recovery."
0,6452,6,"Balanced nutrition regulates cardiovascular health when practiced consistently over time."
Query 3: Vector database with AI model
SELECT ROUND(VECTOR_DISTANCE(
         embedding,
         VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'A vector database paired with an AI model can turn messy text into embeddings that make search feel instant and surprisingly accurate.' AS data),
         COSINE
       ), 4) AS distance,
       id,
       sentence
FROM   sentences_20
ORDER  BY distance
FETCH FIRST 3 ROWS ONLY;
Response
"DISTANCE","ID","SENTENCE"
0,4169,14,"AI algorithms use semantic embeddings to compress information in milliseconds."
0,4492,20,"AI algorithms analyze similarity scores to enable faster and more accurate results."
0,4666,18,"Neural networks use unstructured text to index information in milliseconds."
Query 4: Japan and Canada comparison
SELECT ROUND(VECTOR_DISTANCE(
         embedding,
         VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'Japan and Canada both balance modern cities with vast nature, but the local customs and food make each trip feel completely different.' AS data),
         COSINE
       ), 4) AS distance,
       id,
       sentence
FROM   sentences_20
ORDER  BY distance
FETCH FIRST 3 ROWS ONLY;
Response
"DISTANCE","ID","SENTENCE"
0,5771,9,"Cruise passengers sail along historic districts in Japan while they seek authentic experiences."
0,5858,7,"Both Thailand and Thailand provide mountain ranges but differ in cultural traditions."
0,5873,19,"Budget tourists sail along local festivals in Canada because they seek authentic experiences."
Similarity Search Results
Key Insights
1. Technology Domain Excels: Tech queries achieved the lowest distances (0.41-0.47), likely due to:
• Specialized technical vocabulary
• Precise conceptual overlap (embeddings, AI, search)
• Less ambiguous semantic space
2. Consistent Health Matching: Health queries (0.53-0.65) successfully identified:
• Proactive health strategies
• Expert recommendations
• Specific health practices (hydration, nutrition, exercise)
3. Travel Theme Recognition: Travel queries (0.54-0.63) matched on:
• Independent exploration mindset
• Transportation methods
• Authentic experience seeking
• Geographic elements
4. Country + Culture Linking: Country comparison queries (0.57-0.59) found:
• Exact country name matches (Japan, Canada)
• Comparative structures
• Cultural tradition themes
• Local experience focus
What Makes This Powerful?
Traditional keyword search would have completely missed these connections.
Consider:
• The exact wording of the Portugal query doesn't appear in any stored sentence, yet we found relevant travel experiences
• "Staying healthy on flights" matched hydration and exercise without mentioning "flights"
• The vector database query found results that understand the underlying concepts without exact word matches
This is semantic search in action—understanding meaning, not just matching strings.
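To see the contrast directly, compare a keyword query with its semantic equivalent. Assuming the corpus above, the LIKE query likely returns nothing because no stored sentence contains the word "flights", while the vector query surfaces the hydration and exercise sentences by meaning:

```sql
-- Keyword search: matches only the literal string
SELECT id, sentence
FROM   sentences_20
WHERE  LOWER(sentence) LIKE '%flights%';   -- likely 0 rows for this corpus

-- Semantic search: ranks rows by conceptual closeness to the query
SELECT id, sentence
FROM   sentences_20
ORDER  BY VECTOR_DISTANCE(
            embedding,
            VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'staying healthy on flights' AS data),
            COSINE)
FETCH FIRST 3 ROWS ONLY;
```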
MCP: The Secret Sauce
What makes this tutorial unique is the use of the Model Context Protocol (MCP). Instead of writing SQL scripts and running them manually, I simply told Claude:
"Create 20 random sentences in the Oracle 26ai database and create an embedding for each record..."
Claude, through MCP:
1. Connected to my Oracle database
2. Generated and executed the PL/SQL code
3. Loaded the ONNX model
4. Created the table
5. Generated random sentences
6. Created embeddings
7. Ran all four similarity searches
8. Analyzed and explained the results
This is the future of database development—conversational, intelligent, and remarkably efficient.
Conclusion
Oracle 26ai's AI Vector Search, combined with the power of MCP and Claude, makes building intelligent semantic search systems accessible to everyone. The combination of:
- Built-in vector datatypes
- ONNX model support
- Efficient similarity search
- Natural language development through MCP
...creates a development experience that feels like science fiction but is available today.
The randomly generated sentences with structured word banks created a diverse corpus that demonstrated Oracle 26ai's vector search capabilities across multiple domains. Even with template-generated content, vector embeddings identified meaningful semantic relationships based on conceptual overlap, structural similarity, and domain-specific vocabulary patterns.
Key Takeaway: Semantic search isn't just about finding documents—it's about understanding intent, context, and meaning. And with Oracle 26ai + MCP, it's never been easier to build.