embeddings
Text embedding utilities for checklist refinement.
Uses OpenAI's text-embedding-3-large model for computing semantic similarity between checklist questions.
get_embeddings(texts, model='text-embedding-3-large', api_key=None)
Get embeddings for a list of texts.
Uses OpenAI embeddings via direct API call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `List[str]` | List of text strings to embed | required |
| `model` | `str` | Embedding model to use (default: text-embedding-3-large) | `'text-embedding-3-large'` |
| `api_key` | `Optional[str]` | OpenAI API key (uses OPENAI_API_KEY env var if not provided) | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | numpy array of shape (len(texts), embedding_dim) |
Source code in autochecklist/utils/embeddings.py
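The source of `get_embeddings` is not reproduced on this page, so the following is only a rough sketch of what a direct call to the OpenAI embeddings endpoint could look like. The names `get_embeddings_sketch`, `parse_embeddings_response`, and `OPENAI_EMBEDDINGS_URL` are hypothetical, and using `urllib` from the standard library is an assumption; the actual implementation may use a different HTTP client, batching, or error handling.

```python
import json
import os
import urllib.request
from typing import List, Optional

import numpy as np

# Public OpenAI REST endpoint for embeddings.
OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def parse_embeddings_response(payload: dict) -> np.ndarray:
    """Hypothetical helper: response body -> array of shape (len(texts), embedding_dim)."""
    # Each item carries an "index" matching its position in the request input,
    # so sorting by it restores the original text order.
    items = sorted(payload["data"], key=lambda item: item["index"])
    return np.array([item["embedding"] for item in items], dtype=float)


def get_embeddings_sketch(
    texts: List[str],
    model: str = "text-embedding-3-large",
    api_key: Optional[str] = None,
) -> np.ndarray:
    """Illustrative stand-in for get_embeddings, not the library's code."""
    # Fall back to the OPENAI_API_KEY environment variable, as documented above.
    api_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("No API key: pass api_key or set OPENAI_API_KEY")
    request = urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return parse_embeddings_response(json.load(response))
```

Splitting the response parsing into its own function keeps the network call thin and lets the array-building step be checked offline with a hand-written payload.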
cosine_similarity(embeddings)
Compute pairwise cosine similarity matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `embeddings` | `ndarray` | numpy array of shape (n, embedding_dim) | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | numpy array of shape (n, n) with cosine similarities |
Source code in autochecklist/utils/embeddings.py
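As an illustration of the computation described here, pairwise cosine similarity can be obtained by normalizing each row to unit length and taking the matrix product with its transpose. The function name `cosine_similarity_sketch` and the zero-norm guard are assumptions made for this sketch; the library's own function may handle edge cases differently.

```python
import numpy as np


def cosine_similarity_sketch(embeddings: np.ndarray) -> np.ndarray:
    """Illustrative pairwise cosine similarity for an (n, embedding_dim) array."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Row-wise L2 norms, kept as a column so broadcasting divides each row.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Assumed guard: clip tiny norms to avoid dividing by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities; result is (n, n).
    return unit @ unit.T


vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
similarity = cosine_similarity_sketch(vectors)
```

For these toy vectors the diagonal is 1, the first two rows are orthogonal (similarity 0), and each of them has similarity of about 0.707 with the third.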
find_similar_pairs(similarity_matrix, threshold=0.85)
Find pairs of items with similarity above threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `similarity_matrix` | `ndarray` | Pairwise similarity matrix | required |
| `threshold` | `float` | Minimum similarity to include | `0.85` |
Returns:
| Type | Description |
|---|---|
| `List[Tuple[int, int, float]]` | List of (i, j, similarity) tuples for similar pairs |
Source code in autochecklist/utils/embeddings.py
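One plausible way to extract the pairs is to scan only the upper triangle of the matrix, so each unordered pair appears once and self-similarity on the diagonal is skipped. This sketch makes two assumptions the page does not confirm: that pairs are reported with `i < j`, and that the comparison is inclusive (`>=`). The name `find_similar_pairs_sketch` is hypothetical.

```python
from typing import List, Tuple

import numpy as np


def find_similar_pairs_sketch(
    similarity_matrix: np.ndarray, threshold: float = 0.85
) -> List[Tuple[int, int, float]]:
    """Illustrative pair extraction; ordering and >= comparison are assumptions."""
    n = similarity_matrix.shape[0]
    # k=1 excludes the diagonal, so only index pairs with i < j are visited.
    rows, cols = np.triu_indices(n, k=1)
    pairs = []
    for i, j in zip(rows, cols):
        similarity = float(similarity_matrix[i, j])
        if similarity >= threshold:
            pairs.append((int(i), int(j), similarity))
    return pairs


matrix = np.array(
    [
        [1.0, 0.9, 0.2],
        [0.9, 1.0, 0.86],
        [0.2, 0.86, 1.0],
    ]
)
pairs = find_similar_pairs_sketch(matrix, threshold=0.85)
# pairs == [(0, 1, 0.9), (1, 2, 0.86)]
```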
build_similarity_graph(questions, threshold=0.85, embeddings=None, api_key=None)
Build similarity graph from questions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `questions` | `List[str]` | List of question strings | required |
| `threshold` | `float` | Similarity threshold for edges | `0.85` |
| `embeddings` | `Optional[ndarray]` | Pre-computed embeddings (optional) | `None` |
| `api_key` | `Optional[str]` | OpenAI API key for embeddings | `None` |
Returns:
| Type | Description |
|---|---|
| `Tuple[ndarray, ndarray, List[Tuple[int, int, float]]]` | Tuple of (embeddings, similarity_matrix, similar_pairs) |
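The return value suggests that this function chains the three steps documented on this page: embed the questions (unless embeddings are supplied), compute the similarity matrix, and collect the pairs that meet the threshold. The sketch below illustrates that flow offline with pre-computed toy embeddings. The name `build_similarity_graph_sketch` and the `embed_fn` parameter are hypothetical stand-ins (the real function takes `api_key` and presumably calls `get_embeddings`), and the inclusive `>=` comparison is an assumption.

```python
from typing import Callable, List, Optional, Tuple

import numpy as np


def build_similarity_graph_sketch(
    questions: List[str],
    threshold: float = 0.85,
    embeddings: Optional[np.ndarray] = None,
    embed_fn: Optional[Callable[[List[str]], np.ndarray]] = None,
) -> Tuple[np.ndarray, np.ndarray, List[Tuple[int, int, float]]]:
    """Illustrative pipeline; embed_fn is a hypothetical stand-in for the API call."""
    if embeddings is None:
        if embed_fn is None:
            raise ValueError("Provide embeddings or an embed_fn")
        embeddings = embed_fn(questions)
    embeddings = np.asarray(embeddings, dtype=float)

    # Cosine similarity: normalize rows, then take all pairwise dot products.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarity_matrix = unit @ unit.T

    # Edges: index pairs with i < j whose similarity meets the threshold.
    rows, cols = np.triu_indices(len(questions), k=1)
    similar_pairs = [
        (int(i), int(j), float(similarity_matrix[i, j]))
        for i, j in zip(rows, cols)
        if similarity_matrix[i, j] >= threshold
    ]
    return embeddings, similarity_matrix, similar_pairs


questions = [
    "Does the response cite its sources?",
    "Are sources cited in the response?",
    "Is the response under 200 words?",
]
# Toy 2-D vectors standing in for real embeddings: the first two point the same way.
toy_embeddings = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
emb, sim, edges = build_similarity_graph_sketch(questions, embeddings=toy_embeddings)
```

With these toy vectors only the first two questions form an edge, which is the kind of near-duplicate pair a checklist refinement step would then consider merging.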