utils
Utility functions.
load_json(path)
save_json(path, data, indent=2)
load_jsonl(path)
Load a JSONL file as a list of dictionaries.
Source code in autochecklist/utils/io.py
save_jsonl(path, data)
Save a list of dictionaries to a JSONL file.
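The JSONL convention these two helpers rely on (one JSON object per line) can be illustrated with the standard library alone. This is a minimal sketch, not the code in `autochecklist/utils/io.py`: the names `write_jsonl_sketch` and `read_jsonl_sketch` are hypothetical, and UTF-8 encoding and blank-line skipping are assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_jsonl_sketch(path: str, data: List[Dict[str, Any]]) -> None:
    """Write one JSON object per line (hypothetical stand-in for save_jsonl)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in data:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl_sketch(path: str) -> List[Dict[str, Any]]:
    """Read a JSONL file into a list of dicts (hypothetical stand-in for load_jsonl)."""
    with open(path, "r", encoding="utf-8") as f:
        # Assumption: blank lines are skipped rather than treated as errors.
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"id": 1, "question": "Is the answer polite?"},
    {"id": 2, "question": "Does it cite sources?"},
]

# Round-trip the records through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "items.jsonl")
    write_jsonl_sketch(target, records)
    loaded = read_jsonl_sketch(target)
```

After the round trip, `loaded` holds the same list of dictionaries that was written.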
parse_checklist_items(response, max_items=None)
Parse checklist items from LLM response.
Supports multiple formats:
- [[item]] bracketed format (TICK, checklist-effectiveness-study)
- Numbered list: "1. Question?"
- Bulleted list: "- Question?"
- RLCF weighted format: "- Question? (weight)"
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| response | str | Raw LLM response text | required |
| max_items | int | Maximum items to return | None |
Returns:
| Type | Description |
|---|---|
| List[ChecklistItem] | List of ChecklistItem objects |
Source code in autochecklist/utils/parsing.py
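To make the four supported formats concrete, the following is a rough sketch of how such a parser might recognize them. It is not the library's implementation: `parse_items_sketch` is a hypothetical name, it returns plain strings rather than `ChecklistItem` objects, it discards the RLCF weight instead of storing it, and the regular expressions (including the assumption that a weight is a trailing number in parentheses) are illustrative guesses.

```python
import re
from typing import List, Optional

# [[item]] bracketed format.
BRACKETED = re.compile(r"\[\[(.+?)\]\]")
# Numbered ("1. ...") or bulleted ("- ...") list lines.
LIST_LINE = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.+?)\s*$")
# Assumption: an RLCF weight is a trailing number in parentheses.
TRAILING_WEIGHT = re.compile(r"\s*\((\d+(?:\.\d+)?)\)\s*$")


def parse_items_sketch(response: str, max_items: Optional[int] = None) -> List[str]:
    """Extract checklist questions as plain strings (hypothetical helper)."""
    bracketed = [m.strip() for m in BRACKETED.findall(response)]
    if bracketed:
        # Prefer the bracketed format whenever it is present.
        items = bracketed
    else:
        items = []
        for line in response.splitlines():
            match = LIST_LINE.match(line)
            if match:
                # Drop a trailing "(weight)" if one is present.
                items.append(TRAILING_WEIGHT.sub("", match.group(1)))
    return items if max_items is None else items[:max_items]


bracket_items = parse_items_sketch("[[Is it polite?]] [[Is it short?]]")
numbered_items = parse_items_sketch("1. Is it polite?\n2. Is it short?")
weighted_items = parse_items_sketch("- Is it concise? (80)\n- Is it kind? (20)", max_items=1)
```

Here `max_items=1` truncates the weighted list to its first question, mirroring the documented `max_items` parameter.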
get_embeddings(texts, model='text-embedding-3-large', api_key=None)
Get embeddings for a list of texts.
Uses OpenAI embeddings via direct API call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | List[str] | List of text strings to embed | required |
| model | str | Embedding model to use (default: text-embedding-3-large) | 'text-embedding-3-large' |
| api_key | Optional[str] | OpenAI API key (uses OPENAI_API_KEY env var if not provided) | None |
Returns:
| Type | Description |
|---|---|
| ndarray | numpy array of shape (len(texts), embedding_dim) |
Source code in autochecklist/utils/embeddings.py
cosine_similarity(embeddings)
Compute pairwise cosine similarity matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | ndarray | numpy array of shape (n, embedding_dim) | required |
Returns:
| Type | Description |
|---|---|
| ndarray | numpy array of shape (n, n) with cosine similarities |
Source code in autochecklist/utils/embeddings.py
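Pairwise cosine similarity is the dot product of two vectors divided by the product of their norms. One common way to compute the full (n, n) matrix is to normalize each row and multiply the result by its own transpose. The sketch below shows that approach with NumPy; `cosine_similarity_sketch` is a hypothetical name, the small norm floor is an assumption to avoid division by zero, and the library's own code may differ.

```python
import numpy as np


def cosine_similarity_sketch(embeddings: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of pairwise cosine similarities (hypothetical helper)."""
    # Row-wise L2 norms, kept as a column so division broadcasts per row.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Assumption: clamp tiny norms so all-zero rows do not divide by zero.
    unit = embeddings / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities.
    return unit @ unit.T


vectors = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
sims = cosine_similarity_sketch(vectors)
```

In this example the first two vectors are orthogonal, so their similarity is 0, while the third lies at 45 degrees to each of them. The matrix is symmetric, with ones on the diagonal.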
find_similar_pairs(similarity_matrix, threshold=0.85)
Find pairs of items with similarity above threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| similarity_matrix | ndarray | Pairwise similarity matrix | required |
| threshold | float | Minimum similarity to include | 0.85 |
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, float]] | List of (i, j, similarity) tuples for similar pairs |
Source code in autochecklist/utils/embeddings.py
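A thresholded pair search over a symmetric similarity matrix can be sketched by scanning only the upper triangle. The helper below is hypothetical and makes three assumptions the documentation does not confirm: each pair is reported once with i < j, self-pairs on the diagonal are excluded, and the comparison is inclusive (`>=`).

```python
from typing import List, Tuple

import numpy as np


def find_similar_pairs_sketch(
    similarity_matrix: np.ndarray, threshold: float = 0.85
) -> List[Tuple[int, int, float]]:
    """List (i, j, similarity) for i < j at or above the threshold (hypothetical helper)."""
    # k=1 skips the diagonal, so an item is never paired with itself.
    rows, cols = np.triu_indices(similarity_matrix.shape[0], k=1)
    values = similarity_matrix[rows, cols]
    # Assumption: inclusive comparison against the threshold.
    keep = values >= threshold
    return [
        (int(i), int(j), float(s))
        for i, j, s in zip(rows[keep], cols[keep], values[keep])
    ]


matrix = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.86],
    [0.2, 0.86, 1.0],
])
pairs = find_similar_pairs_sketch(matrix, threshold=0.85)
```

With the default threshold, items 0 and 1 and items 1 and 2 are reported, while the pair (0, 2) at 0.2 is dropped.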
build_similarity_graph(questions, threshold=0.85, embeddings=None, api_key=None)
Build similarity graph from questions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| questions | List[str] | List of question strings | required |
| threshold | float | Similarity threshold for edges | 0.85 |
| embeddings | Optional[ndarray] | Pre-computed embeddings (optional) | None |
| api_key | Optional[str] | OpenAI API key for embeddings | None |
Returns:
| Type | Description |
|---|---|
| Tuple[ndarray, ndarray, List[Tuple[int, int, float]]] | Tuple of (embeddings, similarity_matrix, similar_pairs) |
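The shape of the returned tuple can be illustrated with a self-contained sketch that covers only the pre-computed-embeddings path. `build_similarity_graph_sketch` is a hypothetical name. Unlike the documented function, it requires `embeddings` so that no API call is made, and the inclusive threshold and i < j pair ordering are assumptions.

```python
from typing import List, Tuple

import numpy as np


def build_similarity_graph_sketch(
    questions: List[str], embeddings: np.ndarray, threshold: float = 0.85
) -> Tuple[np.ndarray, np.ndarray, List[Tuple[int, int, float]]]:
    """Return (embeddings, similarity_matrix, similar_pairs) (hypothetical helper)."""
    if embeddings.shape[0] != len(questions):
        raise ValueError("expected one embedding row per question")
    # Normalize rows, then take dot products to get cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarity_matrix = unit @ unit.T
    # Edges: pairs i < j whose similarity reaches the threshold (assumed inclusive).
    n = len(questions)
    similar_pairs = [
        (i, j, float(similarity_matrix[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if similarity_matrix[i, j] >= threshold
    ]
    return embeddings, similarity_matrix, similar_pairs


questions = ["Is it polite?", "Is the tone courteous?", "Does it cite sources?"]
# Toy 2-D vectors standing in for real embeddings.
toy_embeddings = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
emb, sim, edges = build_similarity_graph_sketch(questions, toy_embeddings)
```

In this toy data only the first two questions point in nearly the same direction, so `edges` contains a single entry linking indices 0 and 1.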