Claude OCR Parser (Sonnet PDF Analyzer)
Functions
count_tokens_from_pdf_request(pdf_base64)
Counts tokens used for processing the provided PDF data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pdf_base64
|
str
|
Base64-encoded PDF data. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
int |
int
|
Number of input tokens used. |
Example
tokens = count_tokens_from_pdf_request(pdf_base64_data)
print(f"Input tokens: {tokens}")
estimate_price(input_tokens, output_tokens, model_name)
Estimates the API call cost based on the input and output tokens for the specified model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input_tokens
|
int
|
Number of input tokens. |
required |
output_tokens
|
int
|
Number of output tokens. |
required |
model_name
|
str
|
Name of the model used for the request. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
Estimated cost of the API call. |
Example
cost = estimate_price(1000, 500, "claude-3-5-sonnet")
print(f"Estimated cost: ${cost:.2f}")
extract_data_from_pdf(pdf_path, schema='invoice', prompt=None, model_name=MODEL_NAME, max_tokens=2048, verbose=False)
Extract structured information from a PDF file using Claude OCR API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pdf_path
|
str
|
Path to the PDF file. |
required |
schema
|
str
|
Extraction model. Should be in config.ALLOWED_SCHEMA. Default is "invoice". |
'invoice'
|
prompt
|
Optional[str]
|
Custom prompt for extraction. If None, a default prompt is used. |
None
|
model_name
|
str
|
The model name to use for extraction. Default is Claude Sonnet. |
MODEL_NAME
|
max_tokens
|
int
|
Maximum tokens for the output. Default is 2048. |
2048
|
verbose
|
bool
|
If True, prints detailed prompt information. Default is False. |
False
|
Returns:
| Name | Type | Description |
|---|---|---|
BaseModel |
Structured response containing extracted information according to the specified schema. |
Example
result = extract_data_from_pdf(
pdf_path="invoice.pdf",
mode="custom-fedex-shipment",
verbose=True
)
print(result.model_dump_json(indent=2))