Document Parser API
Functions
document_parser(request)
HTTP entry point for document parsing.
For internal use only
curl -X POST "https://document-parser-41698605870.europe-west1.run.app" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR_TOKEN_HERE>" \
-d '{"pdfurl": "<PDF_URL>", "schema": "invoice", "key": "<staff_key>", "additional_fields": {"added_field1": "str"}}'
For public commercial use
curl -X POST "https://document-parser-41698605870.europe-west1.run.app" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR_TOKEN_HERE>" \
-d '{"bucket": "<BUCKET_NAME>", "object": "document_id/filename.pdf", "schema": "invoice", "additional_fields": {"added_field1": "str"}}'
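The two curl calls above differ only in how the document is located: pdfurl plus key for internal use, bucket plus object for public commercial use. The following is a minimal Python sketch of the same requests using only the standard library. The endpoint and JSON field names come from the curl examples; the helper build_parser_request, its parameters, and the choice to send a None schema as JSON null are assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl examples above.
ENDPOINT = "https://document-parser-41698605870.europe-west1.run.app"


def build_parser_request(token, locator, schema="invoice", additional_fields=None):
    """Build (but do not send) a POST request for the document parser.

    `locator` holds the fields that identify the document:
    {"pdfurl": ..., "key": ...} for internal use, or
    {"bucket": ..., "object": ...} for public commercial use.
    This helper and its parameter names are hypothetical.
    """
    payload = dict(locator)
    # Assumption: a None schema is transmitted as JSON null.
    payload["schema"] = schema
    if additional_fields:
        payload["additional_fields"] = additional_fields
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Public commercial use: the document is addressed by bucket and object name.
request = build_parser_request(
    token="<YOUR_TOKEN_HERE>",
    locator={"bucket": "<BUCKET_NAME>", "object": "document_id/filename.pdf"},
    additional_fields={"added_field1": "str"},
)
# To actually call the service: urllib.request.urlopen(request)
```

Keeping the locator as a separate dictionary lets the same helper serve both call styles without branching on the mode.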
The allowed structured output schemas are:
"invoice": For invoice key information extraction.
None: To let the LLM produce structured JSON output with unconstrained, generated keys.
Additional fields can be supplied through additional_fields to extend an existing structured output schema.
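As a rough illustration of these rules, the sketch below checks the schema and additional_fields options on the client side before a request is sent. The helper validate_options is hypothetical, and the assumption that each additional field maps a field name to a type-name string is inferred only from the "added_field1": "str" example; the service itself may accept other forms.

```python
# Schema values listed in this document.
ALLOWED_SCHEMAS = {"invoice", None}


def validate_options(schema, additional_fields=None):
    """Raise ValueError if the options do not match the documented values.

    Hypothetical client-side check; the service may apply different rules.
    """
    if schema not in ALLOWED_SCHEMAS:
        raise ValueError(f"unsupported schema: {schema!r}")
    for name, type_name in (additional_fields or {}).items():
        # Assumption: field names and type names are both plain strings.
        if not isinstance(name, str) or not isinstance(type_name, str):
            raise ValueError(
                "additional_fields must map field names to type-name strings"
            )
    return {"schema": schema, "additional_fields": additional_fields or {}}


options = validate_options("invoice", {"added_field1": "str"})
```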
EXAMPLE OUTPUT
Invoice case
{
'all_info': <KEY/VALUE JSON DICT WITH UNCONSTRAINED KEYS>,
'document_info': {
'currency': 'USD',
'due_date': 'Jul 05, 2023',
'invoice_date': 'Jun 20, 2023',
'invoice_id': '2-170-32026',
'issuer_address': null,
'issuer_name': 'FedEx',
'issuer_siren': null,
'issuer_tax_id': null,
'line_items': [
{
'description': 'FedEx Express Services',
'net_amount': 5304.48,
'quantity': 1,
'ref': null,
'tax_amount': 0,
'tax_rate': 0,
'total_amount': 5304.48,
'unit': null,
'unit_price': 5304.48
}
],
'receiver_address': '900 AMERICAN ROAD UNIT 4\nMORRIS PLAINS NJ 07950',
'receiver_name': 'PDP COURIER',
'receiver_siren': null,
'receiver_tax_id': null,
'total_amount': 5304.48,
'total_net_amount': 5304.48,
'total_tax_amount': 0,
'added_field1': 'blabla'
},
'markdown': <markdown contained in document>,
'bucket': <bucket_name_of_the_document>,
'object': <object_name_in_the_gcp_bucket>,
'cost': <llm_cumulative_estimate_cost>
}
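Because the values are extracted by an LLM, it can be useful to sanity-check a response before relying on it. The sketch below is one possible check, not part of the API: it verifies that the line items in document_info add up to the invoice-level totals. The helper check_invoice_totals and its tolerance parameter are hypothetical; the field names and sample values come from the example output above.

```python
import math


def check_invoice_totals(document_info, tolerance=0.01):
    """Return True if the line items add up to the invoice-level totals.

    Missing or null amounts are treated as 0.
    """
    items = document_info.get("line_items") or []
    net = sum(item.get("net_amount") or 0 for item in items)
    tax = sum(item.get("tax_amount") or 0 for item in items)
    total = sum(item.get("total_amount") or 0 for item in items)
    return (
        math.isclose(net, document_info.get("total_net_amount") or 0, abs_tol=tolerance)
        and math.isclose(tax, document_info.get("total_tax_amount") or 0, abs_tol=tolerance)
        and math.isclose(total, document_info.get("total_amount") or 0, abs_tol=tolerance)
    )


# Values copied from the example output above (other fields omitted).
document_info = {
    "line_items": [
        {"net_amount": 5304.48, "tax_amount": 0, "total_amount": 5304.48},
    ],
    "total_amount": 5304.48,
    "total_net_amount": 5304.48,
    "total_tax_amount": 0,
}
print(check_invoice_totals(document_info))  # prints True
```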