Document Parser API

Functions

document_parser(request)

HTTP entry point for document parsing.

For internal use only

curl -X POST "https://document-parser-41698605870.europe-west1.run.app" \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer <YOUR_TOKEN_HERE>" \
 -d '{"pdfurl": "<PDF_URL>", "schema": "invoice", "key": "<staff_key>", "additional_fields": {"added_field1": "str"}}'

For public commercial use

curl -X POST "https://document-parser-41698605870.europe-west1.run.app" \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer <YOUR_TOKEN_HERE>" \
 -d '{"bucket": "<BUCKET_NAME>", "object": "document_id/filename.pdf", "schema": "invoice", "additional_fields": {"added_field1": "str"}}'

Allowed values for "schema", which selects the structured JSON output, are:
  • "invoice": Extracts key information from an invoice.
  • None: Lets the LLM produce structured JSON with unconstrained, generated keys.

Additional fields can be supplied through "additional_fields" to extend an existing structured output schema.
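
The public request above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library. The helper names build_payload and build_request are hypothetical and not part of this API, and the sketch only mirrors the headers and body fields shown in the curl command.

```python
import json
import urllib.request

ENDPOINT = "https://document-parser-41698605870.europe-west1.run.app"


def build_payload(bucket, object_name, schema="invoice", additional_fields=None):
    """Assemble the JSON body for the bucket/object request variant.

    Hypothetical helper: it reproduces the fields of the curl example only.
    """
    payload = {"bucket": bucket, "object": object_name, "schema": schema}
    if additional_fields:
        # Extra fields to add to the chosen structured output schema.
        payload["additional_fields"] = additional_fields
    return payload


def build_request(payload, token):
    """Wrap the payload in a POST request with the same headers as the curl call."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


payload = build_payload(
    "<BUCKET_NAME>",
    "document_id/filename.pdf",
    additional_fields={"added_field1": "str"},
)
request = build_request(payload, "<YOUR_TOKEN_HERE>")
# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Building the request separately from sending it keeps the body easy to inspect before any network call is made.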

EXAMPLE OUTPUT

Invoice case
    {
        'all_info': <KEY/VALUE JSON DICT WITH UNCONSTRAINED KEYS>,
        'document_info': {
            'currency': "USD",
            'due_date': 'Jul 05, 2023',
            'invoice_date': 'Jun 20, 2023',
            'invoice_id': "2-170-32026",
            'issuer_address': null,
            'issuer_name': 'FedEx',
            'issuer_siren': null,
            'issuer_tax_id': null,
            'line_items': [
                {
                    'description': 'FedEx Express Services',
                    'net_amount': 5304.48,
                    'quantity': 1,
                    'ref': null,
                    'tax_amount': 0,
                    'tax_rate': 0,
                    'total_amount': 5304.48,
                    'unit': null,
                    'unit_price': 5304.48
                }
            ],
            'receiver_address': '900 AMERICAN ROAD UNIT 4\nMORRIS PLAINS NJ 07950',
            'receiver_name': 'PDP COURIER',
            'receiver_siren': null,
            'receiver_tax_id': null,
            'total_amount': 5304.48,
            'total_net_amount': 5304.48,
            'total_tax_amount': 0,
            'added_field1': 'blabla'
        },
        'markdown': <markdown contained in document>,
        'bucket': <bucket_name_of_the_document>,
        'object': <object_name_in_the_gcp_bucket>,
        'cost': <llm_cumulative_estimate_cost>
    }
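
Once the response has been decoded into a Python dict, the fields under 'document_info' can be read directly. The sketch below shows one possible sanity check a client might run: comparing the sum of the line item totals with the invoice total. The function name line_items_match_total and the tolerance value are assumptions made for illustration, and the sample data is copied from the invoice example above.

```python
import math


def line_items_match_total(document_info, tolerance=0.01):
    """Return True if the line item totals add up to the invoice total.

    Hypothetical client-side check; null (None) amounts are treated as 0.
    """
    items = document_info.get("line_items") or []
    items_sum = sum(item.get("total_amount") or 0 for item in items)
    total = document_info.get("total_amount")
    if total is None:
        return False
    return math.isclose(items_sum, total, abs_tol=tolerance)


# Subset of the 'document_info' fields from the invoice example.
sample_document_info = {
    "currency": "USD",
    "line_items": [
        {"description": "FedEx Express Services", "total_amount": 5304.48},
    ],
    "total_amount": 5304.48,
}

is_consistent = line_items_match_total(sample_document_info)
```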