Structured Data

Find structure in data

From the chatlas docs:

  1. Article summaries: Extract key points from lengthy reports or articles to create concise summaries for decision-makers.
  2. Entity recognition: Identify and extract entities such as names, dates, and locations from unstructured text to create structured datasets.
  3. Sentiment analysis: Extract sentiment scores and associated entities from customer reviews or social media posts to gain insights into public opinion.
  4. Classification: Classify text into predefined categories, such as spam detection or topic classification.
  5. Image/PDF input: Extract data from images or PDFs, such as tables or forms, to automate data entry processes.

Components

  • chatlas: Chat.extract_data() method
  • pydantic: data model from BaseModel, with optional Field descriptions

Simple example

import chatlas as ctl
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

chat = ctl.ChatOpenAI()
chat.extract_data(
    "My name is Susan and I'm 13 years old",
    data_model=Person,
)

output:

{"name": "Susan", "age": 13}

Add descriptions

  • Field(): add a description to each field in the model
  • Type hint with | None: lets a field come back as None when the text does not contain that value

import chatlas as ctl
from pydantic import BaseModel, Field


class Person(BaseModel):
    """A person"""

    name: str = Field(description="Name")
    age: int = Field(description="Age, in years")
    hobbies: list[str] | None = Field(
        description="List of hobbies. Should be exclusive and brief."
    )

chat = ctl.ChatAnthropic()  # same call as before, with Anthropic as the provider instead of OpenAI
chat.extract_data(
    "My name is Susan and I'm 13 years old",
    data_model=Person,
)

Article summary

Demo Chatlas docs: https://posit-dev.github.io/chatlas/structured-data/article-summary.html

Entity recognition

  • If you want a pandas DataFrame as the output:
    • You need to define a row-wise spec of the data (one model per row)
    • Then define a second model that holds a list of those rows

Demo Chatlas docs: https://posit-dev.github.io/chatlas/structured-data/entity-recognition.html
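
The row-wise spec plus list-of-rows pattern can be sketched as follows. The model names, field names, and the `data` dict are hypothetical, chosen for illustration; the dict stands in for what `chat.extract_data(text, data_model=NamedEntities)` might return.

```python
from pydantic import BaseModel, Field


class NamedEntity(BaseModel):
    """One row of the output table (hypothetical field names)."""

    name: str = Field(description="The extracted entity name")
    kind: str = Field(description="The entity type, e.g. person or location")


class NamedEntities(BaseModel):
    """Wrapper model that holds the list of rows."""

    entities: list[NamedEntity] = Field(description="All entities found in the text")


# Hypothetical result dict (assumption, not real LLM output).
data = {
    "entities": [
        {"name": "Susan", "kind": "person"},
        {"name": "New York", "kind": "location"},
    ]
}

# Validate the nested structure, then flatten it to one dict per row.
result = NamedEntities.model_validate(data)
rows = [entity.model_dump() for entity in result.entities]
print(rows)
```

Passing `rows` to `pandas.DataFrame(rows)` then gives one row per entity, with `name` and `kind` as columns.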

Sentiment analysis

Demo Chatlas docs: https://posit-dev.github.io/chatlas/structured-data/sentiment-analysis.html
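
A minimal sketch of what a sentiment data model could look like, assuming three numeric scores. The field names and the `data` dict are hypothetical and only illustrate the shape of the result; they are not taken from a real model response.

```python
from pydantic import BaseModel, Field


class Sentiment(BaseModel):
    """Sentiment scores for a piece of text (hypothetical spec)."""

    positive_score: float = Field(description="Positive sentiment, from 0.0 to 1.0")
    negative_score: float = Field(description="Negative sentiment, from 0.0 to 1.0")
    neutral_score: float = Field(description="Neutral sentiment, from 0.0 to 1.0")


# Hypothetical result dict (assumption, not real LLM output).
data = {"positive_score": 0.7, "negative_score": 0.1, "neutral_score": 0.2}

# Validate the scores, then pick the field with the highest value.
scores = Sentiment.model_validate(data).model_dump()
dominant = max(scores, key=scores.get)
print(dominant)
```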

Classification

Demo Chatlas docs: https://posit-dev.github.io/chatlas/structured-data/classification.html
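
One way to pin the output to predefined categories is a `Literal` type hint, sketched below. The labels, field names, and input dicts are hypothetical; whether a given provider enforces the allowed values on its side is an assumption, but pydantic rejects anything outside the list when the result is validated.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Classification(BaseModel):
    """A spam classification result (hypothetical spec)."""

    label: Literal["spam", "not_spam"] = Field(description="The predicted category")
    confidence: float = Field(description="Confidence in the label, from 0.0 to 1.0")


# A label from the predefined set validates cleanly.
ok = Classification.model_validate({"label": "spam", "confidence": 0.92})
print(ok.label)

# A label outside the predefined set raises a ValidationError.
try:
    Classification.model_validate({"label": "maybe", "confidence": 0.5})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```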

Multi-modal input

  • Images
  • PDFs

Demo Chatlas docs: https://posit-dev.github.io/chatlas/structured-data/multi-modal.html

Multi-modal input: Images

Demo: https://github.com/chendaniely/nydsaic2025-llm/blob/main/code/04-structured/02-image.py

Multi-modal input: PDF

Demo: https://github.com/chendaniely/nydsaic2025-llm/blob/main/code/04-structured/03-pdf.py