UnstructuredPDFLoader
Overview
Unstructured supports a common interface for working with unstructured or semi-structured file formats, such as Markdown or PDF. LangChain's UnstructuredPDFLoader integrates with Unstructured to parse PDF documents into LangChain Document objects.
Please see this page for more information on installing system requirements.
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
UnstructuredPDFLoader | langchain_community | ✅ | ❌ | ✅ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
UnstructuredPDFLoader | ✅ | ❌ |
Setup
Credentials
No credentials are needed to use this loader.
To enable automated tracing of your model calls, set your LangSmith API key:
# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
Install langchain_community and unstructured.
%pip install -qU langchain-community unstructured
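To confirm that the install step took effect, a quick check along these lines may help. It is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of LangChain or Unstructured.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as used in the pip command above.
    found = installed_versions(["langchain-community", "unstructured"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry suggests that the corresponding `%pip install` did not run in the active kernel or environment.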
Initialization
Now we can initialize our loader:
from langchain_community.document_loaders import UnstructuredPDFLoader
file_path = "./example_data/layout-parser-paper.pdf"
loader = UnstructuredPDFLoader(file_path)
API Reference: UnstructuredPDFLoader
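Once the loader is initialized, the documents can be read with its `load()` method. The following is a hedged sketch rather than a definitive recipe: the wrapper `load_pdf_documents` is a hypothetical helper that checks the path before constructing the loader, and it assumes the example PDF from the snippet above is present and that the PDF dependencies for `unstructured` are installed.

```python
from pathlib import Path


def load_pdf_documents(file_path):
    """Load a PDF with UnstructuredPDFLoader after checking that the file exists.

    Hypothetical helper for illustration only.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")

    # Imported lazily so the path check runs even if the packages are missing.
    from langchain_community.document_loaders import UnstructuredPDFLoader

    loader = UnstructuredPDFLoader(str(path))
    # load() returns a list of LangChain Document objects.
    return loader.load()


if __name__ == "__main__":
    docs = load_pdf_documents("./example_data/layout-parser-paper.pdf")
    print(f"Loaded {len(docs)} document(s)")
    if docs:
        # Each Document carries text in page_content and details in metadata.
        print(docs[0].metadata)
        print(docs[0].page_content[:200])
```

Checking the path first turns a missing file into an immediate `FileNotFoundError` instead of an error raised later from inside the parsing stack.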