### Do you need to file an issue?
### Describe the issue
My 'documents.parquet' does not include a 'metadata' column.
The documentation describes specifying "metadata" in the input section, and says this can be used for JSON and CSV input:
https://microsoft.github.io/graphrag/index/inputs/#metadata
However, the graphrag-input config does not include a 'metadata' field:
https://github.com/microsoft/graphrag/blob/v3.1.0/packages/graphrag-input/graphrag_input/input_config.py
Passing the list of metadata fields to the chunking config instead works "ok": the metadata fields are prepended to the text field of each chunk.
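To double-check the claim that the input config does not declare a 'metadata' field, a small helper along these lines could be used. This is only a sketch: `declares_field` is a hypothetical helper name, and it assumes the config class is either a Pydantic v2 model (which exposes `model_fields`), a dataclass, or a plain annotated class. Which of these `InputConfig` actually is in 3.1.0 is not confirmed here.

```python
import dataclasses


def declares_field(config_cls, field_name):
    """Return True if config_cls declares field_name as a field.

    Hypothetical helper, not part of graphrag.
    """
    # Pydantic v2 models expose their declared fields as a dict on the class.
    model_fields = getattr(config_cls, "model_fields", None)
    if isinstance(model_fields, dict):
        return field_name in model_fields

    # Standard-library dataclasses list their fields via dataclasses.fields().
    if dataclasses.is_dataclass(config_cls):
        return field_name in {f.name for f in dataclasses.fields(config_cls)}

    # Fallback: look for an annotation anywhere in the class hierarchy.
    return any(
        field_name in (getattr(base, "__annotations__", None) or {})
        for base in config_cls.__mro__
    )
```

Calling `declares_field(InputConfig, "metadata")` with the `InputConfig` class used in the config below would then show whether the field is declared on the installed version.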
### Logs and screenshots
No metadata in the docs parquet

Metadata listed in the text column for the chunk

### Steps to reproduce
1. Using the config classes, specify metadata on the input config.
2. Run the graphrag pipeline.
3. Review the documents parquet for a metadata column.
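The final check can be scripted. The sketch below assumes pandas with a parquet engine (pyarrow or fastparquet) is installed; `missing_columns` and `check_documents_parquet` are hypothetical helper names, and the path passed in is whatever location your output storage writes 'documents.parquet' to.

```python
def missing_columns(actual, expected=("metadata",)):
    """Return the expected column names that are absent from actual, in order."""
    present = set(actual)
    return [name for name in expected if name not in present]


def check_documents_parquet(path, expected=("metadata",)):
    """Load a parquet file and report which expected columns are missing."""
    # Imported lazily so missing_columns() stays usable without pandas.
    import pandas as pd

    df = pd.read_parquet(path)
    return missing_columns(df.columns, expected)
```

An empty list means every expected column is present; for the behavior reported here, the result would contain `"metadata"`.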
### GraphRAG Config Used
```python
# Metadata passed to the input config as an inline dict
exchangeindexCalendarConfig = GraphRagConfig(
    completion_models={"default_completion_model": completion_model_config},
    embedding_models={"default_embedding_model": embedding_model_config},
    input_storage=StorageConfig(base_dir=str(temp_input_path)),
    input=InputConfig(type="json", title_column="subject", text_column="displayTo", metadata={"parentFolderId": "str", "displayTo": "str"}),
    chunking=ChunkingConfig(type="tokens", size=100, overlap=50, prepend_metadata=config.exchangeIndexCalendarMetadata),
    cache=cache_config,
    vector_store=vector_store_config,
    output_storage=output_storage_config,
)

# Metadata passed to the input config from config.exchangeIndexCalendarMetadata
exchangeindexCalendarConfig = GraphRagConfig(
    completion_models={"default_completion_model": completion_model_config},
    embedding_models={"default_embedding_model": embedding_model_config},
    input_storage=StorageConfig(base_dir=str(temp_input_path)),
    input=InputConfig(type="json", title_column="subject", text_column="displayTo", metadata=config.exchangeIndexCalendarMetadata),
    chunking=ChunkingConfig(type="tokens", size=100, overlap=50, prepend_metadata=config.exchangeIndexCalendarMetadata),
    cache=cache_config,
    vector_store=vector_store_config,
    output_storage=output_storage_config,
    update_output_storage=StorageConfig(base_dir="/home/kevaughn/privateBranch/graphragData"),
)
```
### Additional Information
graphrag 3.1.0
graphrag-cache 3.1.0
graphrag-chunking 3.1.0
graphrag-common 3.1.0
graphrag-input 3.1.0
graphrag-llm 3.1.0
graphrag-storage 3.1.0
graphrag-vectors 3.1.0