Commit 3e65e08

apply fix for byte buffer serialization
1 parent 4fb7518 commit 3e65e08

3 files changed

Lines changed: 63 additions & 4 deletions

README.md

Lines changed: 50 additions & 1 deletion
@@ -20,7 +20,56 @@
 Parquet Java (formerly Parquet MR) [![Build Status](https://github.com/apache/parquet-java/workflows/Test/badge.svg)](https://github.com/apache/parquet-java/actions)
 ======

-This repository contains a Java implementation of [Apache Parquet](https://parquet.apache.org/)
+This repository contains a **modified** Java implementation of [Apache Parquet](https://parquet.apache.org/). The changes
+in this version allow for the serialization of Java generic supertypes in a collection, without the correct type
+being lost on read.
+
+See below for an example of this fix:
+```java
+class AbstractRecord<X> { TreeSet<X> recordSet; }
+
+/**
+ * The template type will be lost on write-out and deserialization will fail without this change
+ * @param <Y> the concrete template type stored in {@link AbstractRecord#recordSet}
+ */
+class OutputRecord<Y> extends AbstractRecord<Y> {}
+```
+
+### Releasing new versions
+
+- Update main with the latest parquet-java changes and rebase the forked changes
+- Ensure you have the upstream Parquet repository as a git remote and fetch its tags
+```shell
+git remote add fork-source https://github.com/apache/parquet-java
+git fetch --tags fork-source
+```
+- Check out a new release branch from the relevant Parquet release tag
+`git checkout -b release/1.0.0-1.15.0 apache-parquet-1.15.0`
+- Apply the most recent fork change to that branch
+`git cherry-pick <ref-from-main>`
+- Set the new project version. **If** adjusting the fork itself, bump the base version (1.0.0)
+`mvn versions:set -DnewVersion=1.0.0-1.15.0`
+- Deploy the final jars from `lang/java/avro`
+`mvn deploy -DskipTests -DaltDeploymentRepository=repository-id::repository-url`
+- Push the release branch to remote
+
+---
+
+<p align=center><ins><b>NOTICE</b></ins></p>
+
+<p>This work was produced for the U.S. Government under Contract 693KA8-22-C-00001 and is subject to Federal Aviation Administration Acquisition Management System Clause 3.5-13, Rights In Data-General (Oct. 2014), Alt. III and Alt. IV (Oct. 2009).</p>
+
+<p>The contents of this document reflect the views of the author and The MITRE Corporation and do not necessarily reflect the views of the Federal Aviation Administration (FAA) or the Department of Transportation (DOT). Neither the FAA nor the DOT makes any warranty or guarantee, expressed or implied, concerning the content or accuracy of these views.</p>
+
+<p>For further information, please contact The MITRE Corporation, Contracts Management Office, 7515 Colshire Drive, McLean, VA 22102-7539, (703) 983-6000.</p>
+
+<p align=center><ins><b>&copy; 2024 The MITRE Corporation. All Rights Reserved.</b></ins></p>
+
+---
+
+<p align=center>Approved for Public Release; Distribution Unlimited. Public Release Case Number 24-3517</p>
+
+---

 Apache Parquet is an open source, column-oriented data file format
 designed for efficient data storage and retrieval. It provides high
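
The `AbstractRecord`/`OutputRecord` example in the README hunk above can be made concrete with plain reflection. The sketch below does not reproduce the parquet-avro code path. It only illustrates, on the assumption that the loss comes from Java's handling of generics, why the element type of `recordSet` cannot be recovered from the declaring class alone: reflection reports the unresolved type variable `X`, not a concrete class. The class name `ErasureDemo` and the helper `recordSetElementType` are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.TreeSet;

class ErasureDemo {
    // Same shapes as the README example, nested here to keep the sketch self-contained.
    static class AbstractRecord<X> { TreeSet<X> recordSet; }
    static class OutputRecord<Y> extends AbstractRecord<Y> {}

    /** Hypothetical helper: the element type of recordSet as reflection reports it. */
    static Type recordSetElementType() {
        try {
            // The field is declared on the generic supertype, so look it up there.
            Field field = AbstractRecord.class.getDeclaredField("recordSet");
            ParameterizedType setType = (ParameterizedType) field.getGenericType();
            return setType.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        OutputRecord<String> record = new OutputRecord<>();
        record.recordSet = new TreeSet<>();
        record.recordSet.add("a");

        Type elementType = recordSetElementType();
        // The declared element type is the type variable X, not String.
        System.out.println(elementType instanceof TypeVariable); // prints "true"
        System.out.println(elementType.getTypeName());           // prints "X"
    }
}
```

Any reader that builds its converters from the declared field type alone would see only `X` here, which matches the README's statement that the template type is lost without the fork's change.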

parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java

Lines changed: 10 additions & 0 deletions
@@ -35,6 +35,7 @@
 import java.lang.reflect.InvocationTargetException;
 import java.lang.reflect.Method;
 import java.lang.reflect.Modifier;
+import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
@@ -141,6 +142,15 @@ public void add(Object value) {
 };

 Class<?> fieldClass = fields.get(avroField.name());
+if ((null != fieldClass)
+    &&
+    /* Explicitly exclude ByteBuffers as parquet directly encodes them as byte[]s in the output data model - but the field class
+     * for ByteBuffer is abstract - so if we don't exclude them here all ByteBuffer fields are reflectively populated with byte[]s */
+    ((Modifier.isAbstract(fieldClass.getModifiers()) && !fieldClass.isAssignableFrom(ByteBuffer.class))
+        || Modifier.isInterface(fieldClass.getModifiers())
+        || fieldClass.equals(Object.class))) {
+  fieldClass = null;
+}
 converters[parquetFieldIndex] =
     newConverter(nonNullSchema, parquetField, this.model, fieldClass, container);
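
The condition added in the hunk above can be exercised in isolation to see which field classes it discards. The following is a minimal sketch: `FieldClassFilter` and `discardsFieldClass` are hypothetical names, and the method only mirrors the boolean expression from the diff. It says nothing about what `newConverter` does once `fieldClass` is `null`.

```java
import java.lang.reflect.Modifier;
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.util.AbstractList;
import java.util.List;

class FieldClassFilter {
    /** Hypothetical helper: true when the diff's condition would reset fieldClass to null. */
    static boolean discardsFieldClass(Class<?> fieldClass) {
        if (fieldClass == null) {
            return false; // the diff only enters the branch for non-null classes
        }
        int modifiers = fieldClass.getModifiers();
        // Abstract classes are discarded unless a ByteBuffer can be assigned to them.
        boolean abstractButNotByteBuffer =
            Modifier.isAbstract(modifiers) && !fieldClass.isAssignableFrom(ByteBuffer.class);
        return abstractButNotByteBuffer
            || Modifier.isInterface(modifiers)
            || fieldClass.equals(Object.class);
    }

    public static void main(String[] args) {
        Class<?>[] samples = {
            ByteBuffer.class, Buffer.class, AbstractList.class, List.class, Object.class, String.class
        };
        for (Class<?> sample : samples) {
            String outcome = discardsFieldClass(sample) ? "discarded" : "kept";
            System.out.println(sample.getSimpleName() + " -> " + outcome);
        }
    }
}
```

Under this reading, `ByteBuffer` is kept even though it is abstract, while other abstract classes, interfaces, and `Object` are discarded. Because the check uses `isAssignableFrom`, an abstract superclass of `ByteBuffer` such as `java.nio.Buffer` is kept as well.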

pom.xml

Lines changed: 3 additions & 3 deletions
@@ -18,9 +18,9 @@
 <description>Parquet is a columnar storage format that supports nested data. This provides the java implementation.</description>

 <scm>
-  <connection>scm:git:git@github.com:apache/parquet-mr.git</connection>
-  <url>scm:git:git@github.com:apache/parquet-mr.git</url>
-  <developerConnection>scm:git:git@github.com:apache/parquet-mr.git</developerConnection>
+  <connection>scm:git:https://github.com/mitre-public/parquet-java</connection>
+  <developerConnection>scm:git:https://github.com/mitre-public/parquet-java</developerConnection>
+  <url>scm:git:https://github.com/mitre-public/parquet-java</url>
   <tag>HEAD</tag>
 </scm>
