Cannot Insert data to table with a partitions in Spark in EMR #4212

@Lior-AI

After 42f6982, I can successfully create a table with partitions, but when I try to insert data, the job ends with success
while the segment is marked as "Marked for Delete".

I am running:

CREATE TABLE lior_carbon_tests.mark_for_del_bug(
  timestamp string,
  name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string)

INSERT INTO lior_carbon_tests.mark_for_del_bug select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'

select * from lior_carbon_tests.mark_for_del_bug

gives

+---------+----+---+---+
|timestamp|name| dt| hr|
+---------+----+---+---+
+---------+----+---+---+

And

show segments for TABLE lior_carbon_tests.mark_for_del_bug

gives

+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
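
For anyone trying to reproduce this in an automated check, here is a minimal sketch (not part of CarbonData) that parses the text output of `show segments` and lists the segments whose status is "Marked for Delete". The helper names `parse_ascii_table` and `segments_marked_for_delete` are hypothetical, and the code assumes the output uses the same ASCII table layout as shown above.

```python
def parse_ascii_table(text):
    """Parse a Spark-style ASCII table into a list of dicts keyed by header."""
    rows = [
        # Drop the outer pipes, then split the line into trimmed cells.
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")  # skip the +---+ border lines
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def segments_marked_for_delete(show_segments_output):
    """Return the IDs of segments whose Status is 'Marked for Delete'."""
    return [
        row["ID"]
        for row in parse_ascii_table(show_segments_output)
        if row.get("Status") == "Marked for Delete"
    ]


# Output copied from the report above.
SHOW_SEGMENTS_OUTPUT = """
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
"""

print(segments_marked_for_delete(SHOW_SEGMENTS_OUTPUT))  # prints ['0']
```

A non-empty result right after an INSERT that reported success matches the behavior described in this issue.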

I took a look at the folder structure in S3 and it seems fine.
