I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I would like to start by apologizing if there is already a bug report on this subject, but I couldn't find one.
Here we go: I have a .csv file (sds_teste.csv) that I am using as a seed, like this:
When I execute the command dbt seed -s sds_teste, dbt runs successfully.
But when I run a select against the table created by the dbt seed command, I can see that the special characters (accented letters) were not loaded correctly.
I already tried some things I found around the internet, like passing encoding: utf-8, but nothing I found worked.
My profiles.yml
Expected Behavior
The expected behavior is that dbt seed can read a .csv file encoded in UTF-8.
Should be: A text with special characters, like Ã, Á, Í, or Ç
Instead of: A text with special characters, like �, �, �, or �
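The exact `�` characters above are what you get when bytes written in the Windows-1252 codepage are later decoded as UTF-8. A minimal standalone Python sketch (not using dbt itself) reproduces the symptom:

```python
# Encode the expected text as Windows-1252 (cp1252, the default locale
# codepage on many Windows setups), then decode the bytes as UTF-8 with
# replacement, the way a UTF-8 reader would see a cp1252 file.
data = "Ã Á Í Ç".encode("cp1252")
print(data.decode("utf-8", errors="replace"))  # → '� � � �'
```

Each accented letter becomes a lone byte that is invalid as a UTF-8 sequence, so the decoder emits one U+FFFD replacement character per letter, matching the output shown above.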
Steps To Reproduce
1 - Install Python 3.11.9 on a Windows computer
2 - Create a Python environment with python venv
3 - Install dbt-core==1.8.7 and dbt-athena-community==1.8.4 with pip install
4 - Create a dbt project
5 - Create a .csv file in the seeds/ folder and write some example content with special characters
6 - Configure profiles.yml to connect to AWS Athena (storage: AWS S3)
7 - Run the dbt seed command
Relevant log output
============================== 13:38:20.928977 | c0ac6503-3146-44d1-84fa-552929776d79 ==============================
13:38:20.928977 [info ] [MainThread]: Running with dbt=1.8.7
13:38:20.929979 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'fail_fast': 'False', 'profiles_dir': 'C:\\Users\\julio.padoveze\\.dbt', 'log_path': 'C:\\Users\\julio.padoveze\\github\\prd-Copastur-dbt-source-qv\\logs', 'version_check': 'True', 'debug': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'empty': 'None', 'log_format': 'default', 'introspect': 'True', 'invocation_command': 'dbt seed -s sds_teste', 'static_parser': 'True', 'target_path': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'send_anonymous_usage_stats': 'True'}
13:38:21.527840 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000020521A27B10>]}
13:38:21.561425 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x000002051CDCDE10>]}
13:38:21.563666 [info ] [MainThread]: Registered adapter: athena=1.8.4
13:38:21.575796 [debug] [MainThread]: checksum: 4af21dafb485259c48497ac86b711ddb1982f3d0f1c0ca4e09356de488b753c0, vars: {}, profile: , target: , version: 1.8.7
13:38:21.689602 [info ] [MainThread]: Unable to do partial parsing because of a version mismatch
13:38:21.690599 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000020521C79010>]}
13:38:22.937805 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000020522118250>]}
13:38:23.152658 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000020522129ED0>]}
13:38:23.152658 [info ] [MainThread]: Found 17 models, 9 seeds, 96 sources, 568 macros
13:38:23.153659 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000020521F006D0>]}
13:38:23.155683 [info ] [MainThread]:
13:38:23.157687 [debug] [MainThread]: Acquiring new athena connection 'master'
13:38:23.160683 [debug] [ThreadPool]: Acquiring new athena connection 'list_awsdatacatalog'
13:38:23.161689 [debug] [ThreadPool]: Opening a new connection, currently in state init
13:38:24.834482 [debug] [ThreadPool]: On list_awsdatacatalog: Close
13:38:24.840486 [debug] [ThreadPool]: Acquiring new athena connection 'list_awsdatacatalog_db_dbt_validation'
13:38:24.840486 [debug] [ThreadPool]: Opening a new connection, currently in state init
13:38:29.476313 [debug] [ThreadPool]: On list_awsdatacatalog_db_dbt_validation: Close
13:38:29.495682 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'c0ac6503-3146-44d1-84fa-552929776d79', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000020521B74F10>]}
13:38:29.496682 [info ] [MainThread]: Concurrency: 10 threads (target='dev')
13:38:29.498679 [info ] [MainThread]:
13:38:29.512664 [debug] [Thread-1 (]: Began running node seed.qv.sds_teste
13:38:29.514172 [info ] [Thread-1 (]: 1 of 1 START seed file db_dbt_validation.db_qvextracao__sds_teste .............. [RUN]
13:38:29.516618 [debug] [Thread-1 (]: Acquiring new athena connection 'seed.qv.sds_teste'
13:38:29.518105 [debug] [Thread-1 (]: Began compiling node seed.qv.sds_teste
13:38:29.519110 [debug] [Thread-1 (]: Began executing node seed.qv.sds_teste
13:38:29.553617 [debug] [Thread-1 (]: seed by upload...
13:38:29.566419 [debug] [Thread-1 (]: Opening a new connection, currently in state init
13:38:32.830681 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Table "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp" does not exists - Ignoring
13:38:32.832678 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Table "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp" does not exist - Ignoring
13:38:32.834710 [debug] [Thread-1 (]: Dropping relation via Glue and S3 APIs
13:38:34.321827 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Table "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp" does not exists - Ignoring
13:38:34.323331 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Table "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp" does not exist - Ignoring
13:38:35.891369 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Table "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp" does not exist and will not be deleted, ignoring
13:38:35.893875 [debug] [Thread-1 (]: Using athena connection "seed.qv.sds_teste"
13:38:35.893875 [debug] [Thread-1 (]: On seed.qv.sds_teste: -- /* {"app": "dbt", "dbt_version": "1.8.7", "profile_name": "qv", "target_name": "dev", "node_id": "seed.qv.sds_teste"} */
create external table `db_dbt_validation`.`db_qvextracao__sds_teste__dbt_tmp` (`id` string, `some_text` string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
location 's3://prd-lakehouse-copastur-analytics/analytics/env=prod/top=copastur/domain=analytics/dbt/db_dbt_validation/db_qvextracao__sds_teste__dbt_tmp'
tblproperties (
'skip.header.line.count'='1'
)
13:38:36.637497 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Athena query ID 87dea510-7789-40ce-849c-4d2a4d6428bf
13:38:38.371586 [debug] [Thread-1 (]: SQL status: OK -1 in 2.477 seconds
13:38:38.386232 [debug] [Thread-1 (]: Skip partitioning: False
13:38:38.387230 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: get_work_group for %s
13:38:39.758786 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: S3 path does not exist
13:38:39.759788 [debug] [Thread-1 (]: Using athena connection "seed.qv.sds_teste"
13:38:39.759788 [debug] [Thread-1 (]: On seed.qv.sds_teste: -- /* {"app": "dbt", "dbt_version": "1.8.7", "profile_name": "qv", "target_name": "dev", "node_id": "seed.qv.sds_teste"} */
create table "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste"
with (
table_type='hive',
is_external=true,
format='parquet'
)
as
select
cast(nullif("id", '') as integer) as "id"
,
cast(nullif("some_text", '') as varchar) as "some_text"
from
"awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp"
13:38:39.966615 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Athena query ID 3ff995b0-dbb9-42b2-ab05-90844aeb9925
13:38:42.972477 [debug] [Thread-1 (]: SQL status: OK 2 in 3.212 seconds
13:38:44.460039 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: table_name : 490445445844.db_dbt_validation.db_qvextracao__sds_teste__dbt_tmp
13:38:44.462029 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: table type: TableType.TABLE
13:38:44.462029 [debug] [Thread-1 (]: Dropping relation via Glue and S3 APIs
13:38:45.992330 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: table_name : 490445445844.db_dbt_validation.db_qvextracao__sds_teste__dbt_tmp
13:38:45.993334 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: table type: TableType.TABLE
13:38:45.994409 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp" is stored in s3://prd-lakehouse-copastur-analytics/analytics/env=prod/top=copastur/domain=analytics/dbt/db_dbt_validation/db_qvextracao__sds_teste__dbt_tmp
13:38:46.756532 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Deleting table data: path='s3://prd-lakehouse-copastur-analytics/analytics/env=prod/top=copastur/domain=analytics/dbt/db_dbt_validation/db_qvextracao__sds_teste__dbt_tmp', bucket='prd-lakehouse-copastur-analytics', prefix='analytics/env=prod/top=copastur/domain=analytics/dbt/db_dbt_validation/db_qvextracao__sds_teste__dbt_tmp/'
13:38:49.383198 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: Deleted table from glue catalog: "awsdatacatalog"."db_dbt_validation"."db_qvextracao__sds_teste__dbt_tmp"
Special characters worked for me when I tried with dbt-duckdb, so this might be specific to the dbt-athena-community adapter rather than dbt-core. I'm going to transfer this issue to that repository instead.
Example
See below for my output when using dbt-duckdb.
Create this file:
seeds/my_seed.csv
id,some_text
1,ABC
2,Ã Á Í Ç
Run these commands:
dbt seed
dbt show --inline 'select * from {{ ref("my_seed") }}'
See this output:
| id | some_text |
| -- | --------- |
| 1  | ABC       |
| 2  | Ã Á Í Ç   |
Now that you mention it, I tested the same case with the Databricks adapter and it works correctly! It really sounds like a problem with the Athena adapter.
It works on my side with dbt-athena on Windows.
Could you please try adding this parameter to .vscode/settings.json and opening a new terminal in VS Code?
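The specific parameter isn't quoted in this thread, so the following is only my guess at the kind of setting meant: forcing Python's UTF-8 mode (PEP 540) for terminals spawned by VS Code, which makes `open()` default to UTF-8 on Windows instead of the locale codepage.

```jsonc
// .vscode/settings.json — hypothetical sketch; the actual suggested
// parameter is not shown in the thread.
{
  "terminal.integrated.env.windows": {
    "PYTHONUTF8": "1"
  }
}
```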
Hello @e-quili, thank you so much for the collaboration! I tested this solution and it works!!
I will use this in my local environment for development, but I still think there is a bug, since other adapters can identify the encoding of the .csv file. What do you think?
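For what it's worth, the Python behavior that likely explains the difference: on Windows, reading a file without an explicit encoding falls back to the locale's preferred codepage (often cp1252) unless UTF-8 mode is enabled, while an explicit `encoding="utf-8"` round-trips correctly everywhere. A minimal sketch, not using dbt itself (the file name is just for illustration):

```python
from pathlib import Path
import tempfile

# Write a seed-like CSV explicitly as UTF-8.
path = Path(tempfile.gettempdir()) / "sds_teste_demo.csv"
path.write_text("id,some_text\n1,Ã Á Í Ç\n", encoding="utf-8")

# Reading back with an explicit encoding preserves the accents on any OS.
# Omitting encoding=... would use the locale codepage on Windows, which is
# where the � replacement characters come from.
text = path.read_text(encoding="utf-8")
print(text.splitlines()[1])  # → '1,Ã Á Í Ç'
```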
Environment
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
No response