Winse Blog

走走停停, 熙熙攘攘, 忙忙碌碌, 不知何畏.

Postgresql入门

简单介绍下软件的安装,配置。同时实践下从mysql迁移到postgres。

安装配置

这里直接使用rpm包来安装。如果是centos6.6以下版本的系统需要更新openssl。

使用YUM安装看https://wiki.postgresql.org/wiki/YUM_Installation

1
2
3
4
5
6
7
8
9
10
11
[root@hadoop-master1 postgres]# ll
total 20708
-rw-r--r-- 1 root root  1593932 Dec 11 10:02 openssl-1.0.1e-42.el6.x86_64.rpm
-rw-r--r-- 1 root root  1085208 Dec 11 09:12 postgresql94-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root   541376 Dec 11 09:12 postgresql94-contrib-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root  1600736 Dec 11 09:12 postgresql94-devel-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root 11485008 Dec 11 09:13 postgresql94-docs-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root   198968 Dec 11 09:12 postgresql94-libs-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root    60688 Dec 11 09:12 postgresql94-plperl-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root    68884 Dec 11 09:12 postgresql94-plpython-9.4.5-1PGDG.rhel6.x86_64.rpm
-rw-r--r-- 1 root root  4556880 Dec 11 09:11 postgresql94-server-9.4.5-1PGDG.rhel6.x86_64.rpm
  • 安装命令:
1
2
3
4
# yum install -y openssl-1.0.1e-42.el6.x86_64.rpm 

# useradd postgres
# rpm -i postgresql94-*
  • 配置环境变量、初始化数据库、启动数据库:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# su - postgres
$ vi .bash_profile

export PGDATA=/var/lib/pgsql/9.4/data
PG_HOME=/usr/pgsql-9.4
PATH=$PG_HOME/bin:$PATH
export PATH

$ initdb

$ vi $PGDATA/pg_hba.conf
  host    all             all              192.168.0.0/16          md5

$ vi /var/lib/pgsql/9.4/data/postgresql.conf
  listen_addresses = '*'

# 切回root

# service postgresql-9.4 start
# chkconfig postgresql-9.4 on --level 2345

pg_hba.conf用来控制什么用于可以被远程访问。而postgresql.conf修改的监听的地址,默认是localhost改成*后就可以所有地址都可以访问了。

  • 建立库,创建数据库用户
1
2
3
4
5
-bash-4.1$ psql 

 create user dpi;
 create database dpi owner dpi;
 alter user dpi with password 'XXXX';

建表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATE TABLE t_dta_illegalweb (
...
  day varchar(10) DEFAULT NULL,
...
);

create or replace function t_dta_illegalweb_insert_trigger()
returns trigger as $$
begin
  return null;
end; 
$$ language plpgsql;

CREATE TRIGGER trigger_t_dta_illegalweb_insert
    BEFORE INSERT ON t_dta_illegalweb
    FOR EACH ROW EXECUTE PROCEDURE t_dta_illegalweb_insert_trigger();

后面会使用分区表,先把触发器都建好。把框框搭好,后面修改就行了。

数据迁移

  1. postgres创建表:
1
2
3
4
5
CREATE TABLE IF NOT EXISTS t_dta_illegalweb20151211 (check(day = '2015-12-11')) INHERITS (t_dta_illegalweb);
CREATE TABLE IF NOT EXISTS t_dta_illegalweb20151210 (check(day = '2015-12-10')) INHERITS (t_dta_illegalweb);
CREATE TABLE IF NOT EXISTS t_dta_illegalweb20151209 (check(day = '2015-12-09')) INHERITS (t_dta_illegalweb);
CREATE TABLE IF NOT EXISTS t_dta_illegalweb20151208 (check(day = '2015-12-08')) INHERITS (t_dta_illegalweb);
CREATE TABLE IF NOT EXISTS t_dta_illegalweb20151207 (check(day = '2015-12-07')) INHERITS (t_dta_illegalweb);
  1. mysql导出数据:
1
2
3
select * from t_dta_illegalweb where day='2015-12-09' into outfile '/tmp/etl/t_dta_illegalweb20151209.sql'  fields terminated by '|';
select * from t_dta_illegalweb where day='2015-12-08' into outfile '/tmp/etl/t_dta_illegalweb20151208.sql'  fields terminated by '|';
select * from t_dta_illegalweb where day='2015-12-07' into outfile '/tmp/etl/t_dta_illegalweb20151207.sql'  fields terminated by '|';

数据在mysql服务器的/tmp/etl目录下面。如果mysql和postgres不在同一台机,需要把这些文件拷贝到postgres的服务器。

  1. 导入数据到postgres:

用psql登录dpi,然后执行copy命令把数据导入到对应的表。

1
2
3
\copy  t_dta_illegalweb20151209 from  '/tmp/etl/t_dta_illegalweb20151209.sql' using delimiters '|' ;
\copy  t_dta_illegalweb20151208 from  '/tmp/etl/t_dta_illegalweb20151208.sql' using delimiters '|' ;
\copy  t_dta_illegalweb20151207 from  '/tmp/etl/t_dta_illegalweb20151207.sql' using delimiters '|' ;

程序修改

程序修改是一件头痛的事情,虽然大部分都是SQL,但是MYSQL的比较宽泛,很多语句都兼容不报错也能出来想要的结果。但是这些语句在postgres下面执行是会报错的。比如说,select count(*)对所有数据count的时刻不能加order by(提示要groupby);再比如,mysql遇到字符串字段和数字比较会统一转换成数字比较,等等这些在postgres中都需要在SQL中显示的转换的。

那么postgres的类型转换怎么实现呢?两种形式cast(X as TYPE) 或者 X::TYPE。

由于程序是用hibernate来做数据库访问的,会遇到如下的问题

  • 如果用hql的话CAST函数hibernate首先会进行转换。(转换类型与hibernate对象的类型不匹配)
  • 而用X::TYPE会把:TYPE作为一个name parameter。
  • 不用hql用sql的话,要自己做对象转换,这是我们不愿意去做的事情(不然用hibernate干嘛)

各种尝试过后,修改PostgreSQLDialect来实现,添加一个自定义的hibernate函数,把字符串转成bigint即可。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import java.sql.Types;

import org.hibernate.Hibernate;
import org.hibernate.dialect.function.SQLFunctionTemplate;


public class PostgreSQLDialect extends org.hibernate.dialect.PostgreSQLDialect {
  
  public PostgreSQLDialect() {
      super();
      registerFunction( "bigint", new SQLFunctionTemplate(Hibernate.BIG_INTEGER, "cast(?1 as bigint)") );
  }

}

使用如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
StringBuilder hql = new StringBuilder("from IllegalWebInfo where 1=1 ");
List<Object> params = new ArrayList<>();

String domain = queryBean.getDomain();
if (StringUtils.isNotBlank(domain)) {
  hql.append(" and ").append("domain=?");
  params.add(domain.toLowerCase());
}
String houseId = queryBean.getHouseId();
if (StringUtils.isNotBlank(houseId)) {
  hql.append(" and ").append("houseId=?");
  params.add(houseId);
}
String day = queryBean.getDay();
if (StringUtils.isNotBlank(day)) {
  hql.append(" and ").append("day=?");
  params.add(day);
}
int threshold = queryBean.getThreshold();
if(threshold > 0){
  hql.append(" and ").append("bigint(visitsCount) >= ?");
  params.add(BigInteger.valueOf(threshold)); // 注意这里的类型转换,把int装成bigint
}

Object[] paramArray = params.toArray();
String detailHQL = hql.toString(); // + " order by bigint(visitsCount) desc ";
List<ActiveResourcesDomainInfo> hist = activeResourcesDomainDao.findPageable(detailHQL, currentPage, pageSize, paramArray);

String countHQL = "select count(*) " + hql;
long count = (long) illegalWebDao.findByHql(countHQL, paramArray).iterator().next();

定时任务,创建和更新触发器函数

函数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
create or replace function create_partition_table_everyday (t TEXT) returns timestamp as $$
declare 
  i int;
  cnt int;
  stmt text;
  select_stmt text;
  day date;
  isInherit BOOLEAN;
begin

  day := now() + interval '-1 day';
  stmt := 'CREATE TABLE IF NOT EXISTS ' || t || to_char(day, 'YYYYMMDD') || '(check(day = ''' || to_char(day, 'YYYY-MM-DD') || ''')) INHERITS (' || t || ')';
  RAISE INFO '[DEBUG] %', stmt;
  EXECUTE stmt;

  day := now() + interval '-183 day';
  stmt := 'DROP TABLE IF EXISTS ' || t || to_char(day, 'YYYYMMDD');
  RAISE INFO '[DEBUG] %', stmt;
  EXECUTE stmt;

    -- try-catch
BEGIN
  day := now() + interval '-32 day';
  stmt := 'ALTER TABLE IF EXISTS ' || t || to_char(day, 'YYYYMMDD') || ' NO INHERIT ' || t;
  RAISE INFO '[DEBUG] %', stmt;
  EXECUTE stmt;
EXCEPTION WHEN OTHERS THEN
  RAISE INFO '[WARN] % %', SQLERRM, SQLSTATE;
END;

  i := 0;
  cnt := 6; -- 用于生成触发器分发最近几天的insert功能

  day := now() + interval '-1 day';
  stmt :=         ' create or replace function ' || t || '_insert_trigger() returns trigger as $' || '$ ';
  stmt := stmt || ' begin ';
  stmt := stmt || ' if (new.day = ''' || to_char(day, 'YYYY-MM-DD') || ''') then INSERT INTO ' || t || to_char(day, 'YYYYMMDD') || ' VALUES (new.*); ';
  while i < cnt 
  loop
      day := day + interval '-1 day';
      stmt := stmt || ' elsif (new.day = ''' || to_char(day, 'YYYY-MM-DD') || ''') then INSERT INTO ' || t || to_char(day, 'YYYYMMDD') || ' VALUES (new.*); ';

      i := i + 1;
  end loop;
  stmt := stmt || ' else raise exception ''DATE out of range. Fix the ' || t || '_insert_trigger() func!!''; ';
  stmt := stmt || ' end if; ';
  stmt := stmt || ' return null; ';
  stmt := stmt || ' end;  ';
  stmt := stmt || ' $' || '$ language plpgsql; ';
  RAISE INFO '[DEBUG] %', stmt;
  EXECUTE stmt;

  return now();
end;
$$ language plpgsql;

脚本:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

vi update_dta_postgres.sh

#!/bin/sh

source ~/.bash_profile

psql -d dpi -c "select create_partition_table_everyday('t_dta_illegalweb')"
psql -d dpi -c "select create_partition_table_everyday('t_dta_activeresources_domain')"
psql -d dpi -c "select create_partition_table_everyday('t_dta_activeresources_ip')"

$ 
chmod +x update_dta_postgres.sh 
crontab -e


10 0 * * * sh ~/scripts/update_dta_postgres.sh >~/scripts/update_dta_postgres.log 2>&1

参考

–END

Comments